3 days ago
Dallas, TX, USA or Austin, TX, USA
Senior / Staff+
Responsibilities
- Build a fast-moving, high-growth service for enterprise travel and expense.
- Design, implement, and operate cloud infrastructure with a focus on infrastructure as code.
- Identify reliability anti-patterns and improve system visibility and reliability.
- Automate processes to reduce toil and empower users.
- Leverage AI tools to achieve autonomous operations and improve observability.
- Define and drive the adoption of system reliability standards across engineering teams.
- Drive the adoption of AI-assisted developer tools to enhance productivity.
Requirements
- 5+ years of experience as a Senior SRE or DevOps Lead.
- 2+ years in a production, 24x7 product environment.
- Strong problem-solving skills and eagerness to learn new technologies.
- Excellent communication skills for stakeholder collaboration.
- Experience mentoring junior engineers and leading infrastructure projects.
- Hands-on operational experience with Java applications and performance tuning.
- Experience with distributed systems in a public cloud environment, preferably AWS.
- Proficiency in microservice architecture and reliability patterns.
- Experience with Infrastructure as Code using Terraform or similar tools.
- Strong scripting skills in languages like Python or Bash.
- Experience with monitoring systems such as New Relic or Datadog.
- Hands-on experience deploying and monitoring AI/ML microservices.
- Ability to integrate AI-specific telemetry for predictive insights.