Responsibilities
- Design and develop tooling, automation, and infrastructure services.
- Implement and operate cloud infrastructure with a focus on infrastructure as code.
- Identify and resolve reliability anti-patterns through data analysis.
- Automate processes to reduce manual toil and improve efficiency.
- Leverage AI tools to enhance system observability and operations.
- Define and promote system reliability standards across engineering teams.
- Mentor junior engineers and lead cross-functional infrastructure projects.
Requirements
- 5+ years of experience as a Senior SRE or DevOps Lead.
- 2+ years supporting a product in a 24x7 production environment.
- Strong problem-solving skills and eagerness to learn new technologies.
- Excellent communication skills for stakeholder collaboration.
- Experience mentoring junior engineers and leading projects.
- Hands-on experience with Java applications and performance tuning.
- Experience with distributed systems in a public cloud environment, preferably AWS.
- Proficiency in CI/CD tools and automation using Maven and Jenkins.
- Familiarity with microservice architecture and reliability patterns.
- Experience with Infrastructure as Code tools like Terraform or CloudFormation.
- Scripting skills in languages such as Python or Bash.
- Experience with monitoring systems like New Relic or Datadog.
- Hands-on experience deploying and monitoring AI/ML microservices.
- Familiarity with AI/LLM platforms and managing infrastructure secrets.