about 2 hours ago
San Francisco, CA, USA
Senior
H1B Sponsor
Base Salary
$161k - $284k/yr
Responsibilities
- Build and extend platforms to improve system reliability.
- Work on team goals that encompass reliability for the entire company.
- Standardize reliability tools across multiple platforms and organizations.
- Triage, coordinate, and lead stabilization of sev 0–1 incidents.
- Serve as primary oncall, maintaining structured escalation paths.
- Drive platform-wide reliability improvements and shared operational tooling.
- Use AI-driven systems to improve signal detection and accelerate root cause analysis.
- Design and implement safe deployment patterns.
Requirements
- Ability to drive root cause analysis in systems with many moving parts.
- Demonstrated technical initiative and leadership on previous projects.
- Familiarity with AI-driven tooling for observability and incident analysis.
- Experience running production oncall for high-availability systems.
- Strong incident management skills including structured triage and blameless postmortems.
- Fluency with CI/CD pipelines and rollback automation.
- Monitoring and observability expertise.
- Ability to create and maintain evidence-based maturity assessments.
- Comfort with vendor/dependency management.
- Boundless curiosity and a strong sense of accountability.
- 5+ years of software development experience.
Benefits
- Healthcare coverage including Medical, Vision, and Dental insurance.
- Health Savings Account and Flexible Spending Account.
- Retirement Plans including company match.
- Employee Stock Purchase Program.
- Wellness programs including access to mental health resources.
- Paid parental and caregiving leave.
- Paid time off including 12 paid holidays.
- Learning and Development resources.
- Paid Life insurance, AD&D, and disability benefits.
Tech Stack
Amazon DynamoDB, Ambassador, AWS, Datadog, gRPC, Istio, Java, Kotlin, Kubernetes, MySQL, Terraform
Categories
AI & ML, DevOps