about 1 hour ago
Base Salary
$215k - $280k/yr
Responsibilities
- Own production health, reliability, and operational support processes across critical systems and services.
- Lead incident response efforts, stakeholder communication, root cause analysis, and post-incident reviews.
- Identify patterns in production issues and drive improvements to reduce recurring incidents and operational overhead.
- Design and implement AI-driven agents and workflows that automate support and operational tasks.
- Partner with engineering, product, and AI orchestration teams to improve system resilience and operational efficiency.
- Build and maintain operational runbooks, documentation, and knowledge base content for both human and AI-assisted workflows.
- Support observability, monitoring, and troubleshooting efforts across cloud-based production environments.
- Participate in on-call rotations and continuously improve operational readiness and response processes.
Requirements
- 6–8 years of experience in production operations, site reliability engineering, technical support engineering, or similar operational roles.
- Strong background in incident management, root cause analysis, and production system troubleshooting.
- Experience working within modern SDLC, DevOps, and change management environments.
- Familiarity with operational tooling such as Jira, Confluence, and observability/monitoring platforms.
- Strong analytical and problem-solving skills with the ability to identify trends and drive operational improvements.
- Comfortable working cross-functionally with engineering, product, operations, and leadership teams.
- Strong communication skills and ability to operate effectively in fast-moving technical environments.
- Bachelor’s degree in Computer Science, Engineering, or equivalent relevant experience.
Benefits
- Wide variety of health, wellness, and other benefits including medical, dental, vision, and life insurance.
- A one-time payment of $2K for home office equipment and furniture.
- Four weeks of PTO in the first year and twelve weeks of fully paid parental leave.
- Up to $5000 each year for professional learning and career development.
- Remote-first work environment with core meeting hours from 9AM - 2PM Pacific time.
