Base Salary
$130k - $185k/yr
Responsibilities
- Own and resolve escalated cloud incidents end-to-end, including impact analysis and debugging.
- Collaborate with development, security, and operations to design and implement code/configuration fixes.
- Monitor system health, performance, and security via PagerDuty and enhance alerting to meet SLOs.
- Build diagnostic tools, dashboards, and documentation for effective incident resolution.
- Take ownership of production services by deploying critical fixes and responding to high-pressure events.
Requirements
- 5+ years of experience in troubleshooting, debugging, and root-cause analysis for high-priority incidents.
- Strong hands-on skills in Python, Bash, Java, and cloud platforms (GCP, AWS, Azure).
- Ability to write complex MySQL queries and generate business reports.
- Experience with authentication protocols such as SAML and OAuth.
- Solid networking fundamentals and proficiency in monitoring tools like Grafana.
Benefits
- Various health plans.
- Time-off plans for vacation and sick leave.
- Parental leave options.
- Retirement options.
- Education reimbursement.
- In-office perks and more.
