about 3 hours ago
Remote, United States
Senior / Staff+
H-1B Sponsor
Base Salary
$155k - $208k/yr
Responsibilities
- Design and build automated reliability and self-healing systems for production.
- Own and improve incident management tooling and reduce alert noise.
- Develop observability infrastructure for real-time visibility into system health.
- Contribute to AI-driven operational tooling for autonomous remediation.
- Drive incident prevention by identifying systemic patterns and eliminating operational toil.
- Partner with product engineering teams to diagnose reliability gaps.
- Define and champion operational excellence best practices across engineering.
- Embed Samsara’s cultural principles within the team.
Requirements
- 8+ years of experience in software engineering.
- Bachelor's Degree in Computer Science/Engineering or equivalent experience.
- 3+ years in infrastructure or platform engineering teams.
- Expertise in observability, operational metrics, and data analysis.
- Proven track record in architecting monitoring frameworks and automated response workflows.
- Experience with large-scale enterprise software applications.
- Familiarity with cloud platforms like AWS or GCP.
- Experience implementing AI-driven automation across the software development lifecycle (SDLC).
- Proficient in writing high-quality code in Go, Python, or equivalent.
- Experience mentoring engineers and modeling strong engineering practices.
- Proactive growth mindset focused on improving the status quo.
Benefits
- Flexible, employee-led remote working model.
- Professional development stipend.
- Comprehensive health and parental leave plans.
- Above-market total compensation including base salary, performance-based bonuses, and equity.
Tech Stack
AWS, Datadog, Go, Google Cloud Platform, Grafana, Python, Terraform
Categories
AI & ML, DevOps