5 days ago
Sunnyvale, CA, USA or New York, NY, USA
Senior
Base Salary
$139k - $220k/yr
Responsibilities
- Design, build, and maintain observability infrastructure.
- Develop reliable and scalable systems for metrics, logging, tracing, and telemetry.
- Collaborate with internal teams to implement observability best practices.
- Address performance and reliability challenges across GPU clusters.
- Contribute to platform strategy and participate in on-call rotations.
Requirements
- 5+ years of experience in software or infrastructure engineering.
- Proficient in Go or Python, with experience writing production code.
- Hands-on experience with Kubernetes and microservices architectures.
- Ability to design and deliver scalable, robust systems.
- Skilled in breaking down complex problems in distributed architectures.
- Familiar with Helm and YAML for service management.
- Experience in on-call rotations for critical production systems.
- Bachelor’s degree in Computer Science, Electrical Engineering, Mathematics, or related field.
Benefits
- 100% paid medical, dental, and vision insurance.
- Company-paid life insurance and short/long-term disability insurance.
- Flexible Spending Account and Health Savings Account.
- Tuition reimbursement and employee stock purchase program.
- Mental wellness benefits and family-forming support.
- Paid parental leave and flexible childcare support.
- 401(k) with generous employer match and flexible PTO.
- Catered lunch in office locations and a casual work environment.
