Software Engineer, AI Reliability

5 months ago

Seattle, WA, USA +2 more

Mid Level / Senior

H-1B Sponsor

Base Salary

$325k - $485k/yr

Responsibilities

Develop appropriate Service Level Objectives for large language model serving systems.
Design and implement monitoring and observability systems across the token path.
Assist in the design and implementation of high-availability serving infrastructure.
Lead incident response for critical AI services, ensuring rapid recovery.
Support the reliability of safeguard model serving.

Requirements

Strong background in distributed systems, infrastructure, or reliability.
Curiosity and bravery to jump into unfamiliar systems during incidents.
Holistic thinking about system composition and boundaries.
Ability to build lasting relationships across teams.
Ownership over outcomes, even for systems you do not directly own.
Excellent communication and collaboration skills.

Benefits

Competitive compensation and benefits.
Optional equity donation matching.
Generous vacation and parental leave.
Flexible working hours.
Collaborative office space.
