Anthropic

Senior Software Engineer, AI Reliability Engineering

21 days ago
Dublin, Ireland
Senior
H1B Sponsor

Responsibilities

  • Develop Service Level Objectives for language model serving and training systems.
  • Design and implement monitoring systems for availability and latency.
  • Assist in creating high-availability language model serving infrastructure.
  • Manage automated failover and recovery systems across multiple regions.
  • Lead incident response for critical AI services.
  • Build and maintain cost optimization systems for AI infrastructure.

Requirements

  • Extensive experience with distributed systems observability and monitoring.
  • Understanding of challenges in operating AI infrastructure.
  • Proven experience with SLO/SLA frameworks for critical services.
  • Comfortable with traditional and AI-specific metrics.
  • Experience with chaos engineering and resilience testing.
  • Ability to bridge gaps between ML engineers and infrastructure teams.
  • Excellent communication skills.

Benefits

  • Competitive compensation and benefits.
  • Optional equity donation matching.
  • Generous vacation and parental leave.
  • Flexible working hours.
  • Collaborative office space.

Categories

AI & ML, DevOps