Anthropic

Research Engineer, RL Infrastructure and Reliability (Knowledge Work)

Posted about 2 hours ago
San Francisco, CA, USA
Mid Level / Senior
H1B Sponsor

Base Salary

$350k - $850k/yr

Responsibilities

  • Serve as the dedicated reliability owner for the Knowledge Work training environments.
  • Own a clean, canonical set of evaluation tools and processes for Knowledge Work capabilities.
  • Build and automate observability, dashboards, and operational tooling for training environments.
  • Proactively harden environments and evaluation systems through load testing and fault injection.
  • Act as the primary point of contact for partner training and infrastructure teams.
  • Reduce the operational burden on researchers so they can focus on research.

Requirements

  • Highly experienced Python engineer who ships reliable, well-instrumented code.
  • Demonstrated experience operating ML or distributed systems at scale.
  • Strong SRE or production-engineering mindset focused on SLOs and load tests.
  • Foundational ML knowledge to understand training environments and evaluation metrics.
  • Ability to read research code and reason about evaluation integrity.

Benefits

  • Competitive compensation and benefits.
  • Optional equity donation matching.
  • Generous vacation and parental leave.
  • Flexible working hours.
  • Collaborative office space.

Tech Stack

Python

Categories

AI & ML, Data Engineering, DevOps