Anthropic

Research Engineer, Pretraining Scaling

San Francisco, CA, USA
Mid-Level / Senior
H-1B Sponsorship Available

Base Salary

$315k - $560k/yr

Responsibilities

  • Own critical aspects of the production pretraining pipeline, including model operations and performance optimization.
  • Debug and resolve complex issues across the full stack, from hardware errors to training dynamics.
  • Design and run experiments to improve training efficiency and enhance model performance.
  • Respond to on-call incidents during model launches, diagnosing problems quickly.
  • Build and maintain production logging, monitoring dashboards, and evaluation infrastructure.
  • Add new capabilities to the training codebase, such as long-context support.
  • Collaborate closely with teammates across teams and locations.
  • Document systems, debugging approaches, and lessons learned.

Requirements

  • Have hands-on experience training large language models, or expertise with JAX, TPUs, PyTorch, or large-scale distributed systems.
  • Enjoy both research and engineering work, ideally in a roughly 50/50 split.
  • Embrace being on-call for production systems and solving problems under pressure.
  • Thrive on impactful work that may change day-to-day based on production needs.
  • Excel at debugging complex problems across multiple layers of the stack.
  • Communicate clearly and collaborate effectively across time zones.
  • Continually refine your craft as a research engineer.
  • Care about the societal impacts of AI and responsible scaling.

Benefits

  • Competitive compensation and benefits.
  • Optional equity donation matching.
  • Generous vacation and parental leave.
  • Flexible working hours.
  • Collaborative office space.

Tech Stack

PyTorch

Categories

AI & ML, Backend, Data Science