Research Engineer, Pretraining Scaling
Anthropic
5 months ago
San Francisco, CA, USA
Mid Level / Senior
H-1B Sponsor
Base Salary
$315k - $560k/yr
Responsibilities
- Own critical aspects of the production pretraining pipeline, including model operations and performance optimization.
- Debug and resolve complex issues across the full stack, from hardware errors to training dynamics.
- Design and run experiments to improve training efficiency and enhance model performance.
- Respond to on-call incidents during model launches, diagnosing problems quickly.
- Build and maintain production logging, monitoring dashboards, and evaluation infrastructure.
- Add new capabilities to the training codebase, such as long context support.
- Collaborate closely with teammates across teams and locations.
- Document systems, debugging approaches, and lessons learned.
Requirements
- Hands-on experience training large language models, or expertise with JAX, TPUs, PyTorch, or large-scale distributed systems.
- Enjoy both research and engineering work, ideally with a 50/50 split.
- Are excited about being on-call for production systems and solving problems under pressure.
- Thrive on impactful work whose focus may shift day-to-day based on production needs.
- Excel at debugging complex problems across multiple layers of the stack.
- Communicate clearly and collaborate effectively across time zones.
- Are passionate about refining your craft as a research engineer.
- Care about the societal impacts of AI and responsible scaling.
Benefits
- Competitive compensation and benefits.
- Optional equity donation matching.
- Generous vacation and parental leave.
- Flexible working hours.
- Collaborative office space.
Tech Stack
PyTorch
Categories
AI & ML, Backend, Data Science