Training: Process Management Engineer
OpenAI
about 3 hours ago
London, United Kingdom
Mid Level / Senior
Responsibilities
- Design, build, and maintain software for orchestrating machine learning workloads.
- Profile and optimize the software stack for computation orchestration at scale.
- Improve reliability, observability, and fault tolerance for long-running jobs.
- Debug complex issues in distributed systems across large clusters.
- Respond to the evolving needs of machine learning systems.
Requirements
- Experience developing distributed systems.
- Strong software engineering skills in Python and in Rust or another systems programming language.
- Solid knowledge of Linux and systems-level debugging.
- Experience with asynchronous and concurrent systems.
- A passion for performance, correctness, and reliability.
Benefits
- Hybrid work model with 3 days in the office per week.
- Relocation assistance for new employees.
Tech Stack
C++, Linux, Python, Rust
Categories
AI & ML, Backend, Data Engineering, DevOps