OpenAI

Training: Process Management Engineer

about 3 hours ago
London, United Kingdom
Mid Level / Senior

Responsibilities

  • Design, build, and maintain software for orchestrating machine learning workloads.
  • Profile and optimize the software stack for computation orchestration at scale.
  • Improve reliability, observability, and fault tolerance for long-running jobs.
  • Debug complex issues in distributed systems across large clusters.
  • Respond to the evolving needs of machine learning systems.

Requirements

  • Experience developing distributed systems.
  • Strong software engineering skills in Python and in Rust or another systems programming language.
  • Solid knowledge of Linux and systems-level debugging.
  • Experience with asynchronous and concurrent systems.
  • A passion for performance, correctness, and reliability.

Benefits

  • Hybrid work model with 3 days in the office per week.
  • Relocation assistance for new employees.

Tech Stack

C++, Linux, Python, Rust

Categories

AI & ML, Backend, Data Engineering, DevOps