GrepJob
OpenAI

Software Engineer, Compute Infrastructure

OpenAI
Apply
about 4 hours ago
London, United Kingdom +3 more
Mid Level / Senior

Base Salary

$230k - $405k/yr

Responsibilities

  • Spin up and scale large Kubernetes clusters with automation for provisioning and lifecycle management.
  • Build software abstractions to unify multiple clusters for seamless training workload interfaces.
  • Own node bring-up from bare metal through firmware upgrades for fast, repeatable deployment.
  • Improve operational metrics like reducing cluster restart times and accelerating upgrade cycles.
  • Integrate networking and hardware health systems for end-to-end reliability.
  • Develop monitoring and observability systems to maintain cluster stability under load.

Requirements

  • Experience as an infrastructure, systems, or distributed systems engineer in large-scale environments.
  • Strong knowledge of Kubernetes internals and cluster scaling patterns.
  • Proficiency in compute infrastructure concepts including networking, storage, and security.
  • Experience in automating cluster or data center operations.
  • Bonus: background with GPU workloads, firmware management, or high-performance computing.

Tech Stack

Kubernetes

Categories

AI & MLData EngineeringDevOps