GrepJob
OpenAI

Software Engineer, Workload Enablement

OpenAI
Apply
4 days ago
San Francisco, CA, USA or Seattle, WA, USA
Mid Level / Senior

Base Salary

$293k - $455k/yr

Responsibilities

  • Port and validate key inference and training workloads on new platforms.
  • Build benchmarks and stress tests to capture end-to-end behavior of workloads.
  • Deep-dive into performance on distributed training/inference.
  • Create repeatable test harnesses for CI/lab environments.
  • Collaborate with systems engineers to ensure platform stability and performance.
  • Produce clear bug reports and prioritized issue lists for stakeholders.

Requirements

  • BS in CS/EE or equivalent practical experience.
  • 5+ years in ML systems, performance engineering, distributed systems, or HPC.
  • Strong hands-on experience with PyTorch and modern LLM training/inference stacks.
  • Experience with large-scale distributed training concepts.
  • Proficiency in Python and comfort with performance-critical code (C++/CUDA/HIP is a plus).
  • Strong profiling/debugging skills using tools like Nsight and perf.

Tech Stack

C++KubernetesPythonPyTorch

Categories

AI & MLData ScienceDevOpsTesting