OpenAI

Software Engineer, Data Infrastructure - Research

5 months ago
San Francisco, CA, USA
Mid Level / Senior

Base Salary

$250k - $380k/yr

Responsibilities

  • Design and maintain standardized dataset APIs for multimodal data.
  • Build proactive testing and scale validation pipelines for dataset loading.
  • Collaborate with teammates to integrate datasets into training and inference pipelines.
  • Document and maintain dataset interfaces for discoverability and consistency.
  • Establish safeguards to ensure datasets remain reproducible and unchanged.
  • Debug and resolve performance bottlenecks in distributed dataset loading.
  • Provide visualization tools to surface errors and bottlenecks in datasets.

Requirements

  • Strong engineering fundamentals with experience in distributed systems or data pipelines.
  • Experience building APIs, modular code, and scalable abstractions.
  • Comfortable debugging bottlenecks across large fleets of machines.
  • Pride in building reliable infrastructure that 'just works'.
  • Collaborative and humble, with excitement for foundational parts of the ML stack.
  • Bonus: Background knowledge in data math, probability, or distributed data theory.
  • Bonus: Experience with GPU-scale distributed systems or dataset scaling for real-time data.

Categories

AI & ML, Data Engineering