Software Engineer, Data Infrastructure - Research
OpenAI
5 months ago
San Francisco, CA, USA
Mid Level / Senior
Base Salary
$250k - $380k/yr
Responsibilities
- Design and maintain standardized dataset APIs for multimodal data.
- Build proactive testing and validation pipelines for dataset loading at scale.
- Collaborate with teammates to integrate datasets into training and inference pipelines.
- Document and maintain dataset interfaces for discoverability and consistency.
- Establish safeguards to ensure datasets remain reproducible and unchanged.
- Debug and resolve performance bottlenecks in distributed dataset loading.
- Provide visualization tools to surface errors and bottlenecks in datasets.
Requirements
- Strong engineering fundamentals with experience in distributed systems or data pipelines.
- Experience building APIs, modular code, and scalable abstractions.
- Comfortable debugging bottlenecks across large fleets of machines.
- Pride in building reliable infrastructure that 'just works'.
- Collaborative and humble, with excitement for foundational parts of the ML stack.
- Bonus: Background knowledge in data math, probability, or distributed data theory.
- Bonus: Experience with GPU-scale distributed systems or dataset scaling for real-time data.
Categories
AI & ML, Data Engineering