Responsibilities
- Own the technical direction for the MLOps platform.
- Serve as the primary technical point of contact for cross-functional teams.
- Design and operate the dataset layer end-to-end.
- Ensure every trained model can be traced back to its data and code.
- Build and operate a first-class model registry.
- Define the promotion path for models from training to deployment.
- Define offline benchmarks and policy-gating harnesses for model evaluation.
- Own the path from registered model to running inference on Apollo.
- Mentor mid-level and senior engineers through code and design reviews.
- Influence research workflows to standardize on platform primitives.
Requirements
- Deep proficiency in Python and at least one systems-level language (Go, Rust, or C++).
- Proven experience owning and delivering an MLOps platform end-to-end.
- Expertise across the model lifecycle, including dataset versioning and experiment tracking.
- Strong background designing service-oriented systems on Kubernetes.
- Experience defining evaluation frameworks for ML models in high-stakes environments.
- Experience leading technical projects from architecture to implementation.
- Demonstrated ability to lead by influence across teams.
- Proficiency with cloud infrastructure (AWS, GCP, or Azure), Docker, Git, and CI/CD workflows.
- Master's degree in Computer Science, Machine Learning, or a related field preferred.
- 8+ years of professional software engineering experience in ML platforms or related infrastructure.
