about 2 hours ago
Remote, United States
Senior / Staff+
H-1B Sponsor
Base Salary
$217k - $303k/yr
Responsibilities
- Design and build large-scale offline ML experimentation platforms.
- Develop production-grade training orchestration frameworks for distributed training.
- Build infrastructure for experiment tracking, metadata management, and model registries.
- Partner with ML engineers to improve experimentation velocity and operational efficiency.
- Build automated workflows for model promotion and continuous evaluation.
- Design an agentic AI execution platform for autonomous workflows.
Requirements
- 5+ years in infrastructure/platform engineering or large-scale distributed systems.
- 2+ years of hands-on experience with production ML infrastructure.
- Experience building workflow orchestration systems or large-scale automation frameworks.
- Familiarity with distributed data processing systems like Spark or Flink.
- Experience with orchestration technologies such as Kubeflow or Airflow.
- Experience with offline ML experimentation platforms and model registries.
- Experience with agentic AI systems is a strong plus.
Benefits
- Flexible work arrangements with options for remote work.
- Medical, dental, and vision insurance.
- 401(k) program with employer match.
- Generous time off for vacation and parental leave.
Categories
AI & ML, Data Engineering
