
ML Infrastructure Engineer
Maven Roboticsabout 3 hours ago
Responsibilities
- Own the architecture, implementation, reliability, and evolution of machine learning infrastructure.
- Build backend services for managing data, artifacts, jobs, logs, metadata, and compute resources.
- Design scalable systems for workload orchestration, storage, observability, security, and automation.
- Create intuitive internal tools that simplify complex infrastructure for engineers.
- Lead discussions with cloud and ML compute providers regarding capacity planning and performance.
Requirements
- Significant experience designing and operating production backend or compute infrastructure.
- Proven track record of managing complex infrastructure projects from architecture to deployment.
- Strong programming skills in Python, Go, Rust, C++, or similar languages.
- Experience with GPU compute infrastructure and orchestrating workloads using Kubernetes or similar systems.
- Familiarity with storage systems, observability platforms, and infrastructure-as-code.
- Experience managing large-scale GPU fleets or hybrid cloud environments.
- Ability to build internal developer platforms and self-service infrastructure tools.
- Strong technical judgment and communication skills to drive decisions across teams.
- Self-starter attitude with the ability to prioritize and deliver solutions in a startup environment.