Senior AI Infrastructure Engineer

3 months ago

Mountain View, CA, USA

Senior

H1B Sponsor

Base Salary

$180k - $240k/yr

Responsibilities

Design and build high-performance AI infrastructure for autonomous driving models.
Enable distributed training of complex models across multi-node setups.
Optimize multi-GPU setups for efficient model and data parallelism.
Implement intelligent resource scheduling to maximize hardware utilization.
Deploy and scale optimized model artifacts for high-performance inference.
Architect self-healing AI infrastructure with automated hardware monitoring.
Develop agent-driven automation for infrastructure and data tasks.
Automate the end-to-end model lifecycle using ML infrastructure tools.
Collaborate with data teams to scale ETL pipelines for dataset management.
Define and track key ML system metrics for performance monitoring.

Qualifications

5+ years of experience in ML infrastructure, MLOps, or DevOps.
Deep understanding of multi-GPU training strategies and high-performance networking.
Mastery of Kubernetes, Terraform, and Helm for infrastructure automation.
Experience with AI agent frameworks for infrastructure automation.
Expertise in MLflow, Argo Workflows, and Kubernetes.
Strong experience with Docker and containerization technologies.
Proficiency in Apache Airflow, Kafka, Spark, and GitOps automation.
Core programming skills in Python and Bash; experience with Go or Rust is a plus.