Allen Control Systems

CV/ML Platform Engineer

Posted about 2 months ago
Location: Austin, TX, USA
Level: Mid Level / Senior

Responsibilities

  • Deploy and operate Kubernetes clusters on bare-metal infrastructure with NVIDIA GPUs.
  • Manage NVIDIA GPU clusters for ML training.
  • Own the ACS CV/ML CI/CD pipeline.
  • Improve and maintain core ML infrastructure, including model registration and versioning.
  • Enhance ML model testing, performance analysis, and reporting tools.
  • Automate repetitive model training and testing tasks.
  • Coordinate with the Software Team's platform engineers to avoid duplicating infrastructure.
  • Collaborate with the Software Team to optimize models for deployment on edge hardware.

Requirements

  • 2+ years of experience in Platform Engineering or DevOps/MLOps.
  • Strong programming skills for automating ML lifecycles and building CLI tools.
  • Hands-on experience with NVIDIA GPU infrastructure and CUDA libraries.
  • Experience implementing and maintaining MLOps platforms like Kubeflow or MLflow.
  • Familiarity with high-performance storage solutions and data orchestration tools.
  • Proven track record of building CI/CD pipelines for model validation and performance benchmarking.
  • Experience with model optimization toolchains for ARM targets like NVIDIA Jetson.
  • Proficiency with observability stacks adapted for ML.
  • Strong Linux systems knowledge, including networking and security hardening.

Benefits

  • Competitive salary.
  • Health, dental, and vision insurance.
  • Paid Time Off.

Tech Stack

AWS, DVC, Grafana, Kubernetes, MLflow, Prometheus

Categories

AI & ML, Data Engineering, DevOps