Allen Control Systems

Platform Engineer

2 months ago
Austin, TX, USA
Mid Level / Senior

Responsibilities

  • Deploy and operate Kubernetes clusters on bare-metal infrastructure with NVIDIA GPUs.
  • Manage NVIDIA GPU clusters for machine learning training.
  • Own the full CI/CD pipeline from source to deployment.
  • Build and maintain the observability stack for real-time system performance monitoring.
  • Define and enforce infrastructure-as-code practices using tools like Terraform and Ansible.
  • Manage network configuration, storage provisioning, and security hardening.

Requirements

  • Proficiency in Python programming and Bash scripting.
  • 2+ years of experience in platform engineering or DevOps with Kubernetes.
  • Deep expertise in bare-metal Kubernetes administration.
  • Hands-on experience with NVIDIA GPU infrastructure and ML orchestration tools.
  • Strong CI/CD experience with build automation and pipeline tooling.
  • Proficiency with observability tooling for log aggregation and metrics.
  • Experience building C++ and Python toolchains on Linux.

Benefits

  • Competitive salary.
  • Health, dental, and vision insurance.
  • Paid time off.

Tech Stack

Ansible, AWS, Bash, C++, CMake, GitHub Actions, GitLab CI/CD, Helm, Jenkins, Kubernetes, Linux, Python, Terraform

Categories

AI & ML, Data Engineering, DevOps, Security