
Staff Machine Learning Systems Engineer (MLOps)
Hims & Hers
about 5 hours ago
Base Salary
$210k - $250k/yr
Responsibilities
- Own and evolve the containerized application deployment platform for AI workloads.
- Build and maintain GitOps-based deployment pipelines for shipping AI services safely.
- Design ephemeral environments and nightly release pipelines for validating AI changes.
- Operate and scale inference infrastructure and multi-provider LLM gateways.
- Own the observability and tracing stack for auditing and debugging AI behavior.
- Define SLOs, alerting, and incident response for AI infrastructure reliability.
- Improve CI/CD pipelines and developer tooling for AI workloads.
- Build IAM and secrets management as first-class security infrastructure.
- Drive multi-quarter infrastructure initiatives and mentor engineers on best practices.
Requirements
- 8+ years of experience in infrastructure, platform, DevOps, or SRE engineering.
- 3+ years focused on ML/AI systems in production.
- Deep experience with Kubernetes and the cloud-native ecosystem.
- Strong infrastructure-as-code skills, particularly with Terraform.
- Proficiency in Python for building production infrastructure tooling.
- 2+ years of experience operating LLM-based systems in production.
- Experience with observability/tracing stacks like Datadog and OpenTelemetry.
- Experience designing and maintaining CI/CD pipelines for engineering teams.
- Strong collaboration skills across various teams in the organization.
- Appreciation for safety, privacy, and security in regulated domains.
Benefits
- Competitive salary & equity compensation for full-time roles.
- Unlimited PTO, company holidays, and quarterly mental health days.
- Comprehensive health benefits including medical, dental & vision.
- Employee Stock Purchase Program (ESPP).
- 401k benefits with employer matching contribution.
- Offsite team retreats.
Tech Stack
Apache Spark, AWS, Bazel, ClickHouse, Databricks, Datadog, Docker, Google Cloud Platform, Helm, Kubernetes, MLflow, Python, Terraform