Anthropic

Staff/Sr Software Engineer, Compute Capacity

6 days ago
New York, NY, USA or San Francisco, CA, USA
Senior / Staff+
H1B Sponsor

Base Salary

$405k - $485k/yr

Responsibilities

  • Build and operate data pipelines that ingest accelerator occupancy, utilization, and cost data from multiple cloud providers into BigQuery.
  • Develop and maintain observability infrastructure, including Prometheus recording rules and Grafana dashboards.
  • Instrument and analyze compute efficiency metrics across training, inference, and eval workloads.
  • Build internal tooling and platforms for capacity planning and workload attribution.
  • Operate Kubernetes-native systems at scale, including the infrastructure for workload labeling.
  • Normalize and reconcile data across heterogeneous sources, including AWS, GCP, and Azure.
  • Collaborate with teams across the organization to gather requirements and communicate trade-offs.

Requirements

  • 5+ years of software engineering experience with a strong track record in production systems.
  • Kubernetes fluency at operational depth, with experience in scheduling and debugging cluster-level issues.
  • Experience designing and building production data pipelines, preferably with BigQuery.
  • Familiarity with observability tooling such as Prometheus and Grafana.
  • Production-quality proficiency in Python and SQL.
  • Familiarity with at least one major cloud provider (AWS, GCP, or Azure) at the infrastructure level.
  • Strong cross-team communication skills and ability to navigate ambiguity.

Benefits

  • Competitive compensation and benefits.
  • Optional equity donation matching.
  • Generous vacation and parental leave.
  • Flexible working hours.
  • A collaborative office space.

Tech Stack

AWS, Azure, ClickHouse, Google BigQuery, Google Cloud Platform, Grafana, Kubernetes, Prometheus, Python, Rust, SQL, Terraform

Categories

AI & ML, Data Engineering, DevOps