GrepJob
CoreWeave

HPC Performance Engineer

CoreWeave
Apply
5 days ago
Bellevue, WA, USA +2 moreSenior

Base Salary

$165k - $242k/yr

Responsibilities

  • Develop and maintain tools for establishing systems performance baselines.
  • Develop and maintain performance regression analysis testing automation.
  • Design and maintain performance regression test pipelines for HPC workloads.
  • Debug and tune fabric-level performance for low-latency, high-throughput configurations.
  • Develop telemetry for performance analysis across distributed clusters.
  • Triage and fix performance issues in Linux.
  • Collect data and produce metrics and visualizations for performance communication.
  • Define Linux and OS requirements and specifications in collaboration with teams.

Requirements

  • 5+ years of professional experience in Systems/HPC Performance Engineering.
  • Strong experience with MPI workloads and distributed system performance analysis.
  • Familiarity with RoCE, InfiniBand, and GPUDirect/Data Direct I/O.
  • Hands-on use of public HPC benchmarks.
  • Extensive experience in Linux internals.
  • Fluency in a programming language geared toward automation, preferably Python.
  • Experience writing robust, testable code.
  • Experience diagnosing and fixing systems performance issues.
  • Experience with implementing automation testing.
  • Strong analytical and problem-solving abilities.

Benefits

  • 100% paid medical, dental, and vision insurance.
  • Company-paid life insurance.
  • Short and long-term disability insurance.
  • Flexible Spending Account and Health Savings Account.
  • Tuition reimbursement.
  • Employee Stock Purchase Program participation.
  • Mental wellness benefits.
  • Paid parental leave.
  • Flexible childcare support.
  • 401(k) with generous employer match.
  • Flexible PTO and catered lunch each day.

Tech Stack

BashCDockerGoGrafanaKubernetesLinuxPrometheusPython

Categories

AI & MLBackendData EngineeringDevOps