5 days ago
Bellevue, WA, USA +2 more
Senior
Base Salary
$165k - $242k/yr
Responsibilities
- Develop and maintain tools for establishing systems performance baselines.
- Develop and maintain automated testing for performance regression analysis.
- Design and maintain performance regression test pipelines for HPC workloads.
- Debug and tune fabric-level performance for low-latency, high-throughput configurations.
- Develop telemetry for performance analysis across distributed clusters.
- Triage and fix performance issues in Linux.
- Collect data and produce metrics and visualizations to communicate performance results.
- Define Linux and OS requirements and specifications in collaboration with other teams.
Requirements
- 5+ years of professional experience in Systems/HPC Performance Engineering.
- Strong experience with MPI workloads and distributed system performance analysis.
- Familiarity with RoCE, InfiniBand, and GPUDirect/Data Direct I/O.
- Hands-on use of public HPC benchmarks.
- Extensive experience in Linux internals.
- Fluency in a programming language geared toward automation, preferably Python.
- Experience writing robust, testable code.
- Experience diagnosing and fixing systems performance issues.
- Experience implementing automated testing.
- Strong analytical and problem-solving abilities.
Benefits
- 100% paid medical, dental, and vision insurance.
- Company-paid life insurance.
- Short- and long-term disability insurance.
- Flexible Spending Account and Health Savings Account.
- Tuition reimbursement.
- Employee Stock Purchase Program participation.
- Mental wellness benefits.
- Paid parental leave.
- Flexible childcare support.
- 401(k) with generous employer match.
- Flexible PTO and catered lunch each day.
