OpenAI

Software Engineer, Fleet Hardware Health

OpenAI

Apply
8 months ago
San Francisco, CA, USA
Entry Level / Mid Level

Base Salary

$230k - $490k/yr

Responsibilities

  • Build and maintain automation systems for provisioning and managing server fleets.
  • Develop tools to monitor server health, performance, and lifecycle events.
  • Collaborate with clusters, networking, and infrastructure teams.
  • Partner with external operators to ensure a high level of quality.
  • Identify and fix performance bottlenecks and inefficiencies.
  • Continuously improve automation to reduce manual work.

Requirements

  • Experience managing large-scale server environments.
  • Proficiency in Python, Go, or similar programming languages.
  • Strong knowledge of Linux, networking, and server hardware.
  • Comfort with data analysis using SQL, PromQL, and Pandas.
  • Prior hardware expertise is not required.

Tech Stack

GoGrafanaLinuxPandasPrometheusPythonSQL

Categories

Data EngineeringDevOps