OpenAI

Software Engineer, Infrastructure Reliability

OpenAI

Apply
23 days ago
San Francisco, CA, USA
Mid Level / Senior

Base Salary

$255k - $385k/yr

Responsibilities

  • Design, build, and operate reliable and performant systems used across engineering.
  • Identify and fix performance bottlenecks and inefficiencies.
  • Resolve complex issues through deep investigation.
  • Continuously improve automation to reduce manual work.
  • Contribute to incident response and postmortems.
  • Develop best practices around system reliability and scalability.

Requirements

  • 4+ years of relevant industry experience.
  • 2+ years leading large scale, complex projects or teams.
  • Deep understanding of distributed systems principles.
  • Experience with orchestration systems like Kubernetes.
  • Proficiency in cloud infrastructure and IaC tools like Terraform.
  • Strong problem-solving skills in fast-paced environments.
  • Knowledge of security best practices in cloud environments.

Tech Stack

AWSAzureDatadogGoogle Cloud PlatformGrafanaKubernetesLinuxPrometheusSplunkTerraform

Categories

AI & MLData EngineeringDevOpsSecurity