Cerebras Systems

Deployment Engineer, AI Inference

5 months ago
Sunnyvale, CA, USA or Toronto, Canada
Mid Level / Senior
H1B Sponsor

Responsibilities

  • Deploy AI inference replicas and cluster software across multiple datacenters.
  • Operate across heterogeneous datacenter environments undergoing rapid growth.
  • Maximize capacity allocation and optimize replica placement using constraint-solver algorithms.
  • Operate bare-metal inference infrastructure while supporting the transition to a Kubernetes (K8s)-based platform.
  • Develop and extend telemetry, observability and alerting solutions to ensure deployment reliability at scale.
  • Develop and extend a fully automated deployment pipeline to support fast software updates and capacity reallocation at scale.
  • Translate technical and customer needs into actionable requirements for the Dev Infra, Cluster, Platform and Core teams.
  • Stay up to date with the latest advancements in AI compute infrastructure and related technologies.

Requirements

  • 2-5 years of experience operating on-prem compute infrastructure, or developing and managing complex AWS control-plane infrastructure for hybrid deployments.
  • Strong proficiency in Python for automation, orchestration, and deployment tooling.
  • Solid understanding of Linux-based systems and command-line tools.
  • Extensive knowledge of Docker containers and container orchestration platforms such as Kubernetes (K8s).
  • Familiarity with spine-leaf (Clos) networking architecture.
  • Proficiency with telemetry and observability stacks such as Prometheus, InfluxDB and Grafana.
  • Strong ownership mindset and accountability for complex deployments.
  • Ability to work effectively in a fast-paced environment.

Benefits

  • Opportunity to build a breakthrough AI platform beyond the constraints of the GPU.
  • Ability to publish and open source cutting-edge AI research.
  • Work on one of the fastest AI supercomputers in the world.
  • Enjoy job stability with startup vitality.
  • Experience a simple, non-corporate work culture that respects individual beliefs.

Tech Stack

AWS, Docker, Grafana, InfluxDB, Kubernetes, Linux, Prometheus, Python

Categories

AI & ML, DevOps