Reflection

Member of Technical Staff - Compute Platform

4 months ago
London, United Kingdom +2 more
Mid Level / Senior
H1B Sponsor

Responsibilities

  • Build and maintain tools for automatic remediation and capacity planning.
  • Design and iterate on cluster management stacks for multi-GPU workloads.
  • Implement comprehensive monitoring for cluster durability and performance.
  • Prepare infrastructure for next-generation GPU deployments and larger clusters.
  • Own multi-cloud storage and petabyte-scale data replication.

Requirements

  • Systems-level engineering experience focused on cluster maintenance.
  • Strong coding ability with a focus on systems or GPU infrastructure.
  • Deep knowledge of GPU hardware and familiarity with NCCL.
  • Experience with K8s-first architecture.
  • Expertise in managing high-performance cloud storage across multiple data centers.

Benefits

  • Top-tier compensation including salary and equity.
  • Comprehensive medical, dental, vision, life, and disability insurance.
  • Fully paid parental leave for all new parents.
  • Paid time off and relocation support.
  • Meals provided daily and opportunities for team connection.

Categories

AI & ML, Data Engineering, DevOps