CoreWeave

Staff Software Engineer, Applied Training

about 7 hours ago
Bellevue, WA, USA or Sunnyvale, CA, USA
Staff+
H1B Sponsor

Base Salary

$207k - $275k/yr

Responsibilities

  • Contribute to the roadmap for Applied Training to identify essential workloads.
  • Collaborate with customers and internal teams to build cloud-native primitives.
  • Design and build a complete research cluster experience addressing researchers' challenges.
  • Own the Python SDK for sandbox infrastructure, enabling large-scale RL training runs.
  • Write documentation for popular OSS training frameworks to assist customers.
  • Engage directly with infrastructure teams and customers to enhance system design.

Requirements

  • 8–12+ years of experience building distributed systems or ML infrastructure.
  • Proven experience with Kubernetes, including custom controllers and workload orchestration.
  • Understanding of researcher productivity and the importance of efficient workflows.
  • Familiarity with distributed job scheduling and large-scale training challenges.
  • Experience shipping production systems relied upon by users.
  • Strong communication skills to translate customer needs into system designs.

Benefits

  • 100% paid medical, dental, and vision insurance.
  • Company-paid life insurance and voluntary supplemental options.
  • Short- and long-term disability insurance.
  • Flexible Spending Account and Health Savings Account.
  • Tuition reimbursement and participation in the Employee Stock Purchase Program.
  • Mental wellness benefits and family-forming support.
  • Paid parental leave and flexible childcare support.
  • 401(k) with generous employer match and flexible PTO.
  • Catered lunch in office locations and a casual work environment.

Categories

AI & ML, Data Engineering, DevOps