GrepJob
Chime

Software Engineer, Machine Learning Platform

Chime
Apply
about 2 hours ago

Base Salary

$187k - $259k/yr

Responsibilities

  • Design, build, and operate scalable ML infrastructure on AWS.
  • Develop distributed training and batch processing systems using Ray.
  • Build and maintain infrastructure-as-code using Terraform.
  • Support and evolve the feature store and feature pipelines.
  • Develop data ingestion and streaming systems using technologies like Kinesis and Kafka.
  • Improve CI/CD workflows for ML models and platform components.
  • Enhance observability, reliability, and cost visibility across ML workloads.
  • Collaborate with Data Science and ML Engineering teams to improve developer experience.
  • Contribute to platform architecture decisions and technical roadmaps.
  • Participate in on-call rotations to support production systems.

Requirements

  • 5+ years of experience in ML infrastructure, platform engineering, or production ML systems.
  • Knowledge of the machine learning model development lifecycle.
  • Experience with distributed systems, cloud computing, or large-scale data processing.
  • Strong foundation in computer science and software engineering principles.
  • Hands-on experience with CI/CD pipelines and DevOps practices.
  • Experience with containerization technologies such as Docker and Kubernetes.
  • Knowledge of cloud platforms like AWS and distributed computing frameworks.
  • Experience with GPU programming and optimization.
  • Strong programming skills in Python, Go, Scala, Java, or similar languages.
  • Familiarity with infrastructure-as-code tools like Terraform.

Benefits

  • In-office work policy with four days a week in the office and Fridays from home.
  • In-office perks including backup child, elder, and pet care.
  • Competitive salary based on experience.
  • 401k match plus comprehensive medical, dental, vision, life, and disability benefits.
  • Generous vacation policy and company-wide paid days off.
  • 1% of your time off to support local community organizations.
  • Annual wellness stipend for eligible wellness-related expenses.
  • Up to 24 weeks of paid parental leave for birthing parents.
  • Access to family planning tools with significant reimbursement options.
  • Opportunities for in-person and virtual team events.

Tech Stack

Apache FlinkApache KafkaApache SparkAWSDockerGoJavaKubernetesPythonScalaTerraform

Categories

AI & MLData EngineeringDevOps