Achira

SWE - Distributed

Posted 7 months ago
San Francisco, CA, USA or New York, NY, USA
Mid Level / Senior

Base Salary

$165k - $259k/yr

Responsibilities

  • Design and implement distributed compute infrastructure for ML data processing.
  • Improve cluster observability, scheduling, and resource utilization.
  • Research and implement cost-efficient compute solutions.
  • Develop tools for monitoring and performance tuning of ML workloads.
  • Collaborate with ML engineers to enhance training pipelines.
  • Stay current with emerging technologies in distributed computing.

Requirements

  • Experience with distributed computing frameworks like Ray, Dask, or Celery.
  • Strong understanding of parallel computing and job scheduling.
  • Ability to identify and resolve performance issues in distributed systems.
  • Experience with cloud compute platforms such as AWS, GCP, or Azure.
  • Familiarity with ML frameworks like PyTorch, TensorFlow, or JAX.

Tech Stack

Apache Spark, AWS, Azure, Google Cloud Platform, Kubernetes, PyTorch, TensorFlow

Categories

AI & ML, Data Science, DevOps