Faire

Staff Machine Learning Platform Engineer

9 days ago
San Francisco, CA, USA
Staff+

Base Salary

$224k - $308k/yr

Responsibilities

  • Design and operate ML infrastructure, including workspaces, clusters, jobs, and workflows.
  • Productionize ML workloads using Spark, Delta Lake, MLflow, and Databricks Workflows.
  • Teach data scientists how to use the ML platform for model development.
  • Implement Unity Catalog for data governance and secure multi-tenant usage.
  • Build CI/CD pipelines for ML using Terraform and Git-based workflows.
  • Optimize performance, reliability, and cost across training and inference workloads.
  • Configure IAM and RBAC for sensitive data sets.
  • Establish observability for data quality, model performance, and platform health.
  • Build and maintain ML Platform technical documentation.

Requirements

  • 8+ years of experience building production ML or data platforms.
  • A degree in Computer Science, Engineering, Statistics, or a related technical field.
  • Strong hands-on expertise with Databricks, Spark, Delta Lake, and MLflow.
  • Proficiency in Python, SQL, and distributed systems concepts.
  • Experience with cloud platforms and infrastructure-as-code.
  • Solid understanding of MLOps best practices: CI/CD, monitoring, reproducibility, and security.
  • Experience supporting multiple ML teams in a shared platform environment.
  • Willingness to take ownership of orphaned problems and learn as needed.

Benefits

  • Hybrid work model with in-office attendance three days a week.
  • Flexibility to work remotely up to four weeks per year.
  • Equity and benefits eligibility.

Tech Stack

Apache Airflow, Apache Kafka, Apache Spark, AWS, Databricks, Datadog, Docker, GitHub Actions, Kotlin, Kubernetes, MLflow, MySQL, Python, PyTorch, Snowflake, SQL, Terraform

Categories

AI & ML, Data Science, DevOps