Databricks

Software Engineer - Gen AI Inference

5 months ago
San Francisco, CA, USA
Mid Level / Senior
H1B Sponsor

Base Salary

$142k - $205k/yr

Responsibilities

  • Contribute to the design and implementation of the inference engine for large-scale LLMs.
  • Collaborate with researchers to integrate new model architectures and features.
  • Optimize latency, throughput, memory efficiency, and hardware utilization.
  • Build and maintain profiling and tracing tools to identify bottlenecks.
  • Develop scalable routing, batching, scheduling, and memory management mechanisms.
  • Support reliability and fault tolerance in inference pipelines.
  • Integrate with distributed inference infrastructure and manage load balancing.
  • Document and share learnings to contribute to best practices.
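The batching and scheduling work described above can be illustrated with a toy sketch. This is not Databricks' implementation, just a minimal, hypothetical example of the dynamic batching pattern common in LLM inference engines: queued requests are greedily grouped into a batch until a size cap or a small wait budget is hit, trading a little latency for higher GPU throughput.

```python
import queue
import time


class BatchingScheduler:
    """Toy request batcher: groups queued prompts into batches,
    a simplified stand-in for the dynamic batching an LLM
    inference engine performs before each forward pass."""

    def __init__(self, max_batch_size=4, max_wait_s=0.01):
        self.max_batch_size = max_batch_size  # cap on requests per batch
        self.max_wait_s = max_wait_s          # latency budget for filling a batch
        self.requests = queue.Queue()

    def submit(self, prompt):
        # Called by request handlers; thread-safe via queue.Queue.
        self.requests.put(prompt)

    def next_batch(self):
        # Block for the first request, then greedily fill the batch
        # until it is full or the wait budget expires.
        batch = [self.requests.get()]
        deadline = time.monotonic() + self.max_wait_s
        while len(batch) < self.max_batch_size:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                batch.append(self.requests.get(timeout=remaining))
            except queue.Empty:
                break
        return batch


if __name__ == "__main__":
    scheduler = BatchingScheduler(max_batch_size=4, max_wait_s=0.01)
    for i in range(5):
        scheduler.submit(f"prompt-{i}")
    print(scheduler.next_batch())  # first four queued prompts
    print(scheduler.next_batch())  # the remaining one
```

Production engines refine this idea into continuous (iteration-level) batching, where finished sequences are swapped out of the running batch between decode steps rather than waiting for the whole batch to complete.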

Requirements

  • BS/MS/PhD in Computer Science or a related field.
  • 3+ years of experience in performance-critical systems.
  • Strong understanding of ML inference internals.
  • Hands-on experience with CUDA and GPU programming.
  • Experience designing and operating distributed systems.
  • Ability to uncover and solve performance bottlenecks.
  • Experience building instrumentation and profiling tools for ML models.
  • Ability to work closely with ML researchers.
  • Ownership mindset and eagerness to tackle complex challenges.
  • Bonus: published research or open-source contributions in ML systems.

Tech Stack

Apache Spark, Databricks, MLflow

Categories

AI & ML, Data Engineering