Software Engineer - Gen AI Inference
Databricks
5 months ago
San Francisco, CA, USA
Mid Level / Senior
H1B Sponsor
Base Salary
$142k - $205k/yr
Responsibilities
- Contribute to the design and implementation of the inference engine for large-scale LLMs.
- Collaborate with researchers to integrate new model architectures and features.
- Optimize latency, throughput, memory efficiency, and hardware utilization.
- Build and maintain profiling and tracing tools to identify bottlenecks.
- Develop scalable routing, batching, scheduling, and memory management mechanisms.
- Support reliability and fault tolerance in inference pipelines.
- Integrate with distributed inference infrastructure and manage load balancing.
- Document and share learnings to contribute to best practices.
Requirements
- BS/MS/PhD in Computer Science or a related field.
- 3+ years of experience in performance-critical systems.
- Strong understanding of ML inference internals.
- Hands-on experience with CUDA and GPU programming.
- Experience designing and operating distributed systems.
- Ability to uncover and solve performance bottlenecks.
- Experience building instrumentation and profiling tools for ML models.
- Ability to work closely with ML researchers.
- Ownership mindset and eagerness to tackle complex challenges.
- Bonus: published research or open-source contributions in ML systems.
Tech Stack
Apache Spark, Databricks, MLflow
Categories
AI & ML, Data Engineering