San Francisco, CA, USA or New York, NY, USA
Mid Level / Senior
Base Salary
$180k - $360k/yr
Responsibilities
- Design, build, and operate the Model APIs surface with advanced inference capabilities.
- Profile and optimize TensorRT-LLM kernels and analyze CUDA kernel performance.
- Productionize performance improvements across runtimes with a deep understanding of their internals.
- Build comprehensive benchmarking frameworks to measure real-world performance.
- Instrument deep observability and build repeatable benchmarks for speed and reliability.
- Implement platform fundamentals such as API versioning and authentication.
- Collaborate closely with other teams to deliver a developer-friendly model serving experience.
Requirements
- 3+ years of experience building and operating distributed systems or large-scale APIs.
- Proven track record of owning low-latency, reliable backend services.
- Strong infrastructure instincts and performance sensibilities, including profiling and capacity planning.
- Comfortable debugging complex systems from runtime internals to GPU execution traces.
- Strong written communication skills for producing clear design docs.
Benefits
- Competitive compensation, including meaningful equity.
- 100% coverage of medical, dental, and vision insurance for employees and dependents.
- Flexible PTO policy including a company-wide Winter Break.
- Paid parental leave and fertility/family-building stipend.
- Company-facilitated 401(k) and exposure to various ML startups.
