San Francisco, CA, USA
Mid Level / Senior
Responsibilities
- Optimize models for speed and efficiency through low-level engineering.
- Collaborate with research teams to productionize new architectures.
- Enhance performance of training and inference workloads.
- Write and tune CUDA kernels and GPU scheduling logic.
- Manage memory layouts and parallelization for large-scale ML systems.
Requirements
- Strong background in systems-level ML engineering.
- Experience with CUDA and GPU kernel optimization.
- Fluency in Python and at least one systems language (C++ or Rust preferred).
- Familiarity with ML frameworks that support distributed training, such as PyTorch or JAX.
- Experience with large-scale training or inference infrastructure.
- Understanding of memory management and hardware-aware model optimization.
- 2+ years of experience in ML infrastructure or performance-critical environments.
- Willingness to work in person from the SF office in the Financial District (FiDi).
