Responsibilities
- Analyze ML models to identify and resolve performance bottlenecks.
- Integrate open-source (OSS) tools that enable ML engineers to profile and optimize models independently.
- Deliver solutions to streamline model deployment across various hardware platforms.
- Collaborate with ML researchers to balance model accuracy and speed.
- Implement optimizations using CUDA, Triton, and custom kernels.
- Promote engineering excellence within the team.
Requirements
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
- 5+ years of software engineering experience, with proven expertise in GPU programming and performance optimization.
- Strong programming skills in C++ and Python.
- Familiarity with deep learning frameworks, especially PyTorch.
- Experience with CUDA programming and Triton language for GPU kernels.
- Knowledge of PyTorch optimization techniques and experience deploying models with TensorRT.
- Experience with ONNX model conversion and deployment.
- Deep understanding of GPU architectures and performance optimization.
- Strong analytical and problem-solving skills.
- Excellent verbal and written communication skills.
- Experience with autonomous vehicles (AV) is a bonus.