Remote, Worldwide
Mid Level / Senior
Responsibilities
- Profile and analyze GPU performance at the system and kernel level.
- Evaluate and compare GPU performance across different platforms and software stacks.
- Debug and optimize ML workloads for efficient GPU execution.
- Perform acceptance testing for new GPU clusters to ensure performance and compatibility.
- Conduct experiments on GPU configurations to assess performance impacts.
- Develop tools and dashboards to visualize performance metrics and trends.
- Contribute to internal tooling, frameworks, and best practices.
Requirements
- Profound understanding of the theoretical foundations of machine learning.
- Deep knowledge of the performance aspects of large neural networks.
- Experience with modern deep learning frameworks such as PyTorch and JAX.
- Good understanding of the GPU stack, including CUDA and related libraries.
- Familiarity with containerized environments such as Docker and Kubernetes.
- Strong communication skills and the ability to work independently.
Benefits
- Competitive compensation.
- Career growth and learning opportunities.
- Flexibility and work-life balance.
- Collaborative and innovative culture.
- Opportunity to work on impactful AI projects.
- International environment and talented teams.
Tech Stack
AWS, Docker, Google Cloud Platform, Kubernetes, PyTorch
Categories
AI & ML, Data Engineering