5 months ago
Toronto, Canada +4 more
Senior / Staff+
H-1B Sponsor
Responsibilities
- Design and implement mechanisms to optimize GPU and cluster utilization for AI models.
- Build scalable frameworks for managing large compute jobs and data processing.
- Develop observability and visualization tools for performance diagnostics.
- Collaborate with AI teams to integrate acceleration techniques into pipelines.
- Champion the use of modern cloud and container technologies for system scaling.
Requirements
- Bachelor's degree in Computer Science, Engineering, or a related field.
- 5+ years of experience in large-scale MLOps, AI infrastructure, or HPC systems.
- Experience with data frameworks like Ray, Apache Spark, or LanceDB.
- Strong proficiency in Python and C++ for infrastructure development.
- Hands-on experience with orchestration frameworks like Kubernetes and Ray.
- Familiarity with core ML frameworks such as PyTorch, TensorFlow, or JAX.
Benefits
- Competitive salary and benefits package.
- Dynamic and inclusive work environment.
- Opportunities for professional growth and advancement.
- Collaborative culture that values innovation and creativity.
- Access to the latest technologies and tools.