
Senior Machine Learning Engineer
TensorWave
6 months ago
Las Vegas, NV, USA
Senior
Responsibilities
- Design, operate, and improve ML infrastructure systems for distributed training and inference.
- Build reliable workload execution and orchestration patterns in shared GPU environments.
- Troubleshoot performance, reliability, and scalability issues across the ML stack.
- Collaborate with ML, systems, and platform teams to enhance developer experience and operational efficiency.
Requirements
- Bachelor's degree in Computer Science, Computer Engineering, or a related field, or equivalent experience.
- Expertise in supporting production ML systems using SLURM and Kubernetes.
- Strong understanding of GPU-accelerated workloads and distributed systems.
- Solid Linux fundamentals and experience debugging infrastructure-level issues.
- Ability to build automation and tooling in languages such as Python or Go.
Benefits
- Competitive Salary
- Stock Options
- 100% paid Medical, Dental, and Vision insurance
- Flexible PTO
- Paid Holidays
- 401(k)
- Parental Leave
- Flexible Spending Account
- Short Term Disability Insurance
- Life and Voluntary Supplemental Insurance
- Mental Health Benefits through Spring Health