about 3 hours ago
Remote, United States
Senior
Responsibilities
- Tune the performance of GPU clusters and InfiniBand networks for optimal operation.
- Analyze and troubleshoot issues related to GPUs and InfiniBand networks.
- Integrate new hardware, including new GPU models, into the existing infrastructure.
- Enhance automation systems for proactive monitoring and issue resolution.
- Configure and manage GPU devices and InfiniBand fabrics.
Requirements
- 5+ years of professional experience in system-level software development.
- 3+ years of hands-on experience with Linux systems.
- In-depth understanding of server architecture and HPC systems.
- Strong proficiency in programming languages such as C/C++, Go, or Python.
Benefits
- Competitive salary and comprehensive benefits package.
- Opportunities for professional growth within Nebius.
- Flexible working arrangements.
- A dynamic and collaborative work environment that values initiative and innovation.
Tech Stack
C, C++, Go, Linux, Python, PyTorch, TensorFlow
Categories
AI & ML, Data Engineering, DevOps