Milpitas, CA, USA
Mid Level / Senior
H-1B visa sponsorship available
Responsibilities
- Analyze ML models’ compute and memory requirements using roofline analysis and simulations.
- Collaborate across hardware and software teams to optimize large-scale AI workloads.
- Benchmark, monitor, and troubleshoot the performance of distributed systems.
- Optimize communication stacks, including MPI, NCCL, UCX, RDMA, and network fabrics.
- Profile and optimize AI workloads, focusing on performance bottlenecks.
- Develop high-quality, ARM-compatible code and accompanying documentation.
Requirements
- BS/MS in Computer Science, Electrical Engineering, or related field.
- Experience with distributed systems and communication libraries (MPI, NCCL, UCX, libfabric).
- Strong programming skills in C++ and Python.
- Experience profiling and optimizing HPC or AI/ML workloads.
- Familiarity with ML benchmarks such as MLPerf.
- Desirable: Experience with GPUs or accelerated computing architectures.
- Desirable: Knowledge of HPC networking and interconnect technologies (InfiniBand, RoCE).
- Desirable: Familiarity with ML frameworks such as PyTorch or TensorFlow.
- Desirable: Understanding of ARM architectures and toolchains.
- Desirable: Strong debugging, profiling, and performance optimization skills.