Performance Reliability Engineer
Cerebras Systems
3 months ago
Sunnyvale, CA, USA or Toronto, Canada
Mid Level / Senior
H1B Sponsor
Responsibilities
- Characterize and enhance the performance and reliability of advanced ML hardware/software systems.
- Analyze ML workloads, software kernels, and hardware architecture for power and performance impacts.
- Develop creative software solutions to improve reliability and performance.
- Influence the design of Cerebras' next-generation AI architecture and software stack.
- Partner with ML engineers, researchers, and reliability specialists to understand model behavior.
- Collaborate with teams in architecture, silicon, and research to advance computational platforms.
Requirements
- BS, MS, or PhD in Computer Science, Electrical Engineering, or a related field.
- 3+ years of relevant experience in performance engineering, reliability, computer architecture, and/or software design.
- Proficiency in Python or other scripting languages.
- Experience with C/C++ and assembly programming.
- Demonstrated expertise in system-level performance and reliability optimization.
- Strong verbal and written communication skills.
- Nice to have: Hands-on experience with ML models and frameworks.
- Nice to have: Understanding of thermal management principles and power delivery for advanced semiconductors.
Benefits
- Opportunity to build a breakthrough AI platform beyond the constraints of the GPU.
- Ability to publish and open-source cutting-edge AI research.
- Work on one of the fastest AI supercomputers in the world.
- Enjoy job stability with startup vitality.
- Experience a simple, non-corporate work culture that respects individual beliefs.
Tech Stack
C, C++, Python
Categories
AI & ML, Data Engineering