ML Software Tool Development Engineer
Cerebras Systems
9 days ago
Toronto, Canada
Mid Level / Senior
H1B Sponsor
Responsibilities
- Lead the design and implementation of system-level debugging, validation, and observability platforms.
- Develop automated systems for collecting and analyzing numerical and execution anomalies.
- Create visualization and analysis tools to enable efficient root-cause investigation.
- Build frameworks for failure classification, regression detection, and anomaly monitoring.
- Extend compilers, runtimes, and programming interfaces to support advanced profiling and instrumentation.
- Improve system bring-up, low-level debug, and validation workflows.
- Partner cross-functionally with compiler, hardware, firmware, runtime, and infrastructure teams.
- Establish best practices for debuggability, reliability, and operational excellence.
- Lead high-impact initiatives.
- Support incident response and drive long-term corrective actions.
Requirements
- Strong proficiency in C++ and Python, with a track record of building reliable, high-performance systems and tooling.
- Demonstrated experience debugging complex hardware/software systems and driving issues to root cause.
- Experience analyzing system-level data structures, execution graphs, or dependency networks for diagnostics and validation.
- Proven ability to design and build intuitive visualization and analysis tools for complex technical data.
- Experience with compiler internals, custom hardware interfaces, or low-level protocol design.
- Strong written and verbal communication skills, with the ability to explain technical concepts to diverse stakeholders.
- Ability to work independently and lead complex technical projects end-to-end.
Benefits
- Opportunity to build a breakthrough AI platform beyond the constraints of the GPU.
- Ability to publish and open-source cutting-edge AI research.
- Work on one of the fastest AI supercomputers in the world.
- Enjoy job stability with startup vitality.
- Experience a simple, non-corporate work culture that respects individual beliefs.
Tech Stack
C++, Python
Categories
AI & ML, Data Engineering