Cerebras Systems

ML Software Tool Development Engineer

9 days ago
Toronto, Canada
Mid Level / Senior
H1B Sponsor

Responsibilities

  • Lead the design and implementation of system-level debugging, validation, and observability platforms.
  • Develop automated systems for collecting and analyzing numerical and execution anomalies.
  • Create visualization and analysis tools to enable efficient root-cause investigation.
  • Build frameworks for failure classification, regression detection, and anomaly monitoring.
  • Extend compilers, runtimes, and programming interfaces to support advanced profiling and instrumentation.
  • Improve system bring-up, low-level debug, and validation workflows.
  • Partner cross-functionally with compiler, hardware, firmware, runtime, and infrastructure teams.
  • Establish best practices for debuggability, reliability, and operational excellence.
  • Lead high-impact initiatives.
  • Support incident response and drive long-term corrective actions.

Requirements

  • Strong proficiency in C++ and Python, with a track record of building reliable, high-performance systems and tooling.
  • Demonstrated experience debugging complex hardware/software systems and driving issues to root cause.
  • Experience analyzing system-level data structures, execution graphs, or dependency networks for diagnostics and validation.
  • Proven ability to design and build intuitive visualization and analysis tools for complex technical data.
  • Experience with compiler internals, custom hardware interfaces, or low-level protocol design.
  • Strong written and verbal communication skills, with the ability to explain technical concepts to diverse stakeholders.
  • Ability to work independently and lead complex technical projects end-to-end.

Benefits

  • Opportunity to build a breakthrough AI platform beyond the constraints of the GPU.
  • Ability to publish and open-source cutting-edge AI research.
  • Work on one of the fastest AI supercomputers in the world.
  • Enjoy job stability with startup vitality.
  • Experience a simple, non-corporate work culture that respects individual beliefs.

Tech Stack

C++, Python

Categories

AI & ML, Data Engineering