Responsibilities
- Design and evolve an ML application-level simulator that models workload execution, system behaviour, and resource usage.
- Translate real-world workloads and benchmarking data into accurate simulation models.
- Validate simulator outputs against real system measurements and continuously improve fidelity.
- Build scalable, maintainable components that support evolving hardware and software capabilities.
- Work closely with performance engineers, ML teams, and silicon architects to ensure the simulator reflects real use cases.
- Drive adoption of the simulator as a decision-making tool across the organisation.
Requirements
- Strong C++ development experience, especially in performance-sensitive or large-scale systems.
- Strong Python skills and familiarity with ML frameworks (e.g. PyTorch) and execution graphs.
- Solid understanding of computer architecture, memory hierarchy, and heterogeneous systems.
- Familiarity with ML accelerator concepts (e.g. tensor cores, compute tiles, high-bandwidth memory).
- Understanding of modern ML architectures (e.g. transformers, MoE) in training and inference is desirable.
- Experience with simulators or performance models (hardware or distributed systems) is desirable.
- Knowledge of parallelism models and their mapping to hardware is desirable.
- Experience with profiling and debugging complex, multi-layer systems is desirable.
Benefits
- Competitive salary.
- Annual leave policy.
- Medical and dental health plans.
- Gym card.
- Employee pension (matched up to 4%).
- Yearly review of benefits to ensure they remain valuable and rewarding for employees.
- Commitment to building an inclusive work environment.