Staff Engineer, HPC Systems Software
Tenstorrent
3 months ago
Austin, TX, USA or Santa Clara, CA, USA
Staff+
H1B Sponsor
Base Salary
$100k - $500k/yr
Responsibilities
- Design and maintain automated OS deployment pipelines for bare-metal HPC clusters worldwide.
- Run large-scale configuration management with Ansible.
- Deploy RHEL and Ubuntu systems and manage their lifecycle across diverse hardware platforms.
- Implement infrastructure-as-code for repeatable, version-controlled system configurations.
- Troubleshoot OS-level issues and optimize system performance.
- Collaborate with hardware design teams to standardize system configurations.
- Build automation and tooling to streamline provisioning and system updates.
Requirements
- Experienced in RHEL and Ubuntu administration in HPC or large-scale compute environments.
- Highly skilled in Ansible for automation and configuration management.
- Proficient with bare-metal provisioning systems such as MAAS or Foreman.
- Deep understanding of Linux system internals and performance troubleshooting.
- Familiar with HPC cluster architecture and infrastructure-as-code practices.
- Capable of diagnosing complex infrastructure issues independently.
Tech Stack
Ansible, Bash, Docker, Grafana, Linux, Prometheus, Python
Categories
AI & ML, Data Engineering, DevOps, Security