Site Reliability Engineer, Metal
Tenstorrent
Toronto, Canada
Entry Level / Mid Level
H1B Sponsor
Base Salary
$100k - $500k/yr
Responsibilities
- Ensure reliability and operational health of Tenstorrent systems across internal and customer environments.
- Troubleshoot complex issues across compute, networking, and software layers.
- Partner with engineering teams and customers to resolve production incidents.
- Design and improve monitoring, observability, and alerting systems.
- Build automation to reduce operational toil and improve system reliability.
Requirements
- Experienced in site reliability, infrastructure, or systems engineering in distributed environments.
- Strong Linux systems knowledge with the ability to troubleshoot complex multi-layer issues.
- Proficient with observability tools such as Prometheus and Grafana, and with alerting systems.
- Comfortable with scripting and automation using Python, Go, or similar languages.
- Solid understanding of networking fundamentals and how systems behave at scale.
Tech Stack
Go, Grafana, Linux, Prometheus, Python
Categories
DevOps