Tenstorrent

Site Reliability Engineer, Metal

about 3 hours ago
Toronto, Canada
Entry Level / Mid Level
H1B Sponsor

Base Salary

$100k - $500k/yr

Responsibilities

  • Ensure reliability and operational health of Tenstorrent systems across internal and customer environments.
  • Troubleshoot complex issues across compute, networking, and software layers.
  • Partner with engineering teams and customers to resolve production incidents.
  • Design and improve monitoring, observability, and alerting systems.
  • Build automation to reduce operational toil and improve system reliability.

Requirements

  • Experienced in site reliability, infrastructure, or systems engineering in distributed environments.
  • Strong Linux systems knowledge with the ability to troubleshoot complex multi-layer issues.
  • Proficient with observability tools such as Prometheus, Grafana, and alerting systems.
  • Comfortable with scripting and automation using Python, Go, or similar languages.
  • Solid understanding of networking fundamentals and how systems behave at scale.

Tech Stack

Go, Grafana, Linux, Prometheus, Python

Categories

DevOps