Principal SRE

H1B Sponsor

Base Salary

$180k - $240k/yr

Responsibilities

Own the reliability, scalability, and operational health of Gradial’s production platform.
Lead the evolution of Kubernetes, CI/CD, observability, and infrastructure as code across the stack.
Set the standard for how we design, ship, and operate reliable systems.
Build the tooling and automation that help engineers move faster with more confidence.
Drive improvements in monitoring, alerting, incident response, and service readiness.
Partner with engineering to spot scaling risks early and solve them before they slow us down.
Influence the long-term direction of our platform across reliability, security, performance, and cost.

Qualifications

5+ years of experience in SRE, DevOps, platform engineering, or infrastructure roles with direct ownership of production systems.
Proven success designing and operating production-grade infrastructure in fast-moving, high-growth environments.
Deep expertise in Kubernetes, cloud-native architecture, and container orchestration.
Strong experience with infrastructure as code, GitOps, CI/CD workflows, and modern deployment practices.
Strong command of observability and reliability fundamentals across metrics, logging, tracing, alerting, and incident response.
A track record of leading through influence, making sound technical decisions, and raising the bar across engineering teams.