Base Salary
$180k - $240k/yr
Responsibilities
- Own the reliability, scalability, and operational health of Gradial’s production platform.
- Lead the evolution of Kubernetes, CI/CD, observability, and infrastructure as code across the stack.
- Set the standard for how we design, ship, and operate reliable systems.
- Build the tooling and automation that help engineers move faster with more confidence.
- Drive improvements in monitoring, alerting, incident response, and service readiness.
- Partner with engineering to spot scaling risks early and solve them before they slow us down.
- Influence the long-term direction of our platform across reliability, security, performance, and cost.
Requirements
- 5+ years of experience in SRE, DevOps, platform engineering, or infrastructure roles with direct ownership of production systems.
- Proven success designing and operating production-grade infrastructure in fast-moving, high-growth environments.
- Deep expertise in Kubernetes, cloud-native architecture, and container orchestration.
- Strong experience with infrastructure as code, GitOps, CI/CD workflows, and modern deployment practices.
- Strong command of observability and reliability fundamentals across metrics, logging, tracing, alerting, and incident response.
- A track record of leading through influence, making sound technical decisions, and raising the bar across engineering teams.
Benefits
- Meaningful equity and competitive salary.
- Comprehensive health, dental, and vision coverage.
- Fast-paced environment with autonomy and ownership.
- Real impact, zero bureaucracy.
- A front-row seat to building category-defining AI infrastructure.
