Responsibilities
- Lead solution discovery and delivery for complex reliability and infrastructure problems.
- Contribute to the platform's architecture, tooling, and roadmap.
- Define and operate reliability practices, including SLOs/SLIs and alerting.
- Resolve cross-team requests and identify systemic issues.
- Operationalize AI workflows for team efficiency.
- Mentor junior engineers and participate in hiring and onboarding.
- Collaborate with Security on platform hardening and incident response.
Requirements
- Solid professional experience in SRE, DevOps, or Platform Engineering.
- Hands-on experience with Kubernetes and container tooling.
- Experience building and managing cloud infrastructure on AWS.
- Strong infrastructure-as-code practice with Terraform.
- Familiarity with reliability frameworks like SLOs and SLIs.
- Solid observability background with tools like OpenTelemetry and Grafana.
- Proficiency with CI/CD and deployment automation.
- Comfortable with Golang and scripting languages.
- Practical experience using AI in infrastructure and operations.
- Clear communication skills in an async-first environment.
- Proactive and collaborative mindset.
Benefits
- Work from anywhere.
- Flexible paid time off.
- Flexible working hours in an async environment.
- 16 weeks of paid parental leave.
- Mental health support services.
- Stock options.
- Learning budget.
- Home office budget and IT equipment.
- Budget for local in-person social events or co-working spaces.
