Base Salary
$190k - $220k/yr
Responsibilities
- Own the infrastructure for the Data Replication platform, including Kubernetes clusters and CI/CD pipelines.
- Collaborate with product engineers to ensure new features integrate reliably with the infrastructure.
- Enhance observability, alerting, and anomaly detection, with a focus on LLM-driven automation.
- Develop AI-augmented release and internal tooling for automated deployments and rollbacks.
- Set infrastructure standards by creating self-serve tooling and coaching engineers.
Requirements
- 7+ years of experience in infrastructure, platform engineering, SRE, or DevOps.
- Hands-on experience with Kubernetes, Helm, and Terraform in production.
- Deep knowledge of observability stacks like Prometheus, Grafana, and Datadog.
- Experience managing CI/CD pipelines and developer tooling.
- Ability to read backend code to troubleshoot and instrument systems.
- Fluency with AI tools and frameworks for automation and debugging.
- A startup-ready mindset, comfortable with ambiguity and fast-paced problem-solving.
