Site Reliability Engineer

3 months ago

Austin, TX, USA +2 more

Mid Level / Senior

H1B Sponsor

Responsibilities

Design, build, and maintain cloud infrastructure for the distributed build acceleration platform.
Automate deployment pipelines, monitoring, and recovery processes.
Manage scalability and reliability for high-throughput, low-latency systems.
Implement and maintain observability practices including logging, metrics, tracing, and alerting.
Collaborate with product and engineering teams to build reliability into new features.
Diagnose and resolve production incidents quickly, incorporating learnings into system design.
Optimize cost, performance, and resilience across multi-cloud environments.

Qualifications

4+ years of experience in SRE, DevOps, or Production Engineering roles.
Experience managing Kubernetes in production environments.
Strong background in cloud infrastructure (GCP or AWS) and Infrastructure as Code (Terraform preferred).
Solid knowledge of networking, security, and distributed systems.
Proven track record of improving system availability and developer productivity.
Ability to debug complex, cross-system issues under pressure.

AWS, Google Cloud Platform, Kubernetes, Terraform