EngFlow

Site Reliability Engineer

3 months ago
Austin, TX, USA +2 more
Mid Level / Senior
H1B Sponsor

Responsibilities

  • Design, build, and maintain cloud infrastructure for the distributed build acceleration platform.
  • Automate deployment pipelines, monitoring, and recovery processes.
  • Manage scalability and reliability for high-throughput, low-latency systems.
  • Implement and maintain observability practices including logging, metrics, tracing, and alerting.
  • Collaborate with product and engineering teams to build reliability into new features.
  • Diagnose and resolve production incidents quickly, incorporating learnings into system design.
  • Optimize cost, performance, and resilience across multi-cloud environments.

Requirements

  • 4+ years of experience in SRE, DevOps, or Production Engineering roles.
  • Experience managing Kubernetes in production environments.
  • Strong background in cloud infrastructure (GCP or AWS) and Infrastructure as Code (Terraform preferred).
  • Solid knowledge of networking, security, and distributed systems.
  • Proven track record of improving system availability and developer productivity.
  • Ability to debug complex, cross-system issues under pressure.

Benefits

  • Comprehensive medical, dental, and vision benefits.
  • 401(k)/pension plan.
  • Parental leave and generous vacation policy.
  • Fully remote work environment with team meetups at exciting destinations.
  • Engaging team events such as tastings and monthly games.

Tech Stack

AWS, Google Cloud Platform, Kubernetes, Terraform

Categories