Air Apps

Site Reliability Engineer (SRE)

about 1 year ago
San Francisco, CA, USA
Mid Level / Senior

Base Salary

$116k - $200k/yr

Responsibilities

  • Design and implement scalable, reliable, and fault-tolerant systems across cloud environments.
  • Develop and maintain observability tools, including monitoring, logging, and alerting.
  • Automate infrastructure provisioning, deployment, and incident response using Infrastructure as Code (IaC) tools.
  • Optimize system performance, scalability, and incident response workflows.
  • Work closely with development and DevOps teams to improve system design for reliability.
  • Conduct root cause analysis (RCA) and implement preventative measures to minimize failures.
  • Ensure high availability by designing and maintaining load balancing, failover, and disaster recovery strategies.
  • Improve CI/CD pipelines to enhance deployment speed while maintaining stability.
  • Optimize cloud cost and resource utilization for AWS, Azure, or GCP.
  • Participate in on-call rotations to quickly address system failures and minimize downtime.

Requirements

  • 4+ years of experience in Site Reliability Engineering (SRE), DevOps, or Systems Engineering.
  • Strong knowledge of cloud platforms (AWS, Azure, or GCP) and cloud-native architectures.
  • Experience with observability and monitoring tools (Prometheus, Grafana, ELK, Datadog, New Relic).
  • Proficiency in Infrastructure as Code (IaC) tools such as Terraform, CloudFormation, or Pulumi.
  • Hands-on experience with containerization and orchestration (Docker, Kubernetes, Helm).
  • Strong Linux system administration and networking fundamentals.
  • Experience with incident management, debugging, and root cause analysis.
  • Proficiency in scripting (Bash, Python, or Go) for automation and system monitoring.
  • Knowledge of load balancing, failover strategies, and distributed systems.
  • Understanding of security best practices, access control, and compliance requirements.
  • Strong communication skills and the ability to collaborate with cross-functional teams.

Benefits

  • Apple hardware ecosystem for work.
  • Annual Bonus.
  • Medical Insurance (including vision & dental).
  • Short- and long-term disability insurance.
  • 401(k) with up to 4% contribution.
  • Air Conference – an opportunity to meet the team, collaborate, and grow together.
  • Transportation budget.
  • Free meals at the hub.
  • Gym membership.

Tech Stack

AWS, Azure, Bash, Datadog, Docker, Go, Google Cloud Platform, Grafana, Helm, Kubernetes, Prometheus, Python, Terraform
