Attain

Sr/Staff Site Reliability Engineer, Consumer Apps

4 months ago
Chicago, IL, USA
Senior / Staff+
H1B Sponsor

Responsibilities

  • Write Terraform modules for deploying infrastructure resources via GitLab pipelines.
  • Develop Helm charts for deploying services and jobs in Kubernetes.
  • Define metrics, network policies, and routing rules for the Istio service mesh.
  • Monitor and maintain GCP BigQuery and Spanner databases.
  • Pipe metrics to Google-managed Prometheus and build Grafana dashboards.
  • Experiment with GCP offerings and open-source tools for automation.
  • Leverage LLMs in developing infrastructure and tooling.
  • Pair with engineering leads to instrument and monitor critical functionality.
  • Add automation to reduce reliance on manual processes.
  • Participate in architecture design and capacity planning discussions.
  • Build, maintain, and improve the CI/CD pipeline.

Requirements

  • 6+ years of experience building and maintaining large-scale cloud-native infrastructure (AWS and/or GCP).
  • Experience with containerization, orchestration, and service mesh technologies like Docker, Kubernetes, and Istio.
  • Familiarity with SQL database technologies such as MySQL, Google BigQuery, and Google Spanner.
  • Experience with streaming technologies like Kafka and Amazon Kinesis.
  • Knowledge of pub/sub technologies such as AWS SNS and Google Pub/Sub.
  • Experience with serverless computing technologies like AWS Lambda and Google Cloud Functions.
  • Proficiency in infrastructure-as-code tools such as Terraform.
  • Experience with observability tools like Datadog, Prometheus, and Grafana.
  • Strong computer science and software engineering fundamentals.
  • Familiarity with SOC 2 compliance processes and requirements.

Tech Stack

Apache Kafka, AWS, Datadog, Docker, Google BigQuery, Google Cloud Platform, Grafana, Helm, Istio, Kubernetes, Prometheus, Terraform

Categories