Gatik AI

Senior/Staff Site Reliability Engineer

2 months ago
Mountain View, CA, USA
Senior / Staff+
H1B Sponsor

Base Salary

$180k - $260k/yr

Responsibilities

  • Upgrade and maintain both physical and cloud infrastructure for data offloading from autonomous vehicles.
  • Collaborate with infrastructure and platform teams to monitor and troubleshoot on-premises data offload and CI systems.
  • Design and maintain business intelligence dashboards and ETL pipelines for infrastructure performance insights.
  • Architect and deploy test environments for internal and customer-facing infrastructure solutions.
  • Automate deployment, scaling, and upgrading of remote monitoring software.
  • Analyze infrastructure performance to identify optimization opportunities.

Requirements

  • 5+ years of experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering.
  • Strong knowledge of networking fundamentals, including protocols and troubleshooting.
  • Hands-on experience with Docker and related tools.
  • Expertise in Kubernetes deployments and Helm package management.
  • Proficiency with relational and time-series databases like Postgres and InfluxDB.
  • Familiarity with workflow orchestration tools such as Argo and Airflow.
  • Experience managing upgrades and rollbacks for SaaS environments.
  • Scripting experience in Python and Bash for automation.
  • Experience building and maintaining dashboards with Grafana.

Tech Stack

Apache Airflow, Argo CD, Bash, Docker, Grafana, Helm, InfluxDB, Kubernetes, PostgreSQL, Python

Categories

AI & ML, Data Engineering, DevOps, Security