Zuora

Site Reliability Engineer II

Posted about 23 hours ago
San José, Costa Rica
Mid Level / Senior
H1B Sponsor

Responsibilities

  • Design and implement intelligent automation for infrastructure lifecycle management.
  • Apply AI/ML techniques for predictive monitoring and performance optimization.
  • Lead complex incident response efforts and root cause analyses.
  • Improve system reliability through dynamic scaling and automated performance tuning.
  • Enhance operational runbooks by eliminating manual processes through automation.
  • Evaluate and adopt emerging AIOps and cloud-native technologies.
  • Partner cross-functionally to deliver exceptional customer experiences.

Requirements

  • 2–4 years of experience in Linux systems administration and/or Python development.
  • Strong Linux administration skills including troubleshooting and performance tuning.
  • Experience developing Python scripts for operational workflows.
  • Hands-on experience with Docker and familiarity with Kubernetes.
  • At least one year of experience supporting SaaS or cloud-native environments.
  • Working knowledge of messaging platforms such as Kafka and databases such as MySQL.
  • Experience contributing to CI/CD pipelines and deployment automation.
  • Hands-on experience with monitoring platforms such as Prometheus and Grafana.
  • Experience in incident response and root cause analysis.
  • A demonstrated passion for automation and operational efficiency.

Benefits

  • Competitive compensation and performance-based rewards.
  • Medical, dental, and vision insurance.
  • Generous flexible time off and paid holidays.
  • Paid parental leave for eligible employees.
  • Learning and development stipend for ongoing growth.
  • Volunteer opportunities and charitable donation matching.
  • Mental wellbeing resources and support.

Tech Stack

Ansible, Apache Kafka, AWS, Docker, Grafana, Jenkins, Kubernetes, Linux, MySQL, Prometheus, Puppet, Python, Redis, Terraform