Oscilar

Sr./Staff - Infrastructure/Site Reliability Engineer (SRE)

about 3 hours ago
Remote, Canada or Remote, Worldwide
Senior / Staff+
H1B Sponsor

Responsibilities

  • Architect and operate resilient cloud infrastructure using AWS, Pulumi, and Kubernetes.
  • Lead initiatives to improve availability, latency, and performance at scale.
  • Design and evolve CI/CD pipelines for speed, safety, and repeatability.
  • Define metrics, alerts, and runbooks for observability.
  • Run chaos experiments and failure simulations to enhance platform resilience.
  • Mentor engineers and establish best practices for SRE across the company.

Requirements

  • Proven track record as a senior SRE or Infrastructure Engineer in high-scale environments.
  • Expert-level skills in AWS and Infrastructure as Code tools like Pulumi and Terraform.
  • Strong programming skills in Go or Python, with a preference for Go.
  • Deep understanding of distributed systems such as Kafka and ClickHouse.
  • Mastery of container orchestration with Kubernetes and production debugging.
  • Strong sense of ownership and ability to balance velocity with reliability.

Benefits

  • Competitive salary and equity packages, including a 401(k) plan.
  • Remote-first culture allowing work from anywhere.
  • 100% employer-covered comprehensive health, dental, and vision insurance.
  • Unlimited PTO policy for work-life balance.
  • Family-friendly environment with regular team events and offsites.
  • Unparalleled learning and professional development opportunities.
  • Opportunity to make the internet safer by protecting online transactions.

Tech Stack

Apache Kafka, AWS, ClickHouse, Go, Kubernetes, Python, Terraform

Categories

AI & ML, Data Engineering, DevOps