Senior Site Reliability Engineer

about 2 months ago

Tel Aviv-Yafo, Israel

Senior

H1B Sponsor

Responsibilities

Own reliability as a product capability by defining SLOs and incident response practices.
Build and improve platform foundations that enable safe, fast engineering workflows.
Design and operate resilient systems for real-time ingestion and detection.
Lead improvements in scalability, performance, and operational maturity.
Drive modernization of the infrastructure stack, including Kubernetes and infrastructure as code.
Enhance developer experience through internal tooling and operational standards.
Collaborate with backend and platform engineers to reduce toil and improve reliability.
Participate in incident management and root cause analysis.

Requirements

5+ years of experience in SRE, infrastructure engineering, or platform engineering.
Strong experience with cloud-native systems in production, especially in demanding environments.
Deep practical experience with Kubernetes, AWS, and infrastructure as code tools.
Strong understanding of observability, monitoring, and performance tuning for distributed systems.
Experience with CI/CD systems and operational automation.
Solid coding skills in Python, Go, or Rust.
Sound judgment around reliability trade-offs and safe delivery practices.
Collaborative mindset with a focus on enabling product teams and mentoring engineers.