Senior Site Reliability Engineer

Melbourne, Australia

Senior

H1B Sponsor

Responsibilities

Build and extend platforms to improve system reliability.
Work on team goals that encompass reliability for the entire company.
Standardize reliability tools across multiple platforms and organizations.
Triage, coordinate, and lead stabilization of sev 0–1 incidents.
Serve as primary oncall, maintaining structured escalation paths.
Drive platform-wide reliability improvements and shared operational tooling.
Use AI-driven systems to improve signal detection and accelerate root cause analysis.
Design and implement safe deployment patterns.

Qualifications

Ability to drive root cause analysis in systems with many moving parts.
Demonstrated technical initiative and leadership on previous projects.
Familiarity with AI-driven tooling for observability and incident analysis.
Experience running production oncall for high-availability systems.
Strong incident management skills including structured triage and blameless postmortems.
Fluency with CI/CD pipelines and rollback automation.
Monitoring and observability expertise.
Ability to create and maintain evidence-based maturity assessments.
Comfort with vendor/dependency management.
Boundless curiosity, autonomy, and a strong sense of accountability.
5+ years of software development experience.

Amazon DynamoDB, Ambassador, AWS, Datadog, gRPC, Istio, Java, Kotlin, Kubernetes, MySQL, Terraform