
Senior Site Reliability Engineer - Remote
ClickHouse
about 1 month ago
Remote, Australia
Senior / Staff+
Responsibilities
- Collaborate with engineering teams to design and implement scalable, secure systems.
- Establish and manage service level objectives (SLOs) and service level agreements (SLAs).
- Ensure infrastructure components have monitoring and alerting for incident detection.
- Enhance incident response processes and conduct post-mortem analysis.
- Continuously improve the reliability and performance of ClickHouse services.
- Plan and drive chaos engineering initiatives across engineering teams.
- Manage on-call processes for performance and reliability issues.
Requirements
- Bachelor’s or Master’s degree in Computer Science or a related field.
- At least 8 years of experience in Site Reliability Engineering or a related field.
- Hands-on experience with Go and/or Python.
- Strong knowledge of cloud computing platforms such as AWS, Azure, or Google Cloud.
- Excellent understanding of distributed databases and SQL, particularly ClickHouse.
- Experience with container orchestration tools like Kubernetes or Docker Swarm.
- Strong experience with automation and configuration management tools like Ansible, Terraform, or Puppet.
- Strong problem-solving skills and production debugging capabilities.
- Passion for efficiency, availability, scalability, and data governance.
- Ability to thrive in a fast-paced environment and partner with the business.
- High level of responsibility, ownership, and accountability.
- Excellent communication and interpersonal skills.
Benefits
- Flexible work environment with remote-friendly policies.
- Employer contributions towards healthcare.
- Equity in the company with stock options for new team members.
- Flexible time off in the US and generous leave entitlements in other countries.
- A $500 home office setup for remote employees.
- Opportunities for global gatherings and in-person connections.