Senior Software Engineer - SRE

about 2 months ago

Bengaluru, India

Senior

H1B Sponsor

Responsibilities

Facilitate blameless post-incident reviews to identify root causes and prioritize reliability improvements.
Implement chaos engineering practices to validate system resilience and recovery procedures.
Establish core SRE principles and frameworks across the organization.
Manage error budgets to balance feature velocity with system reliability.
Automate repetitive operational tasks to reduce toil.
Implement capacity planning processes to ensure systems meet SLOs.
Build observability systems for deep visibility into service health and performance.
Create SRE dashboards for real-time visibility into reliability metrics.
Partner with development teams to implement reliability from the design phase.
Drive continuous improvement through SRE feedback loops and documentation.

Qualifications

8+ years of experience in DevOps/SRE roles with expertise in SRE principles.
Deep experience with observability and monitoring platforms like Prometheus and Grafana.
Strong background in incident management and conducting blameless postmortems.
Understanding of distributed systems and reliability engineering concepts.
Experience with Kubernetes, Docker, and service mesh technologies.
Proficiency in cloud-focused software development, preferably in Go or Python.
Experience with Infrastructure as Code tools like Terraform or Ansible.
Hands-on experience with cloud platforms such as AWS, GCP, or Azure.
Ability to communicate effectively with technical and non-technical stakeholders.
BS degree in Computer Science or equivalent.