
Site Reliability Engineer (SRE)
New Era Technology
29 days ago
Delhi, India
Mid Level / Senior
H1B Sponsor
Responsibilities
- Maintain reliable, scalable, and secure production environments.
- Implement and manage monitoring, alerting, and logging solutions.
- Contribute to defining and tracking SLIs/SLOs and support error budget practices.
- Automate operational tasks to improve efficiency and reduce manual effort.
- Perform troubleshooting and Root Cause Analysis (RCA) for production incidents.
- Optimize system performance, availability, and capacity.
- Maintain runbooks, SOPs, and incident documentation in Confluence.
- Adhere to change management, deployment governance, and disaster recovery standards.
- Support incident response for critical production services.
- Coordinate with external vendors and internal cross-functional teams.
- Work closely with Engineering, Product Owners, and Operations teams.
- Manage incidents and changes using ServiceNow & JIRA.
- Collaborate through Slack and structured communication channels.
Requirements
- 3+ years of experience in Site Reliability Engineering or a related field.
- Strong knowledge of Windows and Linux/Unix systems.
- Solid understanding of networking fundamentals (DNS, TCP/IP, Load Balancing, Firewalls).
- Experience with at least one cloud platform (AWS, Azure, or GCP).
- Proficiency in at least one scripting/programming language (Python, Go, Bash, PowerShell, or Java).
- Understanding of CI/CD pipelines and automation practices.
- Hands-on experience with Docker and Kubernetes.
- Experience with monitoring and dashboarding tools such as Grafana or Power BI.
- Experience with ServiceNow & JIRA (incident/change/problem workflows).
- Working knowledge of Confluence for technical documentation.