Senior Cloud Site Reliability Engineer
NICE
3 months ago
Pune, India
Senior
Responsibilities
- Act as part of a team of SREs managing production and developing reliability improvements.
- Lead root-cause investigations into outages, performance problems, and cost issues.
- Develop automation for low-value tasks while balancing project delivery demands.
- Provide technical leadership to Cloud Operations and Support teams.
- Collaborate with DevOps and engineering teams to establish and enforce SLOs, SLAs, and error budgets.
- Develop and configure monitoring dashboards and alerts using tools like Grafana and Azure Monitor.
- Install and configure observability platforms including Grafana, Prometheus, and Azure Monitor.
- Develop Bicep modules for monitoring infrastructure.
- Optimize system performance, cost, and security through regular reviews and tuning.
Requirements
- Must have 5+ years of experience in Site Reliability Engineering.
- Excellent technical, analytical, and troubleshooting skills.
- In-depth knowledge of databases and data formats (MS-SQL, Elasticsearch, YAML, JSON, XML).
- Significant experience in programming or advanced scripting (Python, PowerShell, C#).
- Experience with infrastructure/configuration as code and version control (ARM, Bicep, Git).
- Strong experience managing monitoring, alerting, and dashboarding platforms (Azure Monitor, Prometheus, Grafana).
- Demonstrable experience supporting live cloud services and platforms.
- Expertise in developing queries for dashboards and alerting for microservices.
- Expertise in developing custom metrics for microservices.
- Production experience with Kubernetes and containerization.
- Exposure to commercial cloud providers (ideally Azure; others considered).
- Exposure to Azure DevOps pipelines is desirable (CI/CD).
- Exposure to test frameworks is desirable (NUnit, Jasmine, Selenium).
- Strong experience in infrastructure as code, design, and implementation strategies.
- Efficient, effective, and respectful communication skills with customers and internal departments.
Benefits
- Join a global company with endless internal career opportunities.
- Work in a fast-paced, collaborative, and creative environment.
- Enjoy the NiCE-FLEX hybrid model with 2 days in-office and 3 days remote work.
Tech Stack
Azure, C#, Elasticsearch, Git, Grafana, Kubernetes, PowerShell, Prometheus, Python
Categories
Data Engineering, DevOps, Security