Responsibilities
- Support the development and maintenance of observability tools, including monitoring, logging, and tracing solutions.
- Deploy and manage platforms such as Azure Application Insights, New Relic, Prometheus, and Grafana.
- Work with OpenTelemetry to collect and export telemetry data and enable distributed tracing.
- Assist with implementing structured logging and optimizing metrics collection.
- Help build and refine alerting mechanisms to improve Mean Time to Resolution (MTTR).
- Contribute to scalable monitoring solutions in AWS and Azure cloud environments.
- Collaborate with DevOps and SRE teams to integrate observability best practices into CI/CD pipelines.
- Document runbooks, dashboards, and best practice guides.
Requirements
- 5+ years of experience in observability, monitoring, or SRE roles.
- Hands-on proficiency with Azure Application Insights, New Relic, Prometheus, and Grafana.
- Experience with OpenTelemetry and distributed tracing.
- Proficiency in Kubernetes/container monitoring and infrastructure automation.
- Scripting skills for automation in Python, Go, Bash, or PowerShell.
- Strong understanding of SLIs, SLOs, and incident management frameworks.
- Excellent communication and collaboration skills.
Benefits
- Flexible working model that supports work-life balance.
- Competitive compensation and total rewards, including health and wellness plans.
- Opportunity to collaborate with global, diverse teams.
- Access to best-in-class learning tools and development programs.
- Commitment to equity and belonging within the workplace.
