1 day ago
Bengaluru, India
Senior / Staff+
H1B Sponsor
Responsibilities
- Design, implement, and maintain observability solutions for datacenter infrastructure.
- Develop, deploy, and operate large-scale observability and telemetry platforms.
- Own and contribute to the full lifecycle of observability services.
- Build and enhance monitoring systems for high availability and performance.
- Create and manage dashboards, alerts, and reports for system health visibility.
- Apply SRE principles to improve reliability and operational efficiency.
- Develop and maintain automation for infrastructure provisioning and management.
- Lead root cause analysis and post-incident reviews.
- Analyze system performance to identify bottlenecks and improvement areas.
- Partner with cross-functional teams to deliver effective observability solutions.
- Ensure solutions adhere to security policies and industry standards.
- Provide hands-on support for observability and reliability issues.
- Continuously enhance the scalability and operational efficiency of services.
Requirements
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
- 12+ years of progressive software engineering experience.
- Proven experience in managing and optimizing large-scale datacenter environments.
- Strong proficiency in Go or Python with a deep understanding of networked systems.
- Expert-level knowledge of Kubernetes internals and containerization ecosystems.
- Proven experience with load balancing and service mesh at scale.
- Proficiency in observability tools like Prometheus and Grafana.
- Experience with SRE practices and tools such as Kubernetes and Terraform.
- Familiarity with cloud platforms like AWS, Azure, or GCP.
Benefits
- Hybrid work model with the flexibility to work from home 2 days a week.
- Collaborative culture that enriches employee experience.