about 2 hours ago
Bengaluru, India
Intern
H1B Sponsor
Responsibilities
- Assist in implementing and maintaining monitoring and alerting systems using cloud monitoring services.
- Respond to and remediate operational issues impacting performance or availability.
- Configure Prometheus to scrape and store metrics from cloud resources.
- Develop dashboards and alerts in Grafana for proactive incident detection.
- Collaborate with teams to identify key reliability and performance indicators.
- Support incident response by tuning alert thresholds and diagnosing alerts.
- Document monitoring procedures and best practices.
- Learn and apply SRE principles focused on reliability, scalability, and automation.
Requirements
- Currently pursuing or recently completed a degree in Computer Science, Engineering, or a related field.
- Basic understanding of cloud computing concepts (AWS, Azure, or GCP).
- Familiarity with monitoring and observability tools like Prometheus and Grafana is a plus.
- Knowledge of scripting or programming languages (e.g., Python, Bash) is desirable.
- Strong problem-solving skills and eagerness to learn.
- Good communication skills and ability to work collaboratively.
Benefits
- Mentorship from experienced SRE professionals.
- Hands-on exposure to cutting-edge technology.
- Opportunity to contribute to meaningful projects with a direct impact on system reliability.
- Flexible work arrangements.
- Potential for full-time opportunities after successful completion.
Tech Stack
AWS, Azure, Bash, Google Cloud Platform, Grafana, Prometheus, Python
Categories
DevOps, Security
