1 day ago
Atlanta, GA, USA
Mid Level / Senior
H1B Sponsor
Responsibilities
- Act as a primary escalation point for critical production application/product issues.
- Rapidly troubleshoot complex problems across the application stack using observability tools.
- Coordinate with development, infrastructure, and technical teams during incidents.
- Communicate incident status, impact, and resolution steps to stakeholders.
- Improve monitoring tools and alerting mechanisms for proactive issue detection.
- Monitor application and system health to ensure high availability.
- Implement automation tools/scripts to streamline operational tasks.
- Conduct system tests to validate performance and reliability.
- Recommend design and process enhancements for application reliability.
- Participate in post-incident reviews of major incidents to analyze disruptions.
- Contribute to a culture of learning from incidents.
- Participate in a 24x7 on-call rotation for critical issues.
Requirements
- 3-5 years of experience in SRE/DevOps/Tier 3.
- Strong troubleshooting skills with a systematic problem-solving approach.
- Extensive experience resolving critical incidents in production environments.
- Proficiency in Linux and operational scripting (Bash, PowerShell, Python).
- Experience with database querying and automated configuration management.
- Familiarity with cloud platforms and container orchestration.
- Understanding of application environments for troubleshooting.
- Excellent verbal and written communication skills.
- Strong analytical skills and ability to manage multiple tasks.
- Experience with incident management processes.
Benefits
- Flexible working arrangements.
- Home office reimbursement program.
- Baby bonus and parental leave top-up program.
- Online learning and networking opportunities.
- Electric vehicle purchase incentive program.
- Competitive medical and dental benefits.
- Retirement savings program.
Tech Stack
Ansible, Apache Superset, Argo CD, AWS, Azure, Bash, C#, Google BigQuery, Google Cloud Platform, Grafana, Kubernetes, Linux, .NET, PostgreSQL, PowerShell, Prometheus, Python
Categories
DevOps
