Sr. Site Reliability Engineer I

2 months ago

Manhattan, NY, USA

Senior / Mid Level

H1B Sponsor

Base Salary

$89k - $178k/yr

Responsibilities

Build and maintain the reliability, scalability, and performance of digital media measurement platforms.
Implement observability best practices for proactive reliability improvements.
Reduce MTTR for critical incidents through automation and improved monitoring.
Respond to incidents and manage Sev1/Sev2 situations.
Monitor and maintain high-availability infrastructure across various environments.
Lead technical projects from planning through deployment.
Build and deploy automations to improve operational efficiency.
Leverage AI-assisted development tools for automation and problem resolution.
Implement Infrastructure-as-Code using Terraform and other tools.
Create and maintain documentation and runbooks for consistent incident response.
Participate in on-call rotations and post-incident reviews.

Qualifications

4+ years of experience in Site Reliability Engineering, DevOps, or related operational roles.
Proficiency in Linux/Unix systems administration and scripting languages like Python, Bash, or Go.
Strong experience with cloud infrastructure across GCP, AWS, and OCI.
Expertise in monitoring and observability tools such as Prometheus and Grafana.
Hands-on experience with Infrastructure-as-Code tools like Terraform and Ansible.
Proven ability to develop and track SLIs, SLOs, and SLAs.

Ansible, AWS, Bash, Go, Google Cloud Platform, Grafana, Helm, Kubernetes, MongoDB, Nagios, Prometheus, Python, Snowflake, Splunk, SQL, Terraform