Sr. Site Reliability Engineer - 11293
Coupa Software
14 days ago
Mexico City, Mexico
Senior / Staff+
H1B Sponsor
Responsibilities
- Administer Linux machines, web servers, application servers, and databases.
- Provide application support for Java and Ruby applications.
- Own end-to-end availability and performance of mission-critical services.
- Develop tools and automation to enhance availability and performance.
- Ensure reliability, fault-tolerance, and cost-effectiveness of data and services.
- Collaborate with product and release engineering for new product releases.
- Coordinate incident, problem, and change management.
- Participate in on-call rotation for after-hours emergencies.
Requirements
- Bachelor's degree with 8+ years of experience in large-scale production systems.
- Experience with AWS or a comparable cloud provider, including certification.
- Experience in designing and migrating services to cloud environments.
- Hands-on experience with Terraform and configuration management tools.
- Experience in application support/development on Java or Ruby.
- Scripting experience with Python or Bash.
- Excellent knowledge of large-scale web applications and distributed systems.
- Experience with Kubernetes, Docker, and cloud deployment technologies.
- Familiarity with observability tools like New Relic or Datadog.
- Strong problem-solving skills and ability to analyze global-scale systems.
- Excellent written and verbal communication skills.
- Critical thinking and a drive for continuous improvement.
Tech Stack
Ansible, AWS, Bash, Chef, Datadog, Docker, Java, Kubernetes, Linux, Python, Ruby, Terraform
Categories
Backend, DevOps