Site Reliability Engineer
Dropbox
about 1 month ago
Remote, Mexico
Senior / Staff+
H1B Sponsor
Responsibilities
- Ensure the reliability, scalability, and performance of Dropbox's infrastructure and services.
- Collaborate with cross-functional teams to develop and maintain best practices for monitoring, logging, and incident response.
- Build, implement, and maintain automation and infrastructure-as-code tooling, specifically Terraform, Ansible, and GitHub Actions.
- Utilize container orchestration platforms, such as Kubernetes, Amazon ECS, and Red Hat OpenShift, to manage containers at scale.
- Manage and optimize monitoring and logging pipelines using tools like Datadog and Cribl LogStream.
- Drive improvement projects related to service health and visibility for stakeholders.
- Develop and maintain custom tooling and automation scripts in Bash, Python, and other scripting languages.
- Participate in on-call work to address bugs, outages, or operational issues.
Requirements
- 5+ years of experience in site reliability engineering or similar engineering roles with hands-on coding experience.
- Strong knowledge of AWS services, including EC2, S3, RDS, Route 53, Lambda, and others.
- Strong knowledge of Linux administration, internals, filesystems, and specific distributions such as Ubuntu and RHEL.
- Experience with monitoring and logging tools such as Datadog, and with logging pipeline tools such as Vector or Cribl LogStream.
- Experience driving transformational programs related to metrics and observability.
- Experience with scripting in a higher-level language, preferably Python.
- Experience developing automation to solve infrastructure-related tasks with tools such as Chef, Ansible, or Terraform.
- Experience with log analysis and building metrics, alerts, and visuals from log data.
- Strong proficiency in infrastructure-as-code tools, such as Terraform.
- Strong proficiency in configuration management tools, specifically Ansible Automation Platform and Chef.
- Experience with containerization technologies, such as Docker, and container orchestration platforms like Kubernetes or Amazon ECS.
- Knowledge of LDAP, REST APIs, and current authentication methods.
- Familiarity with GitHub and Git-based workflows.
- Understanding of RDS databases and network security technologies, such as WAF.
- Strong problem-solving skills and the ability to work well in a fast-paced, collaborative environment.
- Excellent written and verbal communication skills.
Tech Stack
Ansible, AWS, Bash, Chef, Datadog, Docker, GitHub Actions, Kubernetes, Linux, OpenShift, Python, Terraform
Categories
DevOps, Security