Dropbox

Site Reliability Engineer

about 1 month ago
Remote, Mexico
Senior / Staff+
H1B Sponsor

Responsibilities

  • Ensure the reliability, scalability, and performance of Dropbox's infrastructure and services.
  • Collaborate with cross-functional teams to develop and maintain best practices for monitoring, logging, and incident response.
  • Build, implement, and maintain automations and infrastructure-as-code tooling, specifically Terraform, Ansible, and GitHub Actions.
  • Use container orchestration platforms, such as Kubernetes, Amazon ECS, and Red Hat OpenShift, to manage containers at scale.
  • Manage and optimize monitoring and logging pipelines using tools like Datadog and Cribl LogStream.
  • Drive improvement projects related to service health and visibility for stakeholders.
  • Develop and maintain custom tooling and automation scripts in Bash, Python, and other scripting languages.
  • Participate in on-call work to address bugs, outages, or operational issues.

Requirements

  • 5+ years of experience in site reliability engineering or similar engineering roles with hands-on coding experience.
  • Strong knowledge of AWS services, including EC2, S3, RDS, Route 53, Lambda, and others.
  • Strong knowledge of Linux administration, internals, filesystems, and specific distributions such as Ubuntu and RHEL.
  • Experience with monitoring and logging tools such as Datadog, and with logging pipeline tools such as Vector or Cribl LogStream.
  • Experience driving transformational programs related to metrics and observability.
  • Experience with scripting in a higher-level language, preferably Python.
  • Experience developing automation to solve infrastructure-related tasks with tools such as Chef, Ansible, or Terraform.
  • Experience with log analysis and building metrics, alerts, and visuals from log data.
  • Strong proficiency in infrastructure-as-code tools, such as Terraform.
  • Strong proficiency in configuration management tools, specifically Ansible Automation Platform and Chef.
  • Experience with containerization technologies, such as Docker, and container orchestration platforms like Kubernetes or Amazon ECS.
  • Knowledge of LDAP, REST APIs, and current authentication methods.
  • Familiarity with GitHub and Git-based workflows.
  • Understanding of RDS databases and network security technologies, such as WAF.
  • Strong problem-solving skills and the ability to work well in a fast-paced, collaborative environment.
  • Excellent written and verbal communication skills.

Tech Stack

Ansible, AWS, Bash, Chef, Datadog, Docker, GitHub Actions, Kubernetes, Linux, OpenShift, Python, Terraform

Categories

DevOps, Security