Base Salary
$218k - $257k/yr
Responsibilities
- Build and launch reliability projects and features to improve resiliency across the service environment.
- Work closely with critical T0/T1 services to enhance scalability and reliability.
- Build and enhance systems for secure management of service configurations and secrets.
- Improve canary-based release systems for safer deployments.
- Expand deployment capabilities to support thousands of services and hundreds of daily deployments.
- Promote reliability best practices and strengthen reliability culture across teams.
Requirements
- 7+ years of software engineering experience.
- Experience designing, building, scaling, and maintaining production services.
- Strong system design and coding skills with a track record of high-quality code.
- Strong observability, debugging, and performance tuning skills.
- Excellent written and verbal communication skills.
- Sound judgment under pressure and willingness to debug any layer of the stack.
- Ability to participate in an on-call rotation and respond to issues outside normal business hours.
- Experience building reliable, high-throughput, low-latency systems.
- Familiarity with observability tools like Kibana and Datadog.
- Experience with Ruby, Go, Terraform, and cloud platforms such as AWS, GCP, or Azure.
- Responsible use of generative AI, with human oversight.