
Staff Site Reliability Engineer
Blink Health
21 days ago
Remote, Worldwide
Staff+
H1B Sponsor
Responsibilities
- Establish and evolve SRE best practices across the organization.
- Define and drive observability strategy for system health and performance.
- Design and implement software-driven solutions within the infrastructure domain.
- Act as a technical leader and force multiplier across core cloud infrastructure.
- Take ownership of large, ambiguous initiatives from concept to delivery.
- Combine knowledge of software development, infrastructure, and security to improve platform resilience.
- Proactively identify systemic risks and recommend platform upgrades.
- Partner with engineering teams to improve developer workflows and operational maturity.
- Provide technical mentorship and high-quality design and code reviews.
- Lead by example in documentation and knowledge sharing.
- Participate in and help mature incident response and post-incident learning.
Requirements
- Bachelor’s or Master’s degree in Computer Science or equivalent practical experience.
- 7+ years of experience in site reliability engineering, infrastructure engineering, or platform engineering roles.
- Expert-level troubleshooting across the entire stack from application to kernel to network.
- Strong command-line proficiency and deep expertise in Linux systems.
- Advanced understanding of networking concepts including load balancing and service-to-service communication.
- Experience working across multiple programming languages such as Python, Go, and Bash.
- Strong track record of automating operational work to reduce toil.
- Deep experience with cloud platforms, preferably AWS.
- Strong expertise in Kubernetes and container orchestration.
- Experience designing and maintaining company-wide Infrastructure as Code codebases.
Tech Stack
Ansible, AWS, Azure, Bash, Go, Google Cloud Platform, Helm, Kubernetes, Linux, Python, React, Terraform
Categories
DevOps, Security