Blink Health

Staff Site Reliability Engineer

21 days ago
Remote, Worldwide
Staff+
H1B Sponsor

Responsibilities

  • Establish and evolve SRE best practices across the organization.
  • Define and drive observability strategy for system health and performance.
  • Design and implement software-driven solutions within the infrastructure domain.
  • Act as a technical leader and force multiplier across core cloud infrastructure.
  • Take ownership of large, ambiguous initiatives from concept to delivery.
  • Combine knowledge of software development, infrastructure, and security to improve platform resilience.
  • Proactively identify systemic risks and recommend platform upgrades.
  • Partner with engineering teams to improve developer workflows and operational maturity.
  • Provide technical mentorship and high-quality design and code reviews.
  • Lead by example in documentation and knowledge sharing.
  • Participate in and help mature incident response and post-incident learning.

Requirements

  • Bachelor’s or Master’s degree in Computer Science or equivalent practical experience.
  • 7+ years of experience in site reliability engineering, infrastructure engineering, or platform engineering roles.
  • Expert-level troubleshooting across the entire stack from application to kernel to network.
  • Strong command-line proficiency and deep expertise in Linux systems.
  • Advanced understanding of networking concepts including load balancing and service-to-service communication.
  • Experience working across multiple programming languages such as Python, Go, and Bash.
  • Strong track record of automating operational work to reduce toil.
  • Deep experience with cloud platforms, preferably AWS.
  • Strong expertise in Kubernetes and container orchestration.
  • Experience designing and maintaining company-wide Infrastructure as Code codebases.

Tech Stack

Ansible, AWS, Azure, Bash, Go, Google Cloud Platform, Helm, Kubernetes, Linux, Python, React, Terraform

Categories

DevOps, Security