Senior Site Reliability Engineer

6 months ago

H1B Sponsor

Base Salary

$210k - $240k/yr

Responsibilities

Design, build, and maintain scalable infrastructure for real-time analytics and machine learning workloads.
Improve system reliability and performance through automation and observability.
Own and evolve CI/CD pipelines, deployment automation, and configuration management.
Implement and maintain monitoring, alerting, and incident response processes.
Collaborate with engineering and data science teams to promote performance and reliability.
Ensure security, compliance, and operational readiness of cloud infrastructure.
Drive post-incident analysis and continuous improvement initiatives.

Qualifications

8+ years of experience in SRE, DevOps, or infrastructure engineering roles.
5+ years of experience with datacenter operations or system and network administration.
Experience with containerization (Docker) and orchestration (Kubernetes).
Strong knowledge of Linux systems, networking, and performance tuning.
Solid understanding of infrastructure-as-code tools like Terraform and Ansible.
Good programming and scripting skills in languages such as Python or Bash.
Experience with monitoring and observability stacks like Prometheus or Grafana.
Proficiency with CI/CD tools and pipelines such as GitHub Actions.

What We Offer

Ownership of mission-critical infrastructure in a company solving real-world problems.
A front-row seat to a high-performance engineering culture.
The ability to influence platform scaling from deployment to incident management.
An environment that values curiosity, accountability, and impact.

Ansible, Apache Airflow, Apache Kafka, Apache Spark, AWS, Bash, Datadog, Docker, GitHub Actions, Grafana, Kubernetes, Linux, Prometheus, Python, Terraform