Base Salary
$186k - $219k/yr
Responsibilities
- Own the reliability, monitoring, and incident response lifecycle for AI infrastructure services.
- Build automation and tooling to streamline operational IT workflows.
- Partner with the Infrastructure team to extend CI/CD frameworks.
- Strengthen observability and documentation standards across IT engineering.
- Develop full-stack applications that power internal AI products and infrastructure.
Requirements
- 5+ years of experience automating and supporting cloud infrastructure (AWS).
- Proven experience deploying and managing containerized workloads using Docker and Kubernetes.
- Proficiency in at least one scripting or programming language (Python, Bash, Ruby, or Go).
- Track record of leading incident response in environments with strict SLAs.
- Ability to use generative AI responsibly while maintaining human oversight.