Base Salary
$130k - $200k/yr
Responsibilities
- Design and maintain scalable, secure, and resilient infrastructure for Gradial’s AI platform.
- Lead Kubernetes cluster management, CI/CD pipelines, and observability tooling.
- Anticipate scaling needs and evolve infrastructure architecture proactively.
- Take ownership of real-time, compute-intensive services with minimal oversight.
- Establish best practices for system reliability, performance monitoring, and disaster recovery.
- Evaluate and implement infrastructure automation tools to enhance deployment velocity.
- Act as a strategic voice on infrastructure investment and long-term scalability planning.
Requirements
- 5+ years of experience in DevOps, SRE, or platform engineering roles.
- Proven track record in designing and operating large-scale, production-grade infrastructure.
- Deep expertise in Kubernetes and cloud-native architecture.
- Proficiency with infrastructure-as-code tools such as Terraform, and with CI/CD tooling.
- Experience in high-growth environments, especially in scaling infrastructure.
- Strong communication skills and a collaborative mindset.
Benefits
- Meaningful equity and competitive salary.
- Comprehensive health, dental, and vision coverage.
- Fast-paced environment with autonomy and ownership.
- Real impact with zero bureaucracy.
- A front-row seat to building category-defining AI infrastructure.
