about 3 hours ago
New York, NY, USA +2 more
Staff+
H1B Sponsor
Base Salary
$320k - $405k/yr
Responsibilities
- Own the technical strategy and roadmap for agent-driven cluster lifecycle management.
- Partner across teams to ensure new compute capacity is ingested on time.
- Align with partner teams on physical build-out and leverage cloud solutions for high-bandwidth connectivity.
- Collaborate with security owners to ensure clusters are provisioned secure-by-default.
- Define and drive strategy on cluster scalability, homogeneity, and fault tolerance.
- Work closely with cloud providers and internal teams to shape long-term infrastructure strategy.
- Establish and evolve operational-excellence practices including incident response and postmortem culture.
- Support the growth of engineers through technical mentorship and coaching.
Requirements
- Deep expertise in distributed systems, reliability, and cloud platforms.
- Strong proficiency in at least one systems language, plus infrastructure-as-code (IaC) experience with Terraform.
- Track record of leading complex, multi-quarter technical initiatives.
- Ability to build alignment across senior stakeholders and communicate effectively.
Benefits
- Competitive compensation and benefits.
- Optional equity donation matching.
- Generous vacation and parental leave.
- Flexible working hours.
- Collaborative office space.
Tech Stack
Ambassador, AWS, Azure, Go, Google Cloud Platform, Istio, Kubernetes, Python, Rust, Terraform
Categories
AI & ML, DevOps, Security