
Principal Member of Technical Staff, Platform Infrastructure
Edison Scientific
Posted about 3 hours ago
Base Salary
$200k - $350k/yr
Responsibilities
- Architect, implement, and operate Kubernetes clusters for high availability and efficient resource utilization.
- Design and develop custom resource definitions and Kubernetes operators for AI agent lifecycles and research pipelines.
- Drive strategies for cluster scaling, node pool management, and autoscaling policies.
- Build and maintain infrastructure-as-code for reproducible environment management.
- Design robust scheduling and placement strategies for heterogeneous workloads.
- Establish best practices for observability, monitoring, and incident response.
- Own storage and networking strategy within Kubernetes.
- Troubleshoot complex infrastructure issues in distributed environments.
- Collaborate with backend, ML, and research teams to understand workload requirements.
Requirements
- 10+ years of professional infrastructure or platform engineering experience.
- Deep hands-on Kubernetes expertise in production environments.
- Experience designing and implementing custom resource definitions and Kubernetes operators.
- Track record of operating and scaling Kubernetes clusters with stateful workloads.
- Deep understanding of Kubernetes internals and behavior at scale.
- Expertise with cloud infrastructure (AWS EKS, GCP GKE, or Azure AKS).
- Proficiency in at least one systems or backend language for operator development.
- Hands-on experience with infrastructure-as-code tools and GitOps workflows.
- Strong knowledge of container networking, storage, and security.
Benefits
- Competitive salary and equity.
- Full healthcare coverage for you and your dependents.
- Support for growing families, including a yearly new-parent stipend.
- 401(k) company matching.
- $300 health and wellness benefit.
- Daily lunch, and dinner on late workdays.
- Regular team offsites and company events.
- A fast-moving, mission-driven culture.