Edison Scientific

Principal Member of Technical Staff, Platform Infrastructure

about 3 hours ago

Base Salary

$200k - $350k/yr

Responsibilities

  • Architect, implement, and operate Kubernetes clusters for high availability and efficient resource utilization.
  • Design and develop custom resource definitions and Kubernetes operators for AI agent lifecycles and research pipelines.
  • Drive strategies for cluster scaling, node pool management, and autoscaling policies.
  • Build and maintain infrastructure-as-code for reproducible environment management.
  • Design robust scheduling and placement strategies for heterogeneous workloads.
  • Establish best practices for observability, monitoring, and incident response.
  • Own storage and networking strategy within Kubernetes.
  • Troubleshoot complex infrastructure issues in distributed environments.
  • Collaborate with backend, ML, and research teams to understand workload requirements.

Requirements

  • 10+ years of professional infrastructure or platform engineering experience.
  • Deep hands-on Kubernetes expertise in production environments.
  • Experience designing and implementing custom resource definitions and Kubernetes operators.
  • Track record of operating and scaling Kubernetes clusters with stateful workloads.
  • Deep understanding of Kubernetes internals and behavior at scale.
  • Expertise with cloud infrastructure (AWS EKS, GCP GKE, or Azure AKS).
  • Proficiency in at least one systems or backend language for operator development.
  • Hands-on experience with infrastructure-as-code tools and GitOps workflows.
  • Strong knowledge of container networking, storage, and security.

Benefits

  • Competitive salary and equity.
  • Full healthcare coverage for you and your dependents.
  • Support for growing families, including a yearly new parent stipend.
  • 401(k) company matching.
  • $300 health and wellness benefit.
  • Daily lunch, plus dinner on late workdays.
  • Regular team offsites and company events.
  • A fast-moving, mission-driven culture.

Tech Stack

Datadog, Grafana, Kubernetes, Prometheus, Terraform
