PsiQuantum

Senior Software Platform Engineer

Posted 6 days ago
Remote (Worldwide) or Palo Alto, CA, USA
Senior
H-1B Sponsor

Responsibilities

  • Own AWS infrastructure end-to-end and actively shape its evolution.
  • Reduce friction in the deployment pipeline for developers.
  • Harden systems by securing IAM roles, container images, and authentication flows.
  • Implement monitoring and alerting to catch production issues proactively.
  • Make deployments faster, easier to roll back, and less prone to failure.
  • Lead incident response and post-mortems as necessary.
  • Abstract away GPU cluster details so researchers can run work without managing infrastructure, and manage CUDA version compatibility.
  • Build standardized SLURM job submission workflows for researchers.
  • Package and containerize Python simulation code for reproducibility.
  • Monitor job health across utilization, cost, and runtime efficiency.

Requirements

  • 5+ years of experience in Platform Engineering, DevOps, or SRE roles.
  • Production AWS experience with ECS/EKS and multi-account networking.
  • Proficiency in Infrastructure as Code, particularly Terraform or Pulumi/CDK.
  • Experience improving CI/CD pipelines in production environments.
  • Experience supporting GPU workloads in production, including code optimization and job scheduler setup.

Tech Stack

AWS, GitHub Actions, GitLab CI/CD, Python, Terraform

Categories

AI & ML, Data Engineering, DevOps, Security