Boston, MA, USA or San Francisco, CA, USA
Senior / Mid Level
Responsibilities
- Design and maintain Pulumi modules for reliable cloud resource provisioning.
- Own infrastructure end-to-end as code, without relying on manual console changes.
- Instrument systems for quick failure detection and data-driven debugging.
- Build observability into systems to proactively identify problems.
- Automate deployments, scaling, and backups to reduce manual tasks.
- Partner with product engineering to design resilient services and optimize deployment pipelines.
Requirements
- 3 to 5+ years of experience building distributed systems on AWS.
- Strong skills in Python and Infrastructure as Code using Pulumi or Terraform.
- Experience working with frontier AI models such as Claude, Codex, or Gemini.
- Hands-on experience with monitoring tools such as Prometheus or Datadog.
- Proven ability to debug production issues under pressure.
- Strong documentation habits that keep the team aligned and systems stable.
- Ability to explain complex technical issues to non-technical stakeholders.
