about 1 month ago
Boston, MA, USA or San Francisco, CA, USA
Senior / Mid Level
Responsibilities
- Design and maintain Pulumi modules for reliable cloud resource provisioning.
- Own infrastructure end to end, managing it through code rather than console interfaces.
- Instrument systems for quick failure detection and data-driven debugging.
- Build observability into systems to proactively identify issues.
- Automate deployments, scaling, and backups to reduce manual tasks.
- Collaborate with product engineering to design resilient services and optimize deployment pipelines.
Requirements
- 3 to 5 or more years of experience building and operating distributed systems on AWS.
- Strong skills in Python and Infrastructure as Code using Pulumi or Terraform.
- Experience with frontier AI models such as Claude, Codex, or Gemini.
- Hands-on experience with monitoring tools like Prometheus or Datadog.
- Proven ability to debug production issues under pressure.
- Strong documentation habits for team clarity and system stability.
- Ability to explain complex technical issues to non-technical stakeholders.
