Postman

Member of Technical Staff, AI Reliability & Monitoring Engineering Lead

5 months ago
San Francisco, CA, USA
Mid Level / Senior / Staff+
H-1B Sponsor

Base Salary

$256k - $276k/yr

Responsibilities

  • Define and manage service-level objectives (SLOs) for AI-driven API services.
  • Implement comprehensive observability and monitoring systems that track performance in real time.
  • Design automated failover, recovery, and incident response strategies.
  • Optimize resource utilization, particularly GPU/accelerator efficiency.
  • Collaborate with engineering, platform, and product teams on reliability efforts.
  • Lead the development of internal tooling and automation for AI system stability.
  • Drive continuous improvement in deployment practices and incident management.

Requirements

  • Strong background in AI reliability engineering, SRE, or DevOps for distributed systems.
  • Understanding of challenges in maintaining large-scale AI systems.
  • Experience with cloud platforms, monitoring tools, and incident response automation.
  • Ability to collaborate across teams to influence best practices.
  • Comfortable in dynamic, fast-paced environments focused on reliable AI services.

Benefits

  • Comprehensive medical coverage.
  • Flexible PTO and wellness reimbursement.
  • Monthly lunch stipend.
  • Hybrid work model with in-office collaboration.
  • Frequent team-building events and donation-matching program.

Categories

AI & ML, DevOps