DevOps Engineer

5 months ago

Responsibilities

Improve the reliability, availability, and operational health of production systems.
Set observability standards across services including metrics, logs, and errors.
Establish SLOs/SLIs, alerting, and on-call readiness with a focus on signal quality.
Collaborate with engineers to design resilient systems and reduce operational risk early.
Build internal tooling to enhance system safety, debugging, and developer velocity.
Manage infrastructure using Pulumi across GCP, AWS, and Firebase.

Requirements

5+ years of SRE, DevOps, or production operations experience.
At least 2 years of TypeScript web app development experience.
Proven experience operating and scaling production systems with uptime and latency goals.
Strong hands-on experience with observability tools such as Datadog or Sentry.
Experience defining SLOs/SLIs and building effective alerting strategies.
Proficiency with CI/CD systems and infrastructure-as-code.
Experience with cloud-native and serverless platforms such as GCP and AWS.
Strong cross-system debugging and incident response skills.