Responsibilities
- Design, write, and ship production-grade code to fix bugs and improve performance.
- Tackle complex coding challenges in live services.
- Identify and resolve application issues across the stack.
- Profile APIs and services to identify bottlenecks.
- Design and evaluate infrastructure solutions with clear documentation.
- Collaborate with US-based engineering teams during overlap hours.
- Communicate updates and technical context clearly.
- Build tools and automation for traffic simulation and validation.
- Improve observability through logs, metrics, and alerts.
- Define and measure SLIs, SLOs, and SLAs.
- Contribute to incident response runbooks and operational processes.
- Participate in incident response and root cause analysis.
- Support cost optimization efforts.
- Research new monitoring technologies to improve system reliability.
Requirements
- 7–8+ years of software development experience in backend or SRE roles.
- Strong coding skills in Python, Go, or similar languages.
- Proven ability to write and debug production code.
- Deep understanding of PostgreSQL or similar relational databases.
- Hands-on experience with distributed systems, Docker, and Kubernetes.
- Experience with observability and monitoring platforms.
- Strong written and verbal communication skills.
- Comfortable working with US-based counterparts during overlap hours.
- Structured, methodical approach to problem-solving and execution.
- Knowledge of SLIs, SLOs, and SLAs.
- Familiarity with cloud platforms, preferably AWS.
- Comfortable working across unfamiliar codebases.
- Strong analytical and collaboration skills.
- Prior experience in infrastructure as code (IaC) or performance engineering is a plus.
Benefits
- Employer-paid group health insurance for you and your dependents.
- 401(k) plan with employer match.
- Flexible paid time off.
- Regular company-wide in-person events.
- Home office stipend.
