Responsibilities
- Define and operate against SLOs and SLIs with product and engineering partners.
- Build the observability layer to improve metrics, logs, and alerting.
- Lead incident response and drive blameless postmortems.
- Reduce operational toil through automation and better tooling.
- Stress-test designs for resilience against failures and traffic spikes.
- Improve release safety through progressive delivery and feature flags.
- Enhance developer experience by addressing operational friction.
- Collaborate with product, platform, security, and compliance teams.
- Mentor engineers and shape technical standards across the squad.
Requirements
- Proven experience owning services in production and managing their performance.
- Experience defining and operating against SLOs/SLIs and using error budgets.
- Experience leading incident response and writing effective postmortems.
- Hands-on experience with observability tooling for diagnosing production issues.
- Deep system design experience in distributed services and API design.
- Significant experience building and operating production software systems.
- Comfort with modern cloud environments and CI/CD pipelines.
- Demonstrated technical leadership and mentoring capabilities.
- Ability to balance high reliability standards with fast-paced development.
Benefits
- Opportunity to earn equity at Nu.
- Comprehensive medical, dental, and vision insurance.
- Life insurance and AD&D coverage.
- Extended maternity and paternity leave.
- Access to a learning platform and language learning program.
- Mental health and wellness assistance program.
- 401(k) and savings plans, including a Health Savings Account.
- Work-from-home allowance.
- Relocation assistance package, if applicable.