Responsibilities
- Triage difficult technical problems and implement solutions.
- Improve our observability stack including monitoring, logging, and profiling.
- Respond to and resolve incidents promptly, and conduct post-incident reviews.
- Collaborate with development teams to ensure reliability and scalability in new features.
- Monitor system capacity and performance, and recommend changes to support future growth.
Requirements
- 5+ years of experience in Site Reliability Engineering or similar roles.
- 5+ years of experience with multi-terabyte-scale PostgreSQL clusters.
- Proven track record of managing large-scale, high-availability PostgreSQL databases.
- Experience designing and implementing SLIs, SLOs, and SLAs.
- Experience troubleshooting PostgreSQL performance problems and slow queries.
- Extensive experience with efficient schema and query design.
- Experience migrating multi-terabyte tables into efficient schemas.
- Proficiency with Go, Prometheus, and Linux.
- Knowledge of trading/fintech domains.
- Experience with low-latency systems and distributed tracing.
- Experience scaling PostgreSQL clusters rapidly.
- Experience with pgx, gorm, or sqlc.
Benefits
- Competitive Salary & Stock Options.
- Health Benefits.
- New Hire Home-Office Setup: One-time USD $500.
- Monthly Stipend: USD $150 per month via a Brex Card.
