Staff Site Reliability Engineer, Database

3 months ago

Remote, United States

Staff+

H1B Sponsor

Responsibilities

Triage difficult technical problems and implement solutions.
Improve our observability stack including monitoring, logging, and profiling.
Respond to and resolve incidents promptly, and conduct post-incident reviews.
Collaborate with development teams to ensure reliability and scalability in new features.
Monitor system capacity and performance, and recommend changes to support future growth.

Qualifications

5+ years of experience in Site Reliability Engineering or similar roles.
5+ years of experience with multi-terabyte scale PostgreSQL clusters.
Proven track record of managing large-scale, high-availability PostgreSQL databases.
Experience designing and implementing SLIs, SLOs, and SLAs.
Experience troubleshooting PostgreSQL performance problems and slow queries.
Extensive experience with efficient schema and query design.
Experience migrating multi-terabyte tables into efficient schemas.
Proficiency with Go, Prometheus, and Linux.
Knowledge of the trading or fintech domain.
Experience with low-latency systems and distributed tracing.
Experience scaling PostgreSQL clusters rapidly.
Experience with pgx, gorm, or sqlc.