Alpaca

Staff Site Reliability Engineer, Database

about 1 month ago
Remote, United States
Staff+
H-1B Sponsor

Responsibilities

  • Triage difficult technical problems and implement solutions.
  • Improve our observability stack including monitoring, logging, and profiling.
  • Respond to and resolve incidents in a timely manner, and conduct post-incident reviews.
  • Collaborate with development teams to ensure reliability and scalability in new features.
  • Monitor system capacity and performance, making recommendations for future growth.

Requirements

  • 5+ years of experience in Site Reliability Engineering or similar roles.
  • 5+ years of experience with multi-terabyte-scale PostgreSQL clusters.
  • Proven track record of managing large-scale, high-availability PostgreSQL databases.
  • Experience designing and implementing SLIs, SLOs, and SLAs.
  • Experience troubleshooting PostgreSQL performance problems and slow queries.
  • Extensive experience with efficient schema and query design.
  • Experience migrating multi-terabyte tables into efficient schemas.
  • Proficient with Go, Prometheus, and Linux.
  • Knowledgeable in trading/fintech domains.
  • Experience with low-latency systems and distributed tracing.
  • Experience scaling PostgreSQL clusters rapidly.
  • Experience with pgx, gorm, or sqlc.

Benefits

  • Competitive Salary & Stock Options.
  • Health Benefits.
  • New Hire Home-Office Setup: One-time USD $500.
  • Monthly Stipend: USD $150 per month via a Brex Card.

Tech Stack

Go, Linux, PostgreSQL, Prometheus

Categories

Data Engineering, DevOps, Security