Base Salary
$190k - $225k/yr
Responsibilities
- Operate and maintain production blockchain infrastructure, including validators and RPC services.
- Ensure high availability and performance for AI-enabled developer platforms.
- Build and maintain monitoring, alerting, and dashboards for system health.
- Write high-quality automation and infrastructure code to improve reliability.
- Participate in on-call rotations, incident response, and post-incident reviews.
- Collaborate with engineering teams to embed reliability and security best practices.
- Improve Kubernetes reliability across cloud and bare-metal environments.
- Continuously refine deployment, rollback, and recovery strategies.
Requirements
- 3+ years of experience in Platform Engineering, Infrastructure Engineering, or Site Reliability Engineering.
- Strong software engineering foundation with production experience in Go and/or Python.
- Deep expertise operating Linux systems in production environments.
- Proven track record running Kubernetes at scale in high-availability environments.
- Experience supporting distributed systems with demanding uptime requirements.
- Comfort working in fast-moving startup environments with evolving requirements.
- Strong security mindset for infrastructure operating on public networks.
- Excellent collaboration and communication skills across technical and non-technical stakeholders.
Benefits
- Build and operate infrastructure for cutting-edge AI and blockchain systems.
- Make architectural decisions that impact reliability at scale.
- Work with modern, best-in-class tooling across the entire infrastructure stack.
- Collaborate with talented engineers solving novel distributed systems challenges.
- Competitive compensation with meaningful benefits.
- Flexible hybrid work environment in San Francisco.
- Backed by top-tier investors who believe in our vision.
