2 days ago
San Francisco, CA, USA
Staff+
Base Salary
$230k - $385k/yr
Responsibilities
- Own critical infrastructure across design, implementation, rollout, operation, and iteration.
- Build and operate performant backend systems in Rust or C++ that support core research workflows.
- Design and improve distributed data and serving systems, focusing on tradeoffs around partitioning and consistency.
- Debug real production bottlenecks across latency, throughput, and overload behavior.
- Operate business-critical services, including on-call incident response and observability.
- Improve reliability of services running on Kubernetes, including resource tuning.
- Collaborate closely with engineers and researchers to deliver reliable systems.
Requirements
- Track record of owning operationally critical systems end to end.
- Strong hands-on experience building performance-sensitive backend systems in Rust or C++.
- Comfort working below typical service abstractions, including concurrency and I/O.
- Experience designing or operating distributed systems at meaningful scale.
- Preferred: experience with analytics infrastructure such as ClickHouse.
- Hands-on experience operating production-critical systems, including incident management.
- Strong judgment in balancing engineering quality, speed, and business impact.
Benefits
- Hybrid work model of 3 days in the office per week.
- Relocation assistance for new employees.
Tech Stack
C++, ClickHouse, Kubernetes, Rust
Categories
AI & ML, Backend, Data Engineering, DevOps