7 months ago
Base Salary
$209k - $253k/yr
Responsibilities
- Lead the architecture and strategy for Crusoe’s observability platform.
- Design and operate large-scale telemetry systems for AI/ML workloads.
- Architect end-to-end telemetry pipelines for data ingestion and visualization.
- Drive the adoption of monitoring tools like Prometheus and Grafana.
- Design scalable log ingestion and processing pipelines.
- Build and evolve distributed tracing platforms integrated with service meshes.
- Establish observability standards including SLIs and SLOs.
- Drive reliability and cost efficiency improvements across telemetry pipelines.
- Automate provisioning and lifecycle management of observability infrastructure.
- Embed security and multi-tenancy into the observability platform.
- Partner with engineering teams to integrate observability into workflows.
- Mentor engineers and provide technical leadership.
- Influence platform architecture decisions and roadmap.
Requirements
- 10+ years of experience in infrastructure or platform engineering.
- Deep expertise in observability platforms and telemetry systems.
- Strong programming skills in Go or Python.
- Experience operating observability platforms in large Kubernetes environments.
- Understanding of distributed systems and reliability engineering.
- Proven ability to design and scale telemetry pipelines.
- Experience driving technical standards across engineering organizations.
- Strong leadership and mentorship capabilities.
Benefits
- Industry competitive pay.
- Restricted Stock Units in a fast-growing technology company.
- Health insurance options including HDHP and PPO.
- Employer contributions to HSA accounts.
- Paid Parental Leave.
- Paid life insurance and disability coverage.
- 401(k) with a 100% match up to 4% of salary.
- Generous paid time off and holiday schedule.
- Tuition reimbursement and cell phone reimbursement.
- Company paid commuter benefit of $300 per month.
