Bellevue, WA, USA (+2 more locations)
Staff+
Base Salary
$188k - $250k/yr
Responsibilities
- Lead and mentor engineers, fostering a culture of collaboration and continuous improvement.
- Scale logging, tracing, and metrics platforms to support a global datacenter footprint.
- Develop and refine monitoring and alerting to enhance system reliability.
- Advise engineers across CoreWeave on the best use of observability systems.
- Automate interactions with CoreWeave’s Compute Infrastructure layer.
- Manage production clusters and ensure development teams follow best practices for deployments.
Requirements
- 7+ years of experience in Software Engineering, Site Reliability Engineering, DevOps, or a related field.
- Deep expertise across all observability pillars (logs, metrics, and traces) using tools like ClickHouse, Elastic, Loki, VictoriaMetrics, Prometheus, Thanos, and/or Grafana.
- Expertise in Kubernetes, containerization, and microservices architectures.
- Proven track record of leading incident management and post-mortem analysis.
- Excellent problem-solving, analytical, and communication skills.
Benefits
- Medical, dental, and vision insurance, 100% paid for by CoreWeave.
- Company-paid life insurance.
- Voluntary supplemental life insurance.
- Short- and long-term disability insurance.
- Flexible Spending Account.
- Health Savings Account.
- Tuition reimbursement.
- Ability to participate in the Employee Stock Purchase Program (ESPP).
- Mental Wellness Benefits through Spring Health.
- Family-Forming support provided by Carrot.
- Paid parental leave.
- Flexible, full-service childcare support with Kinside.
- 401(k) with a generous employer match.
- Flexible PTO.
- Catered lunch each day in our office and data center locations.
- A casual work environment.
- A work culture focused on innovative disruption.
