
Observability Engineer
TensorWave
4 months ago
Las Vegas, NV, USA
Mid Level / Senior
Responsibilities
- Own and evolve the observability and monitoring platform with Grafana and Prometheus.
- Design, build, and maintain high-quality metrics pipelines using Prometheus.
- Create clear, actionable Grafana dashboards that effectively communicate system health.
- Define and maintain meaningful, actionable, and low-noise alerts.
- Establish and enforce observability standards across services.
- Partner with engineering teams to ensure proper application instrumentation.
- Lead improvements to alerting strategies, SLOs, and SLIs.
- Support incident response by helping teams quickly diagnose issues.
- Continuously evaluate and improve signal quality and cost efficiency.
- Identify and eliminate observability gaps to prevent outages.
Requirements
- Strong hands-on experience with Grafana and Prometheus.
- Deep understanding of metrics-based observability.
- Experience designing monitoring and alerting systems at scale.
- Strong knowledge of alerting best practices.
- Experience with distributed systems and cloud or Kubernetes environments.
- Ability to analyze system behavior using telemetry.
- Comfortable collaborating across teams to enhance visibility.
Benefits
- Mission-driven company.
- Competitive Salary.
- Stock Options.
- 100% paid Medical, Dental, and Vision insurance.
- Life and Voluntary Supplemental Insurance.
- Short Term Disability Insurance.
- Flexible Spending Account.
- 401(k).
- Flexible PTO.
- Paid Holidays.
- Parental Leave.
- Mental Health Benefits through Spring Health.
Tech Stack
Grafana, Helm, Kubernetes, Prometheus, Terraform