
Infrastructure Engineer (Observability)
Lightning AI
Posted 4 days ago
Remote, Worldwide +3 more
Senior
Base Salary
$180k - $200k/yr
Responsibilities
- Own and evolve a scalable observability platform spanning metrics, logs, traces, and events.
- Drive the productization of observability capabilities for internal teams and external customers.
- Design multi-tenant observability systems with scoped access and customer-facing visibility.
- Continuously improve observability systems to keep pace with rapid infrastructure buildouts.
- Design and operate telemetry pipelines ingesting data from various sources.
- Build systems to correlate signals across infrastructure layers for faster debugging.
- Implement streaming and real-time data pipelines using tools like Kafka and OTEL.
- Design and implement noise-resistant alerting systems to improve signal quality.
- Create dashboards and alerting for InfraOps, Engineering, and Customer Success teams.
- Build automated insights for proactive detection and system health visibility.
- Contribute to broader infrastructure engineering projects beyond observability.
- Partner with infrastructure and platform teams to embed observability into core systems.
- Support large-scale, distributed systems across compute, networking, and storage environments.
- Work closely with customer-facing teams to deliver external observability experiences.
- Collaborate with engineering, operations, and support teams to improve system transparency.
- Help define best practices for observability across the organization.
Requirements
- 5+ years of experience in infrastructure engineering, SRE, or observability-focused roles.
- Strong experience with monitoring systems such as Prometheus, Grafana, ELK, or VictoriaMetrics.
- Experience building and operating observability platforms at scale.
- Proficiency in Python, Go, or Bash for automation and data integration.
- Familiarity with containerized environments and Kubernetes observability.
- Experience with streaming telemetry pipelines like Kafka, OTEL, or Promtail.
- Experience with multi-tenant monitoring architectures.
- Strong written and verbal communication skills.
Benefits
- Comprehensive medical, dental, and vision coverage (U.S.); private medical and dental insurance (U.K.).
- Retirement and financial wellness support (U.S.); pension contribution (U.K.).
- Generous paid time off, plus holidays.
- Paid parental leave.
- Professional development support.
- Wellness and work-from-home stipends.
- Flexible work environment.