Base Salary
$200k - $288k/yr
Responsibilities
- Design and build large-scale telemetry pipelines for metrics, logs, and traces.
- Architect AI-driven observability systems for anomaly detection and predictive insights.
- Collaborate with teams to embed observability into all layers of the platform.
- Define standards for instrumentation, tracing, and telemetry across services.
- Build tools that provide engineers with visibility into system behavior and performance.
- Optimize observability systems for high scale, low latency, and cost efficiency.
- Mentor engineers and provide technical leadership in observability and AI diagnostics.
- Stay updated on industry trends in observability and AI-based monitoring.
Requirements
- 7+ years of experience in software engineering with a focus on distributed systems.
- Deep experience building and operating large-scale cloud services.
- Strong programming skills in Java, Scala, C++, or Python.
- Solid understanding of system performance, debugging, and reliability engineering.
- Experience with cloud platforms such as AWS, Azure, or GCP.
- Proven ability to lead complex technical projects and influence architecture decisions.
- Strong problem-solving skills and ability to work in a fast-paced environment.