Senior Site Reliability Engineer (Observability)
Iterable
2 months ago
Lisbon, Portugal
Senior
H1B Sponsor
Responsibilities
- Collaborate with product teams to define meaningful observability frameworks.
- Own the long-term roadmap for observability tools and define SLIs/SLOs.
- Design and automate scalable telemetry pipelines for production visibility.
- Drive upgrades and policy enforcement for observability-focused clusters.
- Contribute production-quality Go or Python services to enhance reliability.
- Partner with service owners to embed observability into the SDLC.
- Design cost-efficient telemetry architectures and high-signal alerting.
- Participate in on-call duties and lead post-incident reviews.
Requirements
- Proven ability to architect and manage production-grade Kubernetes clusters.
- Proficiency in Infrastructure-as-Code, including Terraform.
- Deep production experience with Elasticsearch, Prometheus, or OpenTelemetry.
- Proficiency in Go or Python for building custom tools and automation.
- Ability to optimize data ingestion and storage for logs and metrics.
- Ability to influence engineering culture through mentorship and collaboration.
- A humble, collaborative approach to problem-solving.
Benefits
- Competitive salaries and meaningful equity.
- Private Medical Insurance.
- Life/Risk Assurance.
- Meal Allowance of 8.55€ per day.
- Community Days (additional paid holidays).
- Paid Annual Leave of 22 days.
- Paid Sabbatical after 4 years of tenure.
- Initial laptop workstation setup.
- Teleworking Allowance.
Tech Stack
Datadog, Elasticsearch, Go, Grafana, Kubernetes, Prometheus, Python, Terraform
Categories
Backend, DevOps