Sr Platform Monitoring Engineer
Databricks
20 days ago
Amsterdam, Netherlands
Senior
H1B Sponsor
Responsibilities
- Lead platform incident investigations and coordinate cross-functional teams for rapid resolution.
- Conduct post-incident root cause analyses to identify underlying causes and prevent recurrence.
- Design and implement customer-focused alerting pipelines and observability workflows.
- Build automation tools and establish reusable monitoring patterns to enhance reliability.
Requirements
- Minimum of 5 years of experience in SRE, DevOps, or a similar role.
- Production-level experience with at least one major cloud provider (AWS, Azure, or GCP).
- Proficiency in container and orchestration technologies like Docker and Kubernetes.
- Hands-on experience with monitoring, logging, and alerting tools such as ELK, Prometheus, and Grafana.
- Strong proficiency in Python or similar programming languages.
- Experience managing the incident lifecycle from detection to resolution.
- Bachelor's, Master's, or PhD in Computer Science, Computer Engineering, or a related field.
Benefits
- Comprehensive benefits and perks tailored to meet employee needs.
Tech Stack
Apache Spark, AWS, Azure, Docker, Google Cloud Platform, Grafana, Kubernetes, MLflow, Prometheus, Python
Categories
Data Engineering, DevOps