Databricks

Sr Platform Monitoring Engineer

20 days ago
Amsterdam, Netherlands
Senior
H1B Sponsor

Responsibilities

  • Lead platform incident investigations and coordinate cross-functional teams for rapid resolution.
  • Conduct post-incident root cause analyses to identify and prevent future issues.
  • Design and implement customer-focused alerting pipelines and observability workflows.
  • Build automation tools and establish reusable monitoring patterns to enhance reliability.

Requirements

  • Minimum of 5 years of experience in SRE, DevOps, or a similar role.
  • Production-level experience with at least one major cloud provider (AWS, Azure, GCP).
  • Proficiency with container and orchestration technologies such as Docker and Kubernetes.
  • Hands-on experience with monitoring, logging, and alerting tools such as ELK, Prometheus, and Grafana.
  • Strong proficiency in Python or a similar programming language.
  • Experience managing the incident lifecycle from detection to resolution.
  • BS, MS, or PhD in Computer Science, Computer Engineering, or a related field.

Benefits

  • Comprehensive benefits and perks tailored to meet employee needs.

Tech Stack

Apache Spark, AWS, Azure, Docker, Google Cloud Platform, Grafana, Kubernetes, MLflow, Prometheus, Python

Categories

Data Engineering, DevOps