Graphcore

Observability, Infrastructure Engineer

3 days ago
Gdańsk, Poland
Mid Level / Senior
H1B Sponsor

Responsibilities

  • Contribute to all phases of product development from definition to early customer support.
  • Design and implement fault-remediation solutions at scale.
  • Implement multi-component integrations for seamless management and monitoring.
  • Create reference designs including documentation and source code.
  • Deploy solutions internally for engineering teams to aid in various analyses.
  • Maintain and improve deployed infrastructure for optimal customer service.
  • Ensure solutions are properly tested, and work with QA teams to improve unit testing.
  • Mentor and guide junior engineers to foster continuous learning.

Requirements

  • BSc or MSc degree in Computer Engineering, Computer Science, or equivalent experience.
  • Experience in architecting and implementing scalable cluster management systems.
  • Experience managing large-scale datacenters with a focus on hardware observability.
  • Familiarity with observability stacks like Prometheus, Grafana, and Elastic Stack.
  • Understanding of secure telemetry practices and data exposure controls.
  • Working knowledge of Datadog, Dynatrace, or Splunk.
  • Experience with large-scale telemetry datasets and actionable dashboards.
  • Familiarity with automation technologies like Ansible or Terraform.
  • Experience in containerization using Docker and Kubernetes.
  • Experience in Linux environments.
  • Strong skills in C/C++/Go and Python.
  • Excellent written and verbal communication skills.

Benefits

  • Competitive salary and annual leave policy.
  • Medical and dental health plans.
  • Gym card and employee pension matched up to 4%.
  • Yearly review of benefits to ensure value and reward for employees.
  • Commitment to building an inclusive work environment.

Tech Stack

Ansible, Apache Kafka, Apache Superset, C, C++, ClickHouse, Datadog, Docker, Go, Grafana, Kubernetes, Linux, Prometheus, Python, Splunk, Terraform

Categories

AI & ML, Data Engineering, DevOps, Testing