Observability, Staff Infrastructure Engineer

3 days ago

Gdańsk, Poland
Staff+

H1B Sponsor

Responsibilities

Contribute to all phases of product development, from definition to early customer support.
Design and implement fault-remediation solutions at scale.
Implement multi-component integrations using Graphcore and third-party technologies.
Create reference designs including documentation and source code.
Deploy solutions for engineering teams to aid in debugging and performance analysis.
Maintain and improve deployed infrastructure for optimal customer service.
Ensure solutions are tested by collaborating with development and QA teams.
Mentor and guide junior engineers to foster continuous learning.

Requirements

BSc or MSc degree in Computer Engineering or Computer Science, or equivalent experience.
Proven experience in architecting and implementing scalable cluster management systems.
Experience managing large-scale datacenters with a focus on hardware observability.
Familiarity with observability stacks like Prometheus, Grafana, and Elastic Stack.
Understanding of secure telemetry practices and data exposure controls.
Working knowledge of Datadog, Dynatrace, or Splunk.
Experience with large-scale telemetry datasets and actionable dashboards.
Proficiency in automation technologies such as Ansible or Terraform.
Experience in containerization with Docker and Kubernetes.
Strong programming skills in C/C++/Go and Python.
Excellent written and verbal communication skills.

Ansible, Apache Superset, C, C++, ClickHouse, Datadog, Docker, Go, Grafana, Kubernetes, Linux, Prometheus, Python, Splunk, Terraform