GrepJob
Nebius

Site Reliability Engineer (SRE), AI Infrastructure (Early Career)

about 10 hours ago
Amsterdam, Netherlands
Entry Level

Responsibilities

  • Assist with day-to-day SRE operational tasks in NetInfra.
  • Deploy tested and approved changes following clear instructions.
  • Execute small and well-defined tasks from the backlog.
  • Work on small SRE projects from the backlog.
  • Write tests for changes and projects where applicable.
  • Write technical documentation for completed work.
  • Keep Jira tasks up to date as work progresses.
  • Actively study network and system fundamentals.
  • Learn internal tools, workflows, and operational standards.
  • Learn container, platform, and infrastructure-as-code fundamentals, such as Kubernetes and Terraform.

Requirements

  • Strong interest in, and foundational knowledge of, cloud providers, operations, networking, hardware, and software.
  • Programming experience in Python, Go, or C++.
  • Basic understanding of networking concepts such as Ethernet, IP, and routing.
  • Familiarity with Git.
  • Basic understanding of containers, Kubernetes, and Infrastructure as Code (IaC).

Benefits

  • Competitive salary and comprehensive benefits package.
  • Mentorship from experienced AI, ML, and cloud infrastructure professionals.
  • Hands-on experience with real customer workloads and production systems.
  • Opportunities for professional growth within Nebius.
  • A dynamic and collaborative work environment that values initiative and innovation.
  • Opportunity to be considered for a full-time role after the Early Talent Program.

Tech Stack

C++, Git, Go, Helm, Kubernetes, Linux, Python, Terraform

Categories

AI & ML, DevOps