Responsibilities
- Own software engineering efforts across the full SDLC for the rack management solution.
- Drive issues to resolution while collaborating effectively across teams.
- Configure and test new Graphcore AI hardware and systems using Continuous Deployment and Infrastructure-as-Code.
- Work with Datacenter Operations Engineers to maintain and operate AI systems at peak performance.
- Drive corrective actions for systems that are not operating correctly.
Requirements
- Bachelor's degree in a relevant subject, or equivalent practical experience.
- Experience with RESTful API development.
- Experience building, deploying, and operating containerized workloads using Kubernetes and either Docker or Podman.
- Experience managing production Kubernetes clusters and workloads.
- Programming experience with Go.
- Hands-on experience with Infrastructure-as-Code and CI/CD automation tools.
- Experience with Redfish for datacenter hardware management.
- Experience with Agile and Scrum frameworks for work planning.
- Strong Linux systems engineering experience, including administration and scripting.