xAI

Sr. Software Engineer (Data Center Automation)

Memphis, TN, USA
Senior / Mid Level
H-1B Sponsor

Responsibilities

  • Design, develop, and deploy scalable code and services to automate reliability workflows.
  • Implement and maintain observability tools and practices that provide real-time insight into system health.
  • Collaborate with cross-functional teams to identify reliability bottlenecks and automate solutions to them.
  • Troubleshoot and resolve complex issues in data center environments.
  • Optimize Linux-based systems for performance, security, and reliability.
  • Apply an understanding of network topologies to troubleshoot connectivity and performance issues.
  • Participate in on-call rotations and post-incident reviews to enhance site reliability.
  • Mentor junior team members and document processes for knowledge sharing.

Requirements

  • Bachelor's degree in Computer Science, Engineering, or a related field.
  • 3+ years of experience in site reliability engineering, infrastructure engineering, or DevOps.
  • Strong programming skills in Python, and familiarity with Rust or a willingness to learn it.
  • Experience with Linux systems administration and performance tuning.
  • Knowledge of containerization and orchestration technologies like Docker and Kubernetes.
  • Experience implementing observability solutions and troubleshooting complex distributed systems.
  • Understanding of networking fundamentals in large-scale environments.
  • Experience with on-call rotations and incident response practices.
  • Ability to collaborate effectively with cross-functional teams.

Categories

AI & ML, Data Engineering, DevOps