13 days ago
Memphis, TN, USA
Mid Level / Senior
H1B Sponsor
Responsibilities
- Design, develop, and deploy scalable code and services to automate reliability workflows.
- Implement and maintain observability tools and practices for real-time system health insights.
- Collaborate with cross-functional teams to identify reliability bottlenecks and automate solutions for them.
- Troubleshoot and resolve complex issues in data center environments.
- Optimize Linux-based systems for performance, security, and reliability.
- Understand network topologies in multi-data-center environments for effective troubleshooting.
- Participate in on-call rotations and post-incident reviews to enhance site reliability.
- Mentor junior team members and document processes for knowledge sharing.
Requirements
- Bachelor's degree in Computer Science, Computer Engineering, Electrical Engineering, or a related field.
- 5+ years of hands-on experience in site reliability engineering, infrastructure engineering, or DevOps.
- Strong programming skills in Python, plus familiarity with Rust or a willingness to learn it.
- Solid experience with Linux systems administration and performance tuning.
- Practical knowledge of containerization and orchestration technologies like Docker and Kubernetes.
- Experience implementing observability solutions and troubleshooting complex issues.
- Understanding of networking fundamentals in large-scale environments.
- Experience with on-call rotations and incident response practices.
- Ability to collaborate effectively with cross-functional teams.