about 6 hours ago
Memphis, TN, USA
Senior / Mid Level
H1B Sponsor
Responsibilities
- Design, develop, and deploy scalable code and services to automate reliability workflows.
- Implement and maintain observability tools and practices for real-time system health insights.
- Collaborate with cross-functional teams to identify reliability bottlenecks and automate solutions for them.
- Troubleshoot and resolve complex issues in data center environments.
- Optimize Linux-based systems for performance, security, and reliability.
- Understand network topologies to troubleshoot connectivity and performance issues.
- Participate in on-call rotations and post-incident reviews to enhance site reliability.
- Mentor junior team members and document processes for knowledge sharing.
Requirements
- Bachelor's degree in Computer Science, Engineering, or a related field.
- 3+ years of experience in site reliability engineering, infrastructure engineering, or DevOps.
- Strong programming skills in Python, plus familiarity with Rust or a willingness to learn it.
- Experience with Linux systems administration and performance tuning.
- Knowledge of containerization and orchestration technologies like Docker and Kubernetes.
- Experience implementing observability solutions and troubleshooting complex distributed systems.
- Understanding of networking fundamentals in large-scale environments.
- Experience with on-call rotations and incident response practices.
- Ability to collaborate effectively with cross-functional teams.