Member of Technical Staff

13 days ago

Memphis, TN, USA

Mid Level / Senior

H1B Sponsor

Responsibilities

Design, develop, and deploy scalable code and services to automate reliability workflows.
Implement and maintain observability tools and practices for real-time system health insights.
Collaborate with cross-functional teams to identify reliability bottlenecks and automate solutions for them.
Troubleshoot and resolve complex issues in data center environments.
Optimize Linux-based systems for performance, security, and reliability.
Understand network topologies in multi-data-center environments for effective troubleshooting.
Participate in on-call rotations and post-incident reviews to enhance site reliability.
Mentor junior team members and document processes for knowledge sharing.

Qualifications

Bachelor's degree in Computer Science, Computer Engineering, Electrical Engineering, or a related field.
5+ years of hands-on experience in site reliability engineering, infrastructure engineering, or DevOps.
Strong programming skills in Python, plus familiarity with Rust or a willingness to learn it.
Solid experience with Linux systems administration and performance tuning.
Practical knowledge of containerization and orchestration technologies like Docker and Kubernetes.
Experience implementing observability solutions and troubleshooting complex issues.
Understanding of networking fundamentals in large-scale environments.
Experience with on-call rotations and incident response practices.
Ability to collaborate effectively with cross-functional teams.

Docker, Grafana, Kubernetes, Linux, Prometheus, Python, Rust