
Site Reliability Engineer
OXIO Corporation
about 8 hours ago
Remote, Worldwide
Mid Level / Senior
Responsibilities
- Design and implement a cloud platform to support OXIO backend services.
- Automate technical operations such as deployments, scaling, and recovery.
- Monitor and maintain mission-critical production infrastructure to ensure maximum uptime.
- Participate in an on-call rotation and promote a culture of continuous improvement through blameless postmortems.
- Enable Engineering/Telecom/Data Engineering teams by providing operational tools.
Requirements
- Understanding of Unix-like systems, primarily Linux.
- Familiarity with Linux/Unix system internals such as process management and networking.
- Proficiency in at least one programming language (Python, Go, or Ruby) and strong scripting skills (Bash, Perl).
- Experience with infrastructure provisioning tools such as Terraform or Ansible.
- Familiarity with containerization (Docker) and orchestration tools (Kubernetes).
- Experience with monitoring tools such as Prometheus or Grafana.
- Knowledge of incident management practices and experience in on-call rotations.
- Hands-on experience with cloud providers (AWS, Google Cloud, Azure).
- Understanding of TCP/IP, DNS, HTTP/HTTPS, load balancing, and firewalls.