OXIO Corporation

Site Reliability Engineer

about 8 hours ago
Remote, Worldwide
Mid Level / Senior

Responsibilities

  • Design and implement a cloud platform to support OXIO backend services.
  • Automate technical operations such as deployments, scaling, and recovery.
  • Monitor and maintain mission-critical production infrastructure to ensure maximum uptime.
  • Participate in an on-call rotation and promote a culture of continuous improvement through blameless postmortems.
  • Enable Engineering/Telecom/Data Engineering teams by providing operational tools.

Requirements

  • Understanding of Unix-like systems, primarily Linux.
  • Familiarity with Linux/Unix system internals like process management and networking.
  • Proficiency in at least one programming language (Python, Go, or Ruby) and strong scripting skills (Bash, Perl).
  • Experience with infrastructure provisioning tools such as Terraform or Ansible.
  • Familiarity with containerization (Docker) and orchestration tools (Kubernetes).
  • Experience with monitoring tools like Prometheus or Grafana.
  • Knowledge of incident management practices and experience in on-call rotations.
  • Hands-on experience with cloud providers (AWS, Google Cloud, Azure).
  • Understanding of TCP/IP, DNS, HTTP/HTTPS, load balancing, and firewalls.

Tech Stack

Ansible, Apache Cassandra, Apache Kafka, AWS, Azure, Bash, CircleCI, Datadog, Docker, Elasticsearch, GitLab CI/CD, Go, Google Cloud, Grafana, Jenkins, Kubernetes, Linux, Perl, Prometheus, Python, Ruby, Splunk, SQL, Terraform

Categories

Backend, Data Engineering, DevOps