Senior Service Reliability Engineer

Singapore, Singapore

Senior

H1B Sponsor

Responsibilities

Improve site reliability by building fault-tolerant mechanisms and architectures.
Drive the integration of observability automation into the CI/CD pipeline.
Handle production incidents and manage communication with clients.
Monitor the performance of production systems to ensure they meet SLA and SLO targets.
Advise application development teams on system reliability improvements.
Enhance system observability to reduce false alarms and improve efficiency.
Implement chaos engineering practices for regular reliability testing.
Align site reliability direction with client goals and business needs.

Qualifications

Hands-on experience with programming and scripting languages such as Python, Go, or Bash.
Good understanding of at least one public cloud platform (AWS, Azure, or GCP).
Exposure to observability tools such as Grafana, Datadog, or the ELK Stack.
Familiarity with DevOps and GitOps practices.
Knowledge of container-based architecture and orchestration tools such as Kubernetes.
Understanding of technical architecture and modern design patterns.
Familiarity with the principles of cloud providers' Well-Architected Frameworks.

AWS, Azure, Bash, Datadog, Docker, Go, Google Cloud Platform, Grafana, Kubernetes, Python