Responsibilities
- Lead the design and implementation of reliable, scalable, and secure production platforms.
- Collaborate with cross-functional teams to maintain resilient infrastructure and deployment patterns.
- Provide technical leadership and mentorship to engineers, promoting strong engineering standards.
- Participate in a 24x7 on-call rotation to support critical services and ensure platform availability.
- Drive standardization, automation, and documentation to improve operational consistency.
- Contribute to the full lifecycle of platform and service delivery from design to optimization.
Requirements
- 5+ years of experience in DevOps, SRE, platform engineering, or software engineering roles.
- Strong experience running Kubernetes at scale, with a deep understanding of containers.
- Hands-on experience with infrastructure as code tools such as Terraform, Ansible, or Puppet.
- Strong programming skills in at least one object-oriented language and effective scripting capabilities.
- Strong understanding of security principles and best practices across infrastructure and services.
- Significant hands-on experience with at least one major cloud platform, such as AWS, GCP, or OCI.
- Strong monitoring and observability experience using tools like Prometheus or Grafana.
- Solid understanding of networking fundamentals and distributed systems.
- Strong Linux and/or Windows systems administration experience.
- Experience with software delivery automation, CI/CD pipelines, and secure SDLC practices.
- Good understanding of SRE concepts such as SLIs, SLOs, SLAs, and availability.
Tech Stack
Ansible, Apache Kafka, AWS, Elasticsearch, Google Cloud Platform, Grafana, Kubernetes, Linux, MySQL, PostgreSQL, Prometheus, Puppet, Redis, Terraform, Windows
