CRED

Site Reliability Engineer - Core and Data

2 months ago
Bengaluru, India
Mid Level

Responsibilities

  • Design, implement, and manage scalable, fault-tolerant cloud infrastructure.
  • Work closely with engineering teams to translate business requirements into reliable infrastructure systems.
  • Operate containerized workloads on AWS using ECS and EKS.
  • Build and maintain observability to understand system health and performance.
  • Diagnose production issues and restore services under real-world load.
  • Automate infrastructure and operations using Infrastructure as Code and CI/CD pipelines.
  • Ensure adherence to compliance standards for financial services infrastructure.
  • Participate in on-call rotations and incident response, owning problems end-to-end.

Requirements

  • 2–5 years of experience working with production infrastructure or backend systems.
  • Strong Linux fundamentals and a genuine interest in operating systems.
  • Comfortable troubleshooting across systems, containers, and networks.
  • Hands-on experience with cloud platforms, preferably AWS.
  • Exposure to container orchestration platforms such as ECS or Kubernetes.
  • Curiosity about microservice ecosystems and observability.
  • Experience managing large, complex distributed systems in production.
  • Strong problem-solving skills and proficiency in at least one programming language.
  • Exposure to data or platform workloads like Spark, Airflow, or Kafka is a plus.
  • Understanding of data pipelines and resource/capacity tuning.
  • Experience with observability stacks such as Prometheus or Grafana.
  • Experience contributing to infrastructure or workload cost optimization in cloud environments.

Benefits

  • In-house pantry with lunch and dinner provided for all team members.
  • Paid sick leave and comprehensive health insurance.
  • No fixed working hours, allowing a flexible work environment.
  • Salaries paid before the joining date as a show of trust.

Tech Stack

Apache Airflow, Apache Flink, Apache Kafka, Apache Spark, AWS, Grafana, Kubernetes, Linux, Prometheus

Categories

Backend, Data Engineering, DevOps