Site Reliability Engineer (SRE)

about 2 months ago

Paris, France

Mid Level / Senior

Responsibilities

Design and implement scalable, reliable, and fault-tolerant systems across cloud environments.
Develop and maintain observability tools, including monitoring, logging, and alerting.
Automate infrastructure provisioning, deployment, and incident response using IaC tools.
Optimize system performance, scalability, and incident response workflows.
Collaborate with development and DevOps teams to enhance system reliability.
Conduct root cause analysis and implement preventative measures.
Ensure high availability through load balancing and disaster recovery strategies.
Improve CI/CD pipelines for faster and more stable deployments.
Optimize cloud costs and resource utilization across major platforms.
Participate in on-call rotations to address system failures.

Requirements

4+ years of experience in Site Reliability Engineering, DevOps, or System Engineering.
Strong knowledge of cloud platforms like AWS, Azure, or GCP.
Experience with observability and monitoring tools such as Prometheus and Grafana.
Proficiency in Infrastructure as Code tools like Terraform or CloudFormation.
Hands-on experience with containerization and orchestration technologies.
Strong Linux system administration and networking fundamentals.
Experience with incident management and root cause analysis.
Proficiency in scripting languages for automation.
Knowledge of load balancing and distributed systems.
Understanding of security best practices and compliance requirements.
Strong communication skills for cross-functional collaboration.