Platform Engineer (Reliability) - Unannounced Project


Dublin, Ireland, plus 2 other locations
Senior

H-1B Sponsor

Responsibilities

Shape reliability and operational excellence engineering practices to maintain high system uptime.
Drive performance testing, tuning, and capacity planning to ensure effective system scaling.
Identify recurring manual processes and automate them to improve efficiency.
Debug and resolve reliability and performance issues across services and codebases.
Embed security and compliance into engineering platforms and delivery pipelines.
Design and implement observability solutions for actionable insights into system health.
Participate in incident response and postmortem reviews to drive systemic improvements.
Help teams optimize cloud usage in line with business objectives and budget constraints.

Qualifications

Strong background in software engineering with experience in SRE or platform practices.
Experience owning or operating systems in production, including incident response.
Ability to take ownership of complex systems and improve them over time.
Proficient in navigating and understanding unfamiliar codebases.
Experience debugging complex distributed systems across service boundaries.
Strong communication skills for effective collaboration with technical and non-technical stakeholders.
Passion for continuous improvement and staying current with emerging trends.
Solid experience with Infrastructure as Code tools such as Terraform.
Experience with containerized workloads on cloud-native platforms.
Familiarity with the AWS Well-Architected Framework or equivalent standards.
Experience designing observability strategies for metrics, logs, and traces.
Strong programming skills in languages such as Go or Python.

AWS, Datadog, Go, Kubernetes, Python, Terraform