Recorded Future

Site Reliability Engineer

about 1 month ago
Gothenburg, Sweden
Mid Level / Senior
H1B Sponsor

Responsibilities

  • Ensure performance, capacity, scalability, reliability, and security of the platform.
  • Make systemic improvements to address recurring issues.
  • Perform Root Cause Analysis for outages.
  • Design and maintain scalable infrastructure on AWS.
  • Develop observability solutions using tools like Grafana and ELK.
  • Automate infrastructure provisioning using Terraform and Chef.
  • Participate in a 24/7 on-call rotation for production incidents.
  • Collaborate with engineering teams to build highly available applications.
  • Identify and address performance bottlenecks.
  • Drive continuous improvement through automation and process optimization.

Requirements

  • 3+ years of experience in Site Reliability Engineering or similar roles.
  • Extensive hands-on experience with AWS and networking concepts.
  • Expert-level troubleshooting and diagnostic skills.
  • Proven track record of reducing system downtime.
  • Advanced Linux skills including networking and storage.
  • Experience managing observability suites like Grafana and ELK.
  • Strong proficiency in Terraform and Chef.
  • Preference for automating tasks via Infrastructure as Code.
  • Ability to create clear incident reports and technical documentation.
  • Strong collaboration and communication skills.

Tech Stack

Apache Kafka, AWS, Chef, Elasticsearch, Grafana, Kibana, Kubernetes, Logstash, MongoDB, Prometheus, RabbitMQ, Terraform
