OneTrust

Staff SRE

about 3 hours ago
Bengaluru, India
Staff+

Responsibilities

  • Design and build platforms, tools, and frameworks to improve system reliability, scalability, and performance.
  • Define and implement SRE best practices, including SLIs/SLOs, error budgets, and reliability metrics.
  • Lead incident response efforts, drive root cause analysis, and implement long-term fixes to prevent recurrence.
  • Analyze system behavior, identify bottlenecks and saturation points, and implement solutions to improve resilience.
  • Partner with engineering teams to embed reliability into the software development lifecycle.
  • Evaluate emerging technologies and recommend tools that enhance productivity, observability, and system robustness.
  • Drive capacity planning, performance tuning, and cost optimization efforts.
  • Collaborate with cross-functional teams to identify gaps, prioritize improvements, and resolve production issues.
  • Provide technical leadership and mentorship across the engineering organization.
  • Influence senior leadership with insights, metrics, and recommendations to improve system health and operational excellence.

Requirements

  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related technical field.
  • 10+ years of experience in software engineering with a strong focus on backend systems and distributed architecture.
  • Extensive experience building and operating Java-based systems using RESTful APIs, Spring Boot, and a microservices architecture.
  • Strong understanding of distributed systems concepts, including fault tolerance, eventual consistency, and scalability.
  • Proven experience with cloud platforms (AWS/Azure/GCP) and cloud-native architectures.
  • Expertise in observability tools such as Prometheus, Grafana, ELK, or similar.
  • Experience defining and managing SLIs, SLOs, and error budgets.
  • Strong knowledge of CI/CD pipelines, automation, and infrastructure as code.
  • Hands-on experience with incident management, root cause analysis (RCA), and postmortems.
  • Excellent analytical, debugging, and problem-solving skills.
  • Strong communication, collaboration, and leadership abilities.

Benefits

  • Comprehensive healthcare coverage.
  • Flexible PTO.
  • Equity RSUs and annual performance bonus opportunities.
  • Retirement account support.
  • 14+ weeks of paid parental leave.
  • Career development opportunities.
  • Company-paid privacy certification exam fees.

Tech Stack

AWS, Azure, Google Cloud Platform, Grafana, Java, Prometheus, Spring Boot

Categories