Platform ULL - Colo - Reliability
Squarepoint Capital
8 months ago
London, United Kingdom +3 more
Senior
H1B Sponsor
Responsibilities
- Manage systems efficiently at scale through standardization, automation, testing, and in-depth monitoring.
- Enforce development standards for source control, testing, and continuous integration.
- Manage a distributed compute environment and multiple petabyte-scale storage systems.
- Install, manage, and monitor the Linux operating system (RHEL based).
- Troubleshoot complex hardware and software issues throughout the technology stack.
- Create self-healing systems and automated recovery processes.
- Respond to system incidents and participate in on-call rotations.
- Conduct root cause analysis of incidents and outages.
- Reduce operational toil through the development of user-driven automated workflows.
- Work with business owners to regularly re-prioritize the book of work.
Requirements
- 5+ years of experience working with Linux (RHEL/CentOS/Rocky preferred) in a large, complex environment.
- Experience with server management and support for HP, SuperMicro, Dell, and overclocked servers.
- Knowledge of low-latency network interfaces and kernel-bypass configuration and optimization.
- Experience with build and configuration management tools, specifically Chef or Ansible.
- Familiarity with observability tools, specifically Grafana and Prometheus.
- Strong scripting and automation skills in Python, Ruby, and Bash.
- In-depth knowledge of server network stack configuration, tuning, and troubleshooting.
- Critical thinking and problem-solving skills for troubleshooting complex issues.
- Good understanding of trading venues such as Nasdaq, LSE, and Euronext.
- Degree in Engineering or Computer Science, or equivalent related experience.
Tech Stack
Ansible, Bash, Chef, Grafana, Linux, Prometheus, Python, Ruby
Categories
DevOps, Security, Testing