Senior Platform Reliability Engineer
Company: G-Research
Location: Dallas
Posted on: May 6, 2024
Job Description:
Do you want to tackle the biggest questions in finance with
near-infinite compute power at your fingertips?

G-Research is a leading quantitative research and technology firm,
with offices in London and Dallas. We are proud to employ some of
the best people in their field and to nurture their talent in a
dynamic, flexible and highly stimulating culture where world-beating
ideas are cultivated and rewarded.

This is a hybrid role based in our new Dallas infrastructure hub
where we work on the latest technologies in a cutting-edge
environment.

The role

The Reliability Engineering team,
part of our Platforms as a Service (PaaS) function, works with a
variety of technologies, including multiple Kubernetes clusters,
multiple database technologies, low latency networks and big data
warehouses across multiple regions around the globe.

We are actively seeking an experienced Site Reliability Engineer
(SRE) with a proven track record in building up and bootstrapping
SRE functions across multiple teams.

We want an individual who excels in ensuring
the robustness, scalability, and fault tolerance of large-scale
infrastructure. The ideal candidate will have a comprehensive
understanding of the intricacies involved in architecting,
deploying, and maintaining high-performance solutions, coupled with
a track record of implementing and enhancing reliability measures
across all infrastructure ecosystems.

This role demands hands-on
experience in orchestrating resilient systems, fine-tuning
performance, and implementing proactive strategies to mitigate
potential downtimes or disruptions. The successful candidate will
play a pivotal role in driving the reliability, efficiency, and
scalability of the infrastructure platform through innovative
solutions and best-in-class practices.

In return, you will gain exposure to the latest hardware and
software technologies in a forward-thinking company that values
innovation, personal development and training.

Key responsibilities of the role include:

- Leading efforts to enhance existing practices across teams,
  fostering collaboration and synchronization to optimize system
  reliability and scalability
- Driving strategies for enhancing systems performance, leveraging
  innovative approaches to improve efficiency and streamline
  processes
- Implementing best practices for system reliability, fault
  tolerance, and scalability, ensuring alignment with evolving
  industry standards
- Cultivating a culture of continuous improvement, encouraging
  regular reviews and iterative enhancements to tools,
  methodologies, and processes
- Enhancing incident response processes by conducting comprehensive
  reviews, implementing improvements, and integrating lessons
  learned into future strategies
- Leading efforts to optimize capacity planning strategies, ensuring
  systems are prepared for future scaling while maximizing resource
  utilization
- Collaborating with security teams to fortify and enhance security
  measures within systems, ensuring compliance with evolving
  policies and standards
- Collaborating effectively with other SREs within PaaS, and with
  colleagues in different time zones (Dallas and London)

Who are we looking for?

The successful
candidate will be an experienced Platforms Reliability Engineer who
is enthusiastic about contributing to an automated, scalable,
reliable and high-performing Infrastructure and Platform as a
Service:

- A strong desire to continually learn about new technologies,
  approaches, and systems, along with the agility to work across
  multiple teams
- A strong communicator with excellent written communication for
  technical and non-technical audiences
- A self-starter with excellent problem-solving skills
- Proficient in Go or another programming language such as Python,
  Rust or Java for automation and development tasks
- Extensive Linux, Networking and Infrastructure knowledge
- Experience with CI/CD (preferably Jenkins and ArgoCD) and
  Configuration Management tools, such as Ansible and Terraform
- Experience deploying and running applications on Docker and
  Kubernetes, including the creation of Helm charts
- Familiarity with monitoring tools like Prometheus, Grafana,
  OpenTelemetry and the ELK stack (Elasticsearch, Logstash, Kibana),
  or similar
- Understanding of core SRE concepts and their implementation in
  platform engineering

Beneficial experience would include:

- Experience building and bootstrapping an SRE organization across
  multiple teams
- Experience working on large-scale infrastructure to improve
  performance, stability and efficiency

Why should you apply?

- Market-leading compensation plus annual discretionary bonus
- Informal dress code and excellent work/life balance
- Excellent paid time off allowance of 25 days
- Sick days, military leave, and family and medical leave
- Generous 401(k) plan
- 16 weeks of fully paid parental leave
- Medical and Prescription, Dental, and Vision insurance
- Life and Accidental Death & Dismemberment (AD&D) insurance
- Employee Assistance and Wellness programs
- Generous relocation allowance and support
- Great selection of office snacks, and hot and cold drinks
- On-site gym and car parking