ArlingtonTXRecruiter Since 2001
the smart solution for Arlington jobs

Lead Site Reliability Engineer

Company: GM Financial
Location: Arlington
Posted on: May 3, 2021

Job Description:

Overview

We are expanding our efforts into complementary data technologies for decision support in the areas of ingesting and processing large data sets. Our interests are in enabling data science and search-based applications on large, low-latency data sets in both batch and streaming processing contexts. To that end, this role incorporates aspects of software engineering and operations, combining SRE and DevOps skills to come up with efficient ways of managing and operating applications. The role requires a high level of responsibility and accountability to deliver technical solutions. The data sets we deal with support both off-line and in-line machine learning training and model execution. Other data sets support search-engine-based analytics. Technology exploration and deployment activities include identifying opportunities that impact business strategy, selecting data solutions software, and defining hardware requirements based on business requirements. Responsibility also includes documenting procedures for deploying, monitoring, managing, and switching the environments in production and disaster recovery sites. This role participates along with team counterparts to architect an end-to-end framework developed on a group of core data technologies.

Responsibilities

JOB DUTIES

  • Manage/Administer/Deploy Kubernetes and Spark cluster environments, on bare-metal and container infrastructure, including service allocation and configuration for the cluster, capacity planning, performance tuning, and ongoing monitoring
  • Define and refine processes and procedures for the site reliability engineering practice
  • Set up, manage, and maintain Kubernetes-based scalable environments for high availability, and work with vendors for smooth and continuous operations
  • Work closely with data scientists, data architects, data engineers, ETL developers, cybersecurity, network, Linux, other IT counterparts, and business partners to design and set up the environments that manage the ingested and processed datasets from external sources, internal systems, and the data warehouse to extract features of interest
  • Evaluate, research, and experiment with data processing, management, and scalability technologies in a lab to keep pace with industry innovation while assessing business impact and viability for the use cases associated with efforts in hand
  • Design, set up, test, deploy, monitor, document, and troubleshoot data processing and associated automation issues from the operations perspective
  • Work with IT Operations and Information Security Operations on monitoring and troubleshooting incidents to maintain service levels
  • Work with Information Security Vulnerability Management and vendors to remediate known impacting vulnerabilities
  • Contribute to the evolving distributed systems architecture to meet changing requirements for scaling, reliability, performance, manageability, and cost
  • Report utilization and performance metrics to user communities
  • Contribute to planning and implementation of new/upgraded hardware and software releases
  • Monitor the Linux, Kubernetes, Object Storage (MinIO), Feature Store, and Spark environments
  • Research and recommend innovative and, where possible, automated approaches for administration tasks
  • Identify approaches that improve resource utilization, provide economies of scale, and simplify support
  • Administer Machine Learning platforms and operations (MLOps) such as Kubeflow, JupyterHub, and Python
  • Perform other duties as assigned
  • Conform with all company policies and procedures


Qualifications

Knowledge

  • Excellent knowledge of Kubernetes Administration, Deployments & Upgrades
  • Excellent knowledge of Apache Spark administration on various platforms
  • Strong working knowledge of Object Store (MinIO) and Spark cluster security, network connectivity, and IO throughput, along with other factors that affect distributed system performance
  • Strong working knowledge of disaster recovery, incident management, and security best practices
  • Working knowledge of containers (e.g., Docker) and major orchestrators (e.g., Mesos, Kubernetes, Docker Datacenter)
  • Working knowledge of software defined networking
  • Working knowledge of hardening data at rest with key-based encryption technologies
  • Working knowledge of setting up and customizing interactive data analytics tools (e.g., Apache Zeppelin, Jupyter notebooks)
  • Excellent knowledge of building Docker images to provide Containers-as-a-Service
  • Working knowledge of Azure administration, Azure DevOps, and Azure Kubernetes Service (AKS)
  • Working knowledge of pipeline automation: Azure DevOps (YAML, ARM), Terraform, Jenkins, Chef/Puppet, Ansible
  • Working knowledge of CI/CD tools and practices such as Artifactory, Git, GitOps, and Jenkins
  • Working knowledge of code scanning tools: SonarQube, Checkmarx, Black Duck, Twistlock
  • Working knowledge of object storage such as S3/MinIO, including bucket policies and administration
  • Working knowledge of Kubernetes storage protocols
  • Experience with networking infrastructure, including VLANs and firewalls
  • Working knowledge of hardening Kubernetes clusters with network policies (such as Calico/Tigera), service meshes (such as Istio), and internal and external load balancers

Skills

  • Proven track record with Red Hat Enterprise Linux & Kubernetes administration
  • Proficiency in a high-level language like Python, Go, Ruby and/or Java
  • Solid experience in high availability and distributed systems, Linux, data and SAN storage networks, NAS, and networking, leveraging tools to instrument and automate proactive, and eventually predictive, availability solutions
  • Proven track record leading complex enterprise production support efforts adhering to a mix of DevOps & SRE frameworks
  • Experience transitioning platforms to the cloud, with knowledge of cloud frameworks & design patterns, micro-service architectures
  • Extensive knowledge of networking, including DNS, DHCP, firewalls, load balancers, and IP routing
  • Experience with monitoring tools such as Splunk, Zenoss, Elastic, AppDynamics, Dynatrace, Grafana, Prometheus, and Kiali
  • Ability to grasp difficult concepts, large architectures, and sophisticated designs quickly and troubleshoot with debugging skills across a variety of integrated platforms
  • Proven capability to provide operational visibility on environment health to Senior Leadership, Technology and Business partners
  • Receptive, approachable teammate, with the ability to positively interact with business partners, technology teams, offshore, and professional services
  • Strong customer advocate with excellent written and verbal communication skills

Education

  • Bachelor's Degree in related field or equivalent work or military experience required
  • Master's Degree preferred

Experience

  • 5-7 years hands-on experience with supporting Linux production environments required
  • 5-7 years of hands-on administration experience on Spark required
  • 2-3 years hands-on experience in cloud technologies with Microsoft Azure required
  • 3-5 years hands-on experience with scripting with bash, perl, ruby, or python required
  • 3-5 years of experience with Docker Datacenter required
  • 2-4 years of hands-on administration experience on Machine learning platforms required
  • Minimum of 1 year of experience in Mesos, Kubernetes, OpenShift and/or Deis or other such container/platform-as-a-service orchestrator required
  • Minimum of 1 year of hands-on experience with CI/CD tools and technologies required
  • Minimum of 1 year of experience leading a site reliability engineering team required
- provided by Dice

