First Citizens Bank - Raleigh, NC

posted 3 months ago

Full-time
Remote - Raleigh, NC
Credit Intermediation and Related Activities

About the position

As a Site Reliability Engineer (SRE) at First Citizens Bank, you will play a crucial role in ensuring the performance, reliability, and availability of our critical applications. This position is integral to our mission of providing exceptional banking services, particularly in the innovation, technology, green tech, and life sciences sectors. You will be part of a dedicated team responsible for the uptime and efficiency of customer-facing systems, driving adherence to Service Level Objectives (SLOs) through effective monitoring, alerting, and scaling practices.

Your responsibilities will include software development in an Enterprise Java environment, with a focus on Spring Boot and Python for Continuous Integration and Continuous Deployment (CI/CD) pipelines. You will maintain, support, and troubleshoot large-scale application and infrastructure deployments, diving deep into issues and outages to establish root causes and communicate findings to business partners. A strong aptitude for analyzing and troubleshooting application, operating system, networking, configuration, and performance problems is essential.

You will also be expected to have a solid understanding of Site Reliability Engineering concepts and best practices, with experience executing system deployments in environments such as AWS, private cloud, and OpenShift. Your role will involve designing, documenting, and implementing automated procedures, as well as automating system administrative tasks using scripting tools, preferably Python or shell scripting. A fundamental understanding of Internet networking protocols, including TCP/IP, TLS, DNS, HTTP, and SMTP, is required.

In addition, you will work with monitoring and automation tools such as Ansible, GitLab, Splunk, Grafana, and Prometheus. As a culture champion for SRE best practices, you will communicate effectively with both technical and non-technical staff. Familiarity with system hardening and security best practices will also be beneficial in this role.

Responsibilities

  • Be part of the team that owns the availability, performance and reliability of customer-facing systems
  • Drive adherence to SLOs through monitoring, alerting, and scaling
  • Develop software in an Enterprise Java environment, including Spring Boot and Python for CI/CD pipelines
  • Maintain, support and troubleshoot critical, large-scale application and infrastructure deployments
  • Dive deep into issues and outages to establish root causes and communicate them to your business partners
  • Analyze and troubleshoot application, operating system, networking, configuration and performance problems
  • Understand Site Reliability Engineering concepts and best practices
  • Execute system deployments (AWS, private cloud, OpenShift)
  • Design, document, and implement automated procedures
  • Automate system administrative tasks with scripting tools (Python or shell preferred)
  • Understand Internet networking protocols: TCP/IP, TLS, DNS, HTTP, SMTP
  • Use monitoring and automation tools such as Ansible, GitLab, Splunk, Grafana, and Prometheus
  • Champion SRE best practices and communicate clearly with both technical and non-technical staff
  • Be familiar with system hardening and security best practices

Requirements

  • Bachelor's Degree and 2 years of experience in Application Engineering OR High School Diploma or GED and 6 years of experience in Application Engineering
  • Software Engineering background
  • Experience implementing / following SRE practices
  • Experience working in a large financial institution (or similar environment in scope and complexity)
  • Hands-on experience with deploying and maintaining systems in a containerized environment (public or private cloud)
  • Understand performance and availability requirements and have experience working with Software Engineering teams to define deployment, configuration and monitoring requirements
  • Ability to create meaningful metrics and alerting for service health monitoring
  • Experience reducing manual effort through automation with scripting
  • Skilled with configuration management and automation frameworks
  • Proficiency in driving Root Cause Analyses that lead to meaningful improvements
  • Experience leading troubleshooting efforts with production/non-production systems

Nice-to-haves

  • 4+ years of Software Engineering experience
  • 2+ years of experience implementing / following SRE practices

Benefits

  • Comprehensive benefits program for full-time associates (20+ hours)
  • Customized offerings designed to support families
  • Access to additional benefits information via company website