Application Engineer I - Site Reliability Engineering

First Citizens Bank - Raleigh, NC

posted 3 months ago

Full-time

Remote - Raleigh, NC

Credit Intermediation and Related Activities

About the position

As a Site Reliability Engineer (SRE) at First Citizens Bank, you will play a crucial role in ensuring the performance, reliability, and availability of our critical applications. This position is integral to our mission of providing exceptional banking services, particularly in the innovation, technology, green tech, and life sciences sectors. You will be part of a dedicated team responsible for the uptime and efficiency of customer-facing systems, driving adherence to Service Level Objectives (SLOs) through effective monitoring, alerting, and scaling practices.

Your responsibilities will include software development in an Enterprise Java environment, with a focus on Spring Boot and Python for Continuous Integration and Continuous Deployment (CI/CD) pipelines. You will maintain, support, and troubleshoot large-scale application and infrastructure deployments, diving deep into issues and outages to establish root causes and communicate findings to business partners. A strong aptitude for analyzing and troubleshooting application, operating system, networking, configuration, and performance problems is essential.

You will also be expected to have a solid understanding of Site Reliability Engineering concepts and best practices, with experience executing system deployments in environments such as AWS, private cloud, and OpenShift. Your role will involve designing, documenting, and implementing automated procedures, as well as automating system administrative tasks using scripting tools, preferably Python or shell scripting. A fundamental understanding of Internet networking protocols, including TCP/IP, TLS, DNS, HTTP, and SMTP, is required. In addition, you will work with various monitoring and automation tools such as Ansible, GitLab, Splunk, Grafana, and Prometheus.

As a culture champion for SRE best practices, you will communicate effectively with both technical and non-technical staff. Familiarity with system hardening and security best practices will also be beneficial in this role.

Responsibilities

Be part of the team that owns the availability, performance and reliability of customer-facing systems
Drive adherence to SLOs through monitoring, alerting, and scaling
Develop software in an Enterprise Java environment, using Spring Boot and Python for CI/CD pipelines
Maintain, support and troubleshoot critical, large-scale application and infrastructure deployments
Dive deep into issues and outages to establish root causes and communicate them to your business partners
Analyze and troubleshoot application, operating system, networking, configuration and performance problems
Understand Site Reliability Engineering concepts and best practices
Execute system deployments (AWS, private cloud, OpenShift)
Design, document, and implement automated procedures
Automate system administrative tasks with scripting tools (Python or shell preferred)
Understand Internet networking protocols: TCP/IP, TLS, DNS, HTTP, SMTP
Utilize monitoring and automation tools such as Ansible, GitLab, Splunk, Grafana, and Prometheus
Champion SRE best practices and communicate clearly with both technical and non-technical staff
Be familiar with system hardening and security best practices

Requirements

Bachelor's Degree and 2 years of experience in Application Engineering OR High School Diploma or GED and 6 years of experience in Application Engineering
Software Engineering experience
Experience implementing / following SRE practices
Experience working in a large financial institution (or similar environment in scope and complexity)
Hands-on experience with deploying and maintaining systems in a containerized environment (public or private cloud)
Understand performance and availability requirements and have experience working with Software Engineering teams to define deployment, configuration and monitoring requirements
Ability to create meaningful metrics and alerting for service health monitoring
Experience reducing manual effort through automation with scripting
Skilled with configuration management and automation frameworks
Proficiency in driving Root Cause Analyses that lead to meaningful improvements
Experience leading troubleshooting efforts with production/non-production systems

Nice-to-haves

4+ years of Software Engineering experience
2+ years of experience implementing / following SRE practices

Benefits

Comprehensive benefits program for full-time associates (20+ hours)
Customized offerings designed to support families
Access to additional benefits information via company website
