SRE / Site Reliability Engineer

Motion Recruitment - Charlotte, NC

posted 3 months ago

Full-time

Charlotte, NC

Administrative and Support Services

About the position

The Site Reliability Engineer (SRE) position is a long-term contract opportunity with a well-known financial services company, offering a chance to work with some of the brightest minds in the industry. Locations include Chandler, AZ; Charlotte, NC; Iselin, NJ; Irving, TX; and New York, NY, and the role operates in a hybrid work environment. The company has a history of more than 150 years and is committed to innovation in the digital age, with a focus on customer satisfaction and financial success. The contract runs for 12 months and includes a competitive benefits package.

As an SRE, you will support a suite of applications that enable wires and payment processing across various technologies. You will partner with cross-organizational teams to implement end-to-end platform engineering capabilities, ensuring resilience, high performance, and high availability of applications. You will handle production deployments, uphold governance, and act as a gatekeeper for changes to the production environment. You will also conduct incident management, coordinate solutions to prevent customer impact, and inform leadership of ongoing issues in alignment with ITIL best practices.

The position requires a proactive approach to learning new tools and concepts in the platform engineering space, including automation, CI/CD pipelines, Ansible, Prometheus, and cloud engineering concepts. You will contribute to increasing system efficiencies and reducing the need for human intervention in related tasks, making this a critical role in maintaining the operational integrity of the company's systems.

Responsibilities

Support a suite of applications across varying technologies enabling wires and payment processing.
Partner with cross-organizational teams to ensure end-to-end platform engineering capabilities are implemented, ensuring resilience, high performance, and high availability of the applications.
Own production deployments, ensure governance and controls, and act as a gatekeeper for any changes to the production environment.
Provide cross-functional and cross-organizational coordination on root cause analysis for any production issues.
Drive recovery efforts during incident calls. Gather essential information for root cause analysis and stability improvement.
Review and analyze moderately complex operational support systems, application software, and system management tools to ensure the highest levels of systems and infrastructure availability.
Conduct incident management and coordinate/implement solutions to prevent customer impact.
Inform leadership on ongoing issues through formal communications in alignment with ITIL best practices.
Learn new tools and concepts in the platform engineering space to implement automation, including building and testing CI/CD pipelines, Ansible, Prometheus, and cloud engineering concepts.
Contribute to increasing system efficiencies and reducing the time spent on human intervention in related tasks.

Requirements

5+ years of Systems Engineering or Technology Architecture experience, or equivalent demonstrated through work experience, training, military experience, or education.
5+ years of experience with Linux.
3+ years of database support/knowledge, with MongoDB and/or SQL experience (Oracle SQL).
2+ years of app development or support in payments and/or wires applications.

Nice-to-haves

Experience in payments and/or wires application technology (major/large banking technology experience).
Exposure/knowledge/understanding of MTS (Money Transfer System), Real Time/Instant Payments, and other payment origination applications and technologies.
Linux Bash scripting ability: develop scripts on the fly and seek automation opportunities to improve and stabilize the application space and reduce manual intervention requirements.
Kafka understanding/experience.
Proficiency in using ServiceNow for incident, problem, and change management (opening incidents, reviewing changes, documenting records).
Extremely strong verbal, written, and interpersonal communication skills.
Ability to identify root-cause issues and improvement opportunities, and to design approaches to address them.
Knowledge and understanding of complex enterprise systems and frameworks including frontends, middleware, services layer, database, backend, and downstream interfaces.
Hands-on experience using monitoring tools such as Splunk, AppDynamics, Grafana, and ITRS Geneos.

Benefits

Competitive benefits package offered by the company.
