Bank of America - Jersey City, NJ
posted 3 months ago
At Bank of America, we are guided by a common purpose to help make financial lives better through the power of every connection. Responsible Growth is how we run our company and how we deliver for our clients, teammates, communities, and shareholders every day.
We are seeking Senior Site Reliability Engineers (SREs) to design, build, and maintain our next-gen AWS platform. This role offers an opportunity to work with a wide range of technologies and to build the unique perspective that comes from integrating disparate services, both on-prem and off-prem, that must interact seamlessly with each other. You will work with colleagues who are fun, smart, hardworking, and driven, on a growing team that gives you room to innovate and be creative.
In this position, you will collaborate with a diverse set of engineers, architects, and teams to design, develop, test, and implement secure, robust, highly available, and scalable solutions for Bank of America's External Cloud Platform. You will also work with other software engineers and teams to design and implement deployment approaches using highly scalable, automated continuous integration and continuous delivery pipelines.
Your responsibilities will cover all aspects of reliability: collaborating with technical experts, key stakeholders, and team members to resolve complex problems, and owning each issue until you are sure it will not recur. You will have a deep understanding of SRE practices, service level indicators, and service level objectives, and will use them proactively to resolve issues before they impact customers. Additionally, you will gather, analyze, and synthesize large, diverse data sets and develop visualizations and reporting from them in service of continuous improvement of the platform.
You will implement infrastructure, configuration, and network as code for the applications and platforms in your remit. You will identify opportunities to eliminate toil and automate the triage of issues to improve overall operational stability, and collaborate with others to identify, analyze, and resolve platform vulnerabilities. You will also proactively promote the adoption of site reliability engineering best practices within the team and organization, participate in 24x7 on-call coverage under a follow-the-sun model, and perform blameless postmortems (RCAs) as needed.