Unclassified - Cary, NC
posted 3 months ago
As a Site Reliability Engineer/ITAO at Deutsche Bank, you will play a crucial role in ensuring the stability and performance of applications within the Corporate & Investment Banking sector. Your primary responsibility will be to work closely with application teams to keep their applications well monitored and resilient to faults. This involves agreeing on, and periodically reviewing, Service Level Agreements (SLAs) and Service Level Objectives (SLOs) so that each application meets the availability standard its criticality requires. You will also maintain Error Budgets for the application teams, which help stop releases when production stability and availability fall below acceptable levels.

In this role, you will apply your knowledge of, and experience with, the tools used in a Site Reliability Engineering (SRE) environment. You will specialize in one or more technical domains to deliver optimal service levels in line with SLAs and Operating Level Agreements (OLAs). Your work will involve managing application availability, performance, and compliance, and organizing Level 3 support for applications together with development teams.

You will identify gaps in security and compliance and drive their remediation, while managing the technical roadmap of applications so that upgrades, patches, and strategic changes are implemented on time. You will also build monitoring solutions that alert teams to failures or performance issues, optimize uptime, and provide feedback loops that improve application resilience. In addition, you will identify toil for both the application teams and the SRE team and eliminate or automate it, improving overall effectiveness. Finally, you will manage the resolution of outages in coordination with technical and business teams, ensuring that actions are taken to reduce the likelihood of future failures.