Unclassified - Cary, NC

posted 3 months ago

Full-time - Entry Level
Remote - Cary, NC

About the position

As a Site Reliability Engineer/ITAO at Deutsche Bank, you will play a crucial role in ensuring the stability and performance of applications within the Corporate & Investment Banking sector. Your primary responsibility will be to collaborate closely with application teams to maintain well-monitored applications that are resilient to faults. This involves agreeing on and periodically reviewing Service Level Agreements (SLAs) and Service Level Objectives (SLOs) so that each application meets the availability standard its criticality requires. You will also maintain Error Budgets for the application teams, which help prevent releases when production stability and availability fall below acceptable levels.

In this role, you will draw on your knowledge of and experience with the tools used in a Site Reliability Engineering (SRE) environment. You will specialize in one or more technical domains to provide optimum service levels in line with SLAs and Operating Level Agreements (OLAs). Your work will involve managing application availability, performance, and compliance, as well as organizing Level 3 support for applications in collaboration with development teams. You will identify gaps in security and compliance and drive their remediation, while managing the technical roadmap of applications so that upgrades, patches, and strategic changes are implemented on time.

Additionally, you will build monitoring solutions that alert teams to failures or performance issues, optimizing uptime and providing feedback loops that improve application resilience. You will identify and eliminate or automate toil for both the application teams and the SRE team, enhancing overall effectiveness. You will also manage the resolution of outages in coordination with technical and business teams, ensuring that actions are taken to reduce the likelihood of future failures.

Responsibilities

  • Collaborate with application teams to ensure stable and well-monitored applications that are resilient to faults.
  • Agree on and periodically review Service Level Agreements (SLAs) and Service Level Objectives (SLOs) to maintain application availability.
  • Maintain Error Budgets for application teams and prevent releases when production stability falls below acceptable levels.
  • Manage application availability, performance, and compliance, and organize Level 3 support with development teams.
  • Identify gaps in security and compliance and drive remediation efforts.
  • Manage the technical roadmap of applications, ensuring timely upgrades, patches, and strategic changes are applied.
  • Build monitoring solutions to alert teams in the event of failures or performance issues.
  • Provide feedback loops to improve application resilience across multiple teams.
  • Identify and eliminate or automate toil for application and SRE teams to optimize effectiveness.
  • Manage resolution of outages with technical and business teams.

Requirements

  • Bachelor's degree in Computer Science or an IT-related discipline, or equivalent Information Technology (IT) experience in large corporate environments.
  • Demonstrable experience in Site Reliability Engineering (SRE).
  • Experience with the Prometheus/Grafana monitoring stack, plus scripting skills (Groovy, shell, etc.).
  • Working experience with UNIX and Oracle databases; SQL Server and WebLogic experience is desirable.
  • Familiarity with container orchestration tools such as OpenShift, Kubernetes, and Docker Swarm, and with CI/CD tooling.

Nice-to-haves

  • Proficiency in a high-level programming language.
  • Excellent analytical and problem-solving skills.

Benefits

  • Hybrid working model with up to 60% work from home flexibility.
  • Generous vacation, personal, and volunteer days.
  • Access to Employee Resource Groups for community engagement.
  • Competitive compensation packages including health and wellbeing benefits.
  • Retirement savings plans and parental leave.
  • Family building benefits and educational resources.
  • Matching gift and volunteer programs.