Federal Reserve Bank - San Francisco, CA
posted 3 months ago
As a Senior Site Reliability Engineer at the Federal Reserve Bank of San Francisco, you will play a crucial role on the Data & Analytics Services (DAS) Team, applying your engineering skills across a range of technology solutions. You will engage in multiple aspects of product delivery, from inception through design, build, and deployment. You will collaborate with Product Managers, Architects, Engineers, and Customer teams in a dynamic environment, with a focus on developing Infrastructure as Code to launch server instances and configure software.
Your technical leadership will be essential in planning, designing, and implementing cloud-based infrastructure systems, both traditional and non-traditional. You will improve and protect the availability, latency, performance, and efficiency of cloud-based software and systems, along with their change management, monitoring, emergency response, and capacity planning. You will implement, manage, and scale distributed systems in public, private, or hybrid cloud environments. You will also help implement an automation strategy for cloud services, working closely with architects and developers to reduce toil, minimize human error, drive scalability, and enhance the reliability of the data platform.
You will identify and respond to service failures to ensure compliance with Service-Level Agreements, and regularly update application playbooks to expedite incident mitigation. In collaboration with development teams, you will establish Service-Level Objectives and key Service-Level Indicators, design and deploy Infrastructure as Code solutions, and lead postmortem exercises to improve operational readiness. You will also conduct Production Readiness Reviews, facilitate compliance by rehydrating infrastructure on schedule, and empower developers with self-service capabilities. Incident response, on-call duties, and managing system activities against an error budget will also be part of your responsibilities.