Sr Site Reliability Engineer

$134,200 - $164,000/Yr

Boeing

posted 5 months ago

Full-time - Mid Level

Remote

1,001-5,000 employees

Transportation Equipment Manufacturing

About the position

As a Sr Site Reliability Engineer at BECU, you will have the opportunity to make a significant, positive change in our journey. Your contributions will be highly valued, and your growth will be continually fostered. This role is remote within WA, OR, ID, AZ, TX, SC, and GA.

You will analyze complex telemetry and problem trends to reduce outages and time to resolution, and you will recommend and implement telemetry for operations and engineering to increase reliability and availability. You will identify areas of opportunity and improvement, mitigate member-facing incidents and outages, and quickly identify and remediate root causes. You will also gather and analyze data to understand environmental risks, then develop strategies to mitigate those risks and improve the reliability and availability of our systems.

You will communicate technical best practices and provide guidance on them, participate in post-incident reviews, and collaborate with team members to implement best practices. You will develop and manage scripts and automation for hybrid environments, perform code and peer reviews, and develop prototypes, proofs of concept, and solutions. You will also configure and manage production workloads in Azure and on-premises datacenters, and construct and maintain infrastructure-as-code using Terraform, ARM, and Bicep.

Responsibilities

Analyze complex telemetry and problem trends to reduce outages and time to resolution.
Recommend and implement telemetry for operations and engineering to increase reliability and availability.
Identify areas of opportunity and improvement, mitigate member-facing incidents and outages, and quickly identify and remediate root causes.
Gather and analyze data to understand environmental risks and develop strategies to mitigate those risks.
Communicate technical best practices, provide guidance on them, and participate in post-incident reviews.
Collaborate with team members to implement best practices.
Develop and manage scripts and automation for hybrid environments.
Perform code and peer reviews, and develop prototypes, proofs of concept, and solutions.
Configure and manage production workloads in Azure and on-premises datacenters.
Construct and maintain infrastructure-as-code using Terraform, ARM, and Bicep.

Requirements

Bachelor's degree in computer science or related discipline, or equivalent experience.
Minimum 5 years of related functional experience.
Minimum 2 years of experience in System Architecture, Advanced Operating System management, Software development or Test, Public or Private Cloud Engineering or Operations, DevOps, or other modern enterprise infrastructure systems or practices like Infrastructure-as-Code.
Experience optimizing and debugging complex systems by leveraging monitoring, debugging, and logs to identify root causes.
Experience with enterprise observability, including building observability with enterprise tooling.
Experience with IT Disaster Recovery patterns to support Business Continuity.
Demonstrated proficiency with verbal and written skills to effectively communicate with executives, leadership, product groups and peers.
Ability to work well in a hybrid team environment.
System administration and automation with PowerShell, Python, or bash required.
Experience with public cloud (Azure/AWS/Google Cloud) technologies required.
Knowledge of agile methodologies and lean principles (e.g., Scrum, Kanban, Demos, Retrospectives) required.

Nice-to-haves

Certifications in Microsoft and/or competing Cloud Technologies (Microsoft AZ-104, AZ-400, HashiCorp Terraform Associate, GitHub Foundations, GitHub Actions).
Experience with cloud infrastructure environments, preferably Azure, and expertise in Infrastructure as Code (IaC) principles and tools, such as Bicep, PowerShell, ARM templates and/or Terraform.
Previous experience with ITIL, traditional infrastructure, service management, modern software delivery practices, and end-to-end automation concepts, including how to leverage tools to achieve these goals, preferred.
Experience with Continuous Integration and Continuous Delivery systems and tools such as Azure DevOps Services, GitHub Actions, Jenkins, and similar.
Experience creating build/deployment pipelines in YAML preferred.
Advanced experience integrating security at design time, and regularly evaluating efficacy of security measures preferred.

Benefits

Medical, dental, vision and life insurance coverage.
Disability and AD&D insurance.
Health care and dependent care flexible spending accounts.
Health savings accounts for eligible employees.
401k plan and employer-funded retirement plan.
Accrual of 6.16 hours of paid time off (PTO) per pay period, up to a maximum of 160 PTO hours per year.
Ten paid holidays throughout the calendar year.
