Sr. Engineer Site Reliability

$143,100 - $143,100/Yr

Andronico's Community Markets - La Grande, OR

posted 3 months ago

Full-time

Remote - La Grande, OR

1,001-5,000 employees

About the position

Albertsons Companies is at the forefront of the revolution in retail, focusing on innovation and building a culture of belonging. The Technology & Engineering Department is seeking a Staff Site Reliability Engineer for the Retail Operations team located in Pleasanton, CA. This role involves working with a diverse tool suite including Java, microservices, Spring Boot, MS Azure, Python, Git, Perl, React, and Jenkins in both cloud and on-prem environments.

The candidate's primary responsibility will be to troubleshoot and resolve incidents, configure and implement alert monitoring for customer-facing applications, and support applications written in Java and shell scripting. As a member of the application support team, the engineer will diagnose, isolate, and debug production issues to quickly resolve customer-facing incidents. They will actively drive post-incident root cause analysis efforts and partner with development engineers to implement corrections to problems associated with supported applications. The role requires providing technical guidance in diagnosing issues, driving collaboration sessions among IT and product groups, and contributing to the design and enhancement of critical services and applications.

The engineer will also perform proactive analysis to predict and prevent production incidents, define and implement performance monitoring capabilities, and participate in Production Turnover activities. They will interface with Engineering Managers, Developers, and Build Experts to understand technology requests and business complexities, adhere to Incident, Problem, and Change Management processes, and maintain incident and problem ticket documentation. The position requires providing support for customer-facing activities that require 24x7 availability, including after-hours On-Call rotation activities.

Responsibilities

Diagnose, isolate, and debug production issues to quickly resolve customer-facing incidents.
Actively drive post-incident root cause analysis efforts.
Partner with development engineers to implement corrections to problems associated with supported applications.
Provide technical guidance in the diagnosis of issues as they arise in support of critical applications.
Drive collaboration sessions among IT and product groups to facilitate optimal performance, support, and operation of the relevant services or applications.
Contribute to the design, implementation, and enhancement of critical services and applications.
Perform proactive analysis and troubleshooting to predict and prevent production incidents.
Define, contribute to, and implement performance monitoring capabilities for critical services or applications.
Design, configure, and implement automated monitoring rules and alerts.
Participate in Production Turnover activities to bring new platforms and technologies into the environment.
Interface with Engineering Managers, Developers, and Build Experts to understand the technology requested and the business complexities as they relate to IT requirements.
Adhere to Incident, Problem, and Change Management processes and best practices.
Maintain incident and problem ticket documentation.
Prepare change documentation & implement fixes for recurring issues.
Author and maintain knowledge articles in ServiceNow based on actionable monitoring alerts.
Provide knowledge transfer (KT) sessions for peers and offshore team members.
Collaborate with key vendors on functional, performance and capacity improvements.
Foster teamwork.
Manage multiple work streams.
Provide support for customer-facing activities that require 24x7 availability, including after-hours On-Call rotation activities.

Requirements

4-year degree in Computer Science, Information Systems, or related functional field and/or equivalent combination of education or work experience.
5 years of programming experience using various standard scripting languages and high-level programming languages.
Strong troubleshooting skills with an ability to quickly diagnose complex production issues.
Experience with application servers (WebSphere, WebLogic, and/or JBoss) and database technologies (Oracle, DB2, and/or SQL Server).
Experience in UI/Web 2.0 development (JavaScript, CSS, Ajax, Adobe Flash/Flex, Dojo, YUI, and/or jQuery).
Strong knowledge of UNIX and Windows operating systems.
Experience creating and maintaining application processes and documentation.
Knowledge of current monitoring tools (Grafana, Azure Tools) is required.
Exposure to network concepts and technologies.
Strong experience with the full software development lifecycle and software development methodologies (Agile).
Ability to understand client expectations and to resolve issues that may affect service.
Strong interpersonal skills with the ability to work effectively across multiple levels of the organization.
Ability to mentor, coach and train other application support engineers.
Self-starter, with a demonstrated ability to learn beyond formal training with a strong aptitude for delivering quality products.

Nice-to-haves

Experience in a retail environment is preferred.

Benefits

Medical, dental, vision, disability and life insurance
Sick pay (accrued based on hours worked)
PTO/Vacation Pay (accrued based on hours worked) or Flexible Time Off
Paid holidays (8-9 days annually)
Bereavement pay
Retirement benefits (such as 401(k) eligibility)
