Sr Site Reliability Engineer

$142,002 - $192,200/Yr

Workday - Atlanta, GA

posted 2 months ago

Full-time - Mid Level
Remote - Atlanta, GA
Publishing Industries

About the position

As a Senior Site Reliability Engineer (SRE) at Workday, Inc., you will play a crucial role in ensuring the reliability and performance of our services across various environments, including production, sandbox, implementation, sales, training, and partner services. Your primary responsibilities will involve developing, supporting, and enhancing utilities that automate manual tasks and streamline processes. You will also work on server capacity additions on both bare-metal and private cloud infrastructure.

In this position, you will be responsible for upgrading, patching, executing, and monitoring the processes that keep the Workday Service operational. This includes creating and maintaining scripts, applying patches, and making configuration changes to our systems, either manually or through automation tools. A key aspect of your role will be to ensure that we consistently meet our Service Level Agreements (SLAs).

You will also identify, document, and follow up on issues encountered during all phases of service delivery. Additionally, you will enhance monitoring, alerting, and tracing for internal services and, most importantly, production services. Your contributions will be vital in maintaining the high standards of service reliability that our customers expect from Workday.

Responsibilities

  • Develop, support, and improve utilities that automate manual tasks and streamline processes.
  • Work on server capacity additions on bare-metal and private cloud infrastructure.
  • Upgrade, patch, execute, and monitor processes that keep the Workday Service operational in all environments.
  • Create and maintain scripts for system management.
  • Apply patches and make configuration changes to systems manually or using automation tools.
  • Ensure compliance with Service Level Agreements (SLAs).
  • Identify, document, and follow up on issues found during service delivery phases.
  • Enhance and improve monitoring, alerting, and tracing of internal and production services.

Requirements

  • Experience in site reliability engineering or a related field.
  • Proficiency in scripting and automation tools.
  • Strong understanding of server capacity management.
  • Experience with cloud infrastructure, particularly bare-metal and private cloud.
  • Ability to troubleshoot and resolve issues in a timely manner.
  • Familiarity with monitoring and alerting systems.

Nice-to-haves

  • Experience with Workday services or similar enterprise applications.
  • Knowledge of best practices in service reliability and performance optimization.
  • Familiarity with ITIL or similar frameworks.

Benefits

  • Workday Bonus Plan eligibility.
  • Role-specific commission/bonus opportunities.
  • Annual refresh stock grants.
  • Comprehensive benefits package.