Arizona Public Service - Phoenix, AZ

posted 2 months ago

Full-time - Mid Level
Phoenix, AZ
Utilities

About the position

As an Enterprise Monitoring Engineer | Site Reliability Engineer at APS, you will play a crucial role in maintaining, installing, and configuring cutting-edge infrastructure technologies. Your expertise will be instrumental in ensuring the seamless operation of our IT environment, with a focus on proactive monitoring and automation. You will oversee the installation, maintenance, and configuration of both current and next-generation infrastructure technologies, ensuring optimal performance and reliability.

This position requires you to work closely with technology teams across the enterprise (including application, database, server, storage, network, and security teams) to develop and implement effective monitoring solutions that address the unique needs of each area. Your responsibilities will include designing and improving alerting systems with a focus on proactive issue detection and self-healing capabilities, which will help reduce downtime and enhance system resilience. You will also be expected to share your knowledge and experience with other platform and product engineers, mentoring them to deliver high-value solutions across the organization.

Additionally, you will create and maintain custom code that supports the enterprise's monitoring needs, ensuring it is both testable and sustainable over time. Collaboration with stakeholders will be key as you translate product goals into prioritized platform requirements, ensuring alignment with business objectives. You will develop tools and services that transform complex challenges into turnkey solutions, enabling efficient operations in a sophisticated and evolving IT environment.

Compliance with strict regulatory procedures is essential, as you will ensure all operations and developments maintain the highest standards of security and governance. Participation in a rotating on-call schedule will also be required to provide critical support and address issues as they arise, ensuring the stability of the IT infrastructure.

Responsibilities

  • Oversee the installation, maintenance, and configuration of both current and next-generation infrastructure technologies.
  • Work closely with technology teams across the enterprise to develop and implement effective monitoring solutions.
  • Design and improve alerting systems with a focus on proactive issue detection and self-healing capabilities.
  • Share knowledge and experience with other platform and product engineers, mentoring them to deliver high-value solutions.
  • Create and maintain custom code that supports the enterprise's monitoring needs.
  • Collaborate with stakeholders to translate product goals into prioritized platform requirements.
  • Develop tools and services that transform complex challenges into turnkey solutions.
  • Ensure all operations and developments comply with strict regulatory procedures.
  • Participate in a rotating on-call schedule to provide critical support and address issues.

Requirements

  • Bachelor's degree in Information Technology or related field.
  • Two (2) years of prior relevant experience or equivalent combination of education and directly related experience for Enterprise Monitoring Engineer II.
  • Five (5) years of prior relevant experience or equivalent combination of education and directly related experience for Enterprise Monitoring Engineer III.
  • Working technical knowledge gained through experience within a job area or system.
  • Experience with protocols such as SNMP, SSH, WMI.

Nice-to-haves

  • Knowledge of all infrastructure technologies including compute, storage, networking, and software solutions.
  • Experience with DevOps tools like Terraform, Jenkins, or Ansible.
  • Ability to develop and write technical documentation.
  • Experience working in an agile environment.
  • Familiarity with Python or a comparable programming language.

Benefits

  • Flexible work options (home-based or office-based)
  • Opportunities for professional development and training
  • Participation in a rotating on-call schedule for critical support