St. Jude Children's Research Hospital - Memphis, TN

posted 4 months ago

Full-time - Mid Level
Memphis, TN
Hospitals

About the position

St. Jude Children's Research Hospital is seeking an HPC DevOps Engineer (Level II or Senior) to ensure the smooth operation of its High-Performance Computing (HPC) infrastructure. This role is critical in supporting the hospital's mission to advance cures and means of prevention for pediatric catastrophic diseases through research and treatment. The successful candidate will work closely with downstream operation teams to maintain the integrity and performance of the HPC systems, leveraging a deep understanding of both development and operations processes in an HPC environment. The position requires a strong technical background relevant to the hardware and software platforms hosted in the St. Jude data center.
The HPC DevOps Engineer will focus on integrated operations by utilizing and developing various automation tools for system configuration, testing, metric collection, monitoring, and self-sufficient actions. This role is pivotal in enabling the full cycle of DevOps operations, which includes setting up necessary tools and hardware/software platforms, defining processes for development, testing, release, update, and support, and striving for continuous improvement through the establishment of CI/CD pipelines. The engineer will also be responsible for monitoring processes throughout their lifecycle, ensuring adherence to defined protocols, and implementing improvements to minimize lead times.
In addition to technical responsibilities, the role involves identifying and deploying security measures through continuous vulnerability assessments and risk management. The engineer will handle critical incidents with effective risk management and root cause analysis, document processes for system utilization, and collaborate with end users and downstream teams to understand customer requirements and project KPIs.
For senior-level candidates, responsibilities will also include managing periodic reporting to management and customers, stakeholder engagement, and mentoring team members. The position requires regular and predictable attendance to meet the goals and objectives of the department and institution.

Responsibilities

  • Utilize both commercially available and open-source platforms to implement/develop automation tools for IT systems configuration and infrastructure maintenance.
  • Enable the full cycle of DevOps operations by setting up tools and required hardware/software platforms (e.g. appropriate CI/CD tools).
  • Define processes needed for DevOps operations, including development, test, release, update, and support.
  • Strive for continuous improvement and build continuous integration and continuous delivery/deployment (CI/CD) pipelines.
  • Monitor processes throughout their lifecycle for adherence, and update or create processes to improve them and minimize lead time.
  • Identify and deploy security measures by continuously performing vulnerability assessments and risk management.
  • Handle critical incidents with swift risk management and root cause analysis.
  • Document processes for utilizing systems.
  • Work with both end users and downstream teams to understand customer requirements and project KPIs.
  • Monitor and measure customer experience and KPIs.
  • Coordinate and communicate within the team and with customers.
  • (Senior Role) Manage periodic progress reporting to management and customers.
  • (Senior Role) Manage stakeholders and external interfaces.
  • (Senior Role) Mentor and guide team members.
  • Perform other duties as assigned to meet the goals and objectives of the department and institution.

Requirements

  • Bachelor's degree in Computer Science, Engineering, Business or related field of study required; Master's or Doctorate degree preferred.
  • For Senior HPC DevOps Engineer:
      • Minimum of four (4) years of IT experience, including experience in infrastructure operations and engineering environments, preferably DevOps environments.
      • Experience in infrastructure maintenance, systems configuration, and system security management.
      • Experience in operations automation, IaaS development, or other large hardware/software system optimization.
      • Experience in customer service using commercial ticketing systems such as Jira or ServiceNow.
      • Technical skill to review, verify, and validate the software code developed in the project.
      • Some experience in business stakeholder engagement and management.
  • For HPC DevOps Engineer II:
      • Minimum of two (2) years of IT experience, including experience in infrastructure operations and engineering environments, preferably DevOps environments.
      • Some experience in infrastructure maintenance, systems configuration, and system security management.
      • Technical skill to review, verify, and validate the software code developed in the project.
      • Some experience working with business stakeholders to identify and document requirements.
      • Proven track record as a quick learner.

Benefits

  • Competitive salary range of $94,640 - $169,520 per year based on experience and qualifications.
  • Comprehensive health insurance coverage.
  • Retirement savings plan with 401(k) options.
  • Paid time off and holidays.
  • Opportunities for professional development and continuing education.