St. Jude Children's Research Hospital - Memphis, TN
posted 4 months ago
St. Jude Children's Research Hospital is seeking an HPC DevOps Engineer (Level II or Senior) to ensure the smooth operation of its High-Performance Computing (HPC) infrastructure. This role is critical in supporting the hospital's mission to advance cures and means of prevention for pediatric catastrophic diseases through research and treatment. The successful candidate will work closely with downstream operations teams to maintain the integrity and performance of the HPC systems, drawing on a deep understanding of both development and operations processes in an HPC environment. The position requires a strong technical background relevant to the hardware and software platforms hosted in the St. Jude data center.
The HPC DevOps Engineer will focus on integrated operations by using and developing automation tools for system configuration, testing, metric collection, monitoring, and self-sufficient actions. This role is pivotal in enabling the full cycle of DevOps operations, which includes setting up the necessary tools and hardware/software platforms; defining processes for development, testing, release, update, and support; and pursuing continuous improvement by establishing CI/CD pipelines. The engineer will also be responsible for monitoring processes throughout their lifecycle, ensuring adherence to defined protocols, and implementing improvements to minimize lead times.
In addition to technical responsibilities, the role involves identifying and deploying security measures through continuous vulnerability assessments and risk management. The engineer will handle critical incidents with effective risk management and root cause analysis, document processes for system utilization, and collaborate with end users and downstream teams to understand customer requirements and project KPIs.
For senior-level candidates, responsibilities will also include managing periodic reporting to management and customers, stakeholder engagement, and mentoring team members. The position requires regular and predictable attendance to meet the goals and objectives of the department and institution.