University of California - Los Angeles, CA
The HPC System Administrator position at UCLA's Office of Advanced Research Computing (OARC) is a critical role that supports the university's mission of education, research, and service through innovative technology practices.

The OARC High Performance Computing (HPC) Systems Research Technology Group (RTG) supports thousands of UCLA researchers and over 300 research groups. It does so through consultation and by operating the Hoffman2 High Performance Research Cluster, which consists of approximately 1,000 compute nodes, GPU nodes, high-speed networking, high-performance storage, and extensive hardware and software support infrastructure across multiple data centers.

As an HPC System Administrator, you will serve as a technical expert in systems and application software development, HPC cluster system administration, and management of the backup system environment. The role requires a strong understanding of HPC cluster architectures and concepts, and the ability to stay current with industry best practices. You will be expected to work independently or as part of a development team, to estimate accurately the time and effort required to complete tasks, and to analyze, benchmark, debug, and test software in a technically sound manner.

The position requires the ability to communicate effectively with diverse stakeholders, including researchers, peers, and management, and to write well-organized, complete, and technically correct documents and procedures. You will also need to demonstrate problem-solving skills, the ability to prioritize tasks, and the ability to manage projects effectively.

Flexibility in work schedules may be considered based on operational needs. The position requires working from UCLA's Westwood campus as operational demands dictate.