University of Chicago - posted 10 months ago
$83,750 - $107,500/Yr
Full-time
Chicago, IL

The University of Chicago Research Computing Center (RCC) is seeking a qualified Jr. HPC System Administrator & Programmer to join its Systems and Operations Team, which manages and supports an ecosystem of HPC systems and services. The individual in this position will contribute to ongoing efforts to streamline RCC processes, maintain the backend tools of the HPC environment, develop automated workflows that support system administration, and improve the ways in which RCC enables transformational computational research at the University of Chicago.

The job duties primarily include developing and maintaining backend software and deployment automation for the systems in the RCC environment. The Jr. HPC System Administrator & Programmer will also work closely with the application development team to consolidate continuous integration and continuous deployment (CI/CD) approaches and to support faculty projects. The ideal candidate will have a strong technical background in programming and HPC and an analytical mind, and will be comfortable working as part of a team.

The job participates in the design of automated, scalable, and rapidly deployable solutions for systems infrastructure and server configuration. The role installs, configures, and maintains operating systems, monitoring and alerting systems, utility software, and firewalls, and plans and executes hands-on maintenance for production servers as well as Windows and Linux servers.

This is a hybrid position requiring at least 3 days a week onsite.

  • Work with moderate guidance to administer simple systems and assist in the administration of larger systems in an HPC environment, including both software and hardware.
  • Install, design, configure and maintain tools and scripts that are used for systems provisioning and configuration management.
  • Develop and maintain system software to automate operations such as management of HPC user accounts and resource allocations (i.e., computing cycles and storage quotas).
  • Maintain and further develop database-backed solutions and software to track and monitor HPC inventory including servers, network devices, compute nodes, and their respective details (specifications, locations, warranty status and renewals, health status, etc.).
  • Design and develop tools to automate tasks such as the collection of metrics and usage information and the backup of research data to different storage tiers.
  • Identify and apply security patches and upgrades.
  • Execute benchmarks and create a benchmark performance database.
  • Assist with the implementation, integration, administration and maintenance of security and infrastructure monitoring solutions and dashboards by developing tools and scripts, and also by leveraging existing open-source and commercial solutions.
  • Design and develop tools and metrics to assist RCC leadership with visualizing, analyzing, and reporting usage information and other system statistics.
  • Assist with deployment, configuration and customization of applications commonly used to support an academic HPC environment such as XDMoD, Open OnDemand, ColdFront, etc.
  • Proactively troubleshoot issues, and respond to complex user support requests.
  • Create and maintain documentation related to the tools and solutions developed and to system administration procedures.
  • Work with other internal teams to provide and gather feedback regarding user support and service delivery, and identify and foster opportunities for improvement.
  • Assist with maintaining a knowledge base of useful systems-related information and standard operating procedures that other internal teams can consult when providing user support.
  • Become involved with mentoring students and interns working in the Systems team.
  • Contribute to developing software, tools and/or platforms for the reproducibility of scientific research.
  • Maintains complex system and network administration functions.
  • Works with moderate guidance to administer simple systems and assists in the administration of larger systems.
  • Ensures integrity by implementing appropriate routine software and hardware solutions.
  • Conducts routine hardware and software audits of workstations, backing up all information.
  • Performs other related work as needed.
  • Minimum requirements include a college or university degree in a related field.
  • Minimum requirements include knowledge and skills developed through 2-5 years of work experience in a related job discipline.
  • Master’s in Computer Science or closely related field.
  • Minimum of two years' experience working with HPC systems or equivalent experience.
  • Experience with basic system configuration, fluent use of the command-line interface, and experience building and installing software.
  • Experience with Python programming, including various packages for data processing (e.g., NumPy, SciPy, pandas, Matplotlib).
  • Experience with shell scripting (Bash).
  • Experience with open-source SQL databases (deployment, configuration, modeling, access).
  • Experience with development in a Linux environment, version control using Git, and GitLab/GitHub development practices.
  • Experience with container technology (Docker, Kubernetes).
  • Experience with automation and configuration management tools (Ansible, Puppet).
  • Experience implementing automation and monitoring of infrastructure and systems.
  • Experience reading, modifying, and porting existing Perl scripts.
  • Experience in setting up and executing benchmarks in an HPC environment and analyzing their results systematically.
  • Experience in creating and maintaining documentation that describes implemented solutions and standard operating procedures.
  • The University of Chicago offers a wide range of benefits programs and resources for eligible employees, including health, retirement, and paid time off.