Texas A&M University - Kingsville, TX

posted 26 days ago

Full-time - Mid Level
Kingsville, TX
5,001-10,000 employees
Educational Services

About the position

The High-Performance Computing (HPC) Engineer at Texas A&M University - Kingsville is responsible for the design, development, and operational management of the institution's high-performance computing resources. The role involves collaborating with faculty, researchers, and students to support their computational research projects, ensuring that the HPC infrastructure meets their needs, and optimizing computational methods to support research across disciplines.

Responsibilities

  • Design and implement HPC infrastructure, including compute clusters, storage, and interconnects.
  • Evaluate and integrate advancements in HPC, cloud, and storage technology.
  • Manage and optimize HPC clusters, addressing hardware, software, and networking issues.
  • Perform system administration tasks on HPC clusters, including configuration, maintenance, and troubleshooting.
  • Monitor performance, troubleshoot, and implement security measures.
  • Provide technical support and training for researchers on HPC tools and best practices.
  • Organize training sessions and workshops on HPC best practices and programming techniques.
  • Collaborate with researchers on computational strategies and code optimization.
  • Represent the department in strategic planning and advisory roles.
  • Guide IT strategies to support teaching, research, and service goals.
  • Collaborate with and advise the CIO and other executive staff on IT needs.
  • Establish information technology strategy and direction to achieve university goals.
  • Deploy and maintain scientific software and development tools.
  • Develop scripts and tools to automate tasks and enhance workflows.
  • Regularly review and document disaster recovery and business continuity procedures.
  • Assess HPC utilization, lifecycle, and performance for improvement opportunities.
  • Design, test, and verify the disaster recovery plan.
  • Serve as lead administrator for campus HPC systems and document performance analyses.
  • Identify and implement solutions to advance computational research.
  • Develop policies for data integrity, backup, and availability.
  • Design scalable storage solutions for efficient data access and integration.
  • Build partnerships with industry, academic institutions, and HPC networks.
  • Create training programs and documentation to support organizational needs.

Requirements

  • Bachelor's degree or an equivalent combination of education and experience.
  • Six years of related experience in high-performance computing.

Nice-to-haves

  • Master's degree in Computer or Computational Science, Statistics, or Engineering.
  • Ten or more years of hands-on HPC experience in system administration and management of large-scale supercomputing clusters.
  • Five years of management and leadership experience in HPC or research computing centers.
  • Experience with computing clusters in Windows, Linux, and virtualized environments.
  • Knowledge of scripting languages like Bash, Python, and Perl.
  • Knowledge of C/C++, Fortran, CUDA, OpenCL, OpenMP, and MPI for scientific computing.
  • Experience with configuration management tools like Puppet, Chef, Ansible, Salt.
  • Knowledge of container technologies such as Docker, Singularity, and Kubernetes.

Benefits

  • Salary commensurate with experience and qualifications.