Southern Methodist University • Posted about 2 months ago
Full-time • Mid Level
Dallas, TX
Educational Services

About the position

This role is an on-campus, in-person position. Dedicated to supporting SMU's research community, the Senior System Administrator for High Performance Computing (HPC) works exclusively to design, build, maintain, operate, and manage HPC systems at SMU. As a member of a two-person HPC systems infrastructure team, this position shares responsibility for university HPC technical support and also assists with Enterprise Linux support. It provides hardware, software, and end-user support for SMU's growing number of research faculty and for center compute resources dedicated to advancing SMU research. The successful candidate demonstrates advanced knowledge of all the technical tools required to perform the job, is a subject matter expert in primary areas of support, can solve complex problems crossing multiple research disciplines with little or no escalation support, and serves as an effective technical resource to others in resolving problems and implementing projects.

Responsibilities

  • Design, plan, deploy, and administer HPC services for research at SMU, and troubleshoot related issues.
  • Install and maintain cluster environments and provision systems using automated installation methods.
  • Manage/maintain Lustre parallel file system and NFS storage.
  • Manage/maintain InfiniBand high performance interconnect fabric.
  • Configure, manage, and monitor the SLURM scheduling and queuing system.
  • Develop/maintain programs/scripts that aid in operation and automation of administrative tasks using various shell and scripting languages (bash, Perl, Python).
  • Compile, install, and port software needed by SMU researchers.
  • Build and deploy open source and vendor/commercial software required by researchers.
  • Plan projects, communicate with end users and management, provide updates, and manage expectations.
  • Document all configurations, procedures, and changes.
  • Diagnose and resolve system and operational problems with research systems.
  • Work with researchers and constituents to diagnose and optimize workloads.
  • Participate in on-call support of research infrastructure.
  • Coordinate with vendors to resolve hardware and software problems.
  • Ensure hardware, firmware, and software on HPC research systems are kept at appropriate revision levels.
  • Keep current with research computing, HPC technology trends and best practices.

Requirements

  • Bachelor's degree is required.
  • A minimum of six years of full time Linux system administration experience in a large computing environment is required.
  • Candidate must demonstrate clear, professional communication to work with team members and customers of diverse technical abilities.
  • Candidate must demonstrate strong written communication skills.
  • Candidate must possess strong problem-solving skills with the ability to identify and analyze problems, as well as devise solutions.
  • Candidate must have strong organizational, planning and time management skills.

Nice-to-haves

  • Experience with Nvidia DGX, containers, and Kubernetes.
  • Knowledge of reporting tools, including XDMoD.
  • Work experience installing and maintaining clustered environments and provisioning systems using automated installation methods.
  • Direct experience working with InfiniBand and knowledge of configuration and management of SLURM or other scheduling and queuing systems.
  • Familiarity with DDN hardware and the Lustre file system.
  • Proficiency in supporting Nvidia/Mellanox InfiniBand networks.
  • Competence with Bright Cluster Manager.
  • Knowledge of Nvidia DGX systems.

Benefits

  • Broad, competitive array of health and related benefits.
  • Traditional benefits such as health, dental, and vision plans.
  • Wide range of wellness programs.
  • Array of retirement programs.
  • Access to a wide variety of professional and personal development opportunities, including tuition benefits.

Job Keywords

Hard Skills
  • InfiniBand
  • Kubernetes
  • Linux
  • Perl
  • Python