Kforce - Boulder, CO

posted about 1 month ago

Full-time - Mid Level
Boulder, CO
Administrative and Support Services

About the position

The HPC Linux Systems Administrator will support the National Oceanic and Atmospheric Administration (NOAA) by maintaining the operational readiness and availability of its high-performance computing (HPC) systems. The role involves managing new technology insertions, providing remote technical support, and collaborating with other NOAA sites. The position requires hands-on technical expertise and the ability to troubleshoot and resolve operational issues effectively.

Responsibilities

  • Maintain operational readiness and availability of NOAA's high-performance computing systems.
  • Manage and support new technology insertions.
  • Provide remote technical support and collaborate with other NOAA sites.
  • Develop and deploy monitoring capabilities for HPC systems.
  • Implement tools for cluster administration.
  • Perform hardware break/fix support, including node and board-level replacements.
  • Manage spare-parts inventories and track vendor RMAs (return merchandise authorizations).
  • Improve the online documentation repositories for users and system administrators.
  • Support HPC system users through a helpdesk ticketing system.

Requirements

  • Bachelor's degree, or 8+ years of experience in systems administration or IT support.
  • Hands-on experience with computer hardware maintenance and troubleshooting.
  • Programming or scripting knowledge in at least one language (e.g., Bash, Perl, Python).
  • Experience deploying and managing large-scale HPC systems using OS provisioning tools (e.g., xCAT, Warewulf).
  • Experience using configuration management tools (e.g., Ansible, Puppet).
  • Linux system administration experience (e.g., Red Hat Enterprise Linux or Rocky Linux).
  • Batch job scheduling and workload management experience, preferably with Slurm.
  • Network interconnect configuration and monitoring experience (e.g., InfiniBand, Ethernet).
  • Strong writing skills for technical documents and user documentation.

Nice-to-haves

  • Team player with the ability to work in diverse technical support environments.
  • Resourceful, with the initiative to troubleshoot independently.
  • Willingness to learn and apply new knowledge.
  • Disciplined troubleshooting skills with creative problem-solving abilities.
  • Attention to detail, strong time management, and analytical thinking.

Benefits

  • Medical, dental, and vision insurance
  • Health Savings Account (HSA)
  • Flexible Spending Account (FSA)
  • 401(k) plan
  • Life, disability, and AD&D insurance
  • Paid time off for salaried personnel
  • Paid sick leave for hourly employees on Service Contract Act projects