This job is closed

Oak Ridge National Laboratory - Oak Ridge, TN

posted about 2 months ago

Full-time - Mid Level
Oak Ridge, TN
51-100 employees
Professional, Scientific, and Technical Services

About the position

The Linux HPC Systems Engineer at Oak Ridge National Laboratory (ORNL) is responsible for designing, operating, and maintaining high-performance computing (HPC) clusters and systems that support scientific research. This role is part of the Emerging Technologies & Computing Group, which aims to facilitate research excellence by providing robust HPC systems engineering, integration, and support. The engineer will collaborate with research organizations to optimize workflows, ensure system performance, and promote HPC services, all while adhering to the lab's core values of impact, integrity, teamwork, safety, and service.

Responsibilities

  • Design, operate, and maintain HPC clusters and servers to support scientific research.
  • Advocate and promote HPC services to researchers handling large data sets.
  • Ensure the availability, performance, scalability, and security of production systems.
  • Leverage automation and monitoring solutions to optimize system management practices.
  • Collaborate with technical points of contact to install and tune scientific toolsets for performance.
  • Optimize workflows and monitoring solutions to reduce off-hours support needs.

Requirements

  • Bachelor's degree in computer science, computer engineering, information technology, or a related field.
  • 2 to 4 years of relevant experience in systems engineering or administration.
  • 1+ year managing UNIX/Linux systems.
  • 1+ year utilizing configuration management and automation tools such as Git, Jenkins, Ansible, or Puppet.
  • Proficiency in at least one scripting language such as Bash or Python.
  • Experience in troubleshooting and system administration with Linux servers.
  • Experience supporting large data systems.

Nice-to-haves

  • Understanding of multiple operating systems and cluster technologies.
  • Experience with CentOS/RHEL, Ubuntu, and VMware.
  • Understanding of HPC platforms and SLURM job submissions.
  • Experience with containerized applications in an HPC environment.
  • Knowledge of networking fundamentals including TCP/IP and network diagnostics.
  • Experience with high-performance parallel file systems such as Lustre or GPFS.
  • Experience with monitoring systems like Grafana or Nagios.

Benefits

  • Medical and retirement plans
  • Flexible work hours
  • On-site fitness facilities
  • Banking and cafeteria services
  • Prescription Drug Plan
  • Dental Plan
  • Vision Plan
  • 401(k) Retirement Plan
  • Contributory Pension Plan
  • Life Insurance
  • Disability Benefits
  • Generous Vacation and Holidays
  • Parental Leave
  • Legal Insurance with Identity Theft Protection
  • Employee Assistance Plan
  • Flexible Spending Accounts
  • Health Savings Accounts
  • Wellness Programs
  • Educational Assistance
  • Relocation Assistance
  • Employee Discounts