Oak Ridge National Laboratory - Oak Ridge, TN

posted about 1 month ago

Full-time - Senior
Oak Ridge, TN
Professional, Scientific, and Technical Services

About the position

The Senior HPC Systems Engineer at Oak Ridge National Laboratory is responsible for designing and deploying capabilities for next-generation leadership computing systems. This role involves performance analysis, system evaluation, and the implementation of solutions for complex technical problems in high-performance computing (HPC). The position is part of the Advanced Technologies Section within the National Center for Computational Sciences, which focuses on improving facility operations and supporting large-scale scientific HPC systems.

Responsibilities

  • Plan and lead the implementation, validation, and deployment of solutions for complex technical problems.
  • Conduct performance analysis for large-scale HPC workloads.
  • Evaluate scaling and performance of modern heterogeneous and extreme-scale HPC systems.
  • Manage operating and runtime systems for HPC environments.
  • Analyze large-scale network architectures and communication libraries.
  • Evaluate new computing technologies for suitability in large-scale HPC systems.
  • Guide future system design requirements and procurement strategies based on evaluations.
  • Analyze the design and deployment process for large-scale HPC systems and optimize deployment strategies.
  • Collaborate in authoring peer-reviewed papers, technical papers, reports, and proposals.
  • Support researchers in systems architecture and file and storage systems.

Requirements

  • A Ph.D., M.S., or B.S. in computer science, computer engineering, or a related field.
  • Familiarity with at-scale HPC system design and operations, including DOE ASCR facilities and vendor proposal evaluations.
  • 6+ years of relevant experience outside of education.

Nice-to-haves

  • Excellent interpersonal skills and strong oral and written communication skills.
  • Experience with modern software development practices.
  • Experience with Linux systems programming.
  • Experience with high-speed communication network systems.
  • Experience with heterogeneous systems design and deployment.
  • Familiarity with large-scale system design, deployment, and operations, including DOE ASCR facilities and the DOE 413.3B process.
  • Ability to map large-scale application requirements to hardware, software, and library design.

Benefits

  • Medical and retirement plans
  • Flexible work hours
  • On-site fitness facilities
  • Banking services
  • Cafeteria facilities
  • Prescription Drug Plan
  • Dental Plan
  • Vision Plan
  • 401(k) Retirement Plan
  • Contributory Pension Plan
  • Life Insurance
  • Disability Benefits
  • Generous Vacation and Holidays
  • Parental Leave
  • Legal Insurance with Identity Theft Protection
  • Employee Assistance Plan
  • Flexible Spending Accounts
  • Health Savings Accounts
  • Wellness Programs
  • Educational Assistance
  • Relocation Assistance
  • Employee Discounts