National Renewable Energy Laboratory - Golden, CO

Full-time - Entry Level
Golden, CO
Professional, Scientific, and Technical Services

About the position

The National Renewable Energy Laboratory (NREL) is seeking a full-time Machine Learning Research Engineer to join the AI, Learning, and Intelligent Systems (ALIS) Group within the Computational Science Center. This position is focused on the development of large-scale foundation models aimed at scientific discovery. The successful candidate will leverage modern machine learning techniques to address significant challenges in science, contributing to NREL's mission of advancing renewable energy and energy efficiency technologies. This role is remote-friendly and provides access to state-of-the-art high-performance computing (HPC) systems equipped with the latest hardware accelerators for distributed computing.

In this position, the engineer will be responsible for designing, developing, configuring, debugging, deploying, and supporting frameworks and workflows on scalable, accelerated computing hardware and cloud architectures. The role involves close collaboration with research scientists to implement, train, and deploy large-scale machine learning models, as well as assisting in the publication of research findings. The engineer will also work alongside computing professionals to optimize the efficiency of model training and inference on advanced computing hardware.

Candidates should possess a relevant technical background in machine learning, with experience in implementing, training, and deploying models on high-performance systems or cloud environments. Familiarity with large language models is preferred, and a genuine interest in AI for science and NREL's mission is essential. NREL values diverse perspectives and encourages individuals from various backgrounds to apply, even if they do not meet every listed expectation.

Responsibilities

  • Design, develop, configure, debug, deploy, and support frameworks and workflows on scalable, accelerated computing hardware and cloud architectures centered on modern machine learning.
  • Collaborate closely with research scientists to implement, train, and deploy large-scale machine learning models for scientific discovery.
  • Assist in running experiments involving the deployment of machine learning models via containerized workflows.
  • Work with computing professionals to optimize large-scale model training and inference efficiency on advanced computing hardware.

Requirements

  • Relevant Bachelor's Degree and 2 or more years of experience, or a relevant Master's Degree, or equivalent relevant education/experience.
  • General knowledge and application of standards, principles, concepts, and techniques in the specific field.
  • Skilled in analytical techniques and practices, and problem-solving.
  • Skilled in written and verbal communication.
  • Intermediate programming ability with various computer software programs and information systems.
  • Proficient with Git and GitHub.
  • Experience maintaining large, open-source software projects.
  • Familiarity with PyTorch and TensorFlow.

Nice-to-haves

  • Ph.D. in computer science, applied math, a related field, or BSc/MSc plus relevant experience.
  • Familiarity with cloud platforms and services (e.g., AWS, Google Cloud, Azure) for deploying machine learning models.
  • Experience with containerization and scaling tools such as Docker and Apptainer.
  • Background and experience in managing software dependencies in HPC and cloud environments with common tools such as Modules, Conda, Spack, and containerization.
  • Experience with distributed training frameworks and tools such as DeepSpeed, Hugging Face Transformers and Accelerate, Megatron, etc.
  • Experience writing custom GPU code with CUDA and/or ROCm.

Benefits

  • Medical, dental, and vision insurance
  • Short- and long-term disability insurance
  • Pension benefits
  • 403(b) Employee Savings Plan with employer match
  • Life and accidental death and dismemberment (AD&D) insurance
  • Personal time off (PTO) and sick leave
  • Paid holidays
  • Tuition reimbursement