Senior AI and Machine Learning Engineer

$128,000 - $295,000/Yr

Hewlett Packard Enterprise

posted 2 months ago

Full-time - Senior

Remote

5,001-10,000 employees

Computer and Electronic Product Manufacturing

About the position

Hewlett Packard Enterprise (HPE) is seeking a Senior AI and Machine Learning Engineer to join our High Performance Computing, AI and Labs team. This role is designated as ‘Remote/Teleworker’, allowing you to primarily work from home. HPE is a global edge-to-cloud company that is committed to advancing the way people live and work. We focus on delivering innovative solutions that accelerate our customers' digital transformation, enabling them to tackle complex, data-intensive workloads. Our team combines deep expertise with the development of cutting-edge supercomputers, defining the next era of computing and delivering valuable insights and innovations.

As a Senior AI and Machine Learning Engineer, you will play a critical role in this mission, working on high-performance computing systems and AI workloads. In this position, you will be responsible for installing and configuring complex IT infrastructure components, including servers, storage, and networks. You will develop software scripts and configurations to automate deployment processes and study the performance of Large Language Models running on HPE GPU servers. Your role will involve performing system-level analysis of server workloads across various HPE platforms, including those running deep learning and machine learning code, utilizing accelerated hardware and high-speed networks like InfiniBand. You will also write white papers and guidance documents for AI workload and model selection, capturing and reviewing system performance data to understand workload behavior. Additionally, you will communicate technical work effectively to non-technical colleagues and provide guidance to less-experienced staff members.

This position requires a Master's degree or PhD in Computer Science, Engineering, Information Technology, or a relevant field, along with typically 3+ years of experience in the field. You will need to have a strong background in Machine Learning and Artificial Intelligence, experience with containers and distributed deep learning, and familiarity with High Performance Computer Servers and Networking. Programming experience in languages such as Python, C, C++, and Fortran is strongly desired. Strong analytical and critical thinking skills are essential, as well as the ability to work independently in a semi-remote setting. HPE values diversity and inclusion, and we are committed to creating a workplace that reflects a variety of backgrounds and perspectives.

Responsibilities

Installs and configures complex IT infrastructure components (servers, storage, network)
Develops software scripts and configurations for automating deployment
Studies and improves the performance of Large Language Models running on HPE GPU servers
Performs system-level analysis of server workloads on various HPE platforms running DL and ML code, including accelerated hardware and high-speed networks like InfiniBand
Writes white papers and other guidance documents for AI workload and model selection
Captures and reviews system performance data, logs, and traces to understand workload behavior
Develops software and scripts that help analyze AI workload performance data
Communicates technical work well and can provide summaries of work to non-technical colleagues
Works with software and hardware partners in optimizing systems and resolving performance issues
Documents and reports issues discovered when testing and evaluating the systems
Communicates project status and concerns to management in a timely manner
Provides guidance to less-experienced staff members

Requirements

Master's degree or PhD in Computer Science, Engineering, Information Technology or Systems, or relevant field
Typically 3+ years of experience
3+ years of experience in Machine Learning/Artificial Intelligence
Experience working with containers, distributed deep learning, and neural networks, including transformers used in generative AI projects
Experience working with High Performance Computer Servers, High Performance Networking, and associated software
Experience working with Weka I/O, NFTS and Lustre File Systems
Programming experience in Python, C, C++, and Fortran is strongly desired
Strong analytical and critical thinking skills
Must be a self-starter, able to work with minimal supervision in a semi-remote setting

Nice-to-haves

Artificial Intelligence Technologies
Cross Domain Knowledge
Data Engineering
Data Science
Design Thinking
Development Fundamentals
Full Stack Development
IT Performance
Machine Learning Operations
Scalability Testing
Security-First Mindset

Benefits

Health & Wellbeing benefits
Personal & Professional Development programs
Diversity, Inclusion & Belonging initiatives
