Senior AI and Machine Learning Engineer - HPE1US1176368EXTERNALENUS

$73,000 - $145,000/Yr

Hewlett Packard Enterprise - Houston, TX

posted about 2 months ago

Full-time - Senior

Onsite - Houston, TX

Computer and Electronic Product Manufacturing

About the position

Hewlett Packard Enterprise (HPE) is seeking a Senior AI and Machine Learning Engineer to join our High Performance Computing, AI and Labs team. This role is primarily remote, allowing you to work from home while contributing to innovative solutions that accelerate our customers' digital transformation. As a global edge-to-cloud company, HPE is dedicated to helping organizations connect, protect, analyze, and act on their data and applications, enabling them to derive insights and outcomes swiftly in today's complex environment. Our culture is built on collaboration, diversity, and the pursuit of excellence, making it an ideal place for professionals looking to grow their careers.

In this position, you will focus on enhancing the performance of Large Language Models on HPE GPU servers, conducting system-level analyses of HPC and AI workloads across various HPE platforms. You will run machine learning and deep learning code on advanced hardware, including NVIDIA and AMD GPUs, and high-speed networks like InfiniBand. Your responsibilities will also include developing software and scripts to automate AI workloads, installing and configuring complex IT infrastructure components, and documenting performance data to understand workload behavior. You will communicate your findings effectively to both technical and non-technical colleagues, mentor junior staff, and collaborate with software and hardware partners to optimize systems and resolve performance issues.

This role requires a strong educational background, typically a Master's degree or PhD in Computer Science, Engineering, Information Technology, or a related field, along with at least three years of relevant experience in machine learning and artificial intelligence. You will need proficiency in AI and machine learning frameworks such as TensorFlow, PyTorch, and ONNX, as well as experience with high-performance computing servers and networking. Strong analytical skills and the ability to work independently in a semi-remote setting are essential for success in this role.

Responsibilities

Studies and improves performance of Large Language Models running on HPE GPU servers
Performs system-level analysis of HPC & AI workloads on various HPE platforms
Runs ML/DL code on accelerated hardware like NVIDIA and AMD GPUs and high-speed networks like InfiniBand
Develops software and scripts to automate AI workloads and analyze performance data
Installs and configures complex IT infrastructure components (servers, storage, network)
Writes white papers and other guidance documents for AI workload and model selection
Captures and reviews system performance data, logs, and traces to understand workload behavior
Communicates technical work clearly and presents it to non-technical colleagues
Works with software and hardware partners in optimizing systems and resolving performance issues
Documents and reports issues when testing and evaluating systems
Communicates project status and concerns to management in a timely manner
Mentors less-experienced staff members

Requirements

Master's degree or PhD in Computer Science, Engineering, Information Technology or Systems, or a related field
Typically 3 years of experience in Machine Learning/Artificial Intelligence
Proficiency in one or more AI & Machine Learning frameworks or libraries (TensorFlow, PyTorch, ONNX, DeepSpeed, Horovod, TensorRT, NeMo)
Experience with containers, distributed deep learning, and neural networks, including transformers used in generative AI projects
Experience with high-performance computing servers, high-performance networking, and associated software
Experience with Weka I/O, NTFS and Lustre File Systems
Programming experience in Python or C/C++ is strongly desired
Strong analytical and critical thinking skills
Must be a self-starter, able to work with minimum supervision in a semi-remote setting

Nice-to-haves

Artificial Intelligence Technologies and performance benchmarking
Cross Domain Knowledge
Data Engineering
Data Science
Design Thinking
Development Fundamentals
Full Stack Development
IT Performance
Machine Learning Operations
Scalability Testing
Security-First Mindset

Benefits

Comprehensive suite of benefits supporting physical, financial, and emotional wellbeing
Programs for personal and professional development
Flexibility to manage work and personal needs
Inclusive work environment celebrating individual uniqueness
