Vertiv - Westerville, OH
posted 4 months ago
As an LLMOps Engineer - Cloud/Gen AI, you will play a crucial role in building and maintaining the infrastructure and pipelines for cutting-edge Large Language Models (LLMs). The position requires close collaboration with Generative AI Architects to ensure the efficiency, scalability, and reliability of generative AI models in production, and your expertise in automating and streamlining the LLM lifecycle will be instrumental in achieving these goals. The role is based onsite at Vertiv's Westerville, OH headquarters.

You will conceptualize, develop, and execute ML/LLM pipelines tailored to LLMs, covering data acquisition, pre-processing, model training and tuning, deployment, and monitoring. You will use automation tools such as GitOps, CI/CD pipelines, and containerization technologies like Docker and Kubernetes to streamline tasks across the LLM lifecycle. Establishing robust monitoring and alerting systems will be essential for tracking model performance, data drift, and other key metrics, allowing you to proactively identify and resolve issues. You will also perform truth analysis, comparing LLM outputs against known, accurate data to assess their accuracy and effectiveness.

Collaboration is key in this role: you will work closely with infrastructure and DevOps teams, as well as Generative AI Architects, to optimize model performance and resource utilization. You will also oversee and maintain cloud infrastructure (e.g., AWS, Azure) for LLM workloads, ensuring cost-efficiency and scalability. Staying current with the latest advancements in LLMOps and integrating them into generative AI platforms and processes will be crucial, as will communicating effectively with both technical and non-technical stakeholders to provide updates on the performance and status of the models in production.