AI/ML Engineer - GCP

$97,000 - $169,000/Yr

Publicis Groupe - Dallas, TX

posted 8 days ago

Full-time - Mid Level
Dallas, TX
10,001+ employees
Professional, Scientific, and Technical Services

About the position

The Platform Engineer - AI & GPU Services is responsible for implementing and maintaining AI/ML platforms and managing GPU resources across cloud (GCP) and on-premises infrastructure. The role requires expertise in cloud services, AI/ML technologies, and infrastructure automation to support product and platform engineering functions, with a focus on generative AI services and container orchestration.

Responsibilities

  • Architect, build, and maintain AI/ML platforms using Google Cloud Platform (GCP) services like Compute, Storage, IAM, and VPC.
  • Manage NVIDIA GPU resources across projects using Run:ai or similar tools.
  • Develop and maintain MLOps pipelines on platforms like Vertex AI, supporting AI/ML model training and deployment.
  • Write Python scripts for model development, automation, and infrastructure management.
  • Use Terraform for Infrastructure as Code (IaC) to automate provisioning and deployment of cloud resources.
  • Deploy and manage AI/ML models on container orchestration platforms such as OpenShift and GKE.
  • Collaborate with AI teams to facilitate LLM deployment (e.g., Llama, Mistral) and GPU utilization.
  • Automate and enhance CI/CD pipelines for seamless integration and deployment of services.
  • Monitor performance and capacity with Prometheus, Grafana, and other observability tools to ensure system stability.
  • Apply DevOps practices, including containerization, orchestration, and infrastructure management.

Requirements

  • Strong experience with Google Cloud Platform (GCP) and its core services (Compute, Storage, IAM, VPC).
  • Experience with GPU resource management tools (e.g., Run:ai).
  • Proficiency with Python for AI/ML workflows and automation.
  • Hands-on experience with MLOps platforms like Vertex AI.
  • Experience with Terraform for managing cloud infrastructure using Infrastructure as Code (IaC) practices.
  • Knowledge of Kubernetes and container orchestration platforms such as OpenShift and GKE.
  • Familiarity with monitoring and logging tools like Prometheus, Grafana, and the ELK Stack.
  • Proven track record of working with CI/CD pipelines and DevOps automation tools.

Benefits

  • Flexible vacation policy; time is not limited, allocated, or accrued
  • 16 paid holidays throughout the year
  • Generous parental leave and new parent transition program
  • Tuition reimbursement
  • Corporate gift matching program