Paramount
Posted 11 days ago
$74,400 - $150,000/Yr
Full-time - Mid Level
NY

About the position

We are looking for a DevOps Engineer - Personalization Services to join our Applied Intelligence Personalization Team. This role will focus on infrastructure support for ML applications running on Kubernetes and Ray Clusters, ensuring high-performance computing for content recommendations and personalization services. The ideal candidate will have 2+ years of experience working with Kubernetes, TensorFlow (TF), Prometheus, and high-performance parallel computing for scaling ML workloads efficiently.

Responsibilities

  • Design, implement, and manage scalable and reliable infrastructure for ML-based personalization services.
  • Optimize Kubernetes-based deployments for ML training and inference workloads.
  • Manage and scale Ray Clusters for distributed ML workloads and parallel computing.
  • Automate CI/CD pipelines to streamline the deployment of ML models and services.
  • Develop observability and monitoring solutions using tools like Prometheus, Datadog, and OpenTelemetry.
  • Ensure high availability, security, and performance of ML infrastructure.
  • Work with ML engineers and backend teams to integrate scalable model training and inference systems.
  • Implement autoscaling strategies for Ray-based workloads based on computational demand.
  • Optimize ML infrastructure for TensorFlow model training and serving.
  • Debug and resolve production issues related to latency, scaling, and reliability.

Requirements

  • 2+ years of experience in DevOps, Site Reliability Engineering (SRE), or Cloud Infrastructure Engineering, along with strong knowledge of GCP, AWS, or Azure.
  • Hands-on experience with online inference, expertise in TensorFlow model training and serving, and experience with high-performance parallel computing architectures.
  • Hands-on experience with CI/CD tools such as GitHub Actions, Jenkins, or GitLab CI; strong experience with Kubernetes and container orchestration; and expertise in infrastructure as code (IaC) using Terraform or Helm.

Nice-to-haves

  • Experience with message queues and event-driven architectures (Pub/Sub, Kafka, etc.).
  • Proficiency in monitoring and logging solutions (Datadog, Prometheus, OpenTelemetry, etc.).
  • Strong scripting skills in Python, Bash, or Go for automation.
  • Hands-on experience with ML model serving frameworks (TensorFlow Serving, Triton, TorchServe, etc.).
  • Familiarity with load balancing, API gateways, and caching strategies.
  • Experience optimizing low-latency microservices for ML-based personalization.
  • Familiarity with Argo CD.
  • Passion for building and maintaining high-performance infrastructure for large-scale ML applications.

Benefits

  • A culture of learning and a team passionate about innovative ML infrastructure and DevOps standard processes.
  • A collaborative team environment where engineering supports real-time personalization.
  • A remote-friendly work setup with opportunities to work on scalable ML training and inference systems.
  • Medical, dental, vision, 401(k) plan, life insurance coverage, disability benefits, tuition assistance program and PTO.
  • This position is bonus eligible.

Hard Skills

  • TensorFlow: 5
  • Kubernetes: 4
  • Prometheus: 2
  • Argo CD: 1
  • Bash: 1