Hire It People - Austin, TX

posted 15 days ago

Full-time - Mid Level
Austin, TX
Professional, Scientific, and Technical Services

About the position

The position is for a Big Data Engineer specializing in Machine Learning Operations (MLOps) and Site Reliability Engineering (SRE). The role focuses on managing the end-to-end machine learning lifecycle within an in-house Kubernetes cluster, ensuring the stability and availability of production services, and fostering a culture of continuous improvement in incident resolution processes.

Responsibilities

  • Manage the end-to-end machine learning lifecycle on the in-house Kubernetes cluster.
  • Ensure the stability and availability of production services.
  • Handle incident resolution and maintain documentation as needed.

Requirements

  • 7+ years of experience in a relevant role.
  • Solid understanding of AI/ML, Jupyter Notebook, and Jenkins.
  • Basic understanding of Kubernetes (K8s) and experience with Kubeflow for MLOps.

Nice-to-haves

  • Experience working with geographically distributed teams.
  • Familiarity with operational best practices in MLOps.

Benefits

  • Competitive salary based on experience.
  • Opportunity to work on cutting-edge technology in AI/ML.