This job is closed

Chan Zuckerberg Initiative - Redwood City, CA

posted about 2 months ago

Full-time - Mid Level
Redwood City, CA
Professional, Scientific, and Technical Services

About the position

The position involves building and maintaining AI/ML and Data Engineering infrastructure solutions at the Chan Zuckerberg Initiative (CZI). The role focuses on creating scalable, efficient, and secure systems that support the work of researchers and engineers in the fields of education and science. The individual will collaborate with various teams to design and implement complex systems that integrate with large-scale AI/ML GPU compute infrastructure, enhancing the capabilities of CZI's technology initiatives.

Responsibilities

  • Participate in the technical design and building of efficient, stable, performant, scalable, and secure AI/ML and Data infrastructure engineering solutions.
  • Engage in active hands-on coding for Deep Learning and Machine Learning models.
  • Design and implement complex systems that integrate with large-scale AI/ML GPU compute infrastructure and platforms.
  • Work on containerized applications and infrastructure using Kubernetes to support large-scale GPU research clusters.
  • Collaborate with team members to design and build cloud-based AI/ML platform solutions, including Databricks Spark and Weaviate vector databases.
  • Assist in data management solutions across a heterogeneous collection of complex datasets.
  • Build tooling that optimally utilizes shared infrastructure for AI/ML efforts.

Requirements

  • BS or MS degree in Computer Science or a related technical discipline, or equivalent experience.
  • 5+ years of relevant coding experience.
  • 3+ years of systems architecture and design experience across Data, AI/ML, Core Infrastructure, and Security Engineering.
  • Experience in scaling containerized applications on Kubernetes or Mesos, with expertise in creating custom containers using secure AMIs and continuous deployment systems.
  • Proficiency with Amazon Web Services (AWS), Google Cloud Platform (GCP), or Microsoft Azure, and experience with on-premises and colocation hosting environments.
  • Proven coding ability with a systems language such as Rust, C/C++, C#, Go, Java, or Scala.
  • Experience with a scripting language such as Python, PHP, or Ruby.
  • AI/ML platform operations experience in demanding data and systems platform environments, including large-scale Kafka and Spark deployments.
  • MLOps experience working with medium- to large-scale GPU clusters in Kubernetes (Kubeflow), HPC environments, or large-scale cloud-based ML deployments.
  • Working knowledge of NVIDIA CUDA and custom AI/ML libraries.
  • Knowledge of Linux systems optimization and administration.
  • Understanding of Data Engineering, Data Governance, Data Infrastructure, and AI/ML execution platforms.

Nice-to-haves

  • Experience with PyTorch, Keras, or TensorFlow.
  • Experience with HPC and Slurm.

Benefits

  • Generous employer match on employee 401(k) contributions.
  • Annual benefit that employees can use toward housing, student loan repayment, childcare, commuter costs, or other life needs.
  • CZI Life of Service Gifts awarded to employees to support causes they care about.
  • Paid time off to volunteer at an organization of choice.
  • Funding for select family-forming benefits.
  • Relocation support for employees moving to the Bay Area.