Nvidia - Santa Clara, CA

posted 18 days ago

Full-time - Senior
Santa Clara, CA
Computer and Electronic Product Manufacturing

About the position

NVIDIA is seeking a Senior Systems Software Engineer to join the TAO Toolkit ML Data and Platforms Team, focusing on machine learning data modeling. This role involves developing innovative algorithms to process vast amounts of unstructured data using machine and deep learning techniques, collaborating with deep learning architects and engineers to create advanced AI models.

Responsibilities

  • Help find the right data for multi-modal models using scalable systems.
  • Design and develop active (and passive) learning paradigms, with annotators in (and out of) the loop, to iteratively mine informative data.
  • Design ML and DL architectures and loss functions for automated pseudo-labeling across multi-modal tasks.
  • Design insightful metrics, in unsupervised, semi-supervised, and supervised settings, to characterize the performance of models and data.
  • Build scalable, robust ETL pipelines that use novel ML and DL models to deliver high-quality datasets.
  • Work with internal teams to define requirements, enhance products, and automate workflows.

Requirements

  • Bachelor's degree (or equivalent experience) in Computer Engineering, Computer Science, Electrical Engineering, Robotics, or related field.
  • 5+ years of ML / DL-related engineering experience with strong architecture and design skills.
  • Strong grounding in the fundamentals of machine learning and deep learning.
  • Expert understanding of out-of-distribution (OOD) data and related concepts.
  • Knowledge of PyTorch, distributed machine learning, and distributed file systems.
  • 3+ years leading complex, sometimes ambiguous projects, particularly high-throughput services at supercomputing scale.
  • Experience with Dagster and Terraform.
  • Experience in high-performance computing environments and workflow automation frameworks (e.g., Airflow).

Nice-to-haves

  • Proficient in running applications on cloud platforms using Kubernetes and Docker, and with ML frameworks such as PyTorch.
  • Proficient in building systems, and familiar with deep learning architectures such as multimodal LLMs and tools such as NVIDIA TensorRT-LLM and Triton Inference Server.
  • Familiar with GPU programming concepts and writing custom CUDA kernels.
  • Experience with SQL databases and cloud infrastructure (AWS, GCP, Kubernetes).

Benefits

  • Competitive salary package
  • Equity
  • Benefits