NVIDIA - Santa Clara, CA

posted 16 days ago

Full-time - Mid Level
Santa Clara, CA
Computer and Electronic Product Manufacturing

About the position

NVIDIA is seeking experienced software developers to work on hardware integration and bare-metal provisioning within our Linux-based cluster management software environment. The role involves developing and enhancing the Bright Cluster Manager, which powers thousands of Linux clusters globally. The position focuses on installation and provisioning processes, edge site deployment, and integrating with the latest hardware technologies, aiming to improve scalability and usability for a variety of workloads.

Responsibilities

  • Develop the head node and compute node installation and provisioning processes.
  • Build functionality for edge site deployment.
  • Integrate our product with the latest hardware (e.g., GPUs, DPUs, accelerators, and high-speed interconnects such as InfiniBand).
  • Work on features related to composable infrastructure management.
  • Develop new features for our BIOS and firmware upgrade management.
  • Develop functionality that makes Bright clusters usable for a wider range of workloads and improves scalability so that clusters can grow to very large numbers of nodes.
  • Add support for new Linux distributions.
  • Improve support for alternative CPU architectures such as ARM.
  • Add features to our Ansible collections for cluster installation and management.
  • Assist our support team with customer requests related to the above-mentioned features, and help our customers use our product more efficiently.

Requirements

  • Degree in Computer Science or related field (or equivalent experience).
  • 7+ years of experience in software development and/or related roles.
  • Familiarity with the Linux operating system and networking concepts in Linux.
  • Good practical knowledge of the common software included in a typical Linux installation.
  • Proficiency in Python and familiarity with object-oriented software design, design patterns, and concurrent programming techniques.
  • Emphasis on high quality of work and producing clean code.
  • Eager to learn and use new technologies.

Nice-to-haves

  • Experience with Ansible.
  • Experience with high-performance computing and system administration.
  • Knowledge of Kubernetes, AWS, Azure, GCE, OpenStack, Jenkins, and distributed programming.
  • Proficiency in C++.

Benefits

  • Equity options
  • Comprehensive health benefits
  • Diversity and inclusion programs
  • Ongoing training and development opportunities