NVIDIA - Santa Clara, CA

posted 13 days ago

Full-time - Mid Level
Santa Clara, CA
5,001-10,000 employees
Computer and Electronic Product Manufacturing

About the position

NVIDIA is seeking skilled software engineers to enhance its enterprise GPU management and monitoring tools. The role involves designing and building Linux-based management agents, CLI tools, and integration solutions that connect GPUs with the data center software management ecosystem. The position focuses on supporting NVIDIA products across various platforms, ensuring reliable and secure delivery of data-center monitoring products, and improving development and release infrastructure.

Responsibilities

  • Create and maintain Helm Charts for custom software deployment.
  • Create and maintain development environments using technologies such as k3d, kind, tilt, helmfile, etc.
  • Utilize and implement best practices for software delivery in Kubernetes environments.
  • Create and maintain CI/CD pipelines on Jenkins, GitLab, and/or GitHub.
  • Improve and maintain integrations with static-analysis tools such as Coverity to ensure product quality.
  • Enhance the reliability of CI/CD pipelines by addressing platform issues.
  • Interface with internal NVIDIA tooling for signing and publishing products.
  • Configure CI/CD runners and integrations with version control systems.
  • Create and manage Infrastructure-as-Code configurations with tools like Terraform or Ansible to provision and manage infrastructure.
  • Collaborate with the development team to understand requirements and implement efficient DevOps practices.
  • Communicate with system owners to understand deployment environments and requirements.

Requirements

  • BS or higher in Computer Science or equivalent experience.
  • 5+ years of meaningful industry experience with a strong DevOps background.
  • Experience maintaining and debugging CI/CD pipelines on Jenkins/GitLab/GitHub.
  • Experience with containerized environments (Docker, cri-o, podman).
  • Business-level English proficiency.
  • Outstanding written, verbal, and interpersonal skills.
  • Strong motivation and commitment to learn new skills.
  • Ability to execute all aspects of the software development lifecycle.
  • Experience managing time in a fast-paced, heavily multitasked environment.
  • Experience with container orchestration platforms like Kubernetes, including availability and scaling solutions.

Nice-to-haves

  • Development experience with Python, Go, C, C++, and/or Rust.
  • Fluency in Bash scripting.
  • Background with containers and common orchestration frameworks.
  • Experience generating and using static-analysis reports.
  • Familiarity with dynamic analysis tools and fuzzing techniques.
  • Familiarity with C/C++ build environments and dependency management.
  • Familiarity with Go build environments and dependency management.
  • Experience with Kubernetes and running Jenkins on Kubernetes.
  • Knowledge of Docker and runc internals.
  • Knowledge of logging and monitoring solutions in Kubernetes.

Benefits

  • Equity options
  • Comprehensive health benefits
  • Flexible work hours
  • Opportunities for professional development
  • Diversity and inclusion programs