Nvidia - Santa Clara, CA

posted 3 months ago

Full-time - Mid Level
Santa Clara, CA
Computer and Electronic Product Manufacturing

About the position

NVIDIA's deep learning platform has already made a major impact in the field and is broadly used across leading academic institutions, start-ups, and industry, including the world's largest Internet companies. We need hardworking and creative people to help us dive into more of these rare opportunities in GPU application in the enterprise solution space. We are now looking for an Applications Engineer with the ability to focus on customer enablement of enterprise products in a datacenter environment. We seek an expert who is familiar with out-of-band telemetry, diagnostics, and cluster management best practices established in the industry. In this role, you will work with customers and internal teams to resolve hardware, firmware, and software issues, and provide key technical collateral. This is a highly technical engineering role that is responsible for providing best-in-class support to NVIDIA's enterprise customers. We are looking for someone who has superb interpersonal skills, and can understand, explain, and solve customer problems. This position will focus on NVIDIA enterprise datacenter products in workstation and server applications.

Responsibilities

  • Building and improving our partner ecosystem guidelines around GPU-accelerated computing for both single-node and multi-node cluster deployments.
  • Working on out-of-band telemetry and management initiatives to enable partners to design deep learning clusters at scale.
  • Resolving system integration issues related to thermal, mechanical, electrical, PCIe, and GPU interconnect interfaces, including out-of-band management services.
  • Understanding system design requirements for HPC and AI workloads to drive platform configuration guides for x86 and ARM servers.
  • Conducting the installation, configuration and bring-up of enterprise server hardware.
  • Working directly with NVIDIA customers: analyzing data to answer questions, reproducing errors, and resolving or escalating customer issues.
  • Interacting and communicating with customers via conference calls or face-to-face meetings.
  • Becoming familiar with hardware debug using oscilloscopes and analyzers to qualify and validate NVIDIA products and to solve issues in customer systems.
  • Tracking and filing new bugs, and reproducing issues as needed.
  • Creating product specifications, hardware design guides, application notes, and other supporting technical collateral.

Requirements

  • BS or higher in EE, CE or Systems Engineering or equivalent experience.
  • 6+ years of relevant experience in supporting enterprise datacenter products for x86 or ARM architecture.
  • Minimum 5 years of experience designing and operating large scale compute infrastructure.
  • Strong analytical skills and past experience in reviewing enterprise system design and CPU architecture.
  • Understanding of x86 and ARM system architecture for server design, including BMC, security, and out-of-band management.
  • Professional-level interpersonal skills, including the ability to adjust communication to the technical level of the audience.
  • An innate capability to accurately and succinctly communicate procedures, results, and recommendations to customers.
  • Proficient in CentOS/RHEL and/or Ubuntu Linux distros, as well as Python programming and Bash scripting.

Benefits

  • Equity and benefits eligibility based on position and experience.