Nvidia - Santa Clara, CA

Full-time - Senior
Santa Clara, CA
Computer and Electronic Product Manufacturing

About the position

The Principal Engineer for Performance Analysis in AI Applications and Services will focus on optimizing distributed cloud-native accelerated video analytics applications. This role involves working closely with application teams to profile, identify bottlenecks, and enhance performance across CPU, GPU, and network accelerators in a Kubernetes environment. The engineer will drive performance initiatives, develop strategies for optimization, and standardize performance measurement processes to ensure efficient resource utilization and application performance.

Responsibilities

  • Plan, enable, and drive performance initiatives across Cloud Native application teams.
  • Review, develop, deploy, and manage tools and strategies to systematically run performance experiments.
  • Collect and organize performance data with key partners.
  • Work closely with application teams to understand application resource utilization characteristics and identify performance issues through profiling.
  • Learn and understand various accelerators in the system for application workloads and recommend end-to-end performance optimizations.
  • Advise developers and product teams on the best accelerators and systems for end-to-end system performance.
  • Improve and standardize performance measurement processes across applications and GPU systems.
  • Collaborate with GPU cloud-native teams at Nvidia to deploy optimal GPU resource sharing strategies in a Kubernetes environment.

Requirements

  • Master's degree or PhD in Computer Science or a related field, or equivalent experience.
  • 15+ years of experience in system design optimization, complexity analysis, software design on Unix/Linux systems, and diagnosing performance and application issues.
  • Experience in real-time streaming AI inference systems.
  • A history of working on distributed accelerated systems and solving sophisticated performance problems.
  • Deep hands-on experience with distributed systems based on Kubernetes.
  • Experience with on-prem and cloud systems and ability to work with partners across multiple teams.
  • Experience using, managing, and optimizing modern cloud and container-based enterprise computing architectures.
  • Strong verbal and written communication and teamwork skills.
  • Ability to multitask effectively in a fast-paced environment; action-oriented with strong analytical skills.

Nice-to-haves

  • Background with real-time computer vision AI inference and/or analytics platforms.
  • Experience in application issues, algorithms, and data structures.
  • Understanding of how AI services and deep learning systems function.
  • Exposure to scheduling and resource management systems.
  • Knowledge of GPU programming such as OpenCL or CUDA and knowledge of multi-node GPU setups, GPU clusters, or cloud computing.

Benefits

  • Equity options
  • Comprehensive health benefits
  • Flexible work hours
  • Diversity and inclusion programs
  • Professional development opportunities