AMD - Austin, TX

posted about 2 months ago

Full-time - Mid Level
Austin, TX
Computer and Electronic Product Manufacturing

About the position

At AMD, we are committed to transforming lives through our technology, and as a Power Attainment Engineer for Data Center GPU, you will play a crucial role in this mission. This position is part of the AMD Data Center Power and Performance Systems Engineering Team, which is at the forefront of developing cutting-edge technologies for our data center products. The team is known for its collaborative and innovative environment, where you will work alongside highly skilled professionals who are dedicated to pushing the limits of what is possible in computing.

In this role, you will primarily focus on post-silicon activities related to power attainment and optimization of AMD's data center products. Your responsibilities will include developing automation and software infrastructure, ensuring production readiness, and tuning power features. You will be expected to analyze post-silicon performance and power data, execute power attainment test plans, and drive continuous improvement in power attainment activities. This position requires a self-starter who can effectively communicate and collaborate with team members while independently driving tasks to completion.

As a key stakeholder in the product performance validation process, you will analyze and debug interactions between various power management features and develop test plans for performance validation in high-performance computing and machine learning frameworks. You will also configure and set up test systems for data collection and experiments, troubleshoot system-level issues, and support prototyping experiments for new GPU features that impact performance and power. Your work will be essential in optimizing power and performance features for AI, machine learning, and high-performance computing applications.

Responsibilities

  • Actively participate in the analysis of post-silicon performance and power data to ensure the integrity of results and to provide summaries and conclusions
  • Learn and execute power attainment test plans during post-silicon phases in support of the Data Center GPU product roadmap
  • Proactively drive continuous improvement for post-silicon power attainment activities
  • Participate in developing the automation environment: write scripts that automate workloads and enhance execution capabilities using Linux, Python, and other supporting software tools
  • Work hands-on, locally or remotely, with computers, systems, or data center hardware (servers, data center equipment, or thermal equipment) as needed to accomplish power attainment work
  • Develop and execute characterization test plans for data center GPUs related to power attainment and feature tuning for performance optimization
  • Analyze data from workload or execution output datalogs using Excel or other analysis tools, either manually or with purpose-built automation
  • Optimize power and performance features for AI, machine learning, and high-performance computing
  • Work in a fast-paced, constrained environment
  • Become a key stakeholder in the product performance validation process
  • Analyze and debug interactions between various power management features
  • Develop and execute performance validation test plans for HPC/ML frameworks
  • Configure and set up test and customer-based ML/AI data center GPU systems for data collection, experiments, and post-silicon activities
  • Work in Windows and Linux environments
  • Support prototyping experiments for new GPU features that impact performance and power
  • Troubleshoot system-level issues that may occur in test environments and platforms
  • Proactively drive continuous improvement for post-silicon power and performance activities

Requirements

  • Bachelor's or Master's degree in Computer Engineering, Electrical Engineering, or Computer Science with an emphasis on computer architecture and workload analysis
  • 7+ years' experience preferred
  • Excellent grasp of computer organization/architecture and power management
  • Knowledge of power-limited performance methodologies and control theory
  • Knowledge of memory partitioning and access
  • Extensive experience in platform optimization and solid knowledge of computer I/O
  • Strong programming skills, experience in Python preferred
  • Proficiency in the Linux command-line environment and shell scripting is desirable
  • Deep knowledge of power management techniques like deep sleep and clock gating
  • Experience with container technologies (e.g., Docker)
  • Strong analytical and problem-solving skills with keen attention to detail
  • Experience in data analysis, summarization, and presentation
  • Excellent presentation and communication skills
  • Experience with debug and lab tools such as oscilloscopes, DAQs, and power measurement equipment

Nice-to-haves

  • Experience in a data center environment
  • Experience with tools for performance analysis

Benefits

  • Base pay depending on skills, qualifications, experience, and location
  • Eligibility for annual bonus or sales incentive
  • Opportunity to own shares of AMD stock
  • Discount when purchasing AMD stock through Employee Stock Purchase Plan
  • Competitive benefits package