AMD - Austin, TX

posted about 2 months ago

Full-time - Mid Level
Austin, TX
Computer and Electronic Product Manufacturing

About the position

At AMD, we are committed to transforming lives through our technology, and the Power Attainment Engineer role is pivotal in achieving this mission. The successful candidate will be part of the AMD Data Center Power and Performance Systems Engineering Team, which is at the forefront of developing cutting-edge technologies for our Data Center products. This team plays a crucial role in ensuring the success of AMD as a growing company, providing a collaborative and friendly environment where skilled professionals work together to push the boundaries of innovation.

In this role, the Power Attainment Engineer will focus primarily on post-silicon activities related to power attainment and optimization of AMD's Data Center products. This includes developing automation and software infrastructure, ensuring production readiness, and tuning power features. The candidate will be expected to actively participate in analyzing post-silicon performance and power data, executing power attainment test plans, and driving continuous improvement in power attainment activities. The role requires hands-on experience with data center hardware and the ability to work effectively in both Windows and Linux environments.

The ideal candidate will be a self-starter with strong communication skills and the ability to collaborate effectively within a team. They will be responsible for developing and executing characterization test plans for Data Center GPUs, analyzing data from various workloads, and optimizing power and performance features for AI, machine learning, and high-performance computing. This position offers the opportunity to become a key stakeholder in the product performance validation process and to troubleshoot system-level issues in test environments.

Responsibilities

  • Actively participate in the analysis of post-silicon performance and power data to ensure the integrity of results and to summarize results and conclusions.
  • Learn and execute power attainment test plans during post-silicon phases in support of the Data Center GPU product roadmap.
  • Proactively drive continuous improvement for post-silicon power attainment activities.
  • Participate in developing the automation environment: write scripts that automate workloads and enhance execution capabilities in Linux, Python, and other supporting software tools.
  • Work hands-on, locally or remotely, with computers, systems, and data center hardware, applying practical knowledge of server, data center, and thermal equipment to accomplish power attainment work.
  • Develop and execute characterization test plans for Data Center GPUs, covering power attainment and feature tuning for performance optimization.
  • Analyze data from workload or execution output datalogs using Excel or other analysis tools, either manually or through purpose-built automation.
  • Optimize power and performance features for AI, machine learning, and high-performance computing.
  • Work in a fast-paced, constrained environment.
  • Become a key stakeholder in the product performance validation process.
  • Analyze and debug interactions between various power management features.
  • Develop and execute performance validation test plans for HPC/ML frameworks.
  • Configure and set up test and customer-based ML/AI Data Center GPU systems for data collection, experiments, and post-silicon activities.
  • Work in Windows and Linux environments.
  • Support prototyping experiments for new GPU features that impact performance and power.
  • Troubleshoot system-level issues that may occur in test environments and platforms.
  • Proactively drive continuous improvement for post-silicon power and performance activities.

Requirements

  • Bachelor's or Master's degree in Computer Engineering, Electrical Engineering, or Computer Science with emphasis on computer architecture and workload analysis.
  • 7+ years of experience preferred.
  • Excellent grasp of computer organization/architecture and power management.
  • Knowledge of power-limited performance methodologies and control theory.
  • Knowledge of memory partitioning and access.
  • Extensive experience in platform optimization.
  • Solid knowledge of computer I/O.
  • Experience with tools for performance analysis.
  • Strong programming skills; Python experience preferred.
  • Proficiency in the Linux command-line environment and shell scripting is desirable.
  • Deep knowledge of power management techniques such as deep sleep and clock gating.
  • Experience with container technologies (e.g., Docker).
  • Strong analytical and problem-solving skills with keen attention to detail.
  • Experience in data analysis, summarization, and presentation.
  • Excellent presentation and communication skills.
  • Experience with debug and lab tools such as oscilloscopes, DAQs, and power measurement equipment.

Nice-to-haves

  • Experience in a data center environment preferred.

Benefits

  • Base pay depending on skills, qualifications, experience, and location.
  • Eligibility for incentives such as annual bonuses or sales incentives.
  • Opportunity to own shares of AMD stock and discounts on stock purchases through the Employee Stock Purchase Plan.
  • Competitive benefits package.