Microsoft - Redmond, WA

posted 4 months ago

Full-time - Principal
Remote - Redmond, WA
Publishing Industries

About the position

Silicon, Cloud Hardware, and Infrastructure Engineering (SCHIE) is the team behind expanding Cloud Infrastructure and is responsible for powering the "Intelligent Cloud" mission. SCHIE delivers the core infrastructure and foundational technologies for over 200 online businesses, including Bing, MSN, Office 365, Xbox Live, Teams, OneDrive, and the Azure platform, globally with our server and data center infrastructure, security and compliance, operations, globalization, and manageability solutions.

We are looking for a Principal Hardware Quality Engineer to join the team. As the cloud business continues to grow, the ability to deploy new offerings and hardware infrastructure on time, in high volume, with high quality, and at lower cost is of paramount importance. To achieve this goal, the Hardware, Infrastructure Management, and Fundamental Engineering (HIFE) team is instrumental in defining and delivering operational measures of success for hardware manufacturing and in improving the planning process, quality, delivery, scale, and sustainability related to cloud hardware.

Our mission is to empower every person and every organization on the planet to achieve more. As employees, we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond. In alignment with our values, we are committed to cultivating an inclusive work environment for all employees to positively impact our culture every day.

Responsibilities

  • Lead an effective and robust supplier quality management strategy to ensure the data center hardware is manufactured at the highest level of quality standards.
  • Lead the resolution of system-level quality issues: conduct debug and failure analysis for any issue in the Azure fleet, including GPU issues, and drive resolution with partners and suppliers.
  • Provide system-level technical guidance to SI and various internal stakeholders, and lead them through complex problems.
  • Drive the continuous improvement process based on Root Cause Analysis (RCA) and identified opportunities.
  • Own quality readouts based on telemetry data analysis, bringing clarity on status, actions across the organization, and next steps for issue resolution.
  • Establish Critical-to-Quality performance metrics to measure and improve product quality.
  • Act as the voice of quality in the hardware change management process, ensuring quality requirements are considered, met, and improved.
  • Mentor and develop team members, fostering a culture of excellence and innovation.
  • Embody our Culture and Values.

Requirements

  • Bachelor's Degree in Reliability Engineering, Electrical Engineering, or related field AND 8+ years technical engineering experience OR Master's Degree in Reliability Engineering, Electrical Engineering, or related field AND 7+ years technical engineering experience OR Doctorate Degree in Reliability Engineering, Electrical Engineering, or related field AND 5+ years technical engineering experience.
  • 5+ years of experience working with modern server architectures and/or their subsystems (including GPU, CPU, AI hardware, memory, and motherboard) and with methods for root cause analysis and debugging.
  • 3+ years of experience leading a large-scale task force to resolve technical problems and deliver solutions.
  • Ability to meet customer and/or government security screening requirements is required for this role.

Nice-to-haves

  • Master's degree in Electrical Engineering, Computer HW, or System Engineering.
  • Leadership skills and ability to collaborate with diverse teams and drive a call to action.
  • 10+ years of experience working with modern server architectures and/or their subsystems (including GPU, CPU, AI hardware, and memory) and with methods for root cause analysis and debugging.
  • 5+ years of experience leading a large-scale task force to resolve technical problems and deliver solutions.

Benefits

  • Industry leading healthcare
  • Educational resources
  • Discount on products and services
  • Savings and investment
  • Maternity and paternity leave
  • Generous time away
  • Giving program
  • Opportunities to network and connect