Hardware Systems Engineer, NPI

$104,000 - $155,000/Yr

Meta - Sunnyvale, CA

posted 2 months ago

Full-time - Entry Level
Sunnyvale, CA
Web Search Portals, Libraries, Archives, and Other Information Services

About the position

Meta is seeking a Hardware Systems Engineer to join the Release to Production (RTP) team, which is responsible for the Hardware Lifecycle of all Meta servers. This role involves hands-on system and hardware debugging, stress testing, and ensuring that the next generation of Meta servers is efficient, reliable, and scalable. The engineer will collaborate with various teams to test systems before their release to production data centers and track the health and lifecycle of servers in production.

Responsibilities

  • Interface with external vendors and internal engineers to understand system architecture and develop test suites for various architectures.
  • Proactively create experiments and tooling to detect and diagnose hardware/firmware/software health issues.
  • Develop a test framework for large-scale test automation during product development and after mass production.
  • Implement remediations across the software and hardware stack according to plan, maintaining thorough procedural records and data logs.
  • Develop and publish updates on resolutions and communicate findings internally.
  • Troubleshoot, diagnose, and root-cause system failures, isolating components and failure scenarios while collaborating with stakeholders.
  • Improve visibility into hardware health issues through data visualization and implement systemic solutions to them.
  • Drive discussions with teams on test specifications and methodologies to continuously improve test quality.

Requirements

  • Currently has, or is in the process of obtaining, a Bachelor's degree in Computer Science, Computer Engineering, or a relevant technical field, or equivalent practical experience.
  • 3+ years of work experience in domains such as CPU SoCs (x86, ARM), storage (SAS, NVMe), device management, AI-ML hardware, data center network hardware, high-speed interfaces, interconnect technologies, and power systems.
  • 3+ years of experience in changing system configurations, measuring change impact within x86 environments, and scripting with Python or similar.
  • Knowledge of server architecture and components, with experience working through the full lifecycle for different server system/data center products.

Nice-to-haves

  • Familiarity with lab equipment such as protocol analyzers, oscilloscopes, power meters, and airflow chambers.
  • 2+ years of experience in 24x7 production support at scale (e.g., 10K storage servers and over 100K HDDs).
  • 3+ years of experience scripting automation in Python, PHP, or Perl, in full system technologies (including PCIe).
  • 3+ years of experience supporting ASIC development (silicon bring-up, characterization).

Benefits

  • Bonus
  • Equity
  • Health insurance
  • Paid holidays
  • Paid volunteer time
  • Professional development opportunities