Unclassified - Los Alamos, NM
posted 4 months ago
The Hardware Break/Fix Engineer position at SOC LLC in Los Alamos, NM, is a critical role that requires a dedicated professional to ensure the high availability and performance of High-Performance Computing (HPC) systems. The successful candidate will be responsible for maintaining the operational integrity of these systems, which are essential for meeting the client’s needs. This role demands a proactive approach to system monitoring, troubleshooting, and repair so that all hardware and software components function optimally. The engineer will work closely with clients to resolve any issues that arise, meet service level agreements (SLAs), and keep the systems secure and compliant with the required standards.
In this position, the engineer will create and document site procedures, system diagrams, and other configuration or support documents. They will monitor and maintain system health across compute, network, and storage components, responding to client tickets and managing support cases effectively. The role also involves maintaining availability reports, tracking hardware repairs, and upholding the system security posture. The engineer will be expected to troubleshoot and repair hardware issues, assist with hardware and system installations, and maintain system software and firmware revisions, including necessary patches and updates.
The engineer will work as part of a team, collaborating with technical leads and customer representatives to accomplish tasks with minimal direction. This position requires a strong commitment to customer service, as the engineer will be the point of contact for inquiries regarding system software versions, product lifecycles, and third-party applications. The ability to gather data, perform analysis, and escalate problems to higher-level support groups is essential for ensuring timely resolutions to system or customer issues.
Overall, this role is vital for maintaining the operational excellence of the HPC systems and ensuring client satisfaction.