Unclassified - Los Alamos, NM

posted 3 months ago

Full-time - Mid Level
Los Alamos, NM

About the position

The Hardware Break/Fix Engineer position at SOC LLC in Los Alamos, NM, is a critical role that requires a dedicated professional to ensure the high availability and performance of High-Performance Computing (HPC) systems. The successful candidate will be responsible for maintaining the operational integrity of these systems, which are essential for the client’s needs. This role demands a proactive approach to system monitoring, troubleshooting, and repair, ensuring that all hardware and software components function optimally. The engineer will work closely with clients to address their needs and resolve any issues that arise, ensuring that service level agreements (SLAs) are met and that the systems remain secure and compliant with the required standards.

In this position, the engineer will create and document site procedures, system diagrams, and other configuration or support documents. They will monitor and maintain system health across compute, network, and storage components, responding to client tickets and managing support cases effectively. The role also involves maintaining availability reports, tracking hardware repairs, and ensuring that the system security posture is upheld. The engineer will be expected to troubleshoot and repair hardware issues, assist with hardware and system installations, and maintain system software and firmware revisions, including necessary patches and updates.

The engineer will work as part of a team, collaborating with technical leads and customer representatives to accomplish tasks with minimal direction. This position requires a strong commitment to customer service, as the engineer will be the point of contact for inquiries regarding system software versions, product lifecycles, and third-party applications. The ability to gather data, perform analysis, and escalate problems to higher-level support groups is essential for ensuring timely resolutions to system or customer issues.

Overall, this role is vital for maintaining the operational excellence of the HPC systems and ensuring client satisfaction.

Responsibilities

  • Maintain HPC system availability for the customer
  • Create and document site procedures, system diagrams, and other configuration or support documents
  • Monitor and maintain system health on the HPC system(s) - compute, network and storage
  • Review, resolve and respond to client tickets
  • Create, monitor and close all support cases
  • Maintain availability reports for tracking SLAs
  • Maintain the system security posture required by the client
  • Troubleshoot and repair hardware issues
  • Track and document hardware repairs, including opening, tracking, and closing part cases and returning replaced parts
  • Maintain the on-call schedule to support our 365-day, 24x7x2/4 contracts
  • Assist with hardware and system installation activities in new systems
  • Maintain system software and firmware revisions, including patches, updates, and OS upgrades
  • Solve system hardware, software, and third-party software issues, and provide detailed and thoughtful analysis of each problem and its solution
  • Gather data, perform analysis, and escalate problems to higher-level product support groups and appropriate management when necessary to ensure timely resolution of system or customer issues
  • Provide solutions and implement repairs or workarounds when possible, fully documenting the steps taken when required
  • Answer customer inquiries concerning system software versions, product lifecycles, new releases, and third-party applications
  • Work with minimal direction from the technical lead and with customer-nominated representatives to accomplish assigned tasks
  • Participate as part of a team and maintain good relationships with team members and customers

Requirements

  • Must have DOE Q Clearance or must have held one in the past 3 years (DoD Top Secret will be considered)
  • Experience in maintaining High-Performance Computing (HPC) systems
  • Strong troubleshooting skills for hardware and software issues
  • Ability to create and document technical procedures and system diagrams
  • Experience with system monitoring and maintaining system health
  • Ability to manage client tickets and support cases effectively
  • Knowledge of system security requirements and compliance
  • Experience with hardware repairs and tracking/documenting repair cases
  • Ability to work on-call and support 24x7x2/4 contracts
  • Experience with system software and firmware updates, including patches and OS upgrades
  • Strong analytical skills to gather data and escalate issues as necessary
  • Excellent communication skills to interact with clients and team members

Nice-to-haves

  • Experience with third-party software applications
  • Familiarity with customer service best practices
  • Knowledge of network and storage systems
  • Experience in a team-oriented environment

Benefits

  • Competitive salary between $80k and $100k per year
  • Opportunities for professional development
  • Supportive work environment
  • Potential for career advancement