KLA • Posted 2 months ago
$100,000 - $170,000/Yr
Full-time • Entry Level
Ann Arbor, MI
Computer and Electronic Product Manufacturing

About the position

Designing, deploying, and supporting an HPC cluster from infancy to enterprise scale is exciting because it involves building robust, scalable systems that push the boundaries of computational power. This process offers the satisfaction of overcoming sophisticated challenges and seeing your work enable groundbreaking research and innovation.

Responsibilities

  • Design, implement, and support high-performance compute clusters
  • Solid understanding of HPC systems, including CPU/GPU architecture, scalable and robust storage, high-bandwidth interconnects, and cloud-based computing architectures
  • Apply attention to detail to generate hardware BOMs for HPC clusters, manage vendors, and coordinate hardware release activities.
  • Use strong Linux skills to configure appropriate operating systems for the HPC system.
  • Understand and assemble the project specifications and performance requirements at the subsystem and system levels. Adhere to project timelines so that program milestones are completed on time.
  • Support design and release of new products to manufacturing and ultimately the customer, providing quality golden images, procedures, scripts and documentation to the manufacturing team and customer support team.
  • Lead EOL parts re-qualification for long-term system deployments
  • Support critical issues both in-house and in the field

Requirements

  • Proven, in-depth, distribution-agnostic knowledge of Linux systems (SUSE, Red Hat, Rocky, Ubuntu)
  • Experience designing and maintaining robust storage
  • Strong HPC hardware knowledge, especially in the server, GPU, networking, storage, scheduler, BIOS, and BMC areas.
  • Experience with systemd, network boot/PXE, and Linux HA.
  • Strong understanding of TCP/IP fundamentals and knowledge of protocols such as DNS, DHCP, HTTP, LDAP, and SMTP.
  • Strong knowledge of storage file shares: NFS/CIFS
  • Ability to write shell and Python scripts.
  • Experience with one or more configuration management tools (Ansible, Salt, Chef, Puppet, etc.)

Nice-to-haves

  • Strong DevOps focus: knowledge of setting up continuous development pipelines and Git-based repository software.
  • Hypervisor knowledge: VMware, Proxmox, or XCP-ng
  • Knowledge of Apache/Nginx, setting up proxies and reverse proxies, application server routing, and load balancing (HAProxy)
  • HPC Schedulers: SGE/SLURM
  • Monitoring tools: Prometheus, Grafana, Nagios
  • Database Technologies: MySQL
  • BS or MS degree in Computer Engineering, Electrical Engineering, or a related field, with 5+ years of proven experience

Benefits

  • medical, dental, vision, life, and other voluntary benefits
  • 401(K) including company matching
  • employee stock purchase program (ESPP)
  • student debt assistance
  • tuition reimbursement program
  • development and career growth opportunities and programs
  • financial planning benefits
  • wellness benefits including an employee assistance program (EAP)
  • paid time off and paid company holidays
  • family care and bonding leave