This job is closed

HPC Systems Engineer

$120,000 - $160,000/Yr

SpaceX - Hawthorne, CA

posted 2 months ago

Full-time - Entry Level
Hawthorne, CA
Transportation Equipment Manufacturing

About the position

The HPC Systems Engineer at SpaceX will be responsible for administering and managing high-performance computing (HPC) clusters, storage systems, and high-speed networks. This role requires providing application support to SpaceX employees across various engineering disciplines and involves installing and integrating Linux-based compute clusters. The ideal candidate should thrive in a fast-paced environment, demonstrating self-motivation and ingenuity.

Responsibilities

  • Administer and manage HPC clusters, storage systems, and high-speed networks.
  • Provide application support to SpaceX employees across engineering disciplines.
  • Install and integrate Linux-based compute clusters.
  • Write instructional documentation and convey highly technical ideas in non-technical terms.

Requirements

  • Bachelor's degree in computer science, engineering, math, or a scientific discipline; or 2+ years of professional experience building software in lieu of a degree.
  • Experience with Linux.
  • Experience with client and server hardware/software, management tools, enterprise networking, virtualization, and security technologies.

Nice-to-haves

  • 1+ years of professional experience building, deploying and troubleshooting Linux systems.
  • Experience with a scripting language (Bash, Python) to automate and solve recurring tasks.
  • Experience building, deploying and troubleshooting HPC clusters.
  • Familiarity with cluster resource managers (Slurm, PBS, LSF).
  • Experience with monitoring and alerting technologies (Prometheus, Grafana, Nagios).
  • Familiarity with scientific and engineering computing (CFD, FEA).
  • Familiarity with ML frameworks (PyTorch, TensorFlow).
  • Familiarity with GPU usage in a compute cluster and CUDA.
  • Experience with containers (Docker, Podman, Singularity).
  • Experience deploying and maintaining automated configuration management software (Puppet, Ansible).
  • Comfortable working with mission-critical and sensitive systems, with a sense of urgency appropriate to the responsibilities.

Benefits

  • Comprehensive medical, vision, and dental coverage.
  • 401(k) retirement plan.
  • Short- and long-term disability insurance.
  • Life insurance.
  • Paid parental leave.
  • Various discounts and perks.
  • 3 weeks of paid vacation.
  • 10 or more paid holidays per year.
  • 5 days of sick leave per year.