SpaceX - Hawthorne, CA

Full-time - Mid Level
Hawthorne, CA
Transportation Equipment Manufacturing

About the position

SpaceX is looking for a GNC Site Reliability Engineer to operate and scale custom-built, mission-critical products for Guidance, Navigation, and Control (GNC). The GNC team performs trajectory design and vehicle simulation and participates in recurring mission-critical launch operations. This position will work with the GNC team to maintain and improve a set of GNC-focused tools, including Monte Carlo simulations on a high-performance computing (HPC) cluster, automated data analysis systems, continuous integration systems for rocket and simulation software, GNC analysis infrastructure, and vehicle configuration verification tools. The ideal candidate will be flexible, possess broad skills across product operations and software development, and flourish in a fast-paced and challenging environment.

The GNC Site Reliability Engineer's responsibilities include deploying, upgrading, operating, maintaining, and scaling a suite of mission-critical GNC products and services. The engineer will provision and maintain virtual and physical servers, work with the SpaceX HPC team to monitor and maintain a 4000+ thread HPC cluster, and collaborate closely with GNC software engineers to create highly operable and maintainable products. The role also involves adding monitoring for web applications, managing GNC's underlying computational infrastructure in collaboration with IT, and engaging in and improving the whole lifecycle of services, from inception and design through deployment, operation, and refinement. The engineer will make recommendations for future hardware purchases, practice sustainable incident response and postmortems, and provide end-user support to GNC engineering by becoming an expert on the analysis applications, helping users troubleshoot, and directing them to relevant features.

Additionally, the engineer will configure automated deployment pipelines for web applications, develop or improve GNC web applications and tools for better usability, maintainability, and robustness, demo and document new software changes, and focus on performance bottlenecks and performance improvement techniques.

Responsibilities

  • Deploy, upgrade, operate, maintain, and scale a suite of mission-critical GNC products and services
  • Provision and maintain virtual and physical servers
  • Work with SpaceX HPC team to monitor and maintain a 4000+ thread HPC cluster
  • Closely collaborate with GNC software engineers to create highly operable and maintainable products
  • Add monitoring for web applications and respond to outages
  • Manage the underlying computational infrastructure of GNC in collaboration with IT
  • Engage in and improve the whole lifecycle of services: from inception and design, through deployment, operation and refinement
  • Make recommendations for future hardware purchases
  • Practice sustainable incident response and postmortems
  • Provide end-user support to GNC engineering by becoming an expert on the analysis applications, helping users troubleshoot, and directing them to relevant features
  • Configure automated deployment pipelines for web applications
  • Develop or improve GNC web applications and tools for better usability, maintainability, and robustness
  • Demo and document new software changes such as operating system upgrades, shared filesystem changes, or major tool rollouts
  • Identify performance bottlenecks and apply performance improvement techniques

Requirements

  • Bachelor's degree in computer science, information systems/IT, engineering, math, or a scientific discipline and 2+ years of software development experience, OR, in lieu of a degree, 4+ years of professional experience building software in site reliability or DevOps
  • Experience with Linux operating systems
  • Experience with Python and Python-based development frameworks

Nice-to-haves

  • 2+ years of systems administration, site reliability engineering, or DevOps experience
  • 2+ years of experience with Python and Python-based development frameworks
  • 2+ years of Linux experience
  • Expertise with Docker, Vagrant, and Kubernetes or similar technologies
  • Extensive experience with configuration management tools such as Ansible, Puppet, or Terraform
  • Experience with build systems (Make, Bazel / Pants / Buck, Gradle) and package management tools (pip, npm)
  • Strong understanding of virtualization and hypervisor technologies
  • Understanding of databases and data modeling
  • Experience with automatically managing dozens or hundreds of servers
  • Strong knowledge of TCP/IP networking
  • Experience scaling web applications and optimizing applications for performance
  • Professional experience with standard front-end technologies such as modern HTML, CSS, and JavaScript (we use AngularJS, Polymer, Backbone.js, React, and more), REST, and JSON
  • Solid understanding of UI/UX design to provide intuitive applications
  • Experience with high-performance computing systems or large-scale data analysis systems
  • Must be comfortable working with mission-critical and sensitive systems, with a sense of urgency appropriate to the responsibilities

Benefits

  • Comprehensive medical, vision, and dental coverage
  • 401(k) retirement plan
  • Short- and long-term disability insurance
  • Life insurance
  • Paid parental leave
  • Various discounts and perks
  • 3 weeks of paid vacation
  • 10 or more paid holidays per year
  • 5 days of sick leave per year
  • Potential discretionary bonuses
  • Ability to purchase additional stock at a discount through an Employee Stock Purchase Plan
  • Long-term incentives in the form of company stock, stock options, or long-term cash awards