GNC Site Reliability Engineer (Falcon)

$120,000 - $145,000/Yr

SpaceX - Hawthorne, CA

posted 4 months ago

Full-time - Mid Level

Transportation Equipment Manufacturing

About the position

SpaceX is looking for a GNC Site Reliability Engineer to operate and scale custom-built, mission-critical products for Guidance, Navigation, and Control (GNC). The GNC team performs trajectory design and vehicle simulation and participates in recurring mission-critical launch operations. This position will work with the GNC team to maintain and improve a set of GNC-focused tools. Examples of these products include Monte Carlo simulations on a high-performance computing (HPC) cluster, automated data analysis systems, continuous integration systems for rocket and simulation software, GNC analysis infrastructure, and vehicle configuration verification tools. The ideal candidate will be flexible, possess broad skills across product operations and software development, and flourish in a fast-paced and challenging environment.

The responsibilities of the GNC Site Reliability Engineer include deploying, upgrading, operating, maintaining, and scaling a suite of mission-critical GNC products and services. The engineer will provision and maintain virtual and physical servers, work with the SpaceX HPC team to monitor and maintain a 4000+ thread HPC cluster, and collaborate closely with GNC software engineers to create highly operable and maintainable products. The role also involves adding monitoring for web applications, managing the underlying computational infrastructure of GNC in collaboration with IT, and engaging in and improving the whole lifecycle of services, from inception and design through deployment, operation, and refinement. The engineer will make recommendations for future hardware purchases, practice sustainable incident response and postmortems, and provide end-user support to GNC engineering by becoming an expert on the analysis applications and helping users troubleshoot problems and find features.

Additionally, the engineer will configure automated deployment pipelines for web applications, develop or improve GNC web applications and tools for better usability, maintainability, and robustness, demo and document new software changes, and focus on identifying performance bottlenecks and applying performance improvement techniques.

Responsibilities

Deploy, upgrade, operate/maintain, and scale a suite of mission-critical GNC products and services
Provision and maintain virtual and physical servers
Work with SpaceX HPC team to monitor and maintain a 4000+ thread HPC cluster
Closely collaborate with GNC software engineers to create highly operable and maintainable products
Add monitoring for webapps and respond to outages
Manage the underlying computational infrastructure of GNC in collaboration with IT
Engage in and improve the whole lifecycle of services: from inception and design, through deployment, operation and refinement
Make recommendations for future hardware purchases
Practice sustainable incident response and postmortems
Provide end-user support to GNC engineering by becoming an expert on the analysis applications and helping users troubleshoot problems and find features
Configure automated deployment pipelines for webapps
Develop or improve GNC webapps and tools for better usability, maintainability, and robustness
Demo and document new software changes such as operating system upgrades, shared filesystem changes, or major tool rollouts
Identify performance bottlenecks and apply performance improvement techniques

Requirements

Bachelor's degree in computer science, information systems/IT, engineering, math, or a scientific discipline and 2+ years of software development experience OR 4+ years of professional experience building software with site reliability or DevOps in lieu of a degree
Experience with Linux operating systems
Experience with Python and Python-based development frameworks

Nice-to-haves

2+ years of systems administration, site reliability engineering, or DevOps experience
2+ years of experience with Python and Python-based development frameworks
2+ years of Linux experience
Expertise with Docker, Vagrant, and Kubernetes or similar technologies
Extensive experience with configuration management tools such as Ansible, Puppet, or Terraform
Experience with build systems (Make, Bazel / Pants / Buck, Gradle) and package management tools (pip, npm)
Strong understanding of virtualization and hypervisor technologies
Understanding of databases and data modeling
Experience with automatically managing dozens or hundreds of servers
Strong knowledge of TCP/IP networking
Experience scaling web applications and optimizing applications for performance
Professional experience with standard front-end technologies like modern HTML, CSS, JavaScript (we use AngularJS, Polymer, Backbone.js, React, and more), REST, JSON
Solid understanding of UI/UX design to provide intuitive applications
Experience with high-performance computing systems or large-scale data analysis systems
Must be comfortable working with mission-critical and sensitive systems, with a sense of urgency appropriate to the responsibilities

Benefits

Comprehensive medical, vision, and dental coverage
401(k) retirement plan
Short- and long-term disability insurance
Life insurance
Paid parental leave
Various discounts and perks
3 weeks of paid vacation
10 or more paid holidays per year
5 days of sick leave per year
Potential discretionary bonuses
Ability to purchase additional stock at a discount through an Employee Stock Purchase Plan
Long-term incentives in the form of company stock, stock options, or long-term cash awards
