SpaceX - Hawthorne, CA
posted 3 months ago
SpaceX is looking for a GNC Site Reliability Engineer to operate and scale custom-built, mission-critical products for Guidance, Navigation, and Control (GNC). The GNC team performs trajectory design and vehicle simulation and participates in recurring mission-critical launch operations. This position will work with the GNC team to maintain and improve a set of GNC-focused tools. Examples of these tools include Monte Carlo simulations on a high-performance computing cluster, automated data analysis systems, continuous integration systems for rocket and simulation software, GNC analysis infrastructure, and vehicle configuration verification tools. The ideal candidate will be flexible, possess broad skills across product operations and software development, and flourish in a fast-paced and challenging environment.
The responsibilities of the GNC Site Reliability Engineer include deploying, upgrading, operating, maintaining, and scaling a suite of mission-critical GNC products and services. The engineer will provision and maintain virtual and physical servers, work with the SpaceX HPC team to monitor and maintain a 4000+ thread HPC cluster, and collaborate closely with GNC software engineers to create highly operable and maintainable products. The role also involves adding monitoring for web applications, managing the underlying computational infrastructure of GNC in collaboration with IT, and engaging in and improving the whole lifecycle of services, from inception and design through deployment, operation, and refinement. The engineer will make recommendations for future hardware purchases, practice sustainable incident response and postmortems, and provide end-user support to GNC engineering by becoming an expert on the analysis applications, helping users troubleshoot, and pointing them to relevant features.
Additionally, the engineer will configure automated deployment pipelines for web applications, develop or improve GNC web applications and tools for better usability, maintainability, and robustness, demo and document new software changes, and focus on identifying performance bottlenecks and applying performance improvement techniques.