Thales - Hawthorne, CA
posted 4 months ago
SpaceX is looking for a Sr. GNC Site Reliability Engineer to operate and scale custom-built, mission-critical products for Guidance, Navigation, and Control (GNC). The GNC team performs trajectory design and vehicle simulation and participates in recurring mission-critical launch operations. This position will work with the GNC team to maintain and improve a set of GNC-focused tools. Examples of these products include Monte Carlo simulations on a high-performance computing cluster, automated data analysis systems, continuous integration systems for rocket and simulation software, GNC analysis infrastructure, and vehicle configuration verification tools. The ideal candidate will be flexible, possess broad skills across product operations and software development, and flourish in a fast-paced and challenging environment.
In this role, you will deploy, upgrade, operate, maintain, and scale a suite of mission-critical GNC products and services. You will provision and maintain virtual and physical servers, working closely with the SpaceX HPC team to monitor and maintain a 4000+ thread HPC cluster. Collaboration with GNC software engineers is essential to create highly operable and maintainable products. You will also add monitoring for web applications and respond to outages, manage the underlying computational infrastructure of GNC in collaboration with IT, and engage in and improve the whole lifecycle of services, from inception and design through deployment, operation, and refinement. Additionally, you will make recommendations for future hardware purchases, practice sustainable incident response and postmortems, and become an expert on analysis applications in order to provide end-user support to GNC engineering.
You will configure automated deployment pipelines for web applications, develop or improve GNC web applications and tools for better usability, maintainability, and robustness, and demo and document new software changes such as operating system upgrades or major tool rollouts. Identifying performance bottlenecks and applying performance improvement techniques will also be a key part of your responsibilities.