Parsons - Alexandria, VA
posted 4 months ago
Parsons/Space Ground System Solutions (SGSS) is seeking a full-time Site Reliability Engineer (SRE) to join our IT Support team in Alexandria, VA. This role is pivotal to the expansion of satellite ground system software onto hybrid and private cloud infrastructures. The SRE will manage, support, and facilitate infrastructure operations for developers working with a mix of government off-the-shelf, commercial, and open-source software.
A key aspect of this position is a software engineering approach to IT operations: systematically designing, implementing, managing, and automating application and infrastructure security tools. The SRE will act as a critical link between the software development team and NRL's sponsors and customers, engineering and delivering automated operational solutions that meet reliability and maintainability needs.
The successful candidate will support the design, engineering, and coordination of legacy environment migrations and provide a holistic view of IT service delivery, covering dynamic provisioning, capacity planning, scheduled maintenance, system performance metrics, change management, and service quality. The SRE will develop and manage automation for deploying, operating, monitoring, and remediating failures and performance issues across on-premises, private cloud, commercial cloud, and hybrid environments for government customers. They will also identify opportunities for root cause analysis (RCA) and develop processes to address the gaps found, keeping system usage and management at a high standard.
The SRE will serve as a subject matter expert in automating cloud cyber risk mitigation, particularly in AWS and AWS GovCloud environments. They will document and automate processes identified through engagement with software engineers and users, manage internal and customer change control boards, and ensure compliance with policy mandates.
Alongside the software engineering approach to IT operations and security tooling described above, the SRE will identify appropriate cloud-based infrastructure to meet mission requirements, conduct post-incident reviews, develop new response plans, and assist the development team with requirements verification.