Motion Recruitment - Arlington, VA

posted 3 months ago

Full-time
Remote - Arlington, VA
Administrative and Support Services

About the position

This company is looking for a Site Reliability Engineer to lead a team responsible for building, managing, maintaining, and scaling the centralized infrastructure services that support its mission-critical operations. The role is based in Herndon, VA, and is remote-friendly, with a couple of days on-site required each month.

As a Site Reliability Engineer, you will oversee the design of software solutions that integrate Open Source, Commercial Off-The-Shelf (COTS), and custom-developed components. You will deploy, configure, and manage services across production, QA, and development environments on platforms such as OpenStack and Docker. You will build and manage infrastructure with Terraform, develop deployment automation tools with Ansible, create automation and configuration management solutions with SaltStack and Jenkins, and implement encryption solutions with HashiCorp Vault. You will also contribute to the development of a large-scale Software Defined Network (SDN) using Guardicore, document processes, procedures, configurations, and deployment plans, and collaborate with technical teams to implement systems and software.

Occasionally, you will provide operational support, including troubleshooting and problem resolution. You will also offer technical leadership in operational processes and change management, mentor less experienced engineers, provide regular progress updates to management, and participate in a 24x7 on-call rotation.

Responsibilities

  • Oversee the design of software solutions that integrate Open Source, Commercial Off-The-Shelf (COTS), and custom-developed components.
  • Deploy, configure, and manage services across production, QA, and development environments on platforms such as OpenStack and Docker.
  • Build and manage infrastructure using Terraform.
  • Develop deployment automation tools using Ansible.
  • Create automation and configuration management solutions with SaltStack and Jenkins.
  • Implement encryption solutions with HashiCorp Vault.
  • Contribute to the development of a large-scale Software Defined Network (SDN) using Guardicore.
  • Document processes, procedures, configurations, and deployment plans.
  • Collaborate with technical teams to implement systems and software.
  • Occasionally provide operational support, including troubleshooting and problem resolution.
  • Offer technical leadership in operational processes and change management, while mentoring less experienced engineers.
  • Provide regular progress updates to management.
  • Participate in a 24x7 on-call rotation.

Requirements

  • Bachelor's degree in Computer Science, a related technical field, or equivalent education and experience.
  • 8+ years of experience in developing and managing mission-critical systems.
  • In-depth knowledge of Linux configuration and administration.
  • Proficiency in a high-level scripting language such as Python.
  • Extensive experience with automation, including not only building it but also understanding its purpose and identifying the areas where it adds the most value.
  • Strong grasp of infrastructure-as-code principles.
  • Excellent written and verbal communication skills, with the ability to clearly explain complex issues.
  • Solid understanding of network protocols and security practices.
  • Experience building and optimizing monitoring and reporting solutions using tools like Grafana and Splunk.
  • Familiarity with development tools such as GitHub, Jira, and Confluence.

Nice-to-haves

  • Expertise in deployment automation using tools like Ansible.
  • Hands-on experience with Jenkins in a continuous integration and delivery environment.
  • Experience with Docker or Kubernetes in a production setting.
  • Familiarity with OpenStack in production environments.
  • Knowledge of HTTP proxies like Squid.
  • Experience working with Red Hat Enterprise Linux and/or FreeBSD.
  • Familiarity with CMDB and ITIL platforms such as ServiceNow.
  • Experience with Red Hat Identity Management and/or FreeIPA.
  • Experience administering Linux and Unix systems in large-scale environments.
  • Experience with VMware in a production environment.
  • Familiarity with Agile methodologies, including Kanban and/or Scrum.
  • Experience in Registry Services, E-commerce, or ISP environments.
© 2024 Teal Labs, Inc