Site Reliability Engineer

System One - Bethesda, MD

posted 3 months ago

Full-time

Administrative and Support Services

About the position

The Site Reliability Engineer (SRE) at ALTA IT Services is responsible for the reliability, performance, and stability of production systems. The SRE deploys builds into production and draws on a programming background to read and understand code; no code remediation is required. A significant part of the role is automating routine tasks, such as access management, to eliminate manual intervention. The SRE will also establish and enhance operational capabilities from the ground up so that the platform runs efficiently.

Beyond deployment and automation, the SRE triages and troubleshoots issues (for example, identifying the root cause of 403 errors), manages incidents, and oversees the development, testing, and staging environments. The role calls for a proactive, automation-first mindset and the use of modern technologies and practices to improve system reliability and performance.

The ideal candidate has relevant education and experience in Site Reliability Engineering and a technical stack that includes AWS, Kubernetes, Python, shell scripting, and GitHub Actions or Jenkins for automated test scripts. This is a contract-to-hire position, with a pathway to a permanent role for the right candidate.

Responsibilities

Deploy builds into production
Leverage programming background to read and understand code (no code remediation required)
Automate routine tasks to eliminate manual intervention (e.g., access management)
Ensure platform performance and stability
Establish and enhance operational capabilities from the ground up
Triage and troubleshoot issues (e.g., identify root causes of 403 errors)
Manage incidents effectively
Oversee development, testing, and staging environments

Requirements

Relevant education and experience in Site Reliability Engineering
Experience with AWS
Proficiency in Kubernetes
Strong programming skills in Python
Experience with shell scripting
Familiarity with GitHub Actions or Jenkins for automated test scripts

Benefits

Health and welfare coverage options, including medical, dental, and vision
Spending accounts
Life insurance
Voluntary plans
Participation in a 401(k) plan
