Disability Solutions - Atlanta, GA
posted 2 months ago
As a Site Reliability Engineer at Honeywell Connected Enterprise (HCE), you will play a crucial role in ensuring the reliability, availability, and performance of our software systems. Your responsibilities will include designing, implementing, and maintaining the infrastructure and tools necessary for monitoring and managing our applications. Your expertise in automation and troubleshooting will be essential in identifying and resolving issues to minimize downtime and optimize system performance. You will collaborate with cross-functional teams to drive continuous improvement and implement best practices for system reliability.
This position is based in Atlanta, Georgia, and operates on a hybrid work schedule, allowing for a blend of in-office and remote work.
In this role, you will have a significant impact on the reliability and performance of our software systems, ensuring seamless operations and customer satisfaction. You will be involved in hands-on design, analysis, development, and troubleshooting of highly distributed, large-scale production systems and event-driven, cloud-based services. Your primary focus will be on Linux administration, managing a fleet of Linux and Windows VMs as part of the application solution. You will also engage in infrastructure-as-code development using Terraform, shell scripting, and Python. Your responsibilities will extend to ensuring the repeatability, traceability, and transparency of our infrastructure automation.
You will support on-call rotations for operational duties that have not yet been automated and promote healthy software development practices, including compliance with chosen software development methodologies such as Agile. Additionally, you will create and maintain monitoring technologies and processes that improve visibility into our applications' performance and business metrics, keeping operational workload in check.
Partnering with security engineers, you will develop plans and automation to respond to new risks and vulnerabilities effectively. Your role will also involve participating in technical training events, game day scenarios, and professional conferences to enhance your skills and knowledge.