Net2Source - Dallas, TX
posted 3 months ago
The Site Reliability Engineer (SRE) position at Net2Source Inc. is a critical role within the Site Reliability Engineering team, which is responsible for the availability, reliability, and performance of services and platforms in a highly transactional 24x7 environment. The SRE will monitor application performance, implement improvements, and automate tasks to improve system efficiency. The role requires troubleshooting in both cloud-based and on-premises environments, handling live production incidents, and debugging application and infrastructure issues while adhering to SRE best practices.

In this position, the SRE will coordinate with product owners and business representatives to define Service Level Objectives (SLOs) and error budgets for key project functionality. The SRE will participate in design reviews of software components to ensure they are built correctly, and will review products before production deployment to validate compliance with the established SLOs. A key responsibility is conducting system analysis and configuration management to improve the performance, availability, and reliability of system software. The SRE will also collaborate with software engineers and QA teams to ensure that the system meets non-functional requirements such as performance, security, and availability.

The SRE will document system knowledge, create runbooks, and ensure that critical system information is accessible to relevant stakeholders. The role also involves maintaining and monitoring the deployment of servers, Docker containers, databases, and backend infrastructure, as well as participating in production feedback sessions and problem management calls to identify opportunities for product improvement.