Ramy Infotech - Dallas, TX
posted 5 months ago
We are seeking a versatile professional skilled in both Site Reliability Engineering (SRE) and Java development. This hybrid role requires working effectively with cross-functional teams to ensure the reliability and performance of our systems. The ideal candidate will develop and maintain scripts that automate tasks and processes related to performance, scalability, and resilience. They will monitor system health using SRE tools and proactively identify potential problems so that our platform remains stable and efficient.
On the SRE side, the candidate will use tools such as Grafana, New Relic, and Kibana to monitor and analyze system performance metrics. They will triage and resolve issues affecting the platform's performance and stability, create and manage Jira tickets to track those issues, and gather the data and insights needed for troubleshooting and optimization.
On the Java development side, the candidate will design, develop, and maintain Java-based applications, writing clean, efficient, and maintainable code that adheres to best practices and coding standards. They will perform code reviews and provide constructive feedback to other developers, troubleshoot and resolve application bugs and performance issues, and collaborate with the development team to implement new features and enhancements. They will also test and debug applications to ensure high quality and reliability.
Additionally, the candidate will implement automation solutions to streamline operational workflows and reduce manual intervention, developing and maintaining automation scripts in Python, shell, or other relevant languages.
They will provide end-to-end support to the business, monitoring trends in order processing and submission to keep operations running smoothly and proactively addressing anomalies and issues to maintain high availability and reliability of the platform.