Digital Apps SRE

$103,000 - $123,000/Yr

Endava - Berkeley Heights, NJ

posted 3 months ago

Full-time - Mid Level
Remote - Berkeley Heights, NJ
Professional, Scientific, and Technical Services

About the position

As a Digital Apps Site Reliability Engineer (SRE) at GalaxE Solutions, you will provide hands-on support for existing environments. Your responsibilities will include software installation, patch installation, upgrades, query writing, configuration, security, system monitoring and tuning, disaster recovery planning, and release deployments. You will provide 24x7 support for production Internet applications on a rotating basis and act as a point of escalation for application support, diagnosing and resolving complex customer issues related to the Portal and Web Services environments.

During incidents, you will drive technical and management crisis bridges as required, using your experience and organizational knowledge to reduce Mean Time to Recovery (MTTR). You will collaborate with Change Management and Release Managers to review proposed change events for production and participate in all Production Support activities during incidents and outages. As a hands-on technical resource, you will resolve all technical issues within lower and upper environments and recommend performance and capacity improvements.

Documentation is a key part of this role: you will document install defects, assign severity to problems, and perform postmortems to identify root causes (root cause analysis, RCA) after fallbacks. You will also participate in internal and external audits as required by management and work closely with Engineering to ensure all relevant Key Performance Indicators (KPIs) are implemented within the monitoring framework. Additionally, you will escalate issues to technology, operations, and/or vendors where appropriate, ensure that database/application controls and procedures remain compliant with Corporate IT risk, and support Disaster Recovery tests and live recovery for all production environments.

Responsibilities

  • Provide hands-on support for existing environments including software installation, patch installation, upgrades, and configuration.
  • Provide 24x7 support of production Internet applications on a rotating basis.
  • Act as a point of escalation for application support to diagnose and resolve complex customer issues.
  • Drive incident crisis technical bridges and management bridges as required to reduce MTTR.
  • Collaborate with Change Management and Release Managers to review proposed change events for production.
  • Participate in all Production Support activities during incidents and outages.
  • Resolve all technical issues within lower and upper environments and recommend performance and capacity improvements.
  • Document install defects and assign severity to problems that occurred.
  • Perform postmortems to identify root causes after fallbacks.
  • Participate in internal and external audits as required by management.
  • Work closely with Engineering to implement relevant KPIs within the monitoring framework.
  • Escalate issues to technology, operations, and/or vendors where appropriate.
  • Ensure database/application controls and procedures remain compliant with Corporate IT risk.
  • Support Disaster Recovery tests and live recovery for all production environments.

Requirements

  • Bachelor's degree required; relevant, equivalent work experience may be substituted for degree requirement.
  • Experience configuring web servers such as Nginx or Apache, including reverse proxies.
  • Proficiency in Linux system administration, including managing and troubleshooting Linux systems and services and bash scripting.
  • Experience with JBoss or WildFly administration.
  • Experience working with third-party vendors.
  • Ability to participate in On-Call rotation.

Nice-to-haves

  • Experience with containerization technologies such as Docker and Kubernetes.
  • Familiarity with Continuous Integration / Continuous Delivery tools like Azure DevOps and Jenkins.
  • Solid understanding of routing and networking concepts.
  • Experience working in an Agile development environment.
  • Familiarity with collaboration platforms such as JIRA, Confluence, Wiki, and ServiceNow.

Benefits

  • Competitive salary range of $103,000 - $123,000 per year.
  • Opportunities for professional development and career growth.
  • Diversity and inclusion initiatives within the workplace.
  • Access to cutting-edge technologies and innovative projects.