Fulcrum Digital - Saint Louis, MO

posted 19 days ago

Full-time
Saint Louis, MO
Administrative and Support Services

About the position

The System Reliability Engineer (Application Support) at Fulcrum Digital is responsible for providing Level 2 support for production systems, including applications, databases, middleware, infrastructure, and network components. The role focuses on managing production incidents, ensuring operational readiness, and automating processes to enhance system reliability. The engineer will collaborate with various stakeholders, participate in change management, and support the DevOps team in optimizing deployment pipelines.

Responsibilities

  • Provide L2 support to production systems including application, database, middleware, infrastructure, and network components.
  • Manage production incidents end-to-end within defined SLAs with a focus on resolution.
  • Interact with stakeholders such as release managers, program leads, service managers, and development leads.
  • Review operational readiness requirements and report gaps in monitoring, alerting, and resilience.
  • Provide pre-implementation support including release notes review and implementation dry runs.
  • Run health checks and monitor latency and memory utilization of production components.
  • Automate day-to-day activities and propose changes to improve reliability.
  • Participate in the Change Advisory Board (CAB) and provide feedback on change requests.
  • Support the DevOps team in testing promote pipelines and suggest configuration items to automate.
  • Follow incident management best practices and perform root cause analysis (RCA).
  • Participate in disaster recovery tests and operational acceptance tests.
  • Analyze the technology stack to optimize recovery time objectives.
  • Work with team members across different time zones.
  • Share knowledge, document improvements, and mentor junior team members.
  • Support deployments of code into multiple lower environments with an emphasis on automation.
  • Engage in and improve the lifecycle of services from inception to refinement.

Requirements

  • Experience with deployments in MTF/Prod environments.
  • Knowledge of maintenance items including stop/start and disaster recovery activities.
  • Familiarity with log monitoring tools such as Splunk.
  • Experience with application monitoring tools such as Dynatrace.
  • Proficiency with ticketing and incident/problem management tools such as Remedy.
  • Strong skills in Linux and shell scripting.
  • Understanding of ITIL / ITSM processes.
  • Proficiency in PL/SQL for database interactions.
  • Troubleshooting skills for production issues.
  • Basic knowledge of Jenkins for CI/CD.