Dvi Technologies - Phoenix, AZ

posted 4 days ago

Full-time - Mid Level
Phoenix, AZ
Professional, Scientific, and Technical Services

About the position

The Site Reliability Engineer (SRE) role focuses on enhancing system reliability through software tools, automation, and collaboration between infrastructure and application teams. The engineer will monitor systems, troubleshoot issues, and implement improvements to ensure optimal performance and reliability of services.

Responsibilities

  • Monitor systems and infrastructure to maintain operational and performance levels
  • Rotational on-call responsibilities
  • Work closely with other SRC professionals/engineers when issues arise, collaborate on troubleshooting, and provide consultation and resolution for events and incidents
  • Anticipate potential problems before they affect services and collaborate to determine solutions
  • Gather and analyze metrics from tools and system/application logs to assist in performance tuning, fault finding, and resolution
  • Create sustainable systems and services through automation, process enhancement, tools, and noise reduction
  • Build automation to manage the SRC operations and eliminate/minimize manual functions and toil
  • Collaborate with Application/Infrastructure support engineers and operations teams
  • Participate in post-incident reviews to determine the cause, identify improvements, and prevent recurrence

Requirements

  • Possess a breadth and depth of technical and management knowledge
  • Continuous improvement mindset, always looking for opportunities to streamline, routinize, or automate
  • Working knowledge of server administration and troubleshooting on Linux and Windows, including patching and basic scripting (PowerShell, Bash)
  • Experience with VCE/UCP (including VMware versions 6 and above), platform and network connectivity, and patching, plus an understanding of current threat analysis and remediation trends
  • Experience with CIFS/NFS, Linux and Windows scripting, DPA reporting, and Avamar and Data Domain administration, along with a solid understanding of Windows and Linux environments
  • Knowledge of Linux, Windows, WebSphere, Apache, IIS, WebLogic, and Tomcat
  • Familiarity with JCL, CICS SYSPLEX
  • Strong understanding of network protocols and the OSI model, as well as Network+ certification
  • Experience with ServiceNow for workflow and knowledge management
  • Familiarity with monitoring and collaboration tools such as TrueSight, Jira, and Confluence
  • Skilled in ITSM processes and operations analytics methodologies to drive performance improvement (e.g., Lean)
  • Strong troubleshooting and problem-solving skills, with the ability to analyze and resolve complex technical issues
  • ITIL fundamentals knowledge, including Problem Management, Change Management, Release Management, Event Management, and Incident Management

Nice-to-haves

  • Strong skills in handling critical production incidents
  • Excellent communication and interpersonal skills, with the ability to collaborate effectively with stakeholders at all levels
  • Self-motivated and able to work independently or as part of a team, taking ownership of tasks and driving them to completion
  • Insatiable curiosity about how technologies work and how technologies interface in complex, large-scale environments