Korn/Ferry International - Houston, TX

posted 19 days ago

Full-time - Mid Level
Houston, TX
Professional, Scientific, and Technical Services

About the position

The Site Reliability Engineer (SRE) role focuses on ensuring the reliability, availability, and performance of applications and infrastructure. The SRE will be responsible for monitoring, incident management, and troubleshooting across various platforms, including AWS and Big Data technologies. This position requires a blend of software engineering and systems engineering skills to maintain high service levels and support disaster recovery efforts.

Responsibilities

  • Monitor application infrastructure using tools like Splunk or Dynatrace.
  • Triage incidents related to distributed and mainframe applications and middleware platforms.
  • Handle incident and problem management, responding to service requests from support teams.
  • Write Unix shell, Perl, and Bash scripts as required.
  • Analyze incidents reported in IBM WebSphere, Apache Tomcat, and IBM DataPower for issue resolution.
  • Run job schedulers like Control-M.
  • Troubleshoot technical issues in Java/J2EE, .NET, or cloud environments and collaborate with technology teams on solutions.
  • Coordinate incident management coverage and facilitate communications during outages.
  • Document calls, manage queues, and analyze tickets for incident impact analysis.
  • Provide an objective end-to-end view of issues and act as a single voice for the line of business.
  • Influence senior technology leads to ensure timely resolution of incidents.

Requirements

  • At least 5 years of experience with AWS, Big Data, and Spark.
  • 2-3 years of experience with Python and shell scripting.
  • Proven expertise in application development and support across multiple technologies.
  • Advanced knowledge of development tools for software design, development, testing, deployment, maintenance, and improvement.
  • Proficiency in AWS, Akamai, and Datadog technologies.
  • Proficiency in Splunk, Dynatrace, Unix, Linux, Tomcat, and WebSphere.
  • Experience in setting up Splunk alerting and monitoring.
  • Experience in building monitoring dashboards through Dynatrace.

Nice-to-haves

  • Experience with general-purpose programming languages such as Java, Python, .NET, or C++.
  • Familiarity with cloud platforms like AWS and Pivotal Cloud Foundry.
  • Understanding of network topologies, load balancing concepts, and content delivery networks.
  • Knowledge of HAProxy and Pivotal GemFire.
  • Understanding of web and mobile applications.
  • Familiarity with relational databases like Oracle/DB2 and non-relational databases like Cassandra.
  • Knowledge of IBM MQ and Kafka.
  • Understanding of risk controls and compliance standards.

Benefits

  • Professional development opportunities
  • Flexible working hours
  • Health insurance coverage
  • Paid time off