Salesforce - Denver, CO

posted 10 days ago

Full-time
Denver, CO
Publishing Industries

About the position

The Site Reliability Engineer (SRE) role focuses on maintaining the performance and availability of customer-facing services by ensuring the health of supporting systems. This position involves incident management, problem management, and collaboration with technical teams to address customer concerns and technical issues. The SRE will also participate in retrospectives and contribute to the continuous improvement of the Site Reliability team.

Responsibilities

  • Maintain the constant health of customer-facing services and supporting systems.
  • Act in key support roles during major incidents (Sev0, Sev1) and participate in technical reviews for problem management.
  • Contribute to and participate in root cause analyses (RCAs) and hand them off to the Global Solutions team.
  • Ensure compliance with the company's internal policies and directives in all work carried out by the Site Reliability team.
  • Collaborate with other technical staff to solve technical issues and address customer concerns.
  • Lead team members in staying updated on key industry innovations and technologies.

Requirements

  • Systems engineering experience in an enterprise-scale internet service engineering or support role.
  • Expertise in TCP/IP related technologies (networking protocols, network programming, etc.).
  • Expertise in CLI enterprise support of Unix variants (Linux/Solaris/BSD) with strong knowledge of Red Hat Enterprise Linux and Solaris.
  • Strong understanding of monitoring security systems and administration.
  • Strong communication skills (written and oral).
  • Past experience in incident management and a good understanding of ITIL service operations.
  • Willingness to work in a 24/7 team managing large data centers.
  • Experience provisioning, operating, and running AWS/C2S based infrastructure and systems.
  • Experience writing scripts in Python, Go, or other languages.

Nice-to-haves

  • Prior Chef/Puppet or automated deployment experience.
  • Prior Jenkins/Bamboo/Spinnaker pipeline execution experience.
  • Experience in supporting and maintaining monitoring and alert systems.
  • Experience in supporting and maintaining Java applications.
  • Hands-on experience configuring and running AWS using the CLI/SDKs.
  • Certifications in Linux+, Red Hat, and AWS.
  • Experience in supporting and leading Kubernetes-based applications and services.
  • Familiarity with Agile processes and DevOps practices.
  • Experience in conducting blameless retrospectives and post-incident investigations.
  • Working knowledge of resilience engineering concepts.