Salesforce - Denver, CO

posted 4 months ago

Full-time
Denver, CO
Publishing Industries

About the position

As a Site Reliability Engineer at Salesforce, you will play a crucial role in maintaining the performance and availability of customer-facing services. Your primary responsibility will be to ensure the continuous health of the supporting systems through proactive incident management and problem resolution. You will act in key support roles during major incidents, such as Sev0 and Sev1, and participate in technical reviews for problem management. This role requires a strong commitment to following internal compliance policies and directives, and you will work collaboratively with the Site Reliability team to stay current on industry innovations and technologies.

In this fast-paced environment, you will be expected to solve complex technical issues quickly and effectively while balancing multiple priorities. Automation will be a key focus: you will work to automate the detection and resolution of recurring issues in the production environment. You will also help create and improve processes that reduce operational and engineering toil, contributing to the overall efficiency of the team, and engage with other technical staff to address customer concerns and technical issues as they arise.

You will be part of a 24/7 team managing large data centers, which requires a willingness to work shifts and be on call when necessary. Your expertise in systems engineering, networking protocols, and Unix variants will be essential in supporting the infrastructure and ensuring its reliability.

Responsibilities

  • Maintain the performance and availability of customer-facing services.
  • Act in key support roles during major incidents (Sev0, Sev1).
  • Participate in technical reviews for problem management.
  • Contribute to and document Root Cause Analyses (RCAs), then hand them off to the Global Solutions team.
  • Ensure compliance with internal policies and directives while performing work.
  • Collaborate with team members to stay updated on industry innovations and technologies.
  • Automate detection and resolution of recurring issues in the production environment.
  • Create and improve processes to reduce operational and engineering toil.
  • Engage with technical staff to resolve customer concerns and technical issues.

Requirements

  • U.S. citizen (U.S. born or naturalized) without dual citizenship.
  • A related technical degree is required.
  • Experience in systems engineering in an enterprise-scale internet service engineering or support role.
  • Expertise in TCP/IP-related technologies (networking protocols, network programming, etc.).
  • Expertise in command-line (CLI) enterprise support of Unix variants (Linux/Solaris/BSD), with strong knowledge of Red Hat Enterprise Linux and Solaris.
  • Strong understanding of monitoring, security systems, and systems administration.
  • Strong communication skills (written and oral).
  • Past experience in incident management, with a good understanding of ITIL Service Operation.
  • Willingness to work in a 24/7 team managing large data centers.
  • Availability for shift work and being on call if required.
  • Experience provisioning, operating, and running AWS/C2S-based infrastructure and systems.
  • Experience writing scripts in Python, Go, or other languages.

Nice-to-haves

  • Prior experience with Chef/Puppet or other automated deployment tools.
  • Prior experience running Jenkins/Bamboo/Spinnaker pipelines.
  • Experience in supporting and maintaining monitoring and alerting systems.
  • Experience in supporting and maintaining Java applications.
  • Hands-on experience configuring and running AWS (Amazon Web Services) using the CLI/SDKs.
  • Certifications in Linux+, Red Hat, and AWS.
  • Experience in supporting and managing Kubernetes-based applications and services.
  • Familiarity with Agile processes and DevOps practices.
  • Experience in conducting blameless retrospectives and post-incident investigations.
  • Working knowledge of resilience engineering concepts.
© 2024 Teal Labs, Inc