Senior Site Reliability Engineer

$117,200 - $229,200/Yr

Microsoft - Redmond, WA

posted 2 months ago

Full-time - Senior
Remote - Redmond, WA
Publishing Industries

About the position

We are seeking a Senior Site Reliability Engineer (SRE) to support and expand Viva Engage, the industry-defining social network for enterprises. The platform serves millions of employees, including those at 85% of Fortune 500 companies, enabling them to build community, share knowledge, and connect with their leaders and peers. As the Viva Engage user base continues to grow rapidly, the Site Reliability team plays a crucial role in keeping the service reliable while we scale and modernize our technology stack. We need an SRE who can balance the competing priorities of maintaining current operations and building the architecture needed for future growth.

Acquired in 2012, Viva Engage combines the benefits of a startup (rapid innovation, cutting-edge technology, and significant individual impact) with the advantages of working for one of the most successful software companies in the world. Our mission is to empower every person and organization on the planet to achieve more. In this post-COVID world, our platform has become increasingly indispensable, fostering connection and a sense of belonging among remote teams. We value respect, integrity, and accountability, and we strive to create a culture of inclusion where everyone can thrive at work and beyond.

Responsibilities

  • Participate in on-call rotation and incident response throughout the product development and operation cycle, including responding to support requests after normal business hours, on weekends, and/or holidays.
  • Monitor system performance and proactively identify and resolve issues to ensure high availability and performance.
  • Develop and maintain automation tools and processes for deployment, monitoring, and configuration management.
  • Use troubleshooting skills and debugging tools, and examine logs, telemetry, and other data sources, to verify assumptions and assess customer impact, communicating findings efficiently in writing and verbally.
  • Lead blameless postmortems for root cause analysis and production resiliency.
  • Consult with developers to design services that scale in Azure.
  • Mentor team members and contribute to the overall growth and development of the SRE team.
  • Stay current with industry trends, emerging technologies, and best practices in site reliability engineering and cloud computing.

Requirements

  • 6+ years of technical experience in software engineering, network engineering, or system administration, OR a Bachelor's Degree in Computer Science, Information Technology, or a related field with 3+ years of technical experience, OR a Master's Degree in Computer Science, Information Technology, or a related field with 2+ years of technical experience.
  • Ability to meet customer and/or government security screening requirements, including a Cloud background check and U.S. citizenship verification.

Nice-to-haves

  • Experience applying SRE principles in a large production environment.
  • Proficiency in cloud computing platforms (e.g., AWS, Azure, GCP) and related services (e.g., EC2, S3, VPC, IAM, Lambda).
  • Expertise in automation tools and frameworks (e.g., Terraform, Ansible, Chef, Puppet) and scripting languages (e.g., Python, Bash).
  • Deep understanding of containerization and orchestration technologies (e.g., Docker, Kubernetes).
  • Experience with monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack) and incident response processes.
  • Strong problem-solving skills and the ability to troubleshoot complex issues in distributed systems.
  • Clear communication and collaboration skills, with the ability to work effectively in a cross-functional team environment.

Benefits

  • Industry-leading healthcare
  • Educational resources
  • Discounts on products and services
  • Savings and investment opportunities
  • Maternity and paternity leave
  • Generous time away
  • Giving program
  • Opportunities to network and connect