Site Reliability Engineer (SRE)

Vizio Group - Dallas, TX

posted 5 months ago

Full-time - Mid Level

Dallas, TX

1,001-5,000 employees

Furniture, Home Furnishings, Electronics, and Appliance Retailers

About the position

VIZIO is seeking a Site Reliability Engineer (SRE) to join our expanding organization, which is dedicated to releasing firmware and software to millions of customers efficiently while maintaining 99.9% uptime. The SRE will report to the Manager of DevOps Security and will play a crucial role in enhancing the availability, performance, and security of cloud services managed by the Vizio Operating System engineering organization. This position is part of the DevSecOps Engineering team, where responsibilities span cloud infrastructure, networking, monitoring, and security.

In this role, you will focus on ensuring that Vizio services deliver high availability and consistent performance to end-users, improving the overall customer experience. You will be responsible for implementing robust incident response strategies to swiftly address any disruptions that arise. Collaboration is key: you will work closely with the Engineering Organization, Security teams, and Software Engineers to secure and optimize our cloud infrastructure, DevOps pipelines, and embedded platforms.

Continuous learning is essential, and the ideal candidate will stay abreast of the latest trends and technologies as SRE practices evolve. We are looking for a meticulous problem solver who thrives under pressure and is committed to maintaining the highest standards of reliability and security in our cloud services.

Responsibilities

Design, develop, and deploy robust products, tooling, and system solutions in collaboration with cross-functional teams.
Proactively monitor systems to prevent incidents, and build effective monitoring that alerts on symptoms before they become outages.
Quickly triage application and system issues, debugging and tracking them to ensure swift recovery.
Create and test infrastructure and automation code to streamline operations and ensure scalability.
Review design and code, providing constructive feedback to ensure adherence to best practices.
Perform root cause analysis for incidents, assessing their impact and providing feedback to enhance system resiliency.
Contribute to and maintain existing documentation, including educational content, system architecture diagrams, and incident management runbooks.

Requirements

Bachelor's degree in Computer Science, Information Technology, or a related field.
5+ years of experience in site reliability engineering, cloud infrastructure engineering, cloud-based network engineering, or DevOps.
Experience with cloud platforms and applications primarily hosted in AWS.
Experience developing in Terraform or CloudFormation.
Experience with Git-based source-code repositories and automated CI/CD (GitHub Actions, Jenkins, CircleCI).
Experience with AWS native observability tools.
Experience with Sumo Logic data analytics (or similar).
Familiarity with Python.
Familiarity with Agile methodology and DevOps best practices.
Strong problem-solving skills and attention to detail.
