E-Solutions Group - Seattle, WA

posted 2 months ago

Full-time - Senior
Seattle, WA
Computing Infrastructure Providers, Data Processing, Web Hosting, and Related Services

About the position

The Senior Site Reliability Engineer (SRE) role is focused on ensuring the health and performance of production systems within a cloud-based environment. The position requires a strong technical background in Linux, microservices, and NoSQL databases, along with excellent troubleshooting skills. The SRE will be responsible for monitoring production systems, creating dashboards, configuring alerts, and leading troubleshooting efforts. This role involves collaboration with cross-functional teams to implement scalable solutions and improve system reliability.

Responsibilities

  • Own the health of production systems
  • Develop monitoring dashboards
  • Configure alerts and automate system recovery processes
  • Monitor alerts and take proactive steps to resolve system issues
  • Troubleshoot production issues
  • Lead production troubleshooting calls
  • Apply patches and updates to production systems
  • Design and build cutting-edge, multi-microservice solutions to support Starbucks's growth worldwide
  • Work with cross-functional teams for ongoing design efforts and systems support
  • Automate password and certificate rotations on application and DB servers
  • Support the CI/CD team when rolling out applications and infrastructure globally
  • Collaborate with the development team and with developer leads on other IT teams
  • Initiate process improvements for new and existing systems
  • Coach and mentor other team members
  • Participate in a production support rotation that includes pager responsibilities
  • Break down complex application designs into component deliverables and estimate design and development timelines

Requirements

  • 10-12 years of experience in the IT industry
  • 9+ years of software development and DevOps engineering
  • Experience working in a cloud environment (Azure preferred)
  • Experience with Kubernetes (Azure Kubernetes Service (AKS) preferred)
  • Experience with Kafka, Event Hubs, NATS, or another message broker
  • Experience with Cassandra, PostgreSQL, MongoDB, Elasticsearch, Cosmos DB
  • Experience with Azure DevOps, Jenkins, Python, Terraform, Ansible
  • Experience with Databricks
  • Experience with Datadog, Splunk, or other logging and APM tools
  • Experience working in a Linux environment
  • In-depth understanding of computer science fundamentals, including object-oriented design, data structures, algorithms, and problem solving
  • Experience building complex, scalable, high-performance software systems
  • Demonstrated knowledge of best practices for the design and implementation of large-scale systems
  • Experience building and operating mission-critical, highly available (24x7) systems
  • Ability to work well with a team in a fast-paced agile development environment
  • Bachelor's degree in Computer Science or equivalent work experience
  • Excellent communication, analytical and problem-solving skills
  • Extensive understanding of SDLC and Scrum methodologies