CCC Intelligent Solutions - Chicago, IL

posted 3 months ago

Full-time - Senior
Chicago, IL
Professional, Scientific, and Technical Services

About the position

As an Azure-Based Site Reliability Engineer (SRE) at CCC Intelligent Solutions, you will play a critical role in ensuring the reliability, scalability, and performance of our cloud-based applications and services hosted primarily on Microsoft Azure. This position is designed for individuals who are passionate about cloud technologies and have a strong background in site reliability engineering. You will collaborate closely with development teams to design, build, and maintain the observability and alerting components of our services. Your experience with Azure services and multi-tenant SQL-based applications will be instrumental in optimizing our cloud architecture and driving continuous improvement in our systems.

In this role, you will help build an SRE culture by sharing best practices, approaches, documentation, and code with other engineering teams across the organization. You will be responsible for designing, implementing, and managing the alerting and monitoring strategy for Azure-based services. Monitoring system performance and reliability will be a key part of your responsibilities, as you will implement monitoring solutions and alerts to ensure proactive responses to potential issues. You will also collaborate with development teams to optimize Azure-based applications based on our observability strategy.

Your approach to operational issues will be rooted in a software development mindset, utilizing defined feedback loops within the software delivery lifecycle. You will perform root cause analysis for incidents and implement preventive measures to minimize future disruptions. Staying updated with Azure technologies and best practices will be essential, as you will recommend and implement improvements to enhance application and system performance and efficiency. Additionally, you will participate in on-call rotations and respond to incidents as needed, ensuring timely resolution and communication. Coaching other team members to ensure systems are supported by following SRE best practices will also be part of your role.

Responsibilities

  • Help build an SRE culture by sharing best practices, approaches, documentation, and code with other engineering teams across the organization
  • Design, implement, and manage the alerting and monitoring strategy for Azure-based services
  • Monitor system performance and reliability by implementing monitoring solutions and alerts to ensure proactive response to potential issues
  • Collaborate with development teams to optimize Azure-based applications based on our observability strategy
  • Approach operational issues with a software development mindset through defined feedback loops within the software delivery lifecycle
  • Perform root cause analysis for incidents and implement preventive measures to minimize future disruptions
  • Stay updated with Azure technologies and best practices; recommend and implement improvements to enhance application and system performance and efficiency
  • Participate in on-call rotations and respond to incidents as needed, ensuring timely resolution and communication
  • Participate in product engineering stand-ups and related design activities
  • Coach other team members to ensure systems are supported by following SRE best practices

Requirements

  • Proven experience as a Site Reliability Engineer (SRE) or similar role, with a strong focus on Azure cloud services
  • Proficiency in scripting and automation using C#, PowerShell, Python, or similar languages
  • Strong knowledge of and proven experience in alerting and monitoring for Azure-based applications
  • Excellent problem-solving skills with a proactive approach to identifying and resolving issues
  • Experience writing and modifying SQL queries and generating reports
  • Ability to work independently and collaboratively with development teams in a fast-paced environment with a focus on continuous improvement
  • Ability to document solutions, SRE architectural patterns, and best practices to ensure that teams have guidance as needed
  • Proven ability to dig through metrics, logs, and available sources to triage and resolve an incident at any time
  • Azure certifications such as Azure Administrator Associate or Azure Solutions Architect

Nice-to-haves

  • Experience working as an SRE maintaining the reliability of applications and infrastructure
  • Proficiency in infrastructure-as-code practices
  • Experience building CI/CD pipelines from scratch
  • Ability to troubleshoot complex, cross-platform issues spanning OS, networking, database, and application layers in cloud-based and on-premises environments

Benefits

  • 401K Match
  • Paid time off
  • Annual Incentive Plan Performance Bonus
  • Comprehensive health insurance
  • Adoption Assistance
  • Tuition Reimbursement
  • Wellness Programs
  • Stock Purchase Plan options
  • Employee Resource Groups