Site Reliability Engineer III

$135,200 - $145,600/Yr

The Judge Group - Newport Beach, CA

posted about 2 months ago

Full-time - Senior
Newport Beach, CA
Administrative and Support Services

About the position

The Lead Site Reliability Engineer (SRE) will provide technical leadership and accountability for platform engineering, system design, and implementation. The role focuses on meeting non-functional requirements such as quality, security, reliability, availability, and performance, while optimizing design and engineering for new systems and enhancements. The engineer will oversee production operations and develop solutions to enhance system reliability and automation.

Responsibilities

  • Lead the design, build, and implementation of orchestration and tooling solutions that make administration tasks more efficient.
  • Establish best practices for structuring, automating, building, deploying, and monitoring complex distributed software products.
  • Ensure the reliability and traceability of software releases and of deployments of software and infrastructure changes.
  • Create and maintain platform architecture and design specifications for software environments.
  • Design and implement monitoring and recovery tools for high availability and disaster recovery.
  • Develop highly available infrastructure and platform components for product lines.
  • Implement security engineering best practices across all platforms and environments.
  • Triage alerts, diagnose and resolve critical issues, and manage change implementations.
  • Coordinate, document, and track critical incidents and root cause analysis for issue resolution.
  • Collaborate with Delivery Engineers and DevExp Engineers to enhance CI/CD orchestration systems.
  • Lead, mentor, and grow other SRE team members.
  • Promote the DevSecOps culture and SRE mindset, mentoring on reliability and best practices.
  • Identify and implement opportunities for automation and prevention of recurring issues.
  • Maintain a strong understanding of IaaS, PaaS, and SaaS offerings for cloud-based environments.
  • Design and implement processes and automation for performance testing.
  • Ensure implementations and solutions are documented and operationalized.

Requirements

  • 10-15 years of experience in infrastructure, system engineering, or software engineering.
  • Advanced knowledge of software engineering and testing automation frameworks.
  • Expertise in three areas: cloud-native architecture, cloud engineering, and container orchestration solutions.
  • Strong understanding of business technology drivers and their impact on architecture design.
  • Advanced knowledge of observability engineering with hands-on experience in monitoring platforms.
  • Systematic problem-solving approach and strong communication skills.
  • Hands-on experience in designing, analyzing, scaling, and troubleshooting distributed systems.
  • Proficiency with SRE methodologies and passion for solving operational problems through automation.
  • Ability to communicate technical strategy effectively across the organization.
  • Demonstrated ability to deliver multiple engineering projects on time and within budget.

Nice-to-haves

  • Subject matter expertise in designing and supporting solutions on major public cloud providers (AWS preferred).
  • Expertise in microservices lifecycle management.
  • Strong experience with logging and monitoring tools such as ELK stack and Prometheus.
  • Expert knowledge of software release tooling such as Jenkins and Azure DevOps.
  • Expert-level knowledge of containerization technologies and managing Docker image lifecycles.
  • Advanced experience with Kubernetes or other orchestration solutions.
  • Extensive experience with Linux/Unix/Windows OS.

Benefits

  • Competitive hourly pay ranging from $65.00 to $70.00 USD.