This job is closed

We regret to inform you that the job you were interested in has been closed. Although this specific position is no longer available, we encourage you to continue exploring other opportunities on our job board.

Allegis Group - Newport Beach, CA

posted about 2 months ago

Full-time - Mid Level
Newport Beach, CA
10,001+ employees
Administrative and Support Services

About the position

As a Lead Site Reliability Engineer (SRE), you will provide technical leadership and accountability for platform engineering, system design, and implementation to meet product non-functional requirements such as quality, security, reliability, availability, and performance. This role involves optimizing design and engineering for new systems and enhancements, overseeing production operations, and developing solutions to enhance system reliability and automation.

Responsibilities

  • Lead the design, build, and implementation of orchestration and tooling solutions that make administrative tasks efficient.
  • Establish best practices for structuring, automating, building, deploying, and monitoring complex distributed software products.
  • Ensure reliability and traceability of software releases and deployments.
  • Create and maintain platform architecture and design specifications.
  • Design and implement monitoring and recovery tools for high availability and disaster recovery.
  • Develop highly available infrastructure and platform components for evolving product lines.
  • Implement security engineering best practices in deployed platforms.
  • Triage alerts, resolve critical issues, and manage change implementations.
  • Coordinate documentation and tracking of critical incidents and root cause analysis.
  • Collaborate with Delivery Engineers and DevExp Engineers to enhance continuous integration/continuous deployment (CI/CD) systems.
  • Mentor and grow other SRE team members.
  • Promote the DevSecOps culture and SRE mindset.
  • Identify opportunities for automation and for signal-to-noise reduction.
  • Prevent recurring issues and reduce the time to mitigate service-impacting events.
  • Maintain understanding of IaaS, PaaS, and SaaS offerings for cloud-based environments.
  • Design and implement processes and technology for performance testing.
  • Document implementations and ensure operational processes support the solution lifecycle.

Requirements

  • 10-15 years of experience in infrastructure, system engineering, or software engineering.
  • Advanced knowledge of software engineering in test and of test automation frameworks.
  • Advanced knowledge in at least three of the following key areas: cloud-native and IaaS architecture, design, cloud engineering, and container orchestration solutions.
  • Strong understanding of business technology drivers and their impact on architecture design.
  • Advanced knowledge of observability engineering, with hands-on experience using monitoring platforms.
  • Systematic problem-solving approach with strong communication skills.
  • Hands-on experience in designing, analyzing, scaling, and troubleshooting distributed systems.
  • Well-versed in SRE methodologies, with a passion for automation and software engineering.
  • Ability to communicate technical strategy effectively across the organization.
  • Demonstrated ability to launch and deliver multiple engineering projects on time.

Benefits

  • Medical, dental & vision
  • Critical Illness, Accident, and Hospital insurance
  • 401(k) Retirement Plan - Pre-tax and Roth post-tax contributions available
  • Life Insurance (Voluntary Life & AD&D for the employee and dependents)
  • Short and long-term disability
  • Health Spending Account (HSA)
  • Transportation benefits
  • Employee Assistance Program
  • Time Off/Leave (PTO, Vacation or Sick Leave)