Site Reliability Engineer

$135,200 - $141,440/Yr

Akkodis - Los Angeles, CA

posted 3 months ago

Full-time - Mid Level

Onsite - Los Angeles, CA

Administrative and Support Services

About the position

Akkodis is seeking a Site Reliability Engineer for a contract position with a client located in Los Angeles, California. This role is crucial for ensuring the reliability and performance of systems, focusing on data monitoring, alerting, and quality assurance. The engineer will be responsible for analyzing, designing, and implementing solutions to enhance system performance and remove bottlenecks. The position requires participation in technical operations and on-call rotations to address performance and reliability issues, making it essential for the candidate to thrive in a fast-paced environment.

The Site Reliability Engineer will document team processes and policies, including methods of engagement and Service Level Objectives (SLOs). They will also implement monitoring and alerting systems to improve issue detection and response. The role involves maintaining and operating a Linux and Kubernetes environment, which is critical for the infrastructure's stability and efficiency. The engineer will work closely with various technologies and systems, ensuring that the services provided meet the highest standards of reliability and performance.

Responsibilities

Perform data monitoring and alerting, data quality assurance, and anomaly detection.
Document team processes and policies, including methods of engagement and SLOs.
Analyze, design, and implement solutions at the system level to remove bottlenecks and improve edge service performance.
Implement monitoring and alerting to improve issue detection and response.
Work in a fast-paced environment. Participate in technical operations and rotations in response to performance and reliability issues.
Participate in on-call rotations, resolving or escalating incoming events.
Maintain and operate a Linux and Kubernetes environment.

Requirements

3+ years of experience working with Unix/Linux systems, from kernel to shell and beyond, including system libraries, file systems, and client-server protocols.
2+ years of experience coding Python scripts for platform operations.
Experience with networking technologies such as TCP/IP, BGP, and DNS in a carrier-grade environment.
Experience developing and operating one or more systems such as OpenStack, Kubernetes, Nginx, ipvs, the ELK stack, or Hadoop.
Bachelor's degree or higher in Computer Science or a related field, with at least 2 years of related work experience.

Benefits

Medical insurance
Dental insurance
Vision insurance
Life insurance
Short-term disability
Additional voluntary benefits
Employee Assistance Program (EAP)
Commuter benefits
401(k) plan
Paid Sick Leave
Holiday pay where applicable
