Kforce - Los Angeles, CA

posted 3 months ago

Full-time
Los Angeles, CA
Administrative and Support Services

About the position

Kforce is immediately adding a full-time Datacenter Site Reliability Engineer in support of our industry-leading technology development client in Los Angeles, CA. Our client is seeking candidates who are driven to make positive changes to the way people live and work by creating breakthrough technology solutions. The role involves a variety of responsibilities centered around ensuring the reliability and performance of data center operations. This includes data monitoring and alerting, data quality assurance, and anomaly detection. The engineer will be responsible for documenting team processes and policies, including methods of engagement and Service Level Objectives (SLOs). In this position, the engineer will analyze, design, and implement solutions at the system level to remove bottlenecks and improve edge service performance. Implementing monitoring and alerting systems to enhance issue detection and response is a key part of the role. The engineer will work in a fast-paced environment and participate in technical operations and rotations in response to performance and reliability issues. Additionally, participation in on-call rotations is required, where the engineer will be responsible for resolving or escalating incoming events. A strong understanding of maintaining and operating a Linux and Kubernetes environment is essential for success in this role.

Responsibilities

  • Data monitoring and alerting, data quality assurance and anomaly detection
  • Document team processes and policies, including methods of engagement and SLOs
  • Analyze, design, and implement solutions at the system level to remove bottlenecks and improve edge service performance
  • Implement monitoring and alerting to improve issue detection and response
  • Work in a fast-paced environment; participate in technical operations and rotations in response to performance and reliability issues
  • Participate in on-call rotations, responsible for resolving or escalating incoming events
  • Maintain and operate a Linux and Kubernetes environment

Requirements

  • Bachelor's degree or higher in Computer Science or a related field, with at least 2 years of related work experience
  • 3+ years of experience working with Unix/Linux systems from kernel to shell and beyond
  • Experience working with system libraries, file systems, and client-server protocols
  • Experience reading Python scripts for platform operations
  • Experience in networking technologies such as TCP/IP, BGP, DNS, etc. in a carrier-grade environment
  • Experience in developing and operating one or more of the following systems: OpenStack, Kubernetes, Nginx, IPVS, ELK stack, Hadoop, etc.

Benefits

  • Medical insurance
  • Dental insurance
  • Vision insurance
  • Health Savings Account (HSA)
  • Flexible Spending Account (FSA)
  • 401(k)
  • Life insurance
  • Disability insurance
  • Accidental Death and Dismemberment (AD&D) insurance
  • Paid time off for salaried personnel
  • Paid sick leave for hourly employees on a Service Contract Act project