Geico - Chevy Chase, MD

posted about 2 months ago

Full-time - Senior
Chevy Chase, MD
Insurance Carriers and Related Activities

About the position

As a Senior Reliability Engineer, you will be instrumental in ensuring the robustness, availability, and performance of our Data Engineering and Machine Learning Platforms. This role involves close collaboration with cross-functional teams to enhance platform reliability and resilience, driving the transformation of our insurance business into a tech-focused organization committed to engineering excellence and continuous improvement.

Responsibilities

  • Design, develop, and implement software solutions that enhance the reliability and fault tolerance of our modern data and machine learning platforms.
  • Collaborate with software engineers to create robust, scalable, and efficient platforms.
  • Proactively identify and address potential reliability bottlenecks and performance issues.
  • Develop and maintain automated processes for deployment, scaling, and maintenance of platforms.
  • Build effective monitoring systems to detect anomalies, performance degradation, and capacity issues.
  • Implement proactive measures to prevent incidents.
  • Participate in on-call rotations to respond to incidents promptly.
  • Investigate and resolve storage-related incidents, ensuring minimal impact on services.
  • Conduct post-incident reviews to learn from incidents and improve system reliability.
  • Collaborate with infrastructure teams to plan for storage and compute capacity needs.
  • Scale storage systems efficiently to accommodate growing demands.
  • Optimize resource utilization while maintaining high availability.
  • Document processes, procedures, and best practices.
  • Share knowledge with colleagues to foster a culture of continuous improvement.
  • Mentor junior engineers.

Requirements

  • Bachelor's degree in Computer Science, Information Systems, or equivalent education or work experience.
  • Minimum of 5 years of experience in Data Engineering pipeline-related roles.
  • Experience in the Big Data ecosystem: ETL, Big Data platform tooling (Apache Spark, Airflow), data lakes, Synapse, or Snowflake.
  • Experience in the Machine Learning ecosystem: model training, inference, experimentation, and pipeline infrastructure.
  • Proficiency in modern on-prem object storage technologies (Ceph, MinIO) and their cloud equivalents (AWS S3, Azure Blob Storage, Google Cloud Storage).
  • Experience with infrastructure automation, tooling, and configuration management frameworks (e.g., Puppet, Chef, Ansible, Terraform, Pulumi, etc.).
  • Fluency in SQL and NoSQL.
  • Knowledge of CS data structures and algorithms.
  • Fluency and specialization in at least two modern languages such as Java, Python, or Go, including object-oriented design.
  • Experience with Prometheus, Loki, and Grafana.
  • Experience with container orchestration platforms (Kubernetes or Docker Swarm).
  • Experience with Linux and the open-source ecosystem.
  • Self-driven, with an analytical, first-principles approach.
  • Ability to take on a complex challenge and deliver simple, high-quality solutions.
  • Effective communication skills for cross-functional collaboration.

Benefits

  • Premier Medical, Dental and Vision Insurance with no waiting period
  • Paid Vacation, Sick and Parental Leave
  • 401(k) Plan
  • Tuition Reimbursement
  • Paid Training and Licensures