Geico - Chevy Chase, MD

posted about 2 months ago

Full-time - Senior
Chevy Chase, MD
Insurance Carriers and Related Activities

About the position

As a Senior Reliability Engineer, you will be instrumental in ensuring the robustness, availability, and performance of our Data Engineering and Machine Learning Platforms. This role involves close collaboration with cross-functional teams to enhance platform reliability and resilience, driving the transformation of our insurance business into a tech-focused organization committed to engineering excellence and continuous improvement.

Responsibilities

  • Design, develop, and implement software solutions that enhance the reliability and fault tolerance of our modern data and machine learning platforms.
  • Collaborate with software engineers to create robust, scalable, and efficient platforms.
  • Proactively identify and address potential reliability bottlenecks and performance issues.
  • Develop and maintain automated processes for deployment, scaling, and maintenance of platforms.
  • Build effective monitoring systems to detect anomalies, performance degradation, and capacity issues.
  • Implement proactive measures to prevent incidents.
  • Participate in on-call rotations to respond to incidents promptly.
  • Investigate and resolve storage-related incidents, ensuring minimal impact on services.
  • Conduct post-incident reviews to learn from incidents and improve system reliability.
  • Collaborate with infrastructure teams to plan for storage and compute capacity needs.
  • Scale storage systems efficiently to accommodate growing demands.
  • Optimize resource utilization while maintaining high availability.
  • Document processes, procedures, and best practices.
  • Share knowledge with colleagues to foster a culture of continuous improvement.
  • Mentor junior engineers.

Requirements

  • Bachelor's degree in Computer Science, Information Systems, or equivalent education or work experience.
  • Minimum of 5 years of experience in Data Engineering pipeline-related roles.
  • Experience in the Big Data ecosystem: ETL, Big Data platform tooling (Apache Spark, Airflow), data lakes, Synapse, or Snowflake.
  • Experience in the Machine Learning ecosystem: model training, inference, experimentation, and pipeline infrastructure.
  • Proficiency in modern on-prem object storage technologies (Ceph, MinIO) and their cloud equivalents (AWS S3, Azure Blob Storage, Google Cloud Storage).
  • Experience with infrastructure automation, tooling, and configuration management frameworks (e.g., Puppet, Chef, Ansible, Terraform, Pulumi, etc.).
  • Fluency in SQL and NoSQL.
  • Knowledge of CS data structures and algorithms.
  • Fluency and specialization in at least two modern languages such as Java, Python, or Go, including object-oriented design.
  • Experience with Prometheus, Loki, and Grafana.
  • Experience with container orchestration platforms (Kubernetes or Docker Swarm).
  • Experience with Linux and the open-source ecosystem.
  • Self-driven, with an analytical, first-principles approach.
  • Ability to take on a complex challenge and deliver simple, high-quality solutions.
  • Effective communication skills for cross-functional collaboration.

Benefits

  • Premier Medical, Dental and Vision Insurance with no waiting period
  • Paid Vacation, Sick and Parental Leave
  • 401(k) Plan
  • Tuition Reimbursement
  • Paid Training and Licensures