New York Life - Lebanon, NJ

posted 2 months ago

Full-time - Mid Level
Lebanon, NJ
5,001-10,000 employees
Insurance Carriers and Related Activities

About the position

The Senior Associate - AWS Cloud Data Platform Site Reliability Engineer (SRE) role at New York Life involves building and maintaining a core data, reporting, and analytics platform for the Insurance & Agency Group. The position focuses on ensuring the reliability, performance, and scalability of cloud-based data infrastructure using AWS services, while contributing to innovative initiatives that enhance the company's digital landscape.

Responsibilities

  • Develop and maintain monitoring, alerting, and logging systems to proactively detect and resolve incidents.
  • Perform root cause analysis and implement solutions to prevent recurrence.
  • Manage incident response, including on-call rotations, triaging, and escalation.
  • Create and manage Infrastructure as Code (IaC) using tools like Terraform.
  • Automate deployments, scaling, backups, and disaster recovery processes.
  • Develop and maintain CI/CD pipelines to ensure smooth deployment and rollback processes.
  • Analyze performance metrics and optimize infrastructure and application performance.
  • Define and enforce Service Level Objectives (SLOs) and Service Level Indicators (SLIs).
  • Conduct capacity planning and scaling to manage anticipated loads.
  • Implement security best practices, including network security, IAM policies, and encryption.
  • Conduct security audits and compliance checks to ensure regulatory adherence.
  • Respond to security incidents and implement remediation measures.
  • Work with development teams to ensure services are reliable, scalable, and easily monitored.
  • Collaborate with cross-functional teams to design, build, and maintain cloud infrastructure.
  • Identify and implement improvements to operational processes and workflows.
  • Design, implement, and test disaster recovery and business continuity plans.
  • Ensure regular backups and replication to minimize data loss and downtime.

Requirements

  • 3+ years of experience as a Cloud Site Reliability Engineer.
  • 1+ years of experience with AWS services (S3, EC2, Glue, Redshift, RDS) in shared service or hybrid environments.
  • Proficiency in AWS services (EC2, S3, RDS, Lambda, VPC, CloudWatch, IAM, etc.).
  • Strong knowledge of scripting languages (Python, Bash, etc.) and automation tools (Terraform).
  • Experience with CI/CD tools and DevOps practices.
  • Familiarity with monitoring and logging tools.
  • Strong troubleshooting and problem-solving skills, along with exposure to the Machine Learning (ML) and Artificial Intelligence (AI) fields.
  • Exposure to industry-standard Data Governance processes and procedures.
  • Bachelor's degree in Computer Engineering, Computer Science, MIS, or a related field is preferred but not required.

Benefits

  • Leave programs
  • Adoption assistance
  • Student loan repayment programs
  • Comprehensive benefit options