LexisNexis Risk Data Management - Alpharetta, GA

posted 4 months ago

Full-time - Mid Level
Remote - Alpharetta, GA
10,001+ employees

About the position

LexisNexis Risk Solutions is seeking a Senior Site Reliability Engineer II to join our global engineering team. This position is open to applicants based in Alpharetta, GA, or to those who prefer a fully remote work environment within the United States. The successful candidate will play a crucial role in shaping operations and support for critical applications, customers, and projects. The role requires collaboration with Development, QA, IT Operations, and Customer Operations teams, so effective communication and problem-solving skills in a fast-paced environment are essential. The position also involves a transition to DevOps practices, agile support, and new deployment processes, so the candidate must be adaptable and forward-thinking.

As a Senior Site Reliability Engineer II, you will lead complex reliability and toil-reduction projects. This advanced professional role requires a deep understanding of system and application code, enabling you to make data-driven recommendations that balance customer, development, and operational needs. You will act as a subject matter expert, guiding product and development teams to improve reliability within a product group. You will also train and mentor junior staff, fostering a culture of learning and continuous improvement within the team.

The role includes an on-call rotation during off-peak hours to maintain 24/7/365 system availability, reflecting the importance of reliability to our operations. You will be expected to recommend service level objectives in partnership with product and development teams, master observability tools and techniques, and act as an escalation point during incidents. Your contributions will be vital to delivering resilient application stacks through Infrastructure as Code and other DevOps practices, and to monitoring and supporting critical, high-revenue business applications.

Responsibilities

  • Recommend service level objectives in partnership with Product and Dev teams
  • Master observability tools and techniques
  • Act as an escalation point during incidents
  • Collaborate with development teams to troubleshoot systems and application performance issues
  • Improve the SRE framework
  • Champion shared services and platforms to drive reliability
  • Create disaster recovery plans including advanced fault injection
  • Advise on SRE training curriculum and content
  • Deliver resilient application stacks via 'Infrastructure as Code' and other DevOps practices
  • Monitor and provide ongoing support for critical, high-revenue business applications
  • Diagnose and resolve complex system and application issues
  • Work with diverse technical and non-technical teams, including Development, QA, IT Operations, Customer Operations, and Project Management teams
  • Write and maintain systems/application documentation for technical and non-technical audiences
  • Migrate existing applications to Cloud environments

Requirements

  • Professional experience working in public cloud environments such as AWS and Azure
  • Experience with infrastructure provisioning tools such as Terraform and CloudFormation
  • Experience with Continuous Integration/Delivery tools such as GitLab, GitHub, and Jenkins
  • Coding and scripting experience in languages such as PowerShell, Bash, Python, or equivalent
  • Experience with configuration management tools such as Ansible, Puppet, Chef, or equivalents
  • Hands-on experience supporting and troubleshooting Windows and Linux servers
  • Prior analytical and troubleshooting experience
  • Experience applying cloud architecture and system design to solve key business problems and support team goals
  • Experience migrating applications from on-premises to public cloud

Nice-to-haves

  • Experience working with containerized workloads such as Docker and Kubernetes
  • Familiarity with system and application monitoring tools such as Prometheus, Grafana, and CloudWatch
  • Familiarity with log management tools such as Elastic Stack, Graylog, or Splunk
  • Experience working with relational databases such as MySQL, MS SQL Server, or similar
  • Experience with secrets management services such as HashiCorp Vault
  • Knowledge of change control and associated procedures
  • Hands-on experience performing static/dynamic application security and penetration assessments with tools such as SonarQube, Checkmarx, AppScan, Burp Suite, OWASP ZAP, WebInspect, Fortify, Veracode, and Nessus

Benefits

  • Flexible work environment (remote options)
  • Investment in staff development
  • Collaborative and friendly work culture
  • Opportunities for personal and career development
  • Support for women in technology initiatives
  • Diversity and inclusion programs
  • Health and wellness programs