This job is closed

Fidelity Investments • Posted about 2 months ago
Full-time • Senior
Westlake, TX
About the position

The position involves building and operating highly resilient platforms in AWS cloud environments. The role requires coordinating systems using Infrastructure as Code tools such as IAM, ARM, Terraform, and Chef, and performing reliability engineering throughout the Software Development Lifecycle (SDLC) using programming languages such as Python, NodeJS, or Java. Responsibilities include deploying and supporting distributed multi-tiered application systems using Kubernetes and Continuous Integration/Continuous Deployment (CI/CD) pipelines.

The role also involves creating dashboards that capture the latency, availability, error, and saturation performance of applications using tools such as Splunk, Grafana, Prometheus, Catchpoint, and Datadog, along with Service-Level Indicator/Service-Level Objective (SLI/SLO) dashboards and automated processes for updates and new dashboard creation. The candidate will identify and resolve application issues using Datadog, Prometheus, and Splunk, and create, maintain, and tune monitors using ELK, OpenSearch, and OpenTelemetry. The candidate will also support applications hosted in AWS Cloud and Kubernetes, and build, deploy, automate, and support application services across multiple technology platforms, frameworks, and languages.

Responsibilities

  • Provides automated solutions for business and technology operational activities and manual tasks.
  • Analyzes the observability, resiliency, availability, and performance of applications.
  • Triages, deep dives, and executes root cause analysis.
  • Provides resolution of business and system issues through enhancement initiatives.
  • Resolves issues during critical outages to avoid negative business impact.
  • Contributes to product architectural solutions addressing high impact system issues.
  • Deploys and supports distributed multi-tiered application systems.
  • Manages the scalability and resiliency of applications.
  • Ensures daily business operations are not impacted by system issues.
  • Consults across the enterprise to plan for and implement enhancements to systems.
  • Establishes end-to-end flow of application systems to quickly identify and resolve critical business issues.
  • Tests the resiliency of application systems using Chaos Engineering techniques.
  • Mentors junior team members.

Requirements

  • Bachelor’s degree in Computer Information Systems, Computer Science, Engineering, Information Technology, Information Systems, Mathematics, Physics, or a closely related field and five (5) years of experience as a Principal Site Reliability Engineer.
  • Alternatively, a Master’s degree in a related field and three (3) years of experience as a Principal Site Reliability Engineer.
  • Demonstrated expertise in performing site reliability engineering to analyze observability, resiliency, availability, instrumentation, and performance of distributed applications.
  • Experience creating dashboards and monitors using Splunk, Grafana, Prometheus, Catchpoint, Telemetry, and Datadog.
  • Experience developing Kubernetes platforms and automations in public and private cloud (RKS, EKS, AKS) using Python, shell scripting, Git, Docker, and Kubernetes.
  • Experience automating business and technology operational activities using Jenkins Core, uDeploy, RunDeck, Ansible, and AWX.
  • Experience performing triage and root cause analysis in multi-tiered fund accounting application systems.

Benefits

  • Hybrid working model that blends onsite and offsite work experiences.
  • Commitment to creating and nurturing a diverse and inclusive workplace.
  • Reasonable accommodations for applicants with disabilities.

Job Keywords

Hard Skills
  • Ansible
  • Datadog
  • Kubernetes
  • Prometheus
  • Python