Wells Fargo - Irving, TX

posted 3 months ago

Full-time - Senior
Irving, TX
Credit Intermediation and Related Activities

About the position

We are looking for a Sr. Site Reliability Engineer who thrives on solving complex problems through innovation and on driving change at scale in a diverse environment. You will join a focused team of Application Support and SREs introducing and advancing the SRE discipline across several hundred applications and multiple vertical lines of business supporting the entire firm. The team will drive technology transformation and adoption of SRE-aligned enterprise capabilities and products, launch new tooling enablement, automate away complex issues, and integrate with the latest technology.

Site Reliability Engineers leverage their experience as software and systems engineers to ensure that applications onboarded to SRE are available, have full-stack observability, improve continuously through code and automation, provide operational insight through analytics, are continuously tested, and are integrated with CI/CD. They work with application teams to ensure the products and services we provide are always on.

In this role, you will:

  • Instantiate Site Reliability Engineering and AIOps capabilities at Wells Fargo Enterprise Functions Technology (EFT), igniting the practice, principles, and culture and leading by example.
  • Assist in training skilled peer engineers by growing the practice within EFT and partnering with peer platform-embedded SRE teams.
  • Introduce and mature the adoption of enterprise capabilities, tools, and innovation, improving availability in a multi-cloud ecosystem by evolving observability, monitoring, logging, synthetic monitoring, and chaos engineering.
  • Evolve AIOps, introducing self-healing and autonomic capabilities that solve complex operational and systemic issues with precision, including automating processes and leveraging Robotic Process Automation and AI/ML to improve the availability of the products we provide to customers.
  • Automate key SRE metrics and IT Service Operations processes, including customer impact, availability of critical business flows, SLO/SLI adherence, and error budget, and reduce time to recovery.
  • Share support responsibilities for critical applications and customer journeys, including leading technical resolution of high-priority incidents with cross-functional partners, remediating issues, conducting blameless post mortems and root cause analysis, and introducing continuous improvement that solves problems once and for all, with the goal of no repeats.
  • Closely collaborate with EFT application development teams and other peer organizations to influence and drive stability and SRE-aligned capability.
  • Act as an advisor to leadership to develop or influence applications, network, information security, database, operating systems, or web technologies for highly complex business and technical needs across multiple groups.
  • Lead the strategy and resolution of highly complex and unique challenges requiring in-depth evaluation across multiple areas or the enterprise, delivering solutions that are long-term and large-scale and that require vision, creativity, innovation, and advanced analytical and inductive thinking.

Responsibilities

  • Instantiate Site Reliability Engineering and AIOps capabilities at Wells Fargo Enterprise Functions Technology (EFT).
  • Assist in training skilled peer engineers and grow the practice within EFT.
  • Introduce and mature the adoption of enterprise capabilities, tools, and innovation to improve availability in a multi-cloud ecosystem.
  • Evolve AIOps by introducing self-healing and autonomic capabilities to solve complex operational issues.
  • Automate key SRE metrics and IT Service Operations processes.
  • Share support responsibilities for critical applications and customer journeys.
  • Lead technical resolution of high priority incidents with cross-functional partners.
  • Conduct blameless post mortems and root cause analysis to introduce continuous improvement.
  • Collaborate closely with EFT application development teams to drive stability and SRE-aligned capability.
  • Act as an advisor to leadership on applications, network, information security, and other technologies.
  • Lead the strategy and resolution of highly complex challenges across multiple areas of the enterprise.

Requirements

  • 10+ years of Engineering experience, or equivalent demonstrated through work experience, training, military experience, or education.
  • 7+ years of Java, C#, Python or other object-oriented software engineering experience.
  • 5+ years of experience performing engineering and support tasks on Linux/Unix and Windows Servers.
  • 3+ years of experience with Cloud technologies.
  • 3+ years of experience supporting enterprise-level complex applications and platforms in Production.
  • 5+ years of designing and building complex observability solutions leveraging industry-standard toolsets or custom-built solutions.
  • 5+ years working with configuration and monitoring technologies such as Ansible, Grafana, Elastic, Splunk, Prometheus.
  • Strong verbal, written, and interpersonal communication skills.

Nice-to-haves

  • A Master's degree or higher in computer science or engineering.
  • Experience with the design, implementation, and governance of Artificial Intelligence, Natural Language Processing, or Machine Learning architecture.
  • Experience with Agile Scrum and Kanban methodologies.

Benefits

  • Health benefits
  • 401(k) Plan
  • Paid time off
  • Disability benefits
  • Life insurance, critical illness insurance, and accident insurance
  • Parental leave
  • Critical caregiving leave
  • Discounts and savings
  • Commuter benefits
  • Tuition reimbursement
  • Scholarships for dependent children
  • Adoption reimbursement