Wells Fargo - Irving, TX
posted 3 months ago
We are looking for a Sr. Site Reliability Engineer who thrives on solving complex problems through innovation and on driving change at scale in a diverse environment. You will join a focused team of Application Support engineers and SREs introducing and advancing the SRE discipline across several hundred applications and multiple vertical lines of business supporting the entire firm. The team will drive technology transformation and adoption of SRE-aligned enterprise capabilities and products, launch new tooling, automate away complex issues, and integrate with the latest technology.

Site Reliability Engineers leverage their experience as software and systems engineers to ensure that applications onboarded to SRE are available, have full-stack observability, are continuously tested, and are integrated with CI/CD. They introduce continuous improvement through code and automation, provide operational insight through analytics, and work with application teams to ensure the products and services we provide are always on.

In this role, you will:

Instantiate Site Reliability Engineering and AIOps capabilities at Wells Fargo Enterprise Functions Technology (EFT), igniting the practice, principles, and culture and leading by example.

Assist in training skilled peer engineers, growing the practice within EFT and partnering with peer platform-embedded SRE teams.

Introduce and mature the adoption of enterprise capabilities, tools, and innovation that improve availability in a multi-cloud ecosystem by evolving observability, monitoring, logging, synthetic monitoring, and chaos engineering.

Evolve AIOps by introducing self-healing and autonomic capabilities that solve complex operational and systemic issues with precision, including automating processes and leveraging Robotic Process Automation and AI/ML to improve the availability of the products we provide to customers.

Automate key SRE metrics and IT Service Operations processes, including customer impact, availability of critical business flows, SLO/SLI adherence, and error budget, and reduce time to recovery.

Share support responsibilities for critical applications and customer journeys, including leading technical resolution of high-priority incidents with cross-functional partners, remediating issues, conducting blameless postmortems and root cause analysis, and introducing continuous improvement that solves problems once and for all, with the goal of no repeats.

Collaborate closely with EFT application development teams and other peer organizations to influence and drive stability and SRE-aligned capability.

Act as an advisor to leadership to develop or influence applications, network, information security, database, operating systems, or web technologies for highly complex business and technical needs across multiple groups.

Lead the strategy and resolution of highly complex and unique challenges requiring in-depth evaluation across multiple areas or the enterprise, delivering solutions that are long-term and large-scale and that require vision, creativity, innovation, and advanced analytical and inductive thinking.