AT&T - Dallas, TX

Full-time - Senior
Dallas, TX
Telecommunications

About the position

The Principal System Engineer for Production Operations at AT&T leads a team that ensures the high availability, reliability, and resiliency of customer- and agent-facing applications across multiple platforms. The role involves providing 24x7 support, managing incidents, and overseeing operations for the eCommerce, Care, and Retail platforms, which are built on a microservices architecture. The position has a strong focus on Site Reliability Engineering, incident management, and collaboration with other teams to optimize operational processes and improve customer experience.

Responsibilities

  • Provide 24x7 Tier 1 support for customer- & agent-facing applications across eCommerce, Care, & Retail platforms.
  • Manage escalated issues, incidents, and outages, ensuring prompt resolution.
  • Provide visibility and status updates on escalated issues to leadership and stakeholders.
  • Develop a functional and technical knowledge base of applications and create runbooks for operational procedures.
  • Oversee daily Tier 1 operations of on-premises and hosted applications, including data centers and monitoring.
  • Work with Release Management to identify risks related to production changes.
  • Collaborate with Product Development & Tier 2 SRE teams for knowledge transfer on system changes.
  • Optimize the on-call process and incident response workflow for the team.
  • Provide metrics and status reports to leadership and stakeholders.
  • Stay current on feature development and its impact on system reliability.
  • Develop and update Standard Operating Procedures and Tier 1 documentation based on best practices.
  • Provide technical leadership and foster a culture of responsibility and accountability.

Requirements

  • Bachelor's degree in Computer Science, Engineering, or a related field.
  • 10+ years of leadership experience building cross-organizational consensus.
  • 10+ years of experience managing high-performing teams.
  • 10+ years of experience with Incident Management and managing Tier 1 Production Operations teams.
  • 10+ years of experience supporting large-scale eCommerce, Care, & Retail POS platforms in a leadership capacity.
  • Solid understanding of Application Performance Monitoring tools such as Dynatrace and AppDynamics.
  • Hands-on experience with Customer Experience Analytics tools such as Quantum Metric or Tealeaf.
  • Experience with Synthetic Monitoring tools such as Catchpoint.
  • Experience working within scaled agile development teams.
  • Experience developing customer journey dashboards for proactive monitoring.
  • Experience designing and managing technical operations organizations with 24x7 support.

Nice-to-haves

  • Salesforce Development (Apex, Visualforce, Lightning) experience.
  • Experience with Salesforce Sales Cloud & Service Cloud.
  • Experience with Marketing Cloud.
  • Experience in high tech, software, or wireless/telecom industries.
  • Understanding of integration technologies and API gateways.

Benefits

  • Medical/Dental/Vision coverage
  • 401(k) plan
  • Tuition reimbursement program
  • Paid Time Off and Holidays (at least 23 days of vacation each year and 9 company-designated holidays)
  • Paid Parental Leave
  • Paid Caregiver Leave
  • Adoption Reimbursement
  • Disability Benefits (short term and long term)
  • Life and Accidental Death Insurance
  • Employee Assistance Programs (EAP)
  • Extensive employee wellness programs
  • Employee discounts of up to 50% on eligible AT&T Mobility plans and accessories
© 2024 Teal Labs, Inc