CACI International - Huntsville, AL

posted 4 months ago

Full-time - Manager
Huntsville, AL
Professional, Scientific, and Technical Services

About the position

The Application and Platform Operations Center (APOC) Manager at NASA NCAPS is a pivotal role within CACI's Agile Digital Solutions Operating Group, responsible for leading application development and support for NASA's enterprise business systems. The position entails managing the Tier 1 Enterprise Service Desk (ESD) and Operations Centers and ensuring that teams improve enterprise transparency and facilitate cross-contract collaboration through ServiceNow. The APOC Manager will work in a fast-paced SAFe agile development environment, focusing on reducing operational and maintenance workloads across Agile Teams while improving service delivery and operational efficiency.

In this role, the APOC Manager will oversee a team that provides 24x7x365 centralized performance, health, and capacity monitoring across multiple applications and platforms. Using tools such as Dynatrace, the manager will emphasize predictive analytics, proactive remediation, and self-healing capabilities. Responsibilities include incident triage, escalation to Tier 3 teams, and ticket management through resolution while monitoring incident SLAs. The manager will coordinate outages, troubleshoot issues across programs, and communicate effectively with end users and stakeholders. The position also requires managing the master change and outage calendar, collaborating with the Applications and Platforms Security Operations Center (APSOC) on security incident response, and managing daily production operations support. The APOC Manager will ensure timely monitoring of services and interfaces, support critical processes outside normal hours, and lead the development of standard operating procedures (SOPs) for incident management. Additionally, the manager will drive innovation through special projects and technical upgrades, monitor support ticket queues, and communicate metrics to senior customer stakeholders.

Responsibilities

  • Lead a team delivering 24x7x365 centralized performance, health, and capacity monitoring and event management across multiple applications and platforms using Dynatrace.
  • Incident triage, escalation to Tier 3 (Agile Product and Shared Services Teams), and ticket management through resolution; incident SLA monitoring; and incident resolution based on Agile Team-provided knowledge articles.
  • Outage coordination, including cross-program troubleshooting, outreach to on-call staff, vendor/OEM (Tier 4) escalation, and program communications to end users.
  • Problem investigations for unplanned outages, for recurring application performance issues detected proactively or reactively via monitoring tools, or at the request of the Government.
  • Management of the master change and outage calendar, providing transparency to NCAPS, NASA, and OCIO Enterprise Contractor stakeholders.
  • Collaborating with our Applications and Platforms Security Operations Center (APSOC), which centralizes our security incident response process, provides spillage management and sanitization expertise, and interfaces with other NASA activities.
  • Manage daily production operations support for infrastructure, systems, services, and external interfaces.
  • Lead internal IT, Product, and Shared Services teams responsible for resolving operational support issues.
  • Ensure timely daily monitoring of services, jobs, and interfaces for failures, irregularities, and performance issues.
  • Ensure the team supports critical processes outside normal hours, including nights, weekends, and holidays.
  • Ensure Product and Operations Teams develop SOPs and processes for incident management and resolution.
  • Lead rapid response to quickly resolve issues and outages.
  • Drive innovation through special projects and technical upgrades.
  • Monitor support ticket queues to promote rapid response by all portfolio teams.
  • Communicate with senior customer stakeholders on reporting metrics (e.g., system data, error logs, & user reports).
  • Responsible for operations support metrics and Portfolio service desk ticket queues.
  • Meet frequently with the Portfolio Owner, Service Delivery Manager, and Hosting Facility POCs.

Requirements

  • Must be a U.S. citizen with an active Public Trust clearance (or the ability to obtain a Public Trust or higher, if needed).
  • Bachelor's degree in Computer Science, Information Management Systems, or related field. Experience considered in lieu of degree.
  • At least 10 years of relevant technical experience.
  • Excellent verbal and written communication skills.
  • Prior experience leading application operations and sustainment activities on a large, complex program.
  • Experience with a variety of technology stacks and platforms (.NET, Java, SQL, ColdFusion, etc.).

Nice-to-haves

  • Experience managing operations in an Agile environment.
  • Experience with SecDevOps and Agile processes and/or tools.
  • Experience managing 50+ FTEs.
  • Experience managing a current operations organization focused on event and incident management.
  • Experience with monitoring tools, with a focus on ITIL capabilities.

Benefits

  • Continuing education credits
  • Health insurance
  • Flexible time off
  • Comprehensive benefits including healthcare, wellness, financial, retirement, family support, continuing education, and time off benefits.