Sr Cloud SRE

$68,500 - $102,750/Yr

Deltek - Herndon, VA

posted 3 months ago

Full-time - Senior
Remote - Herndon, VA
Professional, Scientific, and Technical Services

About the position

Deltek is seeking a Senior SaaS Operations Engineer to join our Costpoint GCCM Service Operations team. This role is focused on monitoring cloud service availability, incident management, and supporting the day-to-day service operations of our US Restricted (FedRAMP) SaaS offerings. We are looking for customer-focused team members who are passionate about solving complex technical challenges and delivering a best-in-class SaaS experience.

In this position, you will troubleshoot complex problems, provide software fault diagnosis, resolve operational issues, and address performance bottlenecks. You will collaborate with Global SRE, Product Delivery, Product Engineering, and Customer Care teams to ensure a seamless Cloud SaaS experience for our customers 24/7. Your responsibilities will include performing day-to-day product operations such as provisioning new customers, creating databases and schemas, database restores, configuring applications, patch management, and systems administration.

To ensure consistent service availability, you will monitor the stability and performance of our environments using appropriate metrics and tooling. You will also be involved in incident and problem management, executing incident response plays, leading major incident bridges, and participating in the post-incident review process for incident prevention. Additionally, you will develop and manage automation to reduce manual processes and tasks, drive capacity planning by monitoring system resource utilization, and document system architectures and operational processes. Participation in maintenance activities and on-call rotations may be required, including nights and weekends, as well as executing disaster recovery plans and reporting on related metrics.

Responsibilities

  • Troubleshoot complex problems and provide software fault diagnosis.
  • Resolve operational issues and performance bottlenecks.
  • Collaborate with Global SRE, Product Delivery, Product Engineering, and Customer Care teams.
  • Perform day-to-day product operations such as provisioning new customers and creating databases.
  • Configure applications, manage patching, and perform systems administration.
  • Monitor environments' stability and performance using appropriate metrics and tooling.
  • Execute incident response plays and lead major incident bridges.
  • Participate in post-incident review processes for incident prevention.
  • Develop and manage automation to reduce manual processes and tasks.
  • Drive capacity planning by monitoring system resource utilization, errors, and alert trends.
  • Document system architectures, configurations, and operational processes.
  • Participate in maintenance activities and on-call rotations as required.
  • Execute disaster recovery plans and report on related metrics.

Requirements

  • Bachelor's degree in Computer Science or equivalent.
  • U.S. Citizenship Required.
  • 4+ years supporting enterprise application platforms and systems at scale on public cloud infrastructure (Amazon Web Services is desired).
  • 4+ years of experience managing and operating enterprise-grade Windows or Linux production environments.
  • 3+ years of experience applying an automation-first approach to problem solving using configuration management tools and scripting (e.g., Bash, Python, PowerShell).
  • Experience with Incident Management and ITIL service operations (ServiceNow experience desired).
  • Experience with database administration tasks in Oracle (preferred) or Microsoft SQL Server.
  • Experience with monitoring platforms such as AppDynamics, Splunk, PRTG, SolarWinds DPA, Nagios, New Relic, and PagerDuty.
  • Detail-oriented and results-driven, with excellent English communication skills.
  • Ability to work effectively in a team environment to accomplish goals and resolve problems.

Nice-to-haves

  • Experience with Deltek Costpoint, Time & Expense, Budget & Planning, and Enterprise Reporting.

Benefits

  • Health insurance coverage
  • 401(k) plan with company match
  • Paid vacation time and holidays
  • Short-term and long-term disability coverage
  • Basic life insurance
  • Tuition reimbursement