Trimble - Dayton, OH

posted 2 months ago

Full-time - Senior
Dayton, OH
Professional, Scientific, and Technical Services

About the position

As a Senior DevOps Engineer/Senior Site Reliability Engineer at Trimble e-Builder, you will play a crucial role in our AECO (Architecture, Engineering, Construction, and Owner) Software Solutions segment. This position can be hybrid or fully remote within the USA.

You will collaborate with a dedicated team to plan, design, and deploy cloud technologies, primarily on AWS. Your responsibilities will include developing, modifying, supporting, and maintaining AWS-based components through Infrastructure as Code and automation. You will also design and implement cost control strategies to optimize our cloud spending.

In this role, you will improve availability and incident management by implementing self-healing solutions triggered by alerts. A proactive approach will be essential as you continuously improve our monitoring and alerting capabilities so that issues are addressed before they escalate. You will support day-to-day operations by measuring, monitoring, and troubleshooting systems, and you will participate in an on-call rotation with a focus on automation and improvement. You will also design and maintain custom monitoring dashboards for DEV/OPS/Support, create and maintain Cloud Operations processes and procedures, and strengthen our fault tolerance and high availability strategies.

Collaboration with product development teams will be vital in engineering creative solutions to complex challenges. Additionally, you will create processes and train engineers on common cloud administration tasks, ensuring that knowledge is shared and best practices are followed.

This role calls for strong leadership and communication skills: you will work with customers, vendors, and partners at all levels of the organization, explain issues, present clear strategies for automation and cloud deployments, and lead team and sector initiatives in infrastructure and server management.

Your goals will include meeting Key Performance Indicators (KPIs), Service Level Agreements (SLAs), and Operating Level Agreements (OLAs) while maintaining high system uptime and increasing the percentage of service disruptions that are detected by monitoring.

Responsibilities

  • Work with team to plan, design and deploy cloud technologies
  • Develop, modify, support, and maintain AWS-based components through Infrastructure as Code and automation
  • Design and implement cost control strategies
  • Enhance availability and incident management by implementing self-healing solutions based on alerts
  • Continuously improve monitoring and alerting capabilities
  • Support day-to-day operations through measuring, monitoring, and troubleshooting
  • Participate in on-call rotation with a mindset of automating and improving
  • Design and maintain custom monitoring dashboards for DEV/OPS/Support
  • Create and maintain Cloud Operations processes and procedures
  • Enhance fault tolerance and high availability strategy
  • Enhance cloud elasticity through automatic provisioning and destruction of services based on demand
  • Collaborate with product development teams to engineer creative solutions or solve complex challenges
  • Create processes and train engineers on common cloud administration tasks

Requirements

  • 5+ years of experience working within AWS
  • 5+ years of experience with monitoring solutions (New Relic, PagerDuty, AWS)
  • 7+ years of experience in the IT field
  • 7+ years of experience supporting Windows
  • 7+ years of experience supporting Linux OS (CentOS, Amazon Linux, RHEL)
  • Familiarity with container technologies such as Docker, Kubernetes, ECS, and EKS
  • Scripting experience, preferably in PowerShell or Bash
  • Experience with ticketing systems (Jira preferred)
  • Knowledge of storage technologies (SAN, Cloud, Enterprise)
  • Experience working in a virtualized environment (VMware preferred)
  • Strong problem-solving and troubleshooting skills
  • Familiarity with continuous deployment methodology and common DevOps tools, including Git and Jenkins
  • Proficient with configuration management and provisioning tools (preferably Ansible and Terraform)
  • Knowledge of networking technologies and cloud-specific network assets
  • Ability and flexibility to be on call for escalations, support, migrations, and deployments

Nice-to-haves

  • Experience or familiarity with security certifications and compliance frameworks such as PCI, SOC 2, ISO 27001, FISMA/FedRAMP, and HIPAA
  • Any AWS certifications
  • Familiarity with ITIL
  • Database experience (Oracle, PostgreSQL)
  • Application support (Tomcat, Java)
  • Experience with Active Directory or other domain management
  • Azure exposure
  • Exposure to SRE

Benefits

  • Competitive salary
  • Health insurance
  • 401k plan with matching contributions
  • Flexible work hours
  • Professional development opportunities
  • Paid time off and holidays