Trimble - Dayton, OH
posted 2 months ago
As a Senior DevOps Engineer/Senior Site Reliability Engineer at Trimble e-Builder, you will play a crucial role in our AECO (Architecture, Engineering, Construction, and Owner) Software Solutions segment. This position offers the flexibility of hybrid or remote work within the USA. You will collaborate with a dedicated team to plan, design, and deploy cloud technologies, primarily focusing on AWS.
Your responsibilities will include developing, modifying, supporting, and maintaining AWS-based components through Infrastructure as Code and automation. You will also design and implement cost control strategies to optimize our cloud expenditures. In this role, you will enhance availability and incident management by implementing self-healing solutions based on alerts. Your proactive approach will be essential in continuously improving our monitoring and alerting capabilities, allowing us to address issues before they escalate. You will support day-to-day operations by measuring, monitoring, and troubleshooting various systems, and participate in an on-call rotation with a focus on automation and improvement.
You will be responsible for designing and maintaining custom monitoring dashboards for DEV/OPS/Support, creating and maintaining Cloud Operations processes and procedures, and enhancing our fault tolerance and high availability strategies. Your collaboration with product development teams will be vital in engineering creative solutions to complex challenges. Additionally, you will create processes and train engineers on common cloud administration tasks, ensuring that knowledge is shared and best practices are followed.
Your leadership skills will be put to the test as you communicate effectively with customers, vendors, and partners across all levels of the organization. You will explain issues and present clear strategies around automation and cloud deployments, leading team and sector initiatives in infrastructure and server management.
Your goals will include meeting Key Performance Indicators (KPIs), Service Level Agreements (SLAs), and Operating Level Agreements (OLAs), while maintaining high system uptime and increasing the share of service disruptions that are detected by monitoring.