Senior DevOps Engineer/Senior Site Reliability Engineer

Trimble - Dayton, OH

posted 2 months ago

Full-time - Senior

Dayton, OH

Professional, Scientific, and Technical Services

About the position

As a Senior DevOps Engineer/Senior Site Reliability Engineer at Trimble e-Builder, you will play a crucial role in our AECO (Architecture, Engineering, Construction, and Owner) Software Solutions segment. This position offers the flexibility of hybrid or remote work within the USA. You will collaborate with a dedicated team to plan, design, and deploy cloud technologies, primarily on AWS. Your responsibilities will include developing, modifying, supporting, and maintaining AWS-based components through Infrastructure as Code and automation. You will also design and implement cost control strategies to optimize our cloud expenditures.

In this role, you will enhance availability and incident management by implementing self-healing solutions based on alerts. Your proactive approach will be essential in continuously improving our monitoring and alerting capabilities, allowing us to address issues before they escalate. You will support day-to-day operations by measuring, monitoring, and troubleshooting various systems, and participate in an on-call rotation with a focus on automation and improvement. You will be responsible for designing and maintaining custom monitoring dashboards for DEV/OPS/Support, creating and maintaining Cloud Operations processes and procedures, and enhancing our fault tolerance and high availability strategies.

Your collaboration with product development teams will be vital in engineering creative solutions to complex challenges. Additionally, you will create processes and train engineers on common cloud administration tasks, ensuring that knowledge is shared and best practices are followed. Your leadership skills will be put to the test as you communicate effectively with customers, vendors, and partners across all levels of the organization. You will explain issues and present clear strategies around automation and cloud deployments, leading team and sector initiatives in infrastructure and server management.

Your goals will include meeting Key Performance Indicators (KPIs), Service Level Agreements (SLAs), and Operating Level Agreements (OLAs), while maintaining high levels of system uptime and increasing the percentage of service disruptions that are detected by monitoring.

Responsibilities

Work with the team to plan, design, and deploy cloud technologies
Develop, modify, support, and maintain AWS-based components through Infrastructure as Code and automation
Design and implement cost control strategies
Enhance availability and incident management by implementing self-healing solutions based on alerts
Continuously improve monitoring and alerting capabilities
Support day-to-day operations by measuring, monitoring, and troubleshooting systems
Participate in on-call rotation with a mindset of automating and improving
Design and maintain custom monitoring dashboards for DEV/OPS/Support
Create and maintain Cloud Operations processes and procedures
Enhance fault tolerance and high availability strategy
Enhance cloud elasticity through automatic provisioning and destruction of services based on demand
Collaborate with product development teams to engineer creative solutions or solve complex challenges
Create processes and train engineers on common cloud administration tasks

Requirements

5+ years of experience working within AWS
5+ years of experience with monitoring solutions (New Relic, PagerDuty, AWS)
7+ years of experience in the IT field
7+ years of experience supporting Windows
7+ years of experience supporting Linux OS (CentOS, Amazon Linux, RHEL)
Familiarity with container technologies such as Docker, Kubernetes, ECS, and EKS
Scripting experience, preferably in PowerShell or Bash
Experience with ticketing systems (Jira preferred)
Knowledge of storage technologies (SAN, Cloud, Enterprise)
Experience working in a virtualized environment (VMware preferred)
Strong problem-solving and troubleshooting skills
Familiarity with continuous deployment methodology and common DevOps tools, including Git and Jenkins
Proficient with configuration management and provisioning tools (preferably Ansible and Terraform)
Knowledge of networking technologies and cloud-specific network assets
Ability and flexibility to be on call for escalations, support, migrations, and deployments

Nice-to-haves

Experience or familiarity with security certifications such as PCI, SOC 2, ISO 27001, FISMA/FedRAMP, and HIPAA
Any AWS certifications
Familiarity with ITIL
Database experience (Oracle, PostgreSQL)
Application Support (Tomcat, Java)
Experience with Active Directory or other domain management
Azure exposure
Exposure to SRE

Benefits

Competitive salary
Health insurance
401k plan with matching contributions
Flexible work hours
Professional development opportunities
Paid time off and holidays
