Zenoss - Austin, TX

posted 6 days ago

Full-time - Mid Level
Remote - Austin, TX
Publishing Industries

About the position

Zenoss is seeking an experienced Site Reliability Engineer (SRE) to join a team focused on creating innovative ITOps and AIOps platforms. The role involves building software and tools that help operations and support teams deliver SaaS offerings within cloud and microservices architectures. The SRE will be responsible for the architecture, design, implementation, and delivery of features that support and monitor Zenoss Cloud, while contributing to a positive work environment and a healthy work/life balance.

Responsibilities

  • Develop, deploy, operate, and support cloud infrastructure primarily utilizing GCP.
  • Work with development, operations, and support personnel to identify, isolate, and diagnose issues, handle support escalations, and deliver high-value monitoring and alerting features.
  • Review technical designs/information, automate processes through scripting, install and configure software, and validate technical environments.
  • Maintain the security and availability of applications based on business requirements and operational models.
  • Ensure production systems are running continuously with redundancy to meet SLAs.
  • Provide non-routine technical support for production operations to enhance performance, reliability, and scale.
  • Document environment topology, installation details, and incident reviews.
  • Automate tasks using scripting and configuration management systems.
  • Communicate technical information to both technical and non-technical personnel.
  • Troubleshoot and resolve technical issues with customers.
  • Monitor network performance, perform intrusion monitoring, and maintain disaster recovery procedures.
  • Plan for capacity expansion, upgrades, patches, and new applications and equipment as necessary.
  • Participate in IT and infrastructure project development.
  • Document and understand application architecture and system configuration across platforms.
  • Determine root causes of outages and recommend resolutions.
  • Provide 24x7 support for critical network and server systems.

Requirements

  • Bachelor's degree in Computer Science/Engineering or equivalent relevant experience.
  • 3-6 years of professional hands-on experience with cloud production environments hosted on GCP using Bigtable, BigQuery, Dataflow, GKE, and other GCP services.
  • Experience with CI/CD tools such as Spinnaker and Jenkins, and with cloud-based software development and delivery processes and methodologies.
  • Strong scripting skills and demonstrated ability to automate tasks (SaltStack, Python, Terraform preferred).
  • Strong understanding of networking, firewalls, load balancers, and databases.
  • Strong verbal and written communication skills.
  • Project- and task-oriented, with a focus on details and proactive communication.
  • Strong organization skills and ability to work both within a team and independently.
  • Ability to make sound decisions based on customer needs and technical knowledge.
  • Self-motivated and able to work under pressure to deliver high-quality solutions.
  • Ability to work after hours, including weekends and nights, when required.

Benefits

  • Healthy work/life balance
  • Positive work environment
  • Remote work options or office work in Austin, TX