Unclassified - Boise, ID

posted 2 months ago

Full-time - Mid Level
Remote - Boise, ID

About the position

The Observability Site Reliability Engineer (SRE) is a remote role focused on delivering projects for the Observability product functions. The successful candidate will be an individual contributor who collaborates closely with several teams, including Command Center Operations, Network Operations, the Cloud Operations Group, the Automation & Observability teams, and peer development teams. The role requires effective communication and partnership with cross-functional IT teams, business stakeholders, and leadership distributed across multiple geographic locations. It emphasizes observability across IT services, ensuring that every implementation meets the prescribed requirements through approved processes, methodologies, and deliverables.

As Lead Engineer for Observability initiatives, the engineer will develop and implement build and release pipelines, managing deployment schedules, issues, risks, and impediments. The engineer will also be accountable for Agile development practices, ensuring that team members commit to and deliver on their responsibilities each sprint.

The engineer will provide technical leadership in the design, development, and testing of observability solutions, tracking infrastructure delivery and its dependencies through implementation. The role involves preparing and presenting potential technical solutions and advising teams on approaches and trade-offs. The engineer will define the structure of systems, their interfaces, and the principles that guide software design and implementation. They will also support the creation of reusable application components from both business and technology perspectives, either providing coding and technical direction to less experienced staff or developing highly complex original code.

The position demands a strong understanding of Azure components, experience with several programming languages, and familiarity with cloud platforms and DevOps practices.

Responsibilities

  • Lead Observability initiatives as the Lead Engineer.
  • Develop and implement build and release pipelines, managing deployment schedules, issues, risks, and impediments.
  • Ensure Agile development practices are followed, with accountability for team commitments and delivery each sprint.
  • Implement observability solutions that meet IT Services requirements through approved processes and methodologies.
  • Provide expertise and design solutions for observability applications and system integration with internal and external systems.
  • Provide technical leadership in the design, development, and testing of observability solutions.
  • Track infrastructure delivery and its dependencies through implementation.
  • Prepare and present potential technical solutions, advising teams on approaches and trade-offs.
  • Define the structure of systems, their interfaces, and guiding principles for software design and implementation.
  • Support the creation of reusable application components from business and technology perspectives.
  • Provide coding and technical direction to less experienced staff or develop highly complex original code.

Requirements

  • Experience gathering and organizing large volumes of data for instrumentation in an Enterprise Observability solution.
  • Experience recommending baseline monitoring thresholds, performance monitoring KPIs, and SLAs.
  • Experience installing agents and forwarders, integrating APIs, and building performance monitoring alerts, dashboards, and data trend analyses.
  • Good knowledge of Azure foundation components (e.g., Application Gateway (App GW), API Management (APIM), Virtual Network, Network Security Groups (NSG), Load Balancer, Azure VM).
  • Programming experience in Java (required); experience in Python, Go, C, or C++ is desired.
  • Experience with databases such as Azure SQL, PostgreSQL, MySQL, MongoDB, TSDB, or similar.
  • Experience with cloud platforms, specifically Microsoft Azure or Google Cloud Platform (GCP).
  • Experience with Pivotal Cloud Foundry (PCF), Docker, and Kubernetes platforms.
  • Experience with DevOps and CI/CD tools and processes.
  • Experience with high-performance, high-frequency data streaming (using Kafka) and with handling large volumes of batch data (preferred but not required).
  • Experience with Agile/Scrum methodologies.

Nice-to-haves

  • Hands-on experience with tools and technology related to Observability/Monitoring frameworks.
  • Experience working with open-source platforms (e.g., Grafana) and OpenTelemetry libraries.