Veradigm - Houston, TX

posted 5 days ago

Full-time - Mid Level
Remote - Houston, TX
Ambulatory Health Care Services

About the position

As a Senior Site Reliability Engineer at Veradigm, you will play a crucial role in ensuring the reliability and performance of our systems. This position involves not only incident management but also mentoring other engineers and fostering a culture of continuous improvement. You will collaborate with cross-functional teams to design and maintain robust systems while utilizing your expertise in service level objectives and incident management to enhance service reliability.

Responsibilities

  • Serve as an on-call engineer, managing and resolving incidents affecting system availability and performance.
  • Collaborate with development, operations, and infrastructure teams to design, implement, and maintain reliable systems.
  • Proactively monitor and analyze system metrics to identify and mitigate potential issues.
  • Conduct thorough root cause analysis of incidents and implement long-term solutions.
  • Automate manual processes to improve efficiency and reduce human error.
  • Participate in capacity planning and performance optimization efforts.
  • Stay updated with industry trends and emerging technologies related to cloud services and Site Reliability Engineering.

Requirements

  • Bachelor's degree in computer science, engineering, or a related field (or equivalent work experience).
  • 4-7 years of experience in development, operations, and infrastructure, including at least 2 years as a Site Reliability Engineer or DevOps Engineer.
  • Proficiency in a high-level programming language (C# preferred) and knowledge of object-oriented programming in languages such as Java, Objective-C, C#, C/C++, or Python.
  • Experience in scripting and automation using languages such as Python, Bash, or PowerShell.
  • 3+ years of experience with service-oriented architectures and microservices.
  • Solid understanding of Site Reliability Engineering principles and experience applying SLAs, SLIs, and SLOs.
  • Extensive experience in incident management and on-call support in high-availability environments.
  • Strong knowledge of cloud services, particularly Azure and AWS.
  • Excellent troubleshooting and problem-solving skills with attention to detail.
  • Strong communication and interpersonal skills for effective collaboration.

Nice-to-haves

  • Certifications in Azure, AWS, Terraform, or Kubernetes.
  • Familiarity with DevOps practices and tools, such as CI/CD pipelines and infrastructure-as-code.
  • Experience with monitoring and logging tools such as Splunk, Prometheus, Grafana, or the ELK stack.

Benefits

  • Remote working options available.
  • Professional development opportunities.
  • Supportive workplace culture promoting diversity and inclusion.