Business Wire - San Francisco, CA

Full-time - Senior
San Francisco, CA
Computing Infrastructure Providers, Data Processing, Web Hosting, and Related Services

About the position

As a Senior Site Reliability Engineer (SRE) at Business Wire, you will play a critical role in ensuring the availability, reliability, and scalability of our infrastructure and applications. Organizations of all sizes rely on Business Wire services to publicize market-moving news and multimedia, so keeping those services running smoothly is essential. You will collaborate closely with software engineering, architecture, and operations teams to design and implement highly automated systems that improve the performance and reliability of our services.

In this senior technical role, you will provide technical support across all of Business Wire's SaaS applications and infrastructure. The ideal candidate has a deep understanding of cloud infrastructure, systems operations, network architecture, and software development. You will be expected to continuously improve infrastructure and application design to ensure 99.99% uptime while reducing architectural complexity.

This role offers a unique opportunity to make a significant impact on our customers and on the reliability of our services. You will join a small team that supports mission-critical programs within the company, so you should be a lifelong learner with a passion for assessing existing environments and designing innovative solutions. Your expertise in Linux systems, Java application stacks, networking, and system and network troubleshooting fundamentals will be crucial. You will also participate in on-call rotations to ensure 24/7 application availability and drive root cause analysis during outages.

Responsibilities

  • Design and implement highly automated systems/services that ensure the availability, reliability, and scalability of infrastructure and applications.
  • Build and maintain monitoring and alerting to provide timely feedback on the performance and health of systems, network, and applications.
  • Continuously improve infrastructure and application design to ensure 99.99% uptime while removing architectural complexity.
  • Work with software development to design and implement systems/applications that are resilient to failure and highly scalable.
  • Deliver measurable application performance improvements based on insights from observability metrics.
  • Develop and maintain disaster recovery plans and procedures.
  • Participate in on-call rotations to ensure 24/7 application availability.
  • Triage incoming Web Support escalation requests.
  • Drive service restoration and incident root cause analysis, and serve as incident commander during outage events.

Requirements

  • 7+ years of software engineering experience, including 5 years as an SRE supporting infrastructure, networking, and application operations in a high-availability, 24x7 hybrid (colo/cloud) environment.
  • Strong track record of automation using tools such as Python, Bash, Ansible, Terraform, and CloudFormation.
  • Strong experience with AWS cloud infrastructure and container orchestration (Kubernetes, ArgoCD) operating in a GitOps framework.
  • Strong experience with application monitoring, observability, and alerting systems (e.g., New Relic, Grafana).
  • Strong experience with at least one programming language (Python, Java).
  • Advanced experience with Linux system administration, Java-based applications, and network architecture.
  • Ability to participate in architecture reviews.
  • AWS-related certifications (e.g., Architecture, DevSecOps, Cloud Engineer) are a plus.

Benefits

  • Ability to work remotely
  • Excellent health benefits that begin on your first day of employment
  • $100 monthly fitness allotment
  • Tuition reimbursement program
  • Enhanced mental health resources
  • 401(k) plan with generous company match
  • Annual profit sharing contribution (subject to company performance)
  • PTO, Floating Holidays, Wellness Day Off, Birthday Day Off, and more!