IBM - Lowell, MA

posted 7 days ago

Full-time
Lowell, MA
Computer and Electronic Product Manufacturing

About the position

The Site Reliability Engineer (SRE) role at IBM focuses on ensuring the smooth and stable operation of the company's infrastructure and applications. The position emphasizes observability, automation, and reliability, requiring collaboration with developers and leadership to deliver inventive solutions to complex problems. The SRE team is responsible for maintaining high availability and performance levels, proactively preventing outages, and supporting other teams with infrastructure-related tasks.

Responsibilities

  • Scale systems sustainably through mechanisms like automation.
  • Own the monitoring system.
  • Maintain services in production by measuring and monitoring availability, latency, and overall system health.
  • Expand applications and scale them horizontally.
  • Work closely with developers, support, and QA teams on maintaining and improving the whole lifecycle of services.
  • Practice sustainable incident response and blameless post-mortems.
  • Provide primary operational support and engineering for multiple large distributed software applications.

Requirements

  • Knowledge of configuration management tools (e.g., Ansible or Puppet).
  • Experience with any scripting language (Bash, Python, PowerShell, etc.).
  • Experience with containerization (e.g., Docker or Podman).
  • Experience with container orchestration tools (e.g., Kubernetes, OpenShift, or Docker Swarm).
  • Experience with database administration and management (MS SQL Server, PostgreSQL, MongoDB).
  • Familiarity with public cloud providers such as AWS, Azure, or IBM Cloud.
  • Experience with monitoring, observability, and logging (e.g., Datadog, Prometheus, Grafana, ELK stack, or Loki).
  • Familiarity with RESTful systems and their APIs.
  • Experience with a high-level programming language (Golang, .NET, Java, etc.) is a plus.
  • Fluent English language skills.

Nice-to-haves

  • Ability to thrive when working autonomously.
  • Experience in a large-scale, distributed Linux/Unix or Windows environment.
  • Mentoring peers and sharing skills.
  • Great communication skills.