91135789 - Jersey City, NJ

posted 6 days ago

Full-time - Senior
Jersey City, NJ

About the position

The Senior SRE/DevOps Engineer role is designed for an experienced professional with more than 10 years of expertise in building, deploying, and maintaining scalable, reliable infrastructure. The position focuses on optimizing and automating systems for maximum performance, with a strong emphasis on resilience and high service availability.

Responsibilities

  • Design, build, and manage large-scale, highly available infrastructure environments on cloud platforms such as AWS, Azure, or Google Cloud Platform.
  • Develop and implement automation tools to streamline deployment, monitoring, and infrastructure management processes.
  • Set up robust monitoring, alerting, and incident management systems to detect and mitigate production issues.
  • Develop and maintain CI/CD pipelines to improve deployment speed and reliability.
  • Conduct performance tuning, capacity planning, and load testing to ensure systems remain robust under heavy load.
  • Collaborate with cross-functional teams to enhance the development lifecycle and ensure seamless delivery.
  • Implement security best practices and compliance policies, and perform periodic security audits and vulnerability assessments.
  • Document infrastructure designs, incident responses, and root-cause analyses.

Requirements

  • 10+ years of experience in DevOps, SRE, or related roles with a strong foundation in software engineering or systems administration.
  • Deep expertise in AWS, Azure, or Google Cloud Platform, with hands-on experience in infrastructure-as-code (IaC) tools such as Terraform, CloudFormation, or Ansible.
  • Proficiency in scripting languages such as Python, Bash, or PowerShell, and experience with automation frameworks.
  • In-depth knowledge of CI/CD tools (Jenkins, GitLab CI/CD, CircleCI) and best practices for continuous integration and delivery.
  • Experience with monitoring tools (Prometheus, Grafana, Datadog) and logging platforms (ELK Stack, Splunk).
  • Proficiency with containerization and orchestration tools, including Docker and Kubernetes.
  • Strong analytical and troubleshooting skills with the ability to resolve complex technical issues under pressure.
  • Excellent communication, documentation, and interpersonal skills.
© 2024 Teal Labs, Inc