Sovrn Holdings - Boulder, CO

posted 7 days ago

Full-time
Remote - Boulder, CO
Professional, Scientific, and Technical Services

About the position

The Reliability Engineer at Sovrn is responsible for building and maintaining low-latency, high-performance, scalable infrastructure. The role focuses on enabling frameworks that streamline full-stack delivery through automation, so that production service-level commitments are met while feature velocity increases.

Responsibilities

  • Oversee the management of AWS cloud infrastructure, including provisioning, configuration, and optimization across multiple accounts.
  • Implement best practices for resource allocation, cost optimization, and scalability within the AWS environment.
  • Monitor and maintain the health, performance, and security of AWS services and resources.
  • Design, implement, and maintain networking configurations for both cloud-based and office environments.
  • Ensure seamless connectivity between cloud resources and on-premises infrastructure.
  • Configure and manage virtual private clouds (VPCs), subnets, route tables, security groups, transit gateways, peering connections, access control lists (ACLs), and client VPN.
  • Provide expertise and insights into container automation and deployment strategies to ensure reliability and efficiency.
  • Analyze deployment processes to identify areas for improvement and optimization.
  • Implement monitoring and alerting solutions to detect and respond to issues affecting containerized applications.
  • Optimize container orchestration platforms such as Kubernetes, EKS, and ECS for improved performance, scalability, and reliability.
  • Implement best practices for container lifecycle management, including deployment, scaling, and updates.
  • Work closely with development teams to streamline CI/CD pipelines for containerized applications.
  • Design and implement architectures to ensure high availability and fault tolerance of cloud-based services.
  • Implement redundancy and failover mechanisms to minimize downtime and service disruptions.
  • Conduct regular testing and simulations to validate the resilience of cloud environments.
  • Collaborate with development teams to understand application requirements and optimize deployment processes.
  • Provide guidance on infrastructure requirements and best practices for deploying and scaling applications in the cloud.
  • Implement monitoring and logging solutions to track the performance and health of network and cloud infrastructure.
  • Troubleshoot and resolve issues related to network connectivity, resource utilization, and application performance.
  • Develop and maintain incident response procedures to ensure timely resolution of critical issues.
  • Implement and enforce security best practices for cloud and network environments, including access control, encryption, and compliance.
  • Conduct regular security assessments and audits to identify and address vulnerabilities.
  • Develop automation scripts and tools using Python, shell scripting, and other programming languages to streamline operational tasks.
  • Automate infrastructure provisioning, configuration management, and deployment processes to improve efficiency and reliability.
  • Leverage infrastructure-as-code (IaC) tools such as Terraform to define and manage cloud resources programmatically.
  • Continuously evaluate and implement improvements to enhance the reliability, performance, and scalability of cloud infrastructure.

Requirements

  • Master's degree in Software Engineering, Computer Science, or a related engineering field.
  • Two (2) years of experience as a Site Reliability Engineer, Software Engineer, or Systems Engineer, or in a related role.
  • Proficiency with the AWS cloud platform.
  • Experience with container orchestration tools (Kubernetes and EKS).
  • Familiarity with Helm, Jenkins, GitHub, Ansible, CloudFormation, or Terraform.
  • Knowledge of monitoring tools (CloudWatch, Datadog, Grafana, or Prometheus).
  • Proficiency in Shell, Bash, and Python.

Benefits

  • Telecommuting options available from anywhere in the U.S.