Motion Recruitment - Sandy Springs, GA

posted 6 days ago

Full-time - Mid Level
Sandy Springs, GA
Administrative and Support Services

About the position

The Azure Site Reliability Engineer II is a full-time role in Sandy Springs, GA, focused on maintaining and optimizing the performance of Azure services for an e-commerce software company. The work centers on Azure services and Datadog to ensure system reliability and performance. The ideal candidate has a strong background in site reliability engineering and a collaborative mindset, and will contribute to meaningful projects while enjoying a supportive work-life balance.

Responsibilities

  • Set up Datadog to monitor Azure resources, including Virtual Machines, AKS clusters, and storage accounts.
  • Use Datadog's dashboards and anomaly detection features to proactively detect and resolve system issues before they impact users.
  • Monitor deployments through Datadog to detect any application errors or performance issues introduced during updates.
  • Develop and optimize CI/CD pipelines for efficient, reliable application deployment.
  • Automate resource provisioning and deployment with Infrastructure as Code (IaC) tools like Terraform or ARM templates.
  • Continuously monitor Azure infrastructure and applications using Datadog for performance, uptime, and resource utilization.
  • Use IaC tools like Terraform or Ansible to provision, update, and manage cloud infrastructure.
  • Develop, maintain, and improve CI/CD pipelines to automate Docker image builds and Kubernetes deployments.
  • Respond to system alerts, production issues, and incidents, working to resolve outages quickly and perform root cause analysis to prevent future incidents.

Requirements

  • Proficiency with Azure services
  • Strong experience with Datadog
  • 5+ years of experience in site reliability engineering
  • Proficient in scripting languages like Python, PowerShell, or Bash
  • Strong skills in diagnosing, troubleshooting, and optimizing system performance issues across large-scale environments.

Nice-to-haves

  • Knowledge of Datadog integrations for Azure services, Kubernetes, and CI/CD pipeline monitoring.
  • Familiarity with managing and optimizing databases such as Azure SQL, Cosmos DB, or MySQL.
  • Knowledge of SRE principles such as error budgets, automation, and incident postmortems.
  • Familiarity with IaC (Terraform and Ansible)
  • Understanding of compliance standards (ISO, SOC 2, GDPR) and security practices specific to cloud environments.

Benefits

  • Medical, Dental, and Vision Insurance
  • Vacation Time
  • 401(k) with a company match
  • Commuter benefits
  • Paid holidays
  • PTO
  • Quarterly bonuses