This job is closed

Allegis Group - Atlanta, GA

posted 2 months ago

Full-time - Mid Level
Atlanta, GA
10,001+ employees
Administrative and Support Services

About the position

The AI/ML Telemetry Engineer will be responsible for developing and managing telemetry systems for large-scale datasets, enhancing the reliability and performance of AI systems, and assisting in capacity management. This role requires expertise in monitoring and alerting solutions, as well as collaboration with data scientists to support AI/ML platforms.

Responsibilities

  • Develop and manage telemetry systems for large-scale datasets.
  • Implement monitoring and alerting solutions to ensure system reliability.
  • Collect and analyze data to improve AI system performance.
  • Automate processes to enhance efficiency and reduce manual intervention.
  • Manage and maintain Kubernetes clusters and Docker containers.
  • Utilize Prometheus and Grafana for monitoring and visualization.
  • Work with NVIDIA DCGM and DCGM Exporter for GPU telemetry.
  • Collaborate with data scientists to support AI/ML platforms.
  • Troubleshoot and resolve issues related to telemetry systems.

Requirements

  • Expert knowledge in Prometheus, Grafana, or Git.
  • Solid understanding of telemetry concepts, including metrics, logs, and tracing.
  • Experience with JSON/YAML.
  • Proficiency in Kubernetes and Docker/container technologies.
  • Experience with NVIDIA DCGM and DCGM Exporter.
  • Strong skills in telemetry/observability, monitoring and alerting, data collection and analysis, and automation.

Benefits

  • Medical, dental & vision
  • Critical Illness, Accident, and Hospital
  • 401(k) Retirement Plan - Pre-tax and Roth post-tax contributions available
  • Life Insurance (Voluntary Life & AD&D for the employee and dependents)
  • Short and long-term disability
  • Health Savings Account (HSA)
  • Transportation benefits
  • Employee Assistance Program
  • Time Off/Leave (PTO, Vacation or Sick Leave)