This job is closed

TEKsystems - Atlanta, GA

posted 2 months ago

Full-time - Mid Level
Atlanta, GA
10,001+ employees
Professional, Scientific, and Technical Services

About the position

The AI/ML Telemetry Engineer at TEKsystems is responsible for developing and managing telemetry systems for large-scale datasets, enhancing AI system reliability and performance, and assisting in capacity management. This role requires expertise in monitoring and alerting solutions, as well as collaboration with data scientists to support AI/ML platforms.

Responsibilities

  • Develop and manage telemetry systems for large-scale datasets.
  • Implement monitoring and alerting solutions to ensure system reliability.
  • Collect and analyze data to improve AI system performance.
  • Automate processes to enhance efficiency and reduce manual intervention.
  • Manage and maintain Kubernetes clusters and Docker containers.
  • Utilize Prometheus and Grafana for monitoring and visualization.
  • Work with DCGM/DCGM Exporter (Nvidia Stack) for telemetry.
  • Collaborate with data scientists to support AI/ML platforms.
  • Troubleshoot and resolve issues related to telemetry systems.

Requirements

  • Expert knowledge of Prometheus, Grafana, or Git.
  • Solid understanding of telemetry concepts, metrics, logs, and tracing.
  • Experience with JSON/YAML.
  • Proficiency in Kubernetes and Docker/container technologies.
  • Experience with DCGM/DCGM Exporter (Nvidia Stack).
  • Strong skills in telemetry/observability, monitoring and alerting, data collection and analysis, and automation.

Benefits

  • Medical, dental & vision
  • Critical Illness, Accident, and Hospital
  • 401(k) Retirement Plan - Pre-tax and Roth post-tax contributions available
  • Life Insurance (Voluntary Life & AD&D for the employee and dependents)
  • Short and long-term disability
  • Health Savings Account (HSA)
  • Transportation benefits
  • Employee Assistance Program
  • Time Off/Leave (PTO, Vacation or Sick Leave)