Together AI
Posted about 1 month ago
$160,000 - $230,000/Yr
Full-time • Senior
Remote • San Francisco, CA

About the position

As a Senior AI Infrastructure Engineer, you will build the next-generation, highly available, global, multi-cloud PaaS platform using open-source technologies to enable and accelerate Together AI's rapid growth. The system spans many diverse environments (Kubernetes, VMs, bare-metal compute, and edge deployments) and provides a cohesive, reliable abstraction for running AI workloads across them. You will act as a technology thought leader, evangelize new, cutting-edge technologies, and solve complex problems. To succeed, you'll need to be deeply technical and have excellent communication, collaboration, and diplomacy skills. You have experience practicing infrastructure-as-code with tools like Terraform and Ansible, strong software development fundamentals, and strong systems knowledge and troubleshooting abilities.

Responsibilities

  • Perform architecture and research work for decentralized AI workloads
  • Work on the core, open-source Together AI platform
  • Create services, tools, and developer documentation
  • Create testing frameworks for robustness and fault-tolerance

Requirements

  • 5+ years of professional software development experience and proficiency in at least one backend programming language (Golang desired)
  • Demonstrated experience with high-performance or distributed cloud microservices architectures and, ideally, experience building and operating them at global scale across multiple cloud providers such as AWS, Azure, or GCP
  • Excellent understanding of low-level operating system concepts, including multi-threading, memory management, networking and storage, performance, and scale
  • Pragmatic, methodical, well-organized, detail-oriented, and self-starting
  • Experience with Kubernetes and containerization, VPNs, AI workloads, and blockchain-based protocols a plus
  • GPU programming, NCCL, CUDA knowledge a plus
  • Experience with PyTorch or TensorFlow a plus
  • 5+ years of experience writing high-performance, well-tested, production-quality code

Nice-to-haves

  • Experience with Kubernetes and containerization
  • VPNs, AI workloads, and blockchain-based protocols
  • GPU programming, NCCL, CUDA knowledge
  • Experience with PyTorch or TensorFlow

Benefits

  • Competitive compensation
  • Startup equity
  • Health insurance
  • Flexibility in terms of remote work

Job Keywords

Hard Skills
  • Ansible
  • Kubernetes
  • PyTorch
  • TensorFlow
  • Terraform