Rogo, posted about 1 month ago
NY

About the position

We're building AI thought partners that make people smarter and more creative, accelerating the creation and sharing of knowledge in financial services. We're unabashedly ambitious and dead set on building the biggest financial AI company in the world. Our team is lean and smart, and we're growing fast out of our beautiful office in NYC.

Responsibilities

  • Design, deploy, and maintain cloud infrastructure on AWS and/or Azure, ensuring high availability and resilience.
  • Implement and manage monitoring solutions using Datadog to proactively identify and address system issues.
  • Manage Kubernetes clusters, utilizing Helm for package management and deployment automation.
  • Develop and maintain Infrastructure as Code (IaC) using tools like Terraform, and create automation scripts in Bash or Python to streamline operations.
  • Work closely with development and operations teams to promote DevOps culture, share best practices, and ensure seamless integration and deployment processes.
  • Troubleshoot and resolve complex cross-platform issues related to OS, networking, and databases in a cloud-based environment.
  • Maintain comprehensive documentation of system configurations, procedures, and troubleshooting guides.

Requirements

  • Bachelor’s degree in Computer Science, Information Technology, or a related field.
  • 3-5 years of hands-on experience with AWS and/or Azure cloud platforms, including services like EC2, S3, VPC, and Lambda.
  • 2-3 years of experience managing Kubernetes clusters in production environments.
  • 2-3 years of experience with Helm for Kubernetes package management.
  • 2-3 years of experience with Datadog or similar monitoring tools.
  • 3-5 years of experience with Linux system administration and shell scripting.
  • 2-3 years of experience with Infrastructure as Code (IaC) tools like Terraform.
  • Proficiency in scripting languages such as Bash and Python.
  • Strong understanding of networking fundamentals, including TCP/IP, DNS, and load balancing.
  • Experience with CI/CD pipelines and tools like Jenkins, GitLab CI, or GitHub Actions.
  • Experience with cloud-native security best practices and compliance frameworks.
  • Excellent problem-solving skills and the ability to navigate complex challenges effectively.
  • Strong communication and collaboration skills.

Nice-to-haves

  • Experience with MLOps monitoring and observability.
  • Experience with PostgreSQL, Elasticsearch, and vector databases such as Qdrant or similar technologies.
  • Experience with monitoring and security tools such as Datadog, AWS GuardDuty, CloudWatch, and CloudTrail.
  • Certifications in AWS, Azure, or Kubernetes.
  • Experience with other cloud platforms like Google Cloud Platform (GCP).
  • Experience with distributed tracing and observability tools.