Splunk

posted 17 days ago

Full-time - Mid Level
Remote
Computing Infrastructure Providers, Data Processing, Web Hosting, and Related Services

About the position

The Infrastructure Software Engineer role at Splunk focuses on managing and enhancing the Splunk Cloud Observability platform, particularly in FedRAMP environments. This position is integral to ensuring the reliability, scalability, and efficiency of cloud-native applications, leveraging automation and infrastructure-as-code practices. The engineer will work collaboratively with teams to deliver high-quality products, mentor new engineers, and lead reliability projects that enhance application performance and operational efficiency.

Responsibilities

  • Own Splunk Cloud Observability in FedRAMP environments.
  • Work across the organization to deliver quality products that delight Splunk's passionate users.
  • Collaborate with teams of engineers to build a cloud-based environment for massive-scale data processing.
  • Mentor new engineers to achieve more than they thought possible.
  • Work on reliability projects including HA, Business Continuity Planning, disaster recovery, backup/restore, RTO, RPO, and chaos engineering.
  • Manage application uptime and performance, capacity management, SLIs, SLOs, error budgets, and monitoring dashboards.
  • Deploy and operate large-scale distributed data stores and streaming services.
  • Establish design patterns for monitoring and benchmarking.
  • Document production runbooks and guidelines for developers.
  • Implement tooling, toil reduction, runbooks, and automation to handle production environments.
  • Manage incidents and improve MTTD/MTTR for services.
  • Optimize cloud costs.

Requirements

  • 7+ years of experience operating large-scale cloud-native microservices platforms.
  • 3+ years of strong hands-on experience deploying, operating, and monitoring large-scale Kubernetes clusters in the public cloud (AWS or GCP).
  • Experience with infrastructure automation and scripting using Python and/or Golang.
  • Experience developing, deploying, and maintaining Java services.
  • Strong hands-on experience with monitoring tools such as Splunk, Prometheus, Grafana, and the ELK stack.
  • Experience with deployment, operations, and performance management of large-scale clusters such as Cassandra, Kafka, Elasticsearch, MongoDB, ZooKeeper, and Redis.

Nice-to-haves

  • AWS Solutions Architect certification.
  • Confluent Certified Administrator for Apache Kafka and/or Apache Cassandra Administrator Associate certification.
  • Experience with Infrastructure-as-Code using Terraform, CloudFormation, Google Deployment Manager, Pulumi, Packer, ARM, etc.
  • Experience with CI/CD frameworks and Pipeline-as-Code tools such as Jenkins, Spinnaker, GitLab, Argo, and Artifactory.
  • Proven ability to work effectively across teams and functions to influence the design, operations, and deployment of highly available software.
  • Bachelor's/Master's in Computer Science, Engineering, or related technical field, or equivalent practical experience.

Benefits

  • Medical insurance
  • Dental insurance
  • Vision insurance
  • 401(k) plan with match
  • Paid time off
  • Flexible working arrangements
  • Incentive compensation
  • Equity or long-term cash awards