Senior Site Reliability Engineer/DevOps Engineer (AWS/Kubernetes) - Onsite

Endava - Berkeley Heights, NJ

posted 19 days ago

Full-time - Mid Level

Berkeley Heights, NJ

Professional, Scientific, and Technical Services

About the position

The Senior Site Reliability Engineer/DevOps Engineer is responsible for designing and implementing back-end services to ensure reliability, security, and scalability across all platform layers. This role involves hands-on support for existing environments, managing Amazon EKS clusters, and automating infrastructure and deployment strategies. The engineer will work closely with engineering teams to implement monitoring frameworks and participate in production support activities, ensuring optimal performance and capacity planning in a 24x7 environment.

Responsibilities

Provide hands-on support for existing environments including software installation, patch installation, upgrades, and system monitoring.
Manage and upgrade Amazon EKS clusters.
Implement tools and automation for build, configuration management, continuous integration (CI), deployment, and application monitoring.
Automate and evolve infrastructure, deployment strategies, and testing to support quick turnaround of deployments.
Maintain Infrastructure as Code (IaC) for provisioning, configuring, and scaling infrastructure in cloud environments.
Work closely with Engineering to implement relevant KPIs within the monitoring framework.
Participate in all Production Support activities during incidents and outages, resolving technical issues and recommending performance improvements.
Participate in capacity planning, system stability tuning, and scaling of the application infrastructure.

Requirements

5 years of experience with the AWS tech stack and related technologies, including Elastic Kubernetes Service (EKS), ElastiCache, KMS, EC2, Auto Scaling, Load Balancers, SQL, RDS, DynamoDB, PostgreSQL, IAM, S3, CloudWatch, CloudFront, Pulsar, and Elasticsearch.
Experience scaling large-scale, high-performance Kubernetes clusters.
Experience with Infrastructure as Code tools such as Terraform or AWS CloudFormation.
Ability to work primarily in a Linux environment.
Experience working with third-party vendors.
Demonstrated desire to automate processes and willingness to participate in an on-call rotation.

Nice-to-haves

Experience with containerization technologies like Docker or Podman.
Familiarity with Continuous Integration/Continuous Delivery tools such as AWS CodePipeline or Azure DevOps.
Linux system administration skills including bash scripting.
Solid understanding of routing and networking concepts.
Experience working in an Agile development environment.
Knowledge of web server configurations such as Nginx or Apache.

Benefits

Competitive salary package and performance bonuses.
Career coaching and global career opportunities.
Access to training, certifications, and online learning platforms.
Hybrid work and flexible working hours.
Global internal wellbeing program and access to wellbeing apps.
Participation in inclusion and diversity programs.
