Site Reliability Engineer Job at ThousandEyes

Site Reliability Engineer

Job Summary

What You'll Do

  • Collaborate closely with software engineers to ensure that the ThousandEyes platform infrastructure and services are designed and optimized for availability, latency, and performance.
  • Design and implement solutions to manage our platform's infrastructure as we grow to multi-region scale.
  • Design, deploy, and maintain cloud-native services in AWS and GCP that are elastic and resilient to failure.
  • Drive and build automation wherever possible, enabling our infrastructure and platforms to scale effortlessly. Think self-service.
  • Participate in and help improve our 24x7 incident response and on-call rotation.
  • Perform capacity planning for the infrastructure and platform, and help teams prepare for growth.
  • Troubleshoot and debug issues across our infrastructure and platform services.

About You

  • MUST be able to write high-quality code in Python, Go, or equivalent languages.
  • Ability to design and implement scalable and well-tested solutions.
  • Good understanding of Unix/Linux systems, the kernel, system libraries, file systems, and client-server protocols.
  • Good knowledge of cloud providers, ideally AWS.
  • Strong Infrastructure as Code skills, ideally with Terraform, Puppet and Kubernetes.
  • Strong communication and documentation skills.
  • Strong sense of ownership, drive, and obsessive attention to detail.


Experience Required:

Fresher

Vacancy:

2 - 4 Hires
