Sr. Site Reliability Engineer Job at Springboard

Sr. Site Reliability Engineer

Job Summary

The Opportunity
As a Senior Site Reliability Engineer (SRE) at Springboard, you will be a key contributor to our cloud infrastructure and tech-operations initiatives. You will utilise your diverse background in operations, cloud, systems engineering, and monitoring to ensure the uptime, reliability, efficiency and health of our web services on staging and production. You'll learn quickly, be hands-on, own key processes, and make continuous improvements to the quality of our services and operations as we scale.

Responsibilities
  • Being one of the primary people responsible for reliability, health, and performance of our cloud services.
  • Learning, advocating for, and adopting processes and industry best practices.
  • Gaining deep knowledge and understanding of Springboard's application ecosystem and services.
  • Setting a high bar for reliability, quality and operational efficiency through continuous improvement.
  • Thinking through, innovating on, and engineering solutions to detect and solve complex problems that are hard to address with conventional tools.
  • Using your excellent communication, empathy and training skills to mentor junior engineers and build a strong SRE function at Springboard.
  • Analysing, designing, and implementing strategies across our Google Cloud infrastructure with emphasis on security, traffic management, cluster configuration, monitoring and operations.
  • Conducting system tests and putting processes in place to monitor the security, performance, and availability of the service by setting up telemetry (logs, metrics and events) on production systems as well as deployment pipelines.
  • Owning Infrastructure Operations: Handling and addressing requests from the engineering team. Defining, implementing & streamlining processes for service & audit.
  • Making recommendations to the development team on the reliability, maintainability, availability, security and performance of the system, and on the efficiency of the team, by identifying potential bottlenecks given our rate of growth and scale.
  • Developing scripts and CLI tools for day-to-day tasks.
You:
  • Must have 5+ years of experience in a diverse mix of SRE, DevOps, system administration or equivalent software-engineering roles. You are passionate about enabling teams to build, test and deploy software faster and more reliably.
  • Must be an experienced *nix power-user with foundational knowledge of operating systems, system administration, containers & runtimes, and system health & performance.
  • Must be experienced with shell scripting (bash) and/or Python.
  • Are passionate about SRE, with a strong desire to learn new technologies, mentor junior engineers, and grow yourself whilst keeping pace with the company's growth.
  • Must be knowledgeable and experienced with IAM, security, network, server, and application-status monitoring. You are comfortable deploying and configuring tools and dashboards to suit evolving needs (e.g. Nagios, Datadog, Prometheus, Grafana).
  • Are experienced in working day-to-day with Git (version control), with knowledge of semantic versioning, release and change management.
  • Are deeply interested in identifying, innovating, exploring and solving complex problems related to system performance and scale.
  • Are a preferred candidate if you are Google Cloud Certified (for example, Cloud Architect or Cloud DevOps Engineer).
Experience Required: Fresher

Vacancy: 2 - 4 Hires
