Site Reliability Operations-iii Job in Exotel Techcom Pvt. Ltd

Site Reliability Operations-iii

Job Summary

What will you do?

  • Lead and drive root cause analysis across multiple infrastructure layers (OS, network, application)
  • Design and manage complex, large-scale data center infrastructure (e.g. servers, network, security, vendors, software upgrades, patches, hot fixes) per business requirements
  • Drive automation strategies and deployment processes in line with SDLC processes
  • Automate systems administration solutions for various project and operational needs
  • Monitor and respond to security-related incidents as necessary, involving the required stakeholders for short-term and long-term solutions
  • Provide on-call and out-of-hours support for business-critical services
  • Troubleshoot failures in any component (server, monitoring, or service-related issues) in detail, following a solid data-driven approach when forming hypotheses. Drive and implement short-term and long-term solutions.
  • Administer monitoring services such as Grafana, Nagios, and custom scripts
  • Explore and implement the latest technologies to improve the stability, security, efficiency, and scalability of the environment
  • Drive initiatives to reduce TAT and MTTR for existing processes and practices
  • Perform benchmarking exercises for different system components
  • Drive initiatives to improve the stability, security, efficiency, and scalability of the environment
  • Mentor junior members of the team

What are we looking for?

Must-haves

  • [Must Have] 4-6 years of strong hands-on working knowledge of RHEL/CentOS 5/6/7 in an enterprise environment and a good understanding of the design and configuration of UNIX/Linux systems
  • [Must Have] Hands-on experience with orchestration/configuration management tools (e.g. Ansible, Chef, or Puppet)
  • [Must Have] 4-6 years of experience supporting and managing a large number of complex multi-server, multi-vendor, multi-technology infrastructures
  • [Must Have] 4-6 years of experience leading projects from technical design all the way through to delivery
  • [Must Have] Hands-on experience with one or more scripting languages (e.g. Bash, Python)
  • [Must Have] Strong computer science fundamentals and strong skills in exploring new technologies
  • [Must Have] Exposure to some of the following: logging (Rsyslog), monitoring frameworks (Prometheus, Nagios), Linux security, databases (MySQL/SQL)
  • [Must Have] An "SRE" mindset: you own what you set up and manage

Good-to-haves

  • 4+ years of hands-on experience setting up and managing physical data center environments

Experience Required:

4 to 6 Years

Vacancy:

2 - 4 Hires
