Site Reliability Operations-iii Job in Exotel Techcom Pvt. Ltd

Site Reliability Operations-iii

Job Summary

What we are looking for?

Lead and drive root cause analysis efforts across multiple infrastructure layers (OS/Network/App).
Design and manage complex, large-scale data center infrastructures (e.g. servers, network, security, vendors, software upgrades, patches, hot fixes) per business requirements.
Drive automation strategies and deployment processes following SDLC processes
Automate systems administration-related solutions for various project and operational needs
Monitor and react to security-related incidents as necessary, and involve the required stakeholders for short-term and long-term solutions.
Provide on-call and out-of-hours support for business-critical services.
Troubleshoot issues in detail whenever any component fails (server, monitoring, or service-related issues), following a solid data-driven approach when forming hypotheses. Drive and implement short-term and long-term solutions.
Administer monitoring services such as Grafana, Nagios, and custom scripts
Explore and implement latest technologies to improve the stability, security, efficiency, and scalability of the environment
Drive initiatives to reduce TAT and MTTR for existing processes and practices
Perform benchmarking exercises for different system components
Drive initiatives to improve the stability, security, efficiency, and scalability of the environment
Mentor juniors in the team

What you will do?

Must-haves

[Must Have] 4-6 years of strong hands-on working knowledge of RHEL/CentOS 5/6/7 in an enterprise environment and a good understanding of the design and configuration of UNIX/Linux systems.
[Must Have] Hands-on experience with orchestration/configuration management tools (e.g. Ansible, Chef, or Puppet)
[Must Have] 4-6 years of experience in supporting and managing a large number of complex multi-server, multi-vendor, multi-technology infrastructures.
[Must Have] 4-6 years of experience in leading projects from technical design all the way through to delivery.
[Must Have] Hands-on experience with one or more scripting languages (e.g. Bash, Python)
[Must Have] Strong Computer Science fundamentals and strong exploratory skills for evaluating new technologies
[Must Have] Exposure to a few of the following: logging (Rsyslog), monitoring frameworks (Prometheus, Nagios), Linux security, databases (MySQL/SQL)
[Must Have] An "SRE" mindset: you own what you set up and manage.

Good-to-haves

4+ years of hands-on experience setting up and managing physical data center environments

Experience Required:

4 to 6 Years

Vacancy:

2 - 4 Hires
