DevOps SRE Manager Job at Talentica Software (I) Pvt. Ltd.

DevOps SRE Manager

Job Summary

About Talentica Software:

Talentica Software is a boutique software development company founded by industry veterans and alumni from IIT Bombay. We specialize in helping startups build innovative products by leveraging the latest tools and technologies to solve real-world challenges. With over 21 years of experience, we've partnered with 180+ startups, primarily in the US, and contributed to numerous successful exits.

In 2022, Talentica Software was recognized by Great Place to Work as one of India's Great Mid-Size Workplaces.

What We're Looking For:

We are seeking a DevOps SRE Manager to lead our cloud operations, with a primary focus on Google Cloud Platform (GCP) and secondary support for AWS. In this role, you will manage two critical teams: one DevOps team responsible for GCP infrastructure, and a CloudOps/SRE team ensuring 24/7 uptime for our mission-critical services.

This position requires a blend of technical expertise, leadership skills, and customer relationship management. You'll be responsible for ensuring the reliability, scalability, and security of our infrastructure while overseeing smooth cloud operations.

What You'll Be Doing:

As a DevOps SRE Manager, your responsibilities will include:

  • Managing GCP Operations: Oversee DevOps operations within Google Cloud Platform using tools like Terraform, Kubernetes (GKE), Prometheus, and Grafana.

  • Infrastructure Automation: Ensure timely execution of tasks and optimize infrastructure automation to improve operational efficiency.

  • CI/CD Enhancement: Drive improvements to CI/CD pipelines, enforce cloud security best practices, and enhance software delivery processes.

  • System Reliability: Improve system reliability through advanced monitoring, logging, and alerting solutions.

  • Cloud Optimization: Optimize cloud infrastructure for cost-effectiveness, scalability, and security, ensuring long-term operational efficiency.

  • Leading CloudOps/SRE Teams: Manage a 24x7 CloudOps/SRE team focused on maintaining service uptime and providing prompt incident response.

  • Incident Management: Lead incident management processes, including conducting Root Cause Analysis (RCA) and ensuring adherence to SLAs.

  • Observability: Implement observability best practices using Grafana, Prometheus, and Opsgenie.

  • Promote Automation: Foster self-healing, automated infrastructure to reduce manual interventions and improve operational efficiency.

  • Customer Relationship Management: Build and maintain strong customer relationships through transparent and clear communication.

  • Mentorship and Leadership: Lead and mentor cross-functional teams of DevOps and CloudOps/SRE engineers, sustaining high productivity, supporting continuous professional growth, and conducting performance reviews.

  • AWS Support: Provide basic-to-intermediate support for AWS services (IAM, EC2, S3, Lambda, CloudFormation) and assist in hybrid cloud integration when required.

To Be Successful in This Role, You Should Have:

Qualifications:

  • BE/BTech from a reputable engineering institute.

Experience:

  • 8-12 years of experience in DevOps, CloudOps, or SRE roles.

Technical Expertise:

  • Primary Cloud Platform: Expertise in Google Cloud Platform (GCP).

  • Secondary Cloud Platform: Experience with AWS.

  • Infrastructure as Code (IaC): Strong experience with Terraform.

  • Containerization & Orchestration: Hands-on experience with Kubernetes (GKE).

  • CI/CD & Automation: Expertise in Jenkins, Ansible, and GitOps workflows.

  • Monitoring & Observability: Proficiency in Prometheus and Grafana.

  • Incident & Alerting: Familiarity with Opsgenie.

  • Big Data & Streaming: Experience with Kafka, Airflow, and Druid.

  • AWS Services: Experience with IAM, EC2, S3, Lambda, CloudFormation, and CloudWatch.

Additional Skills:

  • Proven experience managing 24x7 operations and multi-cloud environments.

  • Hands-on expertise with GCP infrastructure, Terraform, Kubernetes, and CI/CD pipelines.

  • Experience with incident management, RCA, monitoring, and alerting.

  • Strong understanding of reliability engineering, automation, and cloud security best practices.

Bonus Points If You Have:

  • Experience working with Kafka, Airflow, and Druid in large-scale environments.

  • Certifications such as GCP Professional DevOps Engineer, AWS Solutions Architect, or a Kubernetes certification.

  • Working knowledge of AWS cloud services, especially in hybrid-cloud environments.

What You'll Find Here:

  • A Culture of Innovation: We focus exclusively on cutting-edge development. Our clients seek our expertise for innovative solutions, not maintenance work.

  • Endless Learning Opportunities: Constantly expand your skills and stay on top of the latest trends and advancements in cloud technologies.

  • Talented Peers: Work alongside top-tier engineers from India's best institutes (IITs, NITs, and others), fostering a collaborative and growth-oriented environment.

  • Work-Life Balance: We value your well-being and offer flexible schedules and remote work options to help you maintain a healthy work-life balance.

  • A Great Culture: 82% of our employees recommend Talentica to their peers (according to Glassdoor), which speaks to the positive work environment we've built.

  • Recognition & Rewards: We celebrate success and ensure that your contributions are recognized and appreciated.

Why Talentica?

At Talentica, we invite you to take ownership of large-scale, impactful projects and work with cutting-edge technologies. If you're ready to make a real difference in shaping the future of our industry, we'd love to have you join us.


Qualification: BE/BTech from a reputable engineering institute.

Experience Required: 8 to 12 years

Vacancy: 2 to 4 hires
