SRE Jobs in Pune
8 Jobs Found
DevOps SRE Manager
Talentica Software (I) Pvt. Ltd.
About Talentica Software:
Talentica Software is a boutique software development company founded by industry veterans and alumni from IIT Bombay. We specialize in helping startups build innovative products by leveraging the latest tools and technologies to solve real-world challenges. With over 21 years of experience, we've partnered with 180+ startups, primarily in the US, and contributed to numerous successful exits. In 2022, Talentica Software was recognized by Great Place to Work as one of India's Great Mid-Size Workplaces.

What We're Looking For:
We are seeking a DevOps SRE Manager to lead our cloud operations, with a primary focus on Google Cloud Platform (GCP) and secondary support for AWS. In this role, you will manage two critical teams: one DevOps team responsible for GCP infrastructure, and a CloudOps/SRE team ensuring 24/7 uptime for our mission-critical services. This position requires a blend of technical expertise, leadership skills, and customer relationship management. You'll be responsible for ensuring the reliability, scalability, and security of our infrastructure while overseeing smooth cloud operations.

What You'll Be Doing:
As a DevOps SRE Manager, your responsibilities will include:
Managing GCP Operations: Oversee DevOps operations within Google Cloud Platform using tools like Terraform, Kubernetes (GKE), Prometheus, and Grafana.
Infrastructure Automation: Ensure timely execution of tasks and optimize infrastructure automation to improve operational efficiency.
CI/CD Enhancement: Drive improvements to CI/CD pipelines, enforce cloud security best practices, and enhance software delivery processes.
System Reliability: Improve system reliability through advanced monitoring, logging, and alerting solutions.
Cloud Optimization: Optimize cloud infrastructure for cost-effectiveness, scalability, and security, ensuring long-term operational efficiency.
Leading CloudOps/SRE Teams: Manage a 24x7 CloudOps/SRE team focused on maintaining service uptime and providing prompt incident response.
Incident Management: Lead incident management processes, including conducting Root Cause Analysis (RCA) and ensuring adherence to SLAs.
Implement Observability Best Practices: Utilize Grafana, Prometheus, and Opsgenie to implement observability best practices.
Promote Automation: Foster self-healing, automated infrastructure to reduce manual interventions and improve operational efficiency.
Customer Relationship Management: Build and maintain strong customer relationships through transparent and clear communication.
Mentorship and Leadership: Lead and mentor cross-functional teams of DevOps and CloudOps/SRE engineers, ensuring high productivity and continuous professional growth, and conducting performance reviews.
AWS Support: Provide basic-to-intermediate support for AWS services (IAM, EC2, S3, Lambda, CloudFormation) and assist in hybrid cloud integration when required.

To Be Successful in This Role, You Should Have:
Qualifications: BE/BTech from a reputable engineering institute.
Experience: 8-12 years of experience in DevOps, CloudOps, or SRE roles.
Technical Expertise:
Primary Cloud Platform: Expertise in Google Cloud Platform (GCP).
Secondary Cloud Platform: Experience with AWS.
Infrastructure as Code (IaC): Strong experience with Terraform.
Containerization & Orchestration: Hands-on experience with Kubernetes (GKE).
CI/CD & Automation: Expertise in tools such as Jenkins, GitOps, and Ansible.
Monitoring & Observability: Proficient in Prometheus, Grafana.
Incident & Alerting: Familiarity with Opsgenie.
Big Data & Streaming: Experience with Kafka, Airflow, Druid.
AWS Services: Experience with IAM, EC2, S3, Lambda, CloudFormation, and CloudWatch.
Additional Skills:
Proven experience managing 24x7 operations and multi-cloud environments.
Hands-on expertise with GCP infrastructure, Terraform, Kubernetes, and CI/CD pipelines.
Experience with incident management, RCA, monitoring, and alerting.
Strong understanding of reliability engineering, automation, and cloud security best practices.

Bonus Points If You Have:
Experience working with Kafka, Airflow, and Druid in large-scale environments.
Certifications such as GCP Professional DevOps Engineer, AWS Solutions Architect, or Kubernetes.
Working knowledge of AWS cloud services, especially in hybrid-cloud environments.

What You'll Find Here:
A Culture of Innovation: We focus exclusively on cutting-edge development. Our clients seek our expertise for innovative solutions, not maintenance work.
Endless Learning Opportunities: Constantly expand your skills and stay on top of the latest trends and advancements in cloud technologies.
Talented Peers: Work alongside top-tier engineers from India's best institutes (IITs, NITs, and others), fostering a collaborative and growth-oriented environment.
Work-Life Balance: We value your well-being and offer flexible schedules and remote work options to help you maintain a healthy work-life balance.
A Great Culture: 82% of our employees recommend Talentica to their peers (according to Glassdoor), which speaks to the positive work environment we've built.
Recognition & Rewards: We celebrate success and ensure that your contributions are recognized and appreciated.

At Talentica, we invite you to take ownership of large-scale, impactful projects and work with cutting-edge technologies. If you're ready to make a real difference in shaping the future of our industry, we'd love to have you join us.

Qualification: BE/BTech from a reputable engineering institute.
Senior Site Reliability Engineer
NVIDIA
NVIDIA's Infrastructure, Planning and Processes (IPP) organization is seeking a hard-working and experienced Site Reliability/DevOps Engineer, with a strong background in Infrastructure Management, Monitoring, Automation, and System Administration, to join our Sanity Operations Team in Pune. The IPP org provides infrastructure, products, and services for multiple software teams, including the GPU, Mobile, and Automotive divisions working on NVIDIA's extraordinary products and services. The team is responsible for hosting, enabling, and running the large-scale private cloud systems and services for our in-house Testing CI framework. The cloud hosts a heterogeneous mix of machines and devices with various operating systems (Windows/Linux/Android, etc.), running with NVIDIA GPUs and Tegra Processors.

What you'll be doing:
Create resilient, scalable, and efficient test and deployment pipelines.
Design and implement complex automation platforms to identify and resolve operational inefficiencies.
Triage software, hardware, and infrastructure issues and maintain high availability for our infrastructure and services.
Deploy and monitor critical high-performance, large-scale services running on geo-distributed systems.
Continuously strive for efficient utilization and management of the infrastructure.
Automate processes that enable developers to adopt self-service practices, while ensuring compliance with security standards.
Work with architects and engineers across teams to review designs and solutions during the development and deployment phases.
Collaborate with our other engineering teams to deliver reliable, robust, and high-performance capability of the underlying infrastructure.
Mine and analyze data from multiple sources to identify scaling and optimization opportunities.

What we need to see:
Bachelor's or Master's degree in Computer Science, Software Engineering, or equivalent experience, with 7+ years of experience in a DevOps environment.
Strong hands-on experience in configuring, maintaining, and building upon deployments of industry-standard tools (e.g. Kubernetes, Jenkins, Docker, CMake, GitLab, Jira, etc.).
Working experience in monitoring and maintaining large-scale infrastructure applications running in a microservice-based architecture.
Proficiency with virtualization architecture, with strong experience in Kubernetes, VMs, and Docker.
Experience with continuous integration and continuous delivery systems such as GitLab, GitOps, Jenkins, Packer, and Terraform.
Strong Python scripting skills, with a proven background of using and writing JSON/REST APIs.
Fluency in writing queries for MySQL or equivalent NoSQL databases.
Solid understanding of configuration management tools like Chef, Puppet, Ansible, etc.
Working experience with Perforce, Git, or any other version control system is necessary.
Experience with telemetry and alerting systems such as Kibana, Elasticsearch, Grafana, and Prometheus to create rich visualizations of system health over time.
Ability to self-manage, show leadership, mentor others, and communicate well.

Ways to stand out from the crowd:
Understanding of networking concepts like TCP/IP and firewall management.
Exposure to web apps/dashboards on frameworks like Django, AngularJS, VueJS, etc.
High-level understanding of build and test systems.
Experience in building regression detection systems by analyzing real-time production data, emphasizing important metrics.
Innovating with industry-standard tools and collaborating with the open source community.

Qualification: Bachelor's or Master's degree in Computer Science, Software Engineering, or equivalent experience, with 7+ years of experience in a DevOps environment.
Lead Site Reliability Engineer (Azure)
EPAM Systems
We are seeking a highly skilled and experienced Lead Site Reliability Engineer with a focus on Azure environments to join our team. In this crucial role, you will leverage your expertise to enhance the reliability and scalability of our cloud-based platforms, ensuring efficient operation and optimal performance. This position involves collaborating closely with cross-functional teams to migrate existing services to the OpenShift platform and make our infrastructure cloud agnostic. As a leader, you'll guide your team in creating resilient systems and processes that support both internal and external customers relying on our desktop applications and services.

Responsibilities
Oversee migration of services to OpenShift and work towards making our infrastructure cloud agnostic
Run pipelines using Azure DevOps for environment configuration and application deployment
Leverage Python, Bash, and PowerShell to automate routine and complex tasks
Implement and manage Kubernetes and container-based environments
Monitor cloud resources efficiently and improve system performance in line with SLI metrics
Debug and resolve operational issues swiftly and effectively
Collaborate with development and operations teams to ensure system reliability and security
Mentor team members and lead by example in maintaining best practices for site reliability
Continuously assess, improve, and optimize existing system architecture and applications
Stay up-to-date with technological advancements and integrate innovative tools and techniques

Requirements
5+ years of experience as a Systems Engineer with a development background
1+ years of relevant leadership experience
Proficiency in Linux and Docker with hands-on experience in Kubernetes
Capability to use at least one of the following scripting languages: Python, Bash, PowerShell
Background in infrastructure management including networking and operating systems
Familiarity with monitoring tools in cloud environments and understanding of SLI concepts
Familiarity with Azure and/or GCP as cloud service providers

Nice to have
Experience working with Windows
Knowledge of CI/CD pipelines, particularly Azure DevOps
Understanding of Istio and GitOps tools like ArgoCD

We offer
Opportunity to work on technical challenges that may impact across geographies
Vast opportunities for self-development: online university, knowledge sharing opportunities globally, learning opportunities through external certifications
Opportunity to share your ideas on international platforms
Sponsored Tech Talks & Hackathons
Unlimited access to LinkedIn learning solutions
Possibility to relocate to any EPAM office for short and long-term projects
Focused individual development
Benefit package: Health benefits, Retirement benefits, Paid time off, Flexible benefits
Forums to explore beyond work passion (CSR, photography, painting, sports, etc.)

Qualification: 5+ years of experience as a Systems Engineer with a development background
DevOps Engineer
BMC Software
Company Overview:
BMC Software is an award-winning, equal opportunity employer that fosters a diverse and culturally rich work environment. The company is committed to giving back to the community, and innovation is at the heart of everything we do. We create an atmosphere where your contributions are celebrated, and your ideas are heard. Our SaaS Ops department focuses on delivering exceptional SaaS experiences to our customers by utilizing cutting-edge technologies. We continuously strive to grow by adopting the latest innovations, and we offer a global, versatile environment where professionals can thrive.

Role Overview:
As a DevOps Engineer, you will join our dynamic SaaS Ops team to design, develop, and implement complex enterprise applications using the latest technologies. You will play a key role in driving the adoption of DevOps processes and tools across the organization. This position offers exciting opportunities to work with industry-leading tools and practices, contributing to the growth and success of both BMC and your own professional development.

Responsibilities:
End-to-End Product Development: Participate in all aspects of SaaS product development, from requirements analysis to product release and ongoing sustenance. Ensure the delivery of high-quality enterprise SaaS solutions within the specified schedule.
DevOps Process Adoption: Drive the adoption of DevOps processes and tools throughout the organization. Develop and maintain Continuous Delivery Pipelines to optimize the deployment process.
Technology Integration: Learn and implement cutting-edge technologies to build enterprise SaaS solutions at scale. Work with cloud technologies, containerized environments, and automation tools to enhance application performance and reliability.
Collaboration: Collaborate with cross-functional teams, including R&D, Operations, Support, and others, to ensure seamless integration of DevOps practices into the workflow.
Documentation & Troubleshooting: Design and document Standard Operating Procedures (SOPs), architecture artifacts, and design documents. Use troubleshooting skills to address issues across different platforms, ensuring minimal downtime and optimal performance.

Required Skills & Qualifications:
Experience: 3+ years in a software engineering function, preferably with experience in DevOps, SaaS, and automation.
Technical Expertise: Strong experience with CI/CD pipelines, containerized deployments, and maintaining production environments. Proficiency in automation scripting languages such as Python, Groovy, Ansible, or Shell scripting. Hands-on experience with Jenkins, Docker, Helm, Git, Terraform, and Jira. Knowledge of Web service protocols (REST, JSON) and experience working with Relational Databases (e.g., PostgreSQL, MS SQL).
Containerization & Cloud Technologies: Familiarity with Kubernetes (pods, persistent storage, ingress, routes) and cloud deployment models (public, private, hybrid). Exposure to Elasticsearch, Grafana, Prometheus, and other monitoring tools.
Site Reliability Engineering (SRE): Understanding of SRE principles and their implementation for SaaS services to ensure scalability, performance, and reliability.
Operating Systems: Proficient working on Windows and Linux platforms.
Agile Methodology: Experience working in an Agile environment with cross-functional teams.
Soft Skills: Excellent troubleshooting, communication, and collaboration skills. Hardworking, dedicated, and capable of handling time-sensitive deadlines.
Education: Bachelor's degree in IT or a related field, or equivalent professional experience.

Bonus Skills (Nice-to-Have):
Familiarity with BMC Helix products (ITSM, Digital Workplace, Helix Platform) is an advantage.
Previous experience in Site Reliability Engineering (SRE) for SaaS products will be a plus.
Work Schedule & Benefits:
This position may require occasional weekend work during scheduled production activities and after-hours work as needed. As part of BMC's commitment to equal opportunity, employees benefit from a supportive and inclusive culture, with opportunities for professional growth.

Compensation & Rewards:
The midpoint of the salary band for this role is 1,638,100 INR. Actual salary will depend on factors such as skills, experience, certifications, and other business needs. BMC offers a comprehensive compensation package, including a variable pay plan and country-specific benefits.

Why Join BMC?
BMC is a company that thrives on innovation, collaboration, and a commitment to creating a work environment that allows you to bring your best self to work every day. If you're looking for a place where you can make an impact, work with cutting-edge technology, and grow alongside talented professionals, this is the place for you. Be yourself at BMC, and help us shape the future of SaaS!

Qualification: Bachelor's degree in IT or a related field, or equivalent professional experience.
DevOps Architect
Cybage Software Private Limited
DevOps Engineer - Media & Advertising Sector - Job Description
We are seeking an experienced DevOps Engineer to join our team and optimize DevOps solutions tailored for the media and advertising sector. This role will focus on leveraging cloud platforms for cost optimization, driving the implementation of CI/CD pipelines, and improving development efficiency and system performance. The ideal candidate will bring expertise in cloud environments, DevOps practices, and performance monitoring tools to enhance the overall software delivery process.

Technical and Professional Requirements:
Cloud Platforms: Experience with GCP, Azure, or AWS for managing cloud infrastructure and services.
DevOps Tools: Expertise in tools such as Terraform, CircleCI, GitLab CI, and Jenkins for building and managing CI/CD pipelines.
Monitoring & Performance Tools: Familiarity with Prometheus, Grafana, Dynatrace, New Relic, or AppDynamics for monitoring and troubleshooting system performance.
Cloud Cost Optimization: Experience in Cloud Cost Optimization to improve cost efficiency while maintaining system performance.
Media & Advertising Domain: Knowledge of the media and advertising sector and experience building DevOps solutions that align with the specific needs of this domain.

Job Responsibilities:
Facilitate Development and Operations: Improve and streamline the development process and day-to-day operations using DevOps practices.
Identify Design Flaws and Bottlenecks: Recognize any design flaws, performance bottlenecks, and areas for improvement in the software development pipeline.
Build DevOps Channels: Establish and manage effective DevOps channels throughout the organization to improve collaboration and efficiency.
Continuous Build Environments: Set up and maintain continuous integration environments to speed up the software development cycle.
Best Practices Delivery: Design, implement, and deliver DevOps best practices across the organization to enhance system performance and development workflow.
Cloud Solution Guidance: Provide guidance and expertise to the development team, ensuring optimal end-to-end solutions from a cloud and DevOps perspective.

Educational Requirements: Any Graduate with 60% and above academic performance.

Qualification: Any Graduate with 60% and above academic performance.
DevOps Engineer
Ansys
Summary / Role Purpose
The DevOps Engineer supports the development of software products, processes, and supporting systems. In this role, the DevOps Engineer will collaborate with a team of expert professionals to accomplish development objectives and oversee software releases.

Key Duties and Responsibilities
Responsible for managing and implementing all phases of build, release, and environment management for a distributed team developing engineering software.
Deploys, maintains, and supports current software development environments (e.g., Visual Studio, Compilers, IDEs, MPI, etc.).
Performs basic DevOps activities, including maintenance, monitoring, documenting, and testing of product builds and packaging to provide quality production builds of ANSYS FBU software products on Windows and Linux systems.
Maintains and enhances the in-house testing tool and test results database.
Maintains and updates third-party dependencies as needed.
Troubleshoots and resolves issues in development, testing, and production environments.
Works closely with development to adjust builds and packaging to changing requirements.
Automates build processes and integrates with Continuous Integration systems like Azure DevOps.
Prepares, configures, deploys, and maintains build agents.
Investigates and addresses build and runtime failures; fixes compilation and linker errors.
Works in a collaborative manner with members of the software development, infrastructure, and testing teams.
Works with IT to maintain DevOps infrastructure.
Operates under direct supervision with work subject to frequent review by more experienced staff or the DevOps Manager.
Performs other job-related duties that may be assigned by management from time to time.

Minimum Education/Certification Requirements and Experience
BS in Engineering, Computer Science, or a related field.
Preferred Qualifications and Skills
MS degree or foreign equivalent in Engineering, Computer Science, or a related field, or 1-3 years of related experience.
Experience building software (C/C++/Fortran) on Linux and Windows operating systems.
Strong scripting skills (Python, Linux shell scripting, Windows batch scripting, and Perl).
Experience with Makefiles/SCons/CMake is preferred.
Knowledge of MySQL and PostgreSQL.
Strong knowledge of Windows and Linux operating systems is preferred.
Experience with a continuous integration system like Azure DevOps or GitHub.
Experience with the YAML and JSON data formats.
Knowledge of Visual Studio, Intel, and GCC compilers.
Experience with configuration management software like Git.
Familiarity with C/C++ and Fortran programming.
Experience working with open-source tools.
Solid troubleshooting and problem-solving skills.
Ability to plan and complete high-quality work on a schedule.
Good communication and interpersonal skills.
Ability to learn quickly and collaborate with others in a geographically distributed team.
Manager, Cloud Operations
Druva
Job Title: Manager, Cloud Operations
Company: Druva
Location: Pune, Maharashtra, India

About Druva:
Druva is a global leader in data security solutions, empowering organizations to protect and recover their data from all threats. Our Druva Data Security Cloud is a fully managed SaaS platform providing air-gapped and immutable data protection across cloud, on-premises, and edge environments. By centralizing data protection, we strengthen traditional security approaches and enable faster incident response, effective cyber remediation, and robust data governance. Trusted by nearly 7,500 customers, including 75 of the Fortune 500, Druva safeguards critical business data in an increasingly connected world. Learn more at druva.com and follow us on LinkedIn, X, and Facebook.

Role Overview:
As Manager of Cloud Operations, you will lead the team responsible for the stability, scalability, and performance of Druva's cloud infrastructure within our large-scale SaaS environment. This hands-on leadership role demands a deep technical background in AWS cloud operations combined with strong people management skills. You will drive operational excellence through automation, cost management, and rigorous adherence to security and compliance standards, ensuring our services remain highly available 24x7.

Key Responsibilities:
Team Leadership & Development: Lead, mentor, and support a team of cloud engineers to deliver high-quality results. Foster a collaborative environment and remove blockers to maximize team productivity. Manage hiring, coaching, and retention to build a high-performing team.
Technical Strategy & Execution: Drive automation initiatives to minimize manual tasks, boost reliability, and optimize operational workflows. Enforce compliance with security policies and industry regulations. Collaborate closely with DevOps and SRE teams to continuously enhance infrastructure and processes. Champion cost-efficiency while maintaining top-tier system performance.
System Reliability & Performance: Monitor and review system health regularly; identify and address any breaches in Service Level Objectives (SLOs). Ensure cloud infrastructure is secure, scalable, and highly available through proactive incident management. Lead incident response, root cause analysis, and post-mortems to improve service resilience.
Cross-Functional Collaboration: Partner with engineering teams to ensure smooth deployment of SaaS services to production. Conduct cross-team meetings to communicate deployment quality and status with Release and Host Domain (RHD) owners.
Cost of Goods Sold (COGS) Management: Maintain adherence to reservation posture and optimize cloud resource usage. Detect and report COGS anomalies using automated tools and internal alerts. Analyze unit cost trends and customer behavior to identify and address cost irregularities.
Security & Compliance: Conduct regular compliance validations and audits. Work with security teams to plan and execute quarterly security roadmap initiatives. Respond promptly to critical security alerts and incidents.

Qualifications:
8-10 years of experience in Cloud Operations with at least 2 years in a leadership role.
Strong expertise in AWS cloud infrastructure management.
Proven track record in driving automation and operational improvements.
Deep understanding of site reliability engineering (SRE) and incident management.
Experience with cost management and security compliance in cloud environments.
Excellent communication and people management skills.

If you're passionate about leading cloud operations teams and building secure, reliable SaaS infrastructure at scale, we'd love to hear from you!
Java Support Engineer/Software Engineer
HSBC
About HSBC
HSBC is one of the largest banking and financial services organizations in the world, operating in 64 countries and territories. Our mission is to enable businesses to thrive, economies to prosper, and help people fulfill their hopes and ambitions. Whether you're aiming for the top of your career or exploring new directions, HSBC offers the support, opportunities, and rewards to help you realize your potential.

Role: Software Engineer
HSBC is seeking a Software Engineer to join our team. This role focuses on providing critical support to our production services and ensuring smooth transitions from development to production. You will assist in the daily production monitoring of Internet Banking services, collaborate across multiple teams, and help drive automation and performance improvements. If you're passionate about working in an agile and customer-first environment, this is a great opportunity to further your career.

Key Responsibilities:
24x7 Support: Provide on-call support for production services, ensuring timely resolution of any issues.
Service Transition: Coordinate implementation activities for a smooth transition from development to production.
Production Monitoring: Handle daily production Internet Banking level 1 case checking, investigation, and provide solutions or workarounds.
Collaboration: Work closely with server, infrastructure, and business teams to address technical and operational challenges.
SRE & DevOps: Understand and assist in Site Reliability Engineering (SRE) and DevOps production support activities.

Requirements:
Technical Expertise: Strong understanding of web-based application support in a multi-tier architecture, especially within the banking domain. Familiarity with Mobile SRE support activities. Expertise in troubleshooting Java/J2EE/Microservices applications deployed on industry-standard front-end application servers such as IBM WAS, IBM WPS, JBoss, and Tomcat.
Experience with UNIX systems and hands-on troubleshooting. Python-based automation experience is a plus. Proficiency in using application performance monitoring tools like AppDynamics, Splunk, BMC Patrol, HP BSM, Dynatrace, etc.
Tech Stack: Knowledge of Java / J2EE, AngularJS, Node.js, React, Spring, Dojo, HTML, JavaScript. Familiarity with ticketing tools such as BMC Remedy, GSD, or ServiceNow.
Cloud & Systems Knowledge: Familiarity with cloud management, Oracle databases, LDAP, MQ, TCP/IP networks, web servers, and data center management. Understanding of Microservices, REST APIs, and integration with platforms like Mule Gateway, AnyPoint, and PCF. Experience in application migration and a clear understanding of the process.

Preferred Qualifications:
Experience with business-critical server application support.
Background working in cloud environments.
Understanding of application migration processes and related activities.

At HSBC, you'll join a global team working at the forefront of technology and banking services. We offer opportunities for growth, support for work-life balance, and the ability to make a meaningful impact on the global economy. If you're ready to take on a dynamic, customer-facing role and thrive in a fast-paced, collaborative environment, we want to hear from you!

Qualification: Candidates with a good understanding of Cloud, Oracle database, LDAP, MQ, TCP/IP networks, web servers, and data centres will be preferred.