Site Reliability Engineer SRE GO Kubernetes Jobs in Bengaluru

470 Jobs Found

TV

Lead Software Engineer - Scale & Performance

Team Vunet Systems

6-12 Years | Not Disclosed | Bengaluru, Karnataka, India | Full-time

Lead Software Engineer - Scale & Performance Location: Bengaluru Experience: 6 12 years About VuNet VuNet is a pioneer in Business Journey Observability, using Big Data and Machine Learning to revolutionize digital experiences in the financial services industry. Our platform delivers end-to-end visibility into customer journeys, helping organizations proactively resolve issues, ensure operational resilience, and deliver superior user satisfaction. With over 28 billion digital transactions monitored every month and serving more than 300 million users globally, VuNet is shaping the future of observability for some of the largest banks and financial institutions. We are Series B funded, part of NASSCOM s DeepTech Club, and recognized by global analysts such as Gartner and Omdia. Your Role: Lead Software Engineer - Scale & Performance As a Lead Software Engineer for Scale & Performance, you ll own the performance and scalability benchmarks for VuNet s observability platform. You will work with cutting-edge technologies, design robust test frameworks, and ensure that our platform scales seamlessly to meet the demands of millions of users. Roles & Responsibilities Own performance and scalability benchmarking for key platform components (ingestion pipelines, data storage, and query services). Design and execute load, stress, soak, and capacity tests across microservices, agents, and ingestion layers. Identify and resolve performance bottlenecks in both infrastructure (CPU/memory/IO) and application layers (API latency, throughput, GC behavior). Develop and maintain performance test frameworks, preferably using Kubernetes-based environments. Collaborate with DevOps and SRE teams to optimize system configurations (Kubernetes, Postgres/TimescaleDB, ClickHouse, Kafka) for scale. Implement OpenTelemetry for service instrumentation to monitor system health and latency (p50/p95/p99 metrics). Contribute to capacity planning, scaling strategies (horizontal/vertical), and resource optimization. Analyze production incidents related to scaling issues and drive permanent fixes. Work with engineering teams to design scalable architecture patterns and define SLIs/SLOs for system performance. Document performance baselines, tuning guides, and scalability best practices for internal use. What You Bring Mandatory Skills: Strong background in performance engineering for large-scale distributed systems or SaaS platforms. Expertise in Kubernetes, container runtimes (containerd/Docker), and resource profiling in containerized environments. Solid understanding of Linux internals, CPU/memory profiling, and network stack tuning. Hands-on experience with observability tools (Prometheus, Grafana, OpenTelemetry, Jaeger, Loki, Tempo, etc.). Familiarity with observability platform datastores like ClickHouse, PostgreSQL/TimescaleDB, Elasticsearch, or Cassandra. Experience with performance benchmarking tools such as k6, Locust, JMeter, or custom Golang/Python scripts. Ability to interpret system metrics (CPU usage, memory, GC, latency) and correlate across different layers. Nice-to-Have Skills: Experience with agent benchmarking (OpenTelemetry Collector, custom data shippers). Exposure to streaming systems like Kafka, NATS, or Pulsar. Familiarity with CI/CD pipelines for performance testing and regression tracking. Knowledge of cost optimization and capacity forecasting in cloud environments (AWS/GCP/Azure). Proficiency in Go, Python, or Bash scripting for automation and data analysis. Life at VuNet: At VuNet, we're building a world-class observability platform, and we re just getting started. You ll be part of a passionate, problem-solving team that embraces collaboration, fast learning, and staying ahead of emerging technologies like Gen AI. We foster a high-trust, inclusive culture where collaboration, ownership, and innovation are central to our success. If you're looking to work on cutting-edge tech, make a real impact, and grow with a supportive team you ll fit right in at VuNet. Benefits: Comprehensive health insurance coverage for you, your parents, and dependents. Mental wellness and 1:1 counseling support. A culture that promotes continuous learning, innovation, and career growth. Transparent, inclusive, and high-trust workplace. Opportunities for skill enhancement with training programs focused on new Gen AI technologies.

Lead Software Software lead Engineer Lead Engineer
TV

Senior Qa Engineer

Team Vunet Systems

6-10 Years | Not Disclosed | Bengaluru, Karnataka, India | Full-time

Senior QA Engineer - AI-Powered Observability Platform Location: Bengaluru Experience: 6 10 years About VuNet VuNet is at the forefront of Business Journey Observability, revolutionizing the financial services industry with Big Data and Machine Learning. Our deep-tech platform provides comprehensive visibility into customer journeys, enabling proactive issue resolution, operational resilience, and superior user experiences. We monitor over 28 billion digital transactions monthly, serving 300 million users globally, and we re powering some of the largest banks and financial institutions in India and MEA. VuNet is Series B funded, part of NASSCOM s DeepTech Club, and recognized by analysts like Gartner and Omdia. Your Role: Senior QA Engineer - AI-Powered Observability Platform As a Senior QA Engineer at VuNet, you ll play a crucial role in ensuring the quality and reliability of our VuSmartMaps Observability Platform. You ll lead the design and implementation of cutting-edge test automation, performance validation, and reliability frameworks across distributed systems that handle billions of telemetry events. Working closely with development, operations, and QA teams, you will drive quality across the entire platform and play a key role in ensuring that our systems are scalable, resilient, and performant. Roles & Responsibilities Quality Strategy Ownership: Own the end-to-end quality strategy for observability platform components (metrics, logs, tracing, alerting, dashboards, MLOps). Automated Testing: Build and maintain automated test suites for data pipelines, APIs, and integration flows involving tools like Prometheus, Grafana, Loki, Elastic, and OpenTelemetry. Performance Validation: Design and execute tests to validate high-throughput, distributed systems under real-world load conditions, ensuring performance benchmarks are met. Test Frameworks Development: Develop and maintain test frameworks and tools using Python, Go, Bash, pytest, k6, Playwright, and others. System Reliability & Alerting: Define and implement test coverage for system reliability, alerting accuracy, and visualization correctness. Collaboration: Partner with developers, SREs, and DevOps teams to shift quality left in the development lifecycle, contributing to CI/CD pipelines and automation workflows using GitOps tools. Automation Integration: Integrate automated test suites into smoke, functional, and regression pipelines using Jenkins, Spinnaker, and other CI/CD tools. Mentorship: Mentor junior QA engineers, establish best practices, and ensure consistency in the QA discipline across the team. What You Bring Mandatory Skills: Experience: Minimum 6+ years in software quality engineering, with a focus on automated testing, performance, and reliability. Scripting/Programming: Proficiency in at least one scripting or programming language (JavaScript, Python, Go). CI/CD Systems: Experience with CI/CD systems such as GitHub Actions, Jenkins, or ArgoCD. Debugging Skills: Excellent debugging skills and the ability to analyze code quality and system performance. Distributed Systems Knowledge: Familiarity with Kafka, Kafka Streams, ClickHouse DB, and distributed systems. Kubernetes & Microservices: Strong experience testing Kubernetes-native systems, Helm deployments, and microservices. Observability Tools: Knowledge of observability tools like Prometheus, Grafana, Elastic Stack, OpenTelemetry, Loki, or Jaeger. Tooling & Deployment: Proficiency in Jenkins, Spinnaker, GitOps, Kubernetes, and Docker. Testing Experience: Hands-on experience in various types of testing (functional, performance, load, etc.) and knowledge of testing tools. Documentation Skills: Ability to create clear documentation (e.g., release notes, troubleshooting guides, and migration guides). Nice-to-Have Skills: Performance Testing: Experience designing and executing performance and load testing for high-traffic applications. Web Services & Systems Design: Understanding of web services and distributed systems architecture. Cross-Functional Communication: Excellent communication skills with the ability to coordinate across multiple teams. Life at VuNet: At VuNet, we re building a world-class observability platform proudly Made in India and we re just getting started. Join a passionate team of problem-solvers who love tackling complex challenges and stay ahead of the curve with technologies like Gen AI. We offer an environment where collaboration, innovation, and learning are at the core of everything we do. You ll have the opportunity to work on cutting-edge technologies and make a real impact on a product that powers leading banks and financial institutions globally. Benefits: Comprehensive health insurance coverage for you, your parents, and dependents. Mental wellness support and 1:1 counseling. A learning culture that promotes growth, innovation, and ownership. Transparent, inclusive, and high-trust workplace culture. Exposure to Gen AI and integrated technology workspaces. Support for career development with various training programs to enhance your skills and expertise.

Senior Qa Senior qa Engineer Senior engineer
OK

Staff Software Engineer, Ai & Automation

Okta

8+ Years | Not Disclosed | Bengaluru, Karnataka, India | Full-time

Staff Software Engineer AI & Automation Location: Bengaluru Company: Okta, The World s Identity Company Experience: 8+ Years Type: Full-Time About Okta Okta is the world s leading identity platform. We empower people to securely access any technology, anywhere, on any device. With products like the Okta Platform and Auth0, we place identity at the core of business security, enabling growth through safe digital transformation. At Okta, we value diverse perspectives and experiences. We believe in learning, collaboration, and building an inclusive environment where everyone belongs. About the Team The Business Technology - Shared Services team is at the forefront of Okta s internal digital transformation. We focus on building intelligent, automated platforms that simplify operations and deliver smarter, faster experiences to both employees and customers. We collaborate across engineering, data science, security, and business units to deliver cutting-edge solutions powered by Generative AI (GenAI), virtual agents, workflow orchestration, and intelligent recommendations. The Opportunity As a Staff Software Engineer, you ll play a critical role in designing and developing AI-powered platforms that drive automation, scale, and intelligence across Okta s business. You ll help make LLM-powered solutions and intelligent automation a reality for the enterprise ensuring performance, security, and reliability at scale. This is a hands-on, individual contributor (IC) role, ideal for engineers who are passionate about solving complex problems, architecting scalable systems, and pushing the boundaries of AI integration. What You ll Do Design & Build: Develop scalable backend services that embed GenAI and automation into core business workflows (e.g., virtual agents, document intelligence, smart routing). Collaborate Across Teams: Work closely with product managers, data scientists, and other engineers from ideation to production. Architect for Scale: Make key architectural decisions around LLM integration, API design, data flow, and observability. Code with Excellence: Write clean, secure, and maintainable code in Python, Java, or similar languages. Build for Production: Use Docker, Kubernetes, and CI/CD pipelines to build and deploy high-availability services. Champion Best Practices: Promote high standards for testing, security, code reviews, and operational readiness. Mentor & Guide: Support a collaborative team culture through peer mentorship and design reviews. What You ll Bring 8+ years of experience in software engineering with a strong track record of building and maintaining production-grade, cloud-native services. Expertise in distributed systems, API development, and cloud infrastructure (AWS, GCP, or Azure). Proficiency in Python, Java, or Go. Experience with Docker, Kubernetes, and observability tools (e.g., Prometheus, Grafana, ELK). Exposure to AI/ML concepts and eagerness to work with LLMs, NLP, or automation platforms. A strong sense of ownership, collaborative mindset, and a bias toward action. Passion for learning and working with emerging technologies especially in the AI and automation space. Why Join Okta Make AI Real: Help move GenAI from experimentation to enterprise-wide impact. Build with Purpose: Work on challenges that simplify and secure Okta s internal operations. Grow in a Human-Centered Culture: Join a humble, technically driven team that values learning, excellence, and personal growth. Join Okta and shape how identity, AI, and automation come together to power the modern enterprise.

Software Engineer Staff Engineer Software Engineer Engineer software
TV

Lead Platform Engineer

Team Vunet Systems

6-10 Years | Not Disclosed | Bengaluru, Karnataka, India | Full-time

Lead Platform Engineer Observability Solutions Location: Bengaluru Experience: 6 10 Years Function: Observability Engineering | Platform Architecture | SRE Enablement Join VuNet Redefining Digital Observability at Scale VuNet is transforming the future of digital experiences through Business Journey Observability, combining Big Data and AI/ML to empower real-time visibility across payments, banking, and financial services. Monitoring 28+ billion transactions/month, our platform is trusted by top financial institutions and powers over 300 million users. Backed by Series B funding and recognized by Gartner, NASSCOM, and Forbes, we are leading the charge in building a new category of observability, proudly Made in India for global impact. Your Role: Lead Platform Engineer As the Lead Platform Engineer, you will architect and drive the development of packaged observability solutions across 100+ infrastructure and application technologies. You will define **golden signals**, build **data collection strategies**, and lead the standardization of alerts, dashboards, and RCA workflows for platforms like **Kubernetes, Oracle DB, and Tomcat**. This is a cross-functional leadership role that sits at the intersection of product, platform, DevOps, and SRE. You will **lead a team** and influence how observability is delivered, scaled, and adopted across complex environments. Key Responsibilities Observability Solution Development Design and lead the delivery of observability packages for databases, middleware, cloud-native, and legacy platforms. Define and implement data collection pipelines, including agents, APIs, logs, metrics, traces, and service discovery. Establish **golden signals, SLIs/SLOs**, and health KPIs for performance, availability, and anomaly detection. Dashboards, Alerts & RCA Develop standardized, reusable dashboards, alerts, reports, and troubleshooting playbooks. Automate **RCA workflows** to improve MTTR and reduce alert fatigue. Platform Enablement & Integration Work with engineering to enhance agent capabilities and support new data sources/formats. Guide implementation of platform features for better observability at scale. Team Leadership & Governance Lead and mentor a team of observability engineers and specialists. Define design patterns, reusable modules, and version-controlled libraries. Stakeholder Collaboration Partner with product managers, DevOps, SREs, and customer teams to gather requirements, align priorities, and validate use cases. Ensure deliverables are scalable, well-documented, and production-ready. What You Bring Must-Have Skills 6 10 years of experience in observability, platform engineering, or SRE roles. Hands-on with tools like Prometheus, Grafana, OpenTelemetry, ELK/EFK, Datadog, Splunk. Strong understanding of logs, metrics, traces, profiling, and collection strategies. Experience developing solutions for platforms like Kubernetes, Oracle, PostgreSQL, Tomcat, etc. Proficient in Python, Shell scripting, APIs, and automation tools (**Terraform**, etc.). Familiar with alert fatigue mitigation, anomaly detection, and RCA frameworks. Excellent communication, technical leadership, and documentation skills. Nice to Have Experience managing an observability marketplace or solution catalog. Contributions to open-source observability projects. Certifications in Kubernetes, Observability platforms, or cloud providers (AWS/GCP/Azure). Background in ITSM tools, CMDBs, or incident workflow automation. At VuNet, you ll help build a category-defining observability platform that s already transforming critical infrastructure for leading financial institutions. You ll work with passionate engineers, push technical boundaries, and grow in a high-trust, high-impact environment. What You ll Experience: Ownership of key observability initiatives impacting 300M+ users. Collaboration with SRE, DevOps, and product teams across real-time financial systems. Opportunity to experiment with and shape Gen AI, ML, and emerging telemetry trends. Perks & Benefits Health insurance for you, your parents, and dependents. 1:1 mental wellness support. Training programs, certifications, and career growth opportunities. Transparent, inclusive, and high-trust work culture. Access to cutting-edge technology and Gen AI-powered workspaces.

Lead Platform Engineer Lead Engineer Engineer lead
GR

Site Reliability Engineer

Groww

4-6 Years | Not Disclosed | Bengaluru, Karnataka, India | Full-time

Position: Site Reliability Engineer Location: Bengaluru About Groww At Groww, we re on a mission to make financial services simple, accessible, and transparent for every Indian. As one of India s fastest-growing financial platforms, we help millions take control of their financial future through a wide range of products. We re a team driven by ownership, radical customer-centricity, and a deep passion for challenging the status quo. From intuitive design to robust engineering, everything we build is grounded in what our customers need. If you re excited about building systems that power the future of finance in India, we d love to hear from you. Our Vision To empower every Indian with the knowledge, tools, and confidence to make sound financial decisions. Our goal is to be the most trusted financial partner for millions across the country. Our Core Values Customer Obsession We put our users first, always. Extreme Ownership We own everything we do, end-to-end. Simplicity We keep things simple, effective, and intuitive. Long-term Thinking We focus on sustainable, impactful decisions. Transparency We believe in open communication and collaboration. Role Overview: As a Site Reliability Engineer (SRE) at Groww, you will be responsible for ensuring our systems are highly available, performant, and secure. You will work closely with engineering and infrastructure teams to improve reliability, automate deployments, and manage mission-critical services that power our platform. Key Responsibilities: Monitor and troubleshoot issues related to system performance, availability, and security. Define and maintain SLIs, SLOs, and Error Budgets to improve system reliability. Use tools like Grafana to analyze and report on metrics and trace data. Participate in the on-call rotation for 24/7 support of production systems. Collaborate with developers to ensure scalability and reliability are built into new services. Roll out security and infrastructure features proactively. Manage automated deployments, version control, and release rollouts. Perform Root Cause Analysis (RCA) for incidents and implement long-term fixes. Optimize system performance, conduct capacity planning, and create recovery strategies. Identify and automate repetitive tasks to reduce toil. Leverage CI/CD tools such as Git, Jira, Jenkins to streamline development workflows. Requirements: 4 6 years of relevant experience in SRE, DevOps, or infrastructure engineering. Bachelor's or Master's degree in Computer Science or a related field. Strong background in Linux/Unix system administration and networking. Hands-on experience with cloud platforms like GCP or AWS. Proficiency in programming languages such as Python, Java, or Go. Experience with monitoring and alerting tools: Grafana, Prometheus, New Relic, etc. Familiarity with configuration management tools. Experience with Kubernetes, Docker, and container orchestration tools is a strong plus. Excellent problem-solving, communication, and team collaboration skills. Be a part of one of India s fastest-growing fintech startups. Build and scale systems that impact millions of users daily. Work with passionate, driven teammates who are redefining financial services. A culture that encourages continuous learning, ownership, and transparency. If you're ready to help shape the future of fintech infrastructure in India, Groww is the place for you. Let s build something extraordinary together. Qualification : Bachelor's or Master's degree in Computer Science or a related field

Site Reliability Site reliability Engineer Site engineer
SL

Technical Lead Devops

Subex Limited

3-6 Years | Not Disclosed | Bengaluru, Karnataka, India | Full-time

Position: Technical Lead - DevOps Location: Bangalore Rural, Karnataka, India Department: Data Platform and DevOps Employment Type: Subexian Experience Required: 3 to 6 years Job Overview: We are seeking an experienced Kubernetes Administrator with a strong background in managing containerized environments. The ideal candidate will have 4+ years of hands-on experience in deploying, configuring, and optimizing Kubernetes clusters to drive scalability, reliability, and performance. This is an excellent opportunity to leverage your expertise in Kubernetes orchestration while contributing to the overall success of our platform. Key Responsibilities: Cluster Management: Deploy, configure, and manage Kubernetes clusters both on-premises and across cloud platforms such as AWS, Azure, and GCP. Security & Compliance: Implement best practices for cluster security, including role-based access control (RBAC), network policies, and data encryption at rest and in transit. Automation: Automate cluster provisioning and ongoing management using tools like Terraform, Ansible, or Helm charts, streamlining operations and reducing manual tasks by 40%. Monitoring & Performance: Continuously monitor cluster health and performance metrics using tools like Prometheus, Grafana, ensuring high availability and optimal performance. CI/CD Pipelines: Design and implement CI/CD pipelines for containerized applications using tools such as Jenkins, GitLab CI/CD, and CircleCI to enable smooth continuous delivery. Collaboration: Work closely with development teams to troubleshoot issues, optimize application performance, and ensure compatibility with Kubernetes environments. Security Audits: Conduct regular security audits to identify vulnerabilities and ensure compliance with industry standards. Documentation: Maintain clear and comprehensive documentation for deployment procedures, configuration settings, and troubleshooting guides to enhance knowledge sharing within the team. Infrastructure Management: Administer and maintain Linux/Unix servers and virtualization platforms such as VMware or KVM, ensuring seamless operations across the infrastructure. Backup & Recovery: Implement and manage robust backup and disaster recovery solutions to ensure data integrity and minimize system downtime. Technical Support: Provide expert-level technical support for server and network infrastructure-related issues. Required Skills & Qualifications: Proven experience in Kubernetes deployment, configuration, and administration. Strong command of containerization technologies, including Docker and containerd. Hands-on experience with cloud platforms such as AWS, Azure, and GCP. Proficiency in Infrastructure as Code (IAC) tools like Terraform and Ansible. Familiarity with CI/CD pipelines and automation tools like Jenkins and GitLab CI/CD. Excellent troubleshooting and problem-solving skills. Strong communication and collaboration abilities, with the capability to work effectively across cross-functional teams. If you re passionate about DevOps, Kubernetes, and driving the success of containerized environments, we d love to hear from you!

Technical Lead Technical lead DevOps Lead devops
GC

Systems Development Engineer, Google Cloud

Google Careers

3+ Years | Not Disclosed | Bengaluru, Karnataka, India | Full-time

Systems Development Engineer Google Cloud Location: Bengaluru, Karnataka, India Company: Google Minimum Qualifications Bachelor s degree in Computer Science, Information Technology, or a related field; or equivalent practical experience. 2+ years of experience with systems automation. 2+ years of experience in technical infrastructure (e.g., deployment, maintenance, troubleshooting). Preferred Qualifications 3+ years of experience in systems design and implementation. About the Role As a Systems Development Engineer (SDE) at Google Cloud, you will be part of a team responsible for managing and scaling critical services and infrastructure. This role emphasizes automation, reliability, and observability, using engineering practices to eliminate manual toil and improve system efficiency. Google SDEs design and build the tools and systems that power the infrastructure for Google s services, transforming telemetry into actionable insights and proactively solving operational challenges. You ll have the opportunity to work on impactful, large-scale projects in an environment that fosters learning, collaboration, and growth. Key Responsibilities Participate in on-call rotations and incident response, managing services within your domain. Troubleshoot infrastructure and system issues, evaluate diagnostic data, and recommend solutions. Resolve tickets and bugs within defined service-level objectives (SLOs). Collaborate with primary responders to maintain high availability and reliability of systems. Contribute to the design and implementation of systems and services in related domains. Work directly with customers to gather requirements, define distributed system needs, and propose solutions. Develop automation tools and systems to improve efficiency and reduce operational overhead. About Google Cloud Google Cloud helps organizations transform their business with advanced technologies and enterprise-grade solutions. With a focus on sustainability, innovation, and scalability, Google Cloud serves customers in over 200 countries and territories, providing the tools and infrastructure necessary to solve the world s most complex business challenges. Qualification : Bachelor's degree in Computer Science or IT-related field, or equivalent practical experience.

Systems Development Systems development Engineer Systems Engineer
EI

Staff Engineer - Core Infrastructure

Eightfold

10+ Years | Not Disclosed | Bengaluru, Karnataka, India | Full-time

Staff Engineer - Core Infrastructure Location: Bangalore, Karnataka, India Employment Type: Full-Time | Hybrid Work Model About Eightfold.ai At Eightfold.ai, we re transforming the future of work by leveraging artificial intelligence to connect individuals with career opportunities based on their skills and potential, not just their network. Our Talent Intelligence Platform powers a more diverse, inclusive workforce by helping organizations plan, hire, develop, and retain top talent. With $410M+ in funding and a $2B+ valuation, we are revolutionizing how the world thinks about skills, potential, and careers. If you re passionate about cutting-edge technology, infrastructure, and creating scalable solutions that impact the world, we want you to join us. The Opportunity We re looking for a Staff Engineer to join our Core Infrastructure Team and help scale the backbone of Eightfold s platform. This high-impact role will involve designing, building, and optimizing foundational systems that power everything from search and machine learning infrastructure to developer platforms and observability tools. You will drive system design across our stack and mentor engineering teams to build scalable, resilient systems that enable Eightfold to grow and deliver AI-powered solutions for our customers. What You ll Own & Drive Architect & Scale Core Systems: Design and build scalable infrastructure systems that support Eightfold s AI-driven products, including search, compute, storage, and machine learning infrastructure. Cross-Functional Leadership: Lead cross-team technical initiatives, collaborating with Product, Security, Data, and Platform teams to align with company-wide goals. Hands-On Development: Contribute directly to system design, code reviews, and incident response, ensuring best practices are followed. Mentorship & Leadership: Guide and mentor engineers to help them grow into future leaders, fostering a culture of technical excellence across teams. Advocate for Engineering Excellence: Champion best practices across areas such as cloud architecture, CI/CD, security, and observability. Solve Complex Infrastructure Challenges: Tackle problems around reliability, scalability, and infrastructure performance, ensuring the systems are robust and perform well at scale. Bring Emerging Tech to Life: Stay on top of the latest trends and technologies, incorporating new scalable design patterns into our architecture. What You Bring 10+ years of experience in backend or infrastructure engineering, with a strong background in building distributed, cloud-native systems. Proven track record in designing and delivering reliable, high-scale services (ideally in AWS, GCP, or Azure environments). Expertise in Infrastructure Technologies: Deep knowledge of containerization, orchestration (Kubernetes), and infrastructure-as-code. Experience with one or more of the following: search infrastructure, ML/AI infrastructure, databases/data warehouses, developer tooling, or platform security. Leadership Experience: A passion for mentoring and guiding engineers, influencing teams and peers, and driving excellence across projects. Strong communication skills, able to translate complex technical challenges into strategic business impact. (Bonus) Experience with SRE principles, cloud security, and compliance for enterprise/government environments. Our Engineering Culture At Eightfold, we believe in ownership over tasks. You won t just be given directions; you ll be trusted to take responsibility and make a measurable impact. We have a growth mindset and continuously improve in all aspects of our work. Collaboration, transparency, and speed are core to everything we do. You ll work in a dynamic, supportive environment where your work directly influences the success of the company and its mission. Meaningful Work: Help shape the future of work by building products that impact careers and businesses globally. Growth Opportunities: Be part of a rapidly scaling company where your contributions are highly valued. Competitive Compensation: Attractive salary, equity, and comprehensive benefits package (including medical, vision, and dental coverage). Hybrid Work Model: Work from our Bangalore office twice a week, with flexibility for remote work. Inclusive Culture: We are committed to fostering a diverse and inclusive work environment where everyone feels valued. Equal Opportunity Employer Eightfold.ai is an Equal Opportunity Employer. We do not discriminate based on race, color, religion, sex, sexual orientation, gender identity, national origin, age, or disability. If you re a hands-on, innovative engineer with a passion for building scalable systems and tackling infrastructure challenges, we want to hear from you.

Engineer Staff Engineer Core Infra Infra engineer
SA

Devops Engineer

Sarvam

Fresher | Not Disclosed | Bengaluru, Karnataka, India | Full-time

DevOps Engineer Location: Bengaluru, Karnataka, India (On-Site) Department: Engineering Employment Type: Full-Time About Sarvam.ai Sarvam.ai is a cutting-edge generative AI startup headquartered in Bengaluru, India, with a mission to make generative AI accessible and impactful for Bharat. Founded by AI experts, we are dedicated to developing high-performance, cost-effective AI agents tailored for the Indian market. We enable enterprises to tap into new opportunities, build deeper customer connections, and reshape the future of AI for India and beyond. Role Overview We are looking for a DevOps Engineer to join our team and help build and manage scalable, secure, and high-performance infrastructure. In this role, you will be a key contributor to automating deployments, managing cloud infrastructure, optimizing CI/CD workflows, and ensuring system reliability. You will work with cutting-edge technologies, including cloud platforms, containerization, and infrastructure as code (IaC), to deliver impactful solutions for AI-driven products. Key Responsibilities CI/CD Pipelines: Design, implement, and manage CI/CD pipelines for seamless software deployment and integration. Cloud Infrastructure: Deploy and manage cloud infrastructure using Terraform, Kubernetes, and Docker for scalability and high performance. Automation & Scaling: Automate infrastructure provisioning, scaling, and security compliance to support high-availability environments. Monitoring & Optimization: Implement logging, monitoring, and alerting solutions using tools like Prometheus, Grafana, ELK Stack, or CloudWatch to monitor system performance and optimize resource utilization. Security & Compliance: Enhance security and compliance by managing IAM policies, encryption, and vulnerability scanning. Troubleshooting & Root Cause Analysis: Troubleshoot system failures, perform root cause analysis, and implement improvements to ensure reliability and uptime. Collaboration: Work closely with development teams to ensure smooth deployment and operation of AI models and applications. Must-Have Skills & Qualifications Educational Background: Bachelor s degree in Computer Science, Engineering, or related field (2024/2025 graduates). Cloud Expertise: Strong experience with AWS, Azure, or GCP for deploying and managing cloud-based applications. Containerization: Proficiency in Docker and Kubernetes for building and managing containerized applications. Infrastructure as Code (IaC): Experience with Terraform, Ansible, or CloudFormation to automate infrastructure management. CI/CD Pipelines: Experience in setting up automated workflows using tools like GitHub Actions, Jenkins, or GitLab CI/CD for smooth deployments. Monitoring & Logging: Experience with Prometheus, Grafana, ELK, or similar tools to implement effective monitoring and logging solutions. Networking & Security: Strong understanding of firewalls, VPNs, SSL, and cloud security best practices for secure infrastructure. Version Control: Proficiency with Git for managing code repositories and version control workflows. Problem Solving: Strong debugging, troubleshooting, and analytical skills to resolve complex system issues. Good to Have (Preferred Experience) Serverless Computing: Exposure to serverless computing models such as AWS Lambda or Azure Functions. Message Queues: Experience with message queues like Kafka, RabbitMQ, or SQS. Site Reliability Engineering (SRE): Familiarity with SRE practices to ensure the reliability and availability of large-scale systems. Open Source Contributions: Contributions to open-source projects or a strong GitHub portfolio showcasing DevOps expertise and best practices. Impactful Work: Work on AI-driven products that are reshaping the future of technology in India. Innovative Team: Collaborate with a team of AI experts and engineers pushing the boundaries of technology. Career Growth: Opportunity to grow in a fast-growing startup at the forefront of the generative AI revolution. Cutting-edge Technologies: Work with cloud technologies, automation, and AI infrastructure to create high-impact products. Qualification : Bachelors degree in Computer Science, Engineering, or related field

DevOps Engineer Devops engineer Full-Time Continuous integration
AP

Sr. Software Engineer, .net Fullstack Reactjs

Apttus

5+ Years | Not Disclosed | Bengaluru, Karnataka, India | Full-time

Senior Software Engineer .NET Core | Cloud | Microservices | Bangalore, India Location: Bangalore, India Department: Software Engineering Reports To: Sr. Manager, Software Engineering Experience: 5+ Years Tech Stack: .NET Core 6, C#, JavaScript, Angular/React, AWS/Azure, Kubernetes, ElasticSearch, NoSQL About Conga: At Conga, we thrive on innovation, career growth, and team collaboration. We are a leading provider of Revenue Lifecycle Management solutions, helping businesses streamline configuration, fulfillment, and renewal with AI-driven insights and a unified data model. Our culture is rooted in the Conga Way, which emphasizes employee empowerment, customer transformation, and product excellence. About the Role: We are looking for a Senior Software Engineer to join our Platform Services team in Bangalore. You will play a pivotal role in building cloud-agnostic, scalable platforms that power the Conga product suite. You will collaborate with cross-functional teams, design high-performance backend systems, and contribute to all phases of the Agile development lifecycle. This is a career-defining opportunity to work on enterprise-scale microservices architecture and cutting-edge cloud platforms. As a Senior Software Engineer, you will not only write clean, scalable code but also mentor junior engineers and solve complex technical challenges in a fast-paced SaaS environment. Why This Role Matters: This isn't just a coding job it's an opportunity to shape the future of enterprise software. You will have the chance to influence technical direction, ensure high-quality engineering practices, and contribute to scalable solutions that are deployed globally. As part of a forward-thinking team, you ll drive innovation and build solutions that make a real impact. Key Responsibilities: Backend Development: Design, develop, and maintain scalable services using .NET Core 6 and C#, ensuring code quality and adherence to best practices. Frontend Collaboration: Work with React or AngularJS, JavaScript, HTML5, and CSS to support full-stack development. Cloud Platforms: Implement solutions on AWS or Azure, utilizing modern cloud architecture and design patterns. Microservices Architecture: Develop cloud-agnostic microservices, using containers and Kubernetes, optimized for performance and reliability. Agile Participation: Actively participate in Agile ceremonies (planning, backlog grooming, reviews) and contribute to end-to-end project delivery. Problem Solving: Act as a technical subject matter expert (SME) for issue resolution, architecture discussions, and performance optimization. Collaboration: Work closely with global teams across time zones, fostering a highly agile and transparent development culture. Mentorship: Provide guidance to junior developers, conduct code reviews, and promote a culture of technical excellence. Qualifications & Experience: 5+ years of software development experience with a strong background in .NET Core, C#, and JavaScript frameworks (React or AngularJS). Hands-on experience with React or AngularJS for front-end development. Proficiency in cloud platforms such as AWS or Azure, as well as ElasticSearch and NoSQL databases. Experience in building and deploying containerized applications using Kubernetes. Deep understanding of cloud-native architecture, CI/CD pipelines, and scalable microservices. Bachelor's degree in Computer Science, Engineering, or a related field. What Sets You Apart: Strong Communication Skills: Ability to clearly explain complex technical concepts and collaborate across globally distributed teams. Leadership Potential: Ability to take ownership of deliverables, motivate others, and contribute meaningfully to team goals. Problem Solver: A passion for diagnosing and resolving issues across stacks with elegant solutions. Global Mindset: Comfort working outside traditional hours to coordinate with international teams and ensure timely delivery. At Conga, we re passionate about building a collaborative culture where innovation and growth go hand-in-hand. As a Senior Software Engineer, you ll be at the forefront of developing cutting-edge solutions that directly impact the way businesses manage their revenue lifecycle. If you're excited about scalable solutions, microservices architecture, and the cloud-first future, this is the role for you. Qualification : Bachelors degree in Computer Science, Engineering, or a related field.

Sr. Software Sr. software Engineer Sr. engineer
CS

Principal Cloud Development Engineer

Cloud Software Group

14+ Years | Not Disclosed | Bengaluru, Karnataka, India | Full-time

Job Title: Principal Cloud Development Engineer Location: Bengaluru, India About Cloud Software Group: Cloud Software Group (CSG), home to Citrix and TIBCO, is one of the largest global providers of cloud-based technologies, empowering over 100 million users worldwide. As a Principal Cloud Development Engineer, you will play a pivotal role in shaping the future of Desktop-as-a-Service (DaaS) solutions helping deliver secure, scalable, and intelligent platforms that drive modern work experiences from anywhere. We re entering an era of accelerated innovation and transformation now is the perfect time to bring your technical leadership, cloud expertise, and mentorship mindset to the forefront. About This Team: The DaaS team at CSG is responsible for designing and building scalable and resilient cloud-native microservices that power Citrix s core virtualization offerings. This team collaborates across product, architecture, operations, and customer success groups to build next-gen capabilities on Azure, AWS, and other hybrid environments. Your Role and Responsibilities: As a Principal Cloud Development Engineer, you will be expected to: Lead design and architecture discussions for cloud-native solutions within the Citrix DaaS product line. Drive the development of scalable and secure backend features, with emphasis on business logic, cloud security, and performance. Mentor junior and senior engineers, guiding them in coding best practices, design decisions, and technical growth. Collaborate with Product Managers, UX Designers, Support, and Site Reliability Engineers to build customer-centric features and maintain high service uptime. Contribute to strategic technical initiatives, including the adoption of Gen AI tools, DevSecOps automation, and performance tuning of production systems. Participate in on-call escalation support, helping debug complex issues and lead incident resolution. Promote a culture of continuous learning and improvement through code reviews, technical sessions, and post-incident analysis. Required Experience and Skills: 14+ years of experience in cloud software development using .NET (C#), Java, or equivalent Object-Oriented Programming languages. Strong computer science fundamentals (algorithms, data structures, systems design). Proven track record in building and leading cloud-native microservices with modern deployment practices (CI/CD, IaC, Kubernetes, Docker). Strong cloud platform expertise, especially in Microsoft Azure or Amazon EC2. Deep understanding of cloud security, including identity/access management, encryption, compliance, and incident response. Advanced knowledge in automation scripting (Python, PowerShell). Familiarity with troubleshooting tools like Sumo Logic, Splunk, or equivalent observability platforms. Experience with Terraform, CI/CD pipelines, and managing Kubernetes-based deployments. Strong communication, collaboration, and mentoring abilities. Preferred Qualifications: Prior experience building secure services in the DaaS, VDI, or enterprise SaaS domain. Hands-on experience with Azure Active Directory, Microsoft AD, or other identity solutions. Moderate understanding of cryptographic protocols and encryption standards. Familiarity with Agile/SAFe development methodologies. Contributions to open-source or technical publications are a plus. Impact: Influence the architecture and direction of mission-critical cloud platforms used globally. Mentorship: Be a technical leader shaping the next generation of engineers. Innovation: Work with a company at the edge of a "Cambrian leap" in cloud evolution. Culture: Inclusive, forward-thinking, and driven by curiosity and collaboration. Flexibility & Benefits: Competitive salary, performance bonus, flexible work model, health insurance, wellness programs, and more. Equal Opportunity Statement: Cloud Software Group is committed to Equal Employment Opportunity and prohibits unlawful discrimination of any kind. All qualified applicants will receive consideration without regard to race, color, religion, gender, gender identity or expression, national origin, age, disability, veteran status, or any other characteristic protected by law.

Principal Cloud Development Cloud development Engineer
IB

Infrastructure Specialist-cloud Application Operations

International Business Machines

Fresher | Not Disclosed | Bengaluru, Karnataka, India | Full-time

Infrastructure Specialist Cloud Application Operations Location: Bangalore, Karnataka, India Job Type: Full-Time Experience Level: Mid to Senior-Level Industry: IT Consulting / Cloud Infrastructure Company: IBM Consulting Client Innovation Center Introduction: At IBM Consulting, your career is powered by collaboration, innovation, and the opportunity to work with visionary clients across industries. You'll be part of a global team committed to driving transformation across hybrid cloud and AI. Backed by our cutting-edge technology and strong ecosystem of strategic partners, you'll help shape the future of cloud operations. In this role, you will be based out of one of our IBM Client Innovation Centers in Bangalore, delivering localized skills and deep technical expertise to clients in both the public and private sectors. Your work will help clients adopt next-gen technologies and innovate faster. Your Role & Responsibilities: Provide technical operations support for cloud-based applications, middleware, DevOps processes, security systems, and infrastructure components. Manage Application ID provisioning and access control in accordance with client standards. Enable infrastructure elasticity by implementing auto-scaling mechanisms to optimize resources based on business needs. Collaborate with global teams to ensure seamless incident management, change control, and service delivery. Share expertise and assist in training peers on technical and procedural workflows. Support business continuity by managing Disaster Recovery (DR) protocols and executing manual failovers when needed. Prepare and present daily, weekly, and monthly integrated service management reports summarizing infrastructure health and operations. Required Skills & Experience: Bachelor's degree in Computer Science, Information Technology, or a related field. Strong communication, collaboration, and teamwork skills. Experience working in technical support or cloud operations environments. Familiarity with application support, DevOps workflows, middleware, and security in cloud ecosystems. Ability to train team members on both procedural and technical topics. Preferred Qualifications: Master s degree in a relevant field is a plus. In-depth understanding of Platform-as-a-Service (PaaS) environments, high availability (HA) infrastructures, and load balancer configurations. Experience with service reporting, performance monitoring tools, and integrated ITSM frameworks. Be a part of a global innovation leader. Work on challenging and impactful projects that influence industries. Collaborate in a culture of growth, continuous learning, and mentorship. Enjoy a dynamic work environment with a strong emphasis on client success and personal development. Apply now and become part of IBM s journey to reshape the future of infrastructure and application support. Qualification : Bachelor's degree in Computer Science, Information Technology, or a related field.

Infrastructure Specialist Infrastructure specialist Cloud Cloud Infrastructure
OR

Site Reliability Developer 2/3

Oracle

5+ Years | Not Disclosed | Bengaluru, Karnataka, India | Full-time

Job Description: Site Reliability Engineer - OCI Cloud Engineering Team Role: Site Reliability Engineer (SRE) Team: OCI OLTP (Online Transaction Processing) Location: Kiev Career Level: IC2 Experience: 5+ years Overview: Oracle Cloud Infrastructure s (OCI) OLTP organization is seeking a Site Reliability Engineer (SRE) to join our dynamic and fast-paced Cloud engineering team. The team is responsible for mission-critical distributed systems and cloud services, and we are looking for an engineer who is deeply interested in databases, distributed systems, and cloud services. If you thrive in an environment where innovation, problem-solving, and operational excellence intersect, this is an exciting opportunity for you! As a member of the SRE services, you will focus on Cloud Services, building deployments, operations, security vulnerability mitigation, and automation. You will be instrumental in fostering a culture of Site Reliability Engineering (SRE) within the team, and your work will directly contribute to ensuring the stability, performance, and reliability of Oracle s global cloud service infrastructure. This role requires someone who is adaptable, highly motivated, and capable of managing large-scale cloud environments with a focus on continuous improvement. Key Responsibilities: Cloud Service Operations & Reliability: Deploy, operate, and maintain large-scale cloud service products in a highly available, fault-tolerant, and scalable environment. Collaborate with internal teams to identify and mitigate cross-team issues that pose operational risks to cloud services. Focus on systems reliability and ensure the continuous availability of cloud services by automating tasks and eliminating manual interventions. Automation & Improvements: Automate operational tasks and improve service deployments, focusing on scaling, performance, and uptime. Contribute to CI/CD systems, ensuring seamless integration and continuous delivery for cloud-based services. Leverage automation tools such as Terraform, Grafana, and Bitbucket to streamline operations. Security & Incident Response: Mitigate security vulnerabilities within cloud services and ensure compliance with Oracle's security standards. Participate in on-call rotations to provide immediate troubleshooting support and ensure rapid issue resolution. Perform deep analysis of service performance and collaborate with team members to diagnose and resolve issues that affect service availability or performance. Collaborative Problem-Solving: Work closely with cross-functional teams, including development, database, networking, and storage experts, to ensure the reliability and performance of services. Identify systemic issues and potential risks, develop solutions, and ensure proper documentation and communication with stakeholders. Documentation & Knowledge Sharing: Contribute to documentation such as runbooks, operational guides, and troubleshooting manuals. Mentor junior engineers and share knowledge on best practices for site reliability engineering and cloud service operations. Continuous Learning: Stay up to date with new cloud technologies, trends, and best practices, and actively implement them in your day-to-day work. Technical and Professional Requirements: Cloud Services & Infrastructure: 5+ years of experience in SRE, DevOps, or Automation roles with a focus on large-scale infrastructure and cloud services. Hands-on experience with cloud platforms (e.g., OCI, AWS, Azure) and expertise in compute, database, networking, and storage services within cloud environments. Automation & Tooling: Proficiency with automation tools such as Terraform, Grafana, LumberJack, and Shepherd. Solid experience in using CI/CD tools and processes for cloud service deployments and operations. Scripting & Systems: Strong knowledge of scripting languages, particularly Python and Java. Familiarity with Linux systems, docker containers, virtualized infrastructure, and orchestration (e.g., Kubernetes). Performance & Troubleshooting: Excellent troubleshooting skills with a focus on performance, availability, reliability, and scalability of distributed systems. Experience in operating fault-tolerant, highly available, high-throughput distributed systems. Security & Incident Management: Familiarity with security practices and mitigating security vulnerabilities in cloud services. Proven ability to handle incident response and provide efficient troubleshooting during on-call rotations. Collaboration & Communication: Strong verbal and written communication skills, capable of working effectively with diverse teams across multiple geographies. Ability to work in a highly collaborative environment, driving operational excellence and customer satisfaction. Preferred Qualifications: Experience in operating and maintaining multi-tenant, cloud-based infrastructure with a focus on scalability and high availability. Familiarity with tools and platforms like Grafana, Prometheus, and other observability and monitoring tools. Experience in networking and storage technologies in a cloud environment. Joining OCI s OLTP team as an SRE gives you the opportunity to work with cutting-edge technologies and contribute to the operational excellence of Oracle s global cloud infrastructure. This is a chance to grow your skills in a highly dynamic environment and to solve complex problems that directly impact mission-critical cloud services. With a focus on automation, scalability, and high performance, you will be an essential part of a team that powers Oracle s leading cloud services. If you are an experienced engineer passionate about cloud technologies, automation, and ensuring the reliability of large-scale systems, we encourage you to apply and join us in this exciting journey!

Site Reliability Site reliability Developer Site developer
PS

Devops Engineer

Pure Storage

5+ Years | Not Disclosed | Bengaluru, Karnataka, India | Full-time

Join Us in Revolutionizing the Data Storage Industry We're leading an exciting transformation in the data storage industry. By joining us, you will have the opportunity to make a significant impact, grow alongside a brilliant team, and be at the forefront of cutting-edge technology. If you re passionate about reshaping the future of tech, this is your chance to leave your mark. The Challenge: As a DevOps Engineer, you will be responsible for managing, automating, and optimizing our VMware, cloud, and on-prem Kubernetes infrastructure. Your role will be critical in ensuring the high availability, performance, and scalability of our services. You will work closely with cross-functional teams to architect solutions that improve our engineering operations and align with business needs. Key Responsibilities: Infrastructure Design & Management: Design, deploy, and maintain scalable VMware, Cloud (AWS, GCP, Azure, IBM), and Kubernetes environments. Custom Solutions Development: Develop and implement tailored OpenStack/VMware/Kubernetes solutions based on organizational needs. Automation & CI/CD: Build automation tools and frameworks (CI/CD pipelines) to streamline infrastructure operations and manage workloads (VMs/Pods) lifecycles. Observability & Best Practices: Implement observability best practices for OpenStack/VMware/Kubernetes environments, ensuring reliability and performance. Continuous Improvement: Provide expertise in DevOps methodologies to support continuous improvement and optimize infrastructure efficiency. Troubleshooting & Support: Assist with deployment, configuration, troubleshooting, and provide infrastructure support for AWS/Azure environments. Documentation & Knowledge Sharing: Create and maintain documentation for OpenStack/VMware/Kubernetes architecture and stay current with emerging technologies and industry best practices. On-Call Support: Participate in the follow-the-sun on-call rotation for infrastructure support. What You ll Need to Bring: Education: Bachelor s degree in Computer Science, Information Technology, or a related field. Experience: 5+ years of experience managing and automating large-scale VMware, OpenStack, or Kubernetes environments. Programming Skills: Proficiency in Python, Java, or Go for scripting and automation. DevOps Knowledge: Strong understanding of DevOps principles with hands-on experience using tools like Ansible, Terraform, or Puppet. Observability Expertise: Experience implementing observability solutions with tools like Prometheus, Grafana, Logstash, Elastic, Fluent-bit, or Fluentd. Linux & Networking: Familiarity with Linux, TCP/IP, DNS, DHCP, and related networking concepts. OpenStack/Vmware/Kubernetes: Proven experience working with OpenStack (e.g., OpenStack Yoga), VMware, or Kubernetes (particularly enterprise Kubernetes clusters on bare metal). Virtualization & Containerization: Expertise in virtualization technologies (KVM, VMware, KubeVirt) and container technologies (Docker, Kubernetes). Cloud Experience: Experience with infrastructure support and automation for AWS, Azure, or GCP. Problem-Solving Skills: Excellent troubleshooting abilities with great attention to detail. Communication & Collaboration: Strong interpersonal skills and the ability to collaborate across teams. Preferred Qualifications: Experience with PureStorage products such as FlashArray, FlashBlade, or Portworx. Certifications in Kubernetes (CKA, CKAD), VMware (VCP), or OpenStack (COA). Experience running production VMs in KubeVirt. Familiarity with agile environments and project management tools. What You Can Expect from Us: Pure Innovation: We celebrate critical thinkers, challenges, and trailblazers who strive to innovate. Pure Growth: We support your growth and provide the space for you to contribute meaningfully. We're proud to be named in Fortune's Best Workplaces and certified as a Great Place to Work. Pure Team: We focus on collaboration and teamwork, setting aside ego for the greater good. Qualification : Bachelors degree in Computer Science, Information Technology, or related field.

DevOps Engineer Devops engineer Full-Time Continuous Integration (CI)
BY

Senior DevOps / Site Reliability Engineer

Blue Yonder

10-13 Years | Not Disclosed | Bengaluru, Karnataka, India | Full-time

Job Title: Senior DevOps / Site Reliability Engineer Location: Pune, India Company: Blue Yonder Experience: 10 to 13 years Education: Bachelor s Degree in Computer Science, Engineering, or related STEM fields Company Overview Blue Yonder is a leading AI-driven Global Supply Chain Solutions provider and consistently recognized as one of Glassdoor s Best Places to Work. We are driving the next wave of digital transformation in manufacturing and retail, delivering innovative SaaS solutions that power intelligent supply chains across the globe. We are looking for a Senior DevOps / Site Reliability Engineer (SRE) to lead the design, development, deployment, and operational management of our Azure SaaS solution. This role requires strong DevOps, cloud delivery, and infrastructure automation expertise, along with leadership capabilities to guide a growing global team. Role Overview In this role, you will be responsible for architecting, planning, and executing end-to-end delivery pipelines, supporting both product development and operational stability. Working closely with platform, product, and architecture teams, you will implement best-in-class DevOps and SRE practices, ensuring scalability, resilience, and cost optimization. Key Responsibilities Architect, design, and manage CI/CD pipelines and infrastructure for a cloud-native, multi-tenant SaaS solution on Azure. Lead sprint planning, backlog grooming, and architecture discussions. Develop quality automation scripts and tools to reduce manual efforts and enable self-healing, self-service capabilities. Identify and resolve operational bottlenecks and proactively improve observability (monitoring, alerting, logging). Participate in code reviews, ensure secure and scalable designs, and mentor junior and mid-level engineers. Collaborate with stakeholders to understand business and technical requirements and translate them into actionable user stories. Implement and enforce cloud cost optimization strategies. Conduct post-incident reviews with a blameless culture to identify root causes and drive continuous improvements. Automate service requests and standard operational procedures. Drive improvements to the team s continuous integration pipeline, ensuring rapid and reliable deployments. Stay updated with the latest DevOps, SRE, and cloud technologies and bring innovative ideas to the table. Participate in team hiring and actively contribute to onboarding new team members. Technical Environment Languages: Java, Python, PowerShell, Shell Scripting DevOps Tools: Azure DevOps, GitHub Actions, Jenkins Cloud: Microsoft Azure (ARM Templates, AKS, Event Hub, HDInsight, Azure AD, Application Gateway, Virtual Networks) Architecture: Microservices, Kubernetes, Docker, Event-driven architecture Frameworks: Spring Boot, Hibernate Monitoring & Logging: Elasticsearch, Spark, Kafka Databases: RDBMS, NoSQL Version Control: Git Required Skills & Experience Bachelor s Degree (STEM preferred) with 10 to 13 years of experience in DevOps, Cloud Delivery, or Site Reliability Engineering. Proven hands-on experience with Azure Cloud Services. Expertise in setting up and optimizing CI/CD pipelines. Strong scripting experience: Shell and PowerShell are mandatory; Python is a plus. Strong understanding of container technologies (Docker, Kubernetes) and microservices architecture. Experience integrating and managing third-party monitoring and logging tools. Strong problem-solving skills and ability to work with global, cross-functional teams. Excellent communication and stakeholder management skills. Nice to Have Development experience in Java or Python. Experience working in agile teams with a product-centric mindset. Experience working in manufacturing or retail domains. Exposure to AI/ML-driven monitoring and observability tools. Work with cutting-edge technologies on globally impactful solutions. Collaborate with diverse and talented teams across the US, India, and the UK. Foster your career growth through mentorship, continuous learning, and leadership opportunities. Experience an inclusive, flexible work culture where innovation and creativity thrive. Diversity, Inclusion, Value & Equality (DIVE) At Blue Yonder, we are committed to building an inclusive environment where everyone feels empowered to be themselves. All qualified applicants will receive consideration for employment regardless of race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or protected veteran status. Qualification : Bachelors Degree in Computer Science, Engineering, or related STEM fields

Software Engineer Staff Engineer Software Engineer Staff software engineer
CO

Senior Site Reliability Engineer

Couchbase

5+ Years | Not Disclosed | Bengaluru, Karnataka, India | Full-time

Job Title: Site Reliability Engineer (SRE) Cloud Platform & Production Pipeline Initiatives Location: Bangalore, India (Office-based role) About Couchbase: As industries race to embrace AI, traditional database solutions fall short of rising demands for versatility, performance, and affordability. Couchbase is leading the way with Capella, the developer data platform for critical applications in our AI-driven world. By uniting transactional, analytical, mobile, and AI workloads into a seamless, fully managed solution, Couchbase empowers developers and enterprises to build and scale applications with unmatched flexibility, performance, and cost-efficiency from cloud to edge. Trusted by over 30% of the Fortune 100, Couchbase is unlocking innovation, accelerating AI transformation, and redefining customer experiences. Come join our mission! Job Overview: As a Site Reliability Engineer (SRE), you will play a pivotal role in managing, optimizing, and maintaining Couchbase s cloud infrastructure for Capella, our Database as a Service (DBaaS) platform. You will be responsible for ensuring the reliability and performance of our cloud service while collaborating closely with engineering teams to improve deployment pipelines, security practices, and overall system health. You will work across cloud platforms and multiple tools to provide guidance, mentorship, and contribute to the strategic direction of cloud operations. Responsibilities: Infrastructure Management: Manage, monitor, and maintain the infrastructure for Capella to ensure reliable operations. Security & Compliance: Implement and manage cloud environments in accordance with company security guidelines, including vulnerability management, penetration testing, and compliance requirements (SOC 2, PCI-DSS, GDPR, HIPAA, etc.). CI/CD & Release Pipeline: Collaborate with engineering teams to optimize CI/CD processes, aiming for a highly resilient deployment strategy, ideally with zero downtime. Cloud Optimization: Stay up-to-date with new technologies and industry trends to continuously improve cloud platform architecture and meet the evolving needs of the business. Security Integration: Work with development teams to integrate security scanners within the DevOps lifecycle, enhancing security posture. Leadership & Mentorship: Provide guidance on architecture, code reviews, and technical feedback to improve service reliability, security, cost, and performance. Incident Management: Demonstrate exceptional problem-solving skills, proactively identifying and addressing potential issues before they affect business operations. Collaboration: Partner with development teams, application owners, and stakeholders to integrate best practices and ensure seamless service delivery. Requirements: Experience: 5+ years in Site Reliability Engineering (SRE), DevSecOps, or similar roles, with significant experience working in public cloud environments. Programming & Scripting: Proficiency in languages such as Go, Python, Java, or Ruby. Linux Expertise: High proficiency with Linux operating systems. Kubernetes Management: Experience in managing and maintaining Kubernetes clusters (both self-managed and managed platforms like AWS EKS). Security & Vulnerability Management: In-depth knowledge of security tools and practices (vulnerability management, pen testing, SCA, DAST, SAST), with hands-on experience using tools like Sysdig, Synk, and Blackduck. Cloud Platforms & Tools: Strong experience with cloud platforms (AWS, GCP, Azure) and open-source tools like Artifactory, Jira, Jenkins, Grafana, Prometheus, Datadog, Thanos, etc. Configuration Management: Proficiency with Terraform, Git, and CI/CD platforms (e.g., CircleCI, GitHub, Spinnaker). Networking Security: Solid understanding of TCP/IP, DNS, HTTP, Firewalls, VPNs, and other networking security concepts. Preferred Skills: Availability & Reliability: Knowledge of SLO/SLA, availability, reliability, and performance concepts. Incident Management: Experience with on-call rotations and incident management. Database Experience: Familiarity with databases, particularly Couchbase. Security Certifications: Relevant certifications in security or cloud technologies are a plus. Couchbase reimagines database technology to deliver a fast, flexible, and affordable cloud database platform, empowering developers to build applications with exceptional customer experiences. Trusted by over 30% of the Fortune 100, Couchbase drives innovation and customer success through its Capella platform. Benefits at Couchbase: Generous Time Off Program: Flexibility to care for yourself and your family. Wellness Benefits: Access to world-class medical plans, dental, vision, life insurance, and employee assistance programs. Financial Planning: RSU equity program, ESPP, retirement planning, and business travel insurance. Career Growth: Focused on your career development and success. Fun Perks: Ergonomic and comfortable office setup, food & snacks for in-office employees, and more!

Senior Site Reliability Site reliability Engineer
MT

Devops

Mirafra Technologies

5+ Years | Not Disclosed | Bengaluru, Karnataka, India | Full-time

DevOps Engineer Location: Bangalore Experience: 5+ Years Education Qualification: B.E. in Computer Science / Electronics About Mirafra Founded in 2004, Mirafra is a fast-growing global product engineering services company specializing in Semiconductor Design, Embedded Systems, Digital Solutions, and Application Software. With over 1,500+ professionals worldwide, we provide cutting-edge solutions to Fortune 500 clients across industries such as Semiconductor, Internet, Aerospace, Networking, Telecom, Medical Devices, and Consumer Electronics. Recognitions: Best Company to Work For SiliconIndia (2016) Most Promising Design Services Provider SiliconIndia (2018) Top 10 Admired Companies for Software Services DigiTech Insight (2022) Key Responsibilities DevOps & Automation Develop automated CI/CD pipelines and manage build & deployment processes. Implement infrastructure automation using scripting (Shell, Batch, Python). Manage configuration, integration, and deployment using DevOps tools. Version Control & Build Management Work with Git, Gitlab, Bitbucket for version control. Maintain build systems like Make, CMake and manage dependencies using Pip, Conda, Poetry, Maven. Handle binary management tools like Artifactory, Nexus. Code Quality & Security Utilize Static Code Analysis tools (SonarQube, Pylint, Coverity) for code quality enforcement. Monitor and ensure security compliance in the DevOps lifecycle. Cloud & Containerization Manage cloud-based deployments and monitoring using ELK, Docker, Kubernetes. Implement scalable and resilient infrastructure solutions. Agile & Collaboration Work in an Agile/Scrum environment, collaborating with cross-functional teams. Utilize UML modeling and software development best practices. Skills & Qualifications Education: B.E. in Computer Science / Electronics Technical Expertise: Scripting & Automation: Shell, Batch, Python CI/CD & Build Tools: Jenkins, Gitlab, Make, CMake Version Control: Git, Bitbucket, Gitlab SCM Static Code Analysis: SonarQube, Pylint, Coverity Package Management: Pip, Conda, Poetry, Maven Binary Management: Artifactory, Nexus Cloud & Containerization: Docker, Kubernetes, ELK Stack Programming Languages: Python, C, C++ Operating Systems: Linux, Unix, Windows Soft Skills: Strong problem-solving and analytical skills. Excellent communication and team collaboration. Ability to work in fast-paced Agile environments. Cutting-edge projects in Semiconductor, Aerospace, Networking, and IoT. Global work environment with top-tier clients. Career growth opportunities and exposure to the latest technologies. Award-winning workplace culture and industry recognition. Excited to take on a challenging DevOps role? Apply now!

DevOps Full-Time CI/CD (Continuous Integration & Continuous Deployment) Infrastructure as Code (IaC) Automation
GC

Software Engineer Iii, Infrastructure, Core

Google Careers

5+ Years | Not Disclosed | Bengaluru, Karnataka, India | Full-time

Job Title: Software Engineer About the Role: At Google, our Software Engineers are at the forefront of innovation, designing and developing cutting-edge technologies that shape how billions of users connect, explore, and interact with information. Our products operate at an immense scale, extending far beyond web search, and require engineers who bring fresh perspectives from diverse technical domains, including information retrieval, distributed computing, large-scale system design, networking, security, artificial intelligence, natural language processing, UI design, and mobile development. As a Software Engineer, you will contribute to mission-critical projects, collaborating with teams across Google to develop, test, deploy, maintain, and enhance software solutions. Your versatility, leadership abilities, and enthusiasm for solving complex challenges will be crucial as you navigate projects across the full technology stack. The Core Team serves as the backbone of Google s technical infrastructure, building the foundational elements behind our flagship products. This team is responsible for developing essential developer platforms, product components, and infrastructure that drive innovation across Google s ecosystem. As a member of this team, you will play a pivotal role in breaking down technical barriers, optimizing existing systems, and making key architectural decisions that influence the entire organization. Key Responsibilities: Design, develop, and maintain high-quality software solutions that support Google's technical infrastructure and products. Participate in and lead design reviews with peers and stakeholders, evaluating available technologies to determine optimal solutions. Conduct thorough code reviews to ensure adherence to best practices, including code quality, efficiency, accuracy, testability, and compliance with style guidelines. Contribute to documentation and educational resources, updating content based on product enhancements and user feedback. Troubleshoot and debug complex system issues, analyzing their impact on hardware, networks, and service operations to maintain optimal performance and reliability. At Google, we foster a culture of continuous learning, innovation, and technical excellence. If you're passionate about solving challenging problems and building world-class technology, we invite you to be part of our journey. Qualification : Bachelors degree or equivalent practical experience.

Software Engineer Software Engineer Engineer software Engineer iii
QU

Autoit Solutioning Engineer, Lead

Qualcomm

Fresher | Not Disclosed | Bengaluru, Karnataka, India | Full-time

Job Title: Site Reliability Engineer (SRE) General Summary: We are seeking a highly skilled and motivated Site Reliability Engineer (SRE) to join our dynamic team. This role is critical in ensuring the stability, scalability, and security of our infrastructure and services. As an SRE, you will work collaboratively with software engineers, data scientists, and product managers to optimize system reliability while driving automation and continuous improvement. You will be responsible for modernizing traditional services, implementing cutting-edge technology, and proactively managing infrastructure to maintain operational excellence. If you are passionate about automation, DevSecOps, system performance, and infrastructure resilience, this role offers an exciting opportunity to make a meaningful impact. Key Responsibilities: System Monitoring & Incident Response: Continuously monitor system health, detect anomalies, and respond to incidents promptly. Investigate and troubleshoot service-related issues, ensuring minimal disruption. Implement proactive measures to prevent downtime and optimize system stability. Infrastructure Automation & DevOps Implementation: Develop and maintain Infrastructure-as-Code (IaC) scripts to automate deployments and scaling. Automate routine operational tasks to improve efficiency and reduce manual intervention. Leverage DevSecOps practices to ensure secure and resilient deployments. Performance Optimization & Capacity Planning: Collaborate with development teams to enhance software performance and system responsiveness. Identify and resolve system bottlenecks to improve speed, efficiency, and reliability. Forecast resource requirements based on traffic patterns and business growth. Security, Compliance & Risk Management: Implement security best practices and compliance measures across all infrastructure layers. Conduct security audits and ensure systems meet industry-standard security guidelines. Proactively assess and mitigate risks associated with infrastructure and deployments. Required Qualifications & Skills: Technical Expertise: Extensive experience with Linux-based environments (Ubuntu, RedHat), including system administration and troubleshooting. Strong proficiency in scripting and automation using Python, Bash, or Go. Experience with containerization and orchestration technologies such as Docker and Kubernetes. Familiarity with CI/CD pipelines and tools like Jenkins, Puppet, Vault, and Splunk. Hands-on experience with cloud platforms (AWS, Azure, or GCP). Problem-Solving & Leadership: Strong analytical skills with the ability to diagnose and resolve complex system issues. Self-driven, highly motivated, and able to work independently in a fast-paced environment. Ability to collaborate cross-functionally and communicate technical solutions effectively. Security & Reliability Focus: Solid understanding of DevSecOps principles and secure system design. Ability to implement monitoring, logging, and alerting solutions to maintain system resilience. Passion for continuous learning and leveraging data-driven approaches for system improvement. Work in a high-impact role that directly contributes to the reliability and scalability of mission-critical systems. Be part of an innovative, forward-thinking team that values automation, collaboration, and continuous improvement. Competitive salary, professional development opportunities, and an environment that fosters growth and innovation. If you are a passionate, results-driven SRE, we invite you to join us and play a pivotal role in shaping the future of our infrastructure.

Engineer Lead Engineer lead Lead Engineer Full-Time
DA

Sr. Noc Engineer

Databricks

5+ Years | Not Disclosed | Bengaluru, Karnataka, India | Full-time

We re growing fast and attracting the best talent in the world. Bricksters as we call ourselves are a special mix of smart, curious, quick thinkers. If you ask a Brickster what they love about working here, you ll likely hear about our culture. We are seeking an experienced NOC Engineer to join our team. The successful candidate will be responsible for monitoring critical Databricks infrastructure and developing monitoring tools and alerting dashboards. They will also work closely with stakeholders to investigate and resolve incidents, perform root cause analysis, and propose solutions to increase the reliability and stability of the Databricks unified analytics platform. The impact you will have here: Monitor critical infrastructure, triage alerts to proactively identify incidents, and work with stakeholders to resolve incidents. Investigate incidents and propose solutions to improve platform reliability and stability. Perform root cause analysis for recurring incidents and provide proactive solutions. Develop toolings or automate processes to improve platform monitoring and alerting. Contribute to software development efforts to improve overall service reliability and stability. Communicate effectively with internal stakeholders, including executive staff, to provide incident analysis. Participate in war rooms and temporary communication channels during outages. Demonstrate cross-functional leadership and establish ownership of incidents and outages. Multitask on several incidents and/or projects Minimum of 5 years of experience as a NOC, SRE, or DevOps engineer Strong knowledge of cloud technologies such as Azure, AWS, and GCP Hands-on experience with monitoring, logging, and alerting tools such as ELK, Prometheus, Grafana, Pager Duty, etc. Experience with containers and orchestration technologies such as Docker and Kubernetes. Proficiency in automation and scripting Linux systems administration skills. Excellent communication skills. Willingness to learn Databricks products Bachelor's degree in Computer Science or a related field About Databricks Databricks is the data and AI company. More than 10,000 organizations worldwide including Comcast, Cond Nast, Grammarly, and over 50% of the Fortune 500 rely on the Databricks Data Intelligence Platform to unify and democratize data, analytics and AI. Databricks is headquartered in San Francisco, with offices around the globe and was founded by the original creators of Lakehouse, Apache Spark , Delta Lake and MLflow. To learn more, follow Databricks on Twitter,LinkedIn and Facebook . Benefits At Databricks, we strive to provide comprehensive benefits and perks that meet the needs of all of our employees. For specific details on the benefits offered in your region, please visithttps://www.mybenefitsnow.com/databricks. Our Commitment to Diversity and Inclusion At Databricks, we are committed to fostering a diverse and inclusive culture where everyone can excel. We take great care to ensure that our hiring practices are inclusive and meet equal employment opportunity standards. Individuals looking for employment at Databricks are considered without regard to age, color, disability, ethnicity, family or marital status, gender identity or expression, language, national origin, physical and mental ability, political affiliation, race, religion, sexual orientation, socio-economic status, veteran status, and other protected characteristics. Compliance If access to export-controlled technology or source code is required for performance of job duties, it is within Employer's discretion whether to apply for a U.S. government license for such positions, and Employer may decline to proceed with an applicant on this basis alone. Qualification : Bachelor's degree in Computer Science or a related field is required.

Sr. Noc Engineer Sr. engineer Noc engineer

1 - 20 of 0 jobs

* No exact matches found. Showing closest results instead
Sort by:

No results found

Modify search criteria or create an alert to get relevant jobs as soon as they’re posted

Create an alert

Continue to Save

Please login to your jobseeker account, or create a new one to save this job.

Feedback

Share Feedback