ADS Reliability Engineer SRE Jobs in Bengaluru
63 Jobs Found
Site Reliability Engineer
Groww
Position: Site Reliability Engineer
Location: Bengaluru

About Groww
At Groww, we're on a mission to make financial services simple, accessible, and transparent for every Indian. As one of India's fastest-growing financial platforms, we help millions take control of their financial future through a wide range of products. We're a team driven by ownership, radical customer-centricity, and a deep passion for challenging the status quo. From intuitive design to robust engineering, everything we build is grounded in what our customers need. If you're excited about building systems that power the future of finance in India, we'd love to hear from you.

Our Vision
To empower every Indian with the knowledge, tools, and confidence to make sound financial decisions. Our goal is to be the most trusted financial partner for millions across the country.

Our Core Values
Customer Obsession: We put our users first, always.
Extreme Ownership: We own everything we do, end-to-end.
Simplicity: We keep things simple, effective, and intuitive.
Long-term Thinking: We focus on sustainable, impactful decisions.
Transparency: We believe in open communication and collaboration.

Role Overview:
As a Site Reliability Engineer (SRE) at Groww, you will be responsible for ensuring our systems are highly available, performant, and secure. You will work closely with engineering and infrastructure teams to improve reliability, automate deployments, and manage the mission-critical services that power our platform.

Key Responsibilities:
Monitor and troubleshoot issues related to system performance, availability, and security.
Define and maintain SLIs, SLOs, and error budgets to improve system reliability.
Use tools like Grafana to analyze and report on metrics and trace data.
Participate in the on-call rotation for 24/7 support of production systems.
Collaborate with developers to ensure scalability and reliability are built into new services.
Roll out security and infrastructure features proactively.
Manage automated deployments, version control, and release rollouts.
Perform root cause analysis (RCA) for incidents and implement long-term fixes.
Optimize system performance, conduct capacity planning, and create recovery strategies.
Identify and automate repetitive tasks to reduce toil.
Use tools such as Git, Jira, and Jenkins to streamline development and CI/CD workflows.

Requirements:
4 to 6 years of relevant experience in SRE, DevOps, or infrastructure engineering.
Bachelor's or Master's degree in Computer Science or a related field.
Strong background in Linux/Unix system administration and networking.
Hands-on experience with cloud platforms like GCP or AWS.
Proficiency in programming languages such as Python, Java, or Go.
Experience with monitoring and alerting tools such as Grafana, Prometheus, and New Relic.
Familiarity with configuration management tools.
Experience with Kubernetes, Docker, and container orchestration tools is a strong plus.
Excellent problem-solving, communication, and team collaboration skills.

Be a part of one of India's fastest-growing fintech startups.
Build and scale systems that impact millions of users daily.
Work with passionate, driven teammates who are redefining financial services.
Join a culture that encourages continuous learning, ownership, and transparency.

If you're ready to help shape the future of fintech infrastructure in India, Groww is the place for you. Let's build something extraordinary together.
Qualification: Bachelor's or Master's degree in Computer Science or a related field
Site Reliability Developer 2/3
Oracle
Job Description: Site Reliability Engineer - OCI Cloud Engineering Team
Role: Site Reliability Engineer (SRE)
Team: OCI OLTP (Online Transaction Processing)
Location: Kiev
Career Level: IC2
Experience: 5+ years

Overview:
Oracle Cloud Infrastructure's (OCI) OLTP organization is seeking a Site Reliability Engineer (SRE) to join our dynamic and fast-paced cloud engineering team. The team is responsible for mission-critical distributed systems and cloud services, and we are looking for an engineer who is deeply interested in databases, distributed systems, and cloud services. If you thrive in an environment where innovation, problem-solving, and operational excellence intersect, this is an exciting opportunity for you!
As a member of the SRE team, you will focus on cloud services: building deployments, operations, security vulnerability mitigation, and automation. You will be instrumental in fostering a culture of Site Reliability Engineering within the team, and your work will directly contribute to the stability, performance, and reliability of Oracle's global cloud service infrastructure. This role requires someone who is adaptable, highly motivated, and capable of managing large-scale cloud environments with a focus on continuous improvement.

Key Responsibilities:
Cloud Service Operations & Reliability:
Deploy, operate, and maintain large-scale cloud service products in a highly available, fault-tolerant, and scalable environment.
Collaborate with internal teams to identify and mitigate cross-team issues that pose operational risks to cloud services.
Focus on systems reliability and ensure the continuous availability of cloud services by automating tasks and eliminating manual interventions.

Automation & Improvements:
Automate operational tasks and improve service deployments, focusing on scaling, performance, and uptime.
Contribute to CI/CD systems, ensuring seamless integration and continuous delivery for cloud-based services.
Use tools such as Terraform, Grafana, and Bitbucket to streamline operations.

Security & Incident Response:
Mitigate security vulnerabilities within cloud services and ensure compliance with Oracle's security standards.
Participate in on-call rotations to provide immediate troubleshooting support and ensure rapid issue resolution.
Perform deep analysis of service performance and collaborate with team members to diagnose and resolve issues that affect service availability or performance.

Collaborative Problem-Solving:
Work closely with cross-functional teams, including development, database, networking, and storage experts, to ensure the reliability and performance of services.
Identify systemic issues and potential risks, develop solutions, and ensure proper documentation and communication with stakeholders.

Documentation & Knowledge Sharing:
Contribute to documentation such as runbooks, operational guides, and troubleshooting manuals.
Mentor junior engineers and share knowledge on best practices for site reliability engineering and cloud service operations.

Continuous Learning:
Stay up to date with new cloud technologies, trends, and best practices, and actively apply them in your day-to-day work.

Technical and Professional Requirements:
Cloud Services & Infrastructure: 5+ years of experience in SRE, DevOps, or automation roles with a focus on large-scale infrastructure and cloud services. Hands-on experience with cloud platforms (e.g., OCI, AWS, Azure) and expertise in compute, database, networking, and storage services within cloud environments.
Automation & Tooling: Proficiency with tools such as Terraform, Grafana, LumberJack, and Shepherd. Solid experience using CI/CD tools and processes for cloud service deployments and operations.
Scripting & Systems: Strong knowledge of programming and scripting languages, particularly Python and Java. Familiarity with Linux systems, Docker containers, virtualized infrastructure, and orchestration (e.g., Kubernetes).
Performance & Troubleshooting: Excellent troubleshooting skills with a focus on the performance, availability, reliability, and scalability of distributed systems. Experience operating fault-tolerant, highly available, high-throughput distributed systems.
Security & Incident Management: Familiarity with security practices and with mitigating security vulnerabilities in cloud services. Proven ability to handle incident response and provide efficient troubleshooting during on-call rotations.
Collaboration & Communication: Strong verbal and written communication skills, with the ability to work effectively with diverse teams across multiple geographies. Ability to work in a highly collaborative environment, driving operational excellence and customer satisfaction.

Preferred Qualifications:
Experience operating and maintaining multi-tenant, cloud-based infrastructure with a focus on scalability and high availability.
Familiarity with observability and monitoring tools such as Grafana and Prometheus.
Experience with networking and storage technologies in a cloud environment.

Joining OCI's OLTP team as an SRE gives you the opportunity to work with cutting-edge technologies and contribute to the operational excellence of Oracle's global cloud infrastructure. This is a chance to grow your skills in a highly dynamic environment and to solve complex problems that directly impact mission-critical cloud services. With a focus on automation, scalability, and high performance, you will be an essential part of a team that powers Oracle's leading cloud services. If you are an experienced engineer passionate about cloud technologies, automation, and ensuring the reliability of large-scale systems, we encourage you to apply and join us in this exciting journey!
Senior DevOps / Site Reliability Engineer
Blue Yonder
Job Title: Senior DevOps / Site Reliability Engineer
Location: Pune, India
Company: Blue Yonder
Experience: 10 to 13 years
Education: Bachelor's Degree in Computer Science, Engineering, or related STEM fields

Company Overview
Blue Yonder is a leading AI-driven global supply chain solutions provider and is consistently recognized as one of Glassdoor's Best Places to Work. We are driving the next wave of digital transformation in manufacturing and retail, delivering innovative SaaS solutions that power intelligent supply chains across the globe. We are looking for a Senior DevOps / Site Reliability Engineer (SRE) to lead the design, development, deployment, and operational management of our Azure SaaS solution. This role requires strong DevOps, cloud delivery, and infrastructure automation expertise, along with the leadership capabilities to guide a growing global team.

Role Overview
In this role, you will be responsible for architecting, planning, and executing end-to-end delivery pipelines, supporting both product development and operational stability. Working closely with platform, product, and architecture teams, you will implement best-in-class DevOps and SRE practices, ensuring scalability, resilience, and cost optimization.

Key Responsibilities
Architect, design, and manage CI/CD pipelines and infrastructure for a cloud-native, multi-tenant SaaS solution on Azure.
Lead sprint planning, backlog grooming, and architecture discussions.
Develop quality automation scripts and tools to reduce manual effort and enable self-healing, self-service capabilities.
Identify and resolve operational bottlenecks and proactively improve observability (monitoring, alerting, logging).
Participate in code reviews, ensure secure and scalable designs, and mentor junior and mid-level engineers.
Collaborate with stakeholders to understand business and technical requirements and translate them into actionable user stories.
Implement and enforce cloud cost optimization strategies.
Conduct blameless post-incident reviews to identify root causes and drive continuous improvement.
Automate service requests and standard operational procedures.
Drive improvements to the team's continuous integration pipeline, ensuring rapid and reliable deployments.
Stay updated with the latest DevOps, SRE, and cloud technologies and bring innovative ideas to the table.
Participate in team hiring and actively contribute to onboarding new team members.

Technical Environment
Languages: Java, Python, PowerShell, Shell Scripting
DevOps Tools: Azure DevOps, GitHub Actions, Jenkins
Cloud: Microsoft Azure (ARM Templates, AKS, Event Hub, HDInsight, Azure AD, Application Gateway, Virtual Networks)
Architecture: Microservices, Kubernetes, Docker, Event-driven architecture
Frameworks: Spring Boot, Hibernate
Monitoring & Logging: Elasticsearch, Spark, Kafka
Databases: RDBMS, NoSQL
Version Control: Git

Required Skills & Experience
Bachelor's Degree (STEM preferred) with 10 to 13 years of experience in DevOps, Cloud Delivery, or Site Reliability Engineering.
Proven hands-on experience with Azure Cloud Services.
Expertise in setting up and optimizing CI/CD pipelines.
Strong scripting experience: Shell and PowerShell are mandatory; Python is a plus.
Strong understanding of container technologies (Docker, Kubernetes) and microservices architecture.
Experience integrating and managing third-party monitoring and logging tools.
Strong problem-solving skills and the ability to work with global, cross-functional teams.
Excellent communication and stakeholder management skills.

Nice to Have
Development experience in Java or Python.
Experience working in agile teams with a product-centric mindset.
Experience working in manufacturing or retail domains.
Exposure to AI/ML-driven monitoring and observability tools.

Work with cutting-edge technologies on globally impactful solutions.
Collaborate with diverse and talented teams across the US, India, and the UK.
Foster your career growth through mentorship, continuous learning, and leadership opportunities.
Experience an inclusive, flexible work culture where innovation and creativity thrive.

Diversity, Inclusion, Value & Equality (DIVE)
At Blue Yonder, we are committed to building an inclusive environment where everyone feels empowered to be themselves. All qualified applicants will receive consideration for employment regardless of race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or protected veteran status.
Qualification: Bachelor's Degree in Computer Science, Engineering, or related STEM fields
Senior Software Engineer, Google Ads
Google Careers
About the Job
Google's software engineers develop next-generation technologies that transform how billions of users connect, explore, and interact with information. Our products operate at massive scale and extend well beyond web search. We seek engineers who bring fresh ideas from diverse areas, including information retrieval, distributed computing, large-scale system design, networking and data storage, security, artificial intelligence, natural language processing, UI design, and mobile. As a Software Engineer at Google, you'll work on a specific project critical to Google's needs, with opportunities to switch teams and projects as you and our fast-paced business evolve. We need engineers who are versatile, show leadership qualities, and are enthusiastic about solving new problems across the full stack as we continue to push technology forward. With your technical expertise, you will manage project priorities, deadlines, and deliverables. You will design, develop, test, deploy, maintain, and enhance software solutions.

About Google Ads
Google Ads powers the open internet with cutting-edge technology that creates value for users, publishers, advertisers, and Google. Our teams build Google's advertising products across search, display, shopping, travel, and video advertising, as well as analytics. We create trusted experiences between people and businesses with useful ads that help companies grow, from small businesses to large brands and YouTube creators.

Responsibilities
Write and test product or system development code.
Participate in or lead design reviews with peers and stakeholders to evaluate technologies and approaches.
Review code developed by other developers, providing feedback to ensure best practices (e.g., style guidelines, code accuracy, testability, and efficiency).
Contribute to and adapt documentation and educational content based on product updates and user feedback.
Triage and debug product or system issues by analyzing the sources of problems and their impact on hardware, networks, or service operations.

Minimum Qualifications
Bachelor's degree or equivalent practical experience.
5 years of experience in software development in one or more programming languages, with expertise in data structures and algorithms.
3 years of experience in testing, maintaining, or launching software products.
1 year of experience in software design and architecture.

Preferred Qualifications
Master's degree or PhD in Computer Science or a related technical field.
1 year of experience in a technical leadership role.
Experience developing accessible technologies.
Qualification: Bachelor's degree or equivalent practical experience.
Senior Staff Software Engineer, Google Cloud
Google Careers
About the Job
Google's software engineers develop next-generation technologies that transform how billions of users connect, explore, and interact with information and one another. Beyond web search, our products must manage information at a massive scale, leveraging expertise in fields such as distributed computing, large-scale system design, networking, data storage, artificial intelligence, natural language processing, UI design, and mobile. As a Software Engineer, you'll work on mission-critical projects, with opportunities to switch teams and projects as you and our fast-paced business evolve. We seek engineers who are versatile, demonstrate leadership qualities, and are enthusiastic about solving new challenges across the full stack as we continue pushing technology forward. With your technical expertise, you will manage project priorities, deadlines, and deliverables while designing, developing, testing, deploying, maintaining, and enhancing large-scale software solutions.
Google Cloud helps organizations digitally transform with cutting-edge infrastructure, platforms, and solutions, all while operating on the cleanest cloud in the industry. Trusted by customers in more than 200 countries and territories, Google Cloud is a partner that enables growth and solves critical business challenges.

Responsibilities
Provide technical leadership on high-impact projects.
Influence and mentor a distributed team of engineers.
Facilitate alignment across teams, ensuring clarity on goals, outcomes, and timelines.
Manage project priorities, deadlines, and deliverables.
Design, develop, test, deploy, maintain, and enhance large-scale software solutions.

Minimum Qualifications
Bachelor's degree or equivalent practical experience.
8 years of experience in software development.
5 years of experience in software design and architecture, including testing and launching software products.

Preferred Qualifications
Master's degree or PhD in Engineering, Computer Science, or a related technical field.
8 years of experience with data structures and algorithms.
5 years of experience in a technical leadership role, leading project teams and setting technical direction.
3 years of experience working in complex, matrixed organizations on cross-functional or cross-business projects.
Staff Software Engineer (Go, Microservices, Kubernetes)
Netapp
About NetApp
NetApp is the intelligent data infrastructure company, turning disruption into opportunity for every customer. We help customers unlock new business possibilities, no matter the data type, workload, or environment. At NetApp, it all starts with our people. We embrace diversity and openness because it's part of our DNA. Collaboration is at the core of what we do: asking for help, partnering across teams, and driving innovation together.
"At NetApp, we fully embrace and advance a diverse, inclusive global workforce that fosters belonging and high performance." George Kurian, CEO

Job Summary
As a Senior Software Engineer on the AI Data Platform team, you will be involved in the design and development of the AI Data Platform, built on NetApp's flagship ONTAP storage operating system, the #1 storage operating system in the world, trusted by over 30,000 customers and managing hundreds of exabytes of data. Join us in transforming how data shapes the world. Your work will support cutting-edge technologies that enable life-saving medical analytics, improve autonomous vehicle navigation, monitor environmental hazards, and unlock new possibilities for businesses globally. The ideal candidate is results-driven, curious, creative, and collaborative, with broad experience in Big Data processing, AI/ML workflows, MLOps, Kubernetes, and distributed systems.

Job Responsibilities
Design, develop, and support AI Data Platform components built on NetApp ONTAP.
Build and maintain microservices and REST APIs for scalable, reliable solutions.
Work closely with cross-functional teams to solve complex, data-intensive problems and deliver innovative solutions.
Participate in technical discussions and contribute to system design, architecture, and best practices.
Support and collaborate with other engineers to ensure seamless development, testing, and deployment processes.
Stay current with emerging technologies, continuously improving your skill set and applying new concepts to ongoing projects.

Required Skills
Programming Languages: Proficiency in Go and Python.
AI/ML Experience: Familiarity with PyTorch, TensorFlow, Keras, OpenAI frameworks, open-source LLMs, and LangChain.
Cloud & Kubernetes: Hands-on experience with Linux, the Kubernetes control plane, auto-scaling, orchestration, and containerization in AWS, Azure, or GCP environments.
Big Data Technologies: Experience with platforms like Spark, Hadoop, and distributed storage systems for large-scale data processing.
NoSQL Databases: Proficiency in MongoDB, Cassandra, Cosmos DB, and DocumentDB.
Microservices Architecture: Proven experience building microservices and developing REST APIs and related frameworks.

Preferred Skills
Experience in the storage domain or with distributed file systems, networking, or file/cloud protocols.
Familiarity with MLOps practices and workflows.
Proven experience leading mid- to large-sized projects and collaborating across teams.
Strong understanding of computer architecture, data structures, and programming best practices.

Education and Experience
Bachelor's degree with 12+ years of experience, Master's degree with 12 years, or PhD with 10 years of experience. Equivalent experience is also considered.

Work Environment
NetApp offers a hybrid work environment to enhance connection, collaboration, and culture. In-office expectations will be discussed during the recruitment process.

Equal Opportunity Employer
NetApp is an Equal Employment Opportunity (EEO) employer, committed to providing a workplace free of discrimination. We do not discriminate based on age, race, color, gender, sexual orientation, gender identity, national origin, religion, disability, genetic information, pregnancy, or any protected classification.

A Note to Applicants
Research shows that women often apply only if they meet 100% of the qualifications, but no one is ever 100% qualified. If this role excites you, we encourage you to apply anyway!
Qualification: Bachelor's degree with 12+ years of experience, Master's degree with 12 years, or PhD with 10 years of experience. Equivalent experience is also considered.
DevOps Engineer
Sarvam
DevOps Engineer
Location: Bengaluru, Karnataka, India (On-Site)
Department: Engineering
Employment Type: Full-Time

About Sarvam.ai
Sarvam.ai is a cutting-edge generative AI startup headquartered in Bengaluru, India, with a mission to make generative AI accessible and impactful for Bharat. Founded by AI experts, we are dedicated to developing high-performance, cost-effective AI agents tailored for the Indian market. We enable enterprises to tap into new opportunities, build deeper customer connections, and reshape the future of AI for India and beyond.

Role Overview
We are looking for a DevOps Engineer to join our team and help build and manage scalable, secure, and high-performance infrastructure. In this role, you will be a key contributor to automating deployments, managing cloud infrastructure, optimizing CI/CD workflows, and ensuring system reliability. You will work with cutting-edge technologies, including cloud platforms, containerization, and infrastructure as code (IaC), to deliver impactful solutions for AI-driven products.

Key Responsibilities
CI/CD Pipelines: Design, implement, and manage CI/CD pipelines for seamless software integration and deployment.
Cloud Infrastructure: Deploy and manage cloud infrastructure using Terraform, Kubernetes, and Docker for scalability and high performance.
Automation & Scaling: Automate infrastructure provisioning, scaling, and security compliance to support high-availability environments.
Monitoring & Optimization: Implement logging, monitoring, and alerting solutions using tools like Prometheus, Grafana, the ELK Stack, or CloudWatch to track system performance and optimize resource utilization.
Security & Compliance: Strengthen security and compliance by managing IAM policies, encryption, and vulnerability scanning.
Troubleshooting & Root Cause Analysis: Troubleshoot system failures, perform root cause analysis, and implement improvements to ensure reliability and uptime.
Collaboration: Work closely with development teams to ensure smooth deployment and operation of AI models and applications.

Must-Have Skills & Qualifications
Educational Background: Bachelor's degree in Computer Science, Engineering, or a related field (2024/2025 graduates).
Cloud Expertise: Strong experience with AWS, Azure, or GCP for deploying and managing cloud-based applications.
Containerization: Proficiency in Docker and Kubernetes for building and managing containerized applications.
Infrastructure as Code (IaC): Experience with Terraform, Ansible, or CloudFormation to automate infrastructure management.
CI/CD Pipelines: Experience setting up automated workflows using tools like GitHub Actions, Jenkins, or GitLab CI/CD for smooth deployments.
Monitoring & Logging: Experience with Prometheus, Grafana, ELK, or similar tools to implement effective monitoring and logging solutions.
Networking & Security: Strong understanding of firewalls, VPNs, SSL, and cloud security best practices for secure infrastructure.
Version Control: Proficiency with Git for managing code repositories and version control workflows.
Problem Solving: Strong debugging, troubleshooting, and analytical skills to resolve complex system issues.

Good to Have (Preferred Experience)
Serverless Computing: Exposure to serverless computing models such as AWS Lambda or Azure Functions.
Message Queues: Experience with message queues like Kafka, RabbitMQ, or SQS.
Site Reliability Engineering (SRE): Familiarity with SRE practices to ensure the reliability and availability of large-scale systems.
Open Source Contributions: Contributions to open-source projects or a strong GitHub portfolio showcasing DevOps expertise and best practices.

Impactful Work: Work on AI-driven products that are reshaping the future of technology in India.
Innovative Team: Collaborate with a team of AI experts and engineers pushing the boundaries of technology.
Career Growth: Opportunity to grow in a fast-growing startup at the forefront of the generative AI revolution.
Cutting-edge Technologies: Work with cloud technologies, automation, and AI infrastructure to create high-impact products.
Qualification: Bachelor's degree in Computer Science, Engineering, or related field
Senior Site Reliability Engineer
Couchbase
Job Title: Site Reliability Engineer (SRE) - Cloud Platform & Production Pipeline Initiatives
Location: Bangalore, India (Office-based role)

About Couchbase:
As industries race to embrace AI, traditional database solutions fall short of rising demands for versatility, performance, and affordability. Couchbase is leading the way with Capella, the developer data platform for critical applications in our AI-driven world. By uniting transactional, analytical, mobile, and AI workloads into a seamless, fully managed solution, Couchbase empowers developers and enterprises to build and scale applications with unmatched flexibility, performance, and cost-efficiency from cloud to edge. Trusted by over 30% of the Fortune 100, Couchbase is unlocking innovation, accelerating AI transformation, and redefining customer experiences. Come join our mission!

Job Overview:
As a Site Reliability Engineer (SRE), you will play a pivotal role in managing, optimizing, and maintaining Couchbase's cloud infrastructure for Capella, our Database as a Service (DBaaS) platform. You will be responsible for ensuring the reliability and performance of our cloud service while collaborating closely with engineering teams to improve deployment pipelines, security practices, and overall system health. You will work across cloud platforms and multiple tools to provide guidance and mentorship and to contribute to the strategic direction of cloud operations.

Responsibilities:
Infrastructure Management: Manage, monitor, and maintain the infrastructure for Capella to ensure reliable operations.
Security & Compliance: Implement and manage cloud environments in accordance with company security guidelines, including vulnerability management, penetration testing, and compliance requirements (SOC 2, PCI-DSS, GDPR, HIPAA, etc.).
CI/CD & Release Pipeline: Collaborate with engineering teams to optimize CI/CD processes, aiming for a highly resilient deployment strategy, ideally with zero downtime.
Cloud Optimization: Stay up to date with new technologies and industry trends to continuously improve cloud platform architecture and meet the evolving needs of the business.
Security Integration: Work with development teams to integrate security scanners into the DevOps lifecycle, enhancing security posture.
Leadership & Mentorship: Provide guidance on architecture, code reviews, and technical feedback to improve service reliability, security, cost, and performance.
Incident Management: Demonstrate exceptional problem-solving skills, proactively identifying and addressing potential issues before they affect business operations.
Collaboration: Partner with development teams, application owners, and stakeholders to integrate best practices and ensure seamless service delivery.

Requirements:
Experience: 5+ years in Site Reliability Engineering (SRE), DevSecOps, or similar roles, with significant experience working in public cloud environments.
Programming & Scripting: Proficiency in languages such as Go, Python, Java, or Ruby.
Linux Expertise: High proficiency with Linux operating systems.
Kubernetes Management: Experience managing and maintaining Kubernetes clusters (both self-managed and managed platforms like AWS EKS).
Security & Vulnerability Management: In-depth knowledge of security tools and practices (vulnerability management, pen testing, SCA, DAST, SAST), with hands-on experience using tools like Sysdig, Snyk, and Black Duck.
Cloud Platforms & Tools: Strong experience with cloud platforms (AWS, GCP, Azure) and tools such as Artifactory, Jira, Jenkins, Grafana, Prometheus, Datadog, and Thanos.
Configuration Management: Proficiency with Terraform, Git, and CI/CD platforms (e.g., CircleCI, GitHub, Spinnaker).
Networking Security: Solid understanding of TCP/IP, DNS, HTTP, firewalls, VPNs, and other networking security concepts.

Preferred Skills:
Availability & Reliability: Knowledge of SLO/SLA, availability, reliability, and performance concepts.
Incident Management: Experience with on-call rotations and incident management.
Database Experience: Familiarity with databases, particularly Couchbase.
Security Certifications: Relevant certifications in security or cloud technologies are a plus.

Couchbase reimagines database technology to deliver a fast, flexible, and affordable cloud database platform, empowering developers to build applications with exceptional customer experiences. Trusted by over 30% of the Fortune 100, Couchbase drives innovation and customer success through its Capella platform.

Benefits at Couchbase:
Generous Time Off Program: Flexibility to care for yourself and your family.
Wellness Benefits: Access to world-class medical plans, dental, vision, life insurance, and employee assistance programs.
Financial Planning: RSU equity program, ESPP, retirement planning, and business travel insurance.
Career Growth: Focused on your career development and success.
Fun Perks: Ergonomic and comfortable office setup, food & snacks for in-office employees, and more!
Autoit Solutioning Engineer, Lead
Qualcomm
Job Title: Site Reliability Engineer (SRE) General Summary: We are seeking a highly skilled and motivated Site Reliability Engineer (SRE) to join our dynamic team. This role is critical in ensuring the stability, scalability, and security of our infrastructure and services. As an SRE, you will work collaboratively with software engineers, data scientists, and product managers to optimize system reliability while driving automation and continuous improvement. You will be responsible for modernizing traditional services, implementing cutting-edge technology, and proactively managing infrastructure to maintain operational excellence. If you are passionate about automation, DevSecOps, system performance, and infrastructure resilience, this role offers an exciting opportunity to make a meaningful impact. Key Responsibilities: System Monitoring & Incident Response: Continuously monitor system health, detect anomalies, and respond to incidents promptly. Investigate and troubleshoot service-related issues, ensuring minimal disruption. Implement proactive measures to prevent downtime and optimize system stability. Infrastructure Automation & DevOps Implementation: Develop and maintain Infrastructure-as-Code (IaC) scripts to automate deployments and scaling. Automate routine operational tasks to improve efficiency and reduce manual intervention. Leverage DevSecOps practices to ensure secure and resilient deployments. Performance Optimization & Capacity Planning: Collaborate with development teams to enhance software performance and system responsiveness. Identify and resolve system bottlenecks to improve speed, efficiency, and reliability. Forecast resource requirements based on traffic patterns and business growth. Security, Compliance & Risk Management: Implement security best practices and compliance measures across all infrastructure layers. Conduct security audits and ensure systems meet industry-standard security guidelines. 
Proactively assess and mitigate risks associated with infrastructure and deployments. Required Qualifications & Skills: Technical Expertise: Extensive experience with Linux-based environments (Ubuntu, Red Hat), including system administration and troubleshooting. Strong proficiency in scripting and automation using Python, Bash, or Go. Experience with containerization and orchestration technologies such as Docker and Kubernetes. Familiarity with CI/CD pipelines and related tools such as Jenkins, Puppet, Vault, and Splunk. Hands-on experience with cloud platforms (AWS, Azure, or GCP). Problem-Solving & Leadership: Strong analytical skills with the ability to diagnose and resolve complex system issues. Self-driven, highly motivated, and able to work independently in a fast-paced environment. Ability to collaborate cross-functionally and communicate technical solutions effectively. Security & Reliability Focus: Solid understanding of DevSecOps principles and secure system design. Ability to implement monitoring, logging, and alerting solutions to maintain system resilience. Passion for continuous learning and leveraging data-driven approaches for system improvement. What We Offer: Work in a high-impact role that directly contributes to the reliability and scalability of mission-critical systems. Be part of an innovative, forward-thinking team that values automation, collaboration, and continuous improvement. Competitive salary, professional development opportunities, and an environment that fosters growth and innovation. If you are a passionate, results-driven SRE, we invite you to join us and play a pivotal role in shaping the future of our infrastructure.
Sr. NOC Engineer
Databricks
We're growing fast and attracting the best talent in the world. Bricksters, as we call ourselves, are a special mix of smart, curious, quick thinkers. If you ask a Brickster what they love about working here, you'll likely hear about our culture. We are seeking an experienced NOC Engineer to join our team. The successful candidate will be responsible for monitoring critical Databricks infrastructure and developing monitoring tools and alerting dashboards. They will also work closely with stakeholders to investigate and resolve incidents, perform root cause analysis, and propose solutions to increase the reliability and stability of the Databricks unified analytics platform. The impact you will have here: Monitor critical infrastructure, triage alerts to proactively identify incidents, and work with stakeholders to resolve incidents. Investigate incidents and propose solutions to improve platform reliability and stability. Perform root cause analysis for recurring incidents and provide proactive solutions. Develop tooling or automate processes to improve platform monitoring and alerting. Contribute to software development efforts to improve overall service reliability and stability. Communicate effectively with internal stakeholders, including executive staff, to provide incident analysis. Participate in war rooms and temporary communication channels during outages. Demonstrate cross-functional leadership and establish ownership of incidents and outages. Multitask on several incidents and/or projects. What we look for: Minimum of 5 years of experience as a NOC, SRE, or DevOps engineer. Strong knowledge of cloud technologies such as Azure, AWS, and GCP. Hands-on experience with monitoring, logging, and alerting tools such as ELK, Prometheus, Grafana, PagerDuty, etc. Experience with containers and orchestration technologies such as Docker and Kubernetes. Proficiency in automation and scripting. Linux systems administration skills. Excellent communication skills.
Willingness to learn Databricks products. Bachelor's degree in Computer Science or a related field. About Databricks: Databricks is the data and AI company. More than 10,000 organizations worldwide, including Comcast, Condé Nast, Grammarly, and over 50% of the Fortune 500, rely on the Databricks Data Intelligence Platform to unify and democratize data, analytics and AI. Databricks is headquartered in San Francisco, with offices around the globe, and was founded by the original creators of Lakehouse, Apache Spark, Delta Lake and MLflow. To learn more, follow Databricks on Twitter, LinkedIn and Facebook. Benefits: At Databricks, we strive to provide comprehensive benefits and perks that meet the needs of all of our employees. For specific details on the benefits offered in your region, please visit https://www.mybenefitsnow.com/databricks. Our Commitment to Diversity and Inclusion: At Databricks, we are committed to fostering a diverse and inclusive culture where everyone can excel. We take great care to ensure that our hiring practices are inclusive and meet equal employment opportunity standards. Individuals looking for employment at Databricks are considered without regard to age, color, disability, ethnicity, family or marital status, gender identity or expression, language, national origin, physical and mental ability, political affiliation, race, religion, sexual orientation, socio-economic status, veteran status, and other protected characteristics. Compliance: If access to export-controlled technology or source code is required for performance of job duties, it is within Employer's discretion whether to apply for a U.S. government license for such positions, and Employer may decline to proceed with an applicant on this basis alone. Qualification : Bachelor's degree in Computer Science or a related field is required.
Software Principal Engineer - SRE
Boomi Software
Position: Senior Site Reliability Engineer. Join us as a Senior Site Reliability Engineer on our Reliability Team and do the best work of your career while making a profound social impact. In this role, you will design and build sophisticated systems and software that align with our customers' business goals and environments. You will collaborate with product management, engineering teams, customer success, and support to deliver innovative features and enhancements across Boomi's product offerings. Key Responsibilities Incident Management & SLAs: Participate in detecting, remediating, and reporting production incidents, ensuring that SLAs and SLOs are well-defined and consistently met. On-Call Rotation: Provide on-call support for planned and unplanned events. Collaboration: Partner with engineering teams to implement improvements, standardize processes, and drive consistent results. Disaster Recovery: Lead DR exercises, game days, and readiness training with SRE and engineering counterparts. Observability & Tooling: Collaborate with service engineering teams to build and automate tooling, implement best practices in observability, and ensure the scalability and reliability of Boomi's production services. Infrastructure Automation: Automate provisioning and maintenance of Boomi's infrastructure using tools such as Terraform and Ansible. Technical Mentorship: Guide and mentor other engineers through design collaboration and code reviews. What You'll Bring Essential Requirements Expertise in defining, measuring, and improving reliability metrics (SLOs, SLIs, error budgets). Strong experience in observability practices (monitoring, logging, distributed tracing), preferably using Splunk and New Relic, including the ability to create custom dashboards from scratch. Proficiency in infrastructure automation using Terraform, CloudFormation, and Ansible playbooks, with scripting experience in Python.
Hands-on experience conducting and automating disaster recovery (DR) exercises in AWS, validating RPOs and RTOs. Deep understanding of AWS components and the ability to design and implement APIs for internal use. Desirable Requirements 7+ years of experience in the software engineering industry, with exposure to large-scale production systems. Cloud certification (AWS, Azure, GCP, Oracle), with experience in services such as compute, containers, and databases. Experience in containerization best practices, cloud-native concepts, and security awareness in the cloud. Working at Boomi means doing what you love, surrounded by trailblazers with an entrepreneurial spirit. Our culture fosters innovation, encourages collaboration, and celebrates the unique contributions of every individual. Take the first step toward your dream career at Boomi where ideas shape the future of technology.
Senior Software Developer
Oracle India
About Oracle Cloud Infrastructure (OCI) Oracle Cloud Infrastructure (OCI) offers a scalable, secure, and high-performance cloud environment designed to meet the needs of modern enterprises. Our mission is to build and operate a suite of integrated cloud services that support the most demanding applications across the globe. OCI empowers customers to tackle some of the world's biggest technology challenges by providing reliable, high-scale distributed services. Role Overview As a Senior Software Engineer, you will play a critical role in designing, developing, troubleshooting, and debugging high-performance, scalable software solutions across databases, applications, tools, and networks. You will contribute to defining and evolving standard engineering practices, ensuring the development of robust and resilient services. This role involves working on non-routine, highly complex problems, requiring deep technical expertise and strong problem-solving skills. As a leading individual contributor and team member, you will mentor engineers, drive technical direction, and deliver impactful solutions for Oracle's cloud platform. Career Level: IC3 Key Responsibilities Design, develop, troubleshoot, and debug software applications and distributed systems. Take an active role in defining engineering best practices and evolving Oracle Cloud Infrastructure (OCI). Build highly available, scalable, and resilient cloud services to support business-critical applications. Lead the entire software development lifecycle, from concept and architecture to deployment and operations. Optimize performance and reliability of cloud services, ensuring seamless user experience. Work on service-oriented architectures (SOA) and RESTful APIs to enable cloud interoperability. Develop and maintain CI/CD pipelines, enabling automated deployments with robust testing. Conduct security reviews, risk assessments, and compliance audits (e.g., FedRAMP, PCI DSS). 
Collaborate with Product Managers, UX designers, and internal customers to translate business needs into scalable engineering solutions. Monitor, troubleshoot, and improve system performance, proactively identifying and addressing anomalies. What is IAM at OCI? The Identity and Access Management (IAM) team at OCI is responsible for designing and building core security services that empower customers to control access to their cloud resources. As part of Oracle's Cloud Platform Organization, the IAM team delivers enterprise-grade authentication, authorization, and access control solutions used by internal and external customers. IAM engineers work on high-scale distributed systems, handling millions of requests per second, ensuring compliance with industry security regulations, and designing resilient, multi-region architectures. Who We Are Looking For We are seeking highly skilled software engineers with expertise in distributed systems and cloud services development. The ideal candidate will: Have experience designing and deploying highly available, large-scale services in a cloud environment. Understand how to build resilient, fault-tolerant services that operate across multiple availability domains (ADs) and regions. Be a hands-on engineer, capable of driving feature development from conception to production. Be proactive in identifying performance bottlenecks and improving system scalability. Have a deep understanding of security best practices, including threat modeling and risk assessments. Thrive in a fast-paced, collaborative, and agile engineering environment. Biggest Challenges for the Team Reliability & Performance: As the business grows, we must scale services to handle exponentially increasing workloads. Scalability & Resilience: Designing and operating services that can withstand regional outages while maintaining seamless performance. 
Security & Compliance: Ensuring that IAM services meet stringent security requirements while remaining flexible and user-friendly. Required Qualifications 7+ years of software engineering experience, specializing in distributed systems and cloud services. Strong proficiency in Java, C++, or C# for backend development. Experience with service-oriented architectures (SOA) and RESTful web services. Hands-on experience building and operating cloud-based services. Proficiency in at least one scripting language (Python, Bash, etc.) for automation and tooling. Experience with monitoring, debugging, and optimizing distributed systems. Preferred Qualifications Experience with public cloud platforms (AWS, Azure, Oracle Cloud). Knowledge of containerization technologies (Docker, Kubernetes). Experience with CI/CD pipelines and automated testing frameworks. Expertise in multi-region architecture and high-availability systems. Familiarity with compliance standards (FedRAMP, PCI DSS). Why Join Oracle Cloud Infrastructure (OCI)? Work on cutting-edge cloud technologies that shape the future of enterprise computing. Build and operate high-impact, large-scale distributed systems. Be part of a team that values innovation, collaboration, and continuous learning. Competitive compensation, benefits, and career growth opportunities. If you are passionate about solving complex engineering challenges and building the future of cloud computing, we invite you to join Oracle Cloud Infrastructure (OCI) and make an impact.
Senior QA Engineer
Team Vunet Systems
Senior QA Engineer - AI-Powered Observability Platform Location: Bengaluru Experience: 6-10 years About VuNet VuNet is at the forefront of Business Journey Observability, revolutionizing the financial services industry with Big Data and Machine Learning. Our deep-tech platform provides comprehensive visibility into customer journeys, enabling proactive issue resolution, operational resilience, and superior user experiences. We monitor over 28 billion digital transactions monthly, serving 300 million users globally, and we're powering some of the largest banks and financial institutions in India and MEA. VuNet is Series B funded, part of NASSCOM's DeepTech Club, and recognized by analysts like Gartner and Omdia. Your Role: Senior QA Engineer - AI-Powered Observability Platform As a Senior QA Engineer at VuNet, you'll play a crucial role in ensuring the quality and reliability of our VuSmartMaps Observability Platform. You'll lead the design and implementation of cutting-edge test automation, performance validation, and reliability frameworks across distributed systems that handle billions of telemetry events. Working closely with development, operations, and QA teams, you will drive quality across the entire platform and play a key role in ensuring that our systems are scalable, resilient, and performant. Roles & Responsibilities Quality Strategy Ownership: Own the end-to-end quality strategy for observability platform components (metrics, logs, tracing, alerting, dashboards, MLOps). Automated Testing: Build and maintain automated test suites for data pipelines, APIs, and integration flows involving tools like Prometheus, Grafana, Loki, Elastic, and OpenTelemetry. Performance Validation: Design and execute tests to validate high-throughput, distributed systems under real-world load conditions, ensuring performance benchmarks are met.
Test Frameworks Development: Develop and maintain test frameworks and tools using Python, Go, Bash, pytest, k6, Playwright, and others. System Reliability & Alerting: Define and implement test coverage for system reliability, alerting accuracy, and visualization correctness. Collaboration: Partner with developers, SREs, and DevOps teams to shift quality left in the development lifecycle, contributing to CI/CD pipelines and automation workflows using GitOps tools. Automation Integration: Integrate automated test suites into smoke, functional, and regression pipelines using Jenkins, Spinnaker, and other CI/CD tools. Mentorship: Mentor junior QA engineers, establish best practices, and ensure consistency in the QA discipline across the team. What You Bring Mandatory Skills: Experience: Minimum of 6 years in software quality engineering, with a focus on automated testing, performance, and reliability. Scripting/Programming: Proficiency in at least one scripting or programming language (JavaScript, Python, Go). CI/CD Systems: Experience with CI/CD systems such as GitHub Actions, Jenkins, or ArgoCD. Debugging Skills: Excellent debugging skills and the ability to analyze code quality and system performance. Distributed Systems Knowledge: Familiarity with Kafka, Kafka Streams, ClickHouse DB, and distributed systems. Kubernetes & Microservices: Strong experience testing Kubernetes-native systems, Helm deployments, and microservices. Observability Tools: Knowledge of observability tools like Prometheus, Grafana, Elastic Stack, OpenTelemetry, Loki, or Jaeger. Tooling & Deployment: Proficiency in Jenkins, Spinnaker, GitOps, Kubernetes, and Docker. Testing Experience: Hands-on experience in various types of testing (functional, performance, load, etc.) and knowledge of testing tools. Documentation Skills: Ability to create clear documentation (e.g., release notes, troubleshooting guides, and migration guides).
Nice-to-Have Skills: Performance Testing: Experience designing and executing performance and load testing for high-traffic applications. Web Services & Systems Design: Understanding of web services and distributed systems architecture. Cross-Functional Communication: Excellent communication skills with the ability to coordinate across multiple teams. Life at VuNet: At VuNet, we're building a world-class observability platform, proudly Made in India, and we're just getting started. Join a passionate team of problem-solvers who love tackling complex challenges and stay ahead of the curve with technologies like Gen AI. We offer an environment where collaboration, innovation, and learning are at the core of everything we do. You'll have the opportunity to work on cutting-edge technologies and make a real impact on a product that powers leading banks and financial institutions globally. Benefits: Comprehensive health insurance coverage for you, your parents, and dependents. Mental wellness support and 1:1 counseling. A learning culture that promotes growth, innovation, and ownership. Transparent, inclusive, and high-trust workplace culture. Exposure to Gen AI and integrated technology workspaces. Support for career development with various training programs to enhance your skills and expertise.
DevOps
Mirafra Technologies
DevOps Engineer Location: Bangalore Experience: 5+ Years Education Qualification: B.E. in Computer Science / Electronics About Mirafra Founded in 2004, Mirafra is a fast-growing global product engineering services company specializing in Semiconductor Design, Embedded Systems, Digital Solutions, and Application Software. With over 1,500 professionals worldwide, we provide cutting-edge solutions to Fortune 500 clients across industries such as Semiconductor, Internet, Aerospace, Networking, Telecom, Medical Devices, and Consumer Electronics. Recognitions: Best Company to Work For (SiliconIndia, 2016); Most Promising Design Services Provider (SiliconIndia, 2018); Top 10 Admired Companies for Software Services (DigiTech Insight, 2022). Key Responsibilities DevOps & Automation Develop automated CI/CD pipelines and manage build and deployment processes. Implement infrastructure automation using scripting (Shell, Batch, Python). Manage configuration, integration, and deployment using DevOps tools. Version Control & Build Management Work with Git, GitLab, and Bitbucket for version control. Maintain build systems such as Make and CMake, and manage dependencies using Pip, Conda, Poetry, and Maven. Handle binary management tools such as Artifactory and Nexus. Code Quality & Security Utilize static code analysis tools (SonarQube, Pylint, Coverity) for code quality enforcement. Monitor and ensure security compliance in the DevOps lifecycle. Cloud & Containerization Manage cloud-based deployments and monitoring using ELK, Docker, and Kubernetes. Implement scalable and resilient infrastructure solutions. Agile & Collaboration Work in an Agile/Scrum environment, collaborating with cross-functional teams. Utilize UML modeling and software development best practices. Skills & Qualifications Education: B.E. in Computer Science / Electronics
Technical Expertise: Scripting & Automation: Shell, Batch, Python. CI/CD & Build Tools: Jenkins, GitLab, Make, CMake. Version Control: Git, Bitbucket, GitLab. Static Code Analysis: SonarQube, Pylint, Coverity. Package Management: Pip, Conda, Poetry, Maven. Binary Management: Artifactory, Nexus. Cloud & Containerization: Docker, Kubernetes, ELK Stack. Programming Languages: Python, C, C++. Operating Systems: Linux, Unix, Windows. Soft Skills: Strong problem-solving and analytical skills. Excellent communication and team collaboration. Ability to work in fast-paced Agile environments. What We Offer: Cutting-edge projects in Semiconductor, Aerospace, Networking, and IoT. Global work environment with top-tier clients. Career growth opportunities and exposure to the latest technologies. Award-winning workplace culture and industry recognition. Excited to take on a challenging DevOps role? Apply now!
Software Engineer III, Infrastructure, Core
Google Careers
Job Title: Software Engineer About the Role: At Google, our Software Engineers are at the forefront of innovation, designing and developing cutting-edge technologies that shape how billions of users connect, explore, and interact with information. Our products operate at an immense scale, extending far beyond web search, and require engineers who bring fresh perspectives from diverse technical domains, including information retrieval, distributed computing, large-scale system design, networking, security, artificial intelligence, natural language processing, UI design, and mobile development. As a Software Engineer, you will contribute to mission-critical projects, collaborating with teams across Google to develop, test, deploy, maintain, and enhance software solutions. Your versatility, leadership abilities, and enthusiasm for solving complex challenges will be crucial as you navigate projects across the full technology stack. The Core Team serves as the backbone of Google's technical infrastructure, building the foundational elements behind our flagship products. This team is responsible for developing essential developer platforms, product components, and infrastructure that drive innovation across Google's ecosystem. As a member of this team, you will play a pivotal role in breaking down technical barriers, optimizing existing systems, and making key architectural decisions that influence the entire organization. Key Responsibilities: Design, develop, and maintain high-quality software solutions that support Google's technical infrastructure and products. Participate in and lead design reviews with peers and stakeholders, evaluating available technologies to determine optimal solutions. Conduct thorough code reviews to ensure adherence to best practices, including code quality, efficiency, accuracy, testability, and compliance with style guidelines.
Contribute to documentation and educational resources, updating content based on product enhancements and user feedback. Troubleshoot and debug complex system issues, analyzing their impact on hardware, networks, and service operations to maintain optimal performance and reliability. At Google, we foster a culture of continuous learning, innovation, and technical excellence. If you're passionate about solving challenging problems and building world-class technology, we invite you to be part of our journey. Qualification : Bachelor's degree or equivalent practical experience.
Senior Performance Engineer
Boomi Software
Senior Performance Engineer Are you ready to work on world-changing technologies? Today, organizations need to move with increased agility and insight to grow and thrive. Boomi is one of the hottest tech companies in the SaaS/Cloud industry, named a Leader for the eighth year in a row in the Gartner Enterprise iPaaS Magic Quadrant and recently recognized by Inc. Magazine as one of the best workplaces. Our award-winning, patented technology is transforming the world of integration by making enterprise-class integration technology accessible and affordable to companies of all sizes. Boomi provides the foundation on which your business can evolve and innovate. According to a recent survey by Vanson Bourne, connected businesses are far outpacing their competitors. We help organizations connect everything and engage everywhere across any channel, device or platform. More than 7,000 organizations are using Boomi to run better, faster and smarter. Working at Boomi means doing what you love. We hire trailblazers with an entrepreneurial spirit who can solve challenging problems, make a real impact in technology and want to build something big. If you are passionate about solving hard problems, enjoy working with world-class people and developing cutting-edge technology, you should explore a career with Boomi. Learn more at http://www.boomi.com/ or visit Boomi Careers. Join us as a Performance Engineer on our Performance, Scalability and Resiliency (PSR) Engineering team in Bangalore/Hyderabad, India, to do the best work of your career and make a profound social impact. What you'll achieve As a Performance Engineer, you will be responsible for validating and recommending performance optimizations in Boomi's computing infrastructure and software. You will work with our Product Development and Site Reliability Engineering teams on performance monitoring, tuning and tooling.
You will: Analyze software architecture (monolith and microservice) and identify potential areas for performance, scalability and resiliency improvements. Identify KPIs, perform trending and analysis, identify patterns, and engineer remedial solutions for a high-performance, fault-tolerant and resilient platform and application stack. Design, automate and perform scalability and resiliency tests using tools such as JMeter, Chaos Monkey or similar. Use the observability stack to improve diagnosability and trending around performance bottlenecks. Identify performance tuning opportunities and recommend remedial solutions. Take the first step towards your dream career. Every Boomer brings something unique to the table. Here's what we are looking for with this role: Essential Requirements Expert in performance engineering fundamentals: arrival rate, workload models, responsiveness, computing resource utilization, time complexity, scalability, resiliency, etc. Expert in monitoring performance using native Linux OS tools, Application Performance Management (APM) and infrastructure monitoring tools. Experience analyzing crash dumps, thread dumps and SQL slow query logs to identify performance bottlenecks. Expert in recommending optimal resource configurations for cloud, virtual machine, container and container orchestration technologies. Flexibility to work in a remote and geographically distributed team environment. Desirable Requirements Experience writing data extraction and custom monitoring tools in any programming language (Java, Python, R, Bash or similar). Experience in capacity planning and modelling using AI/ML, queueing models or similar approaches. Performance tuning experience in Java or similar application code.
Site Reliability Engineer - Logging and Monitoring
IBM (International Business Machines)
Introduction A career in IBM Software means you'll be part of a team that transforms our customers' challenges into solutions. Seeking new possibilities and always staying curious, we are a team dedicated to creating the world's leading AI-powered, cloud-native software solutions for our customers. Our renowned legacy creates endless global opportunities for our IBMers, so the door is always open for those who want to grow their career. IBM's product and technology landscape includes Research, Software, and Infrastructure. Entering this domain positions you at the heart of IBM, where growth and innovation thrive. Your role and responsibilities In this role, you will build and maintain an observability stack for IBM's Cloud Object Storage (COS) service using managed services as well as custom-built services. This stack is used by Cloud Object Storage SREs and developers to understand the health of the service. Work duties and responsibilities include: Design, set up, configure and implement the COS Monitoring System using technologies such as Elasticsearch, Logstash, Kibana, Kafka, Kafka Mirrors, Filebeat, Grafana and Sysdig. Automate CI/CD tasks and infrastructure using Ansible, Terraform, Jenkins, and Travis. Experience with microservices and distributed application architecture, such as containers and Kubernetes. Experience with Linux administration and programming languages such as Java, Python and SQL. Performance and configuration tuning to support the increasing load of data flowing into the COS Monitoring System. Provide design recommendations and thought leadership to deliver best-in-class observability as part of the COS Monitoring System. Provide 24x7 on-call customer support on a rotational basis. Design and develop dashboards for metrics analysis. Design, develop and configure an alerting solution for an end-to-end incident management and recovery process by integrating Sysdig with PagerDuty, email and Slack.
Required education: Bachelor's Degree. Preferred education: Bachelor's Degree. Required technical and professional expertise: Ability and tenacity to solve increasingly complex technical issues through analysis and a variety of problem-solving techniques. Working knowledge of object-oriented Python with demonstrable experience in applying these skills. Working knowledge of Linux environments. Experience working in an Agile-Scrum development environment. Experience using tools such as Jira, GitHub, and logging and monitoring tools. BS in CS, CE or similar field, plus 2 to 5 years of relevant work experience. Qualification : BS in CS, CE or similar field, plus 2 to 5 years relevant work experience.
DevOps Engineer
SAP
About SAP: SAP is a global leader in enterprise software, renowned for helping over 400,000 customers worldwide optimize business processes and unlock deeper insights. Originally known for its ERP software, SAP now leads in business applications across databases, analytics, intelligent technologies, and experience management. With 200 million users and over 100,000 employees, SAP fosters a collaborative and inclusive culture, emphasizing personal development and a forward-focused, purpose-driven approach. We aim to solve challenges and create innovations that matter. Meet Your Team: SAP Build Process Automation (SBPA). SBPA combines Workflow Management, Robotic Process Automation, and embedded AI capabilities to empower business users and developers to automate processes easily and quickly. With a no-code development interface, it helps streamline operations without the need for IT support or coding expertise. As part of our highly integrated global team, you'll play a key role in this state-of-the-art software development initiative. Role Overview: DevOps Engineer. As a DevOps Engineer at SAP, you will contribute to the development of our SBPA product. You will work with both cloud and desktop components, automating and optimizing our build process and CI/CD pipelines. Ensuring timely releases while adhering to security, compliance, and non-regression standards will be a crucial part of your responsibilities. You will collaborate closely with developers, QA, and program management teams to ensure smooth and efficient release operations. Key Responsibilities. Automating Build Process: Enhance and optimize CI/CD pipelines for cloud and desktop components, ensuring seamless operation at all times. Release Management: Facilitate the release process by working on GitHub and Jenkins, ensuring all security and compliance standards are met.
Collaboration: Work closely with developers, quality engineers, and program management to ensure the timely release of features and updates. Support: Assist the global development team and local team in Bangalore, ensuring smooth integration of cloud-based systems and desktop components. What You Bring. Experience: 1-6 years in DevOps or related technical fields. Technical Expertise: Experience building and maintaining Hyperspace pipelines. Knowledge of GitHub and Jenkins. Familiarity with CI/CD concepts and practices. Experience with BTP, Cloud Foundry, and cloud platforms (AWS, GCP, Azure). Familiarity with containers (Docker, Kubernetes). Programming Skills: Knowledge of JavaScript, Node.js, Groovy, and optionally C++/C# and Visual Studio. Soft Skills: Strong problem-solving ability, attention to detail. Fluent in English (verbal and written). Collaborative team player with a proactive approach to challenges. Preferred Skills: Knowledge of SAP Business Technology Platform (BTP). Familiarity with cloud-based Infrastructure as a Service (IaaS) such as AWS, GCP, and Azure. Software development experience in relevant languages (JavaScript, Node.js, Groovy). Inclusive Culture: We value diversity and foster an inclusive environment where all employees can thrive, regardless of background. Work-Life Balance: SAP offers flexible working models and focuses on the health and well-being of our employees. Development Opportunities: We believe in unleashing the potential of every individual and invest in their growth and success. Equal Opportunity Employer: SAP is committed to creating an inclusive and diverse work environment. We provide accessibility accommodations for applicants with disabilities and encourage all qualified individuals to apply, regardless of race, gender, age, disability, or other protected categories. Qualification: University degree in Computer Science or related technical areas.
Mobile App and Observability SDK Engineer
Team VuNet Systems
Mobile App and Observability SDK Engineer. Experience: 3-6 Years. Location: Bengaluru. About VuNet: VuNet is a pioneer in Business Journey Observability, revolutionizing the financial services industry with Big Data and Machine Learning. Our cutting-edge platform offers end-to-end visibility into customer journeys, driving proactive issue resolution, operational resilience, and superior user satisfaction. With over 28 billion digital transactions monitored monthly, touching 400 million users worldwide, we're already powering leading banks and financial institutions across India and MEA. VuNet is Series B funded, part of the NASSCOM DeepTech Club, and recognized globally by analysts like Gartner and Omdia. Your Role: Mobile App and Observability SDK Engineer. At VuNet, the Product Development Team is dedicated to delivering exceptional customer experiences through scalable products. We are looking for a Mobile App and Observability SDK Engineer to join this team. In this role, you'll be at the forefront of building high-quality mobile applications and advancing our Mobile Real User Monitoring (MRUM) initiatives. You'll capture and translate mobile performance data into actionable insights, helping improve the performance and user experience of mobile apps across various platforms. If you're passionate about mobile engineering, user experience, and observability, this role offers a unique opportunity to merge these interests into a groundbreaking solution. Roles & Responsibilities. Mobile Application Development: Design, develop, and maintain robust, high-performance mobile applications for iOS and Android using Swift, Kotlin, Flutter, or React Native. Testing & Quality Assurance: Implement unit, integration, and UI testing strategies to ensure the app's quality, stability, and regression coverage. Debugging & Profiling: Identify and resolve performance bottlenecks, ANRs, crashes, and memory leaks using tools like Android Studio Profiler, Xcode Instruments, or Flipper.
Crash Analysis & Reporting: Integrate crash analytics tools and develop efficient incident tracking and resolution workflows. Performance Monitoring & Insights: Leverage telemetry, profiling, and analytics data to enhance app performance, responsiveness, and overall user experience. Observability Collaboration: Work with SRE and backend teams to export performance metrics, logs, and traces from mobile clients into centralized observability platforms. Code Quality: Write clean, modular, and well-documented code, adhering to best practices in mobile development and SDK maintenance. What You Bring Mandatory Skills: Mobile App Development: 3+ years of hands-on experience in mobile app development using Flutter, React Native, Swift, or Kotlin (experience in at least two of these). App Lifecycle & Performance: Strong understanding of mobile app lifecycle, UI rendering, asynchronous processing, state management, and performance optimization (ANRs, memory management, network latency). Debugging & Profiling Tools: Proficiency in debugging, profiling, and testing mobile applications using tools like Android Studio Profiler, Xcode Instruments, or Flipper. Crash Analytics: Experience integrating and using crash analytics and reporting tools. CI/CD & SDK Versioning: Familiarity with CI/CD pipelines, automated testing, and SDK versioning. Performance Instrumentation: Interest in observability, monitoring, and performance instrumentation with a willingness to learn OpenTelemetry and RUM concepts. Problem-Solving Mindset: Strong analytical and debugging skills, focused on enhancing performance and reliability. Nice-to-Have Skills: OpenTelemetry & SDKs: Exposure to OpenTelemetry SDKs or other instrumentation frameworks for capturing telemetry data (e.g., traces, metrics, logs). Mobile Observability: Familiarity with mobile observability backends. Session Replay & Mobile Analytics: Knowledge of session replay, user behavior tracking, or mobile analytics SDKs. 
SRE & Monitoring Practices: Understanding of SRE principles, monitoring best practices, and golden signals. Open Source Contributions: Contributions to open-source SDKs or mobile performance tools. Life at VuNet: At VuNet, we're building a world-class observability platform, proudly Made in India. We're just getting started, and we're looking for people like you to join us in tackling some of the most complex challenges in the digital world. Our team is filled with passionate problem-solvers who thrive in a collaborative, fast-paced environment. We embrace continuous learning, adapt quickly, and stay ahead of emerging technologies like Gen AI. If you're looking to work on cutting-edge technology, make a real impact, and grow with a supportive team, you'll feel right at home here at VuNet. Benefits: Comprehensive health insurance coverage for you, your parents, and dependents. Mental wellness and 1:1 counseling support. A learning culture that promotes growth, innovation, and ownership. A transparent, inclusive, and high-trust workplace culture. Access to Gen AI and integrated technology workspaces. Supportive career development programs to expand your skills with various training opportunities.
Lead Platform Engineer
Team VuNet Systems
Lead Platform Engineer: Observability Solutions. Location: Bengaluru. Experience: 6-10 Years. Function: Observability Engineering | Platform Architecture | SRE Enablement. Join VuNet: Redefining Digital Observability at Scale. VuNet is transforming the future of digital experiences through Business Journey Observability, combining Big Data and AI/ML to empower real-time visibility across payments, banking, and financial services. Monitoring 28+ billion transactions/month, our platform is trusted by top financial institutions and powers over 300 million users. Backed by Series B funding and recognized by Gartner, NASSCOM, and Forbes, we are leading the charge in building a new category of observability, proudly Made in India for global impact. Your Role: Lead Platform Engineer. As the Lead Platform Engineer, you will architect and drive the development of packaged observability solutions across 100+ infrastructure and application technologies. You will define golden signals, build data collection strategies, and lead the standardization of alerts, dashboards, and RCA workflows for platforms like Kubernetes, Oracle DB, and Tomcat. This is a cross-functional leadership role that sits at the intersection of product, platform, DevOps, and SRE. You will lead a team and influence how observability is delivered, scaled, and adopted across complex environments. Key Responsibilities. Observability Solution Development: Design and lead the delivery of observability packages for databases, middleware, cloud-native, and legacy platforms. Define and implement data collection pipelines, including agents, APIs, logs, metrics, traces, and service discovery. Establish golden signals, SLIs/SLOs, and health KPIs for performance, availability, and anomaly detection. Dashboards, Alerts & RCA: Develop standardized, reusable dashboards, alerts, reports, and troubleshooting playbooks. Automate RCA workflows to improve MTTR and reduce alert fatigue.
Platform Enablement & Integration: Work with engineering to enhance agent capabilities and support new data sources/formats. Guide implementation of platform features for better observability at scale. Team Leadership & Governance: Lead and mentor a team of observability engineers and specialists. Define design patterns, reusable modules, and version-controlled libraries. Stakeholder Collaboration: Partner with product managers, DevOps, SREs, and customer teams to gather requirements, align priorities, and validate use cases. Ensure deliverables are scalable, well-documented, and production-ready. What You Bring. Must-Have Skills: 6-10 years of experience in observability, platform engineering, or SRE roles. Hands-on with tools like Prometheus, Grafana, OpenTelemetry, ELK/EFK, Datadog, and Splunk. Strong understanding of logs, metrics, traces, profiling, and collection strategies. Experience developing solutions for platforms like Kubernetes, Oracle, PostgreSQL, Tomcat, etc. Proficient in Python, Shell scripting, APIs, and automation tools (Terraform, etc.). Familiar with alert fatigue mitigation, anomaly detection, and RCA frameworks. Excellent communication, technical leadership, and documentation skills. Nice to Have: Experience managing an observability marketplace or solution catalog. Contributions to open-source observability projects. Certifications in Kubernetes, observability platforms, or cloud providers (AWS/GCP/Azure). Background in ITSM tools, CMDBs, or incident workflow automation. At VuNet, you'll help build a category-defining observability platform that's already transforming critical infrastructure for leading financial institutions. You'll work with passionate engineers, push technical boundaries, and grow in a high-trust, high-impact environment. What You'll Experience: Ownership of key observability initiatives impacting 300M+ users. Collaboration with SRE, DevOps, and product teams across real-time financial systems.
Opportunity to experiment with and shape Gen AI, ML, and emerging telemetry trends. Perks & Benefits Health insurance for you, your parents, and dependents. 1:1 mental wellness support. Training programs, certifications, and career growth opportunities. Transparent, inclusive, and high-trust work culture. Access to cutting-edge technology and Gen AI-powered workspaces.