Grafana Jobs in Bengaluru
55 Jobs Found
Site Reliability Engineer
Groww
Position: Site Reliability Engineer Location: Bengaluru About Groww: At Groww, we're on a mission to make financial services simple, accessible, and transparent for every Indian. As one of India's fastest-growing financial platforms, we help millions take control of their financial future through a wide range of products. We're a team driven by ownership, radical customer-centricity, and a deep passion for challenging the status quo. From intuitive design to robust engineering, everything we build is grounded in what our customers need. If you're excited about building systems that power the future of finance in India, we'd love to hear from you. Our Vision: To empower every Indian with the knowledge, tools, and confidence to make sound financial decisions. Our goal is to be the most trusted financial partner for millions across the country. Our Core Values: Customer Obsession: We put our users first, always. Extreme Ownership: We own everything we do, end-to-end. Simplicity: We keep things simple, effective, and intuitive. Long-term Thinking: We focus on sustainable, impactful decisions. Transparency: We believe in open communication and collaboration. Role Overview: As a Site Reliability Engineer (SRE) at Groww, you will be responsible for ensuring our systems are highly available, performant, and secure. You will work closely with engineering and infrastructure teams to improve reliability, automate deployments, and manage mission-critical services that power our platform. Key Responsibilities: Monitor and troubleshoot issues related to system performance, availability, and security. Define and maintain SLIs, SLOs, and Error Budgets to improve system reliability. Use tools like Grafana to analyze and report on metrics and trace data. Participate in the on-call rotation for 24/7 support of production systems. Collaborate with developers to ensure scalability and reliability are built into new services. Roll out security and infrastructure features proactively. 
Manage automated deployments, version control, and release rollouts. Perform Root Cause Analysis (RCA) for incidents and implement long-term fixes. Optimize system performance, conduct capacity planning, and create recovery strategies. Identify and automate repetitive tasks to reduce toil. Leverage CI/CD tools such as Git, Jira, and Jenkins to streamline development workflows. Requirements: 4-6 years of relevant experience in SRE, DevOps, or infrastructure engineering. Bachelor's or Master's degree in Computer Science or a related field. Strong background in Linux/Unix system administration and networking. Hands-on experience with cloud platforms like GCP or AWS. Proficiency in programming languages such as Python, Java, or Go. Experience with monitoring and alerting tools: Grafana, Prometheus, New Relic, etc. Familiarity with configuration management tools. Experience with Kubernetes, Docker, and container orchestration tools is a strong plus. Excellent problem-solving, communication, and team collaboration skills. Be a part of one of India's fastest-growing fintech startups. Build and scale systems that impact millions of users daily. Work with passionate, driven teammates who are redefining financial services. A culture that encourages continuous learning, ownership, and transparency. If you're ready to help shape the future of fintech infrastructure in India, Groww is the place for you. Let's build something extraordinary together.
Technical Lead DevOps
Subex Limited
Position: Technical Lead - DevOps Location: Bangalore Rural, Karnataka, India Department: Data Platform and DevOps Employment Type: Subexian Experience Required: 3 to 6 years Job Overview: We are seeking an experienced Kubernetes Administrator with a strong background in managing containerized environments. The ideal candidate will have 4+ years of hands-on experience in deploying, configuring, and optimizing Kubernetes clusters to drive scalability, reliability, and performance. This is an excellent opportunity to leverage your expertise in Kubernetes orchestration while contributing to the overall success of our platform. Key Responsibilities: Cluster Management: Deploy, configure, and manage Kubernetes clusters both on-premises and across cloud platforms such as AWS, Azure, and GCP. Security & Compliance: Implement best practices for cluster security, including role-based access control (RBAC), network policies, and data encryption at rest and in transit. Automation: Automate cluster provisioning and ongoing management using tools like Terraform, Ansible, or Helm charts, streamlining operations and reducing manual tasks by 40%. Monitoring & Performance: Continuously monitor cluster health and performance metrics using tools like Prometheus and Grafana, ensuring high availability and optimal performance. CI/CD Pipelines: Design and implement CI/CD pipelines for containerized applications using tools such as Jenkins, GitLab CI/CD, and CircleCI to enable smooth continuous delivery. Collaboration: Work closely with development teams to troubleshoot issues, optimize application performance, and ensure compatibility with Kubernetes environments. Security Audits: Conduct regular security audits to identify vulnerabilities and ensure compliance with industry standards. Documentation: Maintain clear and comprehensive documentation for deployment procedures, configuration settings, and troubleshooting guides to enhance knowledge sharing within the team. 
Infrastructure Management: Administer and maintain Linux/Unix servers and virtualization platforms such as VMware or KVM, ensuring seamless operations across the infrastructure. Backup & Recovery: Implement and manage robust backup and disaster recovery solutions to ensure data integrity and minimize system downtime. Technical Support: Provide expert-level technical support for server and network infrastructure-related issues. Required Skills & Qualifications: Proven experience in Kubernetes deployment, configuration, and administration. Strong command of containerization technologies, including Docker and containerd. Hands-on experience with cloud platforms such as AWS, Azure, and GCP. Proficiency in Infrastructure as Code (IaC) tools like Terraform and Ansible. Familiarity with CI/CD pipelines and automation tools like Jenkins and GitLab CI/CD. Excellent troubleshooting and problem-solving skills. Strong communication and collaboration abilities, with the capability to work effectively across cross-functional teams. If you're passionate about DevOps, Kubernetes, and driving the success of containerized environments, we'd love to hear from you!
Platform Engineer
ColorTokens
Platform Engineer Location: Bengaluru, Karnataka, India Full-time, partially remote About ColorTokens: At ColorTokens, we empower businesses to stay operational and resilient in an increasingly complex cybersecurity landscape. Breaches happen, but with our cutting-edge ColorTokens Xshield platform, companies can minimize the impact of breaches by preventing the lateral spread of ransomware and advanced malware. We enable organizations to continue operating while breaches are contained, ensuring critical assets remain protected. Our innovative platform provides unparalleled visibility into traffic patterns between workloads, OT/IoT/IoMT devices, and users, allowing businesses to enforce granular micro-perimeters, swiftly isolate key assets, and respond to breaches with agility. Recognized as a Leader in the Forrester Wave: Microsegmentation Solutions (Q3 2024), ColorTokens safeguards global enterprises and delivers significant savings by preventing costly disruptions. Our culture: We foster an environment that values customer focus, innovation, collaboration, mutual respect, and informed decision-making. We believe in alignment and empowerment so you can own and drive initiatives autonomously. Self-starters and highly motivated individuals will enjoy the rewarding experience of solving complex challenges that protect some of the world's most impactful organizations, be it a children's hospital, a city, or the defense department of an entire country. Position Overview: ColorTokens is looking for a Junior Platform Administrator to assist in managing, maintaining, and optimizing our NextGen Security Information and Event Management (SIEM) platform. The ideal candidate will support the day-to-day operations, help onboard customer log sources, troubleshoot integration issues, and provide technical assistance to the security operations team. This role is ideal for a motivated professional with 3+ years of experience in SIEM administration, security operations, or log management. 
Key Responsibilities: SIEM Platform Administration: Assist in deploying, configuring, and maintaining the NextGen SIEM platform (e.g., Stellar Cyber, Splunk, Sentinel, QRadar, Chronicle, Exabeam). Perform basic updates and patches to ensure platform security and functionality. Monitor SIEM health, performance, and uptime under the guidance of senior administrators. Log Source Management: Onboard new log sources and validate data ingestion. Help troubleshoot log ingestion, parsing, and formatting issues. Maintain log retention policies for compliance. Rule and Use Case Management: Support the development and deployment of detection rules, correlation use cases, and alerts. Tune existing use cases to minimize false positives. Work closely with security analysts to refine alerting strategies. Integration and Automation: Assist in integrating SIEM with other security tools (e.g., EDR, microsegmentation, vulnerability scanners). Work on basic automation tasks using scripting (Python, PowerShell) to enhance SIEM efficiency. Platform Security and Compliance: Support role-based access control (RBAC) and platform security policies. Help ensure SIEM adheres to compliance standards like SOC 2 and ISO 27001. Participate in periodic security audits. Network Debugging & Troubleshooting: Have a basic understanding of TCP/IP, networking concepts, and protocols. Assist in debugging network connectivity issues related to SIEM log ingestion. Use basic network troubleshooting tools. Collaboration and Support: Work alongside SOC analysts, threat hunters, and security engineers. Provide basic technical support for SIEM users. Assist in training and documentation for security teams. Performance Monitoring and Optimization: Monitor storage and indexing performance to ensure optimal operations. Report any performance issues to senior administrators. Contribute to platform health reports and alerting metrics. 
Incident Support: Assist SOC teams in log analysis, incident response, and forensic investigations. Ensure log data is readily available for security incidents. Education and Certifications: Bachelor's degree in Computer Science, Information Security, or a related field. Certifications (Preferred but not mandatory): Splunk Certified User/Admin; Microsoft Certified: Security Operations Analyst Associate; QRadar Certification; any SIEM-related certification. Experience: 3+ years of experience in SIEM administration, security operations, or log management. Hands-on experience with at least one SIEM platform (e.g., Stellar Cyber, Splunk, Sentinel, Chronicle, Exabeam). Basic knowledge of log ingestion, rule creation, and data parsing. Exposure to scripting (Python, PowerShell) for automation. Basic understanding of TCP/IP networking concepts and network debugging. Technical Skills: Understanding of log formats, Syslog, JSON, XML, and data pipelines. Basic knowledge of querying languages (KQL, SPL, AQL). Familiarity with SIEM integration with security tools like EDR, SOAR, NDR. Awareness of MITRE ATT&CK, NIST, or CIS security frameworks. Basic experience with network troubleshooting tools (ping, traceroute, netcat (nc)). Soft Skills: Strong problem-solving and troubleshooting abilities. Good verbal and written communication skills. Ability to work collaboratively in a security operations environment. Preferred Skills: Basic understanding of cloud-based security solutions (AWS, Azure, Google Cloud). Exposure to SOAR tools (e.g., Cortex XSOAR, Splunk Phantom). Interest in machine learning-based anomaly detection for SIEM. Key Metrics for Success: Successful onboarding of log sources. Improvement in log ingestion and parsing accuracy. Contribution to fine-tuning detection rules. Timely resolution of SIEM-related support requests. Ability to identify and troubleshoot basic network connectivity issues.
Senior Associate Infrastructure L1 (AWS)
Publicis Sapient
Senior Associate Infrastructure L1 (AWS) Location: Bengaluru, India Department: Infrastructure & Cloud Engineering Employment Type: Full-Time About the Role: As a Senior Associate Infrastructure L1 (AWS), you will design, implement, and manage secure, scalable, and highly available cloud infrastructure for enterprise digital transformation initiatives. You'll collaborate with cross-functional teams to automate deployments, enable DevOps best practices, and ensure robust observability across systems. Your goal is to reduce time-to-market and optimize performance, cost, and compliance. Key Responsibilities: Architect and build immutable infrastructure on AWS and/or other cloud platforms. Implement and maintain infrastructure as code using Terraform, CloudFormation, or similar. Manage containerized environments using Kubernetes (EKS/GKE), ECS, Docker, and Helm. Implement service mesh (e.g., Istio) for advanced traffic management, monitoring, and security. Develop and manage CI/CD pipelines using Jenkins, GitLab, CircleCI, or similar. Automate build/deployment processes using Groovy, Go, Python, Shell, or PowerShell. Integrate DevSecOps and security scanning into the software delivery lifecycle. Configure and maintain monitoring, logging, and observability using: Monitoring: Prometheus, Grafana, Datadog, New Relic; Logging: ELK Stack, Fluentd, Splunk; Observability: OpenTelemetry, Jaeger, Kiali, CloudTrail, Dynatrace. Troubleshoot infrastructure, performance, and deployment issues. Collaborate with application teams and stakeholders to ensure high performance and availability of deployed services. Required Skills & Qualifications: 4 to 12 years of experience in Cloud Infrastructure & DevOps roles. Bachelor's or Master's degree in Engineering, Computer Science, or related field. Hands-on experience with AWS (EC2, VPC, IAM, Lambda, RDS, CloudWatch, etc.). Solid experience in container orchestration using Kubernetes (EKS/GKE) and infrastructure management. 
Expert in IaC tools like Terraform (preferred), ARM templates, Pulumi, etc. Proficiency in CI/CD pipeline automation and scripting. Familiarity with cloud-native security practices and vulnerability scanning tools. Experience with DNS, Load Balancers, and high-volume application infrastructure setup. Hands-on experience with artifact repositories like Nexus or Artifactory. Preferred Certifications (Nice to Have): Associate-level certifications in AWS, Azure, or GCP; HashiCorp Certified Terraform Associate. Benefits: Gender-neutral workplace policies; 18 paid holidays per year; generous parental leave and new parent transition support; flexible work arrangements; comprehensive Employee Assistance Program (mental & physical wellness). About Publicis Sapient: Publicis Sapient is a global digital transformation partner helping established organizations evolve into their future state through technology, data, consulting, and customer-first experiences. With over 20,000 employees across 53 offices, we combine deep domain knowledge with a start-up mindset and agile methods to solve complex business challenges.
IT Automation Engineer
Samsara Inc
Position: IT Automation Engineer Location: Bengaluru, India (Hybrid, 3 days onsite) Company: Samsara Technologies India Pvt. Ltd. About Samsara: Samsara (NYSE: IOT) is a global leader in the Connected Operations Cloud, empowering organizations in physical operations such as transportation, logistics, construction, and manufacturing to unlock actionable insights from IoT data. With products that improve safety, efficiency, and sustainability, Samsara is at the forefront of digital transformation for industries that power the world. Role Overview: As an IT Automation Engineer within Samsara's Business Technology Core IT team, you'll play a key role in streamlining internal IT systems and processes through automation, infrastructure-as-code, and modern DevOps practices. This position emphasizes cloud infrastructure, scripting, CI/CD, and SaaS system integration to support high-growth scalability and efficiency across Samsara's enterprise environment. This hybrid role requires 3 days per week in the Bengaluru office and 2 days remote, operating in India Standard Time (IST). Key Responsibilities: Automation & Development: Design and build automation scripts and services using Python, Bash, or JavaScript (Node.js). Automate repetitive IT operations across internal platforms, SaaS tools, and cloud infrastructure. Develop and deploy Infrastructure-as-Code (IaC) using Terraform or CloudFormation for AWS environments. Cloud & DevOps Engineering: Manage and provision AWS services such as Lambda, EC2, S3, RDS, ECS, API Gateway, etc. Build and maintain CI/CD pipelines and implement containerized solutions using Docker. Implement observability and monitoring solutions using tools like CloudWatch and Splunk. Collaboration & Strategy: Partner cross-functionally with IT, security, and business systems teams. Lead strategic automation initiatives to improve IT efficiency at scale. Write and maintain clear documentation for automated workflows and tooling. 
Minimum Qualifications: Bachelor's degree in Computer Science, IT, or a related field. 5+ years in IT automation, DevOps, or software development roles. Strong scripting skills in Python, JavaScript (Node.js), or Go. Hands-on experience with AWS services and IaC tools (Terraform preferred). Experience with SaaS ecosystems like Google Workspace, Okta, Slack, Zoom, GitHub, Zendesk. Proficient in version control using Git/GitHub and building CI/CD pipelines. Strong communication and cross-functional collaboration skills. Preferred Qualifications: Familiarity with Atlassian tools (Jira, Confluence), OpsGenie, StatusPage. Experience with Splunk and monitoring large-scale cloud systems. Exposure to Google Cloud Platform (GCP). Experience leading end-to-end internal application development projects.
Performance Engineer
Cognite
Performance Engineer Location: Bengaluru (Whitefield) Team: Product Engineering Employment: Full-Time | Hybrid About Cognite: Cognite is a global SaaS leader driving industrial digital transformation through AI and data. Our flagship products include Cognite Atlas AI and Cognite Data Fusion (CDF), empowering industries such as Oil & Gas, Chemicals, Pharma, and Manufacturing to harness data at scale. Recognized with multiple industry awards, including 2022 Technology Innovation Leader and 2024 Microsoft Energy & Resources Partner of the Year, we lead the way in innovative industrial solutions. Our Values: Impact: Deliver meaningful outcomes with focus and purpose. Ownership: Take initiative, embrace responsibility, and collaborate inclusively. Relentless: Innovate persistently, learn from challenges, and improve continuously. Role & Responsibilities: Design, develop, and execute performance and load tests to ensure system scalability, stability, and reliability of Cognite SaaS products. Identify performance bottlenecks and provide actionable insights for improvement. Build and maintain testing frameworks, scripts, and tools to support performance testing initiatives. Collaborate closely with engineering teams to align testing strategies with system architecture. Monitor production system performance and assist in root cause analysis of performance issues. Share performance optimization best practices via documentation, training, and team discussions. Qualifications: Bachelor's or Master's degree in Computer Science, IT, or related fields. 3-5 years of experience in performance testing and engineering, preferably in SaaS environments. Proficiency with performance testing tools such as JMeter, Gatling, LoadRunner, BlazeMeter, or equivalents. Strong understanding of CI/CD pipelines and container technologies like Kubernetes and Docker. Solid programming skills in Java, Python, or similar languages. Experience with databases like PostgreSQL. 
Familiarity with performance monitoring and analysis tools such as Grafana and Prometheus. Preferred Skills: Agile methodology experience and working in globally distributed teams. Expertise testing large-scale systems and handling high-volume data loads. Knowledge of React and JSON for test data creation and API performance testing. Diverse global community with 70+ nationalities and strong DEI focus. Modern, vibrant office in Whitefield, Bengaluru with hybrid work culture. Flat organizational structure with direct access to leadership and minimal bureaucracy. Collaborate with world-class talent on ambitious and impactful industrial tech projects. Engage with the wider Cognite community through HUB conversations and partnerships. Make an Impact: Join Cognite to help build scalable, high-performing SaaS solutions that empower industrial enterprises globally. We welcome candidates from all backgrounds to apply.
DevOps Engineer
Sarvam
DevOps Engineer Location: Bengaluru, Karnataka, India (On-Site) Department: Engineering Employment Type: Full-Time About Sarvam.ai Sarvam.ai is a cutting-edge generative AI startup headquartered in Bengaluru, India, with a mission to make generative AI accessible and impactful for Bharat. Founded by AI experts, we are dedicated to developing high-performance, cost-effective AI agents tailored for the Indian market. We enable enterprises to tap into new opportunities, build deeper customer connections, and reshape the future of AI for India and beyond. Role Overview We are looking for a DevOps Engineer to join our team and help build and manage scalable, secure, and high-performance infrastructure. In this role, you will be a key contributor to automating deployments, managing cloud infrastructure, optimizing CI/CD workflows, and ensuring system reliability. You will work with cutting-edge technologies, including cloud platforms, containerization, and infrastructure as code (IaC), to deliver impactful solutions for AI-driven products. Key Responsibilities CI/CD Pipelines: Design, implement, and manage CI/CD pipelines for seamless software deployment and integration. Cloud Infrastructure: Deploy and manage cloud infrastructure using Terraform, Kubernetes, and Docker for scalability and high performance. Automation & Scaling: Automate infrastructure provisioning, scaling, and security compliance to support high-availability environments. Monitoring & Optimization: Implement logging, monitoring, and alerting solutions using tools like Prometheus, Grafana, ELK Stack, or CloudWatch to monitor system performance and optimize resource utilization. Security & Compliance: Enhance security and compliance by managing IAM policies, encryption, and vulnerability scanning. Troubleshooting & Root Cause Analysis: Troubleshoot system failures, perform root cause analysis, and implement improvements to ensure reliability and uptime. 
Collaboration: Work closely with development teams to ensure smooth deployment and operation of AI models and applications. Must-Have Skills & Qualifications: Educational Background: Bachelor's degree in Computer Science, Engineering, or related field (2024/2025 graduates). Cloud Expertise: Strong experience with AWS, Azure, or GCP for deploying and managing cloud-based applications. Containerization: Proficiency in Docker and Kubernetes for building and managing containerized applications. Infrastructure as Code (IaC): Experience with Terraform, Ansible, or CloudFormation to automate infrastructure management. CI/CD Pipelines: Experience in setting up automated workflows using tools like GitHub Actions, Jenkins, or GitLab CI/CD for smooth deployments. Monitoring & Logging: Experience with Prometheus, Grafana, ELK, or similar tools to implement effective monitoring and logging solutions. Networking & Security: Strong understanding of firewalls, VPNs, SSL, and cloud security best practices for secure infrastructure. Version Control: Proficiency with Git for managing code repositories and version control workflows. Problem Solving: Strong debugging, troubleshooting, and analytical skills to resolve complex system issues. Good to Have (Preferred Experience): Serverless Computing: Exposure to serverless computing models such as AWS Lambda or Azure Functions. Message Queues: Experience with message queues like Kafka, RabbitMQ, or SQS. Site Reliability Engineering (SRE): Familiarity with SRE practices to ensure the reliability and availability of large-scale systems. Open Source Contributions: Contributions to open-source projects or a strong GitHub portfolio showcasing DevOps expertise and best practices. Impactful Work: Work on AI-driven products that are reshaping the future of technology in India. Innovative Team: Collaborate with a team of AI experts and engineers pushing the boundaries of technology. 
Career Growth: Opportunity to grow in a fast-growing startup at the forefront of the generative AI revolution. Cutting-edge Technologies: Work with cloud technologies, automation, and AI infrastructure to create high-impact products.
Site Reliability Developer 2/3
Oracle
Job Description: Site Reliability Engineer - OCI Cloud Engineering Team Role: Site Reliability Engineer (SRE) Team: OCI OLTP (Online Transaction Processing) Location: Kiev Career Level: IC2 Experience: 5+ years Overview: Oracle Cloud Infrastructure's (OCI) OLTP organization is seeking a Site Reliability Engineer (SRE) to join our dynamic and fast-paced Cloud engineering team. The team is responsible for mission-critical distributed systems and cloud services, and we are looking for an engineer who is deeply interested in databases, distributed systems, and cloud services. If you thrive in an environment where innovation, problem-solving, and operational excellence intersect, this is an exciting opportunity for you! As a member of the SRE services, you will focus on Cloud Services, building deployments, operations, security vulnerability mitigation, and automation. You will be instrumental in fostering a culture of Site Reliability Engineering (SRE) within the team, and your work will directly contribute to ensuring the stability, performance, and reliability of Oracle's global cloud service infrastructure. This role requires someone who is adaptable, highly motivated, and capable of managing large-scale cloud environments with a focus on continuous improvement. Key Responsibilities: Cloud Service Operations & Reliability: Deploy, operate, and maintain large-scale cloud service products in a highly available, fault-tolerant, and scalable environment. Collaborate with internal teams to identify and mitigate cross-team issues that pose operational risks to cloud services. Focus on systems reliability and ensure the continuous availability of cloud services by automating tasks and eliminating manual interventions. Automation & Improvements: Automate operational tasks and improve service deployments, focusing on scaling, performance, and uptime. Contribute to CI/CD systems, ensuring seamless integration and continuous delivery for cloud-based services. 
Leverage automation tools such as Terraform, Grafana, and Bitbucket to streamline operations. Security & Incident Response: Mitigate security vulnerabilities within cloud services and ensure compliance with Oracle's security standards. Participate in on-call rotations to provide immediate troubleshooting support and ensure rapid issue resolution. Perform deep analysis of service performance and collaborate with team members to diagnose and resolve issues that affect service availability or performance. Collaborative Problem-Solving: Work closely with cross-functional teams, including development, database, networking, and storage experts, to ensure the reliability and performance of services. Identify systemic issues and potential risks, develop solutions, and ensure proper documentation and communication with stakeholders. Documentation & Knowledge Sharing: Contribute to documentation such as runbooks, operational guides, and troubleshooting manuals. Mentor junior engineers and share knowledge on best practices for site reliability engineering and cloud service operations. Continuous Learning: Stay up to date with new cloud technologies, trends, and best practices, and actively implement them in your day-to-day work. Technical and Professional Requirements: Cloud Services & Infrastructure: 5+ years of experience in SRE, DevOps, or Automation roles with a focus on large-scale infrastructure and cloud services. Hands-on experience with cloud platforms (e.g., OCI, AWS, Azure) and expertise in compute, database, networking, and storage services within cloud environments. Automation & Tooling: Proficiency with automation tools such as Terraform, Grafana, LumberJack, and Shepherd. Solid experience in using CI/CD tools and processes for cloud service deployments and operations. Scripting & Systems: Strong knowledge of scripting languages, particularly Python and Java. 
Familiarity with Linux systems, Docker containers, virtualized infrastructure, and orchestration (e.g., Kubernetes). Performance & Troubleshooting: Excellent troubleshooting skills with a focus on performance, availability, reliability, and scalability of distributed systems. Experience in operating fault-tolerant, highly available, high-throughput distributed systems. Security & Incident Management: Familiarity with security practices and mitigating security vulnerabilities in cloud services. Proven ability to handle incident response and provide efficient troubleshooting during on-call rotations. Collaboration & Communication: Strong verbal and written communication skills, capable of working effectively with diverse teams across multiple geographies. Ability to work in a highly collaborative environment, driving operational excellence and customer satisfaction. Preferred Qualifications: Experience in operating and maintaining multi-tenant, cloud-based infrastructure with a focus on scalability and high availability. Familiarity with tools and platforms like Grafana, Prometheus, and other observability and monitoring tools. Experience in networking and storage technologies in a cloud environment. Joining OCI's OLTP team as an SRE gives you the opportunity to work with cutting-edge technologies and contribute to the operational excellence of Oracle's global cloud infrastructure. This is a chance to grow your skills in a highly dynamic environment and to solve complex problems that directly impact mission-critical cloud services. With a focus on automation, scalability, and high performance, you will be an essential part of a team that powers Oracle's leading cloud services. If you are an experienced engineer passionate about cloud technologies, automation, and ensuring the reliability of large-scale systems, we encourage you to apply and join us in this exciting journey!
Senior Site Reliability Engineer
Couchbase
Job Title: Site Reliability Engineer (SRE) Cloud Platform & Production Pipeline Initiatives Location: Bangalore, India (Office-based role) About Couchbase: As industries race to embrace AI, traditional database solutions fall short of rising demands for versatility, performance, and affordability. Couchbase is leading the way with Capella, the developer data platform for critical applications in our AI-driven world. By uniting transactional, analytical, mobile, and AI workloads into a seamless, fully managed solution, Couchbase empowers developers and enterprises to build and scale applications with unmatched flexibility, performance, and cost-efficiency from cloud to edge. Trusted by over 30% of the Fortune 100, Couchbase is unlocking innovation, accelerating AI transformation, and redefining customer experiences. Come join our mission! Job Overview: As a Site Reliability Engineer (SRE), you will play a pivotal role in managing, optimizing, and maintaining Couchbase s cloud infrastructure for Capella, our Database as a Service (DBaaS) platform. You will be responsible for ensuring the reliability and performance of our cloud service while collaborating closely with engineering teams to improve deployment pipelines, security practices, and overall system health. You will work across cloud platforms and multiple tools to provide guidance, mentorship, and contribute to the strategic direction of cloud operations. Responsibilities: Infrastructure Management: Manage, monitor, and maintain the infrastructure for Capella to ensure reliable operations. Security & Compliance: Implement and manage cloud environments in accordance with company security guidelines, including vulnerability management, penetration testing, and compliance requirements (SOC 2, PCI-DSS, GDPR, HIPAA, etc.). CI/CD & Release Pipeline: Collaborate with engineering teams to optimize CI/CD processes, aiming for a highly resilient deployment strategy, ideally with zero downtime. 
Cloud Optimization: Stay up-to-date with new technologies and industry trends to continuously improve cloud platform architecture and meet the evolving needs of the business. Security Integration: Work with development teams to integrate security scanners within the DevOps lifecycle, enhancing security posture. Leadership & Mentorship: Provide guidance on architecture, code reviews, and technical feedback to improve service reliability, security, cost, and performance. Incident Management: Demonstrate exceptional problem-solving skills, proactively identifying and addressing potential issues before they affect business operations. Collaboration: Partner with development teams, application owners, and stakeholders to integrate best practices and ensure seamless service delivery. Requirements: Experience: 5+ years in Site Reliability Engineering (SRE), DevSecOps, or similar roles, with significant experience working in public cloud environments. Programming & Scripting: Proficiency in languages such as Go, Python, Java, or Ruby. Linux Expertise: High proficiency with Linux operating systems. Kubernetes Management: Experience in managing and maintaining Kubernetes clusters (both self-managed and managed platforms like AWS EKS). Security & Vulnerability Management: In-depth knowledge of security tools and practices (vulnerability management, pen testing, SCA, DAST, SAST), with hands-on experience using tools like Sysdig, Snyk, and Black Duck. Cloud Platforms & Tools: Strong experience with cloud platforms (AWS, GCP, Azure) and open-source tools like Artifactory, Jira, Jenkins, Grafana, Prometheus, Datadog, Thanos, etc. Configuration Management: Proficiency with Terraform, Git, and CI/CD platforms (e.g., CircleCI, GitHub, Spinnaker). Networking Security: Solid understanding of TCP/IP, DNS, HTTP, Firewalls, VPNs, and other networking security concepts. Preferred Skills: Availability & Reliability: Knowledge of SLO/SLA, availability, reliability, and performance concepts.
Incident Management: Experience with on-call rotations and incident management. Database Experience: Familiarity with databases, particularly Couchbase. Security Certifications: Relevant certifications in security or cloud technologies are a plus. Couchbase reimagines database technology to deliver a fast, flexible, and affordable cloud database platform, empowering developers to build applications with exceptional customer experiences. Trusted by over 30% of the Fortune 100, Couchbase drives innovation and customer success through its Capella platform. Benefits at Couchbase: Generous Time Off Program: Flexibility to care for yourself and your family. Wellness Benefits: Access to world-class medical plans, dental, vision, life insurance, and employee assistance programs. Financial Planning: RSU equity program, ESPP, retirement planning, and business travel insurance. Career Growth: Focused on your career development and success. Fun Perks: Ergonomic and comfortable office setup, food & snacks for in-office employees, and more!
Site Reliability Engineer -- Logging And Monitoring
IBM (International Business Machines)
Introduction A career in IBM Software means you'll be part of a team that transforms our customer's challenges into solutions. Seeking new possibilities and always staying curious, we are a team dedicated to creating the world's leading AI-powered, cloud-native software solutions for our customers. Our renowned legacy creates endless global opportunities for our IBMers, so the door is always open for those who want to grow their career. IBM's product and technology landscape includes Research, Software, and Infrastructure. Entering this domain positions you at the heart of IBM, where growth and innovation thrive. Your role and responsibilities In this role, you will build and maintain an observability stack for IBM's Cloud Object Storage service using managed services as well as custom-built services. This stack is used by Cloud Object Storage SREs and devs to understand the health of the service. Work duties and responsibilities include: Design, set up, configure, and implement the COS Monitoring System using technologies such as Elasticsearch, Logstash, Kibana, Kafka, Kafka Mirrors, Filebeat, Grafana, and Sysdig. Automate CI/CD tasks and infrastructure using Ansible, Terraform, Jenkins, and Travis. Experience with microservices and distributed application architecture, such as containers and Kubernetes. Experience with Linux administration and programming languages such as Java, Python, and SQL. Performance and configuration tuning to support the increasing load of data flowing into the COS Monitoring System. Provide design recommendations and thought leadership to provide best-in-class observability as part of the COS Monitoring System. Provide 24x7 on-call customer support on a rotational basis. Design and develop dashboards for metrics analysis. Design, develop, and configure an alerting solution for an end-to-end incident management and recovery process by integrating Sysdig with PagerDuty, email, and Slack.
Required education: Bachelor's Degree. Preferred education: Bachelor's Degree. Required technical and professional expertise: Ability and tenacity to solve increasingly complex technical issues through analysis and a variety of problem-solving techniques. Working knowledge of Object-Oriented Python with demonstrable experience in applying these skills. Working knowledge of Linux environments. Experience working in an Agile-Scrum development environment. Experience using tools such as Jira, GitHub, and logging and monitoring tools. BS in CS, CE, or similar field, plus 2 to 5 years of relevant work experience. Qualification : BS in CS, CE, or similar field, plus 2 to 5 years of relevant work experience.
Lead Software Engineer - Scale & Performance
Team VuNet Systems
Lead Software Engineer - Scale & Performance Location: Bengaluru Experience: 6-12 years About VuNet VuNet is a pioneer in Business Journey Observability, using Big Data and Machine Learning to revolutionize digital experiences in the financial services industry. Our platform delivers end-to-end visibility into customer journeys, helping organizations proactively resolve issues, ensure operational resilience, and deliver superior user satisfaction. With over 28 billion digital transactions monitored every month and serving more than 300 million users globally, VuNet is shaping the future of observability for some of the largest banks and financial institutions. We are Series B funded, part of NASSCOM's DeepTech Club, and recognized by global analysts such as Gartner and Omdia. Your Role: Lead Software Engineer - Scale & Performance As a Lead Software Engineer for Scale & Performance, you'll own the performance and scalability benchmarks for VuNet's observability platform. You will work with cutting-edge technologies, design robust test frameworks, and ensure that our platform scales seamlessly to meet the demands of millions of users. Roles & Responsibilities Own performance and scalability benchmarking for key platform components (ingestion pipelines, data storage, and query services). Design and execute load, stress, soak, and capacity tests across microservices, agents, and ingestion layers. Identify and resolve performance bottlenecks in both infrastructure (CPU/memory/IO) and application layers (API latency, throughput, GC behavior). Develop and maintain performance test frameworks, preferably using Kubernetes-based environments. Collaborate with DevOps and SRE teams to optimize system configurations (Kubernetes, Postgres/TimescaleDB, ClickHouse, Kafka) for scale. Implement OpenTelemetry for service instrumentation to monitor system health and latency (p50/p95/p99 metrics).
Contribute to capacity planning, scaling strategies (horizontal/vertical), and resource optimization. Analyze production incidents related to scaling issues and drive permanent fixes. Work with engineering teams to design scalable architecture patterns and define SLIs/SLOs for system performance. Document performance baselines, tuning guides, and scalability best practices for internal use. What You Bring Mandatory Skills: Strong background in performance engineering for large-scale distributed systems or SaaS platforms. Expertise in Kubernetes, container runtimes (containerd/Docker), and resource profiling in containerized environments. Solid understanding of Linux internals, CPU/memory profiling, and network stack tuning. Hands-on experience with observability tools (Prometheus, Grafana, OpenTelemetry, Jaeger, Loki, Tempo, etc.). Familiarity with observability platform datastores like ClickHouse, PostgreSQL/TimescaleDB, Elasticsearch, or Cassandra. Experience with performance benchmarking tools such as k6, Locust, JMeter, or custom Golang/Python scripts. Ability to interpret system metrics (CPU usage, memory, GC, latency) and correlate across different layers. Nice-to-Have Skills: Experience with agent benchmarking (OpenTelemetry Collector, custom data shippers). Exposure to streaming systems like Kafka, NATS, or Pulsar. Familiarity with CI/CD pipelines for performance testing and regression tracking. Knowledge of cost optimization and capacity forecasting in cloud environments (AWS/GCP/Azure). Proficiency in Go, Python, or Bash scripting for automation and data analysis. Life at VuNet: At VuNet, we're building a world-class observability platform, and we're just getting started. You'll be part of a passionate, problem-solving team that embraces collaboration, fast learning, and staying ahead of emerging technologies like Gen AI. We foster a high-trust, inclusive culture where collaboration, ownership, and innovation are central to our success.
If you're looking to work on cutting-edge tech, make a real impact, and grow with a supportive team, you'll fit right in at VuNet. Benefits: Comprehensive health insurance coverage for you, your parents, and dependents. Mental wellness and 1:1 counseling support. A culture that promotes continuous learning, innovation, and career growth. Transparent, inclusive, and high-trust workplace. Opportunities for skill enhancement with training programs focused on new Gen AI technologies.
Senior QA Engineer
Team VuNet Systems
Senior QA Engineer - AI-Powered Observability Platform Location: Bengaluru Experience: 6-10 years About VuNet VuNet is at the forefront of Business Journey Observability, revolutionizing the financial services industry with Big Data and Machine Learning. Our deep-tech platform provides comprehensive visibility into customer journeys, enabling proactive issue resolution, operational resilience, and superior user experiences. We monitor over 28 billion digital transactions monthly, serving 300 million users globally, and we're powering some of the largest banks and financial institutions in India and MEA. VuNet is Series B funded, part of NASSCOM's DeepTech Club, and recognized by analysts like Gartner and Omdia. Your Role: Senior QA Engineer - AI-Powered Observability Platform As a Senior QA Engineer at VuNet, you'll play a crucial role in ensuring the quality and reliability of our VuSmartMaps Observability Platform. You'll lead the design and implementation of cutting-edge test automation, performance validation, and reliability frameworks across distributed systems that handle billions of telemetry events. Working closely with development, operations, and QA teams, you will drive quality across the entire platform and play a key role in ensuring that our systems are scalable, resilient, and performant. Roles & Responsibilities Quality Strategy Ownership: Own the end-to-end quality strategy for observability platform components (metrics, logs, tracing, alerting, dashboards, MLOps). Automated Testing: Build and maintain automated test suites for data pipelines, APIs, and integration flows involving tools like Prometheus, Grafana, Loki, Elastic, and OpenTelemetry. Performance Validation: Design and execute tests to validate high-throughput, distributed systems under real-world load conditions, ensuring performance benchmarks are met.
Test Frameworks Development: Develop and maintain test frameworks and tools using Python, Go, Bash, pytest, k6, Playwright, and others. System Reliability & Alerting: Define and implement test coverage for system reliability, alerting accuracy, and visualization correctness. Collaboration: Partner with developers, SREs, and DevOps teams to shift quality left in the development lifecycle, contributing to CI/CD pipelines and automation workflows using GitOps tools. Automation Integration: Integrate automated test suites into smoke, functional, and regression pipelines using Jenkins, Spinnaker, and other CI/CD tools. Mentorship: Mentor junior QA engineers, establish best practices, and ensure consistency in the QA discipline across the team. What You Bring Mandatory Skills: Experience: Minimum 6+ years in software quality engineering, with a focus on automated testing, performance, and reliability. Scripting/Programming: Proficiency in at least one scripting or programming language (JavaScript, Python, Go). CI/CD Systems: Experience with CI/CD systems such as GitHub Actions, Jenkins, or ArgoCD. Debugging Skills: Excellent debugging skills and the ability to analyze code quality and system performance. Distributed Systems Knowledge: Familiarity with Kafka, Kafka Streams, ClickHouse DB, and distributed systems. Kubernetes & Microservices: Strong experience testing Kubernetes-native systems, Helm deployments, and microservices. Observability Tools: Knowledge of observability tools like Prometheus, Grafana, Elastic Stack, OpenTelemetry, Loki, or Jaeger. Tooling & Deployment: Proficiency in Jenkins, Spinnaker, GitOps, Kubernetes, and Docker. Testing Experience: Hands-on experience in various types of testing (functional, performance, load, etc.) and knowledge of testing tools. Documentation Skills: Ability to create clear documentation (e.g., release notes, troubleshooting guides, and migration guides). 
Nice-to-Have Skills: Performance Testing: Experience designing and executing performance and load testing for high-traffic applications. Web Services & Systems Design: Understanding of web services and distributed systems architecture. Cross-Functional Communication: Excellent communication skills with the ability to coordinate across multiple teams. Life at VuNet: At VuNet, we're building a world-class observability platform, proudly Made in India, and we're just getting started. Join a passionate team of problem-solvers who love tackling complex challenges and stay ahead of the curve with technologies like Gen AI. We offer an environment where collaboration, innovation, and learning are at the core of everything we do. You'll have the opportunity to work on cutting-edge technologies and make a real impact on a product that powers leading banks and financial institutions globally. Benefits: Comprehensive health insurance coverage for you, your parents, and dependents. Mental wellness support and 1:1 counseling. A learning culture that promotes growth, innovation, and ownership. Transparent, inclusive, and high-trust workplace culture. Exposure to Gen AI and integrated technology workspaces. Support for career development with various training programs to enhance your skills and expertise.
Gen AI Support Engineer-2
Exotel
Gen AI Support Engineer-2 Location: Bengaluru Experience: 4-7+ years Employment Type: Full-time About Us Exotel is the leading full-stack customer engagement platform and virtual telecom operator for emerging markets. Since its inception in 2011, Exotel has been powering 50 million daily engagements across voice, video, and messaging channels. We provide our unified customer engagement solutions to over 6000 companies globally, including industry leaders like Ola, Swiggy, Flipkart, GoJek, Byjus, Urban Company, HDFC Bank, Zomato, and Oyo. With $100 million in Series D funding and an ARR of $60 million, Exotel is a growth-stage company poised for massive impact. Overview We're seeking a Gen AI Support Engineer-2 to join our team. As an L2 Support Engineer, you will be the highest level of technical escalation within the support organization. Your role will encompass system reliability, platform integrity, troubleshooting mission-critical production issues, and collaborating with engineering teams for architecture feedback. Additionally, you'll help mentor junior engineers and improve operational processes and tools for large-scale environments. If you're passionate about writing clean code with Python and Django and want to contribute to a fast-paced, mission-driven company, this role is for you! Responsibilities Mission-Critical Issue Resolution: Own the resolution of high-priority, time-sensitive production issues. Root Cause Analysis (RCA): Lead RCA reviews and push for systemic improvements in system architecture and processes. Performance Optimization: Identify bottlenecks and propose architectural changes to improve system performance and scalability. Patch Management: Assist in configuring, deploying, and testing patches, releases, and application updates to production environments. SME for Production Systems: Serve as the Subject Matter Expert (SME) for Exotel's production systems and integrations.
Cross-Team Collaboration: Work with Delivery, Product, and Engineering teams to influence system design, rollout strategies, and improvement plans. Mentorship: Lead and mentor L1/L2 engineers on troubleshooting best practices and continuous learning. Code Writing & Automation: Write clean, maintainable code for internal tools, scripts, and automation using Python and Django. Support Tooling: Automate recovery workflows and design support tools for proactive monitoring. Operational Excellence: Establish and improve SLAs, monitoring dashboards, alerting systems, and operational runbooks to ensure system reliability. Must Have Skills Backend Development Support: 3+ years of experience in backend development support, production support, or DevOps/SRE roles. Core Technologies: Proficiency in Python, Django, SQL, and troubleshooting in Linux. Web Technologies: Strong understanding of HTML, CSS, JavaScript, and other web technologies. Distributed Systems & Cloud: Experience working with distributed systems, cloud architecture (AWS), Docker, and Kubernetes. Automation: Strong scripting skills with Bash/Python for automation and operational support. CI/CD & Observability: Good understanding of CI/CD, observability tools, and release management workflows. Communication Skills: Excellent communication, leadership, and incident command skills for managing production issues and cross-functional collaboration. Nice to Have Experience with AI-powered systems and machine learning technologies. Familiarity with monitoring systems like Prometheus, Grafana, or Elasticsearch. Knowledge of microservices architectures and scaling distributed systems. Innovative Work: Be at the forefront of cloud-based communications technology and AI-driven customer engagement platforms. Impact: Play a key role in maintaining and optimizing systems that power millions of customer interactions daily. 
Growth Opportunities: Be part of a fast-growing company with ample learning opportunities and career development. Collaborative Environment: Work in a supportive, inclusive environment where your input and ideas matter. Competitive Benefits: Comprehensive benefits package including health insurance, mental wellness support, and more.
Senior Java Web Backend Engineer
BlueOptima
Position: Senior Java Web Backend Engineer Job Type: Full-time Location: Bengaluru Department: Engineering About BlueOptima: At BlueOptima, our vision is to become the global reference for optimizing the performance of software engineers across all industries. We provide industry-leading objective metrics in software development, enabling large organizations to deliver better software, faster, and at a lower cost through technology that pushes the limits of what has been done before. As a fast-growing global company, we've consistently doubled our headcount and revenue year over year, without external investment. Our headquarters is in London, with additional offices in Mexico, India, and the US. Our diverse team consists of 210+ employees from 34+ nationalities and speaks over 25 languages. We foster an open-minded environment and encourage employees to create their own success stories within this high-performance atmosphere. Job Description: We are looking for a Senior Java Web Backend Engineer with extensive experience in designing, building, and maintaining scalable SaaS applications using Java/J2EE technologies. The ideal candidate will be a tech enthusiast, committed to excellence, and eager to take on a leadership role as a mentor to a team of talented engineers. You'll be part of a self-managed Agile team, where you will actively contribute to improving development processes, bringing new ideas to the table, and proposing improvements in methodology, management, and organization. Key Responsibilities: Application Development & Maintenance: Design, develop, implement, test, and maintain application software components. Requirements Analysis: Analyze client requirements and convert them into technical specifications, ensuring alignment with project goals. Feature Ownership: Take ownership of development for new features and continuous improvements to the platform.
Performance Optimization: Identify and resolve performance bottlenecks, ensuring high scalability and efficiency of the system. Architecture Improvement: Identify architectural inefficiencies, and create and execute a roadmap to address and resolve them. Leadership & Mentorship: Lead and mentor junior developers, fostering their technical growth and career development. Client Interaction: Provide technical support to client-facing teams and occasionally interact with clients to resolve issues related to your component. What You Need to Succeed at BlueOptima: Education: Minimum Bachelor's degree in Computer Science or equivalent. Self-Sufficiency: Ability to work autonomously with minimal supervision. Problem-Solving Skills: Strong analytical and problem-solving capabilities, coupled with a can-do attitude. Agile Methodologies: Experience with Agile methodologies (e.g., SCRUM, Sprints) and leading small Scrum teams. Commitment to Excellence: Focused on completing tasks efficiently and reliably while identifying the best approach to solving complex problems. Must-Have Technical Skills: Java Expertise: 5+ years of experience with Java, J2EE/Java EE, Spring, and Spring Boot. Architectural Knowledge: Solid understanding of Monolithic, SOA, and Microservices architectures. Concurrency & Thread-Safety: Strong knowledge of Java concurrency patterns and experience building thread-safe applications. Database Skills: Expertise in relational databases, partitioning, indexing techniques, and SQL (PostgreSQL). System Design: Experience creating high and low-level design documents based on application architecture. Linux Proficiency: Familiarity with Linux shell and command-line tools. Testing Skills: Strong grasp of unit testing and integration testing frameworks. Cloud Platform Experience: Hands-on experience with cloud platforms like AWS, Azure, or Google Cloud (e.g., S3, EC2, Lambda). 
Message Queues & Streaming: Familiarity with message queues (e.g., Kafka, RabbitMQ, SQS) for high-performance, scalable systems. Monitoring & Logging: Experience with monitoring and logging tools (e.g., Prometheus, Grafana, Datadog, ELK Stack, Splunk). At BlueOptima, we believe in accelerating your career progression. You'll have the opportunity to strengthen your skills, take on diverse challenges, and quickly grow within the organization. We support your development every step of the way, with a clear path to leadership and technical expertise in a fast-paced, innovative environment. Qualification : Bachelor's degree in Computer Science or equivalent
DevOps + Tester
Sourcefuse
Job Title: DevOps + Tester Location: Bangalore, India Experience: 4-5 years Industry: IT Services Job Type: Full-time Role Overview This hybrid DevOps + QA role focuses on: Ensuring mobile application performance and reliability. Driving automation, CI/CD, and continuous improvement. Designing and executing automated test scripts and performing integration, regression, and performance testing. Supporting innovation and scalable software deployment in alignment with Rakuten's standards. You'll collaborate closely with development, QA, and operations teams, while improving infrastructure and testing frameworks. Key Skills & Tools CI/CD Tools: Jenkins, Bamboo, Docker. Testing: Automation, Integration, Regression, Performance Testing. Cloud Platforms: AWS, Azure, GCP. Salesforce Ecosystem: 1-2 years hands-on experience preferred. API Integration: Including legacy systems. Test Scripting Tools: Open-source or commercial frameworks. Solid grasp of software architecture, high availability, and transaction-intensive systems. Responsibilities Monitor and optimize app performance. Develop and maintain automated test scripts. Execute integration and regression testing. Conduct performance tests during pipeline integration. Collaborate across DevOps, development, and QA teams. Maintain detailed test documentation. Conduct unit tests, code reviews, and QA validations. Ensure service quality and customer satisfaction. Education & Qualifications Bachelor's degree in CS, IT, Engineering, or related field (required). MBA or advanced degree (preferred). Salesforce Admin or PD certification (preferred). Ideal Candidate Traits Strong DevOps + Testing blend with cloud experience. Effective communication with technical and non-technical teams. Strategic thinker with planning skills. Thrives in fast-paced environments, managing multiple priorities. Interview Process 2 Technical Rounds Qualification : Bachelor's degree in CS, IT, Engineering, or related field (required)
Enterprise Infra Automation Architect
Infosys
Job Title: Enterprise Infrastructure Automation Architect Location: Bengaluru, India Experience: 16-20 Years Service Line: Cloud & Infrastructure Services Educational Qualifications: B.E., B.Tech, M.Tech, BCA, MCA, MBA Role Overview: We are looking for a seasoned Enterprise Infrastructure Automation Architect to lead the design and implementation of automation strategies across our global IT infrastructure. This role is pivotal in driving enterprise-wide automation initiatives, streamlining operations, and enabling digital transformation through scalable and secure infrastructure automation solutions. Key Responsibilities: Infrastructure Automation Strategy & Roadmap: Define and maintain the enterprise automation strategy aligned with organizational goals and IT objectives. Identify automation opportunities across compute, storage, network, virtualization, cloud, and data center domains. Establish automation goals, KPIs, and success metrics for continuous improvement. Evaluate and recommend emerging automation technologies and frameworks. Solution Design & Architecture: Design scalable, secure, and maintainable automation architectures for enterprise infrastructure. Define enterprise-wide automation standards and best practices (e.g., IaC, scripting, orchestration). Select and standardize tools such as Ansible, Terraform, Python, PowerShell, and cloud-native automation services. Build reusable automation frameworks, templates, and modules to ensure consistency. Implementation & Governance: Provide architectural oversight and support during implementation and deployment phases. Ensure compliance with automation standards and governance throughout the lifecycle. Participate in project reviews to ensure strategic alignment with enterprise automation goals. Establish governance processes for managing scripts, workflows, and infrastructure-as-code artifacts. Additional Responsibilities: Proven experience in IT Service Management and remote delivery automation environments.
Ability to articulate the business value and operational impact of automation initiatives. Self-motivated, creative thinker with excellent problem-solving abilities. Excellent communication skills, both verbal and written. Technical & Professional Requirements: In-depth knowledge of enterprise architecture frameworks (e.g., TOGAF, Zachman). Expertise in infrastructure domains including compute, storage, middleware, backup (on-prem & cloud). Experience with public cloud platforms (AWS, Azure, GCP) and hybrid cloud architectures. Proficiency in security standards and best practices across global IT environments. Hands-on experience with monitoring and orchestration tools. Skilled in creating architecture diagrams and workflow visualizations using tools like MS Visio, Lucidchart, etc.
DevOps Engineer
Camsdata Technologies India Pvt. Ltd.
DevOps Engineer Bangalore, India Location: Bangalore (Bengaluru) Experience: 2 to 8 Years Industry: IT Software / Cloud & DevOps Job Summary: We are seeking an experienced DevOps Engineer to design, implement, and manage CI/CD pipelines on AWS and support application deployments. The ideal candidate will have hands-on expertise with AWS services, automation tools, and security integration within DevOps workflows. Key Responsibilities: Design, configure, and maintain CI/CD pipelines using AWS native tools or traditional platforms such as Jenkins, GitHub Actions, etc. Deploy applications on AWS using services like AWS Fargate, EBS, S3, CodePipeline, CodeBuild, and others. Onboard applications onto AWS DevOps platform following the required CI/CD workflow. Collaborate with application and operations teams to provide remediation and support for CI/CD pipeline onboarding. Integrate various test automation frameworks and tools into CI/CD pipelines for continuous testing. Implement security scanning and frameworks within pipelines, including SAST, DAST, IAST, and RASP. Monitor the DevOps platform, applications, and infrastructure; respond proactively to incidents and events. Automate operational tasks using Ansible or scripting languages (e.g., Python, Bash). Develop reusable automation assets and scripts to streamline DevOps processes. Required Skills: Proven experience setting up and managing CI/CD pipelines on AWS and other platforms. Strong knowledge of AWS services relevant to DevOps: Fargate, EBS, S3, CodePipeline, CodeBuild. Familiarity with automation tools like Ansible, scripting languages, and infrastructure-as-code. Experience integrating security tools and frameworks within DevOps pipelines. Good troubleshooting and monitoring skills with cloud-native tools and third-party platforms. Excellent collaboration skills for working across development and operations teams. Preferred Qualifications: Bachelor's degree in Computer Science, Engineering, or related field. Certifications
in AWS DevOps (AWS Certified DevOps Engineer) or similar credentials. Experience with container orchestration (e.g., Kubernetes) and Docker. Knowledge of Agile and DevSecOps methodologies. Work on cutting-edge cloud-native DevOps solutions. Collaborate with a dynamic team focused on automation and security. Opportunity for professional growth and certification support. Qualification : Bachelor's degree in Computer Science, Engineering, or related field.
DevOps Engineer
Team VuNet Systems
DevOps Engineer Location: Bengaluru, India Experience: 3-5 Years Job Type: Full-time About VuNet VuNet is a deep-tech leader in Business Journey Observability, leveraging Big Data and Machine Learning to deliver end-to-end digital experience monitoring for major financial institutions. The platform monitors over 28 billion transactions monthly, powering top banks and enterprises in India and MEA. Work on cutting-edge observability technology. Join a Series B funded, award-winning startup recognized by Gartner, Forbes, and NASSCOM. Collaborate in a fast-paced, innovative environment focused on learning and growth. Access to mental wellness support, health insurance (covering family), and career development programs. Role Overview: DevOps Engineer Design, develop, and maintain VuSmartMaps deployments across on-premises, cloud, and hybrid environments. Automate deployments using Infrastructure-as-Code (IaC) and CI/CD pipelines. Manage cybersecurity assessments and remediations for deployments. Collaborate with development teams to improve deployment processes and infrastructure support. Publish VuSmartMaps in cloud marketplaces (AWS, Azure, GCP). Stay current on DevOps, CI/CD, infrastructure orchestration, cybersecurity, AI workflows, and big data technologies. Key Responsibilities Develop and maintain IaC frameworks enabling flexible VuSmartMaps deployment. Build and manage CI/CD pipelines using GitHub Actions, Jenkins. Monitor infrastructure, conduct cybersecurity testing, and manage patching. Improve deployment efficiency and customer experience. Collaborate cross-functionally for seamless integration and rollout. Must-Have Skills 3+ years building/managing CI/CD pipelines (GitHub Actions, Jenkins). Certified/experienced in Kubernetes, Docker, Terraform, Helm, YAML. Hands-on experience with GitOps workflows. Knowledge of web servers (Nginx, Django), identity providers (Active Directory, LDAP), load balancers (Traefik). Experience with databases (PostgreSQL, Elasticsearch, Hadoop
stack) and secrets management (Key Vault). Familiarity with cloud services (AWS, Azure, GCP) across IaaS, PaaS, SaaS layers. Strong Linux and scripting skills (Bash, Python). Excellent communication skills for cross-team collaboration. Good-to-Have Skills Exposure to Red Hat OpenShift, VMware, Ansible, Chef, Puppet. Familiarity with container orchestration tools (Podman, Docker Swarm, Nomad). Experience optimizing dockerized microservices and container images. Benefits Comprehensive health insurance covering you and your family. Mental health and 1:1 counseling support. Learning culture focused on innovation and career growth. Inclusive, transparent workplace culture. Access to new Gen AI tools and integrated tech workspace. Career development and skill enhancement programs.
Oracle Cloud Operation Engineer
Oracle
Job Description: SaaS Cloud Ops Specialist
We are looking for SaaS Cloud Ops specialists to manage and support cloud-based applications, databases, and services. These roles can include designing, planning, implementing, onboarding, configuring, and managing cloud environments and applications; troubleshooting and resolving cloud service issues; maintaining, monitoring, planning, and documenting; and infrastructure-level automation.

Career Level: IC3

Responsibilities
As part of the Oracle Finance GIU - Banking-Application Management Support team, SaaSOps takes complete responsibility for supporting and maintaining cloud-based applications, environments, and databases on OCI (Oracle Cloud). The new hire is expected to support 24x7 Production Operations for SaaS customers, associated banking cloud services, and products.

Candidates should have expertise in at least 3-4 of the following:
- Kubernetes administration (Mandatory)
- Oracle Database administration (Mandatory)
- OCI administration or any other cloud administration (Mandatory)
- Linux (Mandatory)
- Excellent Communication Skills (Mandatory)
- 24x7 Production Operations (Mandatory)
- Expertise in Autonomous Database
- Automation experience
- CI/CD Pipelines
- Knowledge of Git repositories
- Disaster Recovery (DR)

SaaSOps specialists are expected to have strong troubleshooting skills and will work in a ticket-based system to resolve issues and monitor various aspects of the cloud services as part of the day-to-day job. They will also work on critical and non-critical issues from the queues, escalation channels, and other modes of assignment. The candidate is expected to update Service Requests with technical and non-technical solutions, meet SLA requirements, and interact with other functional teams, customers, customer management teams, and Product engineering teams as and when required.
Senior DevOps / Site Reliability Engineer
Blue Yonder
Job Title: Senior DevOps / Site Reliability Engineer
Location: Pune, India
Company: Blue Yonder
Experience: 10 to 13 years
Education: Bachelor's Degree in Computer Science, Engineering, or related STEM fields

Company Overview
Blue Yonder is a leading AI-driven Global Supply Chain Solutions provider and consistently recognized as one of Glassdoor's Best Places to Work. We are driving the next wave of digital transformation in manufacturing and retail, delivering innovative SaaS solutions that power intelligent supply chains across the globe. We are looking for a Senior DevOps / Site Reliability Engineer (SRE) to lead the design, development, deployment, and operational management of our Azure SaaS solution. This role requires strong DevOps, cloud delivery, and infrastructure automation expertise, along with leadership capabilities to guide a growing global team.

Role Overview
In this role, you will be responsible for architecting, planning, and executing end-to-end delivery pipelines, supporting both product development and operational stability. Working closely with platform, product, and architecture teams, you will implement best-in-class DevOps and SRE practices, ensuring scalability, resilience, and cost optimization.

Key Responsibilities
- Architect, design, and manage CI/CD pipelines and infrastructure for a cloud-native, multi-tenant SaaS solution on Azure.
- Lead sprint planning, backlog grooming, and architecture discussions.
- Develop quality automation scripts and tools to reduce manual efforts and enable self-healing, self-service capabilities.
- Identify and resolve operational bottlenecks and proactively improve observability (monitoring, alerting, logging).
- Participate in code reviews, ensure secure and scalable designs, and mentor junior and mid-level engineers.
- Collaborate with stakeholders to understand business and technical requirements and translate them into actionable user stories.
- Implement and enforce cloud cost optimization strategies.
- Conduct post-incident reviews with a blameless culture to identify root causes and drive continuous improvements.
- Automate service requests and standard operational procedures.
- Drive improvements to the team's continuous integration pipeline, ensuring rapid and reliable deployments.
- Stay updated with the latest DevOps, SRE, and cloud technologies and bring innovative ideas to the table.
- Participate in team hiring and actively contribute to onboarding new team members.

Technical Environment
- Languages: Java, Python, PowerShell, Shell Scripting
- DevOps Tools: Azure DevOps, GitHub Actions, Jenkins
- Cloud: Microsoft Azure (ARM Templates, AKS, Event Hub, HDInsight, Azure AD, Application Gateway, Virtual Networks)
- Architecture: Microservices, Kubernetes, Docker, Event-driven architecture
- Frameworks: Spring Boot, Hibernate
- Monitoring & Logging: Elasticsearch, Spark, Kafka
- Databases: RDBMS, NoSQL
- Version Control: Git

Required Skills & Experience
- Bachelor's Degree (STEM preferred) with 10 to 13 years of experience in DevOps, Cloud Delivery, or Site Reliability Engineering.
- Proven hands-on experience with Azure Cloud Services.
- Expertise in setting up and optimizing CI/CD pipelines.
- Strong scripting experience: Shell and PowerShell are mandatory; Python is a plus.
- Strong understanding of container technologies (Docker, Kubernetes) and microservices architecture.
- Experience integrating and managing third-party monitoring and logging tools.
- Strong problem-solving skills and ability to work with global, cross-functional teams.
- Excellent communication and stakeholder management skills.

Nice to Have
- Development experience in Java or Python.
- Experience working in agile teams with a product-centric mindset.
- Experience working in manufacturing or retail domains.
- Exposure to AI/ML-driven monitoring and observability tools.

- Work with cutting-edge technologies on globally impactful solutions.
- Collaborate with diverse and talented teams across the US, India, and the UK.
- Foster your career growth through mentorship, continuous learning, and leadership opportunities.
- Experience an inclusive, flexible work culture where innovation and creativity thrive.

Diversity, Inclusion, Value & Equality (DIVE)
At Blue Yonder, we are committed to building an inclusive environment where everyone feels empowered to be themselves. All qualified applicants will receive consideration for employment regardless of race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or protected veteran status.

Qualification: Bachelor's Degree in Computer Science, Engineering, or related STEM fields