Site Reliability Engineer Logging AND Monitoring Jobs in Bengaluru
595 Jobs Found
Lead Platform Engineer
Team Vunet Systems
Lead Platform Engineer - Observability Solutions
Location: Bengaluru
Experience: 6-10 Years
Function: Observability Engineering | Platform Architecture | SRE Enablement

Join VuNet: Redefining Digital Observability at Scale
VuNet is transforming the future of digital experiences through Business Journey Observability, combining Big Data and AI/ML to empower real-time visibility across payments, banking, and financial services. Monitoring 28+ billion transactions/month, our platform is trusted by top financial institutions and powers over 300 million users. Backed by Series B funding and recognized by Gartner, NASSCOM, and Forbes, we are leading the charge in building a new category of observability, proudly Made in India for global impact.

Your Role: Lead Platform Engineer
As the Lead Platform Engineer, you will architect and drive the development of packaged observability solutions across 100+ infrastructure and application technologies. You will define golden signals, build data collection strategies, and lead the standardization of alerts, dashboards, and RCA workflows for platforms like Kubernetes, Oracle DB, and Tomcat. This is a cross-functional leadership role that sits at the intersection of product, platform, DevOps, and SRE. You will lead a team and influence how observability is delivered, scaled, and adopted across complex environments.

Key Responsibilities

Observability Solution Development
Design and lead the delivery of observability packages for databases, middleware, cloud-native, and legacy platforms. Define and implement data collection pipelines, including agents, APIs, logs, metrics, traces, and service discovery. Establish golden signals, SLIs/SLOs, and health KPIs for performance, availability, and anomaly detection.

Dashboards, Alerts & RCA
Develop standardized, reusable dashboards, alerts, reports, and troubleshooting playbooks. Automate RCA workflows to improve MTTR and reduce alert fatigue.
Platform Enablement & Integration
Work with engineering to enhance agent capabilities and support new data sources/formats. Guide implementation of platform features for better observability at scale.

Team Leadership & Governance
Lead and mentor a team of observability engineers and specialists. Define design patterns, reusable modules, and version-controlled libraries.

Stakeholder Collaboration
Partner with product managers, DevOps, SREs, and customer teams to gather requirements, align priorities, and validate use cases. Ensure deliverables are scalable, well-documented, and production-ready.

What You Bring

Must-Have Skills
6-10 years of experience in observability, platform engineering, or SRE roles. Hands-on with tools like Prometheus, Grafana, OpenTelemetry, ELK/EFK, Datadog, Splunk. Strong understanding of logs, metrics, traces, profiling, and collection strategies. Experience developing solutions for platforms like Kubernetes, Oracle, PostgreSQL, Tomcat, etc. Proficient in Python, Shell scripting, APIs, and automation tools (Terraform, etc.). Familiar with alert fatigue mitigation, anomaly detection, and RCA frameworks. Excellent communication, technical leadership, and documentation skills.

Nice to Have
Experience managing an observability marketplace or solution catalog. Contributions to open-source observability projects. Certifications in Kubernetes, Observability platforms, or cloud providers (AWS/GCP/Azure). Background in ITSM tools, CMDBs, or incident workflow automation.

At VuNet, you'll help build a category-defining observability platform that's already transforming critical infrastructure for leading financial institutions. You'll work with passionate engineers, push technical boundaries, and grow in a high-trust, high-impact environment.

What You'll Experience:
Ownership of key observability initiatives impacting 300M+ users. Collaboration with SRE, DevOps, and product teams across real-time financial systems.
Opportunity to experiment with and shape Gen AI, ML, and emerging telemetry trends.

Perks & Benefits
Health insurance for you, your parents, and dependents. 1:1 mental wellness support. Training programs, certifications, and career growth opportunities. Transparent, inclusive, and high-trust work culture. Access to cutting-edge technology and Gen AI-powered workspaces.
Associate ML Ops
Mpokket Financial Services Private Limited
Job Title: Associate ML Ops
Location: Bangalore
Department: Data Science
Employment Type: Full-time
Experience: 1-2 years

Job Overview
We are seeking a motivated and detail-oriented Associate ML Ops to join our Data Science team. In this role, you will be responsible for supporting the deployment, monitoring, and scaling of machine learning models in production environments. You'll collaborate closely with data scientists and engineers to build robust MLOps pipelines and ensure model reliability, scalability, and performance. If you are passionate about bringing machine learning models to life and have hands-on experience in productionizing ML systems, we'd love to hear from you.

Key Responsibilities
Deploy and maintain machine learning models in production environments using best-in-class tools like Databricks and MLflow. Collaborate with data scientists to translate experimental models into scalable, production-ready systems. Monitor model performance, accuracy, and overall health through automated tools and custom strategies. Build and maintain RESTful APIs using Python frameworks such as Flask or Django to serve ML models. Write efficient and optimized SQL and NoSQL queries for data extraction and transformation. Apply software engineering best practices, including version control, testing, and documentation, to MLOps workflows. Work with Python libraries like Pandas, PySpark, scikit-learn, SQLAlchemy, and Requests. Troubleshoot issues related to model deployment, API performance, or data integration pipelines.

Minimum Qualifications
Bachelor's or Master's degree in Computer Science, Statistics, Econometrics, Operations Research, or a related technical field. 1-2 years of hands-on experience in solving analytical or machine learning problems in production settings.
Must-Have Technical Skills
Hands-on experience with Databricks and MLflow
Proven expertise in deploying ML models in real-world applications
Strong understanding of data structures, algorithms, OOP, and software engineering principles
Experience building and maintaining REST APIs using Python
Proficiency in SQL and NoSQL
Excellent Python programming and debugging skills
Familiarity with core Python libraries used in ML and data processing: Pandas, scikit-learn, PySpark, SQLAlchemy, etc.

Nice-to-Have Skills
Exposure to Kafka for streaming and batch data processing
Familiarity with Git and CI/CD pipelines
Experience with Python multiprocessing or worker/queue systems
Understanding of event-driven or asynchronous programming models

This is an exciting opportunity to work at the intersection of data science and engineering. You'll play a key role in productionizing cutting-edge models and ensuring they deliver real business impact.

Qualification: Bachelor's or Master's degree in Computer Science, Statistics, Econometrics, Operations Research, or a related technical field
ML Ops Engineer
Mpokket Financial Services Private Limited
Job Title: ML Ops Engineer
Location: Bangalore
Department: Data Science
Employee Type: Full-time
Experience Required: 3-5 years

Position Overview
We are seeking an experienced and motivated ML Ops Engineer to join our Data Science team. In this role, you will be responsible for deploying, monitoring, and maintaining machine learning models in production environments. You will work closely with data scientists, engineers, and product teams to ensure models are scalable, reliable, and aligned with business objectives. This role is ideal for professionals who are passionate about building robust ML pipelines and bringing machine learning solutions into real-world applications at scale.

Key Responsibilities
Deploy and manage machine learning models in production environments, ensuring scalability, reliability, and performance. Build and maintain MLOps pipelines using platforms like Databricks and MLflow. Monitor model performance, accuracy, and health; implement alerting and diagnostics as needed. Develop and maintain RESTful APIs using Python frameworks such as Flask or Django to serve ML models. Optimize data workflows and collaborate with engineering teams to improve model integration and performance. Design strategies for automated model retraining, deployment, and version control. Write clean, maintainable, and efficient code using Python, adhering to OOP principles and best practices. Write complex queries using SQL and work with NoSQL databases to support data pipelines and feature stores. Leverage Python libraries such as PySpark, Pandas, scikit-learn, SQLAlchemy, and Requests.

Minimum Qualifications
Bachelor's or Master's degree in Computer Science, Statistics, Econometrics, Operations Research, or a related technical field. 3-5 years of experience in building, deploying, and monitoring machine learning solutions in production.

Must-Have Skills
Experience with Databricks and MLflow for model training and deployment.
Proven expertise in machine learning model deployment and monitoring in live environments. Strong programming skills in Python, with solid understanding of data structures, algorithms, and OOP concepts. Experience developing RESTful APIs using Flask or Django. Proficient in SQL and NoSQL database operations. Hands-on knowledge of libraries such as Pandas, PySpark, scikit-learn, SQLAlchemy, and Requests. Strong analytical, problem-solving, and debugging skills.

Good-to-Have Skills
Experience with Kafka streaming and batch processing. Familiarity with CI/CD pipelines and version control systems like Git. Understanding of Python multiprocessing, worker/queue systems, and asynchronous/event-driven programming.

This is a unique opportunity to work at the intersection of machine learning and DevOps. You'll play a critical role in operationalizing AI models and making them a core part of our product offerings. If you enjoy building scalable systems and solving real-world ML engineering challenges, we'd love to meet you.

Qualification: Bachelor's or Master's degree in Computer Science, Statistics, Econometrics, Operations Research, or a related technical field
Site Reliability Engineer
Groww
Position: Site Reliability Engineer
Location: Bengaluru

About Groww
At Groww, we're on a mission to make financial services simple, accessible, and transparent for every Indian. As one of India's fastest-growing financial platforms, we help millions take control of their financial future through a wide range of products. We're a team driven by ownership, radical customer-centricity, and a deep passion for challenging the status quo. From intuitive design to robust engineering, everything we build is grounded in what our customers need. If you're excited about building systems that power the future of finance in India, we'd love to hear from you.

Our Vision
To empower every Indian with the knowledge, tools, and confidence to make sound financial decisions. Our goal is to be the most trusted financial partner for millions across the country.

Our Core Values
Customer Obsession: We put our users first, always.
Extreme Ownership: We own everything we do, end-to-end.
Simplicity: We keep things simple, effective, and intuitive.
Long-term Thinking: We focus on sustainable, impactful decisions.
Transparency: We believe in open communication and collaboration.

Role Overview:
As a Site Reliability Engineer (SRE) at Groww, you will be responsible for ensuring our systems are highly available, performant, and secure. You will work closely with engineering and infrastructure teams to improve reliability, automate deployments, and manage mission-critical services that power our platform.

Key Responsibilities:
Monitor and troubleshoot issues related to system performance, availability, and security. Define and maintain SLIs, SLOs, and Error Budgets to improve system reliability. Use tools like Grafana to analyze and report on metrics and trace data. Participate in the on-call rotation for 24/7 support of production systems. Collaborate with developers to ensure scalability and reliability are built into new services. Roll out security and infrastructure features proactively.
Manage automated deployments, version control, and release rollouts. Perform Root Cause Analysis (RCA) for incidents and implement long-term fixes. Optimize system performance, conduct capacity planning, and create recovery strategies. Identify and automate repetitive tasks to reduce toil. Leverage CI/CD tools such as Git, Jira, Jenkins to streamline development workflows.

Requirements:
4-6 years of relevant experience in SRE, DevOps, or infrastructure engineering. Bachelor's or Master's degree in Computer Science or a related field. Strong background in Linux/Unix system administration and networking. Hands-on experience with cloud platforms like GCP or AWS. Proficiency in programming languages such as Python, Java, or Go. Experience with monitoring and alerting tools: Grafana, Prometheus, New Relic, etc. Familiarity with configuration management tools. Experience with Kubernetes, Docker, and container orchestration tools is a strong plus. Excellent problem-solving, communication, and team collaboration skills.

Be a part of one of India's fastest-growing fintech startups. Build and scale systems that impact millions of users daily. Work with passionate, driven teammates who are redefining financial services. A culture that encourages continuous learning, ownership, and transparency.

If you're ready to help shape the future of fintech infrastructure in India, Groww is the place for you. Let's build something extraordinary together.

Qualification: Bachelor's or Master's degree in Computer Science or a related field
Platform Engineer
ColorTokens
Platform Engineer
Location: Bengaluru, Karnataka, India
Full-time, partially remote

About ColorTokens
At ColorTokens, we empower businesses to stay operational and resilient in an increasingly complex cybersecurity landscape. Breaches happen, but with our cutting-edge ColorTokens Xshield platform, companies can minimize the impact of breaches by preventing the lateral spread of ransomware and advanced malware. We enable organizations to continue operating while breaches are contained, ensuring critical assets remain protected. Our innovative platform provides unparalleled visibility into traffic patterns between workloads, OT/IoT/IoMT devices, and users, allowing businesses to enforce granular micro-perimeters, swiftly isolate key assets, and respond to breaches with agility. Recognized as a Leader in the Forrester Wave: Microsegmentation Solutions (Q3 2024), ColorTokens safeguards global enterprises and delivers significant savings by preventing costly disruptions.

Our culture
We foster an environment that values customer focus, innovation, collaboration, mutual respect, and informed decision-making. We believe in alignment and empowerment so you can own and drive initiatives autonomously. Self-starters and highly motivated individuals will enjoy the rewarding experience of solving complex challenges that protect some of the world's most impactful organizations, be it a children's hospital, a city, or the defense department of an entire country.

Position Overview:
ColorTokens is looking for a Junior Platform Administrator to assist in managing, maintaining, and optimizing our NextGen Security Information and Event Management (SIEM) platform. The ideal candidate will support the day-to-day operations, help onboard customer log sources, troubleshoot integration issues, and provide technical assistance to the security operations team. This role is ideal for a motivated professional with 3+ years of experience in SIEM administration, security operations, or log management.
Key Responsibilities:

SIEM Platform Administration
Assist in deploying, configuring, and maintaining the NextGen SIEM platform (e.g., Stellar Cyber, Splunk, Sentinel, QRadar, Chronicle, Exabeam). Perform basic updates and patches to ensure platform security and functionality. Monitor SIEM health, performance, and uptime under the guidance of senior administrators.

Log Source Management
Onboard new log sources and validate data ingestion. Help troubleshoot log ingestion, parsing, and formatting issues. Maintain log retention policies for compliance.

Rule and Use Case Management
Support the development and deployment of detection rules, correlation use cases, and alerts. Tune existing use cases to minimize false positives. Work closely with security analysts to refine alerting strategies.

Integration and Automation
Assist in integrating SIEM with other security tools (e.g., EDR, microsegmentation, vulnerability scanners). Work on basic automation tasks using scripting (Python, PowerShell) to enhance SIEM efficiency.

Platform Security and Compliance
Support role-based access control (RBAC) and platform security policies. Help ensure SIEM adheres to compliance standards like SOC2, ISO 27001. Participate in periodic security audits.

Network Debugging & Troubleshooting
Have a basic understanding of TCP/IP, networking concepts, and protocols. Assist in debugging network connectivity issues related to SIEM log ingestion. Use basic network troubleshooting tools.

Collaboration and Support
Work alongside SOC analysts, threat hunters, and security engineers. Provide basic technical support for SIEM users. Assist in training and documentation for security teams.

Performance Monitoring and Optimization
Monitor storage and indexing performance to ensure optimal operations. Report any performance issues to senior administrators. Contribute to platform health reports and alerting metrics.
Incident Support
Assist SOC teams in log analysis, incident response, and forensic investigations. Ensure log data is readily available for security incidents.

Education and Certifications:
Bachelor's degree in Computer Science, Information Security, or a related field.
Certifications (Preferred but not mandatory):
Splunk Certified User/Admin
Microsoft Certified: Security Operations Analyst Associate
QRadar Certification
Any SIEM-related certification

Experience:
3+ years of experience in SIEM administration, security operations, or log management. Hands-on experience with at least one SIEM platform (e.g., Stellar Cyber, Splunk, Sentinel, Chronicle, Exabeam). Basic knowledge of log ingestion, rule creation, and data parsing. Exposure to scripting (Python, PowerShell) for automation. Basic understanding of TCP/IP networking concepts and network debugging.

Technical Skills:
Understanding of log formats, Syslog, JSON, XML, and data pipelines. Basic knowledge of querying languages (KQL, SPL, AQL). Familiarity with SIEM integration with security tools like EDR, SOAR, NDR. Awareness of MITRE ATT&CK, NIST, or CIS security frameworks. Basic experience with network troubleshooting tools (ping, traceroute, netcat (nc)).

Soft Skills:
Strong problem-solving and troubleshooting abilities. Good verbal and written communication skills. Ability to work collaboratively in a security operations environment.

Preferred Skills:
Basic understanding of cloud-based security solutions (AWS, Azure, Google Cloud). Exposure to SOAR tools (e.g., Cortex XSOAR, Splunk Phantom). Interest in machine learning-based anomaly detection for SIEM.

Key Metrics for Success:
Successful onboarding of log sources. Improvement in log ingestion and parsing accuracy. Contribution to fine-tuning detection rules. Timely resolution of SIEM-related support requests. Ability to identify and troubleshoot basic network connectivity issues.
Senior Full Stack Engineer
Commure
Job Title: Senior Full Stack Engineer
Location: Bengaluru, India
Employment Type: Full-time
Department: Engineering

About Commure
At Commure, we empower healthcare providers by reducing administrative burdens and enabling more time for patient care. Our suite of software and hardware solutions, including AI-powered assistants, RTLS, and workflow automation, is used by over 250,000 clinicians across hundreds of care sites. From clinical documentation and staff safety to patient engagement and remote monitoring, we're transforming healthcare through technology. With the industry entering a pivotal phase of AI-driven transformation, Commure is leading the charge.

About the Role
As a Senior Full Stack Engineer on our Patient Experience Platform team, you'll design and build intuitive, secure, and scalable web applications that enhance patient engagement and streamline healthcare workflows. This is a high-impact role contributing to mission-critical projects with real-world outcomes.

Key Responsibilities
Design and develop full-stack applications that connect patients and healthcare providers. Lead architectural decisions to scale and evolve the platform. Work closely with product, design, QA, and DevOps teams to gather requirements, define solutions, and deliver features. Optimize system performance, reliability, and observability using logging, monitoring, and tracing tools. Maintain cloud infrastructure using Infrastructure-as-Code (IaC) for reproducibility and reliability. Enhance alerting systems to reduce noise and improve incident response. Develop secure authentication and authorization systems that comply with industry standards. Build and maintain CI/CD pipelines, supporting a robust and compliant deployment process. Participate in on-call rotations and production support. Document processes, configurations, and troubleshooting steps for internal knowledge sharing. Promote a culture of engineering excellence through code reviews, best practices, and mentorship.
Qualifications

Required
Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
3+ years of experience in full-stack software development.
Proficiency in:
Front-end: TypeScript, React, Next.js
Back-end: Python and Node.js
Cloud Platforms: AWS, GCP, or Azure
CI/CD: GitHub Actions, Google Cloud Build
Version Control: Git
Containerization: Docker and Kubernetes
Monitoring/Logging: Cloud-native tools and observability practices
Experience with production incident support and on-call rotations.
Strong communication, collaboration, and leadership skills.

Preferred
Familiarity with serverless architectures and microservices. Knowledge of healthcare data standards like HL7, FHIR, and HIPAA compliance. Experience optimizing performance for large-scale distributed systems.

Why Join Commure + Athelas
Mission-Driven Impact: Transforming healthcare, the largest sector in the country.
Top-Tier Investors: Backed by General Catalyst, Sequoia, Y Combinator, Lux, and more.
Exceptional Growth: Combined organizations growing 500% YoY, with Series D funding and strong runway.
Comprehensive Benefits: Competitive compensation, flexible PTO, medical/dental/vision insurance, parental leave (location-dependent).

Join us and help power the future of patient care.

Qualification: Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
Senior Software Engineer, Customer Solutions
Commure
Job Title: Senior Software Engineer - Customer Solutions
Location: Bengaluru, India
Employment Type: Full-time
Department: Engineering

About Commure
Commure is revolutionizing healthcare with AI-powered technologies designed to eliminate administrative overhead and give clinicians more time with patients. Our platform combines advanced LLM AI, RTLS, and workflow automation to streamline clinical operations, improve patient engagement, and enhance care delivery. We support 250,000+ clinicians across hundreds of care sites nationwide, and we're just getting started. If you're passionate about building life-changing solutions in one of the world's most vital industries, now is the time to join.

About the Role
As a Senior Software Engineer on the Customer Solutions team, you'll be instrumental in building and customizing applications on top of our Patient Experience Platform to address client-specific needs. Your work will directly impact how healthcare providers interact with our technology and serve patients better.

Key Responsibilities
Translate business and client requirements into scalable, maintainable technical solutions. Design, develop, and integrate customized applications and services using our core platform. Collaborate with internal teams and customers to prioritize features and maintain a customer-focused development backlog. Build long-term client relationships through technical leadership and delivery excellence. Implement and maintain observability through logging, monitoring, and alerting systems. Apply SRE and DevOps practices to improve stability and incident response. Coordinate testing and quality assurance activities in collaboration with QA teams. Stay informed on healthcare tech trends and integrate innovations into the platform. Participate in client-facing meetings to advise on feasibility, risks, and technical trade-offs. Mentor junior engineers and contribute to a strong engineering culture.
Required Qualifications
Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
3+ years of professional software development experience.
Frontend: React, Next.js, TypeScript
Backend: Python, Node.js
Cloud: Proficiency in AWS, Azure, or GCP with experience in cloud-native architectures
CI/CD: Familiarity with tools like GitHub Actions, Google Cloud Build, etc.
Infrastructure: Experience with Docker, Kubernetes, and IaC principles
Monitoring & Observability: Implemented logging, tracing, and alerting systems
Production Support: Experience with on-call rotations and incident response
Strong communication and collaboration skills with cross-functional teams
Experience working directly with clients to deliver technical solutions
Understanding of APIs, webhooks, and third-party system integrations in healthcare

Preferred Skills
Familiarity with HIPAA, FHIR, HL7, and other healthcare standards
Understanding of data privacy, compliance, and security best practices
Strong problem-solving abilities and adaptability in dynamic environments
Experience in client support, customization, or professional services engineering is a plus

Why You'll Love Working at Commure + Athelas
Mission-Driven Work: Help transform healthcare through meaningful technology.
Elite Backing: Backed by General Catalyst, Sequoia, Y Combinator, and more.
Explosive Growth: 500%+ YoY growth pre-merger and Series D funded.
Competitive Benefits: Flexible PTO, health insurance, parental leave, and more (location-specific).

Be part of the future of healthcare. Join Commure and help build intelligent, scalable systems that truly matter.

Qualification: Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
Software Engineer, Backend (AI Team)
LimeChat
Job Title: Software Engineer, Backend (AI Team)
Location: Bengaluru, India
Company: LimeChat

About LimeChat
LimeChat is building the future of conversational commerce, enabling brands to deliver human-level interactions at scale via WhatsApp and other messaging platforms. As a proud graduate of Y Combinator's Winter 2021 batch, we serve 300+ top-tier brands like HUL, ITC, Wow Skin Science, Piramal Health, and Snitch. Our mission is simple: use Generative AI to automate and personalize customer interactions in e-commerce, now expanding into BFSI, Health, and Retail sectors. If you're a backend engineer who thrives on impact, collaboration, and building innovative systems at scale, this is your opportunity to do work that truly matters.

What You'll Do
Architect and Develop Backend Systems: Design robust, scalable backend architectures for AI products that handle millions of conversations.
Integrations and APIs: Build and maintain seamless, secure integrations with third-party platforms and internal services.
Work with AI Products: Collaborate with ML engineers and product teams to connect AI models and agents to real-time customer journeys.
Database Management: Design and optimize relational and NoSQL databases (e.g., PostgreSQL, MongoDB).
Performance & Reliability: Identify bottlenecks and implement backend improvements to ensure high performance and reliability.
Collaborate Cross-Functionally: Work closely with product, design, and frontend teams to ship features that delight users.
Write Clean, Maintainable Code: Follow best practices in code quality, documentation, and testing. Participate in peer code reviews.
You Should Have

Must-Haves
2-4 years of backend experience in a high-growth tech/startup environment
Proficiency in Python, Node.js, and frameworks like Django
Strong command of SQL and NoSQL databases (e.g., PostgreSQL, MongoDB)
Solid understanding of RESTful API design and best practices
Experience with Git, code reviews, and agile development workflows
Excellent debugging and code analysis skills, including performance optimization

Nice-to-Haves
Hands-on experience with Docker, Kubernetes, or other container orchestration tools
Familiarity with CI/CD pipelines (GitLab CI, Jenkins, etc.)
Experience with API load testing, monitoring, and observability tools
Exposure to AI/ML pipelines and/or conversational AI systems

Why You'll Love Working Here
Massive Impact: Join a lean, fast-moving team where your work directly influences product and user experience.
Innovation-First Culture: Work at the intersection of AI, automation, and customer experience.
Smart Team: Collaborate with ex-founders, IITians, and top engineers.
Fast-Growth Startup: Backed by leading VCs and part of Y Combinator, we're scaling globally.
Ownership and Autonomy: You'll be trusted to take full ownership and drive initiatives end to end.

Quotes We Live By
It's okay to fail. It's not okay to not try.
Do the right thing when others are not looking.

Apply now and be part of the LimeChat revolution.
Lead DevOps Engineer
Neuron7.ai
Lead DevOps Engineer
Location: Bengaluru, India
Employment Type: Full-time, Hybrid

About Neuron7.ai
Neuron7.ai is a rapidly growing AI-first SaaS company that is revolutionizing the world of service intelligence. Backed by top-tier venture capitalists in Silicon Valley and a distinguished group of angel investors, we are recognized as a startup to watch. Our platform enables enterprises to make accurate service decisions at scale by delivering service predictions in seconds through the analysis of both structured and unstructured data. At Neuron7.ai, you will be part of a dynamic and innovative team that is pushing the boundaries of service intelligence. We value creativity, collaboration, and a relentless commitment to innovation. This is your opportunity to make a meaningful impact on cutting-edge products at scale, in a fast-growing startup environment.

About the Team
Join a passionate team of professionals focused on optimizing our infrastructure, deployment processes, and overall system performance. We foster a culture of continuous improvement, where every team member is encouraged to contribute ideas and drive impactful projects. As a Lead DevOps Engineer, you will play a pivotal role in shaping the evolution of our infrastructure and operational efficiency.

What You'll Do:
CI/CD Pipelines: Lead the design, implementation, and management of CI/CD pipelines to automate and streamline deployment processes.
Collaboration: Work closely with software development and IT teams to enhance workflows and ensure efficient release cycles.
System Monitoring: Monitor and troubleshoot system performance to ensure high availability and reliability of applications across environments.
Cloud Infrastructure: Architect and manage cloud infrastructure (AWS, Azure, GCP) for scalable, secure, and performant application environments.
Automation: Automate infrastructure provisioning and configuration management using tools like Terraform, Ansible, or similar technologies.
Security & Compliance: Conduct regular system audits, implement security best practices, and ensure compliance with industry standards. Mentorship: Mentor and guide junior DevOps engineers, fostering a collaborative, knowledge-sharing, and growth-focused environment. Documentation: Document processes, configurations, and standard operating procedures to enhance team efficiency and maintain operational excellence. What We re Looking For: Experience: 8+ years of experience in DevOps engineering or a related field. Cloud Expertise: Extensive knowledge and hands-on experience with cloud platforms (AWS, Azure, GCP) and associated services (EC2, S3, Lambda, etc.). Containerization: Strong experience with containerization technologies such as Docker and Kubernetes for managing microservices. Automation Skills: Proficiency in scripting languages (Python, Bash, Ruby) for automation tasks and infrastructure-as-code management. Monitoring & Logging: Familiarity with monitoring and logging tools such as Prometheus, Grafana, and the ELK stack. Problem-Solving: Excellent problem-solving skills with a proactive, solutions-oriented mindset for resolving operational challenges. Collaboration: Strong communication skills with the ability to work collaboratively across teams and influence operational best practices. What We Do and Value: At Neuron7.ai, we prioritize integrity, innovation, and a customer-centric approach. Our mission is to use advanced AI technology to improve service decision-making and we are committed to delivering excellence in all aspects of our work. Company Perks & Benefits: Competitive salary, equity, and spot bonuses Paid sick leave Latest MacBook Pro for your work Comprehensive health insurance Paid parental leave Work from home or from our vibrant Bengaluru office with flexible work arrangements Our Commitment to Diversity and Inclusion: Neuron7.ai is committed to fostering a diverse and inclusive workplace. 
We ensure equal employment opportunities without discrimination based on race, color, religion, sex, sexual orientation, gender identity, age, disability, national origin, marital status, or any other characteristic protected by law. If you re passionate about optimizing deployment processes, improving infrastructure, and driving operational excellence, we d love to hear from you!
Devops Engineer
Sarvam
DevOps Engineer Location: Bengaluru, Karnataka, India (On-Site) Department: Engineering Employment Type: Full-Time About Sarvam.ai Sarvam.ai is a cutting-edge generative AI startup headquartered in Bengaluru, India, with a mission to make generative AI accessible and impactful for Bharat. Founded by AI experts, we are dedicated to developing high-performance, cost-effective AI agents tailored for the Indian market. We enable enterprises to tap into new opportunities, build deeper customer connections, and reshape the future of AI for India and beyond. Role Overview We are looking for a DevOps Engineer to join our team and help build and manage scalable, secure, and high-performance infrastructure. In this role, you will be a key contributor to automating deployments, managing cloud infrastructure, optimizing CI/CD workflows, and ensuring system reliability. You will work with cutting-edge technologies, including cloud platforms, containerization, and infrastructure as code (IaC), to deliver impactful solutions for AI-driven products. Key Responsibilities CI/CD Pipelines: Design, implement, and manage CI/CD pipelines for seamless software deployment and integration. Cloud Infrastructure: Deploy and manage cloud infrastructure using Terraform, Kubernetes, and Docker for scalability and high performance. Automation & Scaling: Automate infrastructure provisioning, scaling, and security compliance to support high-availability environments. Monitoring & Optimization: Implement logging, monitoring, and alerting solutions using tools like Prometheus, Grafana, ELK Stack, or CloudWatch to monitor system performance and optimize resource utilization. Security & Compliance: Enhance security and compliance by managing IAM policies, encryption, and vulnerability scanning. Troubleshooting & Root Cause Analysis: Troubleshoot system failures, perform root cause analysis, and implement improvements to ensure reliability and uptime. 
Collaboration: Work closely with development teams to ensure smooth deployment and operation of AI models and applications. Must-Have Skills & Qualifications Educational Background: Bachelor's degree in Computer Science, Engineering, or related field (2024/2025 graduates). Cloud Expertise: Strong experience with AWS, Azure, or GCP for deploying and managing cloud-based applications. Containerization: Proficiency in Docker and Kubernetes for building and managing containerized applications. Infrastructure as Code (IaC): Experience with Terraform, Ansible, or CloudFormation to automate infrastructure management. CI/CD Pipelines: Experience in setting up automated workflows using tools like GitHub Actions, Jenkins, or GitLab CI/CD for smooth deployments. Monitoring & Logging: Experience with Prometheus, Grafana, ELK, or similar tools to implement effective monitoring and logging solutions. Networking & Security: Strong understanding of firewalls, VPNs, SSL, and cloud security best practices for secure infrastructure. Version Control: Proficiency with Git for managing code repositories and version control workflows. Problem Solving: Strong debugging, troubleshooting, and analytical skills to resolve complex system issues. Good to Have (Preferred Experience) Serverless Computing: Exposure to serverless computing models such as AWS Lambda or Azure Functions. Message Queues: Experience with message queues like Kafka, RabbitMQ, or SQS. Site Reliability Engineering (SRE): Familiarity with SRE practices to ensure the reliability and availability of large-scale systems. Open Source Contributions: Contributions to open-source projects or a strong GitHub portfolio showcasing DevOps expertise and best practices. Impactful Work: Work on AI-driven products that are reshaping the future of technology in India. Innovative Team: Collaborate with a team of AI experts and engineers pushing the boundaries of technology.
Career Growth: Opportunity to grow in a fast-growing startup at the forefront of the generative AI revolution. Cutting-edge Technologies: Work with cloud technologies, automation, and AI infrastructure to create high-impact products. Qualification: Bachelor's degree in Computer Science, Engineering, or related field
Principal Cloud Development Engineer
Cloud Software Group
Job Title: Principal Cloud Development Engineer Location: Bengaluru, India About Cloud Software Group: Cloud Software Group (CSG), home to Citrix and TIBCO, is one of the largest global providers of cloud-based technologies, empowering over 100 million users worldwide. As a Principal Cloud Development Engineer, you will play a pivotal role in shaping the future of Desktop-as-a-Service (DaaS) solutions, helping deliver secure, scalable, and intelligent platforms that drive modern work experiences from anywhere. We're entering an era of accelerated innovation and transformation; now is the perfect time to bring your technical leadership, cloud expertise, and mentorship mindset to the forefront. About This Team: The DaaS team at CSG is responsible for designing and building scalable and resilient cloud-native microservices that power Citrix's core virtualization offerings. This team collaborates across product, architecture, operations, and customer success groups to build next-gen capabilities on Azure, AWS, and other hybrid environments. Your Role and Responsibilities: As a Principal Cloud Development Engineer, you will be expected to: Lead design and architecture discussions for cloud-native solutions within the Citrix DaaS product line. Drive the development of scalable and secure backend features, with emphasis on business logic, cloud security, and performance. Mentor junior and senior engineers, guiding them in coding best practices, design decisions, and technical growth. Collaborate with Product Managers, UX Designers, Support, and Site Reliability Engineers to build customer-centric features and maintain high service uptime. Contribute to strategic technical initiatives, including the adoption of Gen AI tools, DevSecOps automation, and performance tuning of production systems. Participate in on-call escalation support, helping debug complex issues and lead incident resolution.
Promote a culture of continuous learning and improvement through code reviews, technical sessions, and post-incident analysis. Required Experience and Skills: 14+ years of experience in cloud software development using .NET (C#), Java, or equivalent Object-Oriented Programming languages. Strong computer science fundamentals (algorithms, data structures, systems design). Proven track record in building and leading cloud-native microservices with modern deployment practices (CI/CD, IaC, Kubernetes, Docker). Strong cloud platform expertise, especially in Microsoft Azure or Amazon EC2. Deep understanding of cloud security, including identity/access management, encryption, compliance, and incident response. Advanced knowledge in automation scripting (Python, PowerShell). Familiarity with troubleshooting tools like Sumo Logic, Splunk, or equivalent observability platforms. Experience with Terraform, CI/CD pipelines, and managing Kubernetes-based deployments. Strong communication, collaboration, and mentoring abilities. Preferred Qualifications: Prior experience building secure services in the DaaS, VDI, or enterprise SaaS domain. Hands-on experience with Azure Active Directory, Microsoft AD, or other identity solutions. Moderate understanding of cryptographic protocols and encryption standards. Familiarity with Agile/SAFe development methodologies. Contributions to open-source or technical publications are a plus. Impact: Influence the architecture and direction of mission-critical cloud platforms used globally. Mentorship: Be a technical leader shaping the next generation of engineers. Innovation: Work with a company at the edge of a "Cambrian leap" in cloud evolution. Culture: Inclusive, forward-thinking, and driven by curiosity and collaboration. Flexibility & Benefits: Competitive salary, performance bonus, flexible work model, health insurance, wellness programs, and more. 
Equal Opportunity Statement: Cloud Software Group is committed to Equal Employment Opportunity and prohibits unlawful discrimination of any kind. All qualified applicants will receive consideration without regard to race, color, religion, gender, gender identity or expression, national origin, age, disability, veteran status, or any other characteristic protected by law.
Senior Software Development Engineer Idc Vn Edge
Oracle
Job Description: Senior Software Development Engineer - Oracle Cloud Infrastructure Core Services Development Team Role: Senior Software Development Engineer Team: OCI Virtual Networking Core Services Development Team Location: India Career Level: IC3 Experience: 4+ years Overview: Oracle's Cloud Infrastructure (OCI) is building state-of-the-art infrastructure-as-a-service (IaaS) technologies that operate at high scale across a globally distributed, multi-tenant cloud. The OCI Virtual Networking team is at the heart of this effort, developing distributed, highly available virtual networking services. This team is responsible for foundational cloud services, such as the Virtual Cloud Network (VCN), VPN, Customer Cloud Connectivity, Network Firewalls, and other edge services. As a Senior Software Development Engineer, you will be responsible for designing, developing, and optimizing complex distributed systems that interact with end users and network infrastructure. Your role will involve working on distributed services, developing algorithms for efficient data transfer across networks, and ensuring scalability and reliability within Oracle's cloud environment. You will work closely with a collaborative, agile team of engineers while contributing to building the future of cloud networking services. Key Responsibilities: Software Development & Design: Design, develop, and implement distributed networking services within OCI's Virtual Cloud Network (VCN). Focus on writing clean, maintainable, and optimized code to enhance performance and scalability. Develop and optimize algorithms to ensure efficient data transfer and network operations across the distributed cloud infrastructure. Ensure the performance and scalability of the code, especially when deployed in a cloud environment. Collaboration & Agile Work Environment: Collaborate closely with cross-functional teams in a fast-paced, agile development environment. 
Participate in the full software development lifecycle, from planning and design to testing and deployment. Work with other team members to ensure the integration of various OCI services, with a focus on automation and scalability. Operational Support & Troubleshooting: Contribute to the operational support of production services, including on-call duties. Troubleshoot and resolve complex issues, ensuring high availability and reliability of networking services. Provide technical leadership and contribute to the continuous improvement of the services. Leadership & Mentorship: Take ownership of parts of the service and its components, leading from design to implementation. Mentor junior engineers and provide technical guidance and support. Share knowledge and contribute to the team's growth through code reviews, knowledge-sharing sessions, and coaching. Technical and Professional Requirements: Programming Expertise: Expert-level experience with Java in developing large-scale, high-performance applications. Experience in concurrent programming and the design of distributed systems. Proficiency in solving complex problems related to scalability, performance, and reliability in cloud environments. Cloud & Distributed Systems: Experience in building and maintaining distributed, scalable services, especially within cloud infrastructures. Strong knowledge of cloud technologies and networking protocols. System Design & Optimization: Solid understanding of system architecture, including how components interact in a distributed, cloud-based system. Ability to optimize code for performance and scalability in production environments. Operational Understanding: Experience in operating production services and providing support during on-call rotations. Understanding of troubleshooting complex system issues, particularly in a distributed cloud environment. Team Collaboration & Communication: Ability to work in a collaborative and agile team environment.
Strong verbal and written communication skills for effective coordination across teams. Preferred Qualifications: Experience in Large-Scale Distributed Services: Prior experience in building and scaling distributed services, particularly in cloud or network-related domains. Python Skills: Knowledge of Python for scripting, automation, and solving network-related problems is a plus. Additional Skills: Experience with cloud services, such as VPN, firewalls, network connectivity, and network security. Exposure to containerization technologies such as Docker and orchestration tools like Kubernetes is advantageous. Educational Requirements: Bachelor's or Master's degree in Computer Science, Electrical/Hardware Engineering, or a related field. At Oracle, you will have the opportunity to work on cutting-edge technologies that power cloud networking at a global scale. You will be part of a dynamic and innovative team, contributing to the development of highly scalable and distributed networking services within Oracle's cloud infrastructure. Your expertise will be crucial to driving the evolution of cloud technologies, and you will have a chance to mentor junior engineers while working in a collaborative, fast-paced environment. Qualification: Bachelor's or Master's degree in Computer Science, Electrical/Hardware Engineering, or a related field.
Site Reliability Developer 2/3
Oracle
Job Description: Site Reliability Engineer - OCI Cloud Engineering Team Role: Site Reliability Engineer (SRE) Team: OCI OLTP (Online Transaction Processing) Location: Kiev Career Level: IC2 Experience: 5+ years Overview: Oracle Cloud Infrastructure's (OCI) OLTP organization is seeking a Site Reliability Engineer (SRE) to join our dynamic and fast-paced Cloud engineering team. The team is responsible for mission-critical distributed systems and cloud services, and we are looking for an engineer who is deeply interested in databases, distributed systems, and cloud services. If you thrive in an environment where innovation, problem-solving, and operational excellence intersect, this is an exciting opportunity for you! As a member of the SRE services, you will focus on Cloud Services, building deployments, operations, security vulnerability mitigation, and automation. You will be instrumental in fostering a culture of Site Reliability Engineering (SRE) within the team, and your work will directly contribute to ensuring the stability, performance, and reliability of Oracle's global cloud service infrastructure. This role requires someone who is adaptable, highly motivated, and capable of managing large-scale cloud environments with a focus on continuous improvement. Key Responsibilities: Cloud Service Operations & Reliability: Deploy, operate, and maintain large-scale cloud service products in a highly available, fault-tolerant, and scalable environment. Collaborate with internal teams to identify and mitigate cross-team issues that pose operational risks to cloud services. Focus on systems reliability and ensure the continuous availability of cloud services by automating tasks and eliminating manual interventions. Automation & Improvements: Automate operational tasks and improve service deployments, focusing on scaling, performance, and uptime. Contribute to CI/CD systems, ensuring seamless integration and continuous delivery for cloud-based services.
Leverage automation tools such as Terraform, Grafana, and Bitbucket to streamline operations. Security & Incident Response: Mitigate security vulnerabilities within cloud services and ensure compliance with Oracle's security standards. Participate in on-call rotations to provide immediate troubleshooting support and ensure rapid issue resolution. Perform deep analysis of service performance and collaborate with team members to diagnose and resolve issues that affect service availability or performance. Collaborative Problem-Solving: Work closely with cross-functional teams, including development, database, networking, and storage experts, to ensure the reliability and performance of services. Identify systemic issues and potential risks, develop solutions, and ensure proper documentation and communication with stakeholders. Documentation & Knowledge Sharing: Contribute to documentation such as runbooks, operational guides, and troubleshooting manuals. Mentor junior engineers and share knowledge on best practices for site reliability engineering and cloud service operations. Continuous Learning: Stay up to date with new cloud technologies, trends, and best practices, and actively implement them in your day-to-day work. Technical and Professional Requirements: Cloud Services & Infrastructure: 5+ years of experience in SRE, DevOps, or Automation roles with a focus on large-scale infrastructure and cloud services. Hands-on experience with cloud platforms (e.g., OCI, AWS, Azure) and expertise in compute, database, networking, and storage services within cloud environments. Automation & Tooling: Proficiency with automation tools such as Terraform, Grafana, LumberJack, and Shepherd. Solid experience in using CI/CD tools and processes for cloud service deployments and operations. Scripting & Systems: Strong knowledge of scripting languages, particularly Python and Java. 
Familiarity with Linux systems, Docker containers, virtualized infrastructure, and orchestration (e.g., Kubernetes). Performance & Troubleshooting: Excellent troubleshooting skills with a focus on performance, availability, reliability, and scalability of distributed systems. Experience in operating fault-tolerant, highly available, high-throughput distributed systems. Security & Incident Management: Familiarity with security practices and mitigating security vulnerabilities in cloud services. Proven ability to handle incident response and provide efficient troubleshooting during on-call rotations. Collaboration & Communication: Strong verbal and written communication skills, capable of working effectively with diverse teams across multiple geographies. Ability to work in a highly collaborative environment, driving operational excellence and customer satisfaction. Preferred Qualifications: Experience in operating and maintaining multi-tenant, cloud-based infrastructure with a focus on scalability and high availability. Familiarity with tools and platforms like Grafana, Prometheus, and other observability and monitoring tools. Experience in networking and storage technologies in a cloud environment. Joining OCI's OLTP team as an SRE gives you the opportunity to work with cutting-edge technologies and contribute to the operational excellence of Oracle's global cloud infrastructure. This is a chance to grow your skills in a highly dynamic environment and to solve complex problems that directly impact mission-critical cloud services. With a focus on automation, scalability, and high performance, you will be an essential part of a team that powers Oracle's leading cloud services. If you are an experienced engineer passionate about cloud technologies, automation, and ensuring the reliability of large-scale systems, we encourage you to apply and join us in this exciting journey!
Senior DevOps / Site Reliability Engineer
Blue Yonder
Job Title: Senior DevOps / Site Reliability Engineer Location: Pune, India Company: Blue Yonder Experience: 10 to 13 years Education: Bachelor's Degree in Computer Science, Engineering, or related STEM fields Company Overview Blue Yonder is a leading AI-driven Global Supply Chain Solutions provider and consistently recognized as one of Glassdoor's Best Places to Work. We are driving the next wave of digital transformation in manufacturing and retail, delivering innovative SaaS solutions that power intelligent supply chains across the globe. We are looking for a Senior DevOps / Site Reliability Engineer (SRE) to lead the design, development, deployment, and operational management of our Azure SaaS solution. This role requires strong DevOps, cloud delivery, and infrastructure automation expertise, along with leadership capabilities to guide a growing global team. Role Overview In this role, you will be responsible for architecting, planning, and executing end-to-end delivery pipelines, supporting both product development and operational stability. Working closely with platform, product, and architecture teams, you will implement best-in-class DevOps and SRE practices, ensuring scalability, resilience, and cost optimization. Key Responsibilities Architect, design, and manage CI/CD pipelines and infrastructure for a cloud-native, multi-tenant SaaS solution on Azure. Lead sprint planning, backlog grooming, and architecture discussions. Develop quality automation scripts and tools to reduce manual efforts and enable self-healing, self-service capabilities. Identify and resolve operational bottlenecks and proactively improve observability (monitoring, alerting, logging). Participate in code reviews, ensure secure and scalable designs, and mentor junior and mid-level engineers. Collaborate with stakeholders to understand business and technical requirements and translate them into actionable user stories. Implement and enforce cloud cost optimization strategies.
Conduct post-incident reviews with a blameless culture to identify root causes and drive continuous improvements. Automate service requests and standard operational procedures. Drive improvements to the team's continuous integration pipeline, ensuring rapid and reliable deployments. Stay updated with the latest DevOps, SRE, and cloud technologies and bring innovative ideas to the table. Participate in team hiring and actively contribute to onboarding new team members. Technical Environment Languages: Java, Python, PowerShell, Shell Scripting DevOps Tools: Azure DevOps, GitHub Actions, Jenkins Cloud: Microsoft Azure (ARM Templates, AKS, Event Hub, HDInsight, Azure AD, Application Gateway, Virtual Networks) Architecture: Microservices, Kubernetes, Docker, Event-driven architecture Frameworks: Spring Boot, Hibernate Monitoring & Logging: Elasticsearch, Spark, Kafka Databases: RDBMS, NoSQL Version Control: Git Required Skills & Experience Bachelor's Degree (STEM preferred) with 10 to 13 years of experience in DevOps, Cloud Delivery, or Site Reliability Engineering. Proven hands-on experience with Azure Cloud Services. Expertise in setting up and optimizing CI/CD pipelines. Strong scripting experience: Shell and PowerShell are mandatory; Python is a plus. Strong understanding of container technologies (Docker, Kubernetes) and microservices architecture. Experience integrating and managing third-party monitoring and logging tools. Strong problem-solving skills and ability to work with global, cross-functional teams. Excellent communication and stakeholder management skills. Nice to Have Development experience in Java or Python. Experience working in agile teams with a product-centric mindset. Experience working in manufacturing or retail domains. Exposure to AI/ML-driven monitoring and observability tools. Work with cutting-edge technologies on globally impactful solutions. Collaborate with diverse and talented teams across the US, India, and the UK.
Foster your career growth through mentorship, continuous learning, and leadership opportunities. Experience an inclusive, flexible work culture where innovation and creativity thrive. Diversity, Inclusion, Value & Equality (DIVE) At Blue Yonder, we are committed to building an inclusive environment where everyone feels empowered to be themselves. All qualified applicants will receive consideration for employment regardless of race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or protected veteran status. Qualification: Bachelor's Degree in Computer Science, Engineering, or related STEM fields
Senior Site Reliability Engineer
Couchbase
Job Title: Site Reliability Engineer (SRE) Cloud Platform & Production Pipeline Initiatives Location: Bangalore, India (Office-based role) About Couchbase: As industries race to embrace AI, traditional database solutions fall short of rising demands for versatility, performance, and affordability. Couchbase is leading the way with Capella, the developer data platform for critical applications in our AI-driven world. By uniting transactional, analytical, mobile, and AI workloads into a seamless, fully managed solution, Couchbase empowers developers and enterprises to build and scale applications with unmatched flexibility, performance, and cost-efficiency from cloud to edge. Trusted by over 30% of the Fortune 100, Couchbase is unlocking innovation, accelerating AI transformation, and redefining customer experiences. Come join our mission! Job Overview: As a Site Reliability Engineer (SRE), you will play a pivotal role in managing, optimizing, and maintaining Couchbase's cloud infrastructure for Capella, our Database as a Service (DBaaS) platform. You will be responsible for ensuring the reliability and performance of our cloud service while collaborating closely with engineering teams to improve deployment pipelines, security practices, and overall system health. You will work across cloud platforms and multiple tools to provide guidance, mentorship, and contribute to the strategic direction of cloud operations. Responsibilities: Infrastructure Management: Manage, monitor, and maintain the infrastructure for Capella to ensure reliable operations. Security & Compliance: Implement and manage cloud environments in accordance with company security guidelines, including vulnerability management, penetration testing, and compliance requirements (SOC 2, PCI-DSS, GDPR, HIPAA, etc.). CI/CD & Release Pipeline: Collaborate with engineering teams to optimize CI/CD processes, aiming for a highly resilient deployment strategy, ideally with zero downtime.
Cloud Optimization: Stay up-to-date with new technologies and industry trends to continuously improve cloud platform architecture and meet the evolving needs of the business. Security Integration: Work with development teams to integrate security scanners within the DevOps lifecycle, enhancing security posture. Leadership & Mentorship: Provide guidance on architecture, code reviews, and technical feedback to improve service reliability, security, cost, and performance. Incident Management: Demonstrate exceptional problem-solving skills, proactively identifying and addressing potential issues before they affect business operations. Collaboration: Partner with development teams, application owners, and stakeholders to integrate best practices and ensure seamless service delivery. Requirements: Experience: 5+ years in Site Reliability Engineering (SRE), DevSecOps, or similar roles, with significant experience working in public cloud environments. Programming & Scripting: Proficiency in languages such as Go, Python, Java, or Ruby. Linux Expertise: High proficiency with Linux operating systems. Kubernetes Management: Experience in managing and maintaining Kubernetes clusters (both self-managed and managed platforms like AWS EKS). Security & Vulnerability Management: In-depth knowledge of security tools and practices (vulnerability management, pen testing, SCA, DAST, SAST), with hands-on experience using tools like Sysdig, Snyk, and Black Duck. Cloud Platforms & Tools: Strong experience with cloud platforms (AWS, GCP, Azure) and open-source tools like Artifactory, Jira, Jenkins, Grafana, Prometheus, Datadog, Thanos, etc. Configuration Management: Proficiency with Terraform, Git, and CI/CD platforms (e.g., CircleCI, GitHub, Spinnaker). Networking Security: Solid understanding of TCP/IP, DNS, HTTP, Firewalls, VPNs, and other networking security concepts. Preferred Skills: Availability & Reliability: Knowledge of SLO/SLA, availability, reliability, and performance concepts.
Incident Management: Experience with on-call rotations and incident management. Database Experience: Familiarity with databases, particularly Couchbase. Security Certifications: Relevant certifications in security or cloud technologies are a plus. Couchbase reimagines database technology to deliver a fast, flexible, and affordable cloud database platform, empowering developers to build applications with exceptional customer experiences. Trusted by over 30% of the Fortune 100, Couchbase drives innovation and customer success through its Capella platform. Benefits at Couchbase: Generous Time Off Program: Flexibility to care for yourself and your family. Wellness Benefits: Access to world-class medical plans, dental, vision, life insurance, and employee assistance programs. Financial Planning: RSU equity program, ESPP, retirement planning, and business travel insurance. Career Growth: Focused on your career development and success. Fun Perks: Ergonomic and comfortable office setup, food & snacks for in-office employees, and more!
Devops
Mirafra Technologies
DevOps Engineer Location: Bangalore Experience: 5+ Years Education Qualification: B.E. in Computer Science / Electronics About Mirafra Founded in 2004, Mirafra is a fast-growing global product engineering services company specializing in Semiconductor Design, Embedded Systems, Digital Solutions, and Application Software. With 1,500+ professionals worldwide, we provide cutting-edge solutions to Fortune 500 clients across industries such as Semiconductor, Internet, Aerospace, Networking, Telecom, Medical Devices, and Consumer Electronics. Recognitions: Best Company to Work For, SiliconIndia (2016); Most Promising Design Services Provider, SiliconIndia (2018); Top 10 Admired Companies for Software Services, DigiTech Insight (2022). Key Responsibilities DevOps & Automation Develop automated CI/CD pipelines and manage build & deployment processes. Implement infrastructure automation using scripting (Shell, Batch, Python). Manage configuration, integration, and deployment using DevOps tools. Version Control & Build Management Work with Git, GitLab, Bitbucket for version control. Maintain build systems like Make, CMake and manage dependencies using Pip, Conda, Poetry, Maven. Handle binary management tools like Artifactory, Nexus. Code Quality & Security Utilize Static Code Analysis tools (SonarQube, Pylint, Coverity) for code quality enforcement. Monitor and ensure security compliance in the DevOps lifecycle. Cloud & Containerization Manage cloud-based deployments and monitoring using ELK, Docker, Kubernetes. Implement scalable and resilient infrastructure solutions. Agile & Collaboration Work in an Agile/Scrum environment, collaborating with cross-functional teams. Utilize UML modeling and software development best practices. Skills & Qualifications Education: B.E.
in Computer Science / Electronics Technical Expertise: Scripting & Automation: Shell, Batch, Python CI/CD & Build Tools: Jenkins, Gitlab, Make, CMake Version Control: Git, Bitbucket, Gitlab SCM Static Code Analysis: SonarQube, Pylint, Coverity Package Management: Pip, Conda, Poetry, Maven Binary Management: Artifactory, Nexus Cloud & Containerization: Docker, Kubernetes, ELK Stack Programming Languages: Python, C, C++ Operating Systems: Linux, Unix, Windows Soft Skills: Strong problem-solving and analytical skills. Excellent communication and team collaboration. Ability to work in fast-paced Agile environments. Cutting-edge projects in Semiconductor, Aerospace, Networking, and IoT. Global work environment with top-tier clients. Career growth opportunities and exposure to the latest technologies. Award-winning workplace culture and industry recognition. Excited to take on a challenging DevOps role? Apply now!
Autoit Solutioning Engineer, Lead
Qualcomm
Job Title: Site Reliability Engineer (SRE) General Summary: We are seeking a highly skilled and motivated Site Reliability Engineer (SRE) to join our dynamic team. This role is critical in ensuring the stability, scalability, and security of our infrastructure and services. As an SRE, you will work collaboratively with software engineers, data scientists, and product managers to optimize system reliability while driving automation and continuous improvement. You will be responsible for modernizing traditional services, implementing cutting-edge technology, and proactively managing infrastructure to maintain operational excellence. If you are passionate about automation, DevSecOps, system performance, and infrastructure resilience, this role offers an exciting opportunity to make a meaningful impact. Key Responsibilities: System Monitoring & Incident Response: Continuously monitor system health, detect anomalies, and respond to incidents promptly. Investigate and troubleshoot service-related issues, ensuring minimal disruption. Implement proactive measures to prevent downtime and optimize system stability. Infrastructure Automation & DevOps Implementation: Develop and maintain Infrastructure-as-Code (IaC) scripts to automate deployments and scaling. Automate routine operational tasks to improve efficiency and reduce manual intervention. Leverage DevSecOps practices to ensure secure and resilient deployments. Performance Optimization & Capacity Planning: Collaborate with development teams to enhance software performance and system responsiveness. Identify and resolve system bottlenecks to improve speed, efficiency, and reliability. Forecast resource requirements based on traffic patterns and business growth. Security, Compliance & Risk Management: Implement security best practices and compliance measures across all infrastructure layers. Conduct security audits and ensure systems meet industry-standard security guidelines. 
Proactively assess and mitigate risks associated with infrastructure and deployments. Required Qualifications & Skills: Technical Expertise: Extensive experience with Linux-based environments (Ubuntu, RedHat), including system administration and troubleshooting. Strong proficiency in scripting and automation using Python, Bash, or Go. Experience with containerization and orchestration technologies such as Docker and Kubernetes. Familiarity with CI/CD pipelines and tools like Jenkins, Puppet, Vault, and Splunk. Hands-on experience with cloud platforms (AWS, Azure, or GCP). Problem-Solving & Leadership: Strong analytical skills with the ability to diagnose and resolve complex system issues. Self-driven, highly motivated, and able to work independently in a fast-paced environment. Ability to collaborate cross-functionally and communicate technical solutions effectively. Security & Reliability Focus: Solid understanding of DevSecOps principles and secure system design. Ability to implement monitoring, logging, and alerting solutions to maintain system resilience. Passion for continuous learning and leveraging data-driven approaches for system improvement. Work in a high-impact role that directly contributes to the reliability and scalability of mission-critical systems. Be part of an innovative, forward-thinking team that values automation, collaboration, and continuous improvement. Competitive salary, professional development opportunities, and an environment that fosters growth and innovation. If you are a passionate, results-driven SRE, we invite you to join us and play a pivotal role in shaping the future of our infrastructure.
Sr. Noc Engineer
Databricks
We're growing fast and attracting the best talent in the world. Bricksters, as we call ourselves, are a special mix of smart, curious, quick thinkers. If you ask a Brickster what they love about working here, you'll likely hear about our culture. We are seeking an experienced NOC Engineer to join our team. The successful candidate will be responsible for monitoring critical Databricks infrastructure and developing monitoring tools and alerting dashboards. They will also work closely with stakeholders to investigate and resolve incidents, perform root cause analysis, and propose solutions to increase the reliability and stability of the Databricks unified analytics platform. The impact you will have here: Monitor critical infrastructure, triage alerts to proactively identify incidents, and work with stakeholders to resolve incidents. Investigate incidents and propose solutions to improve platform reliability and stability. Perform root cause analysis for recurring incidents and provide proactive solutions. Develop tooling or automate processes to improve platform monitoring and alerting. Contribute to software development efforts to improve overall service reliability and stability. Communicate effectively with internal stakeholders, including executive staff, to provide incident analysis. Participate in war rooms and temporary communication channels during outages. Demonstrate cross-functional leadership and establish ownership of incidents and outages. Multitask on several incidents and/or projects. Minimum of 5 years of experience as a NOC, SRE, or DevOps engineer. Strong knowledge of cloud technologies such as Azure, AWS, and GCP. Hands-on experience with monitoring, logging, and alerting tools such as ELK, Prometheus, Grafana, PagerDuty, etc. Experience with containers and orchestration technologies such as Docker and Kubernetes. Proficiency in automation and scripting. Linux systems administration skills. Excellent communication skills.
Willingness to learn Databricks products. Bachelor's degree in Computer Science or a related field. About Databricks Databricks is the data and AI company. More than 10,000 organizations worldwide, including Comcast, Condé Nast, Grammarly, and over 50% of the Fortune 500, rely on the Databricks Data Intelligence Platform to unify and democratize data, analytics and AI. Databricks is headquartered in San Francisco, with offices around the globe, and was founded by the original creators of Lakehouse, Apache Spark, Delta Lake and MLflow. To learn more, follow Databricks on Twitter, LinkedIn and Facebook. Benefits At Databricks, we strive to provide comprehensive benefits and perks that meet the needs of all of our employees. For specific details on the benefits offered in your region, please visit https://www.mybenefitsnow.com/databricks. Our Commitment to Diversity and Inclusion At Databricks, we are committed to fostering a diverse and inclusive culture where everyone can excel. We take great care to ensure that our hiring practices are inclusive and meet equal employment opportunity standards. Individuals looking for employment at Databricks are considered without regard to age, color, disability, ethnicity, family or marital status, gender identity or expression, language, national origin, physical and mental ability, political affiliation, race, religion, sexual orientation, socio-economic status, veteran status, and other protected characteristics. Compliance If access to export-controlled technology or source code is required for performance of job duties, it is within Employer's discretion whether to apply for a U.S. government license for such positions, and Employer may decline to proceed with an applicant on this basis alone. Qualification: Bachelor's degree in Computer Science or a related field is required.
Senior Software Engineer - Backend
Nvidia
NVIDIA is searching for a highly motivated senior software engineer for the team that is building capabilities for a next-generation network management and telemetry system in the cloud, using modern design principles at internet scale. The person will be responsible for building distributed cloud applications. It will be a highly scalable, modern network operations toolset that provides visibility, troubleshooting, validation and telemetry for Ethernet networks. What you'll be doing: Development of distributed cloud applications, microservices and SaaS platform with high throughput and reliability. Contribute to applications like data ingestion, distributed computing, near-real-time analytics engines, RESTful APIs and user interfaces. Drive requirement discussions, design and product improvements. Drive improvements in areas like performance, team productivity, automation, quality, monitoring and reliability of applications. Working closely with the system architects, UI/UX and test engineers. What we need to see: Bachelor's/Master's degree in Computer Science/Engineering. 5+ years of experience in complex microservices-based architectures. Extensive programming experience in Scala, Go, Python. Fluent in coding and rapid prototyping. Strong experience in developing, maintaining, and testing of scalable distributed applications. Experience with stream processing frameworks, such as Kafka, Flink, Spark Streaming, Samza, etc. Background with NoSQL databases such as Cassandra, MongoDB. Experience with orchestration/scheduling technologies like Kubernetes, SLURM, Nomad, etc. Ways to stand out from the crowd: Experience with public clouds like AWS. Worked in Reactive application designs (https://www.reactivemanifesto.org/). Experience in network stacks, protocols, SDN. NVIDIA is widely considered to be one of the technology world's most desirable employers. We have some of the most forward-thinking and hardworking people on the planet working for us.
If you're creative, passionate and self-motivated, we want to hear from you! NVIDIA is leading the way in ground-breaking developments in Artificial Intelligence, High-Performance Computing and Visualization. The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services. Qualification: Bachelor's/Master's degree in Computer Science/Engineering
Senior Performance Engineer
Boomi Software
Senior Performance Engineer Are you ready to work on world-changing technologies? Today, organizations need to move with increased agility and insight to grow and thrive. Boomi is one of the hottest tech companies in the SaaS/Cloud industry, named a Leader for the eighth year in a row in the Gartner Enterprise iPaaS Magic Quadrant and recently recognized by Inc. Magazine as one of the best workplaces. Our award-winning, patented technology is transforming the world of integration by making enterprise-class integration technology accessible and affordable to companies of all sizes. Boomi provides the foundation on which your business can evolve and innovate. According to a recent survey by Vanson Bourne, connected businesses are far outpacing their competitors. We help organizations connect everything and engage everywhere across any channel, device or platform. More than 7,000 organizations are using Boomi to run better, faster and smarter. Working at Boomi means doing what you love. We hire trailblazers with an entrepreneurial spirit who can solve challenging problems, make a real impact in technology and want to build something big. If you are passionate about solving hard problems, enjoy working with world-class people and developing cutting-edge technology, you should explore a career with Boomi. Learn more at http://www.boomi.com/ or visit Boomi Careers. Join us as a Performance Engineer on our Performance, Scalability and Resiliency (PSR) Engineering team in Bangalore/Hyderabad, India to do the best work of your career and make a profound social impact. What you'll achieve As a Performance Engineer, you will be responsible for validating and recommending performance optimizations in Boomi's computing infrastructure and software. You will work with our Product Development and Site Reliability Engineering teams on Performance monitoring, tuning and tooling.
You will: Analyze Software Architecture (monolith and micro-service) and identify potential areas of performance, scalability and resiliency improvements. Identify KPIs, perform trending and analysis, identify patterns and engineer remedial solutions for a highly performant, fault-tolerant and resilient platform and application stack. Design, automate and perform scalability and resiliency tests using various tools like JMeter, Chaos Monkey or similar. Use observability stack to improve diagnosability and trending around Performance bottlenecks. Identify performance tuning opportunities and recommend remedial solutions. Take the first step towards your dream career Every Boomer brings something unique to the table. Here's what we are looking for with this role: Essential Requirements Expert in performance engineering fundamentals: arrival rate, workload models, responsiveness, computing resource utilization, time complexity, scalability, resiliency, etc. Expert in monitoring the performance using native Linux OS, Application Performance Management (APM) and Infrastructure monitoring tools. Experience in analyzing crash dump, thread dump, SQL slow query log and identifying performance bottlenecks. Expert in recommending optimal resource configurations in Cloud, Virtual Machine, Container and Container Orchestration technologies. Flexibility to work in a remote and geographically distributed team environment. Desirable Requirements Experience in writing data extraction and custom monitoring tools using any programming language: Java, Python, R, Bash or similar. Experience in capacity planning and modelling using AI/ML, queueing models or similar approaches. Performance tuning experience in Java or similar application code.