System Reliability Engineering (SRE) Jobs in Bengaluru
1598 Jobs Found
Senior QA Engineer
Team VuNet Systems
Senior QA Engineer - AI-Powered Observability Platform Location: Bengaluru Experience: 6-10 years About VuNet VuNet is at the forefront of Business Journey Observability, revolutionizing the financial services industry with Big Data and Machine Learning. Our deep-tech platform provides comprehensive visibility into customer journeys, enabling proactive issue resolution, operational resilience, and superior user experiences. We monitor over 28 billion digital transactions monthly, serving 300 million users globally, and we're powering some of the largest banks and financial institutions in India and MEA. VuNet is Series B funded, part of NASSCOM's DeepTech Club, and recognized by analysts like Gartner and Omdia. Your Role: Senior QA Engineer - AI-Powered Observability Platform As a Senior QA Engineer at VuNet, you'll play a crucial role in ensuring the quality and reliability of our VuSmartMaps Observability Platform. You'll lead the design and implementation of cutting-edge test automation, performance validation, and reliability frameworks across distributed systems that handle billions of telemetry events. Working closely with development, operations, and QA teams, you will drive quality across the entire platform and play a key role in ensuring that our systems are scalable, resilient, and performant. Roles & Responsibilities Quality Strategy Ownership: Own the end-to-end quality strategy for observability platform components (metrics, logs, tracing, alerting, dashboards, MLOps). Automated Testing: Build and maintain automated test suites for data pipelines, APIs, and integration flows involving tools like Prometheus, Grafana, Loki, Elastic, and OpenTelemetry. Performance Validation: Design and execute tests to validate high-throughput, distributed systems under real-world load conditions, ensuring performance benchmarks are met.
Test Frameworks Development: Develop and maintain test frameworks and tools using Python, Go, Bash, pytest, k6, Playwright, and others. System Reliability & Alerting: Define and implement test coverage for system reliability, alerting accuracy, and visualization correctness. Collaboration: Partner with developers, SREs, and DevOps teams to shift quality left in the development lifecycle, contributing to CI/CD pipelines and automation workflows using GitOps tools. Automation Integration: Integrate automated test suites into smoke, functional, and regression pipelines using Jenkins, Spinnaker, and other CI/CD tools. Mentorship: Mentor junior QA engineers, establish best practices, and ensure consistency in the QA discipline across the team. What You Bring Mandatory Skills: Experience: Minimum 6+ years in software quality engineering, with a focus on automated testing, performance, and reliability. Scripting/Programming: Proficiency in at least one scripting or programming language (JavaScript, Python, Go). CI/CD Systems: Experience with CI/CD systems such as GitHub Actions, Jenkins, or ArgoCD. Debugging Skills: Excellent debugging skills and the ability to analyze code quality and system performance. Distributed Systems Knowledge: Familiarity with Kafka, Kafka Streams, ClickHouse DB, and distributed systems. Kubernetes & Microservices: Strong experience testing Kubernetes-native systems, Helm deployments, and microservices. Observability Tools: Knowledge of observability tools like Prometheus, Grafana, Elastic Stack, OpenTelemetry, Loki, or Jaeger. Tooling & Deployment: Proficiency in Jenkins, Spinnaker, GitOps, Kubernetes, and Docker. Testing Experience: Hands-on experience in various types of testing (functional, performance, load, etc.) and knowledge of testing tools. Documentation Skills: Ability to create clear documentation (e.g., release notes, troubleshooting guides, and migration guides). 
Nice-to-Have Skills: Performance Testing: Experience designing and executing performance and load testing for high-traffic applications. Web Services & Systems Design: Understanding of web services and distributed systems architecture. Cross-Functional Communication: Excellent communication skills with the ability to coordinate across multiple teams. Life at VuNet: At VuNet, we're building a world-class observability platform, proudly Made in India, and we're just getting started. Join a passionate team of problem-solvers who love tackling complex challenges and stay ahead of the curve with technologies like Gen AI. We offer an environment where collaboration, innovation, and learning are at the core of everything we do. You'll have the opportunity to work on cutting-edge technologies and make a real impact on a product that powers leading banks and financial institutions globally. Benefits: Comprehensive health insurance coverage for you, your parents, and dependents. Mental wellness support and 1:1 counseling. A learning culture that promotes growth, innovation, and ownership. Transparent, inclusive, and high-trust workplace culture. Exposure to Gen AI and integrated technology workspaces. Support for career development with various training programs to enhance your skills and expertise.
Mobile App and Observability SDK Engineer
Team VuNet Systems
Mobile App and Observability SDK Engineer Experience: 3-6 Years Location: Bengaluru About VuNet VuNet is a pioneer in Business Journey Observability, revolutionizing the financial services industry with Big Data and Machine Learning. Our cutting-edge platform offers end-to-end visibility into customer journeys, driving proactive issue resolution, operational resilience, and superior user satisfaction. With over 28 billion digital transactions monitored monthly, touching 400 million users worldwide, we're already powering leading banks and financial institutions across India and MEA. VuNet is Series B funded, part of NASSCOM DeepTech Club, and recognized globally by analysts like Gartner and Omdia. Your Role: Mobile App and Observability SDK Engineer At VuNet, the Product Development Team is dedicated to delivering exceptional customer experiences through scalable products. We are looking for a Mobile App and Observability SDK Engineer to join this team. In this role, you'll be at the forefront of building high-quality mobile applications and advancing our Mobile Real User Monitoring (MRUM) initiatives. You'll capture and translate mobile performance data into actionable insights, helping improve the performance and user experience of mobile apps across various platforms. If you're passionate about mobile engineering, user experience, and observability, this role offers a unique opportunity to merge these interests into a groundbreaking solution. Roles & Responsibilities Mobile Application Development: Design, develop, and maintain robust, high-performance mobile applications for iOS and Android using Swift, Kotlin, Flutter, or React Native. Testing & Quality Assurance: Implement unit, integration, and UI testing strategies to ensure the app's quality, stability, and regression coverage. Debugging & Profiling: Identify and resolve performance bottlenecks, ANRs, crashes, and memory leaks using tools like Android Studio Profiler, Xcode Instruments, or Flipper.
Crash Analysis & Reporting: Integrate crash analytics tools and develop efficient incident tracking and resolution workflows. Performance Monitoring & Insights: Leverage telemetry, profiling, and analytics data to enhance app performance, responsiveness, and overall user experience. Observability Collaboration: Work with SRE and backend teams to export performance metrics, logs, and traces from mobile clients into centralized observability platforms. Code Quality: Write clean, modular, and well-documented code, adhering to best practices in mobile development and SDK maintenance. What You Bring Mandatory Skills: Mobile App Development: 3+ years of hands-on experience in mobile app development using Flutter, React Native, Swift, or Kotlin (experience in at least two of these). App Lifecycle & Performance: Strong understanding of mobile app lifecycle, UI rendering, asynchronous processing, state management, and performance optimization (ANRs, memory management, network latency). Debugging & Profiling Tools: Proficiency in debugging, profiling, and testing mobile applications using tools like Android Studio Profiler, Xcode Instruments, or Flipper. Crash Analytics: Experience integrating and using crash analytics and reporting tools. CI/CD & SDK Versioning: Familiarity with CI/CD pipelines, automated testing, and SDK versioning. Performance Instrumentation: Interest in observability, monitoring, and performance instrumentation with a willingness to learn OpenTelemetry and RUM concepts. Problem-Solving Mindset: Strong analytical and debugging skills, focused on enhancing performance and reliability. Nice-to-Have Skills: OpenTelemetry & SDKs: Exposure to OpenTelemetry SDKs or other instrumentation frameworks for capturing telemetry data (e.g., traces, metrics, logs). Mobile Observability: Familiarity with mobile observability backends. Session Replay & Mobile Analytics: Knowledge of session replay, user behavior tracking, or mobile analytics SDKs. 
SRE & Monitoring Practices: Understanding of SRE principles, monitoring best practices, and golden signals. Open Source Contributions: Contributions to open-source SDKs or mobile performance tools. Life at VuNet: At VuNet, we're building a world-class observability platform, proudly Made in India. We're just getting started, and we're looking for people like you to join us in tackling some of the most complex challenges in the digital world. Our team is filled with passionate problem-solvers who thrive in a collaborative, fast-paced environment. We embrace continuous learning, adapt quickly, and stay ahead of emerging technologies like Gen AI. If you're looking to work on cutting-edge technology, make a real impact, and grow with a supportive team, you'll feel right at home here at VuNet. Benefits: Comprehensive health insurance coverage for you, your parents, and dependents. Mental wellness and 1:1 counseling support. A learning culture that promotes growth, innovation, and ownership. A transparent, inclusive, and high-trust workplace culture. Access to Gen AI and integrated technology workspaces. Supportive career development programs to expand your skills with various training opportunities.
Lead Platform Engineer
Team VuNet Systems
Lead Platform Engineer - Observability Solutions Location: Bengaluru Experience: 6-10 Years Function: Observability Engineering | Platform Architecture | SRE Enablement Join VuNet - Redefining Digital Observability at Scale VuNet is transforming the future of digital experiences through Business Journey Observability, combining Big Data and AI/ML to empower real-time visibility across payments, banking, and financial services. Monitoring 28+ billion transactions/month, our platform is trusted by top financial institutions and powers over 300 million users. Backed by Series B funding and recognized by Gartner, NASSCOM, and Forbes, we are leading the charge in building a new category of observability, proudly Made in India for global impact. Your Role: Lead Platform Engineer As the Lead Platform Engineer, you will architect and drive the development of packaged observability solutions across 100+ infrastructure and application technologies. You will define **golden signals**, build **data collection strategies**, and lead the standardization of alerts, dashboards, and RCA workflows for platforms like **Kubernetes, Oracle DB, and Tomcat**. This is a cross-functional leadership role that sits at the intersection of product, platform, DevOps, and SRE. You will **lead a team** and influence how observability is delivered, scaled, and adopted across complex environments. Key Responsibilities Observability Solution Development Design and lead the delivery of observability packages for databases, middleware, cloud-native, and legacy platforms. Define and implement data collection pipelines, including agents, APIs, logs, metrics, traces, and service discovery. Establish **golden signals, SLIs/SLOs**, and health KPIs for performance, availability, and anomaly detection. Dashboards, Alerts & RCA Develop standardized, reusable dashboards, alerts, reports, and troubleshooting playbooks. Automate **RCA workflows** to improve MTTR and reduce alert fatigue.
Platform Enablement & Integration Work with engineering to enhance agent capabilities and support new data sources/formats. Guide implementation of platform features for better observability at scale. Team Leadership & Governance Lead and mentor a team of observability engineers and specialists. Define design patterns, reusable modules, and version-controlled libraries. Stakeholder Collaboration Partner with product managers, DevOps, SREs, and customer teams to gather requirements, align priorities, and validate use cases. Ensure deliverables are scalable, well-documented, and production-ready. What You Bring Must-Have Skills 6-10 years of experience in observability, platform engineering, or SRE roles. Hands-on with tools like Prometheus, Grafana, OpenTelemetry, ELK/EFK, Datadog, Splunk. Strong understanding of logs, metrics, traces, profiling, and collection strategies. Experience developing solutions for platforms like Kubernetes, Oracle, PostgreSQL, Tomcat, etc. Proficient in Python, Shell scripting, APIs, and automation tools (**Terraform**, etc.). Familiar with alert fatigue mitigation, anomaly detection, and RCA frameworks. Excellent communication, technical leadership, and documentation skills. Nice to Have Experience managing an observability marketplace or solution catalog. Contributions to open-source observability projects. Certifications in Kubernetes, Observability platforms, or cloud providers (AWS/GCP/Azure). Background in ITSM tools, CMDBs, or incident workflow automation. At VuNet, you'll help build a category-defining observability platform that's already transforming critical infrastructure for leading financial institutions. You'll work with passionate engineers, push technical boundaries, and grow in a high-trust, high-impact environment. What You'll Experience: Ownership of key observability initiatives impacting 300M+ users. Collaboration with SRE, DevOps, and product teams across real-time financial systems.
Opportunity to experiment with and shape Gen AI, ML, and emerging telemetry trends. Perks & Benefits Health insurance for you, your parents, and dependents. 1:1 mental wellness support. Training programs, certifications, and career growth opportunities. Transparent, inclusive, and high-trust work culture. Access to cutting-edge technology and Gen AI-powered workspaces.
Director Quality Engineering
CoinDCX
Director Quality Engineering Experience: 15-20 years Location: Bengaluru Team: Engineering About CoinDCX At CoinDCX, we believe Change Starts Together. We are on a mission to make Web3 accessible to all, building cutting-edge products that solve real-world challenges in security, scalability, and user accessibility. In just six years, we've transformed from India's first crypto unicorn to a platform serving over 125 million users worldwide. As we accelerate Web3 adoption, we are looking for visionary leaders to help us maintain world-class quality and performance standards. Role Overview As Director of Quality Engineering, you will lead and scale our QA and Performance Engineering functions to ensure the reliability, scalability, and security of our fintech products. You'll be responsible for driving the quality strategy across large-scale distributed systems and building a high-performing team passionate about excellence. What You'll Do Leadership & Strategy Lead and grow a team of 50+ QA, automation, and performance engineers. Define and execute a long-term quality engineering strategy aligned with business goals and regulatory requirements. Foster a culture of ownership, accountability, and continuous improvement. Quality Engineering Champion an automation-first approach across functional, regression, and integration testing. Oversee end-to-end validation for core product flows including trading, payments, custody, and compliance. Own testing strategies for microservices architectures and high-throughput APIs. Performance & Scalability Lead performance testing initiatives designed to support systems handling over 1 million TPS with sub-50ms latency. Develop frameworks for continuous performance benchmarking and capacity planning. Collaborate with SRE, DevOps, and Product Engineering to identify and mitigate performance bottlenecks. Non-Functional Testing Ensure comprehensive coverage for reliability, availability, failover, disaster recovery, and security.
Drive chaos testing, fault injection, and compliance-related quality assurance processes. Collaboration & Stakeholder Management Partner with Product, Platform, Security, and Compliance teams to align quality standards with regulatory mandates. Provide executive reporting on quality, system resilience, and risk metrics. Influence cross-functional adoption of best practices in testing and release validation. What You Bring Experience 15+ years in QA and Performance Engineering, with at least 5 years in senior leadership roles. Proven experience managing large, high-growth fintech or financial services engineering teams (50+ members). Technical Expertise Deep expertise in testing large-scale distributed systems. Strong knowledge of performance, load, stress, soak, and chaos testing frameworks. Familiarity with cloud-native environments (AWS, Kubernetes), CI/CD pipelines, and observability tools. Domain Knowledge Extensive background in fintech or financial services (trading, payments, banking). Strong understanding of regulatory and compliance requirements in financial applications. Leadership & Soft Skills Exceptional people leadership, mentoring, and organizational scaling capabilities. Excellent stakeholder management with the ability to influence senior engineering and business leaders. Strategic, data-driven decision-making mindset. You're passionate and constantly curious about Web3 and Virtual Digital Assets (VDA). You act with ownership, drive excellence, and focus on measurable impact. You embrace a "We over Me" philosophy, empowering your team as you grow. Change excites you and fuels your innovation mindset. You think beyond limits, challenging the status quo to push boundaries. Perks That Empower You Design Your Own Benefit: Personalize your perks to fit your lifestyle; whether it's tech, travel, or pets, your priorities come first. Unlimited Wellness Leaves: Take time off as needed to recharge; your health matters most.
Mental Wellness Support: Access free counseling, expert sessions, workshops, and social events to stay balanced. Bi-Weekly Learning Sessions: Sharpen your skills and stay current with ongoing industry trends and knowledge. Join Us If you're ready to lead a high-impact team and help build the future of Web3 quality engineering, we want to HODL you on our team!
DevOps Engineering Manager
Medi Assist
Position: DevOps Engineering Manager Location: Bangalore Experience: 5-10 years Education: BE/BTech/MCA/MTech/MSc Role Overview: We're looking for an experienced DevOps Engineering Manager to lead our cloud infrastructure, automation, and DevOps initiatives. This is a hands-on leadership role focused on driving efficiency, security, and scalability across our IT operations and development pipelines. Key Responsibilities: Cloud & Infrastructure Management: Administer and manage Google Workspace, including user accounts, security policies, and compliance settings. Oversee and optimize AWS resources (EC2, IAM, S3, VPC), ensuring cost-effective and secure cloud operations. Configure and manage A10 vThunder for load balancing and network performance optimization. Serve as Active Directory Administrator, maintaining AD, DNS, and Group Policy Objects (GPOs). Deploy, maintain, and troubleshoot VMware environments to support virtual infrastructure. Security & Compliance: Manage domain and SSL certificates including installation, renewal, and issue resolution. Handle ADFS token certificate renewals to support uninterrupted authentication services. Enforce security best practices across cloud and on-prem environments. Automation & Scripting: Create and maintain automation scripts using Bash, PowerShell, or Python to streamline workflows. Reduce manual intervention and boost system efficiency through smart scripting and task automation. Monitoring & Troubleshooting: Proactively monitor system logs, performance metrics, and security alerts to prevent downtime. Investigate and resolve issues related to network, infrastructure, and cloud environments promptly. Required Skills & Experience: Proven experience with infrastructure automation tools such as Terraform or CloudFormation. Strong understanding of DevOps practices and implementing CI/CD pipelines for cloud deployments. Solid scripting skills in Bash, PowerShell, or Python.
Expertise in managing both cloud-based and on-premise infrastructure. Strong troubleshooting capabilities and a proactive approach to system monitoring. Qualification: BE/BTech/MCA/MTech/MSc
Site Reliability Engineer
Groww
Position: Site Reliability Engineer Location: Bengaluru About Groww At Groww, we're on a mission to make financial services simple, accessible, and transparent for every Indian. As one of India's fastest-growing financial platforms, we help millions take control of their financial future through a wide range of products. We're a team driven by ownership, radical customer-centricity, and a deep passion for challenging the status quo. From intuitive design to robust engineering, everything we build is grounded in what our customers need. If you're excited about building systems that power the future of finance in India, we'd love to hear from you. Our Vision To empower every Indian with the knowledge, tools, and confidence to make sound financial decisions. Our goal is to be the most trusted financial partner for millions across the country. Our Core Values Customer Obsession: We put our users first, always. Extreme Ownership: We own everything we do, end-to-end. Simplicity: We keep things simple, effective, and intuitive. Long-term Thinking: We focus on sustainable, impactful decisions. Transparency: We believe in open communication and collaboration. Role Overview: As a Site Reliability Engineer (SRE) at Groww, you will be responsible for ensuring our systems are highly available, performant, and secure. You will work closely with engineering and infrastructure teams to improve reliability, automate deployments, and manage mission-critical services that power our platform. Key Responsibilities: Monitor and troubleshoot issues related to system performance, availability, and security. Define and maintain SLIs, SLOs, and Error Budgets to improve system reliability. Use tools like Grafana to analyze and report on metrics and trace data. Participate in the on-call rotation for 24/7 support of production systems. Collaborate with developers to ensure scalability and reliability are built into new services. Roll out security and infrastructure features proactively.
Manage automated deployments, version control, and release rollouts. Perform Root Cause Analysis (RCA) for incidents and implement long-term fixes. Optimize system performance, conduct capacity planning, and create recovery strategies. Identify and automate repetitive tasks to reduce toil. Leverage CI/CD tools such as Git, Jira, Jenkins to streamline development workflows. Requirements: 4-6 years of relevant experience in SRE, DevOps, or infrastructure engineering. Bachelor's or Master's degree in Computer Science or a related field. Strong background in Linux/Unix system administration and networking. Hands-on experience with cloud platforms like GCP or AWS. Proficiency in programming languages such as Python, Java, or Go. Experience with monitoring and alerting tools: Grafana, Prometheus, New Relic, etc. Familiarity with configuration management tools. Experience with Kubernetes, Docker, and container orchestration tools is a strong plus. Excellent problem-solving, communication, and team collaboration skills. Be a part of one of India's fastest-growing fintech startups. Build and scale systems that impact millions of users daily. Work with passionate, driven teammates who are redefining financial services. A culture that encourages continuous learning, ownership, and transparency. If you're ready to help shape the future of fintech infrastructure in India, Groww is the place for you. Let's build something extraordinary together. Qualification: Bachelor's or Master's degree in Computer Science or a related field
Technical Lead - DevOps
Subex Limited
Position: Technical Lead - DevOps Location: Bangalore Rural, Karnataka, India Department: Data Platform and DevOps Employment Type: Subexian Experience Required: 3 to 6 years Job Overview: We are seeking an experienced Kubernetes Administrator with a strong background in managing containerized environments. The ideal candidate will have 4+ years of hands-on experience in deploying, configuring, and optimizing Kubernetes clusters to drive scalability, reliability, and performance. This is an excellent opportunity to leverage your expertise in Kubernetes orchestration while contributing to the overall success of our platform. Key Responsibilities: Cluster Management: Deploy, configure, and manage Kubernetes clusters both on-premises and across cloud platforms such as AWS, Azure, and GCP. Security & Compliance: Implement best practices for cluster security, including role-based access control (RBAC), network policies, and data encryption at rest and in transit. Automation: Automate cluster provisioning and ongoing management using tools like Terraform, Ansible, or Helm charts, streamlining operations and reducing manual tasks by 40%. Monitoring & Performance: Continuously monitor cluster health and performance metrics using tools like Prometheus and Grafana, ensuring high availability and optimal performance. CI/CD Pipelines: Design and implement CI/CD pipelines for containerized applications using tools such as Jenkins, GitLab CI/CD, and CircleCI to enable smooth continuous delivery. Collaboration: Work closely with development teams to troubleshoot issues, optimize application performance, and ensure compatibility with Kubernetes environments. Security Audits: Conduct regular security audits to identify vulnerabilities and ensure compliance with industry standards. Documentation: Maintain clear and comprehensive documentation for deployment procedures, configuration settings, and troubleshooting guides to enhance knowledge sharing within the team.
Infrastructure Management: Administer and maintain Linux/Unix servers and virtualization platforms such as VMware or KVM, ensuring seamless operations across the infrastructure. Backup & Recovery: Implement and manage robust backup and disaster recovery solutions to ensure data integrity and minimize system downtime. Technical Support: Provide expert-level technical support for server and network infrastructure-related issues. Required Skills & Qualifications: Proven experience in Kubernetes deployment, configuration, and administration. Strong command of containerization technologies, including Docker and containerd. Hands-on experience with cloud platforms such as AWS, Azure, and GCP. Proficiency in Infrastructure as Code (IaC) tools like Terraform and Ansible. Familiarity with CI/CD pipelines and automation tools like Jenkins and GitLab CI/CD. Excellent troubleshooting and problem-solving skills. Strong communication and collaboration abilities, with the capability to work effectively across cross-functional teams. If you're passionate about DevOps, Kubernetes, and driving the success of containerized environments, we'd love to hear from you!
Systems Development Engineer, Google Cloud
Google Careers
Systems Development Engineer, Google Cloud Location: Bengaluru, Karnataka, India Company: Google Minimum Qualifications Bachelor's degree in Computer Science, Information Technology, or a related field; or equivalent practical experience. 2+ years of experience with systems automation. 2+ years of experience in technical infrastructure (e.g., deployment, maintenance, troubleshooting). Preferred Qualifications 3+ years of experience in systems design and implementation. About the Role As a Systems Development Engineer (SDE) at Google Cloud, you will be part of a team responsible for managing and scaling critical services and infrastructure. This role emphasizes automation, reliability, and observability, using engineering practices to eliminate manual toil and improve system efficiency. Google SDEs design and build the tools and systems that power the infrastructure for Google's services, transforming telemetry into actionable insights and proactively solving operational challenges. You'll have the opportunity to work on impactful, large-scale projects in an environment that fosters learning, collaboration, and growth. Key Responsibilities Participate in on-call rotations and incident response, managing services within your domain. Troubleshoot infrastructure and system issues, evaluate diagnostic data, and recommend solutions. Resolve tickets and bugs within defined service-level objectives (SLOs). Collaborate with primary responders to maintain high availability and reliability of systems. Contribute to the design and implementation of systems and services in related domains. Work directly with customers to gather requirements, define distributed system needs, and propose solutions. Develop automation tools and systems to improve efficiency and reduce operational overhead. About Google Cloud Google Cloud helps organizations transform their business with advanced technologies and enterprise-grade solutions.
With a focus on sustainability, innovation, and scalability, Google Cloud serves customers in over 200 countries and territories, providing the tools and infrastructure necessary to solve the world's most complex business challenges. Qualification: Bachelor's degree in Computer Science or IT-related field, or equivalent practical experience.
Staff Engineer - Core Infrastructure
Eightfold
Staff Engineer - Core Infrastructure Location: Bangalore, Karnataka, India Employment Type: Full-Time | Hybrid Work Model About Eightfold.ai At Eightfold.ai, we're transforming the future of work by leveraging artificial intelligence to connect individuals with career opportunities based on their skills and potential, not just their network. Our Talent Intelligence Platform powers a more diverse, inclusive workforce by helping organizations plan, hire, develop, and retain top talent. With $410M+ in funding and a $2B+ valuation, we are revolutionizing how the world thinks about skills, potential, and careers. If you're passionate about cutting-edge technology, infrastructure, and creating scalable solutions that impact the world, we want you to join us. The Opportunity We're looking for a Staff Engineer to join our Core Infrastructure Team and help scale the backbone of Eightfold's platform. This high-impact role will involve designing, building, and optimizing foundational systems that power everything from search and machine learning infrastructure to developer platforms and observability tools. You will drive system design across our stack and mentor engineering teams to build scalable, resilient systems that enable Eightfold to grow and deliver AI-powered solutions for our customers. What You'll Own & Drive Architect & Scale Core Systems: Design and build scalable infrastructure systems that support Eightfold's AI-driven products, including search, compute, storage, and machine learning infrastructure. Cross-Functional Leadership: Lead cross-team technical initiatives, collaborating with Product, Security, Data, and Platform teams to align with company-wide goals. Hands-On Development: Contribute directly to system design, code reviews, and incident response, ensuring best practices are followed. Mentorship & Leadership: Guide and mentor engineers to help them grow into future leaders, fostering a culture of technical excellence across teams.
Advocate for Engineering Excellence: Champion best practices across areas such as cloud architecture, CI/CD, security, and observability. Solve Complex Infrastructure Challenges: Tackle problems around reliability, scalability, and infrastructure performance, ensuring the systems are robust and perform well at scale. Bring Emerging Tech to Life: Stay on top of the latest trends and technologies, incorporating new scalable design patterns into our architecture. What You Bring 10+ years of experience in backend or infrastructure engineering, with a strong background in building distributed, cloud-native systems. Proven track record in designing and delivering reliable, high-scale services (ideally in AWS, GCP, or Azure environments). Expertise in Infrastructure Technologies: Deep knowledge of containerization, orchestration (Kubernetes), and infrastructure-as-code. Experience with one or more of the following: search infrastructure, ML/AI infrastructure, databases/data warehouses, developer tooling, or platform security. Leadership Experience: A passion for mentoring and guiding engineers, influencing teams and peers, and driving excellence across projects. Strong communication skills, able to translate complex technical challenges into strategic business impact. (Bonus) Experience with SRE principles, cloud security, and compliance for enterprise/government environments. Our Engineering Culture At Eightfold, we believe in ownership over tasks. You won t just be given directions; you ll be trusted to take responsibility and make a measurable impact. We have a growth mindset and continuously improve in all aspects of our work. Collaboration, transparency, and speed are core to everything we do. You ll work in a dynamic, supportive environment where your work directly influences the success of the company and its mission. Meaningful Work: Help shape the future of work by building products that impact careers and businesses globally. 
Growth Opportunities: Be part of a rapidly scaling company where your contributions are highly valued. Competitive Compensation: Attractive salary, equity, and comprehensive benefits package (including medical, vision, and dental coverage). Hybrid Work Model: Work from our Bangalore office twice a week, with flexibility for remote work. Inclusive Culture: We are committed to fostering a diverse and inclusive work environment where everyone feels valued. Equal Opportunity Employer Eightfold.ai is an Equal Opportunity Employer. We do not discriminate based on race, color, religion, sex, sexual orientation, gender identity, national origin, age, or disability. If you re a hands-on, innovative engineer with a passion for building scalable systems and tackling infrastructure challenges, we want to hear from you.
Devops Engineer
Sarvam
DevOps Engineer Location: Bengaluru, Karnataka, India (On-Site) Department: Engineering Employment Type: Full-Time About Sarvam.ai Sarvam.ai is a cutting-edge generative AI startup headquartered in Bengaluru, India, with a mission to make generative AI accessible and impactful for Bharat. Founded by AI experts, we are dedicated to developing high-performance, cost-effective AI agents tailored for the Indian market. We enable enterprises to tap into new opportunities, build deeper customer connections, and reshape the future of AI for India and beyond. Role Overview We are looking for a DevOps Engineer to join our team and help build and manage scalable, secure, and high-performance infrastructure. In this role, you will be a key contributor to automating deployments, managing cloud infrastructure, optimizing CI/CD workflows, and ensuring system reliability. You will work with cutting-edge technologies, including cloud platforms, containerization, and infrastructure as code (IaC), to deliver impactful solutions for AI-driven products. Key Responsibilities CI/CD Pipelines: Design, implement, and manage CI/CD pipelines for seamless software deployment and integration. Cloud Infrastructure: Deploy and manage cloud infrastructure using Terraform, Kubernetes, and Docker for scalability and high performance. Automation & Scaling: Automate infrastructure provisioning, scaling, and security compliance to support high-availability environments. Monitoring & Optimization: Implement logging, monitoring, and alerting solutions using tools like Prometheus, Grafana, ELK Stack, or CloudWatch to monitor system performance and optimize resource utilization. Security & Compliance: Enhance security and compliance by managing IAM policies, encryption, and vulnerability scanning. Troubleshooting & Root Cause Analysis: Troubleshoot system failures, perform root cause analysis, and implement improvements to ensure reliability and uptime. 
Collaboration: Work closely with development teams to ensure smooth deployment and operation of AI models and applications. Must-Have Skills & Qualifications Educational Background: Bachelor's degree in Computer Science, Engineering, or related field (2024/2025 graduates). Cloud Expertise: Strong experience with AWS, Azure, or GCP for deploying and managing cloud-based applications. Containerization: Proficiency in Docker and Kubernetes for building and managing containerized applications. Infrastructure as Code (IaC): Experience with Terraform, Ansible, or CloudFormation to automate infrastructure management. CI/CD Pipelines: Experience in setting up automated workflows using tools like GitHub Actions, Jenkins, or GitLab CI/CD for smooth deployments. Monitoring & Logging: Experience with Prometheus, Grafana, ELK, or similar tools to implement effective monitoring and logging solutions. Networking & Security: Strong understanding of firewalls, VPNs, SSL, and cloud security best practices for secure infrastructure. Version Control: Proficiency with Git for managing code repositories and version control workflows. Problem Solving: Strong debugging, troubleshooting, and analytical skills to resolve complex system issues. Good to Have (Preferred Experience) Serverless Computing: Exposure to serverless computing models such as AWS Lambda or Azure Functions. Message Queues: Experience with message queues like Kafka, RabbitMQ, or SQS. Site Reliability Engineering (SRE): Familiarity with SRE practices to ensure the reliability and availability of large-scale systems. Open Source Contributions: Contributions to open-source projects or a strong GitHub portfolio showcasing DevOps expertise and best practices. Impactful Work: Work on AI-driven products that are reshaping the future of technology in India. Innovative Team: Collaborate with a team of AI experts and engineers pushing the boundaries of technology. 
Career Growth: Opportunity to grow in a fast-growing startup at the forefront of the generative AI revolution. Cutting-edge Technologies: Work with cloud technologies, automation, and AI infrastructure to create high-impact products. Qualification : Bachelor's degree in Computer Science, Engineering, or related field
Engineering Manager- Platform Engineering
Meesho
Engineering Manager - Platform Engineering Location: Bangalore, Karnataka | Department: Tech About the Team At Meesho, we support 5% of Indian households with high-scale e-commerce solutions, and we do it with zero downtime. We value speed over perfection, embrace failures as learning opportunities, and empower teams with a Founder's Mindset. As part of the Platform Engineering team, you'll be building resilient, low-latency, high-throughput systems that serve millions of users daily. We invest in the growth of every engineer through continuous feedback, open communication, and a supportive culture. And yes, we know how to party as hard as we code. About the Role We are looking for a skilled Engineering Manager - Platform Engineering to lead a team responsible for designing, scaling, and optimizing our core infrastructure. This role involves managing large-scale distributed systems, fostering engineering excellence, and collaborating across teams to drive innovation. You'll ensure technical quality, delivery speed, and scalable architecture for all projects under your ownership. What You Will Do Design and allocate technical tasks while maintaining Meesho's engineering standards. Own execution of platform projects from inception to deployment, ensuring scalability and reliability. Conduct regular 1:1s, drive feedback cycles, and support career growth of engineers. Partner closely with Product and Design teams to develop new platform capabilities. Coach engineers on best practices for architecture, performance, and scalability. Monitor project health, sprint progress, and engineering KPIs. Foster a high-performing team culture with strong engineering ownership. What You Will Need Bachelor's or Master's degree in Computer Science or a related technical field. 8+ years of professional software development experience, including 1+ year in team management. Proven experience building large-scale distributed systems. 
Strong coding skills in Java, Python, or Go, and multithreading expertise. Deep understanding of messaging systems (Kafka, etc.), transactional and NoSQL databases. Experience working on cloud platforms like GCP or AWS. Exceptional communication, leadership, and stakeholder management skills. Good to have: Exposure to Elasticsearch, data pipelines, or stream processing systems. About Us Meesho is India's leading e-commerce platform built for the next billion users. With 1.75M+ sellers and a customer base spread across every serviceable pin code, we are democratizing internet commerce by enabling small businesses to sell online at zero commission and with the lowest logistics costs in the industry. From affordable products that reflect local demand to a robust pan-India tech infrastructure, Meesho is transforming how India shops and sells online. Our Culture & Total Rewards At Meesho, we believe in creating a culture of impact, inclusion, and innovation. Our values, reflected in 11 guiding principles or "Mantras", shape how we work, collaborate, and grow together. Why You'll Love Working Here: Compensation: Competitive salary with equity-based rewards tailored to your experience and impact. Wellness: Extensive health insurance for you and your family through our MeeCare Program, mental wellness support, gym discounts, and more. Flexibility & Leave: Generous time off, parental benefits, and relocation support. Growth & Learning: Continuous learning through workshops, internal mobility, and performance coaching. Culture of Recognition: Personalized gifts, fun rituals, and regular engagement programs celebrating wins big and small. Join us to build the platform powering the future of digital commerce in India. Apply now and be part of a tech-first, people-driven journey at Meesho. Qualification : Bachelor's or Master's degree in Computer Science or a related technical field.
Infrastructure Specialist-cloud Application Operations
International Business Machines
Infrastructure Specialist - Cloud Application Operations Location: Bangalore, Karnataka, India Job Type: Full-Time Experience Level: Mid to Senior-Level Industry: IT Consulting / Cloud Infrastructure Company: IBM Consulting Client Innovation Center Introduction: At IBM Consulting, your career is powered by collaboration, innovation, and the opportunity to work with visionary clients across industries. You'll be part of a global team committed to driving transformation across hybrid cloud and AI. Backed by our cutting-edge technology and strong ecosystem of strategic partners, you'll help shape the future of cloud operations. In this role, you will be based out of one of our IBM Client Innovation Centers in Bangalore, delivering localized skills and deep technical expertise to clients in both the public and private sectors. Your work will help clients adopt next-gen technologies and innovate faster. Your Role & Responsibilities: Provide technical operations support for cloud-based applications, middleware, DevOps processes, security systems, and infrastructure components. Manage Application ID provisioning and access control in accordance with client standards. Enable infrastructure elasticity by implementing auto-scaling mechanisms to optimize resources based on business needs. Collaborate with global teams to ensure seamless incident management, change control, and service delivery. Share expertise and assist in training peers on technical and procedural workflows. Support business continuity by managing Disaster Recovery (DR) protocols and executing manual failovers when needed. Prepare and present daily, weekly, and monthly integrated service management reports summarizing infrastructure health and operations. Required Skills & Experience: Bachelor's degree in Computer Science, Information Technology, or a related field. Strong communication, collaboration, and teamwork skills. Experience working in technical support or cloud operations environments. 
Familiarity with application support, DevOps workflows, middleware, and security in cloud ecosystems. Ability to train team members on both procedural and technical topics. Preferred Qualifications: Master's degree in a relevant field is a plus. In-depth understanding of Platform-as-a-Service (PaaS) environments, high availability (HA) infrastructures, and load balancer configurations. Experience with service reporting, performance monitoring tools, and integrated ITSM frameworks. Be a part of a global innovation leader. Work on challenging and impactful projects that influence industries. Collaborate in a culture of growth, continuous learning, and mentorship. Enjoy a dynamic work environment with a strong emphasis on client success and personal development. Apply now and become part of IBM's journey to reshape the future of infrastructure and application support. Qualification : Bachelor's degree in Computer Science, Information Technology, or a related field.
Site Reliability Developer 2/3
Oracle
Job Description: Site Reliability Engineer - OCI Cloud Engineering Team Role: Site Reliability Engineer (SRE) Team: OCI OLTP (Online Transaction Processing) Location: Kiev Career Level: IC2 Experience: 5+ years Overview: Oracle Cloud Infrastructure's (OCI) OLTP organization is seeking a Site Reliability Engineer (SRE) to join our dynamic and fast-paced Cloud engineering team. The team is responsible for mission-critical distributed systems and cloud services, and we are looking for an engineer who is deeply interested in databases, distributed systems, and cloud services. If you thrive in an environment where innovation, problem-solving, and operational excellence intersect, this is an exciting opportunity for you! As a member of the SRE services, you will focus on Cloud Services, building deployments, operations, security vulnerability mitigation, and automation. You will be instrumental in fostering a culture of Site Reliability Engineering (SRE) within the team, and your work will directly contribute to ensuring the stability, performance, and reliability of Oracle's global cloud service infrastructure. This role requires someone who is adaptable, highly motivated, and capable of managing large-scale cloud environments with a focus on continuous improvement. Key Responsibilities: Cloud Service Operations & Reliability: Deploy, operate, and maintain large-scale cloud service products in a highly available, fault-tolerant, and scalable environment. Collaborate with internal teams to identify and mitigate cross-team issues that pose operational risks to cloud services. Focus on systems reliability and ensure the continuous availability of cloud services by automating tasks and eliminating manual interventions. Automation & Improvements: Automate operational tasks and improve service deployments, focusing on scaling, performance, and uptime. Contribute to CI/CD systems, ensuring seamless integration and continuous delivery for cloud-based services. 
Leverage automation tools such as Terraform, Grafana, and Bitbucket to streamline operations. Security & Incident Response: Mitigate security vulnerabilities within cloud services and ensure compliance with Oracle's security standards. Participate in on-call rotations to provide immediate troubleshooting support and ensure rapid issue resolution. Perform deep analysis of service performance and collaborate with team members to diagnose and resolve issues that affect service availability or performance. Collaborative Problem-Solving: Work closely with cross-functional teams, including development, database, networking, and storage experts, to ensure the reliability and performance of services. Identify systemic issues and potential risks, develop solutions, and ensure proper documentation and communication with stakeholders. Documentation & Knowledge Sharing: Contribute to documentation such as runbooks, operational guides, and troubleshooting manuals. Mentor junior engineers and share knowledge on best practices for site reliability engineering and cloud service operations. Continuous Learning: Stay up to date with new cloud technologies, trends, and best practices, and actively implement them in your day-to-day work. Technical and Professional Requirements: Cloud Services & Infrastructure: 5+ years of experience in SRE, DevOps, or Automation roles with a focus on large-scale infrastructure and cloud services. Hands-on experience with cloud platforms (e.g., OCI, AWS, Azure) and expertise in compute, database, networking, and storage services within cloud environments. Automation & Tooling: Proficiency with automation tools such as Terraform, Grafana, LumberJack, and Shepherd. Solid experience in using CI/CD tools and processes for cloud service deployments and operations. Scripting & Systems: Strong knowledge of scripting languages, particularly Python and Java. 
Familiarity with Linux systems, Docker containers, virtualized infrastructure, and orchestration (e.g., Kubernetes). Performance & Troubleshooting: Excellent troubleshooting skills with a focus on performance, availability, reliability, and scalability of distributed systems. Experience in operating fault-tolerant, highly available, high-throughput distributed systems. Security & Incident Management: Familiarity with security practices and mitigating security vulnerabilities in cloud services. Proven ability to handle incident response and provide efficient troubleshooting during on-call rotations. Collaboration & Communication: Strong verbal and written communication skills, capable of working effectively with diverse teams across multiple geographies. Ability to work in a highly collaborative environment, driving operational excellence and customer satisfaction. Preferred Qualifications: Experience in operating and maintaining multi-tenant, cloud-based infrastructure with a focus on scalability and high availability. Familiarity with tools and platforms like Grafana, Prometheus, and other observability and monitoring tools. Experience in networking and storage technologies in a cloud environment. Joining OCI's OLTP team as an SRE gives you the opportunity to work with cutting-edge technologies and contribute to the operational excellence of Oracle's global cloud infrastructure. This is a chance to grow your skills in a highly dynamic environment and to solve complex problems that directly impact mission-critical cloud services. With a focus on automation, scalability, and high performance, you will be an essential part of a team that powers Oracle's leading cloud services. If you are an experienced engineer passionate about cloud technologies, automation, and ensuring the reliability of large-scale systems, we encourage you to apply and join us in this exciting journey!
Senior DevOps / Site Reliability Engineer
Blue Yonder
Job Title: Senior DevOps / Site Reliability Engineer Location: Pune, India Company: Blue Yonder Experience: 10 to 13 years Education: Bachelor's Degree in Computer Science, Engineering, or related STEM fields Company Overview Blue Yonder is a leading AI-driven Global Supply Chain Solutions provider and consistently recognized as one of Glassdoor's Best Places to Work. We are driving the next wave of digital transformation in manufacturing and retail, delivering innovative SaaS solutions that power intelligent supply chains across the globe. We are looking for a Senior DevOps / Site Reliability Engineer (SRE) to lead the design, development, deployment, and operational management of our Azure SaaS solution. This role requires strong DevOps, cloud delivery, and infrastructure automation expertise, along with leadership capabilities to guide a growing global team. Role Overview In this role, you will be responsible for architecting, planning, and executing end-to-end delivery pipelines, supporting both product development and operational stability. Working closely with platform, product, and architecture teams, you will implement best-in-class DevOps and SRE practices, ensuring scalability, resilience, and cost optimization. Key Responsibilities Architect, design, and manage CI/CD pipelines and infrastructure for a cloud-native, multi-tenant SaaS solution on Azure. Lead sprint planning, backlog grooming, and architecture discussions. Develop quality automation scripts and tools to reduce manual efforts and enable self-healing, self-service capabilities. Identify and resolve operational bottlenecks and proactively improve observability (monitoring, alerting, logging). Participate in code reviews, ensure secure and scalable designs, and mentor junior and mid-level engineers. Collaborate with stakeholders to understand business and technical requirements and translate them into actionable user stories. Implement and enforce cloud cost optimization strategies. 
Conduct post-incident reviews with a blameless culture to identify root causes and drive continuous improvements. Automate service requests and standard operational procedures. Drive improvements to the team's continuous integration pipeline, ensuring rapid and reliable deployments. Stay updated with the latest DevOps, SRE, and cloud technologies and bring innovative ideas to the table. Participate in team hiring and actively contribute to onboarding new team members. Technical Environment Languages: Java, Python, PowerShell, Shell Scripting DevOps Tools: Azure DevOps, GitHub Actions, Jenkins Cloud: Microsoft Azure (ARM Templates, AKS, Event Hub, HDInsight, Azure AD, Application Gateway, Virtual Networks) Architecture: Microservices, Kubernetes, Docker, Event-driven architecture Frameworks: Spring Boot, Hibernate Monitoring & Logging: Elasticsearch, Spark, Kafka Databases: RDBMS, NoSQL Version Control: Git Required Skills & Experience Bachelor's Degree (STEM preferred) with 10 to 13 years of experience in DevOps, Cloud Delivery, or Site Reliability Engineering. Proven hands-on experience with Azure Cloud Services. Expertise in setting up and optimizing CI/CD pipelines. Strong scripting experience: Shell and PowerShell are mandatory; Python is a plus. Strong understanding of container technologies (Docker, Kubernetes) and microservices architecture. Experience integrating and managing third-party monitoring and logging tools. Strong problem-solving skills and ability to work with global, cross-functional teams. Excellent communication and stakeholder management skills. Nice to Have Development experience in Java or Python. Experience working in agile teams with a product-centric mindset. Experience working in manufacturing or retail domains. Exposure to AI/ML-driven monitoring and observability tools. Work with cutting-edge technologies on globally impactful solutions. Collaborate with diverse and talented teams across the US, India, and the UK. 
Foster your career growth through mentorship, continuous learning, and leadership opportunities. Experience an inclusive, flexible work culture where innovation and creativity thrive. Diversity, Inclusion, Value & Equality (DIVE) At Blue Yonder, we are committed to building an inclusive environment where everyone feels empowered to be themselves. All qualified applicants will receive consideration for employment regardless of race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or protected veteran status. Qualification : Bachelor's Degree in Computer Science, Engineering, or related STEM fields
Senior Site Reliability Engineer
Couchbase
Job Title: Site Reliability Engineer (SRE) - Cloud Platform & Production Pipeline Initiatives Location: Bangalore, India (Office-based role) About Couchbase: As industries race to embrace AI, traditional database solutions fall short of rising demands for versatility, performance, and affordability. Couchbase is leading the way with Capella, the developer data platform for critical applications in our AI-driven world. By uniting transactional, analytical, mobile, and AI workloads into a seamless, fully managed solution, Couchbase empowers developers and enterprises to build and scale applications with unmatched flexibility, performance, and cost-efficiency from cloud to edge. Trusted by over 30% of the Fortune 100, Couchbase is unlocking innovation, accelerating AI transformation, and redefining customer experiences. Come join our mission! Job Overview: As a Site Reliability Engineer (SRE), you will play a pivotal role in managing, optimizing, and maintaining Couchbase's cloud infrastructure for Capella, our Database as a Service (DBaaS) platform. You will be responsible for ensuring the reliability and performance of our cloud service while collaborating closely with engineering teams to improve deployment pipelines, security practices, and overall system health. You will work across cloud platforms and multiple tools to provide guidance, mentorship, and contribute to the strategic direction of cloud operations. Responsibilities: Infrastructure Management: Manage, monitor, and maintain the infrastructure for Capella to ensure reliable operations. Security & Compliance: Implement and manage cloud environments in accordance with company security guidelines, including vulnerability management, penetration testing, and compliance requirements (SOC 2, PCI-DSS, GDPR, HIPAA, etc.). CI/CD & Release Pipeline: Collaborate with engineering teams to optimize CI/CD processes, aiming for a highly resilient deployment strategy, ideally with zero downtime. 
Cloud Optimization: Stay up-to-date with new technologies and industry trends to continuously improve cloud platform architecture and meet the evolving needs of the business. Security Integration: Work with development teams to integrate security scanners within the DevOps lifecycle, enhancing security posture. Leadership & Mentorship: Provide guidance on architecture, code reviews, and technical feedback to improve service reliability, security, cost, and performance. Incident Management: Demonstrate exceptional problem-solving skills, proactively identifying and addressing potential issues before they affect business operations. Collaboration: Partner with development teams, application owners, and stakeholders to integrate best practices and ensure seamless service delivery. Requirements: Experience: 5+ years in Site Reliability Engineering (SRE), DevSecOps, or similar roles, with significant experience working in public cloud environments. Programming & Scripting: Proficiency in languages such as Go, Python, Java, or Ruby. Linux Expertise: High proficiency with Linux operating systems. Kubernetes Management: Experience in managing and maintaining Kubernetes clusters (both self-managed and managed platforms like AWS EKS). Security & Vulnerability Management: In-depth knowledge of security tools and practices (vulnerability management, pen testing, SCA, DAST, SAST), with hands-on experience using tools like Sysdig, Snyk, and Black Duck. Cloud Platforms & Tools: Strong experience with cloud platforms (AWS, GCP, Azure) and tools like Artifactory, Jira, Jenkins, Grafana, Prometheus, Datadog, Thanos, etc. Configuration Management: Proficiency with Terraform, Git, and CI/CD platforms (e.g., CircleCI, GitHub, Spinnaker). Networking Security: Solid understanding of TCP/IP, DNS, HTTP, Firewalls, VPNs, and other networking security concepts. Preferred Skills: Availability & Reliability: Knowledge of SLO/SLA, availability, reliability, and performance concepts. 
Incident Management: Experience with on-call rotations and incident management. Database Experience: Familiarity with databases, particularly Couchbase. Security Certifications: Relevant certifications in security or cloud technologies are a plus. Couchbase reimagines database technology to deliver a fast, flexible, and affordable cloud database platform, empowering developers to build applications with exceptional customer experiences. Trusted by over 30% of the Fortune 100, Couchbase drives innovation and customer success through its Capella platform. Benefits at Couchbase: Generous Time Off Program: Flexibility to care for yourself and your family. Wellness Benefits: Access to world-class medical plans, dental, vision, life insurance, and employee assistance programs. Financial Planning: RSU equity program, ESPP, retirement planning, and business travel insurance. Career Growth: Focused on your career development and success. Fun Perks: Ergonomic and comfortable office setup, food & snacks for in-office employees, and more!
Devops
Mirafra Technologies
DevOps Engineer Location: Bangalore Experience: 5+ Years Education Qualification: B.E. in Computer Science / Electronics About Mirafra Founded in 2004, Mirafra is a fast-growing global product engineering services company specializing in Semiconductor Design, Embedded Systems, Digital Solutions, and Application Software. With 1,500+ professionals worldwide, we provide cutting-edge solutions to Fortune 500 clients across industries such as Semiconductor, Internet, Aerospace, Networking, Telecom, Medical Devices, and Consumer Electronics. Recognitions: Best Company to Work For, SiliconIndia (2016); Most Promising Design Services Provider, SiliconIndia (2018); Top 10 Admired Companies for Software Services, DigiTech Insight (2022). Key Responsibilities DevOps & Automation: Develop automated CI/CD pipelines and manage build & deployment processes. Implement infrastructure automation using scripting (Shell, Batch, Python). Manage configuration, integration, and deployment using DevOps tools. Version Control & Build Management: Work with Git, GitLab, Bitbucket for version control. Maintain build systems like Make, CMake and manage dependencies using Pip, Conda, Poetry, Maven. Handle binary management tools like Artifactory, Nexus. Code Quality & Security: Utilize Static Code Analysis tools (SonarQube, Pylint, Coverity) for code quality enforcement. Monitor and ensure security compliance in the DevOps lifecycle. Cloud & Containerization: Manage cloud-based deployments and monitoring using ELK, Docker, Kubernetes. Implement scalable and resilient infrastructure solutions. Agile & Collaboration: Work in an Agile/Scrum environment, collaborating with cross-functional teams. Utilize UML modeling and software development best practices. 
Skills & Qualifications Education: B.E. in Computer Science / Electronics Technical Expertise: Scripting & Automation: Shell, Batch, Python CI/CD & Build Tools: Jenkins, GitLab, Make, CMake Version Control: Git, Bitbucket, GitLab SCM Static Code Analysis: SonarQube, Pylint, Coverity Package Management: Pip, Conda, Poetry, Maven Binary Management: Artifactory, Nexus Cloud & Containerization: Docker, Kubernetes, ELK Stack Programming Languages: Python, C, C++ Operating Systems: Linux, Unix, Windows Soft Skills: Strong problem-solving and analytical skills. Excellent communication and team collaboration. Ability to work in fast-paced Agile environments. Cutting-edge projects in Semiconductor, Aerospace, Networking, and IoT. Global work environment with top-tier clients. Career growth opportunities and exposure to the latest technologies. Award-winning workplace culture and industry recognition. Excited to take on a challenging DevOps role? Apply now!
Software Engineer III, Infrastructure, Core
Google Careers
Job Title: Software Engineer About the Role: At Google, our Software Engineers are at the forefront of innovation, designing and developing cutting-edge technologies that shape how billions of users connect, explore, and interact with information. Our products operate at an immense scale, extending far beyond web search, and require engineers who bring fresh perspectives from diverse technical domains, including information retrieval, distributed computing, large-scale system design, networking, security, artificial intelligence, natural language processing, UI design, and mobile development. As a Software Engineer, you will contribute to mission-critical projects, collaborating with teams across Google to develop, test, deploy, maintain, and enhance software solutions. Your versatility, leadership abilities, and enthusiasm for solving complex challenges will be crucial as you navigate projects across the full technology stack. The Core Team serves as the backbone of Google's technical infrastructure, building the foundational elements behind our flagship products. This team is responsible for developing essential developer platforms, product components, and infrastructure that drive innovation across Google's ecosystem. As a member of this team, you will play a pivotal role in breaking down technical barriers, optimizing existing systems, and making key architectural decisions that influence the entire organization. Key Responsibilities: Design, develop, and maintain high-quality software solutions that support Google's technical infrastructure and products. Participate in and lead design reviews with peers and stakeholders, evaluating available technologies to determine optimal solutions. Conduct thorough code reviews to ensure adherence to best practices, including code quality, efficiency, accuracy, testability, and compliance with style guidelines.
Contribute to documentation and educational resources, updating content based on product enhancements and user feedback. Troubleshoot and debug complex system issues, analyzing their impact on hardware, networks, and service operations to maintain optimal performance and reliability. At Google, we foster a culture of continuous learning, innovation, and technical excellence. If you're passionate about solving challenging problems and building world-class technology, we invite you to be part of our journey. Qualification: Bachelor's degree or equivalent practical experience.
Technical Lead
Cisco Technology Inc
Meet the Team As a Technical Lead, you will drive technical excellence across HPC infrastructure, network automation, DevOps practices, and SRE principles while leading architecture decisions and guiding teams in implementing high-performance solutions for AI/ML workloads on various network topologies. This role combines deep technical expertise with leadership responsibilities, focusing on system architecture, automation, reliability engineering, and development excellence. Your Impact Design and implement end-to-end automation solutions for HPC infrastructure (Compute, network, and storage) using Kubernetes operators, Terraform, and Ansible. Analyze compute, storage, and network traffic patterns during distributed training/inference operations across different AI/ML frameworks. Monitor and optimize network utilization patterns for various model architectures. Identify bottlenecks in network communication patterns. Perform root cause analysis across network, compute, and storage layers, with experience handling various failure scenarios and recovery procedures. Make architectural decisions and drive innovation. Develop infrastructure patterns for different workload types. Provide benchmarking and performance engineering leadership. Mentor junior engineers through architecture reviews and code critiques. Design and implement comprehensive telemetry collection systems for monitoring high-speed network microburst behavior. Develop sophisticated visualization tools and analytics frameworks to enable real-time identification of performance bottlenecks and system constraints, facilitating rapid optimization and troubleshooting. Minimum Qualifications Demonstrated expertise in distributed systems and infrastructure design (compute, storage, and networking). Experience with network automation tools and configuration management (Ansible, Python, Golang, YAML, YANG). Strong background in CI/CD, GitOps, or similar practices and tools. 
Expert-level experience with observability platforms and practices. Strong background in implementing distributed tracing, metrics collection, and log aggregation systems. Demonstrated experience in at least one completed performance benchmarking project for distributed systems, storage, network, and compute. Preferred Qualifications Bachelor's degree in Computer Science, Software Engineering, or a related technical field with 15-20 years of extensive hands-on experience in distributed systems development and DevOps practices. An advanced degree is a plus. Contributions to open-source projects related to distributed systems or performance engineering. Experience in analyzing and documenting system performance metrics across network, compute, and storage layers. Prior experience mentoring and developing technical talent. Understanding of AI/ML infrastructure and any prior experience with RDMA or RoCE v2 will be an added advantage. #WeAreCisco, where every individual brings their unique skills and perspectives together to pursue our purpose of powering an inclusive future for all. Qualification: Bachelor's degree in Computer Science, Software Engineering, or a related technical field with 15-20 years of extensive hands-on experience in distributed systems development and DevOps practices. An advanced degree is a plus.
Autoit Solutioning Engineer, Lead
Qualcomm
Job Title: Site Reliability Engineer (SRE) General Summary: We are seeking a highly skilled and motivated Site Reliability Engineer (SRE) to join our dynamic team. This role is critical in ensuring the stability, scalability, and security of our infrastructure and services. As an SRE, you will work collaboratively with software engineers, data scientists, and product managers to optimize system reliability while driving automation and continuous improvement. You will be responsible for modernizing traditional services, implementing cutting-edge technology, and proactively managing infrastructure to maintain operational excellence. If you are passionate about automation, DevSecOps, system performance, and infrastructure resilience, this role offers an exciting opportunity to make a meaningful impact. Key Responsibilities: System Monitoring & Incident Response: Continuously monitor system health, detect anomalies, and respond to incidents promptly. Investigate and troubleshoot service-related issues, ensuring minimal disruption. Implement proactive measures to prevent downtime and optimize system stability. Infrastructure Automation & DevOps Implementation: Develop and maintain Infrastructure-as-Code (IaC) scripts to automate deployments and scaling. Automate routine operational tasks to improve efficiency and reduce manual intervention. Leverage DevSecOps practices to ensure secure and resilient deployments. Performance Optimization & Capacity Planning: Collaborate with development teams to enhance software performance and system responsiveness. Identify and resolve system bottlenecks to improve speed, efficiency, and reliability. Forecast resource requirements based on traffic patterns and business growth. Security, Compliance & Risk Management: Implement security best practices and compliance measures across all infrastructure layers. Conduct security audits and ensure systems meet industry-standard security guidelines. 
Proactively assess and mitigate risks associated with infrastructure and deployments. Required Qualifications & Skills: Technical Expertise: Extensive experience with Linux-based environments (Ubuntu, RedHat), including system administration and troubleshooting. Strong proficiency in scripting and automation using Python, Bash, or Go. Experience with containerization and orchestration technologies such as Docker and Kubernetes. Familiarity with CI/CD pipelines and tools like Jenkins, Puppet, Vault, and Splunk. Hands-on experience with cloud platforms (AWS, Azure, or GCP). Problem-Solving & Leadership: Strong analytical skills with the ability to diagnose and resolve complex system issues. Self-driven, highly motivated, and able to work independently in a fast-paced environment. Ability to collaborate cross-functionally and communicate technical solutions effectively. Security & Reliability Focus: Solid understanding of DevSecOps principles and secure system design. Ability to implement monitoring, logging, and alerting solutions to maintain system resilience. Passion for continuous learning and leveraging data-driven approaches for system improvement. Work in a high-impact role that directly contributes to the reliability and scalability of mission-critical systems. Be part of an innovative, forward-thinking team that values automation, collaboration, and continuous improvement. Competitive salary, professional development opportunities, and an environment that fosters growth and innovation. If you are a passionate, results-driven SRE, we invite you to join us and play a pivotal role in shaping the future of our infrastructure.
Senior Staff Software Engineer, Google Cloud
Google Careers
About the Job Google's software engineers develop next-generation technologies that transform how billions of users connect, explore, and interact with information and one another. Beyond web search, our products must manage information at a massive scale, leveraging expertise in fields such as distributed computing, large-scale system design, networking, data storage, artificial intelligence, natural language processing, UI design, and mobile. As a Software Engineer, you'll work on mission-critical projects, with opportunities to switch teams and projects as you and our fast-paced business evolve. We seek engineers who are versatile, demonstrate leadership qualities, and are enthusiastic about solving new challenges across the full stack as we continue pushing technology forward. With your technical expertise, you will manage project priorities, deadlines, and deliverables while designing, developing, testing, deploying, maintaining, and enhancing large-scale software solutions. Google Cloud helps organizations digitally transform with cutting-edge infrastructure, platforms, and solutions, all while operating on the cleanest cloud in the industry. Trusted by customers in more than 200 countries and territories, Google Cloud is a partner that enables growth and solves critical business challenges. Responsibilities Provide technical leadership on high-impact projects. Influence and mentor a distributed team of engineers. Facilitate alignment across teams, ensuring clarity on goals, outcomes, and timelines. Manage project priorities, deadlines, and deliverables. Design, develop, test, deploy, maintain, and enhance large-scale software solutions. Minimum Qualifications Bachelor's degree or equivalent practical experience. 8 years of experience in software development. 5 years of experience in software design and architecture, including testing and launching software products.
Preferred Qualifications Master's degree or PhD in Engineering, Computer Science, or a related technical field. 8 years of experience with data structures and algorithms. 5 years of experience in a technical leadership role, leading project teams and setting technical direction. 3 years of experience working in complex, matrixed organizations on cross-functional or cross-business projects.