SRE Practices Jobs in Bengaluru
Infrastructure Security Leader
Observe.ai Networks Private Limited
Location: Bengaluru
About Us:
Observe.AI is the leading AI-powered platform for customer experience, enabling enterprises to automate customer interactions using AI agents. Our platform delivers natural conversations and predictable outcomes, and is trusted by top companies like DoorDash, Affordable Care, Signify Health, and Verida. Observe.AI blends advanced speech understanding, workflow automation, and enterprise-grade governance to deliver end-to-end AI solutions that optimize both human and AI interactions, providing insights for coaching and quality management.
At Observe.AI, we're on a mission to transform customer experiences through AI. As a founding member of our Infrastructure/Cloud Security team, you will have the opportunity to shape and design cloud security from the ground up for a platform trusted by over 80 million users. Reporting directly to the VP of Information Security, you will drive a defense-in-depth approach across infrastructure, IAM, and networks. This is a unique, zero-to-one role where you'll define security strategy, mentor the team, and make a long-lasting impact in a fast-growing AI company.
What You'll Be Doing:
Security Strategy Development: Design and document security policies, reference architectures, design patterns, and roadmaps to protect our platform.
Secure Access & Network Design: Lead efforts to design secure access controls and networks for production environments.
Cross-Department Leadership: Collaborate with Corporate IT to implement security measures within the corporate environment.
Defense-in-Depth: Implement network segmentation, firewall configurations, VPNs, and deep packet inspection to limit the impact of security incidents.
AWS Infrastructure Security: Re-architect AWS infrastructure to enhance security, ensuring that networks, VPCs, and security configurations are optimized.
Vulnerability Management: Identify tools and technologies to scan networks, operating systems, and infrastructure for vulnerabilities, and work with SRE teams to remediate identified risks.
Security Compliance: Represent Infrastructure Security in PCI, SOC, ISO, HITRUST, and other regulatory audits, ensuring compliance.
Collaborative Design: Partner with engineering teams and architects to ensure infrastructure designs meet both business and security requirements.
Stakeholder Collaboration: Work with other teams to integrate up-to-date security features and infrastructure designs across the organization.
What You'll Bring to the Role:
9+ years of experience in Software Engineering, Network Security, and AWS Security.
Proven track record in designing and implementing secure Cloud Infrastructure, Network Security, and Corporate IT Security.
Experience at a SaaS product company with hands-on knowledge of cloud security.
Leadership experience in managing Infrastructure Security teams or security-focused SRE teams.
Strong understanding of network design and protocols, and familiarity with certifications like CCNA (or similar).
Ability to handle multiple high-priority projects simultaneously while maintaining focus and quality.
Comfort with working off-hours to handle security incidents in a dynamic, fast-paced environment.
First-hand experience with major cloud providers, specifically AWS.
Deep understanding of large-scale systems and N-tier architectures.
Excellent communication skills, with the ability to influence and collaborate with stakeholders across the organization.
Perks & Benefits:
Medical Insurance: Comprehensive options, including free online doctor consultations.
Leave Policies: Annual privilege and sick leave as per the Karnataka S&E Act, along with generous national, festive, and parental leave.
Learning & Development: Access to a fund that supports continuous learning and professional growth.
Flexible Benefits: Tax exemptions for meals, PF, etc., along with other flexible benefit plans.
Team Culture: Fun events to foster collaboration and culture across the organization.
Lead Software Engineer - Scale & Performance
Team Vunet Systems
Location: Bengaluru
Experience: 6-12 years
About VuNet
VuNet is a pioneer in Business Journey Observability, using Big Data and Machine Learning to revolutionize digital experiences in the financial services industry. Our platform delivers end-to-end visibility into customer journeys, helping organizations proactively resolve issues, ensure operational resilience, and deliver superior user satisfaction. With over 28 billion digital transactions monitored every month and more than 300 million users served globally, VuNet is shaping the future of observability for some of the largest banks and financial institutions. We are Series B funded, part of NASSCOM's DeepTech Club, and recognized by global analysts such as Gartner and Omdia.
Your Role: Lead Software Engineer - Scale & Performance
As a Lead Software Engineer for Scale & Performance, you'll own the performance and scalability benchmarks for VuNet's observability platform. You will work with cutting-edge technologies, design robust test frameworks, and ensure that our platform scales seamlessly to meet the demands of millions of users.
Roles & Responsibilities
Own performance and scalability benchmarking for key platform components (ingestion pipelines, data storage, and query services).
Design and execute load, stress, soak, and capacity tests across microservices, agents, and ingestion layers.
Identify and resolve performance bottlenecks in both the infrastructure layer (CPU/memory/IO) and the application layer (API latency, throughput, GC behavior).
Develop and maintain performance test frameworks, preferably in Kubernetes-based environments.
Collaborate with DevOps and SRE teams to optimize system configurations (Kubernetes, Postgres/TimescaleDB, ClickHouse, Kafka) for scale.
Implement OpenTelemetry instrumentation for services to monitor system health and latency (p50/p95/p99 metrics).
Contribute to capacity planning, scaling strategies (horizontal/vertical), and resource optimization.
Analyze production incidents related to scaling issues and drive permanent fixes.
Work with engineering teams to design scalable architecture patterns and define SLIs/SLOs for system performance.
Document performance baselines, tuning guides, and scalability best practices for internal use.
What You Bring
Mandatory Skills:
Strong background in performance engineering for large-scale distributed systems or SaaS platforms.
Expertise in Kubernetes, container runtimes (containerd/Docker), and resource profiling in containerized environments.
Solid understanding of Linux internals, CPU/memory profiling, and network stack tuning.
Hands-on experience with observability tools (Prometheus, Grafana, OpenTelemetry, Jaeger, Loki, Tempo, etc.).
Familiarity with observability platform datastores like ClickHouse, PostgreSQL/TimescaleDB, Elasticsearch, or Cassandra.
Experience with performance benchmarking tools such as k6, Locust, JMeter, or custom Golang/Python scripts.
Ability to interpret system metrics (CPU usage, memory, GC, latency) and correlate them across different layers.
Nice-to-Have Skills:
Experience with agent benchmarking (OpenTelemetry Collector, custom data shippers).
Exposure to streaming systems like Kafka, NATS, or Pulsar.
Familiarity with CI/CD pipelines for performance testing and regression tracking.
Knowledge of cost optimization and capacity forecasting in cloud environments (AWS/GCP/Azure).
Proficiency in Go, Python, or Bash scripting for automation and data analysis.
Life at VuNet:
At VuNet, we're building a world-class observability platform, and we're just getting started. You'll be part of a passionate, problem-solving team that embraces collaboration, fast learning, and staying ahead of emerging technologies like Gen AI. We foster a high-trust, inclusive culture where collaboration, ownership, and innovation are central to our success. If you're looking to work on cutting-edge tech, make a real impact, and grow with a supportive team, you'll fit right in at VuNet.
Benefits:
Comprehensive health insurance coverage for you, your parents, and dependents.
Mental wellness and 1:1 counseling support.
A culture that promotes continuous learning, innovation, and career growth.
A transparent, inclusive, and high-trust workplace.
Opportunities for skill enhancement with training programs focused on new Gen AI technologies.
Mobile App and Observability SDK Engineer
Team Vunet Systems
Experience: 3-6 years
Location: Bengaluru
About VuNet
VuNet is a pioneer in Business Journey Observability, revolutionizing the financial services industry with Big Data and Machine Learning. Our cutting-edge platform offers end-to-end visibility into customer journeys, driving proactive issue resolution, operational resilience, and superior user satisfaction. With over 28 billion digital transactions monitored monthly, touching 400 million users worldwide, we're already powering leading banks and financial institutions across India and MEA. VuNet is Series B funded, part of the NASSCOM DeepTech Club, and recognized globally by analysts like Gartner and Omdia.
Your Role: Mobile App and Observability SDK Engineer
At VuNet, the Product Development Team is dedicated to delivering exceptional customer experiences through scalable products. We are looking for a Mobile App and Observability SDK Engineer to join this team. In this role, you'll be at the forefront of building high-quality mobile applications and advancing our Mobile Real User Monitoring (MRUM) initiatives. You'll capture mobile performance data and translate it into actionable insights, helping improve the performance and user experience of mobile apps across platforms. If you're passionate about mobile engineering, user experience, and observability, this role offers a unique opportunity to bring these interests together in a groundbreaking solution.
Roles & Responsibilities
Mobile Application Development: Design, develop, and maintain robust, high-performance mobile applications for iOS and Android using Swift, Kotlin, Flutter, or React Native.
Testing & Quality Assurance: Implement unit, integration, and UI testing strategies to ensure the app's quality, stability, and regression coverage.
Debugging & Profiling: Identify and resolve performance bottlenecks, ANRs, crashes, and memory leaks using tools like Android Studio Profiler, Xcode Instruments, or Flipper.
Crash Analysis & Reporting: Integrate crash analytics tools and develop efficient incident tracking and resolution workflows.
Performance Monitoring & Insights: Leverage telemetry, profiling, and analytics data to enhance app performance, responsiveness, and overall user experience.
Observability Collaboration: Work with SRE and backend teams to export performance metrics, logs, and traces from mobile clients into centralized observability platforms.
Code Quality: Write clean, modular, and well-documented code, adhering to best practices in mobile development and SDK maintenance.
What You Bring
Mandatory Skills:
Mobile App Development: 3+ years of hands-on experience in mobile app development using Flutter, React Native, Swift, or Kotlin (with experience in at least two of these).
App Lifecycle & Performance: Strong understanding of the mobile app lifecycle, UI rendering, asynchronous processing, state management, and performance optimization (ANRs, memory management, network latency).
Debugging & Profiling Tools: Proficiency in debugging, profiling, and testing mobile applications using tools like Android Studio Profiler, Xcode Instruments, or Flipper.
Crash Analytics: Experience integrating and using crash analytics and reporting tools.
CI/CD & SDK Versioning: Familiarity with CI/CD pipelines, automated testing, and SDK versioning.
Performance Instrumentation: Interest in observability, monitoring, and performance instrumentation, with a willingness to learn OpenTelemetry and RUM concepts.
Problem-Solving Mindset: Strong analytical and debugging skills, focused on enhancing performance and reliability.
Nice-to-Have Skills:
OpenTelemetry & SDKs: Exposure to OpenTelemetry SDKs or other instrumentation frameworks for capturing telemetry data (e.g., traces, metrics, logs).
Mobile Observability: Familiarity with mobile observability backends.
Session Replay & Mobile Analytics: Knowledge of session replay, user behavior tracking, or mobile analytics SDKs.
SRE & Monitoring Practices: Understanding of SRE principles, monitoring best practices, and golden signals.
Open Source Contributions: Contributions to open-source SDKs or mobile performance tools.
Life at VuNet:
At VuNet, we're building a world-class observability platform, proudly Made in India. We're just getting started, and we're looking for people like you to join us in tackling some of the most complex challenges in the digital world. Our team is filled with passionate problem-solvers who thrive in a collaborative, fast-paced environment. We embrace continuous learning, adapt quickly, and stay ahead of emerging technologies like Gen AI. If you're looking to work on cutting-edge technology, make a real impact, and grow with a supportive team, you'll feel right at home here at VuNet.
Benefits:
Comprehensive health insurance coverage for you, your parents, and dependents.
Mental wellness and 1:1 counseling support.
A learning culture that promotes growth, innovation, and ownership.
A transparent, inclusive, and high-trust workplace culture.
Access to Gen AI and integrated technology workspaces.
Supportive career development programs to expand your skills with various training opportunities.
Director Quality Engineer
CoinDCX
Director Quality Engineering
Experience: 15-20 years
Location: Bengaluru
Team: Engineering
About CoinDCX
At CoinDCX, we believe "Change Starts Together." We are on a mission to make Web3 accessible to all, building cutting-edge products that solve real-world challenges in security, scalability, and user accessibility. In just six years, we've grown from India's first crypto unicorn into a platform serving over 125 million users worldwide. As we accelerate Web3 adoption, we are looking for visionary leaders to help us maintain world-class quality and performance standards.
Role Overview
As Director of Quality Engineering, you will lead and scale our QA and Performance Engineering functions to ensure the reliability, scalability, and security of our fintech products. You'll be responsible for driving the quality strategy across large-scale distributed systems and building a high-performing team passionate about excellence.
What You'll Do
Leadership & Strategy
Lead and grow a team of 50+ QA, automation, and performance engineers.
Define and execute a long-term quality engineering strategy aligned with business goals and regulatory requirements.
Foster a culture of ownership, accountability, and continuous improvement.
Quality Engineering
Champion an automation-first approach across functional, regression, and integration testing.
Oversee end-to-end validation for core product flows, including trading, payments, custody, and compliance.
Own testing strategies for microservices architectures and high-throughput APIs.
Performance & Scalability
Lead performance testing initiatives designed to support systems handling over 1 million TPS with sub-50ms latency.
Develop frameworks for continuous performance benchmarking and capacity planning.
Collaborate with SRE, DevOps, and Product Engineering to identify and mitigate performance bottlenecks.
Non-Functional Testing
Ensure comprehensive coverage for reliability, availability, failover, disaster recovery, and security.
Drive chaos testing, fault injection, and compliance-related quality assurance processes.
Collaboration & Stakeholder Management
Partner with Product, Platform, Security, and Compliance teams to align quality standards with regulatory mandates.
Provide executive reporting on quality, system resilience, and risk metrics.
Influence cross-functional adoption of best practices in testing and release validation.
What You Bring
Experience
15+ years in QA and Performance Engineering, with at least 5 years in senior leadership roles.
Proven experience managing large, high-growth fintech or financial services engineering teams (50+ members).
Technical Expertise
Deep expertise in testing large-scale distributed systems.
Strong knowledge of performance, load, stress, soak, and chaos testing frameworks.
Familiarity with cloud-native environments (AWS, Kubernetes), CI/CD pipelines, and observability tools.
Domain Knowledge
Extensive background in fintech or financial services (trading, payments, banking).
Strong understanding of regulatory and compliance requirements in financial applications.
Leadership & Soft Skills
Exceptional people leadership, mentoring, and organizational scaling capabilities.
Excellent stakeholder management, with the ability to influence senior engineering and business leaders.
Strategic, data-driven decision-making mindset.
You're passionate and constantly curious about Web3 and Virtual Digital Assets (VDA).
You act with ownership, drive excellence, and focus on measurable impact.
You embrace a "We over Me" philosophy, empowering your team as you grow.
Change excites you and fuels your innovation mindset.
You think beyond limits, challenging the status quo to push boundaries.
Perks That Empower You
Design Your Own Benefit: Personalize your perks to fit your lifestyle; whether it's tech, travel, or pets, your priorities come first.
Unlimited Wellness Leaves: Take time off as needed to recharge; your health matters most.
Mental Wellness Support: Access free counseling, expert sessions, workshops, and social events to stay balanced.
Bi-Weekly Learning Sessions: Sharpen your skills and stay current with ongoing industry trends and knowledge.
Join Us
If you're ready to lead a high-impact team and help build the future of Web3 quality engineering, we want to HODL you on our team!
DevOps Engineering Manager
Medi Assist
Position: DevOps Engineering Manager
Location: Bangalore
Experience: 5-10 years
Education: BE/BTech/MCA/MTech/MSc
Role Overview:
We're looking for an experienced DevOps Engineering Manager to lead our cloud infrastructure, automation, and DevOps initiatives. This is a hands-on leadership role focused on driving efficiency, security, and scalability across our IT operations and development pipelines.
Key Responsibilities:
Cloud & Infrastructure Management:
Administer and manage Google Workspace, including user accounts, security policies, and compliance settings.
Oversee and optimize AWS resources (EC2, IAM, S3, VPC), ensuring cost-effective and secure cloud operations.
Configure and manage A10 vThunder for load balancing and network performance optimization.
Serve as Active Directory Administrator, maintaining AD, DNS, and Group Policy Objects (GPOs).
Deploy, maintain, and troubleshoot VMware environments to support virtual infrastructure.
Security & Compliance:
Manage domain and SSL certificates, including installation, renewal, and issue resolution.
Handle ADFS token certificate renewals to ensure uninterrupted authentication services.
Enforce security best practices across cloud and on-prem environments.
Automation & Scripting:
Create and maintain automation scripts using Bash, PowerShell, or Python to streamline workflows.
Reduce manual intervention and boost system efficiency through smart scripting and task automation.
Monitoring & Troubleshooting:
Proactively monitor system logs, performance metrics, and security alerts to prevent downtime.
Promptly investigate and resolve issues related to network, infrastructure, and cloud environments.
Required Skills & Experience:
Proven experience with infrastructure automation tools such as Terraform or CloudFormation.
Strong understanding of DevOps practices and of implementing CI/CD pipelines for cloud deployments.
Solid scripting skills in Bash, PowerShell, or Python.
Expertise in managing both cloud-based and on-premises infrastructure.
Strong troubleshooting capabilities and a proactive approach to system monitoring.
Qualification: BE/BTech/MCA/MTech/MSc
Technical Lead - DevOps
Subex Limited
Position: Technical Lead - DevOps
Location: Bangalore Rural, Karnataka, India
Department: Data Platform and DevOps
Employment Type: Subexian
Experience Required: 3 to 6 years
Job Overview:
We are seeking an experienced Kubernetes Administrator with a strong background in managing containerized environments. The ideal candidate will have 4+ years of hands-on experience in deploying, configuring, and optimizing Kubernetes clusters to drive scalability, reliability, and performance. This is an excellent opportunity to leverage your expertise in Kubernetes orchestration while contributing to the overall success of our platform.
Key Responsibilities:
Cluster Management: Deploy, configure, and manage Kubernetes clusters both on-premises and on cloud platforms such as AWS, Azure, and GCP.
Security & Compliance: Implement best practices for cluster security, including role-based access control (RBAC), network policies, and data encryption at rest and in transit.
Automation: Automate cluster provisioning and ongoing management using tools like Terraform, Ansible, or Helm charts, streamlining operations and reducing manual tasks by 40%.
Monitoring & Performance: Continuously monitor cluster health and performance metrics using tools like Prometheus and Grafana, ensuring high availability and optimal performance.
CI/CD Pipelines: Design and implement CI/CD pipelines for containerized applications using tools such as Jenkins, GitLab CI/CD, and CircleCI to enable smooth continuous delivery.
Collaboration: Work closely with development teams to troubleshoot issues, optimize application performance, and ensure compatibility with Kubernetes environments.
Security Audits: Conduct regular security audits to identify vulnerabilities and ensure compliance with industry standards.
Documentation: Maintain clear and comprehensive documentation for deployment procedures, configuration settings, and troubleshooting guides to support knowledge sharing within the team.
Infrastructure Management: Administer and maintain Linux/Unix servers and virtualization platforms such as VMware or KVM, ensuring seamless operations across the infrastructure.
Backup & Recovery: Implement and manage robust backup and disaster recovery solutions to ensure data integrity and minimize system downtime.
Technical Support: Provide expert-level technical support for server and network infrastructure issues.
Required Skills & Qualifications:
Proven experience in Kubernetes deployment, configuration, and administration.
Strong command of containerization technologies, including Docker and containerd.
Hands-on experience with cloud platforms such as AWS, Azure, and GCP.
Proficiency in Infrastructure as Code (IaC) tools like Terraform and Ansible.
Familiarity with CI/CD pipelines and automation tools like Jenkins and GitLab CI/CD.
Excellent troubleshooting and problem-solving skills.
Strong communication and collaboration abilities, with the capability to work effectively across cross-functional teams.
If you're passionate about DevOps, Kubernetes, and driving the success of containerized environments, we'd love to hear from you!
AWS Cloud Architect
Aptean
Job Title: Cloud Architect - SRE
Location: Bangalore, India
Shift: Rotational
Overview
At Aptean, we build tailored ERP solutions that power transformation across industries, from food production to manufacturing. In a world of generic enterprise software, our targeted products stand apart, delivering measurable results. With over 50 products, 3,000+ employees, and a global customer base, now is the perfect time to grow your career with us.
About the Role
We are looking for a highly skilled Cloud Architect - SRE with deep expertise in Amazon Web Services (AWS) to lead the design, implementation, and management of cloud infrastructure. You'll play a pivotal role in defining our cloud strategy, enhancing system reliability, optimizing performance, and ensuring high availability and security across environments.
Key Responsibilities
Cloud Architecture & Strategy
Design scalable, secure, and resilient AWS cloud architectures.
Define and maintain architectural standards, templates, and best practices.
Drive cloud governance, including IAM, PIM/PAM, and policy enforcement.
Infrastructure & Automation
Manage and troubleshoot AWS IaaS and PaaS services.
Apply expertise in Windows Server OS, DNS, DHCP, RDWeb, and domain controllers.
Implement automation and scripting for reporting, inventory, and orchestration.
Optimize cloud resources for performance, reliability, and cost efficiency.
Security & Compliance
Implement AWS security controls, including IAM, encryption, and network protection.
Ensure compliance with frameworks like SOC2, BUPA, and internal policies.
Conduct regular security assessments and resolve vulnerabilities.
Cost Optimization
Analyze and reduce cloud costs using AWS Cost Explorer, Trusted Advisor, etc.
Leverage reserved and spot instances, right-sizing, and efficient resource management.
Documentation
Create and maintain detailed documentation, including architecture diagrams, SOPs, and technical guides.
Qualifications
Education: Bachelor's degree in Computer Science, Information Technology, or a related field.
Experience:
5+ years of hands-on experience designing and deploying AWS cloud architectures.
Proven experience with AWS services such as EC2, S3, VPC, IAM, RDS, and CloudFormation.
Proficiency with Infrastructure as Code (Terraform, CloudFormation).
Strong understanding of networking protocols and DevOps principles.
Certifications (preferred):
AWS Certified Solutions Architect - Professional
AWS Certified DevOps Engineer - Professional
Soft Skills:
Strong analytical and troubleshooting abilities
Excellent communication and team collaboration
Proactive and self-driven, with the ability to work independently
If you're passionate about solving complex technical challenges and shaping the future of cloud infrastructure, Aptean is the place for you. Our culture values diversity, inclusion, and collaboration, where every voice matters and innovation thrives.
Diversity & Inclusion at Aptean
Aptean is committed to fostering a diverse, inclusive workplace. We celebrate differences in race, gender identity, sexual orientation, religion, disability, age, and background, believing that diverse teams drive innovation and better results for our customers.
Qualification: Bachelor's degree in Computer Science, Information Technology, or a related field.
Senior Software Engineer, Customer Solutions
Commure
Job Title: Senior Software Engineer, Customer Solutions
Location: Bengaluru, India
Employment Type: Full-time
Department: Engineering
About Commure
Commure is revolutionizing healthcare with AI-powered technologies designed to eliminate administrative overhead and give clinicians more time with patients. Our platform combines advanced LLM AI, RTLS, and workflow automation to streamline clinical operations, improve patient engagement, and enhance care delivery. We support 250,000+ clinicians across hundreds of care sites nationwide, and we're just getting started. If you're passionate about building life-changing solutions in one of the world's most vital industries, now is the time to join.
About the Role
As a Senior Software Engineer on the Customer Solutions team, you'll be instrumental in building and customizing applications on top of our Patient Experience Platform to address client-specific needs. Your work will directly impact how healthcare providers interact with our technology and serve patients better.
Key Responsibilities
Translate business and client requirements into scalable, maintainable technical solutions.
Design, develop, and integrate customized applications and services using our core platform.
Collaborate with internal teams and customers to prioritize features and maintain a customer-focused development backlog.
Build long-term client relationships through technical leadership and delivery excellence.
Implement and maintain observability through logging, monitoring, and alerting systems.
Apply SRE and DevOps practices to improve stability and incident response.
Coordinate testing and quality assurance activities in collaboration with QA teams.
Stay informed on healthcare tech trends and integrate innovations into the platform.
Participate in client-facing meetings to advise on feasibility, risks, and technical trade-offs.
Mentor junior engineers and contribute to a strong engineering culture.
Required Qualifications
Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
3+ years of professional software development experience.
Frontend: React, Next.js, TypeScript
Backend: Python, Node.js
Cloud: Proficiency in AWS, Azure, or GCP, with experience in cloud-native architectures
CI/CD: Familiarity with tools like GitHub Actions, Google Cloud Build, etc.
Infrastructure: Experience with Docker, Kubernetes, and IaC principles
Monitoring & Observability: Experience implementing logging, tracing, and alerting systems
Production Support: Experience with on-call rotations and incident response
Strong communication and collaboration skills with cross-functional teams
Experience working directly with clients to deliver technical solutions
Understanding of APIs, webhooks, and third-party system integrations in healthcare
Preferred Skills
Familiarity with HIPAA, FHIR, HL7, and other healthcare standards
Understanding of data privacy, compliance, and security best practices
Strong problem-solving abilities and adaptability in dynamic environments
Experience in client support, customization, or professional services engineering is a plus
Why You'll Love Working at Commure + Athelas
Mission-Driven Work: Help transform healthcare through meaningful technology.
Elite Backing: Supported by General Catalyst, Sequoia, Y Combinator, and more.
Explosive Growth: 500%+ YoY growth pre-merger and Series D funded.
Competitive Benefits: Flexible PTO, health insurance, parental leave, and more (location-specific).
Be part of the future of healthcare. Join Commure and help build intelligent, scalable systems that truly matter.
Qualification: Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
Systems Development Engineer, Google Cloud
Google Careers
Systems Development Engineer Google Cloud Location: Bengaluru, Karnataka, India Company: Google Minimum Qualifications Bachelor s degree in Computer Science, Information Technology, or a related field; or equivalent practical experience. 2+ years of experience with systems automation. 2+ years of experience in technical infrastructure (e.g., deployment, maintenance, troubleshooting). Preferred Qualifications 3+ years of experience in systems design and implementation. About the Role As a Systems Development Engineer (SDE) at Google Cloud, you will be part of a team responsible for managing and scaling critical services and infrastructure. This role emphasizes automation, reliability, and observability, using engineering practices to eliminate manual toil and improve system efficiency. Google SDEs design and build the tools and systems that power the infrastructure for Google s services, transforming telemetry into actionable insights and proactively solving operational challenges. You ll have the opportunity to work on impactful, large-scale projects in an environment that fosters learning, collaboration, and growth. Key Responsibilities Participate in on-call rotations and incident response, managing services within your domain. Troubleshoot infrastructure and system issues, evaluate diagnostic data, and recommend solutions. Resolve tickets and bugs within defined service-level objectives (SLOs). Collaborate with primary responders to maintain high availability and reliability of systems. Contribute to the design and implementation of systems and services in related domains. Work directly with customers to gather requirements, define distributed system needs, and propose solutions. Develop automation tools and systems to improve efficiency and reduce operational overhead. About Google Cloud Google Cloud helps organizations transform their business with advanced technologies and enterprise-grade solutions. 
With a focus on sustainability, innovation, and scalability, Google Cloud serves customers in over 200 countries and territories, providing the tools and infrastructure necessary to solve the world's most complex business challenges. Qualification: Bachelor's degree in Computer Science or an IT-related field, or equivalent practical experience.
Staff Engineer - Core Infrastructure
Eightfold
Staff Engineer - Core Infrastructure Location: Bangalore, Karnataka, India Employment Type: Full-Time | Hybrid Work Model About Eightfold.ai At Eightfold.ai, we're transforming the future of work by leveraging artificial intelligence to connect individuals with career opportunities based on their skills and potential, not just their network. Our Talent Intelligence Platform powers a more diverse, inclusive workforce by helping organizations plan, hire, develop, and retain top talent. With $410M+ in funding and a $2B+ valuation, we are revolutionizing how the world thinks about skills, potential, and careers. If you're passionate about cutting-edge technology, infrastructure, and creating scalable solutions that impact the world, we want you to join us. The Opportunity We're looking for a Staff Engineer to join our Core Infrastructure Team and help scale the backbone of Eightfold's platform. This high-impact role will involve designing, building, and optimizing foundational systems that power everything from search and machine learning infrastructure to developer platforms and observability tools. You will drive system design across our stack and mentor engineering teams to build scalable, resilient systems that enable Eightfold to grow and deliver AI-powered solutions for our customers. What You'll Own & Drive Architect & Scale Core Systems: Design and build scalable infrastructure systems that support Eightfold's AI-driven products, including search, compute, storage, and machine learning infrastructure. Cross-Functional Leadership: Lead cross-team technical initiatives, collaborating with Product, Security, Data, and Platform teams to align with company-wide goals. Hands-On Development: Contribute directly to system design, code reviews, and incident response, ensuring best practices are followed. Mentorship & Leadership: Guide and mentor engineers to help them grow into future leaders, fostering a culture of technical excellence across teams. 
Advocate for Engineering Excellence: Champion best practices across areas such as cloud architecture, CI/CD, security, and observability. Solve Complex Infrastructure Challenges: Tackle problems around reliability, scalability, and infrastructure performance, ensuring the systems are robust and perform well at scale. Bring Emerging Tech to Life: Stay on top of the latest trends and technologies, incorporating new scalable design patterns into our architecture. What You Bring 10+ years of experience in backend or infrastructure engineering, with a strong background in building distributed, cloud-native systems. Proven track record in designing and delivering reliable, high-scale services (ideally in AWS, GCP, or Azure environments). Expertise in Infrastructure Technologies: Deep knowledge of containerization, orchestration (Kubernetes), and infrastructure-as-code. Experience with one or more of the following: search infrastructure, ML/AI infrastructure, databases/data warehouses, developer tooling, or platform security. Leadership Experience: A passion for mentoring and guiding engineers, influencing teams and peers, and driving excellence across projects. Strong communication skills, able to translate complex technical challenges into strategic business impact. (Bonus) Experience with SRE principles, cloud security, and compliance for enterprise/government environments. Our Engineering Culture At Eightfold, we believe in ownership over tasks. You won't just be given directions; you'll be trusted to take responsibility and make a measurable impact. We have a growth mindset and continuously improve in all aspects of our work. Collaboration, transparency, and speed are core to everything we do. You'll work in a dynamic, supportive environment where your work directly influences the success of the company and its mission. Meaningful Work: Help shape the future of work by building products that impact careers and businesses globally. 
Growth Opportunities: Be part of a rapidly scaling company where your contributions are highly valued. Competitive Compensation: Attractive salary, equity, and comprehensive benefits package (including medical, vision, and dental coverage). Hybrid Work Model: Work from our Bangalore office twice a week, with flexibility for remote work. Inclusive Culture: We are committed to fostering a diverse and inclusive work environment where everyone feels valued. Equal Opportunity Employer Eightfold.ai is an Equal Opportunity Employer. We do not discriminate based on race, color, religion, sex, sexual orientation, gender identity, national origin, age, or disability. If you're a hands-on, innovative engineer with a passion for building scalable systems and tackling infrastructure challenges, we want to hear from you.
DevOps Engineer
Sarvam
DevOps Engineer Location: Bengaluru, Karnataka, India (On-Site) Department: Engineering Employment Type: Full-Time About Sarvam.ai Sarvam.ai is a cutting-edge generative AI startup headquartered in Bengaluru, India, with a mission to make generative AI accessible and impactful for Bharat. Founded by AI experts, we are dedicated to developing high-performance, cost-effective AI agents tailored for the Indian market. We enable enterprises to tap into new opportunities, build deeper customer connections, and reshape the future of AI for India and beyond. Role Overview We are looking for a DevOps Engineer to join our team and help build and manage scalable, secure, and high-performance infrastructure. In this role, you will be a key contributor to automating deployments, managing cloud infrastructure, optimizing CI/CD workflows, and ensuring system reliability. You will work with cutting-edge technologies, including cloud platforms, containerization, and infrastructure as code (IaC), to deliver impactful solutions for AI-driven products. Key Responsibilities CI/CD Pipelines: Design, implement, and manage CI/CD pipelines for seamless software deployment and integration. Cloud Infrastructure: Deploy and manage cloud infrastructure using Terraform, Kubernetes, and Docker for scalability and high performance. Automation & Scaling: Automate infrastructure provisioning, scaling, and security compliance to support high-availability environments. Monitoring & Optimization: Implement logging, monitoring, and alerting solutions using tools like Prometheus, Grafana, ELK Stack, or CloudWatch to monitor system performance and optimize resource utilization. Security & Compliance: Enhance security and compliance by managing IAM policies, encryption, and vulnerability scanning. Troubleshooting & Root Cause Analysis: Troubleshoot system failures, perform root cause analysis, and implement improvements to ensure reliability and uptime. 
Collaboration: Work closely with development teams to ensure smooth deployment and operation of AI models and applications. Must-Have Skills & Qualifications Educational Background: Bachelor's degree in Computer Science, Engineering, or related field (2024/2025 graduates). Cloud Expertise: Strong experience with AWS, Azure, or GCP for deploying and managing cloud-based applications. Containerization: Proficiency in Docker and Kubernetes for building and managing containerized applications. Infrastructure as Code (IaC): Experience with Terraform, Ansible, or CloudFormation to automate infrastructure management. CI/CD Pipelines: Experience in setting up automated workflows using tools like GitHub Actions, Jenkins, or GitLab CI/CD for smooth deployments. Monitoring & Logging: Experience with Prometheus, Grafana, ELK, or similar tools to implement effective monitoring and logging solutions. Networking & Security: Strong understanding of firewalls, VPNs, SSL, and cloud security best practices for secure infrastructure. Version Control: Proficiency with Git for managing code repositories and version control workflows. Problem Solving: Strong debugging, troubleshooting, and analytical skills to resolve complex system issues. Good to Have (Preferred Experience) Serverless Computing: Exposure to serverless computing models such as AWS Lambda or Azure Functions. Message Queues: Experience with message queues like Kafka, RabbitMQ, or SQS. Site Reliability Engineering (SRE): Familiarity with SRE practices to ensure the reliability and availability of large-scale systems. Open Source Contributions: Contributions to open-source projects or a strong GitHub portfolio showcasing DevOps expertise and best practices. Impactful Work: Work on AI-driven products that are reshaping the future of technology in India. Innovative Team: Collaborate with a team of AI experts and engineers pushing the boundaries of technology. 
Career Growth: Opportunity to grow in a fast-growing startup at the forefront of the generative AI revolution. Cutting-edge Technologies: Work with cloud technologies, automation, and AI infrastructure to create high-impact products. Qualification: Bachelor's degree in Computer Science, Engineering, or related field
Engineering Manager - Platform Engineering
Meesho
Engineering Manager - Platform Engineering Location: Bangalore, Karnataka | Department: Tech About the Team At Meesho, we support 5% of Indian households with high-scale e-commerce solutions, and we do it with zero downtime. We value speed over perfection, embrace failures as learning opportunities, and empower teams with a Founder's Mindset. As part of the Platform Engineering team, you'll be building resilient, low-latency, high-throughput systems that serve millions of users daily. We invest in the growth of every engineer through continuous feedback, open communication, and a supportive culture. And yes, we know how to party as hard as we code. About the Role We are looking for a skilled Engineering Manager - Platform Engineering to lead a team responsible for designing, scaling, and optimizing our core infrastructure. This role involves managing large-scale distributed systems, fostering engineering excellence, and collaborating across teams to drive innovation. You'll ensure technical quality, delivery speed, and scalable architecture for all projects under your ownership. What You Will Do Design and allocate technical tasks while maintaining Meesho's engineering standards. Own execution of platform projects from inception to deployment, ensuring scalability and reliability. Conduct regular 1:1s, drive feedback cycles, and support career growth of engineers. Partner closely with Product and Design teams to develop new platform capabilities. Coach engineers on best practices for architecture, performance, and scalability. Monitor project health, sprint progress, and engineering KPIs. Foster a high-performing team culture with strong engineering ownership. What You Will Need Bachelor's or Master's degree in Computer Science or a related technical field. 8+ years of professional software development experience, including 1+ year in team management. Proven experience building large-scale distributed systems. 
Strong coding skills in Java, Python, or Go, and multithreading expertise. Deep understanding of messaging systems (Kafka, etc.), transactional and NoSQL databases. Experience working on cloud platforms like GCP or AWS. Exceptional communication, leadership, and stakeholder management skills. Good to have: Exposure to Elasticsearch, data pipelines, or stream processing systems. About Us Meesho is India's leading e-commerce platform built for the next billion users. With 1.75M+ sellers and a customer base spread across every serviceable pin code, we are democratizing internet commerce by enabling small businesses to sell online at zero commission and with the lowest logistics costs in the industry. From affordable products that reflect local demand to a robust pan-India tech infrastructure, Meesho is transforming how India shops and sells online. Our Culture & Total Rewards At Meesho, we believe in creating a culture of impact, inclusion, and innovation. Our values, reflected in 11 guiding principles or "Mantras," shape how we work, collaborate, and grow together. Why You'll Love Working Here: Compensation: Competitive salary with equity-based rewards tailored to your experience and impact. Wellness: Extensive health insurance for you and your family through our MeeCare Program, mental wellness support, gym discounts, and more. Flexibility & Leave: Generous time off, parental benefits, and relocation support. Growth & Learning: Continuous learning through workshops, internal mobility, and performance coaching. Culture of Recognition: Personalized gifts, fun rituals, and regular engagement programs celebrating wins, big and small. Join us to build the platform powering the future of digital commerce in India. Apply now and be part of a tech-first, people-driven journey at Meesho. Qualification: Bachelor's or Master's degree in Computer Science or a related technical field.
Site Reliability Developer 2/3
Oracle
Job Description: Site Reliability Engineer - OCI Cloud Engineering Team Role: Site Reliability Engineer (SRE) Team: OCI OLTP (Online Transaction Processing) Location: Kiev Career Level: IC2 Experience: 5+ years Overview: Oracle Cloud Infrastructure's (OCI) OLTP organization is seeking a Site Reliability Engineer (SRE) to join our dynamic and fast-paced Cloud engineering team. The team is responsible for mission-critical distributed systems and cloud services, and we are looking for an engineer who is deeply interested in databases, distributed systems, and cloud services. If you thrive in an environment where innovation, problem-solving, and operational excellence intersect, this is an exciting opportunity for you! As a member of the SRE team, you will focus on cloud services, covering deployments, operations, security vulnerability mitigation, and automation. You will be instrumental in fostering a culture of Site Reliability Engineering (SRE) within the team, and your work will directly contribute to ensuring the stability, performance, and reliability of Oracle's global cloud service infrastructure. This role requires someone who is adaptable, highly motivated, and capable of managing large-scale cloud environments with a focus on continuous improvement. Key Responsibilities: Cloud Service Operations & Reliability: Deploy, operate, and maintain large-scale cloud service products in a highly available, fault-tolerant, and scalable environment. Collaborate with internal teams to identify and mitigate cross-team issues that pose operational risks to cloud services. Focus on systems reliability and ensure the continuous availability of cloud services by automating tasks and eliminating manual interventions. Automation & Improvements: Automate operational tasks and improve service deployments, focusing on scaling, performance, and uptime. Contribute to CI/CD systems, ensuring seamless integration and continuous delivery for cloud-based services. 
Leverage automation tools such as Terraform, Grafana, and Bitbucket to streamline operations. Security & Incident Response: Mitigate security vulnerabilities within cloud services and ensure compliance with Oracle's security standards. Participate in on-call rotations to provide immediate troubleshooting support and ensure rapid issue resolution. Perform deep analysis of service performance and collaborate with team members to diagnose and resolve issues that affect service availability or performance. Collaborative Problem-Solving: Work closely with cross-functional teams, including development, database, networking, and storage experts, to ensure the reliability and performance of services. Identify systemic issues and potential risks, develop solutions, and ensure proper documentation and communication with stakeholders. Documentation & Knowledge Sharing: Contribute to documentation such as runbooks, operational guides, and troubleshooting manuals. Mentor junior engineers and share knowledge on best practices for site reliability engineering and cloud service operations. Continuous Learning: Stay up to date with new cloud technologies, trends, and best practices, and actively implement them in your day-to-day work. Technical and Professional Requirements: Cloud Services & Infrastructure: 5+ years of experience in SRE, DevOps, or Automation roles with a focus on large-scale infrastructure and cloud services. Hands-on experience with cloud platforms (e.g., OCI, AWS, Azure) and expertise in compute, database, networking, and storage services within cloud environments. Automation & Tooling: Proficiency with automation tools such as Terraform, Grafana, LumberJack, and Shepherd. Solid experience in using CI/CD tools and processes for cloud service deployments and operations. Scripting & Systems: Strong knowledge of scripting and programming languages, particularly Python and Java. 
Familiarity with Linux systems, Docker containers, virtualized infrastructure, and orchestration (e.g., Kubernetes). Performance & Troubleshooting: Excellent troubleshooting skills with a focus on performance, availability, reliability, and scalability of distributed systems. Experience in operating fault-tolerant, highly available, high-throughput distributed systems. Security & Incident Management: Familiarity with security practices and mitigating security vulnerabilities in cloud services. Proven ability to handle incident response and provide efficient troubleshooting during on-call rotations. Collaboration & Communication: Strong verbal and written communication skills, capable of working effectively with diverse teams across multiple geographies. Ability to work in a highly collaborative environment, driving operational excellence and customer satisfaction. Preferred Qualifications: Experience in operating and maintaining multi-tenant, cloud-based infrastructure with a focus on scalability and high availability. Familiarity with tools and platforms like Grafana, Prometheus, and other observability and monitoring tools. Experience in networking and storage technologies in a cloud environment. Joining OCI's OLTP team as an SRE gives you the opportunity to work with cutting-edge technologies and contribute to the operational excellence of Oracle's global cloud infrastructure. This is a chance to grow your skills in a highly dynamic environment and to solve complex problems that directly impact mission-critical cloud services. With a focus on automation, scalability, and high performance, you will be an essential part of a team that powers Oracle's leading cloud services. If you are an experienced engineer passionate about cloud technologies, automation, and ensuring the reliability of large-scale systems, we encourage you to apply and join us in this exciting journey!
Senior DevOps / Site Reliability Engineer
Blue Yonder
Job Title: Senior DevOps / Site Reliability Engineer Location: Pune, India Company: Blue Yonder Experience: 10 to 13 years Education: Bachelor's Degree in Computer Science, Engineering, or related STEM fields Company Overview Blue Yonder is a leading AI-driven Global Supply Chain Solutions provider and consistently recognized as one of Glassdoor's Best Places to Work. We are driving the next wave of digital transformation in manufacturing and retail, delivering innovative SaaS solutions that power intelligent supply chains across the globe. We are looking for a Senior DevOps / Site Reliability Engineer (SRE) to lead the design, development, deployment, and operational management of our Azure SaaS solution. This role requires strong DevOps, cloud delivery, and infrastructure automation expertise, along with leadership capabilities to guide a growing global team. Role Overview In this role, you will be responsible for architecting, planning, and executing end-to-end delivery pipelines, supporting both product development and operational stability. Working closely with platform, product, and architecture teams, you will implement best-in-class DevOps and SRE practices, ensuring scalability, resilience, and cost optimization. Key Responsibilities Architect, design, and manage CI/CD pipelines and infrastructure for a cloud-native, multi-tenant SaaS solution on Azure. Lead sprint planning, backlog grooming, and architecture discussions. Develop quality automation scripts and tools to reduce manual efforts and enable self-healing, self-service capabilities. Identify and resolve operational bottlenecks and proactively improve observability (monitoring, alerting, logging). Participate in code reviews, ensure secure and scalable designs, and mentor junior and mid-level engineers. Collaborate with stakeholders to understand business and technical requirements and translate them into actionable user stories. Implement and enforce cloud cost optimization strategies. 
Conduct post-incident reviews with a blameless culture to identify root causes and drive continuous improvements. Automate service requests and standard operational procedures. Drive improvements to the team's continuous integration pipeline, ensuring rapid and reliable deployments. Stay updated with the latest DevOps, SRE, and cloud technologies and bring innovative ideas to the table. Participate in team hiring and actively contribute to onboarding new team members. Technical Environment Languages: Java, Python, PowerShell, Shell Scripting DevOps Tools: Azure DevOps, GitHub Actions, Jenkins Cloud: Microsoft Azure (ARM Templates, AKS, Event Hub, HDInsight, Azure AD, Application Gateway, Virtual Networks) Architecture: Microservices, Kubernetes, Docker, Event-driven architecture Frameworks: Spring Boot, Hibernate Monitoring & Logging: Elasticsearch, Spark, Kafka Databases: RDBMS, NoSQL Version Control: Git Required Skills & Experience Bachelor's Degree (STEM preferred) with 10 to 13 years of experience in DevOps, Cloud Delivery, or Site Reliability Engineering. Proven hands-on experience with Azure Cloud Services. Expertise in setting up and optimizing CI/CD pipelines. Strong scripting experience: Shell and PowerShell are mandatory; Python is a plus. Strong understanding of container technologies (Docker, Kubernetes) and microservices architecture. Experience integrating and managing third-party monitoring and logging tools. Strong problem-solving skills and ability to work with global, cross-functional teams. Excellent communication and stakeholder management skills. Nice to Have Development experience in Java or Python. Experience working in agile teams with a product-centric mindset. Experience working in manufacturing or retail domains. Exposure to AI/ML-driven monitoring and observability tools. Work with cutting-edge technologies on globally impactful solutions. Collaborate with diverse and talented teams across the US, India, and the UK. 
Foster your career growth through mentorship, continuous learning, and leadership opportunities. Experience an inclusive, flexible work culture where innovation and creativity thrive. Diversity, Inclusion, Value & Equality (DIVE) At Blue Yonder, we are committed to building an inclusive environment where everyone feels empowered to be themselves. All qualified applicants will receive consideration for employment regardless of race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or protected veteran status. Qualification: Bachelor's Degree in Computer Science, Engineering, or related STEM fields
Senior Site Reliability Engineer
Couchbase
Job Title: Site Reliability Engineer (SRE) Cloud Platform & Production Pipeline Initiatives Location: Bangalore, India (Office-based role) About Couchbase: As industries race to embrace AI, traditional database solutions fall short of rising demands for versatility, performance, and affordability. Couchbase is leading the way with Capella, the developer data platform for critical applications in our AI-driven world. By uniting transactional, analytical, mobile, and AI workloads into a seamless, fully managed solution, Couchbase empowers developers and enterprises to build and scale applications with unmatched flexibility, performance, and cost-efficiency from cloud to edge. Trusted by over 30% of the Fortune 100, Couchbase is unlocking innovation, accelerating AI transformation, and redefining customer experiences. Come join our mission! Job Overview: As a Site Reliability Engineer (SRE), you will play a pivotal role in managing, optimizing, and maintaining Couchbase's cloud infrastructure for Capella, our Database as a Service (DBaaS) platform. You will be responsible for ensuring the reliability and performance of our cloud service while collaborating closely with engineering teams to improve deployment pipelines, security practices, and overall system health. You will work across cloud platforms and multiple tools to provide guidance, mentorship, and contribute to the strategic direction of cloud operations. Responsibilities: Infrastructure Management: Manage, monitor, and maintain the infrastructure for Capella to ensure reliable operations. Security & Compliance: Implement and manage cloud environments in accordance with company security guidelines, including vulnerability management, penetration testing, and compliance requirements (SOC 2, PCI-DSS, GDPR, HIPAA, etc.). CI/CD & Release Pipeline: Collaborate with engineering teams to optimize CI/CD processes, aiming for a highly resilient deployment strategy, ideally with zero downtime. 
Cloud Optimization: Stay up-to-date with new technologies and industry trends to continuously improve cloud platform architecture and meet the evolving needs of the business. Security Integration: Work with development teams to integrate security scanners within the DevOps lifecycle, enhancing security posture. Leadership & Mentorship: Provide guidance on architecture, code reviews, and technical feedback to improve service reliability, security, cost, and performance. Incident Management: Demonstrate exceptional problem-solving skills, proactively identifying and addressing potential issues before they affect business operations. Collaboration: Partner with development teams, application owners, and stakeholders to integrate best practices and ensure seamless service delivery. Requirements: Experience: 5+ years in Site Reliability Engineering (SRE), DevSecOps, or similar roles, with significant experience working in public cloud environments. Programming & Scripting: Proficiency in languages such as Go, Python, Java, or Ruby. Linux Expertise: High proficiency with Linux operating systems. Kubernetes Management: Experience in managing and maintaining Kubernetes clusters (both self-managed and managed platforms like AWS EKS). Security & Vulnerability Management: In-depth knowledge of security tools and practices (vulnerability management, pen testing, SCA, DAST, SAST), with hands-on experience using tools like Sysdig, Snyk, and Black Duck. Cloud Platforms & Tools: Strong experience with cloud platforms (AWS, GCP, Azure) and tools such as Artifactory, Jira, Jenkins, Grafana, Prometheus, Datadog, Thanos, etc. Configuration Management: Proficiency with Terraform, Git, and CI/CD platforms (e.g., CircleCI, GitHub, Spinnaker). Networking Security: Solid understanding of TCP/IP, DNS, HTTP, firewalls, VPNs, and other networking security concepts. Preferred Skills: Availability & Reliability: Knowledge of SLO/SLA, availability, reliability, and performance concepts. 
Incident Management: Experience with on-call rotations and incident management. Database Experience: Familiarity with databases, particularly Couchbase. Security Certifications: Relevant certifications in security or cloud technologies are a plus. Couchbase reimagines database technology to deliver a fast, flexible, and affordable cloud database platform, empowering developers to build applications with exceptional customer experiences. Trusted by over 30% of the Fortune 100, Couchbase drives innovation and customer success through its Capella platform. Benefits at Couchbase: Generous Time Off Program: Flexibility to care for yourself and your family. Wellness Benefits: Access to world-class medical plans, dental, vision, life insurance, and employee assistance programs. Financial Planning: RSU equity program, ESPP, retirement planning, and business travel insurance. Career Growth: Focused on your career development and success. Fun Perks: Ergonomic and comfortable office setup, food & snacks for in-office employees, and more!
DevOps
Mirafra Technologies
DevOps Engineer Location: Bangalore Experience: 5+ Years Education Qualification: B.E. in Computer Science / Electronics About Mirafra Founded in 2004, Mirafra is a fast-growing global product engineering services company specializing in Semiconductor Design, Embedded Systems, Digital Solutions, and Application Software. With 1,500+ professionals worldwide, we provide cutting-edge solutions to Fortune 500 clients across industries such as Semiconductor, Internet, Aerospace, Networking, Telecom, Medical Devices, and Consumer Electronics. Recognitions: Best Company to Work For, SiliconIndia (2016); Most Promising Design Services Provider, SiliconIndia (2018); Top 10 Admired Companies for Software Services, DigiTech Insight (2022). Key Responsibilities DevOps & Automation Develop automated CI/CD pipelines and manage build & deployment processes. Implement infrastructure automation using scripting (Shell, Batch, Python). Manage configuration, integration, and deployment using DevOps tools. Version Control & Build Management Work with Git, GitLab, and Bitbucket for version control. Maintain build systems like Make and CMake, and manage dependencies using Pip, Conda, Poetry, and Maven. Handle binary management tools like Artifactory and Nexus. Code Quality & Security Utilize static code analysis tools (SonarQube, Pylint, Coverity) for code quality enforcement. Monitor and ensure security compliance in the DevOps lifecycle. Cloud & Containerization Manage cloud-based deployments and monitoring using ELK, Docker, and Kubernetes. Implement scalable and resilient infrastructure solutions. Agile & Collaboration Work in an Agile/Scrum environment, collaborating with cross-functional teams. Utilize UML modeling and software development best practices.
Skills & Qualifications Education: B.E. in Computer Science / Electronics Technical Expertise: Scripting & Automation: Shell, Batch, Python; CI/CD & Build Tools: Jenkins, GitLab, Make, CMake; Version Control: Git, Bitbucket, GitLab SCM; Static Code Analysis: SonarQube, Pylint, Coverity; Package Management: Pip, Conda, Poetry, Maven; Binary Management: Artifactory, Nexus; Cloud & Containerization: Docker, Kubernetes, ELK Stack; Programming Languages: Python, C, C++; Operating Systems: Linux, Unix, Windows. Soft Skills: Strong problem-solving and analytical skills. Excellent communication and team collaboration. Ability to work in fast-paced Agile environments. Cutting-edge projects in Semiconductor, Aerospace, Networking, and IoT. Global work environment with top-tier clients. Career growth opportunities and exposure to the latest technologies. Award-winning workplace culture and industry recognition. Excited to take on a challenging DevOps role? Apply now!
Software Engineer III, Infrastructure, Core
Google Careers
Job Title: Software Engineer About the Role: At Google, our Software Engineers are at the forefront of innovation, designing and developing cutting-edge technologies that shape how billions of users connect, explore, and interact with information. Our products operate at an immense scale, extending far beyond web search, and require engineers who bring fresh perspectives from diverse technical domains, including information retrieval, distributed computing, large-scale system design, networking, security, artificial intelligence, natural language processing, UI design, and mobile development. As a Software Engineer, you will contribute to mission-critical projects, collaborating with teams across Google to develop, test, deploy, maintain, and enhance software solutions. Your versatility, leadership abilities, and enthusiasm for solving complex challenges will be crucial as you navigate projects across the full technology stack. The Core Team serves as the backbone of Google's technical infrastructure, building the foundational elements behind our flagship products. This team is responsible for developing essential developer platforms, product components, and infrastructure that drive innovation across Google's ecosystem. As a member of this team, you will play a pivotal role in breaking down technical barriers, optimizing existing systems, and making key architectural decisions that influence the entire organization. Key Responsibilities: Design, develop, and maintain high-quality software solutions that support Google's technical infrastructure and products. Participate in and lead design reviews with peers and stakeholders, evaluating available technologies to determine optimal solutions. Conduct thorough code reviews to ensure adherence to best practices, including code quality, efficiency, accuracy, testability, and compliance with style guidelines. 
Contribute to documentation and educational resources, updating content based on product enhancements and user feedback. Troubleshoot and debug complex system issues, analyzing their impact on hardware, networks, and service operations to maintain optimal performance and reliability. At Google, we foster a culture of continuous learning, innovation, and technical excellence. If you're passionate about solving challenging problems and building world-class technology, we invite you to be part of our journey. Qualification: Bachelor's degree or equivalent practical experience.
Technical Lead
Cisco Technology Inc
Meet the Team: As a Technical Lead, you will drive technical excellence across HPC infrastructure, network automation, DevOps practices, and SRE principles while leading architecture decisions and guiding teams in implementing high-performance solutions for AI/ML workloads on various network topologies. This role combines deep technical expertise with leadership responsibilities, focusing on system architecture, automation, reliability engineering, and development excellence. Your Impact: Design and implement end-to-end automation solutions for HPC infrastructure (compute, network, and storage) using Kubernetes operators, Terraform, and Ansible. Analyze compute, storage, and network traffic patterns during distributed training/inference operations across different AI/ML frameworks. Monitor and optimize network utilization patterns for various model architectures. Identify bottlenecks in network communication patterns. Perform root cause analysis across network, compute, and storage layers, with experience handling various failure scenarios and recovery procedures. Make architectural decisions and drive innovation. Develop infrastructure patterns for different workload types. Provide benchmarking and performance engineering leadership. Mentor junior engineers through architecture reviews and code critiques. Design and implement comprehensive telemetry collection systems for monitoring high-speed network microburst behavior. Develop sophisticated visualization tools and analytics frameworks to enable real-time identification of performance bottlenecks and system constraints, facilitating rapid optimization and troubleshooting. Minimum Qualifications: Demonstrated expertise in distributed systems and infrastructure design (compute, storage, and networking). Experience with network automation tools and configuration management (Ansible, Python, Golang, YAML, YANG). Strong background in CI/CD, GitOps, or similar practices and tools.
Expert-level experience with observability platforms and practices. Strong background in implementing distributed tracing, metrics collection, and log aggregation systems. Demonstrated experience in at least one completed performance benchmarking project for distributed systems, storage, network, and compute. Preferred Qualifications: Bachelor's degree in Computer Science, Software Engineering, or a related technical field with 15-20 years of extensive hands-on experience in distributed systems development and DevOps practices. An advanced degree is a plus. Contributions to open-source projects related to distributed systems or performance engineering. Experience in analyzing and documenting system performance metrics across network, compute, and storage layers. Prior experience mentoring and developing technical talent. Understanding of AI/ML infrastructure; any prior experience with RDMA or RoCE v2 will be an added advantage. #WeAreCisco, where every individual brings their unique skills and perspectives together to pursue our purpose of powering an inclusive future for all. Qualification: Bachelor's degree in Computer Science, Software Engineering, or a related technical field with 15-20 years of extensive hands-on experience in distributed systems development and DevOps practices. An advanced degree is a plus.
Autoit Solutioning Engineer, Lead
Qualcomm
Job Title: Site Reliability Engineer (SRE) General Summary: We are seeking a highly skilled and motivated Site Reliability Engineer (SRE) to join our dynamic team. This role is critical in ensuring the stability, scalability, and security of our infrastructure and services. As an SRE, you will work collaboratively with software engineers, data scientists, and product managers to optimize system reliability while driving automation and continuous improvement. You will be responsible for modernizing traditional services, implementing cutting-edge technology, and proactively managing infrastructure to maintain operational excellence. If you are passionate about automation, DevSecOps, system performance, and infrastructure resilience, this role offers an exciting opportunity to make a meaningful impact. Key Responsibilities: System Monitoring & Incident Response: Continuously monitor system health, detect anomalies, and respond to incidents promptly. Investigate and troubleshoot service-related issues, ensuring minimal disruption. Implement proactive measures to prevent downtime and optimize system stability. Infrastructure Automation & DevOps Implementation: Develop and maintain Infrastructure-as-Code (IaC) scripts to automate deployments and scaling. Automate routine operational tasks to improve efficiency and reduce manual intervention. Leverage DevSecOps practices to ensure secure and resilient deployments. Performance Optimization & Capacity Planning: Collaborate with development teams to enhance software performance and system responsiveness. Identify and resolve system bottlenecks to improve speed, efficiency, and reliability. Forecast resource requirements based on traffic patterns and business growth. Security, Compliance & Risk Management: Implement security best practices and compliance measures across all infrastructure layers. Conduct security audits and ensure systems meet industry-standard security guidelines. 
Proactively assess and mitigate risks associated with infrastructure and deployments. Required Qualifications & Skills: Technical Expertise: Extensive experience with Linux-based environments (Ubuntu, RedHat), including system administration and troubleshooting. Strong proficiency in scripting and automation using Python, Bash, or Go. Experience with containerization and orchestration technologies such as Docker and Kubernetes. Familiarity with CI/CD pipelines and tools like Jenkins, Puppet, Vault, and Splunk. Hands-on experience with cloud platforms (AWS, Azure, or GCP). Problem-Solving & Leadership: Strong analytical skills with the ability to diagnose and resolve complex system issues. Self-driven, highly motivated, and able to work independently in a fast-paced environment. Ability to collaborate cross-functionally and communicate technical solutions effectively. Security & Reliability Focus: Solid understanding of DevSecOps principles and secure system design. Ability to implement monitoring, logging, and alerting solutions to maintain system resilience. Passion for continuous learning and leveraging data-driven approaches for system improvement. Work in a high-impact role that directly contributes to the reliability and scalability of mission-critical systems. Be part of an innovative, forward-thinking team that values automation, collaboration, and continuous improvement. Competitive salary, professional development opportunities, and an environment that fosters growth and innovation. If you are a passionate, results-driven SRE, we invite you to join us and play a pivotal role in shaping the future of our infrastructure.
Sr. NOC Engineer
Databricks
We're growing fast and attracting the best talent in the world. Bricksters, as we call ourselves, are a special mix of smart, curious, quick thinkers. If you ask a Brickster what they love about working here, you'll likely hear about our culture. We are seeking an experienced NOC Engineer to join our team. The successful candidate will be responsible for monitoring critical Databricks infrastructure and developing monitoring tools and alerting dashboards. They will also work closely with stakeholders to investigate and resolve incidents, perform root cause analysis, and propose solutions to increase the reliability and stability of the Databricks unified analytics platform. The impact you will have here: Monitor critical infrastructure, triage alerts to proactively identify incidents, and work with stakeholders to resolve incidents. Investigate incidents and propose solutions to improve platform reliability and stability. Perform root cause analysis for recurring incidents and provide proactive solutions. Develop tooling or automate processes to improve platform monitoring and alerting. Contribute to software development efforts to improve overall service reliability and stability. Communicate effectively with internal stakeholders, including executive staff, to provide incident analysis. Participate in war rooms and temporary communication channels during outages. Demonstrate cross-functional leadership and establish ownership of incidents and outages. Multitask on several incidents and/or projects. Minimum of 5 years of experience as a NOC, SRE, or DevOps engineer. Strong knowledge of cloud technologies such as Azure, AWS, and GCP. Hands-on experience with monitoring, logging, and alerting tools such as ELK, Prometheus, Grafana, PagerDuty, etc. Experience with containers and orchestration technologies such as Docker and Kubernetes. Proficiency in automation and scripting. Linux systems administration skills. Excellent communication skills.
Willingness to learn Databricks products. Bachelor's degree in Computer Science or a related field. About Databricks: Databricks is the data and AI company. More than 10,000 organizations worldwide, including Comcast, Condé Nast, Grammarly, and over 50% of the Fortune 500, rely on the Databricks Data Intelligence Platform to unify and democratize data, analytics and AI. Databricks is headquartered in San Francisco, with offices around the globe, and was founded by the original creators of Lakehouse, Apache Spark, Delta Lake and MLflow. To learn more, follow Databricks on Twitter, LinkedIn and Facebook. Benefits: At Databricks, we strive to provide comprehensive benefits and perks that meet the needs of all of our employees. For specific details on the benefits offered in your region, please visit https://www.mybenefitsnow.com/databricks. Our Commitment to Diversity and Inclusion: At Databricks, we are committed to fostering a diverse and inclusive culture where everyone can excel. We take great care to ensure that our hiring practices are inclusive and meet equal employment opportunity standards. Individuals looking for employment at Databricks are considered without regard to age, color, disability, ethnicity, family or marital status, gender identity or expression, language, national origin, physical and mental ability, political affiliation, race, religion, sexual orientation, socio-economic status, veteran status, and other protected characteristics. Compliance: If access to export-controlled technology or source code is required for performance of job duties, it is within Employer's discretion whether to apply for a U.S. government license for such positions, and Employer may decline to proceed with an applicant on this basis alone. Qualification: Bachelor's degree in Computer Science or a related field is required.