Runbooks Jobs in Bengaluru

6 Jobs Found

Software Principal Engineer - SRE

Boomi Software

7+ Years | Not Disclosed | Bengaluru, Karnataka, India | Full-time

Position: Senior Site Reliability Engineer

Join us as a Senior Site Reliability Engineer on our Reliability Team and do the best work of your career while making a profound social impact. In this role, you will design and build sophisticated systems and software that align with our customers' business goals and environments. You will collaborate with product management, engineering teams, customer success, and support to deliver innovative features and enhancements across Boomi's product offerings.

Key Responsibilities
Incident Management & SLAs: Participate in detecting, remediating, and reporting production incidents, ensuring that SLAs and SLOs are well defined and consistently met.
On-Call Rotation: Provide on-call support for planned and unplanned events.
Collaboration: Partner with engineering teams to implement improvements, standardize processes, and drive consistent results.
Disaster Recovery: Lead DR exercises, game days, and readiness training with SRE and engineering counterparts.
Observability & Tooling: Collaborate with service engineering teams to build and automate tooling, implement observability best practices, and ensure the scalability and reliability of Boomi's production services.
Infrastructure Automation: Automate provisioning and maintenance of Boomi's infrastructure using tools such as Terraform and Ansible.
Technical Mentorship: Guide and mentor other engineers through design collaboration and code reviews.

What You'll Bring
Essential Requirements
Expertise in defining, measuring, and improving reliability metrics (SLOs, SLIs, error budgets).
Strong experience with observability practices (monitoring, logging, distributed tracing), preferably using Splunk and New Relic, including the ability to build custom dashboards from scratch.
Proficiency in infrastructure automation using Terraform, CloudFormation, and Ansible playbooks, with scripting experience in Python.
Hands-on experience conducting and automating disaster recovery (DR) exercises in AWS, validating RPOs and RTOs.
Deep understanding of AWS components and the ability to design and implement APIs for internal use.
Desirable Requirements
7+ years of experience in the software engineering industry, with exposure to large-scale production systems.
Cloud certification (AWS, Azure, GCP, or Oracle), with experience in services such as compute, containers, and databases.
Experience with containerization best practices, cloud-native concepts, and cloud security.

Working at Boomi means doing what you love, surrounded by trailblazers with an entrepreneurial spirit. Our culture fosters innovation, encourages collaboration, and celebrates the unique contributions of every individual. Take the first step toward your dream career at Boomi, where ideas shape the future of technology.

Gen AI Support Engineer-2

Exotel

4-7 Years | Not Disclosed | Bengaluru, Karnataka, India | Full-time

Gen AI Support Engineer-2
Location: Bengaluru
Experience: 4-7+ years
Employment Type: Full-time

About Us
Exotel is the leading full-stack customer engagement platform and virtual telecom operator for emerging markets. Founded in 2011, Exotel now powers 50 million daily engagements across voice, video, and messaging channels. We provide our unified customer engagement solutions to over 6,000 companies globally, including industry leaders such as Ola, Swiggy, Flipkart, GoJek, Byjus, Urban Company, HDFC Bank, Zomato, and Oyo. With $100 million in Series D funding and an ARR of $60 million, Exotel is a growth-stage company poised for massive impact.

Overview
We're seeking a Gen AI Support Engineer-2 to join our team. As an L2 Support Engineer, you will be the highest level of technical escalation within the support organization. Your role will span system reliability, platform integrity, troubleshooting mission-critical production issues, and providing architecture feedback to engineering teams. You'll also mentor junior engineers and improve operational processes and tools for large-scale environments. If you're passionate about writing clean code with Python and Django and want to contribute to a fast-paced, mission-driven company, this role is for you!

Responsibilities
Mission-Critical Issue Resolution: Own the resolution of high-priority, time-sensitive production issues.
Root Cause Analysis (RCA): Lead RCA reviews and push for systemic improvements in system architecture and processes.
Performance Optimization: Identify bottlenecks and propose architectural changes to improve system performance and scalability.
Patch Management: Assist in configuring, deploying, and testing patches, releases, and application updates in production environments.
SME for Production Systems: Serve as the subject matter expert (SME) for Exotel's production systems and integrations.
Cross-Team Collaboration: Work with Delivery, Product, and Engineering teams to influence system design, rollout strategies, and improvement plans.
Mentorship: Lead and mentor L1/L2 engineers on troubleshooting best practices and continuous learning.
Code Writing & Automation: Write clean, maintainable code for internal tools, scripts, and automation using Python and Django.
Support Tooling: Automate recovery workflows and design support tools for proactive monitoring.
Operational Excellence: Establish and improve SLAs, monitoring dashboards, alerting systems, and operational runbooks to ensure system reliability.

Must Have Skills
Backend Development Support: 3+ years of experience in backend development support, production support, or DevOps/SRE roles.
Core Technologies: Proficiency in Python, Django, and SQL, and in troubleshooting on Linux.
Web Technologies: Strong understanding of HTML, CSS, JavaScript, and other web technologies.
Distributed Systems & Cloud: Experience working with distributed systems, cloud architecture (AWS), Docker, and Kubernetes.
Automation: Strong scripting skills in Bash/Python for automation and operational support.
CI/CD & Observability: Good understanding of CI/CD, observability tools, and release management workflows.
Communication Skills: Excellent communication, leadership, and incident command skills for managing production issues and cross-functional collaboration.

Nice to Have
Experience with AI-powered systems and machine learning technologies.
Familiarity with monitoring systems such as Prometheus, Grafana, or Elasticsearch.
Knowledge of microservices architectures and scaling distributed systems.

Innovative Work: Be at the forefront of cloud-based communications technology and AI-driven customer engagement platforms.
Impact: Play a key role in maintaining and optimizing systems that power millions of customer interactions daily.
Growth Opportunities: Be part of a fast-growing company with ample learning opportunities and career development.
Collaborative Environment: Work in a supportive, inclusive environment where your input and ideas matter.
Competitive Benefits: Comprehensive benefits package including health insurance, mental wellness support, and more.

Technical Consultant - Security Intel & Operations Consulting Svcs

International Business Machines

Fresher | Not Disclosed | Bengaluru, Karnataka, India | Full-time

Technical Consultant - Security Intel & Operations Consulting Services
Location: Bangalore, Karnataka, India
Job Type: Full-Time
Experience Level: Senior

Introduction
At IBM Consulting, we believe that work is more than just a job; it's a calling. In the role of Technical Consultant - Security Intel & Operations, you will be part of our Client Innovation Centers (Delivery Centers), where we deliver deep technical and industry expertise to public and private sector clients across the globe. Our team helps clients innovate, adopt new technologies, and improve their security posture.

Your Role and Responsibilities
As a Senior SOC Analyst in the 24/7 Cyber Fusion Center (CFC), you will proactively monitor, triage, analyze, and escalate incidents in client environments. You will use a range of cyber operations tools and technologies to analyze data, detect security threats, and mitigate risks. Your expertise will help maintain the security integrity of client systems and ensure efficient incident response.

Key Responsibilities
Incident Monitoring & Analysis: Monitor and analyze security events using cybersecurity tools such as SIEM, IDS/IPS, firewalls, network traffic logs, cloud platforms, and SOAR solutions to detect potential threats and mitigate risks. Perform event correlation across multiple data sources to understand the nature of security incidents and determine their impact on client environments.
Threat Detection & Mitigation: Analyze alerts to identify active threats, perform root cause analysis, and apply appropriate mitigation techniques for both structured and unstructured environments. Evaluate security incidents across AWS and Azure environments, analyzing system, network, and email security events.
Proactive Cybersecurity Measures: Conduct root cause analysis of security events and recommend actions to address vulnerabilities. Contribute to the development and continuous improvement of SOC runbooks and playbooks to optimize security operations.
Collaboration & Reporting: Work closely with cross-functional teams to escalate critical incidents, and provide daily summary reports on cyber operations activities. Lead discussions on incident trends, perform cyber operations trend analysis, and report findings to drive continuous security enhancement.
Continuous Improvement: Recommend improvements to automations, alert fidelity, and security controls to improve security efficacy and response time. Participate in team meetings, calls, and chats, contributing technical insights to enhance security strategies and tactics.

Required Education and Experience
Education: Bachelor's degree in Computer Science, Information Technology, Cybersecurity, or a related field. A Master's degree is preferred but not required.
Experience: Extensive experience as a SOC Analyst or in similar cybersecurity roles, especially in a 24/7 security operations center. Proficient in event analysis, log analysis, and network event management. Hands-on experience with cloud environments such as AWS and Azure, with a focus on cybersecurity threats and mitigations. Solid understanding of TCP/IP network security, modern attack techniques, exploitation methods, and operating system security.

Preferred Technical and Professional Experience
Security Tools & Platforms: Experience with CyberArk, Azure SSO, and other enterprise security technologies. Knowledge of enterprise web technologies and cutting-edge security infrastructures. Familiarity with security automation tools and best practices for improving alert fidelity and security controls.
Advanced Event & Threat Analysis: Proven ability to perform high-quality triage and in-depth analysis of security alerts. Experience documenting incidents and escalating critical issues with appropriate cyber operations reports.
Communication & Collaboration: Strong verbal and written communication skills, with the ability to convey complex security concepts to both technical and non-technical stakeholders. Ability to actively contribute to team discussions, runbook creation, and security playbook updates.

Global Impact: Join a globally recognized team working at the forefront of cybersecurity, helping to shape the future of digital security.
Career Development: IBM places a strong focus on professional growth, offering learning opportunities, certifications, and exposure to the latest security technologies.
Collaborative Culture: Be part of a collaborative and dynamic team, working together to tackle the most pressing security challenges faced by businesses around the world.

If you are ready to contribute to the security and resilience of leading global organizations, we invite you to apply and be a part of our forward-thinking security team at IBM Consulting.

Qualification: Bachelor's degree in Computer Science, Information Technology, Cybersecurity, or a related field.

Site Reliability Developer 2/3

Oracle

5+ Years | Not Disclosed | Bengaluru, Karnataka, India | Full-time

Job Description: Site Reliability Engineer - OCI Cloud Engineering Team
Role: Site Reliability Engineer (SRE)
Team: OCI OLTP (Online Transaction Processing)
Location: Kiev
Career Level: IC2
Experience: 5+ years

Overview
Oracle Cloud Infrastructure's (OCI) OLTP organization is seeking a Site Reliability Engineer (SRE) to join our dynamic, fast-paced cloud engineering team. The team is responsible for mission-critical distributed systems and cloud services, and we are looking for an engineer who is deeply interested in databases, distributed systems, and cloud services. If you thrive where innovation, problem-solving, and operational excellence intersect, this is an exciting opportunity for you!

As a member of the SRE team, you will focus on cloud services: building deployments, operations, security vulnerability mitigation, and automation. You will be instrumental in fostering a culture of Site Reliability Engineering within the team, and your work will directly contribute to the stability, performance, and reliability of Oracle's global cloud service infrastructure. This role requires someone who is adaptable, highly motivated, and capable of managing large-scale cloud environments with a focus on continuous improvement.

Key Responsibilities
Cloud Service Operations & Reliability: Deploy, operate, and maintain large-scale cloud service products in a highly available, fault-tolerant, and scalable environment. Collaborate with internal teams to identify and mitigate cross-team issues that pose operational risks to cloud services. Focus on systems reliability and ensure the continuous availability of cloud services by automating tasks and eliminating manual interventions.
Automation & Improvements: Automate operational tasks and improve service deployments, focusing on scaling, performance, and uptime. Contribute to CI/CD systems, ensuring seamless integration and continuous delivery for cloud-based services. Leverage tools such as Terraform, Grafana, and Bitbucket to streamline operations.
Security & Incident Response: Mitigate security vulnerabilities within cloud services and ensure compliance with Oracle's security standards. Participate in on-call rotations to provide immediate troubleshooting support and ensure rapid issue resolution. Perform deep analysis of service performance and collaborate with team members to diagnose and resolve issues that affect service availability or performance.
Collaborative Problem-Solving: Work closely with cross-functional teams, including development, database, networking, and storage experts, to ensure the reliability and performance of services. Identify systemic issues and potential risks, develop solutions, and ensure proper documentation and communication with stakeholders.
Documentation & Knowledge Sharing: Contribute to documentation such as runbooks, operational guides, and troubleshooting manuals. Mentor junior engineers and share knowledge on best practices for site reliability engineering and cloud service operations.
Continuous Learning: Stay up to date with new cloud technologies, trends, and best practices, and actively apply them in your day-to-day work.

Technical and Professional Requirements
Cloud Services & Infrastructure: 5+ years of experience in SRE, DevOps, or automation roles with a focus on large-scale infrastructure and cloud services. Hands-on experience with cloud platforms (e.g., OCI, AWS, Azure) and expertise in compute, database, networking, and storage services within cloud environments.
Automation & Tooling: Proficiency with tools such as Terraform, Grafana, LumberJack, and Shepherd. Solid experience using CI/CD tools and processes for cloud service deployments and operations.
Scripting & Systems: Strong knowledge of scripting and programming languages, particularly Python and Java. Familiarity with Linux systems, Docker containers, virtualized infrastructure, and orchestration (e.g., Kubernetes).
Performance & Troubleshooting: Excellent troubleshooting skills with a focus on the performance, availability, reliability, and scalability of distributed systems. Experience operating fault-tolerant, highly available, high-throughput distributed systems.
Security & Incident Management: Familiarity with security practices and with mitigating security vulnerabilities in cloud services. Proven ability to handle incident response and troubleshoot efficiently during on-call rotations.
Collaboration & Communication: Strong verbal and written communication skills, capable of working effectively with diverse teams across multiple geographies. Ability to work in a highly collaborative environment, driving operational excellence and customer satisfaction.

Preferred Qualifications
Experience operating and maintaining multi-tenant, cloud-based infrastructure with a focus on scalability and high availability.
Familiarity with tools and platforms such as Grafana, Prometheus, and other observability and monitoring tools.
Experience with networking and storage technologies in a cloud environment.

Joining OCI's OLTP team as an SRE gives you the opportunity to work with cutting-edge technologies and contribute to the operational excellence of Oracle's global cloud infrastructure. This is a chance to grow your skills in a highly dynamic environment and to solve complex problems that directly impact mission-critical cloud services. With a focus on automation, scalability, and high performance, you will be an essential part of a team that powers Oracle's leading cloud services. If you are an experienced engineer passionate about cloud technologies, automation, and ensuring the reliability of large-scale systems, we encourage you to apply and join us on this exciting journey!

Spark Backline Engineer

Databricks

5+ Years | Not Disclosed | Bengaluru, Karnataka, India | Full-time

Mission
As a Spark Backline Engineer, you will help our customers succeed with the Databricks Data Intelligence Platform by resolving important technical escalations from customers and the support team. You will be the technical bridge between support and engineering, and the first line of defense for engineering: you will vet every issue before it reaches the engineering team. You will report to the Senior Backline Manager of the Backline Escalations Team.

Outcomes
Troubleshoot and resolve complex customer issues related to Spark core internals, Spark SQL, Structured Streaming, and Databricks Delta, including deep code-level analysis of Spark.
Provide best-practice guidance on Spark runtime performance and on the use of Spark core libraries and APIs in custom solutions built by Databricks customers.
Help the support team with detailed troubleshooting guides and runbooks.
Contribute to automation and tooling programs that make daily troubleshooting more efficient.
Work with the Spark Engineering Team and spread awareness of upcoming features and releases.
Identify Spark bugs and suggest possible workarounds.
Demonstrate ownership and coordinate with engineering and escalation teams to resolve customer issues and requests.
Participate in weekend and weekday on-call rotations.

Competencies
Minimum 5 years' experience developing, testing, and sustaining Python-, Java-, or Scala-based applications.
Comfortable compiling, building, and navigating the Apache Spark source code.
Comfortable identifying and applying patches and bug fixes to the Apache Spark source code.
Experience with Big Data/Hadoop/Spark/Kafka/Elasticsearch data pipelines.
Hands-on experience with SQL-based database systems.
Experience with JVM, garbage collection (GC), and thread-dump-based troubleshooting is required.
Experience with AWS or Azure services.
Bachelor's degree in Computer Science or a related field is required.

About Databricks
Databricks is the data and AI company. More than 10,000 organizations worldwide, including Comcast, Condé Nast, Grammarly, and over 50% of the Fortune 500, rely on the Databricks Data Intelligence Platform to unify and democratize data, analytics, and AI. Databricks is headquartered in San Francisco, with offices around the globe, and was founded by the original creators of Lakehouse, Apache Spark, Delta Lake, and MLflow. To learn more, follow Databricks on Twitter, LinkedIn, and Facebook.

Benefits
At Databricks, we strive to provide comprehensive benefits and perks that meet the needs of all of our employees. For specific details on the benefits offered in your region, please visit https://www.mybenefitsnow.com/databricks.

Our Commitment to Diversity and Inclusion
At Databricks, we are committed to fostering a diverse and inclusive culture where everyone can excel. We take great care to ensure that our hiring practices are inclusive and meet equal employment opportunity standards. Individuals looking for employment at Databricks are considered without regard to age, color, disability, ethnicity, family or marital status, gender identity or expression, language, national origin, physical and mental ability, political affiliation, race, religion, sexual orientation, socio-economic status, veteran status, and other protected characteristics.

Compliance
If access to export-controlled technology or source code is required for performance of job duties, it is within Employer's discretion whether to apply for a U.S. government license for such positions, and Employer may decline to proceed with an applicant on this basis alone.

Qualification: Bachelor's degree in Computer Science or a related field is required.

Security Engineer II - SecOps & Threat

6sense

4+ Years | Not Disclosed | Bengaluru, Karnataka, India | Full-time

Our Mission
6sense is revolutionizing how B2B organizations generate revenue by predicting the customers most likely to buy and recommending the best ways to engage with anonymous buying teams. Through Revenue AI, we unlock the ability to create, manage, and convert high-quality pipelines into revenue, reshaping how businesses thrive.

Our People
At 6sense, people are at the core of our mission. Guided by our values (Accountability, Growth Mindset, Integrity, Fun, and One Team), we foster an environment where innovation and impact are celebrated. Every team member plays a key role in shaping our industry-leading technology, making 6sense a place for risk-takers and difference-makers who measure success by the value they deliver to customers.

Purpose of the Role
As part of the Security Operations and Threat Management team, you will help protect 6sense by proactively preventing, detecting, investigating, and responding to security threats and incidents that may impact the business.

Key Responsibilities
Incident Response & Monitoring: Monitor security alerts, conduct vulnerability assessments, and analyze logs to identify and respond to security incidents. Collaborate with cross-functional teams (Infrastructure, Engineering, IT, GRC, Cloud, and Application Security) to validate alerts and resolve incidents.
Threat Landscape Analysis: Perform proactive reviews to assess and address potential security risks. Continuously tune detection rules in security solutions to adapt to evolving threats.
Automation & Tool Administration: Manage security tools and develop basic automation to improve efficiency. Identify and implement opportunities for process automation to enhance security operations.
Documentation & Playbooks: Create and maintain security playbooks for various threat scenarios. Keep documentation, runbooks, workflows, and dashboards up to date.
Performance & Objectives: Align with quarterly Key Results that support team Objectives (OKRs). Participate in the Security Operations on-call rotation to ensure prompt responses.

Performance Metrics
Proficient understanding of the 6sense product and platform.
Participation in regular 1:1s with managers and monthly skip-level meetings.
Efficient identification and closure of incidents within established SLAs.
Maintenance of accurate, up-to-date documentation and proactive engagement with SecOps technologies.

Educational & Experience Requirements
Experience: 4+ years in a Security Operations role or similar position. Hands-on experience with security tools and cloud environments (e.g., vulnerability scanners, SIEM, SOAR, AWS).
Knowledge: Familiarity with industry frameworks, regulations, and standards, including MITRE ATT&CK, STRIDE, ISO 27001, GDPR, SOC 2, PCI, and NIST. Understanding of AI applications in cybersecurity (preferred).
Qualifications: Bachelor's degree in a related field. Relevant certifications, such as CSA, GCDA, GSOC, or CySA, are advantageous.

Benefits
At 6sense, we offer:
Comprehensive health coverage.
Paid parental leave.
Generous paid time off and holidays.
Quarterly self-care days off to prioritize well-being.
Stock options to share in the company's success.
Support and equipment to work from home or from one of our offices.

Join us to make an impact in the evolving cybersecurity landscape, empowering organizations to grow revenue through innovation and resilience.

Qualification: Bachelor's degree in a related field.
