Postmortems Jobs in Bengaluru
6 Jobs Found
Senior DevOps / Site Reliability Engineer
Blue Yonder
Job Title: Senior DevOps / Site Reliability Engineer Location: Pune, India Company: Blue Yonder Experience: 10 to 13 years Education: Bachelor's Degree in Computer Science, Engineering, or related STEM fields Company Overview Blue Yonder is a leading AI-driven Global Supply Chain Solutions provider and consistently recognized as one of Glassdoor's Best Places to Work. We are driving the next wave of digital transformation in manufacturing and retail, delivering innovative SaaS solutions that power intelligent supply chains across the globe. We are looking for a Senior DevOps / Site Reliability Engineer (SRE) to lead the design, development, deployment, and operational management of our Azure SaaS solution. This role requires strong DevOps, cloud delivery, and infrastructure automation expertise, along with leadership capabilities to guide a growing global team. Role Overview In this role, you will be responsible for architecting, planning, and executing end-to-end delivery pipelines, supporting both product development and operational stability. Working closely with platform, product, and architecture teams, you will implement best-in-class DevOps and SRE practices, ensuring scalability, resilience, and cost optimization. Key Responsibilities Architect, design, and manage CI/CD pipelines and infrastructure for a cloud-native, multi-tenant SaaS solution on Azure. Lead sprint planning, backlog grooming, and architecture discussions. Develop quality automation scripts and tools to reduce manual efforts and enable self-healing, self-service capabilities. Identify and resolve operational bottlenecks and proactively improve observability (monitoring, alerting, logging). Participate in code reviews, ensure secure and scalable designs, and mentor junior and mid-level engineers. Collaborate with stakeholders to understand business and technical requirements and translate them into actionable user stories. Implement and enforce cloud cost optimization strategies. 
Conduct post-incident reviews with a blameless culture to identify root causes and drive continuous improvements. Automate service requests and standard operational procedures. Drive improvements to the team's continuous integration pipeline, ensuring rapid and reliable deployments. Stay updated with the latest DevOps, SRE, and cloud technologies and bring innovative ideas to the table. Participate in team hiring and actively contribute to onboarding new team members. Technical Environment Languages: Java, Python, PowerShell, Shell Scripting DevOps Tools: Azure DevOps, GitHub Actions, Jenkins Cloud: Microsoft Azure (ARM Templates, AKS, Event Hub, HDInsight, Azure AD, Application Gateway, Virtual Networks) Architecture: Microservices, Kubernetes, Docker, Event-driven architecture Frameworks: Spring Boot, Hibernate Monitoring & Logging: Elasticsearch, Spark, Kafka Databases: RDBMS, NoSQL Version Control: Git Required Skills & Experience Bachelor's Degree (STEM preferred) with 10 to 13 years of experience in DevOps, Cloud Delivery, or Site Reliability Engineering. Proven hands-on experience with Azure Cloud Services. Expertise in setting up and optimizing CI/CD pipelines. Strong scripting experience: Shell and PowerShell are mandatory; Python is a plus. Strong understanding of container technologies (Docker, Kubernetes) and microservices architecture. Experience integrating and managing third-party monitoring and logging tools. Strong problem-solving skills and ability to work with global, cross-functional teams. Excellent communication and stakeholder management skills. Nice to Have Development experience in Java or Python. Experience working in agile teams with a product-centric mindset. Experience working in manufacturing or retail domains. Exposure to AI/ML-driven monitoring and observability tools. Work with cutting-edge technologies on globally impactful solutions. Collaborate with diverse and talented teams across the US, India, and the UK. 
Foster your career growth through mentorship, continuous learning, and leadership opportunities. Experience an inclusive, flexible work culture where innovation and creativity thrive. Diversity, Inclusion, Value & Equality (DIVE) At Blue Yonder, we are committed to building an inclusive environment where everyone feels empowered to be themselves. All qualified applicants will receive consideration for employment regardless of race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or protected veteran status. Qualification: Bachelor's Degree in Computer Science, Engineering, or related STEM fields
Software Principal Engineer - SRE
Boomi Software
Position: Senior Site Reliability Engineer Join us as a Senior Site Reliability Engineer on our Reliability Team and do the best work of your career while making a profound social impact. In this role, you will design and build sophisticated systems and software that align with our customers' business goals and environments. You will collaborate with product management, engineering teams, customer success, and support to deliver innovative features and enhancements across Boomi's product offerings. Key Responsibilities Incident Management & SLAs: Participate in detecting, remediating, and reporting production incidents, ensuring that SLAs and SLOs are well-defined and consistently met. On-Call Rotation: Provide on-call support for planned and unplanned events. Collaboration: Partner with engineering teams to implement improvements, standardize processes, and drive consistent results. Disaster Recovery: Lead DR exercises, game days, and readiness training with SRE and engineering counterparts. Observability & Tooling: Collaborate with service engineering teams to build and automate tooling, implement best practices in observability, and ensure the scalability and reliability of Boomi's production services. Infrastructure Automation: Automate provisioning and maintenance of Boomi's infrastructure using tools like Terraform and Ansible. Technical Mentorship: Guide and mentor other engineers through design collaboration and code reviews. What You'll Bring Essential Requirements Expertise in defining, measuring, and improving reliability metrics (SLOs, SLIs, error budgets). Strong experience in observability practices (monitoring, logging, distributed tracing), preferably using Splunk and New Relic, including the ability to create custom dashboards from scratch. Proficiency in infrastructure automation using Terraform, CloudFormation, and Ansible playbooks, with scripting experience in Python. 
Hands-on experience conducting and automating disaster recovery (DR) exercises in AWS, validating RPOs and RTOs. Deep understanding of AWS components and the ability to design and implement APIs for internal use. Desirable Requirements 7+ years of experience in the software engineering industry, with exposure to large-scale production systems. Cloud certification (AWS, Azure, GCP, Oracle), with experience in services such as compute, containers, and databases. Experience in containerization best practices, cloud-native concepts, and security awareness in the cloud. Working at Boomi means doing what you love, surrounded by trailblazers with an entrepreneurial spirit. Our culture fosters innovation, encourages collaboration, and celebrates the unique contributions of every individual. Take the first step toward your dream career at Boomi, where ideas shape the future of technology.
DevOps Engineer-2
Cashfree Payments India Private Limited
Position: DevOps Engineer-2 Location: Bengaluru Employment Type: Full-Time Department: Engineering Job Description: We are looking for a skilled DevOps Engineer-2 to design, implement, and maintain secure, scalable, and highly available infrastructure. You will play a key role in automating infrastructure provisioning, capacity planning, and building robust monitoring and CI/CD pipelines. Responsibilities: Design and implement secure, scalable infrastructure solutions. Automate infrastructure provisioning, demand forecasting, and capacity planning. Develop automation tools and frameworks to enhance system observability, availability, reliability, performance, and latency monitoring. Monitor system health, application performance, security controls, and cost optimization. Participate in sustainable incident response, peer reviews, and blameless postmortems. Lead the adoption and rollout of best DevOps tools and automation practices across services. Build and maintain continuous integration and continuous deployment (CI/CD) pipelines. Required Skills and Experience: Minimum 3 years of experience in DevOps and cloud technologies. Expertise in at least one major cloud platform: AWS, Azure, or GCP. Strong production experience with Kubernetes, including deployment, management, and troubleshooting. Proven ability to design scalable and resilient infrastructure architectures. Proficiency with infrastructure-as-code tools such as Terraform, Pulumi, or CloudFormation. Strong debugging and troubleshooting skills. Deep knowledge of Linux servers and networking fundamentals. Hands-on experience with scripting or programming languages like Python, Shell, Go, or Java. Familiarity with monitoring and observability tools such as Datadog, New Relic, ELK stack, Prometheus, or Grafana. Understanding of modern cloud-native development practices including microservices architecture and RESTful APIs. Ability to thrive in a fast-paced, dynamic work environment.
Technical Cs Specialist
Amazon Jobs
Position: AWS Trust & Safety Specialist I Overview: The AWS Sales, Marketing, and Global Services (SMGS) team drives revenue, adoption, and growth from the largest and fastest-growing small and mid-market accounts to enterprise-level customers, including public sector clients. As part of AWS Global Support, the Trust & Safety (T&S) Abuse Investigation & Prevention Team plays a crucial role in maintaining the integrity and reputation of AWS's IP space. The team focuses on identifying and mitigating online abuse hosted on AWS infrastructure and acts as the first line of defense by vetting potential abuse issues and collaborating with AWS customers to halt harmful activities. As an AWS Trust & Safety Specialist I, you will engage in high-impact investigations, collaborating with multiple teams across AWS to develop scalable solutions for preventing abuse and protecting the AWS ecosystem. You will play a key role in analyzing trends, troubleshooting complex issues, and ensuring the highest level of support for customers and internal stakeholders. Key Responsibilities: Customer Engagement & Expert Support: Provide subject matter expertise (SME) and escalation support for complex customer inquiries related to abuse investigations. Assist in identifying and addressing root causes of abuse cases to ensure efficient resolution. Root Cause Analysis & Process Improvement: Lead investigations into escalated abuse cases, identifying operational inefficiencies and recommending process improvements. Collaborate with internal teams to develop solutions that prevent abuse and improve response times. Cross-Functional Collaboration: Work closely with AWS Enterprise teams, including Technical Account Managers (TAMs), Sales, and Solutions Architects, to address abuse-related issues, develop strategies, and drive continuous improvement. 
Mentorship & Knowledge Sharing: Mentor new team members, share best practices, and help evolve the T&S team's capabilities to better mitigate large-scale abuse events. High-Impact Projects: Lead cross-functional initiatives to drive long-term solutions and mitigate abuse risks, while simultaneously managing smaller projects to support global efforts. Customer Advocacy: Act as the Voice of the Customer by identifying trends, communicating findings to leadership, and implementing innovative solutions based on customer feedback. Critical Event Support: Assist in customer communications during critical AWS events, providing timely updates and ensuring effective mitigation of abuse-related issues. Qualifications & Experience: Basic Qualifications: Bachelor's degree or equivalent experience in a technical position. 2+ years of experience in a Trust & Safety or similar environment, handling online abuse and security issues. Strong understanding of internet security concepts and common vulnerabilities. Working knowledge of networking technologies such as DNS, TCP/IP, SSL, DHCP, and Load Balancing. Proven technical support experience in abuse/security practices. Preferred Qualifications: Excellent written and verbal communication skills, with the ability to communicate complex technical information clearly. Willingness to participate in an on-call rotation for emergent abuse-related situations. Strong customer handling, conflict resolution, and problem-solving skills, with a focus on delivering exceptional customer experience. Familiarity with both Windows and Linux/Unix operating systems. Experience resolving complex technical escalations, including post-mortem error analysis. Knowledge of Amazon Web Services products and cloud computing technologies. About the Team: At AWS, we value diverse experiences and encourage applicants from varied backgrounds to apply. Whether your career is just beginning or has followed an unconventional path, we welcome your unique perspective. 
We believe that diversity strengthens our team and fosters innovation. Amazon Web Services (AWS) is the most comprehensive and widely adopted cloud platform. We pioneered cloud computing and continue to innovate, offering a robust suite of products and services that empower organizations of all sizes. Inclusive Team Culture: We promote curiosity and connection, with employee-led affinity groups that foster an inclusive environment where everyone is proud of their differences. Mentorship & Career Growth: We are committed to providing continuous learning, knowledge-sharing, and career-advancing resources to help you grow as a well-rounded professional. Work/Life Balance: We strive for work-life harmony and flexibility, ensuring that our employees can thrive both professionally and personally. Qualification: Bachelor's degree or equivalent experience in a technical position.
Consultant, Cyber Incident Response
Dell Technologies
What You'll Achieve: As a Consultant, Cyber Incident Response, you will be responsible for handling complex cybersecurity incidents, providing advanced analysis, and offering support to L1 and L2 analysts. Your role will require extensive experience in the full lifecycle of Cybersecurity Incident Response, including preparation, analysis, notification, response, recovery, and post-mortem activities. Key Responsibilities: Global Escalation Point: Serve as the primary escalation point for complex cybersecurity incidents that are not resolved by L1/L2 analysts, offering regional subject matter expertise on incident response. Incident Analysis and Investigation: Conduct in-depth analysis of security incidents, determining the root cause and potential impact to the organization. Investigate and analyze large, unstructured datasets, malicious artifacts, and EDR (Endpoint Detection and Response) tools to identify trends, anomalies, and potential threats. Incident Response Lifecycle: Oversee all phases of incident response, including preparation, analysis, response, recovery, and post-mortem reviews to identify lessons learned and enhance future response efforts. Liaison with Stakeholders: Act as a liaison between various stakeholders and internal CSIRT (Computer Security Incident Response Team) teams, helping implement best security practices and driving process improvements for incident response. Mentorship and Training: Provide guidance and training to L1 and L2 analysts, sharing your knowledge to enhance their skills in cybersecurity incident response. Essential Requirements: Cybersecurity Expertise: 10+ years of experience in cybersecurity incident response and hands-on experience within a Security Operations Center (SOC). Incident Investigation Skills: Exceptional ability to conduct investigations, analyze findings, and determine the root cause of incidents. 
Strong Technical Knowledge: In-depth understanding of security technologies such as SIEM (Security Information and Event Management), full packet capture, firewalls/NGFW, IDS/IPS, EDR, DLP (Data Loss Prevention), UEBA (User and Entity Behavior Analytics), and familiarity with networking protocols. Experience with Cloud Computing, Microsoft Windows, and Linux/Unix platforms. Experience with Cyber-attacks: Strong knowledge of various cyber-attack types and techniques, including incident response, threat hunting, and understanding attack lifecycles. Analytical and Communication Skills: Excellent analytical thinking, time management, and coordination skills. Strong command of English, both written and verbal, for clear communication with stakeholders and teams. Desirable Requirements: Certifications: Industry-recognized certifications such as CISSP, SANS GCIH, GCIA, GNFA, GREM, etc. Additional Skills: Experience in Digital Forensics and reverse malware tools. Proficiency in scripting languages for incident analysis and automation.
Security Engineer
Ericsson-worldwide
Our Exciting Opportunity: We are looking for a Security Engineer to manage, track, and support security-related activities within our organization, ensuring the continuous availability and performance of services as per Service Level Agreements (SLA). This role will involve incident management, security tool integration, process improvement, and governance reporting. As a Security Engineer, you will play a key role in ensuring that security incidents are identified, responded to, and resolved effectively and quickly. You'll work with various teams to mitigate risks and improve overall security posture. What you will do: Incident Management: Respond to after-hours security incidents (on-call support). Coordinate event collection, log management, and compliance automation. Address day-to-day security change requests related to security operations. Conduct research and intelligence gathering on emerging threats and exploits. Create new security rules based on identified threats. Perform postmortem analysis of logs, traffic flows, and activities to identify malicious activity. Analyze security incidents involving networking devices, operating systems, endpoint analysis, and network attacks. Work with Technical Authority teams to resolve security incidents. Provide Root Cause Analysis for security incidents, outages, or impairments. Administer authentication and access controls, including user provisioning and deprovisioning. Tools Integration: Integrate security tools (SIEM, VA, IAM) with various network nodes. Deploy policies, signatures, parsers, and rules for security infrastructure. Communicate with vendors (e.g., SIEM, IPS/IDS, IAM) for application-related issues. Process Improvement: Mentor Level 1 analysts to improve detection capabilities within the Security Operations Center (SOC). Prepare Use Cases and MOPs (Method of Procedures) based on identified scenarios. Create and maintain technical operational work instructions. 
Drive continuous improvement by identifying opportunities to enhance current processes. Governance and Reporting: Provide business intelligence reporting based on SOC and customer needs. Identify and report risks related to security. Perform periodic security reporting and present findings to management or customers. To be successful in this role, you must have: Strong knowledge of information security concepts and best practices. Experience with SIEM tools (e.g., McAfee ESM, QRadar, ArcSight, Splunk). Experience with scanning tools (e.g., Nessus, Qualys, IBM AppScan). Experience with PAM tools (e.g., BeyondTrust, CyberArk). Knowledge of Linux and MS Windows systems with a technical understanding of TCP/IP networks. Understanding of enterprise computing environments, distributed applications, and security controls. Key Qualifications: Education: Graduate in Computer Science or a similar field. Experience: 5 to 11 years of experience, with at least 2 years in IT and 2 years in security. Certifications (Preferred): ITIL certification CCSP (Certified Cloud Security Professional) OSCP (Offensive Security Certified Professional) Security+ CCNA Security or similar certifications. Why This Role? This is a fantastic opportunity for a Security Engineer to develop your career by working with cutting-edge security technologies and supporting a highly dynamic and crucial role in an organization. You will have the chance to mentor junior team members, improve security processes, and work with state-of-the-art tools to ensure the highest levels of security for the organization. Apply now to join our team and contribute to maintaining and improving the security infrastructure! Qualification: Graduate in Computer Science or similar