Site Reliability Developer 2/3 Job in Oracle India
Site Reliability Developer 2/3
- Bengaluru, Bangalore Urban, Karnataka
- Not Disclosed
- Full-time
Job Description: Site Reliability Engineer - OCI Cloud Engineering Team
Role: Site Reliability Engineer (SRE)
Team: OCI OLTP (Online Transaction Processing)
Location: Bengaluru
Career Level: IC2
Experience: 5+ years
Overview:
Oracle Cloud Infrastructure's (OCI) OLTP organization is seeking a Site Reliability Engineer (SRE) to join our dynamic and fast-paced Cloud engineering team. The team is responsible for mission-critical distributed systems and cloud services, and we are looking for an engineer who is deeply interested in databases, distributed systems, and cloud services. If you thrive in an environment where innovation, problem-solving, and operational excellence intersect, this is an exciting opportunity for you!
As a member of the SRE team, you will focus on cloud services: building deployments, running operations, mitigating security vulnerabilities, and automating routine work. You will be instrumental in fostering a culture of Site Reliability Engineering (SRE) within the team, and your work will directly contribute to ensuring the stability, performance, and reliability of Oracle's global cloud service infrastructure.
This role requires someone who is adaptable, highly motivated, and capable of managing large-scale cloud environments with a focus on continuous improvement.
Key Responsibilities:
Cloud Service Operations & Reliability:
- Deploy, operate, and maintain large-scale cloud service products in a highly available, fault-tolerant, and scalable environment.
- Collaborate with internal teams to identify and mitigate cross-team issues that pose operational risks to cloud services.
- Focus on systems reliability and ensure the continuous availability of cloud services by automating tasks and eliminating manual interventions.
Automation & Improvements:
- Automate operational tasks and improve service deployments, focusing on scaling, performance, and uptime.
- Contribute to CI/CD systems, ensuring seamless integration and continuous delivery for cloud-based services.
- Leverage automation tools such as Terraform, Grafana, and Bitbucket to streamline operations.
Security & Incident Response:
- Mitigate security vulnerabilities within cloud services and ensure compliance with Oracle's security standards.
- Participate in on-call rotations to provide immediate troubleshooting support and ensure rapid issue resolution.
- Perform deep analysis of service performance and collaborate with team members to diagnose and resolve issues that affect service availability or performance.
Collaborative Problem-Solving:
- Work closely with cross-functional teams, including development, database, networking, and storage experts, to ensure the reliability and performance of services.
- Identify systemic issues and potential risks, develop solutions, and ensure proper documentation and communication with stakeholders.
Documentation & Knowledge Sharing:
- Contribute to documentation such as runbooks, operational guides, and troubleshooting manuals.
- Mentor junior engineers and share knowledge on best practices for site reliability engineering and cloud service operations.
Continuous Learning:
- Stay up to date with new cloud technologies, trends, and best practices, and actively implement them in your day-to-day work.
Technical and Professional Requirements:
- Cloud Services & Infrastructure: 5+ years of experience in SRE, DevOps, or Automation roles with a focus on large-scale infrastructure and cloud services. Hands-on experience with cloud platforms (e.g., OCI, AWS, Azure) and expertise in compute, database, networking, and storage services within cloud environments.
- Automation & Tooling: Proficiency with automation tools such as Terraform, Grafana, LumberJack, and Shepherd. Solid experience in using CI/CD tools and processes for cloud service deployments and operations.
- Scripting & Systems: Strong knowledge of programming and scripting languages, particularly Python and Java. Familiarity with Linux systems, Docker containers, virtualized infrastructure, and orchestration (e.g., Kubernetes).
- Performance & Troubleshooting: Excellent troubleshooting skills with a focus on performance, availability, reliability, and scalability of distributed systems. Experience in operating fault-tolerant, highly available, high-throughput distributed systems.
- Security & Incident Management: Familiarity with security practices and mitigating security vulnerabilities in cloud services. Proven ability to handle incident response and provide efficient troubleshooting during on-call rotations.
- Collaboration & Communication: Strong verbal and written communication skills, capable of working effectively with diverse teams across multiple geographies. Ability to work in a highly collaborative environment, driving operational excellence and customer satisfaction.
Preferred Qualifications:
- Experience in operating and maintaining multi-tenant, cloud-based infrastructure with a focus on scalability and high availability.
- Familiarity with tools and platforms like Grafana, Prometheus, and other observability and monitoring tools.
- Experience in networking and storage technologies in a cloud environment.
Why Join OCI's OLTP Team?
Joining OCI's OLTP team as an SRE gives you the opportunity to work with cutting-edge technologies and contribute to the operational excellence of Oracle's global cloud infrastructure. This is a chance to grow your skills in a highly dynamic environment and to solve complex problems that directly impact mission-critical cloud services. With a focus on automation, scalability, and high performance, you will be an essential part of a team that powers Oracle's leading cloud services.
If you are an experienced engineer passionate about cloud technologies, automation, and ensuring the reliability of large-scale systems, we encourage you to apply and join us in this exciting journey!

