Technical Lead Job in Cisco Technology Inc

Technical Lead

Job Summary

Meet the Team

As a Technical Lead, you will drive technical excellence across HPC infrastructure, network automation, DevOps practices, and SRE principles while leading architecture decisions and guiding teams in implementing high-performance solutions for AI/ML workloads on various network topologies. This role combines deep technical expertise with leadership responsibilities, focusing on system architecture, automation, reliability engineering, and development excellence.

Your Impact

  • Design and implement end-to-end automation solutions for HPC infrastructure (Compute, network, and storage) using Kubernetes operators, Terraform, and Ansible.
  • Analyze compute, storage, and network traffic patterns during distributed training/inference operations across different AI/ML frameworks.
  • Monitor and optimize network utilization patterns for various model architectures.
  • Identify bottlenecks in network communication patterns.
  • Perform root cause analysis across network, compute, and storage layers, with experience handling various failure scenarios and recovery procedures.
  • Make architectural decisions and drive innovation.
  • Develop infrastructure patterns for different workload types.
  • Provide benchmarking and performance engineering leadership.
  • Mentor junior engineers through architecture reviews and code critiques.
  • Design and implement comprehensive telemetry collection systems for monitoring high-speed network microburst behavior.
  • Develop sophisticated visualization tools and analytics frameworks to enable real-time identification of performance bottlenecks and system constraints, facilitating rapid optimization and troubleshooting.

Minimum Qualifications

  • Demonstrated expertise in distributed systems and infrastructure design (compute, storage, and networking).
  • Experience with network automation tools and configuration management (Ansible, Python, Golang, YAML, YANG).
  • Strong background in CI/CD, GitOps, or similar practices and tools.
  • Expert-level experience with observability platforms and practices.
  • Strong background in implementing distributed tracing, metrics collection, and log aggregation systems.
  • Demonstrated experience in at least one completed performance benchmarking project for distributed systems, storage, network, and compute.

Preferred Qualifications

  • Bachelor's degree in Computer Science, Software Engineering, or a related technical field with 15-20 years of extensive hands-on experience in distributed systems development and DevOps practices. An advanced degree is a plus.
  • Contributions to open-source projects related to distributed systems or performance engineering.
  • Experience in analyzing and documenting system performance metrics across network, compute, and storage layers.
  • Prior experience mentoring and developing technical talent.
  • Understanding of AI/ML infrastructure; prior experience with RDMA or RoCE v2 is an added advantage.

#WeAreCisco

#WeAreCisco, where every individual brings their unique skills and perspectives together to pursue our purpose of powering an inclusive future for all.


Qualification:

Bachelor's degree in Computer Science, Software Engineering, or a related technical field with 15-20 years of extensive hands-on experience in distributed systems development and DevOps practices. An advanced degree is a plus.

Experience Required:

15 to 20 Years

Vacancy:

2 - 4 Hires
