Machine Learning Engineer - Speech AI (ASR & TTS) Job at Sarvam
- Bengaluru, Bangalore Urban, Karnataka
- Not Disclosed
- Full-time
Machine Learning Engineer - Speech AI (ASR & TTS)
Location: Bengaluru, Karnataka, India (On-Site)
Department: Engineering
Employment Type: Full-Time
About Sarvam.ai
Sarvam.ai is a pioneering generative AI startup headquartered in Bengaluru, India. We specialize in leading transformative research and development in speech and language technologies. Focused on building state-of-the-art ASR (Automatic Speech Recognition) and TTS (Text-to-Speech) models, particularly for Indic languages, we aim to redefine human-computer interaction with cutting-edge, AI-driven solutions. Join us as we push the boundaries of Speech AI to create inclusive, scalable, and intelligent voice-based applications for diverse communities worldwide.
Role Overview
We are seeking an experienced Machine Learning Engineer specializing in Speech AI (ASR & TTS). The ideal candidate will work on deep learning-based ASR and TTS models, improving accuracy, efficiency, and multilingual capabilities while deploying them at scale. The role involves developing and optimizing speech recognition and synthesis models with a focus on low-resource languages, real-time inference, and scalability. If you have a passion for speech processing and deep learning, this is a great opportunity to make a significant impact in a rapidly growing field.
Key Responsibilities
ASR (Automatic Speech Recognition)
- Develop, train, and optimize speech-to-text models using state-of-the-art architectures like Wav2Vec, Whisper, Conformer, and DeepSpeech.
- Implement techniques for low-latency ASR inference, including beam search, language model integration, and real-time transcription.
- Improve speech recognition accuracy for low-resource languages, especially Indic languages, using transfer learning and data augmentation.
- Optimize ASR pipelines for noise robustness, speaker adaptation, and domain-specific transcription.
TTS (Text-to-Speech)
- Develop and fine-tune neural TTS models such as Tacotron, FastSpeech, VITS, or WaveNet for high-quality, natural-sounding speech synthesis.
- Implement multilingual and expressive TTS models with prosody and emotion control.
- Optimize TTS inference for deployment on edge devices, mobile, and cloud platforms.
- Improve speech synthesis quality through voice cloning, neural vocoders (HiFi-GAN, WaveGlow), and prosody modeling.
General Speech AI Responsibilities
- Benchmark and profile ASR/TTS models to improve latency, efficiency, and deployment performance.
- Deploy scalable speech AI APIs on AWS, Azure, or GCP for real-world applications.
- Optimize ASR & TTS models for edge and offline inference.
- Stay updated with the latest advancements in speech AI, neural vocoders, and real-time inference techniques.
Must-Have Qualifications
- Experience: 2-3 years in speech AI, deep learning, or machine learning, with a focus on ASR & TTS.
- Education: Bachelor's or Master's degree in Computer Science, AI/ML, Speech Processing, or a related field.
- ML Frameworks: Proficiency in PyTorch or TensorFlow for training and deploying ASR/TTS models.
- ASR Expertise: Experience with speech-to-text architectures like Whisper, Wav2Vec, Conformer, or DeepSpeech.
- TTS Expertise: Experience with speech synthesis models like Tacotron, FastSpeech, or VITS.
- Speech Signal Processing: Understanding of MFCCs, STFT, phonemes, prosody modeling, and feature extraction.
- Inference Optimization: Hands-on experience with TensorRT, ONNX, or quantization (INT8, FP16) for ASR/TTS.
- Cloud & Edge Deployment: Experience deploying speech models on AWS, GCP, or Azure.
Preferred Qualifications
- Experience with speech diarization, speaker recognition, or language modeling for ASR.
- Familiarity with zero-shot TTS, voice cloning, and multilingual speech modeling.
- Understanding of CUDA optimization and low-bit quantization for ASR/TTS models.
- Contributions to open-source speech AI projects or a strong GitHub portfolio showcasing relevant work.
- Experience with real-time streaming ASR/TTS applications and low-latency inference.
Why Join Sarvam.ai?
- Innovative Impact: Work on AI-driven speech solutions that are changing how people interact with technology, especially in low-resource languages.
- Cutting-Edge Technology: Contribute to the development of state-of-the-art speech AI models in a rapidly advancing field.
- Collaborative Environment: Work alongside experts in AI, machine learning, and speech processing in a startup culture.
- Growth Opportunities: Sarvam.ai offers exciting career growth in a fast-paced environment with opportunities for personal and professional development.
How to Apply
Interested candidates are invited to submit their resume, cover letter, and any relevant project portfolios or GitHub links showcasing their experience in ASR, TTS, or Speech AI. Strong AI-related projects, whether in industry, research, or personal work, will be highly valued.
Qualification: Bachelor's or Master's degree in Computer Science, AI/ML, Speech Processing, or a related field.

