Machine Learning Engineer - Speech AI (ASR & TTS) Job at Sarvam

Sarvam 4+ weeks ago

Bengaluru, Bangalore Urban, Karnataka
Not Disclosed
Full-time

Job Summary

Machine Learning Engineer - Speech AI (ASR & TTS)

Location: Bengaluru, Karnataka, India (On-Site)

Department: Engineering

Employment Type: Full-Time

About Sarvam.ai

Sarvam.ai is a generative AI startup headquartered in Bengaluru, India, working on research and development in speech and language technologies. We focus on building state-of-the-art ASR (Automatic Speech Recognition) and TTS (Text-to-Speech) models, particularly for Indic languages, with the aim of redefining human-computer interaction. Join us as we push the boundaries of Speech AI to create inclusive, scalable, and intelligent voice-based applications for diverse communities worldwide.

Role Overview

We are seeking an experienced Machine Learning Engineer specializing in Speech AI (ASR & TTS). The ideal candidate will work on deep learning-based ASR and TTS models, improving accuracy, efficiency, and multilingual capabilities while deploying them at scale. The role involves developing and optimizing speech recognition and synthesis models with a focus on low-resource languages, real-time inference, and scalability. If you have a passion for speech processing and deep learning, this is a great opportunity to make a significant impact in a rapidly growing field.

Key Responsibilities

ASR (Automatic Speech Recognition)

Develop, train, and optimize speech-to-text models using state-of-the-art architectures like Wav2Vec, Whisper, Conformer, and DeepSpeech.
Implement techniques for low-latency ASR inference, including beam search, language model integration, and real-time transcription.
Improve speech recognition accuracy for low-resource languages, especially Indic languages, using transfer learning and data augmentation.
Optimize ASR pipelines for noise robustness, speaker adaptation, and domain-specific transcription.

TTS (Text-to-Speech)

Develop and fine-tune neural TTS models such as Tacotron, FastSpeech, VITS, or WaveNet for high-quality, natural-sounding speech synthesis.
Implement multilingual and expressive TTS models with prosody and emotion control.
Optimize TTS inference for deployment on edge devices, mobile, and cloud platforms.
Improve speech synthesis quality through voice cloning, neural vocoders (HiFi-GAN, WaveGlow), and prosody modeling.

General Speech AI Responsibilities

Benchmark and profile ASR/TTS models to improve latency, efficiency, and deployment performance.
Deploy scalable speech AI APIs on AWS, Azure, or GCP for real-world applications.
Optimize ASR & TTS models for edge and offline inference.
Stay updated with the latest advancements in speech AI, neural vocoders, and real-time inference techniques.

Must-Have Qualifications

Experience: 2-3 years in speech AI, deep learning, or machine learning, with a focus on ASR & TTS.
Education: Bachelor's or Master's degree in Computer Science, AI/ML, Speech Processing, or a related field.
ML Frameworks: Proficiency in PyTorch or TensorFlow for training and deploying ASR/TTS models.
ASR Expertise: Experience with speech-to-text architectures like Whisper, Wav2Vec, Conformer, or DeepSpeech.
TTS Expertise: Experience with speech synthesis models like Tacotron, FastSpeech, or VITS.
Speech Signal Processing: Understanding of MFCCs, STFT, phonemes, prosody modeling, and feature extraction.
Inference Optimization: Hands-on experience with TensorRT, ONNX, or quantization (INT8, FP16) for ASR/TTS.
Cloud & Edge Deployment: Experience deploying speech models on AWS, GCP, or Azure.

Preferred Qualifications

Experience with speech diarization, speaker recognition, or language modeling for ASR.
Familiarity with zero-shot TTS, voice cloning, and multilingual speech modeling.
Understanding of CUDA optimization and low-bit quantization for ASR/TTS models.
Contributions to open-source speech AI projects or a strong GitHub portfolio showcasing relevant work.
Experience with real-time streaming ASR/TTS applications and low-latency inference.

Why Join Sarvam.ai?

Innovative Impact: Work on AI-driven speech solutions that are changing how people interact with technology, especially in low-resource languages.
Cutting-Edge Technology: Contribute to the development of state-of-the-art speech AI models in a rapidly advancing field.
Collaborative Environment: Work with a team of experts in AI, machine learning, and speech processing, in a startup culture.
Growth Opportunities: Sarvam.ai offers exciting career growth in a fast-paced environment with opportunities for personal and professional development.

How to Apply

Interested candidates are invited to submit their resume, cover letter, and any relevant project portfolios or GitHub links showcasing their experience in ASR, TTS, or Speech AI. Strong AI-related projects, whether in industry, research, or personal work, will be highly valued.

Qualification:
Bachelor's or Master's degree in Computer Science, AI/ML, Speech Processing, or a related field.

Experience Required:

Minimum 2 Years

Vacancy:

2 - 4 Hires
