Sarvam AI powers multilingual video dubbing and translation in 11 Indian languages

Image: Pratyush

New Delhi: Sarvam AI has launched Sarvam Studio, a tool designed to help content creators across India make their work multilingual. The release is part of the company’s larger series of announcements on advancing sovereign AI.

Sarvam Studio: One Content, Many Languages

Sarvam Studio’s goal is to enable content creators to translate a single work into multiple Indian languages.

Studio’s AI-powered video dubbing generates high-fidelity dubs in 11 Indian languages. According to an expert survey cited by the company, participants preferred Sarvam Studio for its overall quality and production readiness.

Studio also provides genre-agnostic, agent-driven translation of long-form documents. According to Sarvam, readers in its evaluation favored Studio’s output across a variety of categories, and Studio received the highest translation quality rating in direct comparisons of translations of real documents. The company claimed that Studio was consistently rated highly in difficult domains, including academia, fiction, and law.

Another feature highlighted is the preservation of structure. Studio preserves the original document layout, including tables, headings, figures, and page hierarchy, without the need for manual redesign.

Sarvam Studio aims to enable the production of multilingual content at scale, from textbooks and novels to national addresses and lecture videos.

Expansion across speech, vision, and voice

Alongside Studio, Sarvam unveiled several AI model updates. Saaras V3, the latest version of its speech recognition model, now supports all 22 scheduled Indian languages. The model offers real-time streaming and is built to handle noisy audio and mixed-language speech. Additional features include word-level timestamps, automatic language identification, and speaker diarization for multi-speaker audio.

The company also announced its latest text-to-speech model, Bulbul V3. In an independent third-party human listening study, Bulbul V3 had the highest listener preference and the lowest error rate across use cases and languages.

Sarvam Vision, a 3-billion-parameter vision-language model, was also announced. According to the company, the model sets a higher standard for digitizing documents in Indian languages and rivals the best results achieved for English.

Conversational AI at scale

Samvaad, Sarvam’s conversational agent platform, handles over 1 million minutes of conversation every day. The company says it is seeing growing adoption in use cases such as 24/7 sales assistants, hybrid onboarding experiences via phone and WhatsApp, and population-wide outreach.

Sarvam claims that 80% of calls are indistinguishable from those made by human callers, which it attributes to close collaboration between its AI research and product teams. According to the company, agents nearly doubled the sales interest they captured and increased interactions across customer service use cases by more than 5 percent.

The company also highlighted results from large-scale analysis of conversation logs. For example, in work with credit card companies, such analysis can help pinpoint why consumers are not purchasing cards.

Advancing sovereign AI

Sarvam AI said it is developing what it calls a full-stack sovereign AI platform based on Indian languages and datasets to deploy services at population scale.

The company has entered into strategic agreements with the governments of Tamil Nadu and Odisha to develop institutional capacity for large-scale computing, sovereign models, and AI deployment. The strategy will integrate AI as a public utility across government agencies and will be implemented across the state rather than in limited pilot projects, the statement said.

Sarvam also announced Arya, a multi-agent orchestration platform. The company claimed that for a typical ETL process, Arya combined with GPT-4.1 mini can deliver about 5x higher accuracy at about 10x lower cost than Claude Code with an agent swarm. It plans to open-source Arya along with a debugging interface and a containerized runtime.
