When it comes to AI models, the spotlight is mainly on the US and China. Despite its size and deep talent pool, India is rarely seen as a source of core AI development. But Bengaluru-based startup Sarvam AI is trying to change that perception with what it calls “sovereign AI.” The company builds foundational AI models from scratch in India. Two of those models are making waves this week: Sarvam Vision and Bulbul. And for good reason.
Sarvam Vision appears to outperform bigger and more talked-about AI models such as ChatGPT, Google Gemini, and Anthropic Claude on certain benchmarks in optical character recognition (OCR), the company’s specialty. Its results have been praised by users and experts alike.
Pratyush Kumar, co-founder of Sarvam AI, recently shared details of the latest achievements of the company’s in-house AI model in a series of posts on X. According to the company, Sarvam Vision achieved an accuracy score of 84.3% on olmOCR-Bench. That score is higher than those of recent OCR models such as Gemini 3 Pro and DeepSeek OCR v2, while ChatGPT ranked significantly lower.
Sarvam Vision also scored highly on OmniDocBench v1.5, a benchmark that tests how AI systems read and understand real-world documents. The overall score was 93.28%, with particularly good results for complex layouts, technical tables, and formulas. These are areas where traditional OCR systems often struggle due to messy formatting and dense content.
The performance of these AI tools is attracting attention worldwide. Sarvam had previously faced doubts over its focus on Indian language models, but that skepticism has now turned to approval.
Technology commentator Deedy Das, who had previously questioned the value of building smaller Indic models, recently admitted he had underestimated the company. In a post on X, Das said Sarvam’s OCR and speech models for Indian languages are powerful and fill a gap that has been largely ignored by large global AI labs.
“I was wrong about Sarvam. When I wrote about Sarvam a year ago, it felt like the direction of training small-scale Indian language models was wrong. But have they turned it around!” he wrote. “[They] have the best text-to-speech, speech-to-text and OCR models for Indian languages, and it’s actually a great value. The prices are very reasonable.”
Users have praised it as well. One user described their experience with Sarvam’s model, writing, “I used this a few days ago. It’s amazing.”
Bulbul enables AI voices for Indian languages
In addition to the OCR tool, Sarvam also released a new AI voice model called Bulbul V3, a text-to-speech model that generates spoken audio from written text. In some ways, it is similar to the tools offered by ElevenLabs, a company widely regarded as a leader in this field.
“Today, we are releasing Bulbul V3, our most capable text-to-speech model designed to deliver natural, expressive, production-ready voices for Indian languages,” Sarvam said in a blog post. “Bulbul V3 minimizes failure modes and provides accurate and stable audio for content across inputs that are critical for India-specific use cases.”
Currently, the tool supports over 35 voices across 11 Indian languages. The company says it plans to expand language support to a total of 22 languages.
Bulbul has also drawn praise. Pratik Desai, founder of KissanAI, wrote on X: “We use Bulbul as our go-to tts model for Indic language use cases, and it gets better with each release. On the other hand, the cost of Eleven Labs never makes sense for Indic or any other language.”
