As part of India's mission to build a large language model (LLM) that reflects the diversity of Indian languages, India's Ministry of IT has selected several companies, including Gnani.ai.
Ganesh Gopalan, co-founder and CEO of Gnani.ai, highlights the need for an indigenous language model tailored to India, explains how a model that captures the country's linguistic diversity differs from its global counterparts, and explores the transformational impact of AI across key sectors.
What recent progress or milestones has Gnani.ai achieved?
Over the past year, industry adoption of AI has accelerated dramatically. This incredible demand has been great news for businesses like ours; we have been a deep-tech company since our inception. For many years, being in AI was not fashionable, but that has changed. Specifically, under India's AI mission, we were chosen last Friday to build voice-to-voice models. We will use new architectures to build foundational models that handle real-time conversations and respond almost instantly. Typically, AI voice conversations have problems with delay and accuracy, and the emotional context is often lost. If you have multiple models, the errors tend to cascade across the models. Our model fuses the different elements into a single architecture, allowing the software components to work closely together and thereby reducing latency. It also enables real-time communication and tracks the emotions behind it. Typically, the architecture built around the world is a speech-to-text system, followed by an LLM and then a text-to-speech system. We fuse many of these modules together, encoding the pitch, emotion and tone behind the conversation. Therefore, the answer or output also depends on the emotion. We are excited that this will make a huge difference not only in industry but also in many government use cases.
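The point about delay and cascading errors in a multi-model pipeline can be sketched with simple arithmetic. The following is a minimal illustration, not Gnani.ai's implementation: the stage names follow the cascade described above, while the latency and accuracy figures are made-up placeholders, and the sketch assumes stages run one after another and fail independently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    latency_ms: float  # hypothetical per-stage latency, for illustration only
    accuracy: float    # hypothetical chance that the stage output is correct


def pipeline_latency_ms(stages):
    """Sequential stages: total delay is the sum of the per-stage delays."""
    return sum(stage.latency_ms for stage in stages)


def pipeline_accuracy(stages):
    """Assuming independent errors, per-stage accuracies multiply."""
    accuracy = 1.0
    for stage in stages:
        accuracy *= stage.accuracy
    return accuracy


# The conventional cascade: speech-to-text, then an LLM, then text-to-speech.
# All numbers below are assumptions chosen to make the arithmetic easy.
CASCADED = [
    Stage("speech_to_text", latency_ms=300.0, accuracy=0.90),
    Stage("llm", latency_ms=500.0, accuracy=0.90),
    Stage("text_to_speech", latency_ms=200.0, accuracy=0.90),
]

if __name__ == "__main__":
    print(f"latency:  {pipeline_latency_ms(CASCADED):.0f} ms")
    print(f"accuracy: {pipeline_accuracy(CASCADED):.3f}")
```

Under these assumed numbers, three stages that are each 90 percent accurate yield an end-to-end accuracy of about 73 percent, and their delays add up to a full second. A single fused model has no hand-offs between stages, which is the motivation given above for merging the modules.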
The indigenous foundational model appears to mirror efforts by the US and China. How is it different from its global counterparts?
We are building these models to handle problems that are specific to India. Global models can work to some extent in English and Hindi, but usually fail in other Indian languages, particularly low-resource languages such as those spoken in the Northeast. The intention is to build something new that works for the diverse languages and dialects of India. It also lays the foundation for voice-to-voice LLMs while dealing with Indian languages and their diversity.
How can such government-led projects enhance India's global position in AI?
The arrival of AI is as important to the world as the advent of computers and the Internet. Whether it's government or enterprise services, AI becomes the oil on which things run. GPUs are required for AI to run, and this has been an important issue for startups and any company in India. This core resource is so scarce in India that companies cannot train AI models. Some of the GPUs that are available are outdated or sometimes unusable, and they are exorbitantly priced. We are trying to build AI systems, but in many cases we don't have enough GPUs to train them. However, the government will address GPU availability and pricing.
What are some potential use cases for this model across sectors? Will the model be democratized and open for widespread access and use?
Sectors that involve real-time voice conversations will benefit from this. For example, we worked on addressing maternal health issues. Infant mortality is a huge embarrassment for the country. The theory that emerged is that access to information can help improve the situation. We built an autonomous voice AI agent that speaks to pregnant mothers in local dialects and reminds them about vaccinations, among other things. What we learned there influenced this effort: what mattered was not just the information provided, but also the emotions of those who spoke. Other areas that benefit are access to civic services and education. Apart from the obvious enterprise use cases, anything that requires real-time conversations with a machine will be affected. This model is a major technological challenge, as not many companies around the world are doing this. It can be used by anyone in the industry to solve a specific problem. Voice tends to be a natural form of communication, and we are all drawn to it.
What data is used to train the model? How many languages and dialects do you support?
Over the first few years, we are looking at 22 languages, and in the first phase we are looking at 14. We have been working in the voice AI space for a long time. When we first started our company, we collected data across languages, districts, industries and noise environments. For example, I speak Tamil, but I come from a small suburb of Mumbai called Chembur. How I speak Tamil is different from how someone in Chennai, Tirunelveli, or Thanjavur speaks it. It's easy to say that a model understands and speaks Tamil with low latency, but does it understand all the dialects? We've collected millions of hours of unedited audio over the years. We have data in all languages and supplement it with synthetic data for low-resource regions or languages. There is also plenty of open-source data available, and we use a combination of all three to build these models. One of the advantages we bring to the table is the vast amount of data we have collected since we started the company. We've only strengthened this over the years.
Startups appear to be at the forefront of India's AI innovation. What advantages do they have over large companies?
IT services companies are essentially focused on services, and that has been their success. As AI talent becomes abundant in the country, I am sure they will do well in AI services. The problem, however, is that if you are a publicly listed company with huge profits or large shareholders, you cannot easily invest in or bet on future technology. Whatever the current technology is, you make the most of it and grow your company. Startups tend to bet on the future. Trying to discover new things is part of our DNA. The challenge is that when companies like ours become established and go public, we will have to maintain that level of innovation to do the next big thing. Today, it is relatively easy because we have less to lose and can pursue the innovations we want.