Sudip Bhattacharya has over 23 years of experience applying innovative technology to business solutions.
He has worked extensively on cutting-edge technologies in multiple countries, including India, the USA, France, Taiwan, and Japan.
INDIAai interviewed Sudip to get his take on AI.
What first sparked your interest in AI?
In college, I was asked to solve the Traveling Salesman Problem (TSP). The problem assumes there are n cities with a travel cost between each pair; a salesman must visit every city while keeping the cumulative cost minimal.
At first it seemed like an easy question, especially for 4 or 5 cities. However, the book mentioned another exciting way to solve it using neural networks. This was my first step towards building solutions with neural networks, which then led to a serious interest in AI. I'm still solving TSPs (combinatorial optimization problems: finding the best option from a finite set). The business applications have evolved and become massive.
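For small instances like the 4- or 5-city case mentioned above, the TSP really is easy: exhaustive search over all tours suffices. A minimal sketch (the distance matrix here is illustrative, not from the interview):

```python
from itertools import permutations

def tsp_brute_force(dist):
    """Exhaustively search all tours over n cities; return (cost, tour).

    Feasible only for small n, since the number of tours grows as (n-1)!.
    """
    n = len(dist)
    best_cost, best_tour = float("inf"), None
    # Fix city 0 as the start so rotations of the same tour aren't recounted.
    for perm in permutations(range(1, n)):
        tour = (0,) + perm
        cost = sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))
        if cost < best_cost:
            best_cost, best_tour = cost, tour
    return best_cost, best_tour

# 4-city symmetric distance matrix (illustrative values)
dist = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]
cost, tour = tsp_brute_force(dist)
```

The factorial blow-up of this search is exactly why heuristic approaches, including the neural-network formulations the book described, become attractive as n grows.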
Tell us about the Vernacular Speech AI project.
The way we speak is shaped by instinct and environment. Our facial expressions, voice, manner of speaking, and vocal characteristics are unique. As we learn and speak other languages, our mother tongue or local dialect influences our pronunciation, tone, and speaking speed. This heterogeneity is non-trivial: pronunciation, tone, speed, loudness, vocabulary, and grammar vary from individual to individual, from kilometer to kilometer, and from geographical region to geographical region. My current project on native speech is to create an automated software system that assesses abstract features of spoken language and infers confidence, speech quality, fluency, and proficiency.
The challenge is to define the "right" references and benchmarks for the AI model's training and test data. However, it is an important problem to solve if we are to achieve globally equitable access to such systems regardless of linguistic background. Our project explicitly analyzes at least 10 Indian languages at the phonetic level to detect correctness for applied use cases such as:
- Communication skills required for a job interview
- Customer interaction
- Situational maturity
- An unbiased local language proficiency check
To do this, rather than using the traditional method of fine-tuning an out-of-the-box model, we decided to train the model to understand the speech features of the local language.
The project is divided into multiple segments.
- Training and test data collection. Public recordings of Indian languages of sufficient duration are difficult to obtain.
- Feature extraction
- Model algorithms and training with deep neural networks
- Model testing and evaluation
- Adoption and ongoing feedback
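The feature-extraction segment above works on the raw waveform rather than a speech-to-text transcript. A minimal sketch of a frame-level front end, using two classic low-level descriptors (log frame energy and zero-crossing rate); the function name, parameters, and synthetic input are hypothetical, and a real system would add richer features such as MFCCs, pitch, and prosody:

```python
import numpy as np

def speech_features(signal, sr, frame_ms=25, hop_ms=10):
    """Slice a waveform into overlapping frames and compute, per frame,
    log energy and zero-crossing rate (a toy acoustic front end)."""
    frame = int(sr * frame_ms / 1000)  # samples per frame
    hop = int(sr * hop_ms / 1000)      # samples between frame starts
    feats = []
    for start in range(0, len(signal) - frame + 1, hop):
        w = signal[start:start + frame]
        log_energy = np.log(np.sum(w ** 2) + 1e-10)
        # Each sign change between adjacent samples is one zero crossing.
        zcr = np.mean(np.abs(np.diff(np.sign(w)))) / 2
        feats.append((log_energy, zcr))
    return np.array(feats)

# 1 second of a synthetic 440 Hz tone at 16 kHz, standing in for recorded speech
sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
feats = speech_features(np.sin(2 * np.pi * 440 * t), sr)
```

Frame-level matrices like this one are what the deep-neural-network training segment would consume as input.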
Each segment unearthed new avenues for using speech directly as raw input, beyond speech-to-text conversion. We are still working on creating a generic model that covers most of India’s linguistic diversity.
As a longtime member of the AI and machine learning community, what are the most common misconceptions you’d like to dispel?
From what I’ve heard in discussions with people outside the machine learning community, the most common misconceptions are:
- AI will take many jobs
- AI may soon become a ferocious robot, like in science-fiction movies
Educating people about AI and its limitations will not only dispel these misconceptions but also build trust, support, and acceptance. Moreover, when guardrails and regulations are made transparent to people, the power of AI can be unlocked to improve lives.
What are your thoughts on the challenges of local language speech in India’s complex language structure?
I can speak from our experience building one of the few systems in India that captures speech directly and infers local-language features.
- India's socio-cultural and linguistic diversity is like that of many countries rolled into one. Creating a near-"correct" model for all Indian languages is complicated.
- Collecting and annotating audio from YouTube, social media, or public datasets was not enough.
- Obtaining or creating comprehensive, large datasets for training machine learning models is not trivial.
- Managing annotation bias in training data is difficult.
- A limited range of research is available/accessible on Indian local language speech.
However, we see different stakeholders making efforts in the right direction, and we hope to see robust and comprehensive speech datasets and ML models for Indian languages soon.
What advice would you give to someone considering a career in AI research? What should they focus on to move forward?
AI research is no longer just an exciting field, but a necessity for the good of society. India’s challenges, demographics and opportunities leave immense room for research and real-world AI applications.
In particular, I would suggest the following to AI researchers in India:
- Build a sound foundation in mathematics and computer science
- Collaborate with real-world application builders (both corporate and individual)
- Prioritize research topics that have a measurable and positive impact on life today
