Pumza Fihlani, Johannesburg, BBC News
Africa is home to a large share of the world's languages, well over a quarter by some estimates, yet many of them are missing when it comes to AI development.
This is a problem of both a lack of investment and a lack of readily available data.
Most AI tools in use today, such as ChatGPT, are trained on English as well as other European languages and Chinese.
These languages have vast quantities of online text to draw on.
However, many African languages are mostly spoken rather than written down, so there is a lack of text on which to train AI to make it useful for speakers of those languages.
For millions across the continent, this means being excluded.
Researchers looking to address this issue have recently released what is thought to be the largest known dataset of African languages.
“We dream in our own languages and interpret the world through them. If technology doesn't reflect that, there's a risk that a whole group will be left behind,” Prof Vukosi Marivate of the University of Pretoria, who worked on the project, tells the BBC.
“We're going through this AI revolution, imagining everything we can do with it. Now imagine there's a part of the population that doesn't have that access, because all the information is in English.”
The African Next Voices project brought together linguists and computer scientists to create AI-ready datasets in 18 African languages.
It may be just a small portion of the estimated 2,000 languages spoken across the continent, but those involved in the project want to expand in the future.
In two years, the team recorded 9,000 hours of speech in Kenya, Nigeria and South Africa, capturing everyday scenarios in agriculture, health and education.
The languages recorded include Kikuyu and Dholuo in Kenya, Hausa and Yoruba in Nigeria, and isiZulu and Tshivenda in South Africa, some of which are spoken by millions of people.
“We need a foundation first, and that's what African Next Voices is, and on top of that people will add their own innovations,” says Prof Marivate, who led the South African research.
His Kenyan counterpart, computational linguist Lilian Wanzare, says that recording speech across the continent meant creating data that reflects the way people really live and speak.
“We've gathered voices from different regions, ages and backgrounds, so it's as comprehensive as possible. Big technology can't always see those nuances,” she says.
The project was made possible by a $2.2 million (£1.6 million) Gates Foundation grant.
The data will be open access, allowing developers to build tools that translate, transcribe and respond in African languages.
According to Prof Marivate, there are already small examples of how AI that works in indigenous languages is being used to solve real-life challenges in Africa.

Farmer Kelebogile Mosime manages a 21-hectare site in Rustenburg, in the heart of South Africa's platinum region.
The 45-year-old works with a small team to grow vegetables such as beans, spinach, cauliflower and tomatoes.
She began three years ago with a cabbage crop and, to help her, uses an app called AI-Farmer that recognises several South African languages, including Sesotho, isiZulu and Afrikaans.
“As someone still learning to farm, you face a lot of challenges,” Ms Mosime says.
“I see the advantage every day of being able to use my home language, Setswana, in the app whenever I run into problems on the farm. I can ask anything and get a useful answer.
“For someone in a rural area like me who is not exposed to technology, it is useful. I can ask about the various options for insect control. It also helps diagnose diseased plants.”
Lelapa AI is a young South African company that builds AI tools in African languages for banks and telecoms firms.
For CEO Pelonomi Moiloa, what is currently available is extremely limited.
“English is the language of opportunity. For many South Africans who don't speak it, it's not just an inconvenience. It could mean missing out on important services like healthcare, banking, and even government support,” she tells the BBC.
“Language can be a huge barrier, and I'm saying it shouldn't be.”
However, this is not just about business or convenience.
For Prof Marivate, there is also a risk that, without African language initiatives, something more could be lost.
“Language is access to the imagination,” he says.
“It's not just words. It's history, culture, knowledge. If it doesn't include indigenous languages, you lose more than data. You lose a way of seeing and understanding the world.”
