Researchers use AI to improve video accessibility for blind users



Researchers at Northeastern University have made audio descriptions available for user-generated videos through a crowdsourced platform called YouDescribe.

Blind and low-vision people request descriptions of YouDescribe videos, but only 7% of requests have been completed. AI is speeding up the process. Photo: Matthew Moderno/Northeastern University

SAN JOSE, CA – For people who are blind or have low vision, audio descriptions of the action in movies and TV shows are essential to understanding what is going on. Networks and streaming services hire experts to create audio descriptions, but that is not the case for the billions of videos on YouTube and TikTok.

That doesn't mean people don't want to access that content.

Using AI vision-language models (VLMs), researchers at Northeastern University have made audio descriptions available for user-generated videos through a crowdsourcing platform called YouDescribe. Like a library, the platform lets blind and low-vision users request descriptions of videos and then rate and contribute to them.

“I understand that a 20-second TikTok video might not get a professional description,” says Lana Do, who earned her master's in computer science from Northeastern's Silicon Valley campus in May. “But blind and low-vision people might want to see that dance video too.”

In fact, the 2020 music video for the song “Dynamite” by the Korean boy band BTS sits at the top of YouDescribe's wish list, waiting to be described. The platform has 3,000 volunteer describers, but the wish list is so long they can't keep up. Only 7% of the videos requested on the wish list have an audio description, Do says.

Do works in the lab of Ilmi Yoon, a teaching professor of computer science on the Silicon Valley campus. Yoon joined the YouDescribe team in 2018 to develop the machine learning elements of the platform.

This year, the team added new features to speed up YouDescribe's human-in-the-loop workflow. New VLM technology provides better-quality descriptions, and a new InfoBot tool lets users request more information about a particular video frame. Low-vision users can even fix mistakes in a description through a collaborative editing interface, Do says.
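The article does not publish YouDescribe's internals, but the human-in-the-loop workflow it describes can be sketched in broad strokes: a model drafts timed descriptions, viewers attach follow-up questions to specific frames, and humans edit the result. The following is a minimal illustration in plain Python; `describe_frame` is a stand-in for a real VLM call, and all names here are hypothetical, not YouDescribe's API.

```python
from dataclasses import dataclass, field

@dataclass
class DraftDescription:
    timestamp: float                 # seconds into the video
    text: str                        # model-generated draft narration
    status: str = "draft"            # draft -> edited -> approved
    comments: list = field(default_factory=list)

def describe_frame(frame_id: int) -> str:
    # Stand-in for a vision-language model call on one sampled frame.
    return f"Auto-generated description of frame {frame_id}"

def generate_drafts(frame_ids, seconds_per_frame=1.0):
    """Produce one editable draft description per sampled frame."""
    return [DraftDescription(timestamp=i * seconds_per_frame,
                             text=describe_frame(i))
            for i in frame_ids]

def request_more_info(draft: DraftDescription, question: str):
    """InfoBot-style follow-up: a viewer asks about a specific frame."""
    draft.comments.append(question)
    return draft

drafts = generate_drafts(range(3))
request_more_info(drafts[1], "What is the flapping sound?")
```

Human describers would then edit each draft in place and flip its status, which is what lets the AI draft shoulder the first pass while people keep editorial control.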

As a result, video content gets described better and becomes available more quickly. AI-generated drafts reduce the burden on human describers, she said, and users can easily engage in the process through ratings and comments.

“They could say they were watching a documentary set in the forest and heard an unexplained flapping sound,” Do says.

Do and her colleagues recently presented a paper on AI's potential to accelerate the creation of audio descriptions at a symposium on human-computer interaction for work in Amsterdam. In one example video, an AI agent describes the steps a chef takes while making cheese rolls.

But there are some consistent weaknesses, she says. AI is not good at reading facial expressions in cartoons. And overall, humans are better at picking out the most important details of a scene, a key skill for creating useful descriptions.

“It's very labor-intensive,” Yoon says.

Graduate students in her lab compare the AI's first drafts to what a human describer creates.

“Then we measure the gap so that we can train our AI to do a better job,” she says. “Blind users don't want to be distracted by too much verbal description. It's an editorial art to distill the most important information.”
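The paper's actual gap metric is not given here, but one simple way to quantify how far an AI draft sits from a human reference description is vocabulary overlap. This sketch uses Jaccard distance over lowercased words; it is an illustrative stand-in, not the lab's method.

```python
def word_set(text: str) -> set:
    """Lowercased words with trailing punctuation stripped."""
    return {w.strip(".,!?").lower() for w in text.split()}

def description_gap(ai_draft: str, human_reference: str) -> float:
    """Jaccard distance between draft and reference vocabularies:
    0.0 means identical word sets, 1.0 means no overlap at all."""
    a, b = word_set(ai_draft), word_set(human_reference)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)

gap = description_gap(
    "A chef rolls cheese into dough.",
    "The chef rolls cheese into the dough and slices it.",
)
```

In practice a lab would likely use richer measures (semantic similarity, coverage of salient events), but even a crude distance gives a trainable signal for "how much did the human have to change?"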

YouDescribe was launched in 2013 by the San Francisco-based Smith-Kettlewell Eye Research Institute and trains sighted volunteers to create audio descriptions. Focusing on YouTube and TikTok videos, the platform offers tutorials on recording and timing narration to make user-generated video content accessible.
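Timing narration, which the tutorials teach, comes down to fitting each spoken description into a silence between stretches of original dialogue. A minimal sketch of that scheduling idea, under the assumption of a fixed speaking rate (YouDescribe's actual tooling is not public, so every name here is hypothetical):

```python
def place_narrations(speech_intervals, narrations, words_per_second=3.0):
    """Assign each narration to the first silent gap long enough to hold it.

    speech_intervals: sorted [(start, end)] seconds of original dialogue.
    narrations: description strings, in playback order.
    Returns [(start_time, text)] for the narrations that fit.
    """
    # Derive the silent gaps between consecutive speech intervals.
    gaps, prev_end = [], 0.0
    for start, end in speech_intervals:
        if start > prev_end:
            gaps.append((prev_end, start))
        prev_end = max(prev_end, end)

    placed = []
    for text in narrations:
        needed = len(text.split()) / words_per_second
        for i, (g_start, g_end) in enumerate(gaps):
            if g_end - g_start >= needed:
                placed.append((g_start, text))
                gaps[i] = (g_start + needed, g_end)  # consume part of the gap
                break
    return placed

# Dialogue from 0-2s and 5-6s leaves a 3-second gap starting at 2.0.
speech = [(0.0, 2.0), (5.0, 6.0)]
placed = place_narrations(speech, ["The chef rolls the dough flat"])
```

Descriptions that don't fit any gap are simply dropped here; real tools instead pause the video (an "extended" description) or ask the describer to shorten the line.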
