AI-powered tools improve object detection for visually impaired users

Systems and applications that help people with visual impairments navigate their environments have developed rapidly in recent years, but there is still room for improvement, according to a team of Penn State researchers. The team recently developed a new tool that combines recommendations from the blind community with artificial intelligence (AI) to provide support specifically tailored to the needs of blind people.

The tool, known as NaviSense, is a smartphone application that can identify the item a user is looking for in real time based on voice prompts, using the phone’s integrated voice and vibration features to guide the user to objects in the environment. Test users reported an improved experience compared to existing visual aid options. The team presented the tool and won the Best Audience Choice Poster Award at the Association for Computing Machinery’s SIGACCESS ASSETS ’25 conference, held October 26-29 in Denver. Details of the tool were published in the conference proceedings.

According to Vijaykrishnan Narayanan, Evan Pugh University Professor, A. Robert Noll Chair Professor of Electrical Engineering, and NaviSense team leader, many existing visual assistance programs connect users to in-person support teams, which can be inefficient or raise privacy concerns. While some programs offer automated services, Narayanan explained that these have notable limitations.

“Previously, a model of an object had to be preloaded into the service’s memory in order to be recognized,” Narayanan said. “This is highly inefficient and greatly reduces the flexibility users have when using these tools.”

To address this issue, the team implemented a large language model (LLM) and a vision language model (VLM), two types of AI model that can process large amounts of data to answer queries. Narayanan said the app connects to external servers that host the LLM and VLM, allowing NaviSense to interpret its surroundings and recognize the objects in them.

“VLM and LLM allow NaviSense to recognize objects in the environment in real time based on voice commands without the need to preload models of objects,” said Narayanan. “This is a huge milestone for this technology.”
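The article does not name the specific models NaviSense uses. As an illustration only, a minimal sketch of this open-vocabulary recognition step might look like the following, using the publicly available OWL-ViT detector from Hugging Face’s transformers library as a stand-in for the server-side VLM (the model choice, query format, and threshold here are assumptions, not the team’s actual stack):

```python
# Hedged sketch: open-vocabulary object detection from a spoken request.
# OWL-ViT stands in for NaviSense's server-side VLM; the models the team
# actually uses are not named in this article.
import torch
from PIL import Image
from transformers import OwlViTProcessor, OwlViTForObjectDetection

processor = OwlViTProcessor.from_pretrained("google/owlvit-base-patch32")
model = OwlViTForObjectDetection.from_pretrained("google/owlvit-base-patch32")

def detect(image_path: str, spoken_target: str, threshold: float = 0.2):
    """Return (box, score) candidates for an arbitrary text query --
    no per-object model needs to be preloaded."""
    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=[[spoken_target]], images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # Convert raw outputs to (x_min, y_min, x_max, y_max) boxes in pixels.
    sizes = torch.tensor([image.size[::-1]])  # (height, width)
    results = processor.post_process_object_detection(
        outputs, threshold=threshold, target_sizes=sizes
    )[0]
    return list(zip(results["boxes"].tolist(), results["scores"].tolist()))

# e.g. detect("frame.jpg", "a coffee mug") -> [([x0, y0, x1, y1], 0.34), ...]
```

The key property this illustrates is the one Narayanan describes: the text query is arbitrary, so no model of the object has to be loaded in advance.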

Ajay Narayanan Sridhar, a doctoral student in computer engineering and NaviSense’s lead researcher, said the team conducted a series of interviews with people with visual impairments before development so they could specifically tailor the tool’s features to users’ needs.

“These interviews gave us a better understanding of the real challenges faced by visually impaired people,” Sridhar said.

NaviSense searches the environment for requested objects and filters out objects that do not match the user’s verbal request. If it does not understand what the user is looking for, it asks follow-up questions to narrow the search. Sridhar said this conversational feature provides convenience and flexibility that other tools don’t offer.
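How this dialogue is implemented is not described in the article. Purely as a sketch, a clarification loop of this kind could be structured as below, with the speech and LLM services stubbed out as injected functions (all names here are hypothetical):

```python
# Hypothetical sketch of a clarification loop; the function names are
# invented for illustration and do not come from the NaviSense paper.

def clarify_target(ask_user, parse_with_llm, max_turns: int = 3):
    """Narrow a vague request down to a concrete object to search for.

    ask_user(prompt) -> str: speaks a question (TTS) and returns the reply (ASR).
    parse_with_llm(history) -> dict: returns {"object": str | None,
    "question": str}; a real version would call the hosted LLM.
    """
    history = [ask_user("What are you looking for?")]
    for _ in range(max_turns):
        parsed = parse_with_llm(history)
        if parsed.get("object"):              # unambiguous: hand off to the VLM
            return parsed["object"]
        history.append(ask_user(parsed["question"]))  # ask a follow-up
    return None                               # give up after too many turns

# Canned stubs for a quick test:
if __name__ == "__main__":
    replies = iter(["my medication", "the small white bottle"])
    target = clarify_target(
        ask_user=lambda q: (print("APP:", q), next(replies))[1],
        parse_with_llm=lambda h: (
            {"object": h[-1]} if "bottle" in h[-1]
            else {"object": None, "question": "Which medication? Can you describe it?"}
        ),
    )
    print("Search target:", target)
```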

Additionally, by monitoring the phone’s movement, NaviSense tracks the user’s hand in real time and provides feedback on where the object the user is reaching for is positioned relative to their hand.

“This kind of guidance was the most important aspect of this tool,” Sridhar said. “While there really wasn’t an off-the-shelf solution to actively guide the user’s hand to an object, this capability was continually requested in our research.”
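As a rough illustration of the geometry such guidance involves, the offset between an estimated hand position and the detected object’s bounding box can be mapped to spoken cues like the ones test users later described. The coordinate conventions and tolerance below are assumptions, not details from the paper:

```python
# Simplified sketch: turn the hand-to-object offset into a spoken cue.
# Image coordinates assumed: x grows rightward, y grows downward.

def guidance_cue(hand_xy, box, tol_frac: float = 0.25):
    """hand_xy: (x, y) estimated hand position in the camera frame.
    box: (x_min, y_min, x_max, y_max) of the detected target object.
    tol_frac: fraction of the box size treated as "close enough".
    """
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    dx, dy = cx - hand_xy[0], cy - hand_xy[1]
    tol = tol_frac * max(box[2] - box[0], box[3] - box[1])
    cues = []
    if abs(dx) > tol:
        cues.append("right" if dx > 0 else "left")
    if abs(dy) > tol:
        cues.append("down" if dy > 0 else "up")
    # An app would speak this aloud and add vibration as the hand closes in.
    return "move " + " and ".join(cues) if cues else "you got it"

# e.g. guidance_cue((120, 300), (400, 180, 480, 260)) -> "move right and up"
```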

After the interviews, the team had 12 participants test the tool in a controlled environment, comparing NaviSense against two commercial options. The team tracked the time it took for the tool to identify and direct users to objects, and monitored the overall accuracy of the program’s detection mechanisms.

NaviSense significantly reduced the time users spent searching for objects while identifying objects in the environment more accurately than the commercially available options. Participants also reported a better user experience compared to the other tools, with one user writing in a post-experiment survey: “I like the fact that it gives you clues as to whether the object is on the left or right, above or below, and that, boom, you got it.”

Narayanan said that while the current version of the tool is effective and user-friendly, there is room for improvement before commercialization. The team is working on optimizing the application’s power usage, which will reduce smartphone battery drain and further improve the efficiency of the LLM and VLM.

“This technology is very close to commercial release, and we are working to make it even more accessible,” Narayanan said. “We can use what we learn from these tests and previous prototypes of this tool to further optimize the tool for the visually impaired community.”

Other team members at Penn State include Mehrdad Mahdavi, Penn State Hertz Family Associate Professor of Computer Science and Engineering, and Fuli Qiao, a doctoral student in computer science. Other co-authors include independent researcher Nelson Daniel Troncoso Aldous; Laurent Itti, professor of computer science and psychology at the University of Southern California; and Yanpei Shi, a doctoral candidate in computer science, also at the University of Southern California.

This research was supported by the National Science Foundation.

Reference:

DOI: 10.1145/3663547.3759726


