UNIVERSITY PARK, Pa. — Systems and applications that help visually impaired people navigate their environments have developed rapidly in recent years, but there is still room for growth, according to a team of Penn State researchers. The team recently developed a new tool that combines recommendations from the blind community and artificial intelligence (AI) to provide support specifically tailored to the needs of blind people.
The tool, known as NaviSense, is a smartphone application that identifies the item a user is looking for in real time based on voice prompts, then uses the phone's built-in audio and vibration features to guide the user to objects in the environment. Test users reported an improved experience compared to existing visual aid options. The team presented the tool at the Association for Computing Machinery's SIGACCESS ASSETS '25 conference, held Oct. 26-29 in Denver, where it won the Best Audience Choice Poster Award. Details of the tool were published in the conference proceedings.
According to Vijaykrishnan Narayanan, Evan Pugh University Professor, A. Robert Knoll Professor of Electrical Engineering and NaviSense team leader, many existing visual assistance apps connect users to live human support agents, which can be slow and can raise privacy concerns. Some apps offer automated services instead, but Narayanan explained that these have clear limitations of their own.
“Previously, an object’s model had to be preloaded into the service’s memory to be recognized,” Narayanan said. “This is highly inefficient and greatly reduces the flexibility users have when using these tools.”
To address this issue, the team built NaviSense around a large language model (LLM) and a vision-language model (VLM), two types of AI that process large amounts of data to answer queries. Narayanan said the app connects to external servers that host the LLM and VLM, allowing NaviSense to interpret its surroundings and recognize objects in them on the fly.
“VLM and LLM allow NaviSense to recognize objects in the environment in real time based on voice commands without the need to preload models of objects,” said Narayanan. “This is a huge milestone for this technology.”
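The paper's implementation details are not reproduced here, but the client-server pattern described above can be illustrated with a minimal Python sketch: a phone-side client sends the transcribed voice prompt and a camera frame to a remote server hosting the VLM/LLM pipeline and reads back any detected object. The server URL, payload fields and response format below are assumptions for illustration only, not NaviSense's actual API.

```python
import base64
import requests

# Hypothetical endpoint; NaviSense's real server interface is not public.
SERVER_URL = "https://example-navisense-server.local/locate"


def locate_object(image_path: str, voice_prompt: str) -> dict:
    """Send one camera frame plus the spoken request to a server-hosted
    VLM/LLM pipeline and return its answer as a dictionary."""
    with open(image_path, "rb") as f:
        frame_b64 = base64.b64encode(f.read()).decode("ascii")

    payload = {
        "prompt": voice_prompt,  # e.g. "find my coffee mug"
        "frame": frame_b64,      # current camera frame, base64-encoded
    }
    response = requests.post(SERVER_URL, json=payload, timeout=10)
    response.raise_for_status()
    # Assumed response shape: {"found": bool, "label": str,
    #                          "bbox": [x1, y1, x2, y2]}
    return response.json()


if __name__ == "__main__":
    result = locate_object("frame.jpg", "find my coffee mug")
    if result.get("found"):
        print(f"Detected {result['label']} at {result['bbox']}")
    else:
        print("Object not found in this frame")
```

Because the heavy models run on the server, no object models need to be preloaded on the phone; the client only streams prompts and frames and relays the results as audio and vibration cues.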
Ajay Narayanan Sridhar, a doctoral student in computer engineering and NaviSense’s lead researcher, said the team conducted a series of interviews with people with visual impairments before development so they could specifically tailor the tool’s features to users’ needs.
“These interviews gave us a better understanding of the real challenges faced by visually impaired people,” Sridhar said.
NaviSense scans the environment for the requested object and filters out anything that does not match the user's verbal request. If the app does not understand what the user is looking for, it asks follow-up questions to narrow the search. Sridhar said this conversational feature provides a convenience and flexibility that other tools lack. A rough sketch of that behavior follows.
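The following Python sketch only illustrates the filtering-and-clarification loop described above; the function names, detection format and response wording are hypothetical and do not come from the NaviSense codebase.

```python
# Illustrative sketch: keep only detections that match the spoken request,
# and ask a follow-up question when the request is ambiguous or unmatched.

def filter_matches(detections: list[dict], request: str) -> list[dict]:
    """Keep only detections whose label appears in the user's request."""
    request_words = request.lower().split()
    return [d for d in detections if d["label"].lower() in request_words]


def respond(detections: list[dict], request: str) -> str:
    """Return guidance for a unique match, or a clarifying question."""
    matches = filter_matches(detections, request)
    if len(matches) == 1:
        return f"Guiding you to the {matches[0]['label']}."
    if len(matches) > 1:
        labels = ", ".join(d["label"] for d in matches)
        return f"I see several possible matches: {labels}. Which one do you mean?"
    return "I couldn't find that yet. Can you describe the object or where it might be?"


if __name__ == "__main__":
    detections = [
        {"label": "mug", "bbox": [10, 20, 60, 90]},
        {"label": "bottle", "bbox": [120, 30, 170, 110]},
    ]
    print(respond(detections, "find my mug"))
```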
