UC Berkeley and UCSF decode brain activity into speech using AI

Researchers from the University of California, Berkeley and the University of California, San Francisco have developed a brain-computer interface that decodes speech-related brain activity into audio in near real time.

The advance significantly reduces the processing delays seen in earlier systems.

The team used an AI-based approach to decode speech at speeds comparable to commercial voice assistants. The system samples neural activity from the motor cortex, the part of the brain that controls the movements used for speech, and uses artificial intelligence to interpret those signals as spoken language.

Streaming algorithm adapted from commercial speech technology

“Our streaming approach brings the same rapid speech decoding capacity of devices like Alexa and Siri to neuroprostheses,” says Gopala Anumanchipalli, assistant professor of electrical engineering and computer sciences at UC Berkeley, who co-led the study with UCSF's Edward Chang. “Using a similar type of algorithm, we found that we could decode neural data and enable near-synchronous voice streaming.”

Unlike earlier neuroprosthetic methods, which produced slow or delayed speech synthesis, the system applies streaming algorithms that decode neural signals and generate speech almost instantaneously.
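The difference between waiting for a whole sentence and decoding in a stream can be illustrated with a minimal Python sketch. It is not the researchers' code: the frame format, the chunk size, and the `toy_decoder` function are assumptions invented for illustration, and a real system would replace `toy_decoder` with a trained neural network.

```python
from typing import Callable, Iterable, Iterator, List

# Hypothetical format: one time step of neural features per frame.
Frame = List[float]


def chunk_stream(frames: Iterable[Frame], chunk_size: int) -> Iterator[List[Frame]]:
    """Group an incoming stream of frames into fixed-size windows."""
    buffer: List[Frame] = []
    for frame in frames:
        buffer.append(frame)
        if len(buffer) == chunk_size:
            yield buffer
            buffer = []
    if buffer:
        # Flush the final, possibly shorter, window.
        yield buffer


def stream_decode(
    frames: Iterable[Frame],
    chunk_size: int,
    decode_chunk: Callable[[List[Frame]], float],
) -> Iterator[float]:
    """Emit one decoded output per window as soon as that window is complete,
    instead of waiting for the whole recording to finish."""
    for chunk in chunk_stream(frames, chunk_size):
        yield decode_chunk(chunk)


def toy_decoder(chunk: List[Frame]) -> float:
    """Placeholder decoder (an assumption): averages the first channel.
    A real decoder would map the window to speech sounds."""
    return sum(frame[0] for frame in chunk) / len(chunk)


if __name__ == "__main__":
    # Simulated recording: 10 frames with 2 channels each.
    recording = ([float(i), 0.0] for i in range(10))
    for output in stream_decode(recording, chunk_size=4, decode_chunk=toy_decoder):
        print(output)
```

Because `stream_decode` is a generator, the first output is available after only four frames have arrived, which is the property that keeps delay low in a streaming design.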

Training AI without audio data

To train the AI, the research team worked with Ann, a participant with full vocal tract paralysis. Because she cannot produce audible speech, the system relied on her silent attempts to speak sentences shown as on-screen prompts. These attempts triggered measurable patterns in the motor cortex of her brain.

“This gave us a mapping between the chunked windows of neural activity that she generates and the target sentence that she's trying to say, without her needing to vocalize at any point,” says Kaylo Littlejohn, a Ph.D. student and co-author of the study.

With no direct audio reference for training the AI's output, the team used synthetic voice tools to simulate what Ann was trying to say.

“We used a pretrained text-to-speech model to generate audio and simulate a target,” says Ph.D. student Cheol Jun Cho. “And we also used Ann's pre-injury voice, so when we decode the output, it sounds like her.”
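The pairing described above, in which recorded neural windows are matched with the prompted sentence and a synthesized audio target, can be sketched in Python as follows. This is an illustration under stated assumptions, not the team's pipeline: `TrainingExample`, `build_examples`, and `fake_tts` are hypothetical names, and `fake_tts` is only a stand-in for a pretrained text-to-speech model.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical formats: a window is a list of neural feature values,
# and audio is a list of sample values.
Window = List[float]
Audio = List[float]


@dataclass
class TrainingExample:
    neural_windows: List[Window]  # activity recorded during a silent attempt
    prompt_text: str              # sentence shown on screen
    target_audio: Audio           # synthesized stand-in for the missing voice


def build_examples(
    recordings: Sequence[Sequence[Window]],
    prompts: Sequence[str],
    synthesize: Callable[[str], Audio],
) -> List[TrainingExample]:
    """Pair each recording with its prompt and a synthesized audio target."""
    if len(recordings) != len(prompts):
        raise ValueError("each recording needs exactly one prompt")
    return [
        TrainingExample(list(recording), prompt, synthesize(prompt))
        for recording, prompt in zip(recordings, prompts)
    ]


def fake_tts(text: str) -> Audio:
    """Placeholder for a text-to-speech model (an assumption):
    returns one fake sample per character."""
    return [ord(char) / 255.0 for char in text]


if __name__ == "__main__":
    recordings = [[[0.1, 0.2]], [[0.3, 0.4], [0.5, 0.6]]]
    prompts = ["hi", "hello"]
    for example in build_examples(recordings, prompts, fake_tts):
        print(example.prompt_text, len(example.neural_windows), len(example.target_audio))
```

The point of the structure is that no recorded speech from the participant is needed: the audio target comes entirely from the prompt text.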

This approach allows the system to preserve the natural qualities of the speaker's voice while compensating for the lack of vocal output. The researchers say the technique could serve as the basis for future communication tools designed for people who have lost the ability to speak.
