Researchers at Cornell University used acoustic sensing and artificial intelligence to develop a silent speech recognition interface that continuously recognizes up to 31 unspoken commands based on lip and mouth movements.
The low-power wearable interface, called EchoSpeech, needs only a few minutes of user training data before it can recognize commands and execute them on a smartphone.
Ruidong Zhang, a PhD student in information science, is the lead author of “EchoSpeech: Continuous Silent Speech Recognition on Minimally-obtrusive Eyewear Powered by Acoustic Sensing,” to be presented this month at the Association for Computing Machinery Conference on Human Factors in Computing Systems (CHI) in Hamburg, Germany.
“For people who cannot vocalize sound, this silent-speech technology could be an excellent input for a voice synthesizer. It could give patients their voices back,” Zhang said of the technology’s potential use with further development.
In its current form, EchoSpeech could be used to communicate with others via smartphone in places where speech is inconvenient or inappropriate, such as a noisy restaurant or a quiet library. The silent speech interface can also be paired with a stylus and used with design software like CAD, all but eliminating the need for a keyboard and a mouse.
Equipped with a pair of microphones and speakers smaller than pencil erasers, the EchoSpeech glasses become a wearable, AI-powered sonar system, sending and receiving soundwaves across the face and sensing mouth movements. A deep learning algorithm then analyzes these echo profiles in real time, with about 95% accuracy.
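To make the sensing approach concrete, the sketch below shows how an echo profile of this kind can be computed: a known probe signal is played through the speaker, the microphone recording is cross-correlated against it, and the offsets of the correlation peaks track how far away reflecting surfaces are. The sample rate, sweep band, frame length, and function names here are illustrative assumptions, not EchoSpeech’s published parameters.

```python
# Minimal sketch of active acoustic sensing: a speaker plays a known
# probe, a microphone records the reflections, and cross-correlating
# the two yields an "echo profile" whose peaks shift as surfaces
# (e.g., the skin around the mouth) move. All parameters are
# illustrative assumptions, not EchoSpeech's published design.
import numpy as np
from scipy.signal import chirp, correlate

FS = 48_000        # sample rate in Hz (assumed)
FRAME = 0.01       # 10 ms sensing frame (assumed)

def make_probe(fs=FS, duration=FRAME):
    """A near-ultrasonic frequency sweep used as the transmitted probe."""
    t = np.linspace(0, duration, int(fs * duration), endpoint=False)
    return chirp(t, f0=17_000, f1=20_000, t1=duration)

def echo_profile(received, probe):
    """Cross-correlate the mic recording with the probe; the offset of
    each peak encodes the round-trip delay (distance) of a reflector."""
    xc = correlate(received, probe, mode="valid")
    return xc / (np.linalg.norm(probe) + 1e-9)

# Simulate one frame containing a single attenuated skin reflection.
probe = make_probe()
delay = 24                                  # samples; ~17 cm round trip at 343 m/s
received = np.zeros(len(probe) + 200)
received[delay:delay + len(probe)] += 0.3 * probe    # the echo
received += 0.01 * np.random.randn(len(received))    # sensor noise

profile = echo_profile(received, probe)
print("strongest echo at sample offset:", int(np.argmax(profile)))  # ≈ 24
```

In a full pipeline, a stream of such profiles over time would form the input that the deep learning classifier maps to the unspoken commands.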
“We are very excited about this system,” said Cheng Zhang, assistant professor of information science and director of Cornell University’s Smart Computer Interfaces for Future Interactions (SciFi) Lab, pointing to its small size, low power draw and privacy-sensitive design as features important for deploying new wearable technologies in the real world.
Most silent-speech recognition technology is limited to a predetermined set of commands and requires the user to face or wear a camera, which is neither practical nor feasible, Cheng Zhang said. Wearable cameras also pose significant privacy concerns, he added, both for users and for those with whom they interact.
Acoustic sensing technology like EchoSpeech removes the need for wearable video cameras. And because audio data is much smaller than image or video data, it requires less bandwidth to process and can be relayed to a smartphone in real time via Bluetooth, said François Guimbretière, professor of information science.
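A rough back-of-envelope calculation illustrates the bandwidth gap Guimbretière describes. The figures below (16-bit mono audio versus raw VGA video) are generic assumptions chosen for illustration, not measurements from the EchoSpeech system:

```python
# Back-of-envelope arithmetic behind the bandwidth claim. The rates
# below are generic assumptions, not figures from EchoSpeech.

audio_bytes_per_s = 48_000 * 2            # 48 kHz, 16-bit mono audio
video_bytes_per_s = 640 * 480 * 3 * 30    # 640x480 raw RGB video at 30 fps

print(f"audio: {audio_bytes_per_s / 1e3:.0f} kB/s")    # 96 kB/s
print(f"video: {video_bytes_per_s / 1e6:.1f} MB/s")    # 27.6 MB/s
print(f"ratio: {video_bytes_per_s / audio_bytes_per_s:.0f}x")  # 288x
```

At under 100 kB/s, even uncompressed audio of this kind fits within what a Bluetooth link can carry in real time; raw video does not.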
“And because the data is processed locally on your smartphone instead of being uploaded to the cloud, privacy-sensitive information never leaves your control,” he said.