Artificial intelligence continues to make our lives easier. A groundbreaking medical technology transformed the brain signals of a severely paralyzed woman into speech and facial expressions. This enabled her to communicate through a digital avatar. This achievement gives hope to individuals who have lost the ability to communicate due to conditions such as stroke and ALS.
Previously, patients had to rely on slow speech synthesizers, often involving eye tracking or subtle facial movements, to form words. This made it difficult to carry out natural conversations. Below we would like to share with you a video posted on YouTube by UC San Francisco (UCSF) on the subject.
In the video, the new technology uses tiny electrodes on the surface of the brain to detect electrical activity in the regions that control speech and facial movement. These signals let a digital avatar speak in near real time, and they are also translated into the corresponding facial movements, allowing expressions such as smiles, frowns, or surprise.
Ann, a 47-year-old patient, has lived with severe paralysis for over 18 years, which left her unable to speak or write. Until now, she communicated slowly, at up to 14 words per minute, using motion-tracking technology. With the help of the latest avatar technology, Ann now aspires to work as a consultant.
How does artificial intelligence enable a paralyzed person to speak?
The research team implanted 253 paper-thin electrodes on Ann’s brain surface, specifically in areas involved in speech. These electrodes captured brain signals that would control the tongue, jaw, larynx and facial muscles if she were not paralyzed.
Ann worked with the team to train an artificial intelligence algorithm to recognize the brain signals for different speech sounds. The computer learned to distinguish 39 different sounds. A ChatGPT-style language model then assembled these signals into coherent sentences, which drove an avatar whose voice was modeled on Ann’s pre-injury voice, reconstructed from a recording of her wedding.
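As a rough illustration of the pipeline described above, and not the UCSF team’s actual code, decoding can be thought of as classifying brain signals into phonemes and then mapping phoneme groups to words via a pronunciation lexicon. The phoneme labels and lexicon below are toy examples of our own making:

```python
# Toy sketch: phoneme sequences -> words, illustrating the decoding step.
# The phoneme sequence would, in a real system, come from a classifier
# trained on brain signals; here it is hard-coded for illustration.

# An ARPAbet-style phoneme sequence, with " " marking word boundaries:
decoded_phonemes = ["HH", "AH", "L", "OW", " ", "W", "ER", "L", "D"]

# A tiny pronunciation lexicon mapping phoneme strings to words:
lexicon = {
    "HH AH L OW": "hello",
    "W ER L D": "world",
}

def phonemes_to_text(phonemes, lexicon):
    """Group phonemes at word boundaries and look each group up in the lexicon."""
    words, current = [], []
    for p in phonemes + [" "]:  # trailing boundary flushes the last word
        if p == " ":
            key = " ".join(current)
            words.append(lexicon.get(key, "<unk>"))  # unknown words fall back
            current = []
        else:
            current.append(p)
    return " ".join(words)

print(phonemes_to_text(decoded_phonemes, lexicon))  # hello world
```

In the actual study, a language model plays the role of the lexicon here, choosing the most plausible word sequence rather than doing an exact lookup.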
While not perfect, the word error rate of 28 percent is still quite impressive. The system achieved a brain-to-text rate of 78 words per minute, which compares well with the 110-150 words per minute typical of natural speech. These advances point to practical benefits for patients.
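Word error rate, the metric quoted above, is the standard way to score speech decoders: the number of word-level edits (substitutions, insertions, deletions) needed to turn the decoded text into the reference text, divided by the reference length. A minimal sketch of the computation, with made-up example sentences:

```python
def word_error_rate(reference, hypothesis):
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Standard Levenshtein dynamic program over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One wrong word out of four gives a 25 percent error rate:
print(word_error_rate("how are you today", "how are you toady"))  # 0.25
```

A 28 percent word error rate thus means that, on average, about 28 edits per 100 reference words were needed to match what the patient intended to say.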
Professor Edward Chang, who leads the initiative at the University of California, San Francisco (UCSF), said: “Our goal is to recreate a complete and embodied form of communication that is really the most natural way to talk to others. These developments bring us closer to making this a real solution for patients.”