Mind-Reading Tech Could Bring 'Synthetic Speech' to Brain-Damaged Patients
Reading the brain waves that control a person's vocal tract might be the best way to help return a voice to people who've lost their ability to speak, a new study suggests.
A brain-machine interface creates natural-sounding synthetic speech by using brain activity to control a "virtual" vocal tract -- an anatomically detailed computer simulation that reflects the movements of the lips, jaw, tongue and larynx that occur as a person talks.
This interface created artificial speech that could be accurately understood up to 70% of the time, said study co-author Josh Chartier, a bioengineering graduate student at the University of California, San Francisco (UCSF) Weill Institute for Neuroscience.
The participants involved in this proof-of-concept study still had the ability to speak. They were five patients being treated at the UCSF Epilepsy Center who had electrodes temporarily implanted in their brains to map the source of their seizures, in preparation for neurosurgery.
But researchers believe the speech synthesizer ultimately could help people who've lost the ability to talk due to stroke, traumatic brain injury, cancer or neurodegenerative conditions like Parkinson's disease, multiple sclerosis or amyotrophic lateral sclerosis (Lou Gehrig's disease).
"We found that the neural code for vocal movements is partially shared across individuals," said senior researcher Dr. Edward Chang, a professor of neurosurgery at the UCSF School of Medicine. "An artificial vocal tract modeled on one person's voice can be adapted to synthesize speech from another person's brain activity," he explained.
"This means that a speech decoder that's trained in one person with intact speech could maybe someday act as a starting point for someone who has a speech disability, who could then learn to control the simulated vocal tract using their own brain activity," Chang said.
Reading brain waves to create 'synthetic' speech
Current speech synthesis technology requires people to spell out their thoughts letter-by-letter using devices that track very small eye or facial muscle movements, a laborious and error-prone method, the researchers said.
Directly reading brain activity could produce more accurate synthetic speech more quickly, but researchers have struggled to extract speech sounds from the brain, Chang noted.
So, Chang and his colleagues came up with a different approach -- creating speech by focusing on the signals that the brain sends out to control the various parts of the vocal tract.
For this study, the researchers asked the five epilepsy patients to read several hundred sentences aloud while readings were taken from a brain region in the frontal cortex known to be involved in language production, Chartier said.
Sample sentences included "Is this seesaw safe," "Bob bandaged both wounds with the skill of a doctor," "Those thieves stole thirty jewels," and "Get a calico cat to keep the rodents away."
According to co-researcher Gopala Anumanchipalli, "These are sentences that are particularly geared towards covering all of the articulatory phonetic contexts of the English language." Anumanchipalli is a UCSF School of Medicine speech scientist.
Audio recordings of the participants' voices were used to reverse-engineer the vocal tract movements required to make those sounds, including lip movements, vocal cord tightening, tongue manipulation and jaw movement.
Using that information, the investigators created a computer-based vocal tract for each participant that could be controlled by their brain activity. An algorithm transformed brain patterns produced during speech into movements of the "virtual" vocal tract, and then a synthesizer converted those movements into synthetic speech.
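The two-step design described above, brain activity to vocal tract movements and then movements to sound, can be sketched in a few lines of Python. This is only a toy illustration: the article does not say which algorithm the team used, so the simple weighted sums below are a stand-in, and every name and number here (`make_linear_stage`, `decode`, the weights) is hypothetical.

```python
from typing import Callable, List, Sequence

Frame = List[float]  # one time step of measurements


def make_linear_stage(weights: Sequence[Sequence[float]]) -> Callable[[Frame], Frame]:
    """Build a stage that turns an input frame into an output frame.

    Assumption: a plain weighted sum stands in for whatever model
    the researchers actually trained.
    """
    def stage(frame: Frame) -> Frame:
        return [sum(w * x for w, x in zip(row, frame)) for row in weights]
    return stage


def decode(neural_frames: List[Frame],
           neural_to_movement: Callable[[Frame], Frame],
           movement_to_sound: Callable[[Frame], Frame]) -> List[Frame]:
    """Run the two stages in order for every time step."""
    # Stage 1: brain activity -> "virtual" vocal tract movements.
    movements = [neural_to_movement(frame) for frame in neural_frames]
    # Stage 2: vocal tract movements -> sound features for a synthesizer.
    return [movement_to_sound(m) for m in movements]


# Made-up weights: 3 electrode readings -> 2 movement values -> 1 sound value.
stage_one = make_linear_stage([[1.0, 0.0, 0.0], [0.0, 0.0, 0.5]])
stage_two = make_linear_stage([[2.0, 3.0]])
sound = decode([[1.0, 0.0, 2.0]], stage_one, stage_two)
```

The point of the split is that the middle layer, the movements, is something physical and shared across speakers, which is what the researchers say makes a decoder trained on one person adaptable to another.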
Listeners understood words up to 70% of the time
Researchers then tested whether the synthetic speech could be understood by asking hundreds of human listeners to write down what they thought they heard.
The transcribers understood the sentences more successfully when given shorter lists of words to choose from. For example, they accurately identified 69% of synthesized words when choosing from lists of 25 alternatives, and transcribed 43% of sentences with perfect accuracy.
But when given a list of 50 words to choose from, they correctly identified only 47% of words and understood just 21% of synthesized sentences accurately.
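To make the two kinds of score concrete, here is a minimal Python sketch of how word-level and whole-sentence accuracy could be tallied from listener transcripts. It is not the study's scoring procedure: it assumes each transcript is compared word by word, in order, against the intended sentence, and the function name `score_transcripts` and the sample transcripts are invented for illustration.

```python
from typing import List, Tuple


def score_transcripts(pairs: List[Tuple[str, str]]) -> Tuple[float, float]:
    """Return (fraction of words correct, fraction of sentences perfect).

    Each pair is (intended sentence, what the listener wrote down).
    Assumption: words are compared position by position, so a skipped
    or extra word shifts everything after it.
    """
    if not pairs:
        raise ValueError("need at least one transcript to score")
    total_words = 0
    correct_words = 0
    perfect_sentences = 0
    for reference, transcript in pairs:
        ref_words = reference.lower().split()
        heard_words = transcript.lower().split()
        total_words += len(ref_words)
        correct_words += sum(r == h for r, h in zip(ref_words, heard_words))
        if ref_words == heard_words:
            perfect_sentences += 1
    return correct_words / total_words, perfect_sentences / len(pairs)


# Invented listener responses to two of the study's sample sentences.
word_score, sentence_score = score_transcripts([
    ("Is this seesaw safe", "is this seesaw safe"),
    ("Those thieves stole thirty jewels", "those thieves sold thirty jewels"),
])
```

In this made-up case, eight of nine words are right but only one of two sentences is perfect, which shows why the sentence-level figures in the study sit well below the word-level ones: a single wrong word spoils the whole sentence.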
Chartier pointed out that the researchers "still have a ways to go to perfectly mimic spoken language. We're quite good at synthesizing slower speech sounds like 'sh' and 'z' as well as maintaining the rhythms and intonations of speech and the speaker's gender and identity, but some of the more abrupt sounds like 'b's and 'p's get a bit fuzzy."
Still, he added, "The levels of accuracy we produced here would be an amazing improvement in real-time communication compared to what's currently available."
The research team is working to extend these findings, using more advanced electrode arrays and computer algorithms to further improve brain-generated synthetic speech.
The next major test will be to determine whether someone who has lost the ability to speak could learn to use the system, even though it's been geared to work with another person who can still talk, the researchers said.
The new study was published April 24 in the journal Nature.
SOURCES: Josh Chartier, bioengineering graduate student, University of California, San Francisco (UCSF) Weill Institute for Neuroscience; Edward Chang, M.D., professor, neurosurgery, UCSF School of Medicine; Gopala Anumanchipalli, Ph.D., speech scientist, UCSF School of Medicine; April 24, 2019, Nature