Enhancing Voice Recognition for People with Speech Disabilities

Summary: A recent study found that automatic speech recognition (ASR) systems trained on speech from people with Parkinson’s disease can transcribe similar speech patterns with 30% greater accuracy. Researchers collected around 151 hours of recordings from individuals with varying degrees of dysarthria, a speech disorder common in Parkinson’s patients, and used the data to train ASR systems.

The study finds that people with speech disabilities can benefit tremendously from voice recognition technology when it is trained on atypical speech samples. These findings may make voice-controlled devices more accessible to those with neuromotor disorders.

Important Information:

  • ASR systems trained on speech from people with Parkinson’s disease significantly improved transcription accuracy.
  • The researchers collected 151 hours of recordings from people with dysarthria.
  • These results could improve accessibility for people with speech impairments.

Source: Beckman Institute

As Mark Hasegawa-Johnson combed through data from his latest project, he was pleasantly surprised to discover a recipe for eggs Florentine. Sift through hundreds of hours of recorded conversation, he said, and you will uncover a good deal of treasure.

Hasegawa-Johnson leads the Speech Accessibility Project, an effort at the University of Illinois Urbana-Champaign to make speech recognition products more useful for people with speech disabilities.

In the project’s second published study, researchers asked an automatic speech recognizer to listen to 151 hours (about six and a half days) of recordings from people with speech disabilities related to Parkinson’s disease. Their model then transcribed a new set of recordings with 30% more accuracy than a control model that had not been exposed to speech from people with Parkinson’s disease.


This study appears in the Journal of Speech, Language, and Hearing Research. The speech recordings used in the research are available to researchers, organizations, and companies looking to improve their voice recognition products.

“Our results suggest that a large database of atypical speech can significantly improve speech technology for people with disabilities,” said Hasegawa-Johnson, a professor of electrical and computer engineering at Illinois and a researcher at the university’s Beckman Institute for Advanced Science and Technology, where the project is housed.

“I’m eager to see how other organizations use this data to improve the accessibility of voice recognition devices.”

Devices like smartphones and virtual assistants use automatic speech recognition to make meaning out of vocalizations, letting people queue up a playlist, dictate hands-free messages, participate seamlessly in virtual meetings, and communicate clearly with friends and family members.

Voice recognition technology does not work well for everyone, however, particularly those with neuromotor disorders like Parkinson’s disease, which can cause a range of strained, slurred, or discoordinated speech patterns, collectively called dysarthria.

“Unfortunately, this means that many people who need voice-controlled devices the most may encounter the most difficulty in using them well,” Hasegawa-Johnson said.

“We know from existing research that if you train an ASR on someone’s voice, it will begin to understand them more accurately. We asked: can you train an automatic speech recognizer to understand people with dysarthria from Parkinson’s by exposing it to a small group of people with similar speech patterns?”
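The article does not describe the team’s model or training code, but the general idea of fine-tuning a pretrained recognizer on new voices can be sketched with open-source tools. The sketch below uses the Hugging Face transformers library with a public wav2vec 2.0 checkpoint; the checkpoint name, optimizer settings, and single-sample loop are illustrative assumptions, not details from the study.

```python
# Minimal sketch of fine-tuning a pretrained ASR model on new speech samples.
# Illustrative only: the study's actual model and training setup are not
# described in this article.
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

# Public English checkpoint, chosen here purely for illustration.
processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")
model.train()

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def training_step(waveform, transcript):
    """One gradient step on a single (audio, text) pair.

    `waveform` is a 1-D NumPy float array sampled at 16 kHz; `transcript`
    is its reference text. Real training would batch and pad samples.
    """
    inputs = processor(waveform, sampling_rate=16_000, return_tensors="pt")
    # This checkpoint's vocabulary is uppercase letters.
    labels = processor.tokenizer(transcript.upper(), return_tensors="pt").input_ids
    loss = model(inputs.input_values, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```

Repeating such steps over many speakers with similar speech patterns is what lets the recognizer generalize to voices it has never heard, which is the question the study set out to test.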

Hasegawa-Johnson and his colleagues recruited about 250 adults with varying degrees of dysarthria related to Parkinson’s disease. Prior to joining the study, prospective participants met with a speech-language pathologist who evaluated their eligibility.

“Many people who have struggled with a communication disorder for a long time, especially a progressive one, may withdraw from daily communication,” said Clarion Mendes, a speech-language pathologist on the team. “They may share their opinions, needs, and ideas less and less frequently, believing that their communication is too heavily impacted for meaningful conversation.”

“Those are the exact people we’re looking for,” she said.

Participants who chose to join used their personal computers and smartphones to record their voices. Working at their own pace and with optional assistance from a caregiver, they recited well-worn voice commands like “Set an alarm” and opined on open-ended prompts like “Please explain the steps to making breakfast for four people.”

One participant responded to the latter by listing the steps for making eggs Florentine, Hollandaise sauce included, while another made the case for simply ordering takeout.

“We’ve heard from a lot of participants who have found the participation process enjoyable and who feel confident communicating with their families once more,” Mendes said. “Many of our participants and their loved ones have experienced the hope, excitement, and energy this project brings.”

After consulting with experts in Parkinson’s disease and members of the community, she said, the team developed content that would be relevant to the lives of the participants. Prompts were both specific and spontaneous: training a speech algorithm to recognize medication names, for example, may help an end user communicate with their pharmacy, while casual conversation starters mimic the cadence of daily chit-chat.

“We tell participants: ‘We know that you can make your speech clearer by putting all of your effort into it, but you probably get tired of working that hard to make yourself understood. Try to relax and talk as though you’re chatting with your family on the couch,’” Mendes said.

The researchers divided the samples into three sets to test how well the speech algorithm listened and learned. The first set of 190 participants, or 151 recorded hours, trained the model.

As its performance improved, the researchers confirmed that the model was learning in earnest (and not just memorizing participants’ responses) by introducing it to a second, smaller set of recordings. The researchers challenged the model with the test set when it reached peak performance on the second set.
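A minimal sketch of that splitting step, assuming whole participants (not individual recordings) are assigned to one of the three sets so the model is always evaluated on voices it has never heard. The helper and speaker IDs below are hypothetical; the article reports 190 training participants and 42 test speakers, and the 21-speaker development set is inferred from the 211-participant data package described in the abstract below.

```python
import random

def split_by_speaker(speaker_ids, n_dev, n_test, seed=0):
    """Split speakers into disjoint train/dev/test sets so no voice
    appears in more than one set."""
    ids = sorted(set(speaker_ids))
    random.Random(seed).shuffle(ids)
    test = ids[:n_test]
    dev = ids[n_test:n_test + n_dev]
    train = ids[n_test + n_dev:]
    return train, dev, test

# Hypothetical IDs; set sizes mirror the study's reported counts.
speakers = [f"speaker_{i:03d}" for i in range(253)]
train, dev, test = split_by_speaker(speakers, n_dev=21, n_test=42)
assert len(train) == 190
```

Splitting at the speaker level is the standard safeguard against the "memorization" problem the paragraph above describes: if the same person's recordings appeared in both training and test sets, the error rate would flatter the model.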

To verify the model’s performance, members of the research team manually transcribed an average of 400 recordings per participant.

After listening to the training set, the ASR system transcribed recordings from the test set with a word error rate of 23.69%. For comparison, a system trained only on speech samples from people without Parkinson’s disease transcribed the test set with a word error rate of 36.3%, roughly 30% less accurate.
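Word error rate is the standard ASR metric behind these figures: the minimum number of word substitutions, insertions, and deletions needed to turn the system’s output into the human reference transcript, divided by the number of reference words. A generic implementation (a sketch, not the study’s evaluation code) looks like this:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming table for the word-level Levenshtein distance.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, sub)
    return d[-1][-1] / len(ref)

# One substituted word out of five reference words: WER = 0.2, i.e., 20%.
print(wer("set an alarm for nine", "set the alarm for nine"))
```

By this measure, moving from the control system’s 36.3% to the fine-tuned system’s 23.69% eliminates roughly a third of the errors, consistent with the 30% figure quoted in the article.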

Nearly everyone in the test set also experienced a decrease in error rates. Even speakers with less typical Parkinsonian speech, like unusually fast speech or stuttering, experienced modest improvements.

“I was excited to see such a dramatic benefit,” Hasegawa-Johnson said.

He added that participant feedback increases his enthusiasm:

“I spoke with a participant who was interested in the future of this technology,” he said. “That’s the wonderful thing about this project: seeing how enthusiastic people can be about the possibility that their smartphones and smart speakers will understand them. That’s really what we’re trying to do.”

Funding: Research described in this press release is supported by Amazon, Apple, Google, Meta and Microsoft, the National Institute on Deafness and Other Communication Disorders of the National Institutes of Health under award no. R13DC003383, and the National Science Foundation under award no. 1725729.

The authors are solely responsible for the content, which does not necessarily reflect the National Institutes of Health’s official position.

About the Speech Accessibility Project

The Speech Accessibility Project is a research initiative to make voice recognition technology more useful for people with a range of diverse speech patterns and disabilities.

The Beckman Institute for Advanced Science and Technology, a unit of the University of Illinois Urbana-Champaign, announced the project in fall 2022. Currently, the project is recruiting English-speaking U.S. and Canadian adults who have Parkinson’s disease, Down syndrome, cerebral palsy, or amyotrophic lateral sclerosis, as well as those who have had a stroke.

The project has unprecedented cross-industry support from funders Amazon, Apple, Google, Meta and Microsoft, as well as nonprofit organizations whose communities will benefit from this accessibility initiative.

As of the end of June 2024, the project has shared 235,000 speech samples with the five funding companies.

The Speech Accessibility Project is accepting applications.

Conduct research through the Speech Accessibility Project

The Speech Accessibility Project has released approximately 170 hours of speech recordings and annotations from 211 participants with Parkinson’s disease (comprising the training and development datasets).

The project is accepting proposals from researchers, businesses, and nonprofits that want to use the recordings and annotations to make technology accessible to everyone.

Submit a proposal to carry out research as part of the project.

About this research on speech recognition and AI

Author: Jenna Kurtzweil
Source: Beckman Institute
Contact: Jenna Kurtzweil – Beckman Institute
Image: The image is credited to Neuroscience News

Original Research: Open access.
“Community-Supported Shared Infrastructure in Support of Speech Accessibility” by Mark Hasegawa-Johnson et al. Journal of Speech, Language, and Hearing Research


Abstract

Community-Supported Shared Infrastructure in Support of Speech Accessibility

Purpose:

The Speech Accessibility Project (SAP) aims to assist researchers and engineers in developing machine learning applications for people with speech disabilities. This article provides a brief overview of the project’s initial data package as a resource for researchers.

Method:

The project aims to facilitate ASR research by collecting, curating, and distributing transcribed U.S. English speech from people with speech and/or language disabilities. Participants record speech from their place of residence by connecting their personal computer, cell phone, and assistive devices, if needed, to the SAP web portal. All samples are manually transcribed, and 30 per participant are annotated using differential diagnostic pattern dimensions. Participants have been randomly assigned to a training set, a development set for controlled testing of a trained ASR, and a test set for evaluating ASR error rates.

Results:

The SAP 2023-10-05 Data Package contains the speech of 211 people with dysarthria as a correlate of Parkinson’s disease, and the associated test set contains 42 additional speakers. A baseline ASR, with a word error rate of 3.4% for typical speakers, transcribes test speech with a word error rate of 36.3%. Fine-tuning reduces the word error rate to 23.7%.

Conclusions:

Preliminary findings suggest that a large corpus of dysarthric and dysphonic speech has the potential to significantly enhance speech technology for people with disabilities. The SAP intends to significantly advance research into accessible speech technology by providing these data to researchers.
