Smartphones, social media, texting – each of these gets its fair share of criticism for changing the way people socialize and interact with one another. But for people with ALS, and for others with debilitating diseases, these technologies are a means of maintaining connection with a world that gradually becomes harder to participate in.
One of the ways that people with ALS can use technology to facilitate communication and the completion of everyday tasks is automatic speech recognition (ASR), most commonly encountered through virtual assistants (such as Alexa and Siri) that “hear” voice commands, interpret them correctly, and respond as needed. ASR technologies can allow people with ALS to communicate with those around them, connect with others with ALS, engage with the world at large, and even maintain a sense of normalcy and independence in a rapidly changing circumstance (1).
However, as ALS progresses, one’s ability to effectively use current ASR technologies decreases, impacting this method of communication. About 25% of people with ALS experience dysarthria, or slurred speech, as their first symptom, and up to 95% of people with ALS will ultimately develop symptoms of dysarthria (2). With current consumer-grade ASR technologies, word error rates (WER) and phoneme errors – confusions between similar speech sounds – increase as dysarthria worsens. This is because most ASR systems are trained on “typical” speech, so they are often unable to recognize the speech of a person with ALS (3).
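To make the WER metric concrete: it counts the minimum number of word substitutions, insertions, and deletions needed to turn a transcript into the reference sentence, divided by the length of the reference. A minimal sketch (the example phrases are invented for illustration, not from the paper):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + insertions + deletions) / reference length,
    computed as Levenshtein edit distance over words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution_cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,                     # deletion
                dp[i][j - 1] + 1,                     # insertion
                dp[i - 1][j - 1] + substitution_cost, # match or substitution
            )
    return dp[-1][-1] / len(ref)

# Two word substitutions ("the"->"a", "lights"->"light") out of 5 words -> WER 0.4
print(word_error_rate("turn on the kitchen lights", "turn on a kitchen light"))
```

An ASR system that misrecognizes two words out of every five spoken is effectively unusable for voice commands, which is why reducing WER for dysarthric speech matters so much.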
In 2018, the ALS Therapy Development Institute (ALS TDI) entered into a collaboration with Google to analyze vast quantities of data gathered with the help of people with ALS. ALS TDI had already begun gathering voice recordings from people with ALS as a way to monitor disease progression, especially in bulbar onset ALS. With these recordings, partners at Google have developed an ASR system that is better able to understand people with ALS-related speech differences.
A paper published earlier this year, titled “Personalizing ASR for Dysarthric and Accented Speech with Limited Data,” explains how data scientists at Google and ALS TDI were able to fine-tune standard ASR models. The base ASR model is trained on thousands of hours of voice recordings from “typical” speakers (3). Developing an entirely new model would require a massive database of recordings from people with ALS; this strategy would be impractical given the differences in speech from person to person, the limited supply of voice recordings, and the amount of time it would take to grow the database.
Instead of developing an entirely new model, researchers adapted the original model by fine-tuning some of its layers on recordings from individuals with ALS, effectively treating ALS-affected speech as a sort of accent (3). In the published report, recordings from 17 speakers totaling 22.1 hours were used to adapt the model to ALS. This method, researchers found, could reduce the WER of ASR models by up to 70% for dysarthric speech (3). Most of this improvement came from the first 5-10 minutes of audio recordings, showing that a small number of recordings can make a significant difference (3). A larger database of recordings, though, could improve these models even more. These findings are promising for the development of ASR technologies that can assist those who need it most.
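The core idea – keep most of a model trained on plentiful "typical" data frozen, and update only a small part of it using a few examples of atypical speech – can be illustrated with a deliberately tiny toy model. This is a hypothetical sketch of the fine-tuning principle, not the paper's actual neural network:

```python
# Toy illustration of limited-data fine-tuning: fit y = w*x + b on plentiful
# "typical" data, then adapt ONLY the bias b on a handful of "atypical"
# examples, keeping the rest of the model (w) frozen.

def mse(w, b, data):
    """Mean squared error of the linear model on (x, y) pairs."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def train(data, w, b, freeze_w=False, lr=0.01, epochs=500):
    """Simple gradient descent; optionally freeze w (fine-tuning mode)."""
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        if not freeze_w:
            w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Plentiful "typical" data follows y = 2x.
typical = [(x, 2 * x) for x in range(-5, 6)]
w, b = train(typical, w=0.0, b=0.0)

# Only three "atypical" examples, systematically shifted: y = 2x + 3.
atypical = [(x, 2 * x + 3) for x in (0, 1, 2)]

error_before = mse(w, b, atypical)           # base model fails on atypical data
w, b = train(atypical, w, b, freeze_w=True)  # adapt only b, w stays frozen
error_after = mse(w, b, atypical)

print(f"error before fine-tuning: {error_before:.3f}")
print(f"error after fine-tuning:  {error_after:.6f}")
```

Because only a small part of the model is updated, very little data is needed to adapt it – the same intuition behind why 5-10 minutes of audio recovered most of the improvement in the paper.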
Results from “Personalizing ASR for Dysarthric and Accented Speech with Limited Data” will be presented next week, September 15-19, 2019, at Interspeech 2019, the 20th Annual Conference of the International Speech Communication Association, in Graz, Austria. Interspeech is the world’s largest and most comprehensive conference on the science and technology of spoken language processing. This year’s conference will explore language diversity, diverse applications of science and technology in the field, and diversity of representation (4). ALS TDI hopes that this conference will inspire new ideas and bring attention to the ways that technology can assist people with ALS, and others with nonstandard speech.
Data collection projects help direct the building of new technologies meant to aid people with ALS and help determine the focus of ALS drug development research. People with ALS can help in these projects by contributing voice data through Google’s Project Euphonia by filling out a form here. People with ALS can also help with other data collection projects by joining ALS TDI’s Precision Medicine Program (PMP), the most comprehensive and longest-running translational research study in ALS.
Science Sunday blogs aim to make ALS research and the work of the ALS Therapy Development Institute more accessible to people with ALS, families, friends, and health care providers. ALS TDI believes in open-source science accessible to all, with the goal of empowering the public with knowledge of ALS. Comments or feedback? Email email@example.com.
(1) Shor J, Emanuel D. Project Euphonia’s Personalized Speech Recognition for Non-Standard Speech. 2019. Google AI Blog. https://ai.googleblog.com/2019/08/project-euphonias-personalized-speech.html.
(2) Linse K, Aust E, Joos M, Hermann A. Communication Matters—Pitfalls and Promise of High-Tech Communication Devices in Palliative Care of Severely Physically Disabled Patients With Amyotrophic Lateral Sclerosis. 2018. Front Neurol. https://www.frontiersin.org/articles/10.3389/fneur.2018.00603/full.
(3) Shor J, et al. Personalizing ASR for Dysarthric and Accented Speech with Limited Data. 2019. arXiv preprint. https://arxiv.org/pdf/1907.13511.pdf.