
Showing papers on "Dysarthria published in 2022"


Journal ArticleDOI
TL;DR: In this paper, an ensemble of convolutional neural networks (CNNs) was used for the detection of Parkinson's disease from the voice recordings of 50 healthy people and 50 people with PD obtained from PC-GITA, a publicly available database.

20 citations


Journal ArticleDOI
TL;DR: Voice is abnormal in early-stage PD, progressively degrades in the mid-advanced stage, and is improved but not restored by L-Dopa, as shown by the high accuracy of machine learning in discriminating patients OFF therapy from those ON therapy.
Abstract: Introduction Parkinson's disease (PD) is characterized by specific voice disorders collectively termed hypokinetic dysarthria. We here investigated voice changes by using machine learning algorithms, in a large cohort of patients with PD in different stages of the disease, OFF and ON therapy. Methods We investigated 115 patients affected by PD (mean age: 68.2 ± 9.2 years) and 108 age-matched healthy subjects (mean age: 60.2 ± 11.0 years). The PD cohort included 57 early-stage patients (Hoehn & Yahr ≤ 2) who had never taken L-Dopa for their disease at the time of the study, and 58 mid-advanced-stage patients (Hoehn & Yahr > 2) who were chronically treated with L-Dopa. We clinically evaluated voices using specific subitems of the Unified Parkinson's Disease Rating Scale and the Voice Handicap Index. Voice samples recorded through a high-definition audio recorder underwent machine learning analysis based on the support vector machine classifier. We also calculated receiver operating characteristic curves to examine the diagnostic accuracy of the analysis and assessed possible clinical-instrumental correlations. Results Voice is abnormal in early-stage PD and increasingly degrades as the disease progresses, as demonstrated by high accuracy in the discrimination between healthy subjects and PD patients in the early and mid-advanced stages. Also, L-Dopa therapy improves but does not restore voice in PD, as shown by high accuracy in the comparison between patients OFF and ON therapy. Finally, for the first time we achieved significant clinical-instrumental correlations by using a new score (LR value) calculated by machine learning. Conclusion Voice is abnormal in early-stage PD, progressively degrades in the mid-advanced stage and can be improved but not restored by L-Dopa.
Lastly, machine learning allows tracking disease severity and quantifying the symptomatic effect of L-Dopa on voice parameters with previously unreported high accuracy, thus representing a potential new biomarker of PD.
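As a rough, self-contained illustration of the ROC analysis used above to quantify diagnostic accuracy (the labels and scores below are invented, not the study's data), the area under the ROC curve can be computed directly from classifier scores:

```python
# Illustrative sketch, not the paper's code: AUC equals the probability that
# a randomly chosen positive case outscores a randomly chosen negative case.
def roc_auc(labels, scores):
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Count pairwise "wins", with ties counted as half a win
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical LR-style scores for 4 healthy (0) and 4 PD (1) voices
labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.3, 0.35, 0.4, 0.2, 0.6, 0.7, 0.9]
print(roc_auc(labels, scores))  # 0.8125
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, which is why the abstracts in this listing report "high accuracy" via values close to 1.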

15 citations


Proceedings ArticleDOI
27 Jan 2022
TL;DR: This paper aims to improve multi-speaker end-to-end TTS systems to synthesize dysarthric speech for improved training of a dysarthria-specific DNN-HMM ASR, and adds dysarthria severity level and pause insertion mechanisms to other control parameters such as pitch, energy, and duration.
Abstract: Dysarthria is a motor speech disorder often characterized by reduced speech intelligibility through slow, uncoordinated control of speech production muscles. Automatic speech recognition (ASR) systems may help dysarthric talkers communicate more effectively. To have robust dysarthria-specific ASR, sufficient training speech is required, which is not readily available. Recent advances in multi-speaker end-to-end Text-To-Speech (TTS) synthesis systems suggest the possibility of using synthesis for data augmentation. In this paper, we aim to improve multi-speaker end-to-end TTS systems to synthesize dysarthric speech for improved training of a dysarthria-specific DNN-HMM ASR. In the synthesized speech, we add dysarthria severity level and pause insertion mechanisms to other control parameters such as pitch, energy, and duration. Results show that a DNN-HMM model trained on additional synthetic dysarthric speech achieves a WER improvement of 12.2% compared to the baseline, and the addition of the severity level and pause insertion controls decreases WER by a further 6.5%, showing the effectiveness of adding these parameters. Audio samples are available at https://mohammadelc.github.io/SpeechGroupUKY/
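The WER figures above come from comparing ASR hypotheses against reference transcripts. A minimal sketch of how word error rate is computed via edit distance (a generic textbook implementation, not the paper's toolkit):

```python
# Word error rate: Levenshtein distance over words, normalized by the
# number of reference words. Counts substitutions, insertions, deletions.
def wer(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[-1][-1] / len(ref)

print(wer("please call stella", "please call stela"))  # 1 error / 3 words
```

A "WER improvement of 12.2%" then means this ratio dropped by that amount relative to the baseline system's WER.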

12 citations


Journal ArticleDOI
Changqin Quan1
TL;DR: In this paper, a deep learning model was proposed for detecting Parkinson's disease from speech signals. The performance of the proposed model was verified on two databases, and accuracies of up to 92% were obtained on speech tasks that included reading simple (/loslibros/) and complex (/viste/) sentences in Spanish.

10 citations


Journal ArticleDOI
TL;DR: In this paper, a comparative study on the classification of dysarthria severity levels using different deep learning techniques and acoustic features is presented, including the utility of low-dimensional feature representations (i-vectors) obtained through subspace modeling and classified using DNN models.
Abstract: Assessing the severity level of dysarthria can provide an insight into the patient's improvement, assist pathologists to plan therapy, and aid automatic dysarthric speech recognition systems. In this article, we present a comparative study on the classification of dysarthria severity levels using different deep learning techniques and acoustic features. First, we evaluate the basic architectural choices such as deep neural network (DNN), convolutional neural network, gated recurrent units and long short-term memory network using the basic speech features, namely, Mel-frequency cepstral coefficients (MFCCs) and constant-Q cepstral coefficients. Next, speech-disorder specific features computed from prosody, articulation, phonation and glottal functioning are evaluated on DNN models. Finally, we explore the utility of low-dimensional feature representation using subspace modeling to give i-vectors, which are then classified using DNN models. Evaluation is done using the standard UA-Speech and TORGO databases. By giving an accuracy of 93.97% under the speaker-dependent scenario and 49.22% under the speaker-independent scenario for the UA-Speech database, the DNN classifier using MFCC-based i-vectors outperforms other systems.
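The core idea behind classifying low-dimensional utterance representations can be caricatured with a nearest-centroid classifier: each utterance is summarized by a fixed-length vector and assigned to the closest severity class. This is a drastically simplified stand-in for the i-vector/DNN pipeline above; every number below is synthetic:

```python
# Toy severity classifier over fixed-length utterance embeddings.
# A real system would extract MFCCs, derive i-vectors, and train a DNN.
def centroid(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def nearest_class(x, centroids):
    dist = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(centroids, key=lambda label: dist(x, centroids[label]))

# Hypothetical 2-D utterance embeddings per severity level
train = {"mild": [[0.1, 0.2], [0.2, 0.1]], "severe": [[0.9, 1.0], [1.0, 0.8]]}
centroids = {label: centroid(vecs) for label, vecs in train.items()}
print(nearest_class([0.85, 0.9], centroids))  # severe
```

The large gap between the speaker-dependent (93.97%) and speaker-independent (49.22%) accuracies reported above reflects how much such embeddings encode speaker identity rather than severity alone.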

10 citations


Journal ArticleDOI
TL;DR: Deep brain stimulation for PD is highly specialised; to enable adequate selection and follow-up of patients, DBS requires dedicated multidisciplinary teams of movement disorder neurologists, functional neurosurgeons, specialised DBS nurses and neuropsychologists.
Abstract: Parkinson's disease (PD) is a progressive neurodegenerative illness with both motor and nonmotor symptoms. Deep brain stimulation (DBS) is an established safe neurosurgical symptomatic therapy for eligible patients with advanced disease in whom medical treatment fails to provide adequate symptom control and good quality of life, or in whom dopaminergic medications induce severe side effects such as dyskinesias. DBS can be tailored to the patient's symptoms and targeted to various nodes along the basal ganglia–thalamus circuitry, which mediates the various symptoms of the illness; DBS in the thalamus is most efficient for tremors, and DBS in the pallidum is most efficient for rigidity and dyskinesias, whereas DBS in the subthalamic nucleus (STN) can treat tremors, akinesia, rigidity and dyskinesias, and allows for a decrease in the doses of medications even in patients with advanced stages of the disease, which makes it the preferred target for DBS. However, DBS in the STN requires that the patient is not too old, has no cognitive decline or relevant depression, and does not exhibit severe and medically resistant axial symptoms such as balance and gait disturbances, and falls. Dysarthria is the most common side effect of DBS, regardless of the brain target. DBS has a long-lasting effect on appendicular symptoms, but with progression of disease, nondopaminergic axial features become less responsive to DBS. DBS for PD is highly specialised; to enable adequate selection and follow-up of patients, DBS requires dedicated multidisciplinary teams of movement disorder neurologists, functional neurosurgeons, specialised DBS nurses and neuropsychologists.

9 citations


Journal ArticleDOI
TL;DR: In this paper, a web-based survey of 359 pediatric speech-language pathologists was used to determine clinical confidence levels in diagnosing apraxia of speech and dysarthria in children.
Abstract: Purpose: While there has been mounting research centered on the diagnosis of childhood apraxia of speech (CAS), little has focused on differentiating CAS from pediatric dysarthria. Because CAS and dysarthria share overlapping speech symptoms and some children have both motor speech disorders, differential diagnosis can be challenging. There is a need for clinical tools that facilitate assessment of both CAS and dysarthria symptoms in children. The goals of this tutorial are to (a) determine confidence levels of clinicians in differentially diagnosing dysarthria and CAS and (b) provide a systematic procedure for differentiating CAS and pediatric dysarthria in children. Method: Evidence related to differential diagnosis of CAS and dysarthria is reviewed. Next, a web-based survey of 359 pediatric speech-language pathologists is used to determine clinical confidence levels in diagnosing CAS and dysarthria. Finally, a checklist of pediatric auditory–perceptual motor speech features is presented along with a procedure to identify CAS and dysarthria in children with suspected motor speech impairments. Case studies illustrate application of this protocol, and treatment implications for complex cases are discussed. Results: The majority (60%) of clinician respondents reported low or no confidence in diagnosing dysarthria in children, and 40% reported they tend not to make this diagnosis as a result. Going forward, clinicians can use the feature checklist and protocol in this tutorial to support the differential diagnosis of CAS and dysarthria in clinical practice. Conclusions: Incorporating this diagnostic protocol into clinical practice should help increase confidence and accuracy in diagnosing motor speech disorders in children. Future research should test the sensitivity and specificity of this protocol in a large sample of children with varying speech sound disorders. 
Graduate programs and continuing education trainings should provide opportunities to practice rating speech features for children with dysarthria and CAS. Supplemental Material: https://doi.org/10.23641/asha.19709146

9 citations


Journal ArticleDOI
TL;DR: In this paper, a taxonomy of developmental speech production disorders is presented, with particular emphasis on the motor speech disorders childhood apraxia of speech (a disorder of motor planning) and childhood dysarthria (a set of disorders of motor execution).
Abstract: Speech is the most common modality through which language is communicated, and delayed, disordered, or absent speech production is a hallmark of many neurodevelopmental and genetic disorders. Yet, speech is not often carefully phenotyped in neurodevelopmental disorders. In this paper, we argue that such deep phenotyping, defined as phenotyping that is specific to speech production and not conflated with language or cognitive ability, is vital if we are to understand how genetic variations affect the brain regions that are associated with spoken language. Speech is distinct from language, though the two are related behaviorally and share neural substrates. We present a brief taxonomy of developmental speech production disorders, with particular emphasis on the motor speech disorders childhood apraxia of speech (a disorder of motor planning) and childhood dysarthria (a set of disorders of motor execution). We review the history of discoveries concerning the KE family, in whom a hereditary form of communication impairment was identified as childhood apraxia of speech and linked to dysfunction in the FOXP2 gene. The story demonstrates how instrumental deep phenotyping of speech production was in this seminal discovery in the genetics of speech and language. There is considerable overlap between the neural substrates associated with speech production and with FOXP2 expression, suggesting that further genes associated with speech dysfunction will also be expressed in similar brain regions. 
We then show how a biologically accurate computational model of speech production, in combination with detailed information about speech production in children with developmental disorders, can generate testable hypotheses about the nature, genetics, and neurology of speech disorders. Though speech and language are distinct, specific types of developmental speech disorder are associated with far-reaching effects on verbal communication in children with neurodevelopmental disorders. Therefore, detailed speech phenotyping, in collaboration with experts on pediatric speech development and disorders, can lead us to a new generation of discoveries about how speech development is affected in genetic disorders.

9 citations


Journal ArticleDOI
TL;DR: This article used automated speech timing analysis to diagnose non-fluent/agrammatic variant primary progressive aphasia (nfvPPA) in patients with pathology associated with tauopathies.
Abstract: Motor speech function, including speech timing, is a key domain for diagnosing nonfluent/agrammatic variant primary progressive aphasia (nfvPPA). Yet, standard assessments use subjective, specialist-dependent evaluations, undermining reliability and scalability. Moreover, few studies have examined relevant anatomo-clinical alterations in patients with pathologically confirmed diagnoses. This study overcomes such caveats using automated speech timing analyses in a unique cohort of autopsy-proven cases. In a cross-sectional study, we administered an overt reading task and quantified articulation rate, mean syllable and pause duration, and syllable and pause duration variability. Neuroanatomical disruptions were assessed using cortical thickness and white matter (WM) atrophy analysis. We evaluated 22 persons with nfvPPA (mean age: 67.3 years; 13 female patients) and confirmed underlying 4-repeat tauopathy, 15 persons with semantic variant primary progressive aphasia (svPPA; mean age: 66.5 years; 8 female patients), and 10 healthy controls (HCs; 70 years; 5 female patients). All 5 speech timing measures revealed alterations in persons with nfvPPA relative to both the HC and svPPA groups, controlling for dementia severity. The articulation rate robustly discriminated individuals with nfvPPA from HCs (area under the ROC curve [AUC] = 0.95), outperforming specialist-dependent perceptual measures of dysarthria and apraxia of speech severity. Patients with nfvPPA exhibited structural abnormalities in left precentral and middle frontal as well as bilateral superior frontal regions, including their underlying WM. The articulation rate correlated with atrophy of the left pars opercularis and supplementary/presupplementary motor areas.
Secondary analyses showed that, controlling for dementia severity, all measures yielded greater deficits in patients with nfvPPA and corticobasal degeneration (nfvPPA-CBD, n = 12) than in those with progressive supranuclear palsy pathology (nfvPPA-PSP, n = 10). The articulation rate robustly discriminated between individuals in each subgroup (AUC = 0.82). More widespread cortical thinning was observed for the nfvPPA-CBD than the nfvPPA-PSP group across frontal regions. Automated speech timing analyses can capture specific markers of nfvPPA while potentially discriminating between patients with different tauopathies. Thanks to its objectivity and scalability, this approach could support standard speech assessments. This study provides Class III evidence that automated speech analysis can accurately differentiate patients with nonfluent PPA from normal controls and patients with semantic variant PPA.
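The timing measures named above (articulation rate, mean pause duration, pause variability) reduce to simple statistics once syllable and pause intervals have been segmented from the recording. A sketch on hypothetical interval durations, not real audio or the study's pipeline:

```python
# Speech-timing measures from pre-segmented interval durations (seconds).
# The durations below are invented for illustration.
from statistics import mean, stdev

syllables = [0.18, 0.22, 0.20, 0.25, 0.19, 0.21]  # voiced syllable durations
pauses = [0.30, 0.45, 0.28, 0.52]                 # silent pause durations

speaking_time = sum(syllables)
# Articulation rate: syllables per second of actual speaking time
articulation_rate = len(syllables) / speaking_time
print(round(articulation_rate, 2))                # syllables/s
print(round(mean(pauses), 2), round(stdev(pauses), 2))  # mean, variability
```

In the study, lower articulation rates and longer, more variable pauses were the pattern that separated nfvPPA from controls.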

9 citations


Journal ArticleDOI
TL;DR: Although women showed a higher prevalence of some nonfocal symptoms, the prevalence of focal neurological symptoms, such as motor weakness and speech deficit, was similar for both sexes.
Abstract: Background: Early diagnosis through symptom recognition is vital in the management of acute stroke. However, women who experience stroke are more likely than men to be initially given a nonstroke diagnosis and it is unclear if potential sex differences in presenting symptoms increase the risk of delayed or missed stroke diagnosis. Aims: To quantify sex differences in the symptom presentation of stroke and assess whether these differences are associated with a delayed or missed diagnosis. Methods: PubMed, EMBASE, and the Cochrane Library were systematically searched up to January 2021. Studies were included if they reported presenting symptoms of adult women and men with diagnosed stroke (ischemic or hemorrhagic) or transient ischemic attack (TIA) and were published in English. Mean percentages with 95% confidence intervals (CIs) of each symptom were calculated for women and men. The crude relative risks (RRs) with 95% CI of symptoms being present in women, relative to men, were also calculated and pooled. Any data on the delayed or missed diagnosis of stroke for women compared to men based on symptom presentation were also extracted. Results: Pooled results from 21 eligible articles showed that women and men presented with a similar mean percentage of motor deficit (56% in women vs 56% in men) and speech deficit (41% in women vs 40% in men). Despite this, women more commonly presented with nonfocal symptoms than men: generalized nonspecific weakness (49% vs 36%), mental status change (31% vs 21%), and confusion (37% vs 28%), whereas men more commonly presented with ataxia (44% vs 30%) and dysarthria (32% vs 27%). Women also had a higher risk of presenting with some nonfocal symptoms: generalized weakness (RR 1.49, 95% CI 1.09–2.03), mental status change (RR 1.44, 95% CI 1.22–1.71), fatigue (RR 1.42, 95% CI 1.05–1.92), and loss of consciousness (RR 1.30, 95% CI 1.12–1.51). 
In contrast, women had a lower risk of presenting with dysarthria (RR 0.89, 95% CI 0.82–0.95), dizziness (RR 0.87, 95% CI 0.80–0.95), gait disturbance (RR 0.79, 95% CI 0.65–0.97), and imbalance (RR 0.68, 95% CI 0.57–0.81). Only one study linking symptoms to definite stroke/TIA diagnosis found that pain and unilateral sensory loss are associated with lower odds of a definite diagnosis in women compared to men. Conclusion: Although women showed a higher prevalence of some nonfocal symptoms, the prevalence of focal neurological symptoms, such as motor weakness and speech deficit, was similar for both sexes. Awareness of sex differences in symptoms in acute stroke evaluation, careful consideration of the full constellation of presenting symptoms, and further studies linking symptoms to diagnostic outcomes can be helpful in improving early diagnosis and management in both sexes.
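The pooled relative risks above follow the standard log-RR construction. A sketch with made-up counts (chosen to echo the 49% vs. 36% generalized-weakness percentages; these are not the review's pooled data, which used weighted meta-analysis):

```python
# Crude relative risk with a 95% CI via the log-RR method.
from math import exp, log, sqrt

def relative_risk(a, n1, c, n0):
    """a of n1 women and c of n0 men present with the symptom."""
    rr = (a / n1) / (c / n0)
    # Standard error of ln(RR) for a crude 2x2 comparison
    se = sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)
    lo, hi = exp(log(rr) - 1.96 * se), exp(log(rr) + 1.96 * se)
    return rr, lo, hi

# Hypothetical counts: 49/100 women vs 36/100 men with generalized weakness
rr, lo, hi = relative_risk(49, 100, 36, 100)
print(f"RR {rr:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

A CI that excludes 1 (as for several symptoms in the review) indicates a statistically significant sex difference; the crude single-study CI here straddles 1, which is why pooling across studies matters.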

8 citations


Journal ArticleDOI
TL;DR: This study investigated the clinical, genetic, and neuroimaging spectra of NEFL‐related CMT patients, and results could be helpful in the evaluation of novel NEFL variants and differential diagnosis against other CMT subtypes.
Abstract: Charcot–Marie–Tooth disease (CMT) is the most common hereditary peripheral neuropathy. Mutations in the neurofilament light polypeptide (NEFL) gene produce diverse clinical phenotypes, including demyelinating (CMT1F), axonal (CMT2E), and intermediate (CMTDIG) neuropathies. From 2005 to 2020, 1,143 Korean CMT families underwent gene sequencing, and we investigated the clinical, genetic, and neuroimaging spectra of NEFL‐related CMT patients. Ten NEFL mutations in 17 families (1.49%) were identified, of which three (p.L312P, p.Y443N, and p.K467N) were novel. Eight de novo cases were identified at a rate of 0.47 based on a cosegregation analysis. The age of onset was ≤3 years in five cases (13.5%). The patients revealed additional features including delayed walking, ataxia, dysphagia, dysarthria, dementia, ptosis, waddling gait, tremor, hearing loss, and abnormal visual evoked potential. Signs of ataxia were found in 26 patients (70.3%). In leg MRI analyses, various degrees of intramuscular fat infiltration were found. All compartments were evenly affected in CMT1F patients. The anterior and anterolateral compartments were affected in CMT2E, and the posterior compartment was affected in CMTDIG. Thus, NEFL‐related CMT patients showed phenotypic heterogeneities. This study's clinical, genetic, and neuroimaging results could be helpful in the evaluation of novel NEFL variants and differential diagnosis against other CMT subtypes.


Journal ArticleDOI
TL;DR: In this paper, the authors performed a systematic review to provide a high-level overview of practices across various neurological disorders and highlight emerging trends, and found that free speech and read speech tasks are most commonly used across disorders.
Abstract: Quantifying neurological disorders from voice is a rapidly growing field of research and holds promise for unobtrusive and large-scale disorder monitoring. The data recording setup and data analysis pipelines are both crucial aspects to effectively obtain relevant information from participants. Therefore, we performed a systematic review to provide a high-level overview of practices across various neurological disorders and highlight emerging trends. PRISMA-based literature searches were conducted through PubMed, Web of Science, and IEEE Xplore to identify publications in which original (i.e., newly recorded) datasets were collected. Disorders of interest were psychiatric disorders, such as bipolar disorder, depression, and stress; neurodegenerative disorders, such as amyotrophic lateral sclerosis, Alzheimer's disease, and Parkinson's disease; and speech impairments (aphasia, dysarthria, and dysphonia). Of the 43 retrieved studies, Parkinson's disease is represented most prominently with 19 discovered datasets. Free speech and read speech tasks are most commonly used across disorders. Besides popular feature extraction toolkits, many studies utilise custom-built feature sets. Correlations of acoustic features with psychiatric and neurodegenerative disorders are presented. In terms of analysis, statistical analysis for significance of individual features is commonly used, as well as predictive modeling approaches, especially with support vector machines and a small number of artificial neural networks. An emerging trend and recommendation for future studies is to collect data in everyday life to facilitate longitudinal data collection and to capture the behavior of participants more naturally. Another emerging trend is to record additional modalities to voice, which can potentially increase analytical performance.

Journal ArticleDOI
TL;DR: In this paper, a causal instantiation of the World Health Organization's International Classification of Functioning, Disability and Health (ICF) framework, linking acoustics, intelligibility, and communicative participation in the context of dysarthria, was proposed and tested.
Abstract: We proposed and tested a causal instantiation of the World Health Organization's International Classification of Functioning, Disability and Health (ICF) framework, linking acoustics, intelligibility, and communicative participation in the context of dysarthria. Speech samples and communicative participation scores were collected from individuals with dysarthria (n = 32). Speech was analyzed for two acoustic metrics (i.e., articulatory precision and speech rate), and an objective measure of intelligibility was generated from listener transcripts. Mediation analysis was used to evaluate pathways of effect between acoustics, intelligibility, and communicative participation. We observed a strong relationship between articulatory precision and intelligibility and a moderate relationship between intelligibility and communicative participation. Collectively, data supported a significant relationship between articulatory precision and communicative participation, which was almost entirely mediated through intelligibility. These relationships were not significant when speech rate was specified as the acoustic variable of interest. The statistical corroboration of our causal instantiation of the ICF framework with articulatory acoustics affords important support toward the development of a comprehensive causal framework to understand and, ultimately, address restricted communicative participation in dysarthria.
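The mediation logic described above decomposes the effect of an acoustic predictor (articulatory precision) on participation into a direct path and an indirect path through intelligibility. A minimal simulated sketch of that decomposition (the data are fabricated; the study analyzed real scores with dedicated mediation methods):

```python
# Simple regression-based mediation: indirect effect = a * b,
# where a is predictor->mediator and b is mediator->outcome controlling
# for the predictor. All data are simulated for illustration.
import numpy as np

rng = np.random.default_rng(1)
n = 500
precision = rng.normal(size=n)
# Mediator driven by predictor (a-path), outcome mostly via mediator
intelligibility = 0.8 * precision + rng.normal(scale=0.3, size=n)
participation = 0.7 * intelligibility + 0.05 * precision + rng.normal(scale=0.3, size=n)

# a-path: mediator ~ predictor (simple regression slope)
a = np.polyfit(precision, intelligibility, 1)[0]
# b-path and direct effect: outcome ~ predictor + mediator
X = np.column_stack([np.ones(n), precision, intelligibility])
direct, b = np.linalg.lstsq(X, participation, rcond=None)[0][1:]
print(f"indirect effect a*b = {a * b:.2f}, direct effect = {direct:.2f}")
```

"Almost entirely mediated" corresponds to the indirect effect a*b dominating while the direct effect stays near zero, as in this simulation.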

Proceedings ArticleDOI
13 Jan 2022
TL;DR: It is found that straightforward signal processing methods such as stationary noise removal and vocoder-based time stretching lead to dysarthric speech recognition results comparable to those obtained when using state-of-the-art GAN-based voice conversion methods as measured using a phoneme recognition task.
Abstract: In this paper, we investigate several existing methods and a new state-of-the-art generative adversarial network (GAN)-based voice conversion method for enhancing dysarthric speech for improved dysarthric speech recognition. We compare key components of existing methods as part of a rigorous ablation study to find the most effective solution to improve dysarthric speech recognition. We find that straightforward signal processing methods such as stationary noise removal and vocoder-based time stretching lead to dysarthric speech recognition results comparable to those obtained when using state-of-the-art GAN-based voice conversion methods as measured using a phoneme recognition task. Additionally, our proposed solution of a combination of MaskCycleGAN-VC and time stretching is able to improve the phoneme recognition results for certain dysarthric speakers compared to our time stretched baseline.

Journal ArticleDOI
TL;DR: In this paper, a transfer-learning-based convolutional neural network model (TL-CNN) was used to detect dysarthria from speech and achieved better accuracy than other machine learning models.


Journal ArticleDOI
TL;DR: In this paper, a recurrent encoder-decoder model based on deep learning methods was employed to reconstruct speech from stereotactic EEG (sEEG) recordings, achieving correlations of up to 0.8 despite limited amounts of training data.
Abstract: Speech Neuroprostheses have the potential to enable communication for people with dysarthria or anarthria. Recent advances have demonstrated high-quality text decoding and speech synthesis from electrocorticographic grids placed on the cortical surface. Here, we investigate a less invasive measurement modality in three participants, namely stereotactic EEG (sEEG) that provides sparse sampling from multiple brain regions, including subcortical regions. To evaluate whether sEEG can also be used to synthesize high-quality audio from neural recordings, we employ a recurrent encoder-decoder model based on modern deep learning methods. We find that speech can indeed be reconstructed with correlations up to 0.8 from these minimally invasive recordings, despite limited amounts of training data.
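The reconstruction quality reported above is a correlation between reference and synthesized audio representations. A self-contained Pearson correlation on short hypothetical feature trajectories (invented numbers, not the study's data):

```python
# Pearson correlation coefficient, the metric behind "correlations up to 0.8".
from math import sqrt

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

# Hypothetical spectral-envelope trajectories: reference vs. reconstruction
ref = [0.0, 0.5, 1.0, 0.8, 0.3, 0.1]
rec = [0.1, 0.4, 0.9, 0.9, 0.2, 0.2]
print(round(pearson(ref, rec), 2))  # 0.96
```

In practice such correlations are computed per spectral band between the original and reconstructed spectrograms and then averaged.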

Proceedings ArticleDOI
23 May 2022
TL;DR: In this paper, the authors investigated the effectiveness of multi-modal acoustic modelling for dysarthric speech recognition using acoustic features along with articulatory information, and found that fusing the acoustic and articulatory features at the empirically found optimal level of abstraction achieves a remarkable performance gain, leading to up to 4.6% absolute (9.6% relative) WER reduction for speakers with dysarthria.
Abstract: Building automatic speech recognition (ASR) systems for speakers with dysarthria is a very challenging task. Although multi-modal ASR has received increasing attention recently, incorporating real articulatory data with acoustic features has not been widely explored in the dysarthric speech community. This paper investigates the effectiveness of multi-modal acoustic modelling for dysarthric speech recognition using acoustic features along with articulatory information. The proposed multi-stream architectures consist of convolutional, recurrent and fully-connected layers allowing for bespoke per-stream pre-processing, fusion at the optimal level of abstraction and post-processing. We study the optimal fusion level/scheme as well as training dynamics in terms of cross-entropy and WER using the popular TORGO dysarthric speech database. Experimental results show that fusing the acoustic and articulatory features at the empirically found optimal level of abstraction achieves a remarkable performance gain, leading to up to 4.6% absolute (9.6% relative) WER reduction for speakers with dysarthria.
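As a back-of-envelope check on how the absolute and relative figures above relate: a drop of 4.6 percentage points that is also a 9.6% relative reduction implies a baseline WER of roughly 48%, which is plausible for dysarthric speech:

```python
# Relating absolute and relative WER reduction: relative = absolute / baseline,
# so baseline = absolute / relative. Figures taken from the abstract above.
absolute_drop = 4.6    # percentage points
relative_drop = 0.096  # fraction of the baseline WER
baseline_wer = absolute_drop / relative_drop
print(round(baseline_wer, 1))  # ~47.9 (% WER)
```

This kind of sanity check is useful when papers quote only one of the two figures.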

Journal ArticleDOI
03 Feb 2022-PLOS ONE
TL;DR: Analysis of guideline adherence in the treatment of aphasia, dysarthria and dysphagia after stroke reveals deficits in the implementation of guideline recommendations, underscoring the need for regular monitoring of implementation measures in stroke aftercare to address group-based disparities in care.
Abstract: Background Impairments to comprehension and production of speech (aphasia, dysarthria) and swallowing disorders (dysphagia) are common sequelae of stroke, reducing patients’ quality of life and social participation. Treatment oriented on evidence-based guidelines seems likely to improve outcomes. Currently, little is known about guideline adherence in stroke aftercare for the above-mentioned sequelae. This study aims to analyse guideline adherence in the treatment of aphasia, dysarthria and dysphagia after stroke, based on suitable test parameters, and to determine factors that influence the implementation of recommended therapies. Methods Six test parameters were defined, based on systematic study of guidelines for the treatment of speech impairments and swallowing disorders (e.g. comprehensive diagnostics, early initiation and continuity). Guideline adherence in treatment was tested using claims data from four statutory health insurance companies. Multivariate logistic and linear regression analyses were performed in order to test the outcomes. Results 4,486 stroke patients who were diagnosed with specific disorders or received speech therapy were included in the study. The median age was 78 years; the proportion of women was 55.9%. Within the first year after the stroke, 90.3% of patients were diagnosed with speech impairments and swallowing disorders. Overall, 44.1% of patients received outpatient speech and language therapy aftercare. Women were less frequently diagnosed with specific disorders (OR 0.70 [95%CI:0.55/0.88], p = 0.003) and less frequently received longer therapy sessions (OR 0.64 [95%CI:0.43/0.94], p = 0.022). Older age and longer hospitalization duration increased the likelihood of guideline recommendations being implemented and of earlier initiation of stroke aftercare measures. Conclusions Our observations indicate deficits in the implementation of guideline recommendations in stroke aftercare. 
At the same time, they underscore the need for regular monitoring of implementation measures in stroke aftercare to address group-based disparities in care.
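The abstract reports adjusted odds ratios with 95% confidence intervals from multivariate logistic regression on claims data. As a hypothetical, unadjusted illustration (not the authors' adjusted model, and not their data), an odds ratio and its Wald 95% CI can be derived from a 2x2 table as follows:

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald 95% CI from a 2x2 table:
    a/b = events/non-events in the exposed group,
    c/d = events/non-events in the unexposed group."""
    or_ = (a * d) / (b * c)
    # Standard error of log(OR) via the Woolf formula.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log_or)
    upper = math.exp(math.log(or_) + z * se_log_or)
    return or_, lower, upper

# Hypothetical counts, purely for illustration:
or_, lo, hi = odds_ratio_ci(140, 60, 160, 40)
```

An OR below 1 with a CI excluding 1 (as in the reported OR 0.70 [0.55/0.88]) indicates the outcome is less likely in the exposed group.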

Journal ArticleDOI
TL;DR: Wang et al. proposed an integrated CNN-GRU model, combining convolutional neural networks and gated recurrent units, to detect dysarthria associated with Parkinson's disease, stroke, cerebral palsy, and other neurological conditions.

Journal ArticleDOI
TL;DR: The effect of dopaminergic medication on speech has rarely been examined in early-stage Parkinson's disease (PD), and the respective literature is inconclusive and limited by inappropriate designs lacking a PD control group.
Abstract: The effect of dopaminergic medication on speech has rarely been examined in early-stage Parkinson's disease (PD) and the respective literature is inconclusive and limited by inappropriate design with lack of PD control group. The study aims to examine the short-term effect of dopaminergic medication on speech in PD using patients with good motor responsiveness to levodopa challenge compared to a control group of PD patients with poor motor responsiveness. A total of 60 early-stage PD patients were investigated before (OFF) and after (ON) acute levodopa challenge and compared to 30 age-matched healthy controls. PD patients were categorised into two clinical subgroups (PD responders vs. PD nonresponders) according to the comparison of their motor performance based on the movement disorder society-unified Parkinson's disease rating scale, part III. Seven distinctive parameters of hypokinetic dysarthria were examined using quantitative acoustic analysis. We observed increased monopitch (p < 0.01), aggravated monoloudness (p < 0.05) and longer duration of stop consonants (p < 0.05) in PD compared to healthy controls, confirming the presence of hypokinetic dysarthria in early PD. No speech alterations from the OFF to the ON state were revealed in either PD group for any of the speech dimensions investigated, including monopitch, monoloudness, imprecise consonants, harsh voice, slow sequential motion rates, articulation rate, and inappropriate silences, although the subgroup of PD responders manifested obvious improvement in motor function after levodopa intake (p < 0.001). Since short-term use of levodopa does not readily affect voice and speech performance in PD, speech assessment may provide a medication state-independent motor biomarker of PD.
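Monopitch reflects reduced fundamental-frequency (F0) variability. As an illustrative acoustic measure (not necessarily the exact parameterization used in the study), the standard deviation of F0 in semitones around the speaker's median F0 can be sketched as:

```python
import numpy as np

def f0_variability_semitones(f0_hz):
    """Std. dev. of F0 in semitones relative to the speaker's median F0.
    Lower values indicate a more monotone (monopitch) voice.
    f0_hz: per-frame F0 estimates in Hz; unvoiced frames coded as 0."""
    f0 = np.asarray(f0_hz, dtype=float)
    f0 = f0[f0 > 0]  # discard unvoiced frames
    semitones = 12.0 * np.log2(f0 / np.median(f0))
    return float(np.std(semitones))
```

The semitone scale makes the measure comparable across speakers with different baseline pitch (e.g. male vs. female voices).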

Journal ArticleDOI
TL;DR: In this paper, the authors examined how accurately speech-language pathologists (SLPs) perceptually classify apraxia of speech (AoS) and dysarthria, how that accuracy is impacted by speech task, severity of the MSD, and listener expertise, and which perceptual features listeners use to classify.
Abstract: The clinical diagnosis of motor speech disorders (MSDs) is mainly based on perceptual approaches. However, studies on perceptual classification of MSDs often indicate low classification accuracy. The aim of this study was to determine in a forced-choice dichotomous decision-making task (a) how accuracy of speech-language pathologists (SLPs) in perceptually classifying apraxia of speech (AoS) and dysarthria is impacted by speech task, severity of MSD, and listener's expertise and (b) which perceptual features they use to classify. Speech samples from 29 neurotypical speakers, 14 with hypokinetic dysarthria associated with Parkinson's disease (HD), 10 with poststroke AoS, and six with mixed dysarthria associated with amyotrophic lateral sclerosis (MD-FlSp [combining flaccid and spastic dysarthria]), were classified by 20 expert SLPs and 20 student SLPs. Speech samples were elicited in spontaneous speech, text reading, oral diadochokinetic (DDK) tasks, and a sample concatenating text reading and DDK. For each recorded speech sample, SLPs answered three dichotomic questions following a diagnostic approach, (a) neurotypical versus pathological speaker, (b) AoS versus dysarthria, and (c) MD-FlSp versus HD, and a multiple-choice question on the features their decision was based on. Overall classification accuracy was 72% with good interrater reliability, varying with SLP expertise, speech task, and MSD severity. Correct classification of speech samples was higher for speakers with dysarthria than for AoS and higher for HD than for MD-FlSp. Samples elicited with continuous speech reached the best classification rates. An average number of three perceptual features were used for correct classifications, and their type and combination differed between the three MSDs. The auditory-perceptual classification of MSDs in a diagnostic approach reaches substantial performance only in expert SLPs with continuous speech samples, albeit with lower accuracy for AoS.
Specific training associated with objective classification tools seems necessary to improve recognition of neurotypical speech and distinction between AoS and dysarthria.
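Interrater reliability for categorical judgments like these is commonly quantified with a chance-corrected agreement statistic. A minimal sketch of Cohen's kappa for two raters (illustrative; not necessarily the reliability statistic the study used):

```python
def cohens_kappa(rater1, rater2):
    """Chance-corrected agreement between two raters over the same items.
    rater1, rater2: equal-length lists of category labels."""
    assert len(rater1) == len(rater2)
    n = len(rater1)
    labels = set(rater1) | set(rater2)
    # Observed agreement: proportion of items where both raters agree.
    p_obs = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Expected agreement by chance, from each rater's marginal label rates.
    p_exp = sum((rater1.count(l) / n) * (rater2.count(l) / n) for l in labels)
    if p_exp == 1.0:  # both raters used a single identical label throughout
        return 1.0
    return (p_obs - p_exp) / (1 - p_exp)
```

Kappa is 1 for perfect agreement and near 0 when agreement is no better than chance, which is why it is preferred over raw percent agreement for tasks with unbalanced categories.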

Journal ArticleDOI
TL;DR: The results generally support predictions that orofacial weakness accompanies flaccid and/or spastic dysarthria but not ataxic Dysarthria, and support including type of dysarthrias as a variable of interest when examining oroFacial weakness in motor speech disorders.
Abstract: This study compared orofacial muscle strength between normal and dysarthric speakers and across types of dysarthria, and examined correlations between strength and dysarthria severity. Participants included 79 speakers with flaccid, spastic, mixed spastic–flaccid, ataxic, or hypokinetic dysarthria and 33 healthy controls. Maximum pressure generation (Pmax) by the tongue, lips, and cheeks represented strength. Pmax was lower for speakers with mixed spastic–flaccid dysarthria for all tongue and lip measures, as well as for speakers with flaccid or spastic dysarthria for anterior tongue elevation and lip compression. Anterior tongue elevation and cheek compression tended to be lower than normal for the hypokinetic group. Pmax did not differ significantly between controls and speakers with ataxic dysarthria on any measure. Correlations were generally weak between dysarthria severity and orofacial weakness but were stronger in the dysarthria groups with more prominent orofacial weakness. The results generally support predictions that orofacial weakness accompanies flaccid and/or spastic dysarthria but not ataxic dysarthria. The findings support including type of dysarthria as a variable of interest when examining orofacial weakness in motor speech disorders.
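The strength-severity relationships above are reported as correlations. A minimal pure-Python sketch of the Pearson product-moment coefficient (one plausible choice; the study's exact correlation statistic may differ):

```python
def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)
```

Values near 0 correspond to the "generally weak" correlations reported, while magnitudes approaching 1 indicate the stronger associations seen in groups with prominent orofacial weakness.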

Journal ArticleDOI
TL;DR: It is argued that communicative participation should be a primary focus of treatment planning and intervention to provide patient-centered, holistic, and value-based clinical interventions which are responsive to the needs of individuals living with dysarthria.
Abstract: Communicative participation is restricted in many conditions associated with dysarthria. This position paper defines and describes the construct of communicative participation. In it, the emergence of this construct is reviewed, along with the predictors of and variables associated with communicative participation in the dysarthrias. In doing so, the features that make communicative participation unique and distinct from other measures of dysarthria are highlighted, through emphasizing how communicative participation cannot be predicted solely from other components of the World Health Organization’s International Classification of Functioning, Disability and Health (ICF), including levels of impairment or activity limitations. Next, the empirical literature related to the measurement of communicative participation and how this research relates to dysarthria management is presented. Finally, the development of robust clinical measures of communicative participation and approaches to management is described from the point of view of the clinician. We argue that communicative participation should be a primary focus of treatment planning and intervention to provide patient-centered, holistic, and value-based clinical interventions which are responsive to the needs of individuals living with dysarthria.

Journal ArticleDOI
TL;DR: In this article, the authors investigated the relationship between speech and oral diadochokinetic (DDK) tasks in patients with motor speech disorders and found that the repetition of heterogeneous syllables shares more principles and properties with speech than the homogeneous DDK task.
Abstract: ABSTRACT Background Oral diadochokinetic (DDK) tasks are widely used in the assessment of patients with motor speech disorders (MSD). They require the repetition of homogeneous and of heterogeneous syllables at a maximum rate. The nature of DDK tasks is a matter of debate: although they are usually considered as “non-speech”, given the involvement of real speech syllables, they have also been recognized to share more properties and requirements with speech as compared to other oromotor behaviours (i.e., non-speech gestures). Additionally, the production of heterogeneous syllables has been suggested to exhibit closer correspondence with speech given the production of a cluster of different syllables, as compared to the repetition of a monosyllable. Each DDK task being differently related to speech supports the assumption of a continuum between speech and other oromotor processes, in which tasks with increasingly speech-like properties may overlap to a larger extent with speech. Aim To test the potentially differential relationship between speech and DDK tasks in patients with MSD. Traditional views of MSD claim that speech and other oromotor behaviours are impaired in patients with dysarthria, while patients with AoS present with speech being exclusively compromised. This suggests that DDK tasks should be differently affected in these two MSD populations. Moreover, in the framework of a continuum, oromotor tasks sharing some characteristics with speech (i.e., heterogeneous DDK task) should also be impaired in patients with AoS. Method Syllabic rates of a sentence production (SP) task are contrasted with those of homogeneous and heterogeneous DDK tasks in three pathological populations: patients with dysarthria due to Parkinson’s Disease (PD, n = 10) or due to amyotrophic lateral sclerosis (ALS, n = 10) and patients with apraxia of speech (AoS, n = 10); and 30 matched neurotypical controls. 
Results While patients with PD show similar rates across the SP and the DDK tasks relative to control speakers, patients with ALS display significantly slower rates across all tasks as compared to matched controls. Patients with AoS show reduced performances in the SP and in the heterogeneous DDK task, but not on the repetition of homogeneous syllables. Conclusions The findings of this study confirm that the repetition of heterogeneous syllables shares more principles and properties with speech than the homogeneous DDK task. This dissimilar relationship between speech and the two types of DDK tasks points to the existence of a continuum among oromotor tasks.

Journal ArticleDOI
TL;DR: Few individuals with DYRK1A syndrome use verbal speech as their sole means of communication, and hence, all individuals need early access to tailored, graphic AAC systems to support their communication.

Journal ArticleDOI
TL;DR: This study evaluated the efficacy of a home‐delivered, ataxia‐tailored biofeedback‐driven speech therapy in CAG‐SCA in 16 individuals with SCA1, 2, 3, or 6 by leveraging an intra‐individual control design.
Abstract: CAG repeat‐expansion spinocerebellar ataxias (CAG‐SCAs) are genetically defined multisystemic degenerative diseases, resulting in motor symptoms including dysarthria with a substantial impact on daily living. Whilst speech therapy is widely recommended in ataxia, very limited evidence exists for its use. We evaluated the efficacy of a home‐delivered, ataxia‐tailored biofeedback‐driven speech therapy in CAG‐SCA in 16 individuals with SCA1, 2, 3, or 6. Treatment was delivered intensively over 20 days. Efficacy was evaluated by blinded ratings of intelligibility (primary) and acoustic measures (secondary) leveraging an intra‐individual control design. Intelligibility improved post‐treatment (Z = −3.18, p = 0.004) whilst remaining stable prior to treatment (Z = 0.53, p = 1.00).

Proceedings ArticleDOI
18 Sep 2022
TL;DR: An end-to-end ASR using a Transformer acoustic model is used to evaluate the data augmentation scheme on speech from the UA dysarthric speech corpus and achieves an absolute improvement of 16% in word error rate (WER) over a baseline with no augmentation.
Abstract: Machine learning (ML) and Deep Neural Networks (DNN) have greatly aided the problem of Automatic Speech Recognition (ASR). However, accurate ASR for dysarthric speech remains a serious challenge. Dearth of usable data remains a problem in applying ML and DNN techniques for dysarthric speech recognition. In the current research, we address this challenge using a novel two-stage data augmentation scheme, a combination of static and dynamic data augmentation techniques that are designed by leveraging an understanding of the characteristics of dysarthric speech. Deep Autoencoder (DAE)-based healthy speech modification and various perturbations comprise static augmentations, whereas SpecAugment techniques modified to specifically augment dysarthric speech comprise the dynamic data augmentation. The objective of this work is to improve the ASR performance for dysarthric speech using the two-stage data augmentation scheme. An end-to-end ASR using a Transformer acoustic model is used to evaluate the data augmentation scheme on speech from the UA dysarthric speech corpus. We achieve an absolute improvement of 16% in word error rate (WER) over a baseline with no augmentation, with a final WER of 20.6%.
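Word error rate, the metric reported above, is the word-level Levenshtein distance (substitutions + insertions + deletions) between the ASR hypothesis and the reference transcript, divided by the number of reference words. A minimal sketch:

```python
def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over word sequences.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)
```

Because insertions count against the hypothesis, WER can exceed 100% for very noisy recognizers; an absolute drop of 16 points, as reported, is a substantial gain.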

Journal ArticleDOI
TL;DR: Wang et al. reported a novel splicing mutation in the SLC16A2 gene in an 18-month-old male patient with Allan-Herndon-Dudley syndrome.
Abstract: Allan-Herndon-Dudley syndrome (AHDS) is an X-linked recessive neurodegenerative disorder caused by mutations in the SLC16A2 gene, which encodes a thyroid hormone transporter. AHDS has rarely been reported in China. This study reported a novel splicing mutation in the SLC16A2 gene in an 18-month-old male patient with AHDS. The patient was born to non-consanguineous, healthy parents of Chinese origin. He passed newborn screening for hypothyroidism but failed to reach developmental milestones. He presented with hypotonia, severe mental retardation, dysarthria and ataxia. Genetic analysis identified a novel splicing mutation, NM_006517.4: c.431-2 A > G, in the SLC16A2 gene, inherited from his mother. The patient received Triac (triiodothyroacetic acid), a thyroid hormone analogue, for 3 months. Triac treatment effectively reduced serum TSH concentrations and normalized serum T3 concentrations in the patient. This study reported the first case of AHDS treated with Triac in China and expanded the mutational spectrum of the SLC16A2 gene in AHDS patients.