Showing papers on "Dysarthria" published in 2019


Journal ArticleDOI
TL;DR: The ability to detect the CCAS in real time in clinical neurology with a brief and validated scale should make it possible to develop a deeper understanding of the clinical consequences of cerebellar lesions in a wide range of neurological and neuropsychiatric disorders with a link to the cerebellum.

1,002 citations


Proceedings ArticleDOI
15 Sep 2019
TL;DR: This paper trains personalized models that achieve 62% and 35% relative WER improvement on these two groups of non-standard speech: speech from people with amyotrophic lateral sclerosis and accented speech, and shows that 71% of the improvement comes from only 5 minutes of training data.
Abstract: Automatic speech recognition (ASR) systems have dramatically improved over the last few years. ASR systems are most often trained from 'typical' speech, which means that underrepresented groups don't experience the same level of improvement. In this paper, we present and evaluate finetuning techniques to improve ASR for users with non-standard speech. We focus on two types of non-standard speech: speech from people with amyotrophic lateral sclerosis (ALS) and accented speech. We train personalized models that achieve 62% and 35% relative WER improvement on these two groups, bringing the absolute WER for ALS speakers, on a test set of message bank phrases, down to 10% for mild dysarthria and 20% for more serious dysarthria. We show that 71% of the improvement comes from only 5 minutes of training data. Finetuning a particular subset of layers (with many fewer parameters) often gives better results than finetuning the entire model. This is the first step towards building state of the art ASR models for dysarthric speech.

76 citations
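
The relative WER figures quoted for the personalized ALS and accented-speech models above can be made concrete with a small calculation. The sketch below is not the authors' evaluation code: it assumes whitespace tokenization, and the function names `word_error_rate` and `relative_improvement` are illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def relative_improvement(baseline_wer: float, adapted_wer: float) -> float:
    """Fraction of the baseline WER removed by the adapted model."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - adapted_wer) / baseline_wer
```

As an illustration with made-up inputs (not values from the paper), a baseline WER of 0.50 that falls to 0.19 after finetuning corresponds to a relative improvement of 0.62.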


Journal ArticleDOI
TL;DR: It is suggested that diagnostic classification information from standardized motor speech assessment protocols can contribute to research in the pathobiologies of CND.
Abstract: Estimates of the prevalence of speech and motor speech disorders in persons with complex neurodevelopmental disorders (CND) can inform research in the biobehavioural origins and treatment of CND. The goal of this research was to use measures and analytics in a diagnostic classification system to estimate the prevalence of speech and motor speech disorders in convenience samples of speakers with one of eight types of CND. Audio-recorded conversational speech samples from 346 participants with one of eight types of CND were obtained from a database of participants recruited for genetic and behavioural studies of speech sound disorders (i.e., excluding dysfluency) during the past three decades. Data reduction methods for the speech samples included narrow phonetic transcription, prosody-voice coding, and acoustic analyses. Standardized measures were used to cross-classify participants' speech and motor speech status. Compared to the 17.8% prevalence of four types of motor speech disorders reported in a study of 415 participants with idiopathic Speech Delay (SD), 47.7% of the present participants with CND met criteria for one of four motor speech disorders, including Speech Motor Delay (25.1%), Childhood Dysarthria (13.3%), Childhood Apraxia of Speech (4.3%), and concurrent Childhood Dysarthria and Childhood Apraxia of Speech (4.9%). Findings are interpreted to indicate a substantial prevalence of speech disorders, and notably, a substantial prevalence of motor speech disorders in persons with some types of CND. We suggest that diagnostic classification information from standardized motor speech assessment protocols can contribute to research in the pathobiologies of CND. 
Abbreviations: 16p: 16p11.2 deletion and duplication syndrome; 22q: 22q11.2 deletion syndrome; ASD: Autism Spectrum Disorder; CAS: Childhood Apraxia of Speech; CD: Childhood Dysarthria; CND: Complex Neurodevelopmental Disorder; DS: Down syndrome; FXS: Fragile X syndrome; GAL: Galactosemia; IID: Idiopathic Intellectual Disability; MSD: Motor Speech Disorder; No MSD: No Motor Speech Disorder; NSA: Normal(ized) Speech Acquisition; PEPPER: Programs to Examine Phonetic and Phonologic Evaluation Records; PSD: Persistent Speech Delay; PSE: Persistent Speech Errors; SD: Speech Delay; SDCS: Speech Disorders Classification System; SDCSS: Speech Disorders Classification System Summary; SE: Speech Errors; SMD: Speech Motor Delay; SSD: Speech Sound Disorders; TBI: Traumatic Brain Injury.

46 citations


Journal ArticleDOI
TL;DR: Initial estimates of the prevalence of each of four types of motor speech disorders in children with idiopathic Speech Delay (SD) were obtained, and the findings are interpreted to indicate a substantial prevalence of motor speech disorders in children in the USA.
Abstract: The goal of this research was to obtain initial estimates of the prevalence of each of four types of motor speech disorders in children with idiopathic Speech Delay (SD) and to use findings...

45 citations


Proceedings ArticleDOI
12 May 2019
TL;DR: An approach that non-linearly modifies speech tempo to reduce mismatch between typical and atypical speech is explored, resulting in a nearly 7% absolute improvement in comparison to baseline speaker-dependent trained system evaluated using UASpeech corpus.
Abstract: Improving the accuracy of personalised speech recognition for speakers with dysarthria is a challenging research field. In this paper, we explore an approach that non-linearly modifies speech tempo to reduce mismatch between typical and atypical speech. Speech tempo analysis at the phonetic level is accomplished using a forced-alignment process from traditional GMM-HMM in automatic speech recognition (ASR). Estimated tempo adjustments are applied directly to the acoustic features rather than to the time-domain signals. Two approaches are considered: i) adjusting dysarthric speech towards typical speech for input into ASR systems trained with typical speech, and ii) adjusting typical speech towards dysarthric speech for data augmentation in personalised dysarthric ASR training. Experimental results show that the latter strategy with data augmentation is more effective, resulting in a nearly 7% absolute improvement in comparison to baseline speaker-dependent trained system evaluated using UASpeech corpus. Consistent recognition performance improvements are observed across speakers, with greatest benefit in cases of moderate and severe dysarthria.

42 citations
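
The tempo-adjustment abstract above describes applying phone-level tempo changes directly to acoustic feature frames rather than to the waveform. A minimal sketch of that idea follows. It assumes linear interpolation between frames and a per-segment tempo factor supplied by the caller (for example derived from a forced alignment); the abstract does not specify either choice, and the names `stretch_segment` and `adjust_tempo` are hypothetical.

```python
from typing import List, Sequence, Tuple

Frame = List[float]
Segment = Tuple[int, int, float]  # (start frame, end frame exclusive, tempo factor)


def stretch_segment(frames: Sequence[Frame], factor: float) -> List[Frame]:
    """Resample a run of frames to about len(frames) * factor frames."""
    n = len(frames)
    if n == 0:
        return []
    m = max(1, round(n * factor))
    if n == 1 or m == 1:
        # Nothing to interpolate between: repeat the first frame.
        return [list(frames[0]) for _ in range(m)]
    out = []
    for k in range(m):
        pos = k * (n - 1) / (m - 1)   # fractional index into the input
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        w = pos - lo
        out.append([(1 - w) * a + w * b for a, b in zip(frames[lo], frames[hi])])
    return out


def adjust_tempo(frames: Sequence[Frame], segments: Sequence[Segment]) -> List[Frame]:
    """Stretch or compress each aligned segment and concatenate the results."""
    out: List[Frame] = []
    for start, end, factor in segments:
        out.extend(stretch_segment(frames[start:end], factor))
    return out
```

A factor above 1 lengthens a segment (slower tempo) and a factor below 1 shortens it, so the same routine could serve either direction described in the abstract: moving dysarthric features towards typical tempo, or moving typical features towards dysarthric tempo for data augmentation.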


Journal ArticleDOI
TL;DR: The results showed that the glottal features in combination with the openSMILE-based acoustic features resulted in improved classification accuracies, which validates the complementary nature of glottal features.

31 citations


Journal ArticleDOI
TL;DR: The findings indicate that subtle changes in speech appear prior to a clinical diagnosis of Huntington's disease; however, distinct patterns of decline and the magnitude of these deficits require further investigation.

30 citations


Journal ArticleDOI
TL;DR: Assessment of support for motor speech disorders in adolescents with DS as explanatory constructs for their reduced intelligibility found low intelligibility was significantly associated with across-the-board reductions in phonemic and phonetic accuracy and with inappropriate prosody and voice.
Abstract: The goal of this research was to assess the support for motor speech disorders as explanatory constructs to guide research and treatment of reduced intelligibility in persons with Down syndrome (DS...

28 citations


Journal ArticleDOI
TL;DR: A comparison of dysarthric speech recognition accuracy across three speech recognition cloud platforms, namely IBM Watson Speech-to-Text, Google Cloud Speech, and Microsoft Azure Bing Speech, suggests that the three platforms have comparable performance in recognizing dysarthric speech and that the accuracy is related to the speech intelligibility of the person.
Abstract: The spread of voice-driven devices has a positive impact for people with disabilities in smart environments, since such devices allow them to perform a series of daily activities that were difficult or impossible before. As a result, their quality of life and autonomy increase. However, the speech recognition technology employed in such devices becomes limited with people having communication disorders, like dysarthria. People with dysarthria may be unable to control their smart environments, at least with the needed proficiency; this problem may negatively affect the perceived reliability of the entire environment. By exploiting the TORGO database of speech samples pronounced by people with dysarthria, this paper compares the accuracy of the dysarthric speech recognition as achieved by three speech recognition cloud platforms, namely IBM Watson Speech-to-Text, Google Cloud Speech, and Microsoft Azure Bing Speech. Such services, indeed, are used in many virtual assistants deployed in smart environments, such as Google Home. The goal is to investigate whether such cloud platforms are usable to recognize dysarthric speech, and to understand which of them is the most suitable for people with dysarthria. Results suggest that the three platforms have comparable performance in recognizing dysarthric speech and that the accuracy of the recognition is related to the speech intelligibility of the person. Overall, the platforms are limited when the dysarthric speech intelligibility is low (80–90% of word error rate), while they improve up to reach a word error rate of 15–25% for people without abnormality in their speech intelligibility.

28 citations


Journal ArticleDOI
TL;DR: Initial estimates of the prevalence of types of speech disorders and motor speech disorders in adolescents with Down syndrome are provided to guide the selection and sequencing of treatment targets for persons with DS.
Abstract: Although there is substantial rationale for a motor component in the speech of persons with Down syndrome (DS), there presently are no published estimates of the prevalence of subtypes of motor spe...

26 citations


Journal ArticleDOI
TL;DR: Cross-validation of the present findings would support viewing MSDs as a core phenotypic feature of 22q, as well as estimates in speakers with other complex neurodevelopmental disorders: Down syndrome, fragile X syndrome, and galactosemia.
Abstract: Purpose Speech sound disorders and velopharyngeal dysfunction are frequent features of 22q11.2 deletion syndrome (22q). We report the first estimate of the prevalence of motor speech disorders (MSD...

Journal ArticleDOI
TL;DR: The results suggest that the applied methodologies are efficient and useful in characterizing the different behavior of vocal signal in healthy and pathological subjects and could be a valid support for physicians in disease evaluation and progression monitoring.

Journal ArticleDOI
TL;DR: Investigation of the effect of sentence length on intelligibility and measures of speech motor performance in persons with amyotrophic lateral sclerosis suggests that producing shorter sentences may help maximize intelligibility for speakers with moderate-to-severe dysarthria secondary to ALS and may be a beneficial compensatory strategy for preserving motor effort.
Abstract: Purpose The purpose of this study was to investigate the effect of sentence length on intelligibility and measures of speech motor performance in persons with amyotrophic lateral sclerosis (ALS) an...

Journal ArticleDOI
TL;DR: Objective speech assessment may provide an inexpensive and widely applicable screening instrument for differentiation of MSA and PD from controls and among subtypes of MSA.
Abstract: Although motor speech disorders represent an early and prominent clinical feature of multiple system atrophy (MSA), the potential usefulness of speech assessment as a diagnostic tool has not yet been explored. This cross-sectional study aimed to provide a comprehensive, objective description of motor speech function in the parkinsonian (MSA-P) and cerebellar (MSA-C) variants of MSA. Speech samples were acquired from 80 participants including 18 MSA-P, 22 MSA-C, 20 Parkinson's disease (PD), and 20 healthy controls. The accurate differential diagnosis of dysarthria subtypes was based on quantitative acoustic analysis of 14 speech dimensions. A mixed type of dysarthria involving hypokinetic, ataxic and spastic components was found in the majority of MSA patients independent of phenotype. MSA-P showed significantly greater speech impairment than PD, and predominantly exhibited harsh voice, imprecise consonants, articulatory decay, monopitch, excess pitch fluctuation and pitch breaks. MSA-C was dominated by prolonged phonemes, audible inspirations and voice stoppages. Inappropriate silences, irregular motion rates and overall slowness of speech were present in both MSA phenotypes. Speech features allowed discrimination between MSA-P and PD as well as between both MSA phenotypes with an area under curve up to 0.86. Hypokinetic, ataxic and spastic dysarthria components in MSA were correlated to the clinical evaluation of rigidity, cerebellar and bulbar/pseudobulbar manifestations, respectively. Distinctive speech alterations reflect underlying pathophysiology in MSA. Objective speech assessment may provide an inexpensive and widely applicable screening instrument for differentiation of MSA and PD from controls and among subtypes of MSA.

Journal ArticleDOI
TL;DR: Reference values of four maximum performance tests are presented for comparing the performance of dysarthric patients with non-pathological performance, with age identified as the most important factor influencing maximum speech performance.
Abstract: Purpose: Maximum performance tests examine upper limits of speech motor performance, as used by speech-language pathologists in dysarthria assessment protocols. The Radboud Dysarthria Assessment includes maximum repetition rate, maximum phonation time, fundamental frequency range and maximum phonation volume to assist in detecting pathological performance. This study aims to obtain reference values for each of these tests. Method: A group of 224 healthy Dutch adults aged 18–80 years performed the maximum performance tests. Age, sex, body height, smoking habit, and profession were registered. Using multivariable linear regression, a wide range of models was tested to examine the relationship between these person characteristics and speech performance. The likelihood ratio was used to test the goodness of fit to the data. Result: Above 60 years of age, maximum repetition rate, fundamental frequency range and maximum phonation volume were all negatively affected by age. Below 60 years, only women showe...

Proceedings ArticleDOI
12 May 2019
TL;DR: This paper proposes an end-to-end ASR framework trained not only on the speech data of a Japanese person with an articulation disorder but also on the speech data of a physically unimpaired Japanese person and a non-Japanese person with an articulation disorder, to relieve the lack of training data for a target speaker; experimental results show the merit of using multiple databases for speech recognition.
Abstract: We present in this paper an end-to-end automatic speech recognition (ASR) system for a person with an articulation disorder resulting from athetoid cerebral palsy. In the case of a person with this type of articulation disorder, the speech style is quite different from that of a physically unimpaired person, and the amount of their speech data available to train the model is limited because their burden is large due to strain on the speech muscles. Therefore, the performance of ASR systems for people with an articulation disorder degrades significantly. In this paper, we propose an end-to-end ASR framework trained by not only the speech data of a Japanese person with an articulation disorder but also the speech data of a physically unimpaired Japanese person and a non-Japanese person with an articulation disorder to relieve the lack of training data of a target speaker. An end-to-end ASR model encapsulates an acoustic and language model jointly. In our proposed model, an acoustic model portion is shared between persons with dysarthria, and a language model portion is assigned to each language regardless of dysarthria. Experimental results show the merit of our proposed approach of using multiple databases for speech recognition.

Proceedings ArticleDOI
12 May 2019
TL;DR: The authors proposed a neural network that can learn a filterbank, a normalization factor and a compression power from the raw speech, jointly with the rest of the architecture for dysarthria detection from sentence-level audio recordings.
Abstract: Speech classifiers of paralinguistic traits traditionally learn from diverse hand-crafted low-level features, by selecting the relevant information for the task at hand. We explore an alternative to this selection, by learning jointly the classifier, and the feature extraction. Recent work on speech recognition has shown improved performance over speech features by learning from the waveform. We extend this approach to paralinguistic classification and propose a neural network that can learn a filterbank, a normalization factor and a compression power from the raw speech, jointly with the rest of the architecture. We apply this model to dysarthria detection from sentence-level audio recordings. Starting from a strong attention-based baseline on which mel-filterbanks outperform standard low-level descriptors, we show that learning the filters or the normalization and compression improves over fixed features by 10% absolute accuracy. We also observe a gain over OpenSmile features by learning jointly the feature extraction, the normalization, and the compression factor with the architecture. This constitutes a first attempt at learning jointly all these operations from raw audio for a speech classification task.

Journal ArticleDOI
TL;DR: Language, literacy and social-pragmatic deficits are common in males with Klinefelter syndrome, and data suggested a trend for more notable deficits with age and increasing academic and social demands.

Journal ArticleDOI
TL;DR: This review describes the endophenotype of speech, voice, cognition and language modalities in PD and investigates speech as a ‘proxy marker’ of PD disease state to provide early identification of PD and objective monitoring of disease progression.
Abstract: Introduction: Idiopathic Parkinson's Disease (PD) results in a range of motor and non-motor impairments. Clinical diagnosis commonly occurs after substantial neurophysiological damage limiting the opportunity for neuroprotective treatments. Uncovering sensitive objective markers with the capacity to detect pre-symptomatic disease and track disease progression is therefore a priority. Speech may provide an ideal proxy marker for PD; a quantifiable biometric that displays salient changes in early disease and appears to evolve with disease progression. Areas covered: This review describes the endophenotype of speech, voice, cognition and language modalities in PD and investigates the speech as a 'proxy marker' of PD disease state. Expert opinion: Detailed characterization at different disease stages are needed and must incorporate longitudinal assessment to capture small but significant changes in speech, voice, cognition and language modalities within patient changes over time. Advances in technology are leading to new opportunities for acquiring data remotely and more frequently, offering more ecologically valid testing environments. Combined with automated signal processing and analysis, symptoms may also be tracked in-home readily. Features extracted may provide a 'proxy marker' for early identification of PD and objective monitoring of disease progression.

Journal ArticleDOI
TL;DR: This study prospectively examined the incidence and co-occurrence of dysphagia, dysarthria, and aphasia following a 1st occurrence of ischemic stroke at an academic medical center hospital.
Abstract: Purpose The high incidence of swallowing and communication disorders following stroke is well documented. However, many of these studies have used retrospective chart reviews to make estimates of incidence and co-occurrence. The current study prospectively examined the incidence and co-occurrence of dysphagia, dysarthria, and aphasia following a 1st occurrence of ischemic stroke at an academic medical center hospital. Method One hundred patients who experienced their 1st ischemic stroke were recruited for participation in this study. All participants received a clinical swallowing evaluation to assess for dysphagia, administration of the Frenchay Dysarthria Assessment-Second Edition (Enderby & Palmer, 2008) and Western Aphasia Battery-Revised (Kertesz, 2006) to screen for the presence of dysarthria and aphasia, respectively. Results Incidence rates of dysphagia, dysarthria, and aphasia were 32%, 26%, and 16%, respectively. Forty-seven percent of participants had at least 1 of these disorders, 28% had 2 of these disorders, and 4% had all 3. Although the incidence rates in this study were smaller in magnitude than incidence rates in previous research, the pattern of results is broadly similar (i.e., dysphagia had the highest incidence rate, followed by dysarthria and, lastly, aphasia). Conclusions This prospective study yielded slightly lower incidence rates than have been previously obtained from retrospective chart reviews. The high incidence and co-occurrence of devastating swallowing and communication disorders post-ischemic stroke provides clear motivation for speech-language pathology involvement in the early phase of stroke rehabilitation.

Journal ArticleDOI
TL;DR: A description of speech articulation dynamics as a probability density function of the kinematic features derived from the evolution of formants in the time domain is given.

Proceedings ArticleDOI
10 Jul 2019
TL;DR: This paper proposed a multi-task supervised approach for predicting both the probability of dysarthric speech and the mel-spectrogram, which helps improve the detection and reconstruction of speech with higher accuracy, thanks to a low-dimensional latent space of the auto-encoder.
Abstract: This paper proposed a novel approach for the detection and reconstruction of dysarthric speech. The encoder-decoder model factorizes speech into a low-dimensional latent space and encoding of the input text. We showed that the latent space conveys interpretable characteristics of dysarthria, such as intelligibility and fluency of speech. MUSHRA perceptual test demonstrated that the adaptation of the latent space let the model generate speech of improved fluency. The multi-task supervised approach for predicting both the probability of dysarthric speech and the mel-spectrogram helps improve the detection of dysarthria with higher accuracy. This is thanks to a low-dimensional latent space of the auto-encoder as opposed to directly predicting dysarthria from a highly dimensional mel-spectrogram.

Journal ArticleDOI
TL;DR: Speech intelligibility and naturalness improved post treatment and piloting evidence that ataxia-tailored speech treatment might be effective in degenerative cerebellar disease is provided.
Abstract: We aimed to provide proof-of-principle evidence that intensive home-based speech treatment can improve dysarthria in complex multisystemic degenerative ataxias, exemplified by autosomal recessive spastic ataxia Charlevoix-Saguenay (ARSACS). Feasibility and piloting efficacy of speech training specifically tailored to cerebellar dysarthria was examined through a 4-week program in seven patients with rater-blinded assessment of intelligibility (primary outcome) and naturalness and acoustic measures of speech (secondary outcomes) performed 4 weeks before, immediately prior to, and directly after training (intraindividual control design). Speech intelligibility and naturalness improved post treatment. This provides piloting evidence that ataxia-tailored speech treatment might be effective in degenerative cerebellar disease.

Journal ArticleDOI
TL;DR: This study supports the existence of distinct presentations of dysarthria in patients with HD, which may be due to divergent pathologic processes.
Abstract: Objective Dysarthric speech of persons with Huntington disease (HD) is typically described as hyperkinetic; however, studies suggest that dysarthria can vary and resemble patterns in other neurologic conditions. To test the hypothesis that distinct motor speech subgroups can be identified within a larger cohort of patients with HD, we performed a cluster analysis on speech perceptual characteristics of patient audio recordings. Methods Audio recordings of 48 patients with mild to moderate dysarthria due to HD were presented to 6 trained raters. Raters provided scores for various speech features (e.g., voice, articulation, prosody) of audio recordings using the classic Mayo Clinic dysarthria rating scale. Scores were submitted to an unsupervised k-means cluster analysis to determine the most salient speech features of subgroups based on motor speech patterns. Results Four unique subgroups emerged from the cohort of patients with HD. Subgroup 1 was characterized by an abnormally fast speaking rate among other unique speech features, whereas subgroups 2 and 3 were defined by an abnormally slow speaking rate. Salient speech features for subgroup 2 overlapped with subgroup 3; however, the severity of dysarthria differed. Subgroup 4 was characterized by mild deviations of speech features with typical speech rate. Length of CAG repeats, Unified Huntington’s Disease Rating Scale total motor score, and percent intelligibility were significantly different for pairwise comparisons of subgroups. Conclusion This study supports the existence of distinct presentations of dysarthria in patients with HD, which may be due to divergent pathologic processes. The findings are discussed in relation to previous literature and clinical implications.
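
The Huntington disease abstract above submits perceptual rating scores to an unsupervised k-means cluster analysis. The sketch below shows the general technique (Lloyd's algorithm) on generic score vectors. It is not the authors' analysis: the random initialization, the seed, and the function name `kmeans` are assumptions, and any scores passed in would be the caller's own data.

```python
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100,
           seed: int = 0) -> Tuple[List[int], List[List[float]]]:
    """Cluster points into k groups; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Initialize centroids from k distinct input rows.
    centroids = [list(p) for p in rng.sample(list(points), k)]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

In a setting like the one described, each row would hold one speaker's ratings across the scored speech features, and `k` would be chosen by the analyst; the abstract reports four subgroups but does not say how that number was selected.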

Journal ArticleDOI
TL;DR: A review of 112 FAS cases published between 1907 and October 2016 concluded that FAS should be regarded a dual component motor speech disorder in which both planning and motor execution of speech may be affected.

Journal ArticleDOI
01 Jan 2019-BMJ Open
TL;DR: A delayed treatment design, in which dysarthria therapy is offered at the end of the study to families allocated to treatment as usual, is acceptable, and a randomised controlled trial of internet delivered dysarthria therapy is feasible.
Abstract: Objectives To test the feasibility of recruitment, retention, outcome measures and internet delivery of dysarthria therapy for young people with cerebral palsy in a randomised controlled trial. Design Mixed methods. Single blind pilot randomised controlled trial, with control offered Skype therapy at end of study. Qualitative study of the acceptability of therapy delivery via Skype. Setting Nine speech and language therapy departments in northern England recruited participants to the study. Skype therapy was provided in a university setting. Participants Twenty-two children (14 M, 8 F) with dysarthria and cerebral palsy (mean age 8.8 years (SD 3.2)) agreed to take part. Participants were randomised to dysarthria therapy via Skype (n=11) or treatment as usual (n=11). Interventions Children received either usual speech therapy from their local therapist for 6 weeks or dysarthria therapy via Skype from a research therapist. Usual therapy sessions varied in frequency, duration and content. Skype dysarthria therapy focused on breath control and phonation to produce clear speech at a steady rate, and comprised three 40 min sessions per week for 6 weeks. Primary and secondary outcome measures Feasibility and acceptability of the trial design, intervention and outcome measures. Results Departments recruited two to three participants. All participants agreed to random allocation. None withdrew from the study. Recordings of children’s speech were made at all time points and rated by listeners. Families allocated to Skype dysarthria therapy judged internet delivery of the therapy to be acceptable. All families reported that the study design was acceptable. Treatment integrity checks suggested that the phrases practised in one therapy exercise should be reduced in length. Conclusions A delayed treatment design, in which dysarthria therapy is offered at the end of the study to families allocated to treatment as usual, is acceptable. A randomised controlled trial of internet delivered dysarthria therapy is feasible.

Proceedings ArticleDOI
15 Sep 2019
TL;DR: The results show that the LSTM’s ability to leverage temporal information within its input makes for an effective step in the pursuit of accessible Dysarthria diagnoses.
Abstract: This paper proposes the use of Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM) units for determining whether Mandarin-speaking individuals are afflicted with a form of Dysarthria based on samples of syllable pronunciations. Several LSTM network architectures are evaluated on this binary classification task, using accuracy and Receiver Operating Characteristic (ROC) curves as metrics. The LSTM models are shown to significantly improve upon a baseline fully connected network, reaching over 90% area under the ROC curve on the task of classifying new speakers, when a sufficient number of cepstrum coefficients are used. The results show that the LSTM’s ability to leverage temporal information within its input makes for an effective step in the pursuit of accessible Dysarthria diagnoses.
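
The LSTM abstract above reports area under the ROC curve as an evaluation metric for the binary dysarthria classifier. As a hedged illustration of what that number measures, the sketch below computes AUC through its pairwise-ranking interpretation: the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counted as half. The function name `roc_auc` is illustrative, and this is not the paper's evaluation code.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve for binary labels (1 = positive, 0 = negative)."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

The pairwise form is quadratic in the number of examples, which is acceptable for a sketch; a sort-based rank formulation gives the same value more efficiently on large test sets.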

Proceedings ArticleDOI
17 Apr 2019
TL;DR: A new family of features is proposed, based on likelihood ratios between plosives and their respective nasal cognates; the features rely on an acoustic model trained only on healthy speech and are evaluated on a set of 75 speakers diagnosed with different dysarthria subtypes and exhibiting varying levels of hypernasality.
Abstract: Hypernasal speech is a common symptom across several neurological disorders; however it has a variable acoustic signature, making it difficult to quantify acoustically or perceptually. In this paper, we propose the nasal cognate distinctiveness features as an objective proxy for hypernasal speech. Our method is motivated by the observation that incomplete velopharyngeal closure changes the acoustics of the resultant speech such that alveolar stops /t/ and /d/ map to the alveolar nasal /n/ and bilabial stops /b/ and /p/ map to bilabial nasal /m/. We propose a new family of features based on likelihood ratios between the plosives and their respective nasal cognates. These features are based on an acoustic model that is trained only on healthy speech, and evaluated on a set of 75 speakers diagnosed with different dysarthria subtypes and exhibiting varying levels of hypernasality. Our results show that the family of features compares favorably with the clinical perception of speech-language pathologists subjectively evaluating hypernasality.
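
The hypernasality abstract above builds features from likelihood ratios between plosives and their nasal cognates. The sketch below illustrates the ratio itself under strong simplifying assumptions: each phone is modelled by a single diagonal Gaussian standing in for the paper's acoustic model (whose form the abstract does not give), and the names `log_likelihood` and `cognate_llr` are hypothetical. Only the plosive-to-nasal mapping is taken from the abstract.

```python
import math
from typing import Sequence, Tuple

# (means, variances) of a diagonal Gaussian; a stand-in for a real acoustic model.
Gaussian = Tuple[Sequence[float], Sequence[float]]

# Plosive -> nasal cognate pairs named in the abstract.
NASAL_COGNATE = {"p": "m", "b": "m", "t": "n", "d": "n"}


def log_likelihood(frame: Sequence[float], model: Gaussian) -> float:
    """Log density of one feature frame under a diagonal Gaussian."""
    means, variances = model
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(frame, means, variances)
    )


def cognate_llr(frames: Sequence[Sequence[float]],
                plosive_model: Gaussian, nasal_model: Gaussian) -> float:
    """Mean per-frame log-likelihood ratio: plosive model versus nasal cognate."""
    if not frames:
        raise ValueError("need at least one frame")
    total = sum(
        log_likelihood(f, plosive_model) - log_likelihood(f, nasal_model)
        for f in frames
    )
    return total / len(frames)
```

Under these assumptions, a positive value means the frames of an intended plosive look more like the plosive than like its nasal cognate, and values drifting towards or below zero would be consistent with the plosive sounding more nasal.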

Journal ArticleDOI
TL;DR: Assessment of perceptual speech features during connected speech samples in 26 children with Down syndrome suggests that speech disorders in DS are due to distributed impairments involving voice, speech sound production, fluency, resonance, and prosody.
Abstract: Speech disorders occur commonly in individuals with Down syndrome (DS), although data regarding the auditory-perceptual speech features are limited. This descriptive study assessed 47 perc...

Proceedings ArticleDOI
01 Jan 2019
TL;DR: This study is the first to evaluate ASR systems’ responses to speech from patients at different stages of PD in Spanish, showing that the word error rate is 27% higher in speakers with PD than in control speakers, with a moderate correlation between that rate and the stage of the disease.
Abstract: Parkinson’s Disease (PD) affects motor capabilities of patients, who in some cases need to use human-computer assistive technologies to regain independence. The objective of this work is to study in detail the differences in error patterns from state-of-the-art Automatic Speech Recognition (ASR) systems on speech from people with and without PD. Two different speech recognizers (attention-based end-to-end and Deep Neural Network - Hidden Markov Models hybrid systems) were trained on a Spanish language corpus and subsequently tested on speech from 43 speakers with PD and 46 without PD. The differences related to error rates, substitutions, insertions and deletions of characters and phonetic units between the two groups were analyzed, showing that the word error rate is 27% higher in speakers with PD than in control speakers, with a moderated correlation between that rate and the developmental stage of the disease. The errors were related to all manner classes, and were more pronounced in the vowel /u/. This study is the first to evaluate ASR systems’ responses to speech from patients at different stages of PD in Spanish. The analyses showed general trends but individual speech deficits must be studied in the future when designing new ASR systems for this population.