Proceedings ArticleDOI

Cross-lingual Self-Supervised Speech Representations for Improved Dysarthric Speech Recognition

TLDR
In this article, the authors explore the usefulness of Wav2Vec self-supervised speech representations as features for training an ASR system for dysarthric speech.
Abstract
State-of-the-art automatic speech recognition (ASR) systems perform well on healthy speech. However, performance on impaired speech remains an issue. The current study explores the usefulness of Wav2Vec self-supervised speech representations as features for training an ASR system for dysarthric speech. Dysarthric speech recognition is particularly difficult as several aspects of speech, such as articulation, prosody, and phonation, can be impaired. Specifically, we train an acoustic model with features extracted from Wav2Vec, Hubert, and the cross-lingual XLSR model. Results suggest that speech representations pretrained on large unlabelled data can improve word error rate (WER) performance. In particular, features from the multilingual model led to lower WERs than filter-banks (Fbank) or models trained on a single language. Improvements were observed in English speakers with dysarthria caused by cerebral palsy (UASpeech corpus), Spanish speakers with Parkinsonian dysarthria (PC-GITA corpus), and Italian speakers with paralysis-based dysarthria (EasyCall corpus). Compared to using Fbank features, XLSR-based features reduced WERs by 6.8%, 22.0%, and 7.0% for the UASpeech, PC-GITA, and EasyCall corpora, respectively.
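The gains above are reported in word error rate (WER). As a quick reference, a minimal WER computation (the standard word-level edit distance; a generic sketch, not code from the paper) looks like:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + insertions + deletions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over word sequences
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution / match
    return d[len(ref)][len(hyp)] / len(ref)
```

A relative reduction such as the reported 22.0% on PC-GITA would then be `100 * (wer_fbank - wer_xlsr) / wer_fbank`.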


Citations
Proceedings ArticleDOI

Exploring Self-supervised Pre-trained ASR Models For Dysarthric and Elderly Speech Recognition

TL;DR: In this paper, a series of approaches to integrating domain-adapted SSL pre-trained models into TDNN and Conformer ASR systems for dysarthric and elderly speech recognition is explored.

Hierarchical Multi-Class Classification of Voice Disorders Using Self-Supervised Models and Glottal Features

TL;DR: In this paper, a hierarchical classifier was used to detect laryngeal voice disorders; the best performance was achieved by using features from wav2vec 2.0 LARGE together with hierarchical classification.
Journal ArticleDOI

Automatic Severity Assessment of Dysarthric speech by using Self-supervised Model with Multi-task Learning

TL;DR: Presents a novel automatic severity assessment method for dysarthric speech that uses a self-supervised model in conjunction with multi-task learning, and analyzes how multi-task learning affects severity classification performance through the latent representations and its regularization effect.
Journal ArticleDOI

Use of Speech Impairment Severity for Dysarthric Speech Recognition

TL;DR: In this paper, a set of techniques for using both severity and speaker identity in dysarthric speech recognition is proposed, such as multi-task training incorporating severity prediction error, speaker-severity-aware auxiliary feature adaptation, and structured LHUC transforms separately conditioned on speaker identity and severity.
Journal ArticleDOI

Benefits of pre-trained mono- and cross-lingual speech representations for spoken language understanding of Dutch dysarthric speech

TL;DR: In this paper, the authors compare different mono- and cross-lingual pre-training (supervised and unsupervised) methodologies for spoken language understanding (SLU) tasks on Dutch dysarthric speech.
References
Journal Article

Visualizing Data using t-SNE

TL;DR: Presents t-SNE, a technique that visualizes high-dimensional data by giving each datapoint a location in a two- or three-dimensional map. A variation of Stochastic Neighbor Embedding, it is much easier to optimize and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map.
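As a minimal illustration of the technique (toy clustered data standing in for high-dimensional speech embeddings, not the paper's data), using scikit-learn's implementation:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Two well-separated Gaussian clusters in 50 dimensions
X = np.vstack([rng.normal(0.0, 1.0, (30, 50)),
               rng.normal(5.0, 1.0, (30, 50))])

# Map each 50-dim point to a location in a 2-D map
emb = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(X)
# emb has shape (60, 2): one 2-D location per datapoint
```

In the visualization, points from the two clusters end up in two well-separated groups of 2-D locations.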
Journal ArticleDOI

Movement Disorder Society-sponsored revision of the Unified Parkinson's Disease Rating Scale (MDS-UPDRS): scale presentation and clinimetric testing results.

Christopher G. Goetz, +87 more
15 Nov 2008
TL;DR: The combined clinimetric results of this study support the validity of the MDS-UPDRS for rating PD.
Proceedings Article

wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations

TL;DR: It is shown for the first time that learning powerful representations from speech audio alone followed by fine-tuning on transcribed speech can outperform the best semi-supervised methods while being conceptually simpler.
Posted Content

Conformer: Convolution-augmented Transformer for Speech Recognition

TL;DR: This work proposes the convolution-augmented transformer for speech recognition, named Conformer, which significantly outperforms previous Transformer- and CNN-based models, achieving state-of-the-art accuracies.
Journal ArticleDOI

Differential Diagnostic Patterns of Dysarthria

TL;DR: Thirty-second speech samples from at least 30 patients in each of 7 discrete neurologic groups, each patient unequivocally diagnosed as a representative of his diagnostic group, were studied to identify differential diagnostic patterns of dysarthria.