Proceedings ArticleDOI
Cross-lingual Self-Supervised Speech Representations for Improved Dysarthric Speech Recognition
Abner F. Hernandez, Paula Andrea Pérez-Toro, Elmar Nöth, Juan Rafael Orozco-Arroyave, Andreas Maier, Seung Hee Yang
pp. 51–55
TL;DR: In this article, the authors explore the usefulness of Wav2Vec self-supervised speech representations as features for training an ASR system for dysarthric speech.
Abstract
State-of-the-art automatic speech recognition (ASR) systems perform well on healthy speech, but performance on impaired speech remains an issue. The current study explores the usefulness of Wav2Vec self-supervised speech representations as features for training an ASR system for dysarthric speech. Dysarthric speech recognition is particularly difficult because several aspects of speech, such as articulation, prosody, and phonation, can be impaired. Specifically, we train an acoustic model with features extracted from Wav2Vec, HuBERT, and the cross-lingual XLSR model. Results suggest that speech representations pretrained on large amounts of unlabelled data can improve word error rate (WER) performance. In particular, features from the multilingual model led to lower WERs than filter-banks (Fbank) or models trained on a single language. Improvements were observed for English speakers with dysarthria caused by cerebral palsy (UASpeech corpus), Spanish speakers with Parkinsonian dysarthria (PC-GITA corpus), and Italian speakers with paralysis-based dysarthria (EasyCall corpus). Compared to Fbank features, XLSR-based features reduced WERs by 6.8%, 22.0%, and 7.0% on the UASpeech, PC-GITA, and EasyCall corpora, respectively.
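The abstract reports results as word error rate (WER) reductions. As a concrete illustration of how that metric is computed, here is a minimal generic sketch of WER via Levenshtein edit distance over words; this is a standard textbook implementation, not the authors' evaluation code.

```python
# Minimal sketch of word error rate (WER), the metric reported above:
# WER = (substitutions + deletions + insertions) / number of reference words.
# Generic Levenshtein-distance implementation, not the paper's own scoring code.

def wer(reference: str, hypothesis: str) -> float:
    """Return the word error rate between a reference and a hypothesis string."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # deleting all reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # inserting all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # match or substitution
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("the cat sat", "the cat sat"))  # 0.0 (exact match)
print(wer("the cat sat", "the bat sat"))  # one substitution out of three words
```

In practice, ASR toolkits compute the same quantity with additional normalization (casing, punctuation) before scoring; the relative WER reductions quoted in the abstract compare this metric between Fbank-based and XLSR-based systems.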
Citations
Proceedings ArticleDOI
Exploring Self-supervised Pre-trained ASR Models For Dysarthric and Elderly Speech Recognition
Shujie Hu, Xurong Xie, Zengrui Jin, Mengzhe Geng, Yi Wang, Mingyu Cui, Jiajun Deng, Xunying Liu, Helen Meng, et al.
TL;DR: In this paper, a series of approaches for integrating domain-adapted SSL pre-trained models into TDNN and Conformer ASR systems for dysarthric and elderly speech recognition was explored.
Hierarchical Multi-Class Classification of Voice Disorders Using Self-Supervised Models and Glottal Features
TL;DR: In this paper, a hierarchical classifier was used to detect laryngeal voice disorders; the best performance was achieved using features from wav2vec 2.0 LARGE together with hierarchical classification.
Journal ArticleDOI
Automatic Severity Assessment of Dysarthric speech by using Self-supervised Model with Multi-task Learning
TL;DR: A novel automatic severity assessment method for dysarthric speech is presented, using a self-supervised model in conjunction with multi-task learning, and the effect of multi-task learning on severity classification performance is examined by analyzing the latent representations and the regularization effect.
Journal ArticleDOI
Use of Speech Impairment Severity for Dysarthric Speech Recognition
Mengzhe Geng, Zengrui Jin, Tianzi Wang, Shujie Hu, Jiajun Deng, Mingyu Cui, Guinan Li, Tianwei Yu, Xurong Xie, Xunying Liu, et al.
TL;DR: In this paper, a set of techniques for using both severity and speaker identity in dysarthric speech recognition is proposed, including multi-task training incorporating severity prediction error, speaker-severity-aware auxiliary feature adaptation, and structured LHUC transforms separately conditioned on speaker identity and severity.
Journal ArticleDOI
Benefits of pre-trained mono- and cross-lingual speech representations for spoken language understanding of Dutch dysarthric speech
Pu Wang, Hugo Van hamme, et al.
TL;DR: In this paper, the authors compare different mono- and cross-lingual pre-training (supervised and unsupervised) methodologies for spoken language understanding (SLU) tasks on Dutch dysarthric speech.
References
Journal Article
Visualizing Data using t-SNE
TL;DR: A new technique called t-SNE visualizes high-dimensional data by giving each datapoint a location in a two- or three-dimensional map; it is a variation of Stochastic Neighbor Embedding that is much easier to optimize and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map.
Journal ArticleDOI
Movement Disorder Society-sponsored revision of the Unified Parkinson's Disease Rating Scale (MDS-UPDRS): scale presentation and clinimetric testing results.
Christopher G. Goetz, Barbara C. Tilley, Stephanie R. Shaftman, Glenn T. Stebbins, Stanley Fahn, Pablo Martinez-Martin, Werner Poewe, Cristina Sampaio, Matthew B. Stern, Richard Dodel, et al.
TL;DR: The combined clinimetric results of this study support the validity of the MDS‐UPDRS for rating PD.
Proceedings Article
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
TL;DR: It is shown for the first time that learning powerful representations from speech audio alone followed by fine-tuning on transcribed speech can outperform the best semi-supervised methods while being conceptually simpler.
Posted Content
Conformer: Convolution-augmented Transformer for Speech Recognition
Anmol Gulati, James Qin, Chung-Cheng Chiu, Niki Parmar, Yu Zhang, Jiahui Yu, Wei Han, Shibo Wang, Zhengdong Zhang, Yonghui Wu, Ruoming Pang, et al.
TL;DR: This work proposes the convolution-augmented transformer for speech recognition, named Conformer, which significantly outperforms previous Transformer- and CNN-based models, achieving state-of-the-art accuracies.
Journal ArticleDOI
Differential Diagnostic Patterns of Dysarthria
TL;DR: Thirty-second speech samples from at least 30 patients in each of seven discrete neurologic groups, each patient unequivocally diagnosed as a representative of his diagnostic group, were studied to identify differential diagnostic patterns of dysarthria.