Proceedings ArticleDOI
Synthesizing Dysarthric Speech Using Multi-Speaker Tts For Dysarthric Speech Recognition
Mohammad Soleymanpour, Michael T. Johnson, Rahim Soleymanpour, Jeffrey Berry +3 more
- pp. 7382–7386
TL;DR: This paper improves multi-speaker end-to-end TTS systems to synthesize dysarthric speech for improved training of a dysarthria-specific DNN-HMM ASR, adding dysarthria severity level and pause insertion mechanisms alongside other control parameters such as pitch, energy, and duration.
Abstract: Dysarthria is a motor speech disorder often characterized by reduced speech intelligibility resulting from slow, uncoordinated control of the speech production muscles. Automatic speech recognition (ASR) systems may help dysarthric talkers communicate more effectively. A robust dysarthria-specific ASR requires sufficient training speech, which is not readily available. Recent advances in multi-speaker end-to-end text-to-speech (TTS) synthesis suggest the possibility of using synthesis for data augmentation. In this paper, we aim to improve multi-speaker end-to-end TTS systems to synthesize dysarthric speech for improved training of a dysarthria-specific DNN-HMM ASR. In the synthesized speech, we add dysarthria severity level and pause insertion mechanisms to other control parameters such as pitch, energy, and duration. Results show that a DNN-HMM model trained on additional synthetic dysarthric speech achieves a WER improvement of 12.2% compared to the baseline, and the addition of the severity level and pause insertion controls decreases WER by 6.5%, showing the effectiveness of adding these parameters. Audio samples are available at https://mohammadelc.github.io/SpeechGroupUKY/
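The pause-insertion control described in the abstract can be illustrated with a minimal sketch: pause tokens are inserted into a phoneme sequence with a probability that grows with the dysarthria severity level, so that more severe synthetic speech contains more (and more disruptive) pauses. This is a hypothetical illustration, not the authors' implementation; the token names, the 0–3 severity scale, and the probability scaling are all assumptions.

```python
import random

# Hypothetical sketch of a severity-conditioned pause-insertion mechanism
# (not the paper's actual code). A pause token is inserted after each
# word-boundary marker "|" with a probability scaled by the severity level.

PAUSE = "<pause>"


def insert_pauses(phonemes, severity, base_prob=0.1, rng=None):
    """Insert PAUSE tokens after word boundaries ("|") with a probability
    that grows with severity (assumed scale: 0 = mild ... 3 = severe)."""
    rng = rng or random.Random(0)  # seeded for reproducibility
    prob = min(1.0, base_prob * (1 + severity))
    out = []
    for p in phonemes:
        out.append(p)
        if p == "|" and rng.random() < prob:
            out.append(PAUSE)
    return out
```

In a full TTS pipeline, the augmented phoneme sequence would then be fed to the synthesizer together with the other control inputs (pitch, energy, duration, severity embedding); here only the sequence-level pause control is sketched.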
Citations
Journal ArticleDOI
Personalized Adversarial Data Augmentation for Dysarthric and Elderly Speech Recognition
TL;DR: The proposed GAN-based data augmentation approaches consistently outperform the baseline speed perturbation method by up to 0.91% and 3.0% absolute on the TORGO and DementiaBank data, respectively.
Journal ArticleDOI
Stutter-TTS: Controlled Synthesis and Improved Recognition of Stuttered Speech
Xin Zhang, Iván Vallés-Pérez, Andreas Stolcke, Chengzhu Yu, Jasha Droppo, Olabanji Y. Shonibare, Roberto Barra-Chicote, Venkatesh Ravichandran +7 more
TL;DR: The authors proposed Stutter-TTS, a neural text-to-speech model capable of synthesizing diverse types of stuttering utterances, where additional tokens are introduced into the source text during training to represent specific stuttering characteristics.
Journal ArticleDOI
Dysarthria severity assessment using squeeze-and-excitation networks
Amlu Anna Joshy, Rajeev Rajan +1 more
TL;DR: In this article, the authors explored the potency of squeeze-and-excitation (SE) networks for dysarthria severity level classification using mel spectrograms, and compared them with a shallow CNN and a convolutional recurrent neural network built using a bidirectional long short-term memory network.
Journal ArticleDOI
Use of Speech Impairment Severity for Dysarthric Speech Recognition
Mengzhe Geng, Zengrui Jin, Tianzi Wang, Shujie Hu, Jiajun Deng, Mingyu Cui, Guinan Li, Tianwei Yu, Xurong Xie, Xunying Liu +9 more
TL;DR: In this paper, a set of techniques is proposed to use both severity and speaker identity in dysarthric speech recognition, such as multitask training incorporating severity prediction error, speaker-severity aware auxiliary feature adaptation, and structured LHUC transforms separately conditioned on speaker identity and severity.
Journal ArticleDOI
A comprehensive survey of automatic dysarthric speech recognition
TL;DR: A comprehensive survey of the recent advances in automatic dysarthric speech recognition (DSR) using machine learning (ML) and deep learning (DL) paradigms is presented in this paper.
References
Proceedings Article
The Kaldi Speech Recognition Toolkit
Daniel Povey, Arnab Ghoshal, Gilles Boulianne, Lukas Burget, Ondrej Glembek, Nagendra Kumar Goel, Mirko Hannemann, Petr Motlicek, Yanmin Qian, Petr Schwarz, Jan Silovsky, Georg Stemmer, Karel Vesely +12 more
TL;DR: The design of Kaldi is described, a free, open-source toolkit for speech recognition research that provides a speech recognition system based on finite-state automata together with detailed documentation and a comprehensive set of scripts for building complete recognition systems.
Proceedings ArticleDOI
Natural TTS Synthesis by Conditioning Wavenet on MEL Spectrogram Predictions
Jonathan Shen, Ruoming Pang, Ron Weiss, Mike Schuster, Navdeep Jaitly, Zongheng Yang, Zhifeng Chen, Yu Zhang, Yuxuan Wang, RJ Skerry-Ryan, Rif A. Saurous, Yannis Agiomyrgiannakis, Yonghui Wu +12 more
TL;DR: This paper describes Tacotron 2, a neural network architecture for speech synthesis directly from text, composed of a recurrent sequence-to-sequence feature prediction network that maps character embeddings to mel-scale spectrograms, followed by a modified WaveNet model acting as a vocoder to synthesize time-domain waveforms from those spectrograms.
Proceedings ArticleDOI
Montreal Forced Aligner: Trainable Text-Speech Alignment Using Kaldi.
TL;DR: The Montreal Forced Aligner (MFA) is an update to the Prosodylab-Aligner, and maintains its key functionality of trainability on new data, as well as incorporating improved architecture (triphone acoustic models and speaker adaptation), and other features.
Posted Content
Tacotron: Towards End-to-End Speech Synthesis
Yuxuan Wang, RJ Skerry-Ryan, Daisy Stanton, Yonghui Wu, Ron Weiss, Navdeep Jaitly, Zongheng Yang, Ying Xiao, Zhifeng Chen, Samy Bengio, Quoc V. Le, Yannis Agiomyrgiannakis, Robert A. J. Clark, Rif A. Saurous +13 more
TL;DR: Tacotron is presented, an end-to-end generative text-to-speech model that synthesizes speech directly from characters and achieves a 3.82 subjective 5-scale mean opinion score on US English, outperforming a production parametric system in terms of naturalness.
Posted Content
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech
TL;DR: FastSpeech 2 is proposed, which addresses the issues in FastSpeech and better solves the one-to-many mapping problem in TTS by directly training the model with the ground-truth target instead of the simplified output from the teacher, and by introducing more variation information of speech as conditional inputs.