This work performs initial alignments to locate long pauses in dysarthric speech, uses the pause intervals as anchor points, and applies speech recognition with word-lattice outputs to recover the time-stamps of words in disordered or incomplete pronunciations, yielding reliably aligned segments.
Abstract:
Dysarthria is a motor speech disorder due to neurologic deficits. The impaired movement of the muscles used for speech production leads to disordered speech, where utterances have prolonged pause intervals, slow speaking rates, poor articulation of phonemes, syllable deletions, etc. These present challenges to the use of speech technologies for automatic processing of dysarthric speech data. To address these challenges, this work begins with the performance degradation faced in forced alignment. We perform initial alignments to locate long pauses in dysarthric speech and use the pause intervals as anchor points. We apply speech recognition with word-lattice outputs to recover the time-stamps of words in disordered or incomplete pronunciations. By verifying the initial alignments against the word lattices, we obtain reliably aligned segments. These segments provide constraints for new alignment grammars that can improve alignment and transcription quality. We have applied the proposed strategy to the TORGO corpus and obtained improved alignments for most dysarthric speech data, while maintaining good alignments for non-dysarthric speech data.

Index Terms: automatic forced alignment, speech recognition, dysarthric speech, word lattices
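The abstract describes a two-stage pipeline: locate long pauses in an initial alignment to use as anchors, then keep only the aligned words whose time-stamps agree with the LVCSR word-lattice output. A minimal Python sketch of that idea follows; the function names, the "sil" pause label, the data layout, and the threshold values are all illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of the pause-anchor and lattice-verification steps.
# All names, labels, and thresholds here are illustrative assumptions.

def find_anchor_pauses(alignment, min_pause=0.5):
    """Return (start, end) pairs of pauses longer than min_pause seconds.

    `alignment` is a list of (label, start, end) tuples from an initial
    forced alignment, where pauses are labelled "sil".
    """
    return [(s, e) for label, s, e in alignment
            if label == "sil" and e - s >= min_pause]

def verified_segments(alignment, lattice_words, tol=0.1):
    """Keep aligned words whose time-stamps agree with the LVCSR
    word-lattice output within `tol` seconds."""
    kept = []
    for label, s, e in alignment:
        if label == "sil":
            continue
        for word, ws, we in lattice_words:
            if word == label and abs(ws - s) <= tol and abs(we - e) <= tol:
                kept.append((label, s, e))
                break
    return kept

alignment = [("the", 0.00, 0.30), ("sil", 0.30, 1.10), ("cat", 1.10, 1.60)]
lattice = [("the", 0.02, 0.31), ("cat", 1.12, 1.58)]
print(find_anchor_pauses(alignment))        # the long pause, usable as an anchor
print(verified_segments(alignment, lattice))  # both words agree with the lattice
```

In the paper, segments verified this way constrain new alignment grammars for a second alignment pass; this sketch only shows the verification idea.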
TL;DR: The authors proposed a set of acoustic features that capture complementary dimensions of hypernasality, which is a common characteristic symptom across many motor-speech disorders, such as Parkinson's disease and Huntington's disease.
TL;DR: An automatic articulatory-characteristics analysis framework based on distinctive feature (DF) recognition, indicating a potential way to describe the articulation characteristics of dysarthric speech and assess it automatically.
TL;DR: An algorithm for syllable boundary detection followed by syllable repetition detection in dysarthric speech is proposed; when a syllable is found to be repeated, that syllable is automatically repeated in the transcription as well.
TL;DR: This work proposes an approach for repetition detection, tested on dysarthric utterances, that extracts features such as MFCCs and formants and employs two approaches: dynamic time warping (DTW) and polynomial curve fitting (PCF).
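The DTW approach mentioned in this TL;DR compares two feature sequences frame by frame under an elastic time warp; a small DTW distance between two syllable segments suggests one is a repetition of the other. The following is a minimal, self-contained sketch of that idea; the tiny 2-dimensional feature vectors stand in for real MFCC frames and are purely illustrative.

```python
# Minimal dynamic time warping (DTW) sketch for repetition detection.
# The 2-d feature vectors below are illustrative stand-ins for MFCCs.

def dtw_distance(a, b):
    """DTW distance between two sequences of feature vectors,
    using Euclidean frame cost and the standard 3-way recurrence."""
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1])) ** 0.5
            d[i][j] = cost + min(d[i - 1][j],      # insertion
                                 d[i][j - 1],      # deletion
                                 d[i - 1][j - 1])  # match
    return d[n][m]

syll_a = [(1.0, 0.2), (1.1, 0.3), (0.9, 0.1)]  # a syllable
syll_b = [(1.0, 0.2), (1.2, 0.3), (0.9, 0.2)]  # near-identical repeat
syll_c = [(5.0, 4.0), (6.0, 5.0), (5.5, 4.5)]  # a different syllable

# A repeat yields a much smaller distance than a different syllable.
print(dtw_distance(syll_a, syll_b))
print(dtw_distance(syll_a, syll_c))
```

A real system would threshold this distance (or normalize it by path length) to decide whether a detected syllable is a repetition.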
TL;DR: The TIMIT corpus contains broadband recordings of 630 speakers of eight major dialects of American English, each reading ten phonetically rich sentences, with time-aligned orthographic, phonetic, and word transcriptions as well as a 16-bit, 16 kHz speech waveform file for each utterance.
TL;DR: The authors define, understand, and categorize motor speech disorders, covering the neurologic bases of motor speech and its pathologies, the examination of motor speech disorders, and the disorders and their diagnoses.
TL;DR: The short, easy assessment described has been found to have acceptable inter-rater reliability, even between speech therapists who have not been trained to use the test.
Q1. What are the contributions mentioned in the paper "Improving automatic forced alignment for dysarthric speech transcription" ?
To address these challenges, the work begins with the performance degradation faced in forced alignment. The authors perform initial alignments to locate long pauses in dysarthric speech and use the pause intervals as anchor points. They applied the proposed strategy to the TORGO corpus and obtained improved alignments for most dysarthric speech data, while maintaining good alignments for non-dysarthric speech data.
Q2. What are the pause intervals of dysarthric speech?
The corpus includes action tasks for articulation movements, speaking tasks of repeating patterns, words, sentences and picture descriptions.
Q3. Why do the authors proceed to the verification stage with the LVCSR outputs?
The authors proceed to the verification stage with the LVCSR outputs based on LM weight p = 30 due to the higher and more consistent LVCSR agreement rates.
Q4. What is the correct word for the alignment?
If the substituted word is reliable, the authors replace the recognized word with the substituted word in the updated alignment under the following conditions: (1) the start-time or the end-time of the substituted and the recognized words match, or (2) both the start-time and the end-time match within a larger time tolerance.
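The two acceptance conditions quoted above can be written as a small predicate. The sketch below is illustrative only; the tight and loose tolerance values are assumptions, since the paper's actual thresholds are not given here.

```python
# Illustrative check for the substitution-verification conditions:
# accept a substituted word if its start-time OR end-time matches the
# recognized word tightly, or if BOTH match under a looser tolerance.
# The tolerance values are assumptions, not taken from the paper.

def accept_substitution(sub, rec, tight=0.05, loose=0.2):
    """sub, rec: (start, end) time pairs in seconds."""
    start_tight = abs(sub[0] - rec[0]) <= tight
    end_tight = abs(sub[1] - rec[1]) <= tight
    both_loose = (abs(sub[0] - rec[0]) <= loose and
                  abs(sub[1] - rec[1]) <= loose)
    return start_tight or end_tight or both_loose

print(accept_substitution((1.00, 1.50), (1.02, 1.70)))  # start matches tightly -> True
print(accept_substitution((1.00, 1.50), (1.15, 1.62)))  # both within loose tolerance -> True
print(accept_substitution((1.00, 1.50), (1.40, 2.10)))  # neither condition holds -> False
```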
Q5. What are the popular speech corpora?
For dysarthric speech, several English dysarthric speech corpora are publicly available, such as Nemours [8], the Universal Access-Speech (UA-Speech) [9], and the TORGO corpus [10].
Q6. What is the agreement rate of the 1-Best LVCSR outputs?
The authors define the agreement rate, which is equivalent to sentence correctness (1 − sentence error rate), of the 1-best LVCSR outputs (pauses ignored) with the original transcriptions as the reference.
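This definition reduces to a simple exact-match count over utterances. A minimal sketch follows, assuming word-list transcriptions and an illustrative set of pause labels to strip; both are assumptions for the example.

```python
# Sketch of the agreement rate defined above: the fraction of utterances
# whose 1-best LVCSR output, with pauses removed, exactly matches the
# reference transcription (i.e. 1 - sentence error rate).
# The pause labels ("sil", "<pause>") are illustrative assumptions.

def agreement_rate(hypotheses, references):
    def strip_pauses(words):
        return [w for w in words if w not in ("sil", "<pause>")]
    matches = sum(strip_pauses(h) == strip_pauses(r)
                  for h, r in zip(hypotheses, references))
    return matches / len(references)

hyps = [["sil", "the", "cat"], ["a", "dog"], ["the", "dog"]]
refs = [["the", "cat"], ["a", "dog"], ["a", "dog"]]
print(agreement_rate(hyps, refs))  # 2 of 3 sentences agree
```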
Q7. What is the purpose of the alignment task?
The contextual information from language models in LVCSR also helps to compensate for the acoustic mismatch due to pronunciation deviations.
Q8. How many pauses are there in the TORGO corpus?
The data set consists of about 4,900 non-dysarthric and 2,400 dysarthric speech utterances, with at least 100 utterances.

Appears in INTERSPEECH 2015, pp. 2991–2995