Q2. What future work have the authors mentioned in the paper "Nonavinakere Prabhakera, Narendra; Alku, Paavo: Dysarthric speech classification from coded telephone speech using glottal features"?
Possible future work is as follows. Apart from the AMR-NB and AMR-WB codecs, the proposed method can be evaluated using recent codecs, for example, the Enhanced Voice Services (EVS) codec [58]. The proposed method can be extended to the speech-based telemonitoring of other neuro-motor disorders such as Parkinson's disease, Alzheimer's disease, and ALS. Beyond neuro-motor disorders, the proposed method can be utilized for different paralinguistic tasks, such as the recognition of emotion and of speaker states and traits, under the coded condition.
Q3. How many sentences are used to classify dysarthric speech?
80 sentence-level utterances from each speaker are used to develop the dysarthric speech classification systems (except for two dysarthric speakers, for whom only 23 and 28 utterances are used due to the limited availability of recordings).
Q4. What codecs can be used for the proposed method?
Apart from the AMR-NB and AMR-WB codecs, the proposed method can be evaluated using recent codecs, for example, the Enhanced Voice Services (EVS) codec [58].
Q5. What is the way to measure the variabilities of speech sources?
The existing dysarthric speech classification systems extract high-dimensional acoustic features to capture the wide variabilities of sources and patterns in pathological speech.
Q6. How many glottal parameters are extracted from the speech?
Two sets of glottal parameters are extracted from the glottal flow waveforms, which are estimated using two GIF methods (QCP and DNN-GIF); hence, a total of four types of glottal parameters are extracted.
Q7. What is the accurate method for estimating the glottal flow from coded speech?
In order to estimate the glottal flow from coded speech, two GIF methods are utilized: QCP and the recently proposed DNN-GIF method.
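To make the GIF step concrete, the sketch below illustrates the general idea of glottal inverse filtering only: it uses plain LPC-based inverse filtering with librosa and scipy as a simplified stand-in, not the QCP or DNN-GIF methods used in the paper, and the pre-emphasis coefficient, LPC order rule, and frame indices are illustrative assumptions.

```python
# Simplified illustration of glottal inverse filtering (GIF).
# NOTE: plain LPC-based inverse filtering, NOT the paper's QCP or DNN-GIF
# methods; it only sketches the idea of removing an all-pole vocal-tract
# estimate from the speech signal to approximate the glottal flow.
import librosa
import numpy as np
from scipy.signal import lfilter

def inverse_filter(frame, sr, lpc_order=None):
    """Estimate a vocal-tract filter with LPC and inverse-filter the frame."""
    if lpc_order is None:
        lpc_order = int(sr / 1000) + 2          # common rule of thumb
    pre = lfilter([1.0, -0.97], [1.0], frame)   # pre-emphasis before LPC analysis
    a = librosa.lpc(pre, order=lpc_order)       # all-pole vocal-tract estimate
    residual = lfilter(a, [1.0], frame)         # inverse filtering
    glottal_flow = np.cumsum(residual)          # integrate to cancel lip radiation
    return glottal_flow - np.mean(glottal_flow)

# Usage on one voiced frame of (coded) speech, e.g. resampled to 16 kHz:
# y, sr = librosa.load("utterance.wav", sr=16000)
# g = inverse_filter(y[4000:4480], sr)
```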
Q8. How many microphones were used to record speech?
Dysarthric speech data was recorded using an eight-microphone array, sampled at 16 kHz, with the microphones spaced at intervals of 1.5 inches.
Q9. What is the name of the two sets of acoustic features extracted from coded?
Two sets of acoustic features, named openSMILE-1 and openSMILE-2, are also extracted from every coded speech utterance using openSMILE (described in Section 2.4), which is a widely used toolkit in paralinguistic speech processing tasks.
Q10. What are the two sets of acoustic features extracted from coded speech?
In this work, two sets of acoustic features, extracted from coded telephone speech using the openSMILE toolkit [35], are used as reference features.
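As an illustration of this step, a minimal sketch using the opensmile Python package is given below. The choice of the eGeMAPS and ComParE 2016 feature sets as stand-ins for openSMILE-1 and openSMILE-2 is an assumption; the paper's exact configurations may differ.

```python
# Extracting two sets of utterance-level openSMILE functionals from a coded
# speech file. Assumption: openSMILE-1 / openSMILE-2 are taken here to be the
# eGeMAPS and ComParE 2016 feature sets; the paper's configurations may differ.
import opensmile

egemaps = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,
    feature_level=opensmile.FeatureLevel.Functionals,
)
compare = opensmile.Smile(
    feature_set=opensmile.FeatureSet.ComParE_2016,
    feature_level=opensmile.FeatureLevel.Functionals,
)

# Each call returns a pandas DataFrame with one row of features per utterance.
feats_1 = egemaps.process_file("coded_utterance.wav")
feats_2 = compare.process_file("coded_utterance.wav")
print(feats_1.shape, feats_2.shape)   # e.g. (1, 88) and (1, 6373)
```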
Q11. What is the proposed method for predicting dysarthric speech?
The proposed method utilizes an SVM to predict dysarthric/healthy labels from the acoustic and glottal features extracted from coded speech.
Q12. What is the value of the kernel parameter and penalty parameter C?
The optimal values of the kernel parameter γ and the penalty parameter C are chosen by grid search, with C and γ varying from 10^-3 to 10^3 in multiples of 10.
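A minimal scikit-learn sketch of this grid search is shown below. The RBF kernel is implied by the γ parameter; the feature standardization, the 5-fold cross-validation scheme, and the placeholders X and y (feature matrix and dysarthric/healthy labels) are assumptions, not details taken from the paper.

```python
# Grid search over the SVM penalty parameter C and kernel parameter gamma,
# both varied from 1e-3 to 1e3 in multiples of 10, as described above.
# X and y are placeholders for the extracted features and class labels.
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

param_grid = {
    "svc__C": [10.0 ** k for k in range(-3, 4)],
    "svc__gamma": [10.0 ** k for k in range(-3, 4)],
}

pipeline = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
search = GridSearchCV(pipeline, param_grid, scoring="accuracy", cv=5)

# X, y = load_features()   # acoustic/glottal features and labels (not shown)
# search.fit(X, y)
# print(search.best_params_, search.best_score_)
```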
Q13. What is the glottal flow waveform obtained from coded telephone speech?
The glottal flow waveform is obtained from coded telephone speech (coded with two standardized speech codecs, AMR-NB and AMR-WB) using QCP and the recently proposed DNN-GIF method.
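For context, coded telephone speech of this kind can be simulated by an encode/decode round trip through the codec. The sketch below uses ffmpeg with the libopencore_amrnb encoder as one possible toolchain; this toolchain and the bit rate shown are assumptions, not necessarily what the authors used.

```python
# Simulating AMR-NB coding of a speech utterance by an encode/decode round
# trip with ffmpeg. Assumption: ffmpeg is built with libopencore_amrnb
# support; the authors' exact coding toolchain and bit rates may differ.
import subprocess

def amr_nb_roundtrip(wav_in, wav_out, bitrate="12.2k"):
    """Encode wav_in to AMR-NB (8 kHz, mono) and decode back to PCM wav."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", wav_in, "-ar", "8000", "-ac", "1",
         "-c:a", "libopencore_amrnb", "-b:a", bitrate, "coded.amr"],
        check=True,
    )
    subprocess.run(
        ["ffmpeg", "-y", "-i", "coded.amr", "-ar", "8000", wav_out],
        check=True,
    )

# amr_nb_roundtrip("utterance.wav", "utterance_nb_coded.wav")
```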
Q14. What is the classification accuracy of the two sets of openSMILE-based features?
From the table, it can be observed that, with more than 80% classification accuracy (except for openSMILE-1 of NB-coded speech on TORGO, which reaches 77.71%), the two sets of openSMILE-based features give better classification accuracy than the glottal parameters after feature selection for both NB- and WB-coded speech.