Q2. What future work have the authors mentioned in the paper "Nonavinakere Prabhakera, Narendra; Alku, Paavo: Dysarthric speech classification from coded telephone speech using glottal features"?
Possible future work is as follows. Apart from the AMR-NB and AMR-WB codecs, the proposed method can be evaluated using recent codecs, for example, the Enhanced Voice Services (EVS) codec [58]. The proposed method can be extended to the speech-based telemonitoring of other neuro-motor disorders such as Parkinson's disease, Alzheimer's disease, and ALS. Beyond neuro-motor disorders, the proposed method can be utilized for different paralinguistic tasks, such as the recognition of emotion and of speaker states and traits, under the coded condition.
Q3. How many sentences are used to classify dysarthric speech?
80 sentence-level utterances from each speaker are used to develop the dysarthric speech classification systems (except for two dysarthric speakers, for whom only 23 and 28 utterances are used due to the limited availability of recordings).
Q4. What codecs can be used for the proposed method?
Apart from the AMR-NB and AMR-WB codecs, the proposed method can be evaluated using recent codecs, for example, the Enhanced Voice Services (EVS) codec [58].
Q5. What is the way to measure the variabilities of speech sources?
The existing dysarthric speech classification systems extract high-dimensional acoustic features to capture the wide variabilities of sources and patterns in pathological speech.
Q6. How many glottal parameters are extracted from the speech?
Two sets of glottal parameters are extracted from the glottal flow waveforms, which are estimated using two GIF methods (QCP and DNN-GIF); hence, a total of four types of glottal parameters are extracted.
Q7. What is the accurate method for estimating the glottal flow from coded speech?
In order to estimate the glottal flow from coded speech, two GIF methods are utilized: QCP and the recently proposed DNN-GIF method.
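To make the GIF step concrete, the sketch below illustrates the general idea of glottal inverse filtering only: it uses plain LPC-based inverse filtering with librosa and scipy as a simplified stand-in, not the QCP or DNN-GIF methods used in the paper, and the pre-emphasis coefficient, LPC order rule, and frame indices are illustrative assumptions.

```python
# Simplified illustration of glottal inverse filtering (GIF).
# NOTE: plain LPC-based inverse filtering, NOT the paper's QCP or DNN-GIF
# methods; it only sketches the idea of removing an all-pole vocal-tract
# estimate from the speech signal to approximate the glottal flow.
import librosa
import numpy as np
from scipy.signal import lfilter

def inverse_filter(frame, sr, lpc_order=None):
    """Estimate a vocal-tract filter with LPC and inverse-filter the frame."""
    if lpc_order is None:
        lpc_order = int(sr / 1000) + 2          # common rule of thumb
    pre = lfilter([1.0, -0.97], [1.0], frame)   # pre-emphasis before LPC analysis
    a = librosa.lpc(pre, order=lpc_order)       # all-pole vocal-tract estimate
    residual = lfilter(a, [1.0], frame)         # inverse filtering
    glottal_flow = np.cumsum(residual)          # integrate to cancel lip radiation
    return glottal_flow - np.mean(glottal_flow)

# Usage on one voiced frame of (coded) speech, e.g. resampled to 16 kHz:
# y, sr = librosa.load("utterance.wav", sr=16000)
# g = inverse_filter(y[4000:4480], sr)
```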
Q8. How many microphones were used to record speech?
Dysarthric speech data was recorded using an eight-microphone array, sampled at 16 kHz, with the microphones spaced at intervals of 1.5 inches.
Q9. What is the name of the two sets of acoustic features extracted from coded?
Two sets of acoustic features, named openSMILE-1 and openSMILE-2, are also extracted from every coded speech utterance using openSMILE (described in Section 2.4), which is a widely used toolkit in paralinguistic speech processing tasks.
Q10. What are the two sets of acoustic features extracted from coded speech?
In this work, two sets of acoustic features, extracted from coded telephone speech using the openSMILE toolkit [35], are used as reference features.
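As an illustration of this step, a minimal sketch using the opensmile Python package is given below. The choice of the eGeMAPS and ComParE 2016 feature sets as stand-ins for openSMILE-1 and openSMILE-2 is an assumption; the paper's exact configurations may differ.

```python
# Extracting two sets of utterance-level openSMILE functionals from a coded
# speech file. Assumption: openSMILE-1 / openSMILE-2 are taken here to be the
# eGeMAPS and ComParE 2016 feature sets; the paper's configurations may differ.
import opensmile

egemaps = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,
    feature_level=opensmile.FeatureLevel.Functionals,
)
compare = opensmile.Smile(
    feature_set=opensmile.FeatureSet.ComParE_2016,
    feature_level=opensmile.FeatureLevel.Functionals,
)

# Each call returns a pandas DataFrame with one row of features per utterance.
feats_1 = egemaps.process_file("coded_utterance.wav")
feats_2 = compare.process_file("coded_utterance.wav")
print(feats_1.shape, feats_2.shape)   # e.g. (1, 88) and (1, 6373)
```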
Q11. What is the proposed method for predicting dysarthric speech?
The proposed method utilizes an SVM to predict dysarthric/healthy labels from the acoustic and glottal features extracted from coded speech.
Q12. What is the value of the kernel parameter and penalty parameter C?
The optimal values of the kernel parameter γ and the penalty parameter C are chosen by grid search, with C and γ varying from 10^-3 to 10^3 in multiples of 10.
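A minimal scikit-learn sketch of this grid search is shown below. The RBF kernel is implied by the γ parameter; the feature standardization, the 5-fold cross-validation scheme, and the placeholders X and y (feature matrix and dysarthric/healthy labels) are assumptions, not details taken from the paper.

```python
# Grid search over the SVM penalty parameter C and kernel parameter gamma,
# both varied from 1e-3 to 1e3 in multiples of 10, as described above.
# X and y are placeholders for the extracted features and class labels.
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

param_grid = {
    "svc__C": [10.0 ** k for k in range(-3, 4)],
    "svc__gamma": [10.0 ** k for k in range(-3, 4)],
}

pipeline = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
search = GridSearchCV(pipeline, param_grid, scoring="accuracy", cv=5)

# X, y = load_features()   # acoustic/glottal features and labels (not shown)
# search.fit(X, y)
# print(search.best_params_, search.best_score_)
```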
Q13. What is the glottal flow waveform obtained from coded telephone speech?
The glottal flow waveform is obtained from coded telephone speech (coded with two standardized speech codecs, AMR-NB and AMR-WB) using QCP and the recently proposed DNN-GIF method.
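For context, coded telephone speech of this kind can be simulated by an encode/decode round trip through the codec. The sketch below uses ffmpeg with the libopencore_amrnb encoder as one possible toolchain; this toolchain and the bit rate shown are assumptions, not necessarily what the authors used.

```python
# Simulating AMR-NB coding of a speech utterance by an encode/decode round
# trip with ffmpeg. Assumption: ffmpeg is built with libopencore_amrnb
# support; the authors' exact coding toolchain and bit rates may differ.
import subprocess

def amr_nb_roundtrip(wav_in, wav_out, bitrate="12.2k"):
    """Encode wav_in to AMR-NB (8 kHz, mono) and decode back to PCM wav."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", wav_in, "-ar", "8000", "-ac", "1",
         "-c:a", "libopencore_amrnb", "-b:a", bitrate, "coded.amr"],
        check=True,
    )
    subprocess.run(
        ["ffmpeg", "-y", "-i", "coded.amr", "-ar", "8000", wav_out],
        check=True,
    )

# amr_nb_roundtrip("utterance.wav", "utterance_nb_coded.wav")
```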
Q14. What is the classification accuracy of the two sets of openSMILE-based features?
From the table, it can be observed that, with more than 80% classification accuracy (except for openSMILE-1 of NB-coded speech on TORGO, which reaches 77.71%), the two sets of openSMILE-based features give better classification accuracy than the glottal parameters after feature selection for both NB- and WB-coded speech.