Showing papers in "Speech Communication in 2015"


Journal ArticleDOI
TL;DR: How common paralinguistic speech characteristics are affected by depression and suicidality, and the application of this information in classification and prediction systems, are reviewed.

607 citations


Journal ArticleDOI
TL;DR: A survey of past work and priority research directions for the future is provided, showing that future research should address the lack of standard datasets and the over-fitting of existing countermeasures to specific, known spoofing attacks.

433 citations


Journal ArticleDOI
TL;DR: Experimental results on an isolated English word corpus recorded by non-native (L2) English learners show that the proposed GOP measure improves GOP-based mispronunciation detection: precision and recall improve by 7.4% compared with the conventional GOP estimated from GMM-HMM.

184 citations


Journal ArticleDOI
Yuan Liu, Yanmin Qian, Nanxin Chen, Tianfan Fu, Ya Zhang, Kai Yu
TL;DR: Experiments showed that deep feature based methods can obtain significant performance improvements over the traditional baselines, whether they are applied directly in the GMM-UBM system or used as identity vectors.

176 citations


Journal ArticleDOI
TL;DR: The performance of the human listening panel shows that imitated speech increases the difficulty of the speaker verification task, and a statistically significant association is found between listener accuracy and self-reported factors only when familiar voices are present in the test.

84 citations


Journal ArticleDOI
TL;DR: The hypothesis that the effects of depression in speech manifest as a reduction in the spread of phonetic events in acoustic space, as modelled by Gaussian Mixture Models in combination with Mel Frequency Cepstral Coefficients (MFCCs), is investigated.

81 citations


Journal ArticleDOI
TL;DR: The findings demonstrate that such spoofing-oriented playback attacks can be effectively detected and should not be considered a significant argument against applications of text-dependent speaker verification.

75 citations


Journal ArticleDOI
TL;DR: Objective and subjective evaluations indicated that the proposed spectral envelope estimation algorithm can obtain a temporally stable spectral envelope and synthesize speech with higher sound quality than speech synthesized with other algorithms.

73 citations


Journal ArticleDOI
TL;DR: Experimental results indicate that: (i) the MHEC feature is highly effective and performs favorably compared to other conventional and state-of-the-art front-ends, and (ii) the power-law non-linearity consistently yields the best performance across different conditions for both SID and LID tasks.

61 citations


Journal ArticleDOI
TL;DR: The paper discusses the evaluation of audiovisual speech synthesizers, elaborates on the hardware requirements for performing visual speech synthesis, and describes some important future directions that should stimulate the use of audiovisual speech synthesis technology in real-life applications.

60 citations


Journal ArticleDOI
TL;DR: This study presents a novel expert-based approach to assess the quality of ongoing Spoken Dialog System (SDS) interactions and concludes that this paradigm could render SDSs more user-friendly and improve user acceptance.

Journal ArticleDOI
TL;DR: The results show improvements in an instrumental measure of intelligibility and in frequency-weighted SNR over the complex-valued non-negative matrix factorization (CNMF) source separation approach, spatial sound source separation, and conventional beamforming methods such as the DSB and minimum variance distortionless response (MVDR).

Journal ArticleDOI
TL;DR: All features, both spectral and prosodic, are necessary for achievement of optimal performance on the data from L1-English speakers; MFCC log-posterior probability features are the single best set of features, followed by duration, energy, pitch and finally, spectral tilt features.

Journal ArticleDOI
TL;DR: Analysis of the age factor suggests that mother tongue detection is easier in older speaker groups than in younger ones, and that mother tongue traits might be better preserved in older speakers than in younger speakers when speaking the second language.

Journal ArticleDOI
TL;DR: The speech processing pipeline to automatically detect common errors associated with CAS is described, which contains modules for voice activity detection, pronunciation verification, and lexical stress verification.

Journal ArticleDOI
TL;DR: Results show that the proposed QMF approach successfully improves the system performance in terms of both discrimination and calibration, compared with conventional linear calibration.

Journal ArticleDOI
TL;DR: It is observed that the more challenging open-ended tasks benefit significantly more from the use of DNN-HMMs than constrained item types do, which indicates the potential to build reliable spoken assessment applications based on constrained tasks when little domain-specific training data is available.

Journal ArticleDOI
TL;DR: New measures of syntactic complexity for use in automatic scoring systems for second language spontaneous speech are studied; the results suggest that they show a reasonable association with human-rated proficiency scores compared to conventional measures of syntactic complexity.

Journal ArticleDOI
TL;DR: This work introduces a set of databases that contain both speech and electroglottograph data and that provide arguably the first direct insight into how cognitive load affects the voice source, and shows that glottal-based features carry complementary information with respect to formant-based features.

Journal ArticleDOI
TL;DR: It is argued that all of these findings are expected within an exemplar approach assuming storage of tonal information with lexical items, and the implications of this for the production and mental representation of intonation are discussed.

Journal ArticleDOI
TL;DR: The distributions of LRs were found to be relatively stable across systems, although LRs for individual comparisons may be substantially affected. As expected, the Mismatched systems produced the worst validity, while the Matched systems produced the best validity.

Journal ArticleDOI
TL;DR: This study's results show that responses to the VF task contain a large number of extraneous utterances and noise that lead to relatively poor baseline ASR performance, but it is found that speaker adaptation combined with confidence scoring significantly improves all three metrics and can enable use of ASR for reliable estimates of the traditional manual VF scores.

Journal ArticleDOI
TL;DR: It is found that the simple distribution-based detection method is capable of detecting clipped speech with a higher accuracy, and the DNN-based reconstruction can achieve promising performance gains for speaker recognition on clipped speech.

Journal ArticleDOI
TL;DR: This paper shows that the relationship between lexical units and acoustic features can be factored through a latent variable into two parts, namely an acoustic model and a lexical model, and proposes an approach that addresses both acoustic and phonetic lexical resource constraints in ASR system development.

Journal ArticleDOI
TL;DR: This study investigated the relationship between clearly produced and plain citation form speech styles and motion of visible articulators, and found significant effects of speech style as well as speaker gender and saliency of visual speech cues.

Journal ArticleDOI
TL;DR: The results suggest that talkers actively monitor their environment and are able to adopt appropriate speech production strategies for efficient and effective communication in adverse conditions.

Journal ArticleDOI
TL;DR: The results of this study will be useful in a proposed application of speech ABR to objective hearing aid fitting, if the separation of the brain's responses to different vowels is found to be correlated with perceptual discrimination.

Journal ArticleDOI
TL;DR: This paper focuses on employing adaptive scales for computation of perceptually scaled continuous wavelet transform (CWT) coefficients and adaptive thresholding of these coefficients for speech enhancement, and finds that for the white Gaussian noise case, the SNR and SSNR of the proposed method were better than those of all the methods under comparison.

Journal ArticleDOI
TL;DR: A source separation algorithm based on the von Mises mixture model and the complex Gaussian mixture model is developed, where the model parameters are estimated via an expectation–maximization (EM) algorithm and a T–F mask is derived from the model parameters for recovering the sources.

Journal ArticleDOI
TL;DR: The results highlight the feasibility of instrumental quality prediction for TTS signals, provided that broad training material is employed. High prediction accuracy, however, requires nonlinear model structures.