Showing papers in "Computer Speech & Language in 2018"


Journal ArticleDOI
TL;DR: For both training schemes, ASR-based predictions outperform established measures such as the extended speech intelligibility index (ESII), the multi-resolution speech envelope power spectrum model (mr-sEPSM) and others.

74 citations


Journal ArticleDOI
TL;DR: The performances of several classification methods are compared, including Gaussian Mixture Model–Universal Background Model (GMM–UBM), GMM–Support Vector Machine (GMM–SVM) and i-vector based approaches, and the utility of different frequency bands for speaker, age-group and gender recognition from children’s speech is assessed.

59 citations


Journal ArticleDOI
TL;DR: An automatic segmentation and classification system for empathy, inspired by the modal model of emotions, is designed and evaluated; the system supports both the fusion and the automatic selection of relevant features from a high-dimensional space.

49 citations


Journal ArticleDOI
TL;DR: This paper proposes a new approach to detecting synthetic speech using score-level fusion of three front-end features, namely constant Q cepstral coefficients (CQCCs), all-pole group delay function (APGDF) and fundamental frequency variation (FFV), which outperforms all existing baseline features for both known and unknown attacks.

47 citations
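
As a rough illustration of what score-level fusion means in the summary above, the sketch below normalizes the scores of each detector and combines them with a weighted sum. It is a generic example, not the paper's method: the function names, the z-normalization step and the weights are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def z_normalize(scores):
    """Shift and scale one system's scores to zero mean and unit variance."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # All scores identical: nothing to discriminate on.
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def fuse_scores(system_scores, weights):
    """Weighted sum of normalized per-trial scores from several systems.

    system_scores: one list of scores per system, all over the same trials.
    weights: one (hypothetical) weight per system.
    """
    if len(system_scores) != len(weights):
        raise ValueError("need exactly one weight per system")
    n_trials = len(system_scores[0])
    if any(len(s) != n_trials for s in system_scores):
        raise ValueError("all systems must score the same trials")
    normalized = [z_normalize(s) for s in system_scores]
    return [
        sum(w * s[i] for w, s in zip(weights, normalized))
        for i in range(n_trials)
    ]


# Two hypothetical detectors scoring the same three trials.
fused = fuse_scores([[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]], [0.5, 0.5])
```

Normalizing before the sum keeps a detector with a larger numeric score range from dominating the fused score; in practice the weights would be tuned on held-out data.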


Journal ArticleDOI
TL;DR: Techniques used in the analysis of articulatory data acquired using RT-MRI are reviewed, the utility of different approaches for different types of data and research goals is assessed, and new challenges in audio–video data analysis and data modeling are presented.

44 citations


Journal ArticleDOI
TL;DR: Pitch-adaptive front-end signal processing in deriving the Mel-frequency cepstral coefficient (MFCC) features is explored to reduce the sensitivity to pitch variation, and the effectiveness of existing speaker normalization techniques remains intact even with the use of the proposed pitch-adaptive MFCCs.

35 citations


Journal ArticleDOI
TL;DR: This work extends the conventional expectation-maximization algorithm for GMM training using semi-supervised learning and provides a methodology to incorporate unlabeled data into the SAD training process, leading to more accurate statistical models by exploiting the structure of data distribution.

34 citations


Journal ArticleDOI
TL;DR: A detailed analysis of neural versus phrase-based statistical machine translation outputs, leveraging high-quality post-edits performed by professional translators on the IWSLT data, provides useful insights into which linguistic phenomena are best modelled by neural models.

33 citations


Journal ArticleDOI
TL;DR: A paraphrase identification system is proposed that represents each pair of sentences as a combination of different similarity measures, which extract the lexical, syntactic and semantic components of the sentences encompassed in a graph.

32 citations


Journal ArticleDOI
TL;DR: This work addresses the problem of optimal placement of EMA sensors by posing it as the optimal selection of points for minimizing the reconstruction error of the air-tissue boundaries in the real-time magnetic resonance imaging (rtMRI) video frames of vocal tract (VT) in the mid-sagittal plane using dynamic programming.

32 citations


Journal ArticleDOI
TL;DR: It is shown that a DNN-based algorithm significantly outperforms the state-of-the-art approaches evaluated on the DIRHA dataset, providing an average localization error, expressed as Root Mean Square Error (RMSE), of 324 mm and 367 mm for the Simulated and the Real subsets, respectively.
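
For readers unfamiliar with the metric, the sketch below shows one common way a localization RMSE can be computed from estimated and reference positions. The function name and the 2-D coordinates in millimetres are hypothetical; the paper's actual evaluation protocol is not described in this listing and may differ.

```python
import math


def localization_rmse(estimated, reference):
    """Root mean square of the Euclidean distances between paired positions."""
    if len(estimated) != len(reference) or not estimated:
        raise ValueError("need two non-empty position lists of equal length")
    squared_errors = [
        math.dist(est, ref) ** 2 for est, ref in zip(estimated, reference)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical positions in millimetres: one exact estimate, one 500 mm off.
error_mm = localization_rmse(
    [(0.0, 0.0), (300.0, 400.0)],
    [(0.0, 0.0), (0.0, 0.0)],
)
```

Because the errors are squared before averaging, a few large misses raise the RMSE more than they would raise a plain mean distance.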

Journal ArticleDOI
TL;DR: Experiments on the core test condition 5 of NIST SRE 2010 show that comparable results with conventional i-vectors are achieved with a clearly lower computational load in the vector extraction process.

Journal ArticleDOI
TL;DR: This paper proposes a new deep neural network that explores recurrent models to capture word sequences within sentences, and further studies the impact of pretrained word embeddings on the performance of the proposed approach.

Journal ArticleDOI
TL;DR: It is found that DNN-based ASR reaches human performance for single-channel, small-vocabulary tasks in the presence of speech-shaped noise and in multi-talker babble noise, which is an important difference to previous human-machine comparisons.

Journal ArticleDOI
TL;DR: The results presented here suggest that substantial reduction in WER is achieved with clean training, and the uncertainty weighting method reduced the gap between clean and multi-noise/multi-condition training.

Journal ArticleDOI
TL;DR: In this article, a domain-invariant linear discriminant analysis (DI-LDA) technique was proposed to compensate domain mismatch from both LDA and PLDA subspaces.

Journal ArticleDOI
TL;DR: In this paper, the rank-1 constrained multichannel Wiener filter is employed for noise reduction and a new constant residual noise power constraint is derived which enhances the recognition performance.

Journal ArticleDOI
TL;DR: It is found that pronunciation models that use explicit knowledge about error pronunciation patterns can lead to more accurate classification of whether a phoneme was correctly pronounced, and this paper proposes two new GOP techniques.

Journal ArticleDOI
TL;DR: An empirical study on POS tagging for Vietnamese social media text is presented, which shows several challenges compared with tagging for general text and the semi-supervised model outperformed, in terms of accuracy, the version of vnTagger trained on the same Facebook dataset, showing the usefulness of word cluster features.

Journal ArticleDOI
TL;DR: The S+ condition presumably captured the children's attention toward the currently heard word, which forced the children to be strictly aligned with the oral modality, as well as improving the learning benefits provided by a reading experience.

Journal ArticleDOI
TL;DR: Two approaches to tackling dialogue management as a reinforcement learning task are presented, whereby a recurrent neural network is utilised as a task success predictor which is pre-trained from off-line data to estimate task success during subsequent on-line dialogue policy learning.

Journal ArticleDOI
TL;DR: This paper proposes a method that solves unbound pronominal anaphoric expressions, automatically enabling the cohesiveness of the extractive summaries, and provides a comparative evaluation concerning two distinct assessment scenarios which are compared to a baseline.

Journal ArticleDOI
TL;DR: This work presents a novel prototype Rule Based Machine Translation (RBMT) system for the creation of large, high-quality written Greek Sign Language (GSL) glossed corpora from Greek text, and stresses that Language Models for written GSL gloss are missing from the scientific literature, making this work a pioneer in this field.

Journal ArticleDOI
TL;DR: Both raw speech samples and mel frequency cepstral coefficients are used as an initial representation for feature extraction and a transformation function known as weighted decomposition (WD) of principal components is used to emphasize the discriminative information present in the PCA-based dictionary.

Journal ArticleDOI
TL;DR: The use of Web texts for language modeling is shown to significantly improve both speech recognition and keyword spotting performance, and combining full-word and subword units leads to the best keyword spotting results.

Journal ArticleDOI
TL;DR: This work identifies 14 aspects of stance that occur frequently in radio news stories and that could be useful for information retrieval, including indications of subjectivity, immediacy, local relevance, and newness.

Journal ArticleDOI
TL;DR: The probability of an ignorant state is eliminated through the orthogonal sum of several speech presence probabilities, which improves performance when detecting voice activity.

Journal ArticleDOI
TL;DR: A novel prosody teaching system is introduced in which intensity (accent), intonation and rhythm are presented to students as visual feedback, and automatic assessment scores are given jointly and separately for the goodness of intonation and rhythm.

Journal ArticleDOI
TL;DR: A corpus similarity measure based on PCA-ranked features answers the question of which corpora should be included in joint training, and the resulting selection outperforms all other combinations of corpora.

Journal ArticleDOI
TL;DR: This work proposes an unsupervised method, RankUp, that enhances graph-based keyphrase extraction approaches by applying an error-feedback mechanism similar to the concept of backpropagation, and shows that error-feedback propagation can boost the quality of keyphrases in graph-based keyphrase extraction techniques.