
Showing papers in "Computer Speech & Language in 2014"


Journal ArticleDOI
TL;DR: The results show that using either lemma or lexeme information is helpful, as is using the two part-of-speech tagsets (RTS and ERTS), and that lemmatization and the ERTS POS tagset are present in a majority of the settings.

227 citations


Journal ArticleDOI
TL;DR: Extensive evaluation scenarios show that machine translation systems are approaching a good level of maturity and that, in combination with appropriate machine learning algorithms and carefully chosen features, they can be used to build sentiment analysis systems whose performance is comparable to that obtained for English.

180 citations


Journal ArticleDOI
TL;DR: A novel approach to Sentiment Polarity Classification in Twitter posts is presented, based on extracting a vector of weighted nodes from the WordNet graph; this vector is then used with SentiWordNet to compute a final estimate of the polarity.

140 citations
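A minimal sketch of the lexicon-weighted polarity idea from the entry above, assuming a toy hand-made lexicon in place of SentiWordNet and optional per-word weights standing in for the weighted WordNet-graph nodes (all names and scores here are invented for illustration):

```python
# Toy stand-in for SentiWordNet: word polarity scores, invented for this sketch.
TOY_SENTI_LEXICON = {
    "good": 0.6, "great": 0.8, "bad": -0.7, "awful": -0.9, "ok": 0.1,
}

def polarity(tokens, node_weights=None):
    """Weighted average of per-token polarity scores.

    `node_weights` plays the role of the weighted graph nodes in the paper;
    it defaults to uniform weights here.
    """
    scored = [(t, TOY_SENTI_LEXICON[t]) for t in tokens if t in TOY_SENTI_LEXICON]
    if not scored:
        return 0.0
    w = node_weights or {}
    num = sum(w.get(t, 1.0) * s for t, s in scored)
    den = sum(w.get(t, 1.0) for t, _ in scored)
    return num / den
```

Passing a non-uniform `node_weights` dict would let graph-derived importance shift the estimate, which is the core of the paper's approach.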


Journal ArticleDOI
TL;DR: This review article summarises the extensive set of behavioural findings related to human speech modification, identifies which factors appear to be beneficial, and goes on to examine previous computational attempts to improve intelligibility in noise.

107 citations


Journal ArticleDOI
TL;DR: Experimental results based on the NIST 2010 SRE dataset suggest that the proposed VAD outperforms conventional ones whenever interview-style speech is involved, and it is demonstrated that noise reduction is vital for energy-based VAD under low SNR.

98 citations
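The energy-based VAD referenced above can be sketched as follows; the frame length, decision margin, and the assumption that the first frames are speech-free (used for the noise-floor estimate) are illustrative choices, not the paper's configuration:

```python
import numpy as np

# Illustrative energy-based VAD: a noise floor is estimated first, echoing
# the point that noise estimation/reduction matters before thresholding
# frame energy at low SNR.

def energy_vad(signal, frame_len=160, noise_frames=10, margin_db=6.0):
    """Per-frame speech/non-speech decisions from log energy."""
    n = len(signal) // frame_len
    frames = signal[: n * frame_len].reshape(n, frame_len)
    energy_db = 10.0 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
    noise_floor = np.mean(energy_db[:noise_frames])  # assumed speech-free
    return energy_db > noise_floor + margin_db

# Demo: 1 s of weak noise with a tone burst in the middle.
rng = np.random.default_rng(0)
signal = 0.01 * rng.standard_normal(16000)
signal[8000:12000] += np.sin(2 * np.pi * 440 * np.arange(4000) / 16000.0)
decisions = energy_vad(signal)
```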


Journal ArticleDOI
TL;DR: Speakers showed two additional modifications as compared to shouted speech, which cannot be interpreted in terms of vocal effort only: they enhanced the modulation of their speech in f0 and vocal intensity, and they boosted their speech spectrum specifically around 3 kHz, in the region of maximum ear sensitivity associated with the actor's or singer's formant.

94 citations


Journal ArticleDOI
Man-Hung Siu1, Herbert Gish1, Arthur Chan1, William Belfield1, Steve Lowe1 
TL;DR: This work proposes building HMM-based speech recognizers without transcribed data by formulating the HMM training as an optimization over both the parameter and transcription sequence space, and describes how self-organizing unit (SOU) training can be easily implemented using existing HMM recognition tools.

90 citations


Journal ArticleDOI
TL;DR: An overview of the latest trends in the subjectivity and sentiment analysis fields is presented and the manner in which the articles contained in the special issue contribute to the advancement of the area is described.

90 citations


Journal ArticleDOI
TL;DR: This paper proposes an unsupervised signal-derived approach within a principal component analysis framework for quantifying one aspect of entrainment in communication, namely vocal entrainment; the approach involves measuring the similarity of specific vocal characteristics between the interlocutors in a dialog.

83 citations
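A rough sketch of the signal-derived idea in the entry above: project each speaker's turn-level vocal features (e.g. pitch and energy statistics) onto principal components learned from the pooled dialog, then score how close the two speakers sit in that space. The distance-to-similarity mapping and the demo data are illustrative assumptions, not the paper's exact measure:

```python
import numpy as np

def entrainment_similarity(feats_a, feats_b, n_components=2):
    """Similarity of the two speakers' centroids in a shared PCA space.

    Rows of `feats_a`/`feats_b` are per-turn feature vectors. Returns a
    value in (0, 1]; higher means the speakers are closer (an illustrative
    proxy for vocal entrainment).
    """
    X = np.vstack([feats_a, feats_b])
    mu = X.mean(axis=0)
    # PCA via SVD of the pooled, centered feature matrix.
    _, _, vt = np.linalg.svd(X - mu, full_matrices=False)
    pcs = vt[:n_components].T
    a = ((feats_a - mu) @ pcs).mean(axis=0)
    b = ((feats_b - mu) @ pcs).mean(axis=0)
    return float(1.0 / (1.0 + np.linalg.norm(a - b)))

# Demo on synthetic "turn features": identical speakers score higher than
# speakers whose features have drifted apart.
rng = np.random.default_rng(0)
turns_a = rng.standard_normal((20, 5))
same = entrainment_similarity(turns_a, turns_a.copy())
diverged = entrainment_similarity(turns_a, turns_a + 3.0)
```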


Journal ArticleDOI
TL;DR: A comprehensive overview of techniques for glottal source processing can be found in this article, where the authors discuss how these tools and techniques might be properly integrated in various voice technology applications.

81 citations


Journal ArticleDOI
TL;DR: Deep bidirectional LSTM networks processing log Mel filterbank outputs deliver best results with clean models, reaching down to 42% word error rate (WER) at signal-to-noise ratios ranging from −6 to 9 dB.
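The log Mel filterbank front-end mentioned above can be sketched with numpy; 40 filters and a 512-point FFT are typical values, not necessarily the paper's exact configuration:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def log_mel_energies(frame, sr=16000, n_fft=512, n_mels=40):
    """Log energies of triangular Mel filters applied to one frame."""
    power = np.abs(np.fft.rfft(frame, n_fft)) ** 2
    # Filter edges equally spaced on the Mel scale, mapped back to FFT bins.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fb[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising slope
        fb[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling slope
    return np.log(fb @ power + 1e-10)

# Demo: a 1 kHz tone should peak in the band whose centre is nearest 1 kHz.
frame = np.sin(2 * np.pi * 1000 * np.arange(400) / 16000.0)
feats = log_mel_energies(frame)
```

In the paper such per-frame vectors, stacked over time, form the input sequence to the bidirectional LSTM.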

Journal ArticleDOI
TL;DR: A domain-independent statistical methodology for developing dialog managers for spoken dialog systems; it allows rapid development of new dialog managers and exploration of new dialog strategies, making it possible to build enhanced versions of existing systems.

Journal ArticleDOI
TL;DR: The results indicate that the proposed scheme can be effectively employed in real applications to detect emotional speech, and can lead to accuracies as high as 75.8% in binary emotion classification.

Journal ArticleDOI
TL;DR: Traditional dialogue systems use a fixed silence threshold to detect the end of users' turns, but this simplistic model can result in system behaviour that is both interruptive and unresponsive.
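The fixed-silence-threshold model that the paper criticizes reduces to counting consecutive non-speech frames; the 30-frame value below is illustrative:

```python
# Simplistic end-of-turn detection: the turn is declared over once a fixed
# number of consecutive non-speech frames is observed.

def end_of_turn_frame(speech_flags, silence_frames=30):
    """Index of the frame where the turn is declared over, or None."""
    run = 0
    for i, is_speech in enumerate(speech_flags):
        run = 0 if is_speech else run + 1
        if run >= silence_frames:
            return i
    return None
```

The weakness is visible in the parameter itself: a short threshold cuts users off during mid-utterance pauses (interruptive), while a long one delays every system response (unresponsive).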

Journal ArticleDOI
TL;DR: The classification results show that the NPS data and the pedophiles' conversations can be accurately discriminated from each other with character n-grams, while in the more complicated case of cybersex logs there is need for high-level features to reach good accuracy levels.
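Character n-gram classification as used above can be sketched with a nearest-profile scheme; the two language "profiles" below stand in for per-class training data and are invented for illustration:

```python
from collections import Counter

def char_ngrams(text, n=3):
    """Counter of overlapping character n-grams."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(c1, c2):
    num = sum(c1[g] * c2[g] for g in set(c1) & set(c2))
    den = (sum(v * v for v in c1.values()) ** 0.5) * \
          (sum(v * v for v in c2.values()) ** 0.5)
    return num / den if den else 0.0

def nearest_profile(text, profiles):
    """Label whose n-gram profile is most cosine-similar to the text."""
    return max(profiles, key=lambda lbl: cosine(char_ngrams(text), profiles[lbl]))

# Invented demo profiles (one short sentence each, not real training data).
profiles = {
    "english": char_ngrams("the quick brown fox jumps over the lazy dog"),
    "dutch": char_ngrams("de snelle bruine vos springt over de luie hond"),
}
```

The paper's point is that such low-level features suffice for the easier class pair, while the harder case needs higher-level features on top.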

Journal ArticleDOI
TL;DR: A speech pre-processing algorithm is presented that improves the speech intelligibility in noise for the near-end listener by optimally redistributing the speech energy over time and frequency according to a perceptual distortion measure, which is based on a spectro-temporal auditory model.
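A grossly simplified sketch of the energy-redistribution constraint above: boost bands where the speech-to-noise ratio is poor, then rescale so total speech energy is unchanged. The real method optimizes a spectro-temporal perceptual distortion measure; only the redistribution idea is shown, and `alpha` is an invented tuning knob:

```python
import numpy as np

def redistribute(speech_bands, noise_bands, alpha=0.5):
    """Reshape per-band speech energies under an equal-energy constraint."""
    snr = speech_bands / (noise_bands + 1e-12)
    shaped = speech_bands * snr ** (-alpha)          # boost low-SNR bands
    return shaped * speech_bands.sum() / shaped.sum()  # keep total energy

# Demo: two bands, the second one masked by noise.
speech = np.array([4.0, 1.0])
noise = np.array([1.0, 1.0])
shaped = redistribute(speech, noise)
```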

Journal ArticleDOI
TL;DR: By fusing participants' systems, it is shown that binary classification of alcoholisation and sleepiness from short-term observations, i.e., single utterances, can both reach over 72% accuracy on unseen test data; and it is demonstrated that these medium-term states can be recognised more robustly by fusing short-term classifiers along the time axis.

Journal ArticleDOI
TL;DR: The present work investigates the performance of autoregressive (AR) parameter features, including gain and reflection coefficients in addition to the traditional linear prediction coefficients (LPC), for recognizing emotions from speech signals, and finds that the reflection-coefficient features recognize emotions better than the LPC features.
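All three AR features named above fall out of the Levinson-Durbin recursion on a frame's autocorrelation; the sketch below uses the residual energy's square root as the gain term, a common proxy rather than necessarily the paper's definition:

```python
import numpy as np

def levinson_durbin(frame, order=8):
    """Return (lpc_polynomial, reflection_coefficients, gain) for one frame."""
    r = np.array([frame[: len(frame) - k] @ frame[k:] for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    ks = []
    for i in range(1, order + 1):
        prev = a[1:i].copy()
        k = -(r[i] + prev @ r[1:i][::-1]) / err  # i-th reflection coefficient
        ks.append(k)
        a[1:i] = prev + k * prev[::-1]           # update predictor polynomial
        a[i] = k
        err *= 1.0 - k * k                       # shrink residual energy
    return a, np.array(ks), np.sqrt(err)

# Demo: an AR(1) signal x[n] = 0.9 x[n-1] + e[n]; the first reflection
# coefficient should come out near -0.9.
rng = np.random.default_rng(1)
x = np.zeros(4096)
e = rng.standard_normal(4096)
for n in range(1, 4096):
    x[n] = 0.9 * x[n - 1] + e[n]
lpc, ks, gain = levinson_durbin(x, order=4)
```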

Journal ArticleDOI
TL;DR: The latter part of this work focuses mainly on a novel frequency warping technique that is shown to achieve vowel space expansion; this technique is incorporated into an established Lombard-inspired Spectral Shaping method paired with Dynamic Range Compression to maximize speech audibility (SSDRC).

Journal ArticleDOI
TL;DR: The proposed supervised i-vector approach outperforms the i-vector baseline by 12% and 7% relative in terms of EER and old minDCF values, respectively, and the use of Gammatone frequency cepstral coefficients, Mel-frequency cepstral coefficients and spectro-temporal Gabor features in conjunction with shifted-delta-cepstral features improves the overall language identification performance significantly.

Journal ArticleDOI
TL;DR: A set of features is presented which enable us to distinguish automatically between prior and contextual emotion, with a focus on exploring features important in this task, and a promising learning method is shown which significantly outperforms two reasonable baselines.

Journal ArticleDOI
TL;DR: It is shown that using phoneme-level emotion classes can improve classification performance even with comparably low speech recognition performance obtained with scant a priori knowledge about the language.

Journal ArticleDOI
TL;DR: This paper identifies two methods that are able to incorporate subjectivity information originating from different languages, namely co-training and multilingual vector spaces, and shows that for this task the latter method is better suited and obtains superior results.

Journal ArticleDOI
TL;DR: This research presents the first benchmark Arabic data set that contains 610 students’ short answers together with their English translations, and focuses on applying multiple similarity measures separately and in combination.
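Applying similarity measures "separately and in combination", as the entry above describes, can be illustrated with two simple token-overlap measures and an equal-weight fusion; Jaccard, containment, the weights, and the demo sentences are all illustrative choices, not the paper's actual measures or data:

```python
def jaccard(a_tokens, b_tokens):
    a, b = set(a_tokens), set(b_tokens)
    return len(a & b) / len(a | b) if a | b else 0.0

def containment(answer_tokens, model_tokens):
    a, m = set(answer_tokens), set(model_tokens)
    return len(a & m) / len(m) if m else 0.0

def combined_score(answer, model, weights=(0.5, 0.5)):
    """Weighted fusion of two token-overlap measures."""
    a, m = answer.lower().split(), model.lower().split()
    return weights[0] * jaccard(a, m) + weights[1] * containment(a, m)

# Invented demo: grade two student answers against a model answer.
model = "photosynthesis converts light energy into chemical energy"
good = "light energy is converted into chemical energy"
bad = "plants are green"
```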

Journal ArticleDOI
TL;DR: The evaluation results show that the synthesized voices with varying vocal effort are rated similarly to their natural counterparts both in terms of intelligibility and suitability.

Journal ArticleDOI
TL;DR: This paper presents the 2011 and 2012 MediaEval results, and compares the relative merits and weaknesses of approaches developed by participants, providing analysis and directions for future research, in order to improve voice access to spoken information in low resource settings.

Journal ArticleDOI
TL;DR: In this paper, the authors investigated four channel compensation techniques for the purpose of improving i-vector speaker verification performance in the presence of high intersession variability using the NIST 2008 and 2010 SRE corpora.

Journal ArticleDOI
TL;DR: This paper investigates the temporal excitation patterns of creaky voice using a variety of languages and speakers, on both read and conversational data, and includes a mutual information-based assessment of the various acoustic features proposed in the literature for detecting creaky voice.

Journal ArticleDOI
TL;DR: The results indicate that the proposed method, only requiring a few minutes to record and analyze the patient's voice during the visit to the specialist, could help in the development of a non-intrusive, fast and convenient PSG-complementary screening technique for OSA.

Journal ArticleDOI
TL;DR: The first large, annotated, motion-capture corpus of unscripted ASL is built, enabling new animation-synthesis research; the collected data is used to synthesize novel ASL animations, which have been evaluated in experimental studies with native signers.