
Showing papers in "Computer Speech & Language in 2013"


Journal ArticleDOI
TL;DR: A broad overview of the constantly growing field of paralinguistic analysis is provided by defining the field, introducing typical applications, presenting exemplary resources, and sharing a unified view of the chain of processing.

285 citations


Journal ArticleDOI
TL;DR: The challenge task described by the authors required identifying keywords from sentences reverberantly mixed into audio backgrounds binaurally recorded in a busy domestic environment; the challenge attracted thirteen submissions.

218 citations


Journal ArticleDOI
TL;DR: A novel automatic speaker age and gender identification approach is presented that combines seven different methods at both the acoustic and prosodic levels to improve on the baseline performance; the seven subsystems are fused by weighted summation at the score level.

176 citations
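The weighted-summation score fusion in the entry above can be sketched as follows. This is a minimal illustration: the subsystem scores, the weights, and the four-class setup are made-up values, not the paper's, and in practice the weights would be tuned on development data.

```python
def fuse_scores(subsystem_scores, weights):
    """Weighted-summation fusion of per-class scores from several subsystems.

    subsystem_scores: one list of class scores per subsystem, all over the
    same ordered set of classes. weights: one non-negative weight per
    subsystem (hypothetical values here).
    """
    total = sum(weights)
    norm = [w / total for w in weights]  # normalise weights to sum to 1
    n_classes = len(subsystem_scores[0])
    return [sum(w * scores[c] for w, scores in zip(norm, subsystem_scores))
            for c in range(n_classes)]

# Toy example: three hypothetical subsystems scoring four classes.
scores = [[0.1, 0.7, 0.1, 0.1],
          [0.2, 0.5, 0.2, 0.1],
          [0.3, 0.4, 0.2, 0.1]]
fused = fuse_scores(scores, weights=[2.0, 1.0, 1.0])
predicted_class = fused.index(max(fused))
```

Because each subsystem's scores sum to one and the weights are normalised, the fused scores also sum to one, so the fusion can be read as a weighted opinion pool over the subsystems.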


Journal ArticleDOI
TL;DR: It seems that the state-of-the-art LID system performs much better on the standard 12 class NIST 2003 Language Recognition Evaluation task or the two class ethnic group recognition task than on the 14 class regional accent recognition task.

109 citations


Journal ArticleDOI
TL;DR: The current study describes a voice quality feature set suitable for differentiating voice qualities along a tense-to-breathy dimension and uses these features as inputs to a fuzzy-input fuzzy-output support vector machine (F²SVM) algorithm, which in turn can softly categorise voice quality recordings.

75 citations


Journal ArticleDOI
TL;DR: A system for detecting interpersonal stance (whether a speaker is flirtatious, friendly, awkward, or assertive) that makes use of a new spoken corpus of over 1000 4-minute speed-dates; the work has implications for understanding interpersonal stances, their linguistic expression, and their automatic extraction.

73 citations


Journal ArticleDOI
TL;DR: It is suggested that enactment studies using professional mental imagery techniques are an important addition to the available experimental paradigms, as they allow extensive experimental control and their results appear comparable with those of other induction techniques.

70 citations


Journal ArticleDOI
TL;DR: The paper describes the development of an airway modulation model that simulates the time-varying changes of the glottis and vocal tract, as well as acoustic wave propagation, during speech production; the result is a type of artificial talker for studying how sound is generated by humans and how that sound is perceived by a listener.

65 citations


Journal ArticleDOI
TL;DR: This work presents an on-line system designed to behave as a virtual therapist, incorporating automatic speech recognition technology so that aphasia patients can perform word-naming training exercises; the work focuses on the automatic word-naming detector module.

60 citations


Journal ArticleDOI
TL;DR: A system that transforms the speech signals of speakers with physical speech disabilities into a more intelligible form that listeners can understand more easily; it represents a substantial step towards fully automated speech transformation without the need for expert or clinical intervention.

59 citations


Journal ArticleDOI
TL;DR: A new algorithm for automatically detecting creak in speech signals is described, utilising two new acoustic parameters designed to characterise creaky excitation, drawing on previous evidence in the literature combined with new insights from observations in the current work.

Journal ArticleDOI
TL;DR: Experimental evidence not only demonstrates the feasibility of the proposed techniques but also shows that they attain performance comparable to standard approaches on the LRE tasks investigated in this work when the same experimental conditions are adopted.

Journal ArticleDOI
TL;DR: A model of incremental speech generation in practical conversational systems that allows a conversational system to incrementally interpret spoken input, while simultaneously planning, realising and self-monitoring the system response.

Journal ArticleDOI
TL;DR: This work proposes an interpolation-based technique for obtaining a prior acoustic model from one trained on unimpaired speech, before adapting it to the dysarthric talker, and tests it in conjunction with the well-known maximum a posteriori (MAP) adaptation algorithm.
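The MAP adaptation step named in the entry above can be sketched as follows. This is a generic single-Gaussian, one-dimensional illustration of the standard MAP mean update, with a conventional relevance factor; it is not the paper's interpolation technique or its exact configuration.

```python
def map_adapt_mean(prior_mean, frames, posteriors, tau=16.0):
    """MAP update of one Gaussian mean (1-D features for brevity).

    prior_mean : mean from the prior (unimpaired-speech) model
    frames     : adaptation feature values from the target talker
    posteriors : occupation probability of this Gaussian for each frame
    tau        : relevance factor balancing prior vs. adaptation data
                 (16 is a conventional choice, not the paper's value)
    """
    n = sum(posteriors)  # soft count of frames assigned to this Gaussian
    if n == 0:
        return prior_mean  # no adaptation data: keep the prior mean
    data_mean = sum(g * x for g, x in zip(posteriors, frames)) / n
    alpha = n / (n + tau)  # adaptation weight grows with data
    return alpha * data_mean + (1 - alpha) * prior_mean

# With little data the estimate stays near the prior; with more data
# it moves toward the talker's own statistics.
little = map_adapt_mean(0.0, frames=[2.0], posteriors=[1.0])
much = map_adapt_mean(0.0, frames=[2.0] * 200, posteriors=[1.0] * 200)
```

The relevance factor `tau` is what makes MAP robust with sparse dysarthric adaptation data: components with few assigned frames barely move from the prior model.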

Journal ArticleDOI
TL;DR: A comparison to a simplified front-end based on a free-field assumption shows that the introduced system substantially improves the speech quality and the recognition performance under the considered adverse conditions.

Journal ArticleDOI
TL;DR: It is shown that room acoustic parameters such as the clarity and the definition correlate well with the ASR results, and that the application of a recent dereverberation method based on perceptual modelling can be used and achieve significant Phone Recognition (PR) improvement, especially under highly reverberant conditions.

Journal ArticleDOI
TL;DR: This study applies spectral factorisation algorithms and long temporal context for separating speech and noise from mixed signals, using a noisy speech corpus containing non-stationary multi-source household noises at signal-to-noise ratios ranging from +9 dB to -6 dB.

Journal ArticleDOI
TL;DR: The distribution of topics such as Friend and Job is found to be sensitive to a document's emotions, a phenomenon the paper calls emotion topic variation, revealing a deeper relationship between topics and emotions.

Journal ArticleDOI
TL;DR: This paper investigates the use of context dependent weighting in both interpolation and test-time adaptation of language models and proposes a range of schemes to combine weight information obtained from training data and test data hypotheses to improve robustness during context dependent LM adaptation.

Journal ArticleDOI
TL;DR: A new expectation maximization (EM) based technique is introduced that allows us to train Gaussian mixture models (GMMs) or hidden Markov models (HMMs) directly from noisy data with dynamic uncertainty, and results in 3-4% absolute improvement in speaker recognition accuracy by training from either matched, unmatched or multi-condition noisy data.

Journal ArticleDOI
TL;DR: Various types of MLP networks were examined for their ability to classify utterances correctly into two groups, non-fluent and fluent; classification correctness ranged from 84% to 100% depending on the disfluency type.

Journal ArticleDOI
TL;DR: The introduced methods improve single-channel source separation performance for both speech separation and speech-music separation under different NMF divergence functions, and novel update rules are derived that solve the new regularised NMF optimisation problem efficiently.
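The unregularised baseline behind the entry above can be sketched with the standard Lee-Seung multiplicative updates for the Euclidean cost. The regularisation terms and matching update rules that are the paper's actual contribution are not reproduced here, and the toy rank-2 matrix below is purely illustrative.

```python
import numpy as np

def nmf_euclidean(V, rank, iters=500, seed=0):
    """Plain NMF with Lee-Seung multiplicative updates (Euclidean cost).

    Factorises a nonnegative matrix V (e.g. a magnitude spectrogram)
    as V ~ W @ H with W, H nonnegative. Multiplicative updates keep
    both factors nonnegative and never increase the reconstruction cost.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + 0.1   # nonnegative basis spectra
    H = rng.random((rank, m)) + 0.1   # nonnegative activations
    eps = 1e-9                        # avoid division by zero
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy "spectrogram" that is exactly rank 2, so a rank-2 factorisation
# can reconstruct it almost perfectly.
V = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
V = np.vstack([V, V])                 # 4 x 3 matrix, still rank 2
W, H = nmf_euclidean(V, rank=2)
err = float(np.linalg.norm(V - W @ H))
```

In source separation the columns of W play the role of speaker- or music-specific spectral bases, and each source is reconstructed from its own subset of basis/activation pairs.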

Journal ArticleDOI
TL;DR: This work represents the first comprehensive analysis of speaker verification on a longitudinal speaker database and successfully addresses the associated variability from ageing and quality artefacts.

Journal ArticleDOI
TL;DR: This work proposes a novel, semi-supervised, batch-mode active learning strategy that attempts to maximize in-domain coverage by selecting sentences, which represent a balance between domain match, translation difficulty, and batch diversity.

Journal ArticleDOI
TL;DR: From the evaluation, it is observed that prediction accuracy is better for two-stage FFNN models, compared to the other models.

Journal ArticleDOI
TL;DR: Experiments on both synthetic and real data recorded by two distributed microphone pairs show that the proposed framework can detect and track up to five sources simultaneously active in a reverberant environment.

Journal ArticleDOI
TL;DR: An efficient approach to modeling the acoustic features for recognising various paralinguistic phenomena by building a monophone-based Hidden Markov Model (HMM); it achieves better results than the current state-of-the-art systems on both tasks.

Journal ArticleDOI
TL;DR: This paper introduces a recognition system that can recognize speech in the presence of multiple rapidly time-varying noise sources as found in a typical family living room and approaches human performance levels by greatly improving the audible quality of speech and substantially improving the keyword recognition accuracy.

Journal ArticleDOI
TL;DR: On the task of unsupervised spoken pattern discovery from the TIDIGITS database, both training schemes are observed to improve over Baum-Welch (BW) training in terms of pattern purity, accuracy of the segmentation boundaries and speech recognition accuracy.

Journal ArticleDOI
TL;DR: A novel front-end for context-sensitive Tandem feature extraction is designed, and it is shown how the Connectionist Temporal Classification (CTC) approach can be used as a BLSTM-based back-end, as an alternative to Hidden Markov Models (HMMs).