
Showing papers in "Computer Speech & Language in 2013"


Journal ArticleDOI
TL;DR: A broad overview of the constantly growing field of paralinguistic analysis is provided by defining the field, introducing typical applications, presenting exemplary resources, and sharing a unified view of the chain of processing.

285 citations


Journal ArticleDOI
TL;DR: The challenge task described by the authors required identifying keywords from sentences reverberantly mixed into audio backgrounds binaurally recorded in a busy domestic environment; the challenge attracted thirteen submissions.

218 citations


Journal ArticleDOI
TL;DR: A novel automatic speaker age and gender identification approach is presented that combines seven different methods at both the acoustic and prosodic levels to improve on the baseline performance; the seven subsystems are fused by weighted summation at the score level.

176 citations
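The weighted-summation score fusion in the entry above can be sketched as follows. This is a minimal illustration: the subsystem scores, the weights, and the four-class setup are made-up values, not the paper's, and in practice the weights would be tuned on development data.

```python
def fuse_scores(subsystem_scores, weights):
    """Weighted-summation fusion of per-class scores from several subsystems.

    subsystem_scores: one list of class scores per subsystem, all over the
    same ordered set of classes. weights: one non-negative weight per
    subsystem (hypothetical values here).
    """
    total = sum(weights)
    norm = [w / total for w in weights]  # normalise weights to sum to 1
    n_classes = len(subsystem_scores[0])
    return [sum(w * scores[c] for w, scores in zip(norm, subsystem_scores))
            for c in range(n_classes)]

# Toy example: three hypothetical subsystems scoring four classes.
scores = [[0.1, 0.7, 0.1, 0.1],
          [0.2, 0.5, 0.2, 0.1],
          [0.3, 0.4, 0.2, 0.1]]
fused = fuse_scores(scores, weights=[2.0, 1.0, 1.0])
predicted_class = fused.index(max(fused))
```

Because each subsystem's scores sum to one and the weights are normalised, the fused scores also sum to one, so the fusion can be read as a weighted opinion pool over the subsystems.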


Journal ArticleDOI
TL;DR: It seems that the state-of-the-art LID system performs much better on the standard 12 class NIST 2003 Language Recognition Evaluation task or the two class ethnic group recognition task than on the 14 class regional accent recognition task.

109 citations


Journal ArticleDOI
TL;DR: The current study describes a voice quality feature set suitable for differentiating voice qualities along a tense-to-breathy dimension and uses these features as inputs to a fuzzy-input fuzzy-output support vector machine (F²SVM) algorithm, which in turn can softly categorise voice quality recordings.

75 citations


Journal ArticleDOI
TL;DR: A system for detecting interpersonal stance (whether a speaker is flirtatious, friendly, awkward, or assertive) that makes use of a new spoken corpus of over 1000 4-minute speed-dates; the work has implications for understanding interpersonal stances, their linguistic expression, and their automatic extraction.

73 citations


Journal ArticleDOI
TL;DR: It is suggested that enactment studies using professional mental imagery techniques are an important addition to the available experimental paradigms, as they allow extensive experimental control and their results appear comparable with those of other induction techniques.

70 citations


Journal ArticleDOI
TL;DR: The paper describes the development of an airway modulation model that simulates the time-varying changes of the glottis and vocal tract, as well as acoustic wave propagation, during speech production; the result is a type of artificial talker for studying how sound is generated by humans and how that sound is perceived by a listener.

65 citations


Journal ArticleDOI
TL;DR: This work presents an on-line system designed to behave as a virtual therapist, incorporating automatic speech recognition technology so that aphasia patients can perform word-naming training exercises; the work focuses on the automatic word-naming detector module.

60 citations


Journal ArticleDOI
TL;DR: A system that transforms the speech signals of speakers with physical speech disabilities into a more intelligible form that listeners can understand more easily; it represents a substantial step towards fully automated speech transformation without the need for expert or clinical intervention.

59 citations


Journal ArticleDOI
TL;DR: A new algorithm for automatically detecting creak in speech signals is described, utilising two new acoustic parameters designed to characterise creaky excitation, drawing on previous evidence in the literature combined with new insights from observations in the current work.

Journal ArticleDOI
TL;DR: Experimental evidence not only demonstrates the feasibility of the proposed techniques but also shows that they attain performance comparable to standard approaches on the LRE tasks investigated in this work when the same experimental conditions are adopted.

Journal ArticleDOI
TL;DR: A model of incremental speech generation in practical conversational systems that allows a conversational system to incrementally interpret spoken input, while simultaneously planning, realising and self-monitoring the system response.

Journal ArticleDOI
TL;DR: This work proposes an interpolation-based technique for obtaining a prior acoustic model from one trained on unimpaired speech, before adapting it to the dysarthric talker, and tests it in conjunction with the well-known maximum a posteriori (MAP) adaptation algorithm.
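The MAP adaptation step named in the entry above can be sketched as follows. This is a generic single-Gaussian, one-dimensional illustration of the standard MAP mean update, with a conventional relevance factor; it is not the paper's interpolation technique or its exact configuration.

```python
def map_adapt_mean(prior_mean, frames, posteriors, tau=16.0):
    """MAP update of one Gaussian mean (1-D features for brevity).

    prior_mean : mean from the prior (unimpaired-speech) model
    frames     : adaptation feature values from the target talker
    posteriors : occupation probability of this Gaussian for each frame
    tau        : relevance factor balancing prior vs. adaptation data
                 (16 is a conventional choice, not the paper's value)
    """
    n = sum(posteriors)  # soft count of frames assigned to this Gaussian
    if n == 0:
        return prior_mean  # no adaptation data: keep the prior mean
    data_mean = sum(g * x for g, x in zip(posteriors, frames)) / n
    alpha = n / (n + tau)  # adaptation weight grows with data
    return alpha * data_mean + (1 - alpha) * prior_mean

# With little data the estimate stays near the prior; with more data
# it moves toward the talker's own statistics.
little = map_adapt_mean(0.0, frames=[2.0], posteriors=[1.0])
much = map_adapt_mean(0.0, frames=[2.0] * 200, posteriors=[1.0] * 200)
```

The relevance factor `tau` is what makes MAP robust with sparse dysarthric adaptation data: components with few assigned frames barely move from the prior model.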

Journal ArticleDOI
TL;DR: A comparison to a simplified front-end based on a free-field assumption shows that the introduced system substantially improves the speech quality and the recognition performance under the considered adverse conditions.

Journal ArticleDOI
TL;DR: It is shown that room acoustic parameters such as the clarity and the definition correlate well with the ASR results, and that the application of a recent dereverberation method based on perceptual modelling can be used and achieve significant Phone Recognition (PR) improvement, especially under highly reverberant conditions.

Journal ArticleDOI
TL;DR: This study applies spectral factorisation algorithms and long temporal context for separating speech and noise from mixed signals, using a noisy speech corpus containing non-stationary multi-source household noises at signal-to-noise ratios ranging from +9 dB to -6 dB.

Journal ArticleDOI
TL;DR: The distribution of topics such as Friend and Job is found to be sensitive to a document's emotions, a phenomenon the paper calls emotion topic variation, revealing a deeper relationship between topics and emotions.

Journal ArticleDOI
TL;DR: This paper investigates the use of context dependent weighting in both interpolation and test-time adaptation of language models and proposes a range of schemes to combine weight information obtained from training data and test data hypotheses to improve robustness during context dependent LM adaptation.

Journal ArticleDOI
TL;DR: A new expectation maximization (EM) based technique is introduced that allows us to train Gaussian mixture models (GMMs) or hidden Markov models (HMMs) directly from noisy data with dynamic uncertainty, and results in 3-4% absolute improvement in speaker recognition accuracy by training from either matched, unmatched or multi-condition noisy data.

Journal ArticleDOI
TL;DR: Various types of MLP networks were examined for their ability to classify utterances correctly into two groups, non-fluent and fluent; classification correctness ranged from 84% to 100% depending on the disfluency type.

Journal ArticleDOI
TL;DR: The introduced methods improve single-channel source separation performance for both speech separation and speech-music separation under different NMF divergence functions, and novel update rules are derived that solve the new regularised NMF optimisation problem efficiently.
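The unregularised baseline behind the entry above can be sketched with the standard Lee-Seung multiplicative updates for the Euclidean cost. The regularisation terms and matching update rules that are the paper's actual contribution are not reproduced here, and the toy rank-2 matrix below is purely illustrative.

```python
import numpy as np

def nmf_euclidean(V, rank, iters=500, seed=0):
    """Plain NMF with Lee-Seung multiplicative updates (Euclidean cost).

    Factorises a nonnegative matrix V (e.g. a magnitude spectrogram)
    as V ~ W @ H with W, H nonnegative. Multiplicative updates keep
    both factors nonnegative and never increase the reconstruction cost.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + 0.1   # nonnegative basis spectra
    H = rng.random((rank, m)) + 0.1   # nonnegative activations
    eps = 1e-9                        # avoid division by zero
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy "spectrogram" that is exactly rank 2, so a rank-2 factorisation
# can reconstruct it almost perfectly.
V = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
V = np.vstack([V, V])                 # 4 x 3 matrix, still rank 2
W, H = nmf_euclidean(V, rank=2)
err = float(np.linalg.norm(V - W @ H))
```

In source separation the columns of W play the role of speaker- or music-specific spectral bases, and each source is reconstructed from its own subset of basis/activation pairs.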

Journal ArticleDOI
TL;DR: This work represents the first comprehensive analysis of speaker verification on a longitudinal speaker database and successfully addresses the associated variability from ageing and quality artefacts.

Journal ArticleDOI
TL;DR: This work proposes a novel, semi-supervised, batch-mode active learning strategy that attempts to maximize in-domain coverage by selecting sentences, which represent a balance between domain match, translation difficulty, and batch diversity.

Journal ArticleDOI
TL;DR: From the evaluation, it is observed that prediction accuracy is better for two-stage FFNN models, compared to the other models.

Journal ArticleDOI
TL;DR: Experiments on both synthetic and real data recorded by two distributed microphone pairs show that the proposed framework can detect and track up to five sources simultaneously active in a reverberant environment.

Journal ArticleDOI
TL;DR: An efficient approach to modeling the acoustic features for recognising various paralinguistic phenomena by building a monophone-based Hidden Markov Model (HMM); it achieves better results than the current state-of-the-art systems on both tasks.

Journal ArticleDOI
TL;DR: This paper introduces a recognition system that can recognize speech in the presence of multiple rapidly time-varying noise sources as found in a typical family living room and approaches human performance levels by greatly improving the audible quality of speech and substantially improving the keyword recognition accuracy.

Journal ArticleDOI
TL;DR: On the task of unsupervised spoken pattern discovery from the TIDIGITS database, both training schemes are observed to improve over Baum-Welch (BW) training in terms of pattern purity, accuracy of the segmentation boundaries and speech recognition accuracy.

Journal ArticleDOI
TL;DR: A novel front-end for context-sensitive Tandem feature extraction is designed, and it is shown how the Connectionist Temporal Classification (CTC) approach can be used as a BLSTM-based back-end, as an alternative to Hidden Markov Models (HMMs).