Showing papers in "Speech Communication in 2008"

TL;DR: Experimental results demonstrate that the MLE-based mapping with dynamic features can significantly improve the mapping performance compared with the MMSE-based mapping in both the articulatory-to-acoustic mapping and the inversion mapping.

251 citations

Journal Article•DOI•

The sound of sarcasm

[...]

Henry S. Cheang¹, Marc D. Pell¹•Institutions (1)

McGill University¹

01 May 2008-Speech Communication

TL;DR: It was concluded that sarcasm in speech can be characterized by a specific pattern of prosodic cues in addition to textual cues, and that these acoustic characteristics can be influenced by language used by the speaker.

199 citations

Journal Article•DOI•

A geometric approach to spectral subtraction

[...]

Yang Lu¹, Philipos C. Loizou¹•Institutions (1)

University of Texas at Dallas¹

TL;DR: Analysis of the gain function of the proposed spectral subtraction algorithm indicated that it possesses properties similar to those of the traditional MMSE algorithm, and objective evaluation showed that it performed significantly better than the traditional spectral subtractive algorithm.

194 citations

Journal Article•DOI•

Extraction and representation of prosodic features for language and speaker recognition

[...]

Leena Mary¹, B. Yegnanarayana²•Institutions (2)

Indian Institute of Technology Madras¹, International Institute of Information Technology, Hyderabad²

01 Oct 2008-Speech Communication

TL;DR: A new approach is presented for extracting and representing prosodic features directly from the speech signal; a syllable-like unit is chosen as the basic unit for representing the prosodic characteristics.

190 citations

Journal Article•DOI•

Fear-type emotion recognition for future audio-based surveillance systems

[...]

Chloé Clavel, Ioana Vasilescu¹, Laurence Devillers¹, Gael Richard², Thibaut Ehrette•Institutions (2)

Centre national de la recherche scientifique¹, Télécom ParisTech²

TL;DR: The SAFE corpus (situation analysis in a fictional and emotional corpus) based on fiction movies is developed and a task-dependent annotation strategy which has the particularity to describe simultaneously the emotion and the situation evolution in context is defined.

184 citations

Journal Article•DOI•

Towards human-like spoken dialogue systems

[...]

Jens Edlund, Joakim Gustafson, Mattias Heldner, Anna Hjalmarsson

TL;DR: The two-way mimicry target is presented, a model for measuring how well a human-computer dialogue mimics or replicates some aspect of human-human dialogue, including human flaws and inconsistencies.

145 citations

Journal Article•DOI•

Automating spoken dialogue management design using machine learning: An industry perspective

[...]

Tim Paek¹, Roberto Pieraccini•Institutions (1)

Microsoft¹

TL;DR: How dialogue management is handled in industry is discussed and to what extent current state-of-the-art machine learning methods can be of practical benefit to application developers who are deploying commercial production systems is critically evaluated.

143 citations

Journal Article•DOI•

A statistical approach to spoken dialog systems design and evaluation

[...]

David Griol¹, Lluís F. Hurtado¹, Encarna Segarra¹, Emilio Sanchis¹•Institutions (1)

Polytechnic University of Valencia¹

TL;DR: A statistical approach for the development of a dialog manager and for learning optimal dialog strategies based on a classification procedure that considers all of the previous history of the dialog to select the next system answer is presented.

119 citations

Journal Article•DOI•

An investigation of dependencies between frequency components and speaker characteristics for text-independent speaker identification

[...]

Xugang Lu¹, Jianwu Dang¹•Institutions (1)

Japan Advanced Institute of Science and Technology¹

TL;DR: This paper proposed a new physiological feature which emphasizes individual information for text-independent speaker identification by using a non-uniform subband processing strategy to emphasize the physiological information involved in speech production.

111 citations

Journal Article•DOI•

Filled pauses as cues to the complexity of upcoming phrases for native and non-native listeners

[...]

Michiko Watanabe¹, Keikichi Hirose¹, Yasuharu Den², Nobuaki Minematsu¹•Institutions (2)

University of Tokyo¹, Chiba University²

01 Feb 2008-Speech Communication

TL;DR: Investigation of whether filled pauses affect listeners' predictions about the complexity of upcoming phrases in Japanese found that FPs cause listeners to expect that the speaker is going to refer to something that is likely to be expressed by a relatively long or complex constituent.

Journal Article•DOI•

Speech to sign language translation system for Spanish

[...]

Rubén San-Segundo¹, R. Barra¹, Ricardo de Córdoba¹, Luis Fernando D'Haro¹, F. Fernández¹, Javier Ferreiros¹, J. M. Lucas¹, Javier Macias-Guarasa², Juan Manuel Montero¹, José Manuel Pardo¹•Institutions (2)

Technical University of Madrid¹, University of Alcalá²

01 Nov 2008-Speech Communication

TL;DR: The development of and the first experiments in a Spanish to sign language translation system in a real domain focusing on the sentences spoken by an official when assisting people applying for, or renewing their Identity Card are described.

Journal Article•DOI•

A new perceptually motivated MVDR-based acoustic front-end (PMVDR) for robust automatic speech recognition

[...]

Umit H. Yapanel¹, John H. L. Hansen¹•Institutions (1)

University of Texas at Dallas¹

01 Feb 2008-Speech Communication

TL;DR: A novel feature extraction algorithm, perceptual-MVDR (PMVDR), which computes cepstral coefficients from the speech signal, is proposed and shown to model the speech spectrum better than traditional feature extraction approaches.

Journal Article•DOI•

Influence of contextual information in emotion annotation for spoken dialogue systems

[...]

Zoraida Callejas¹, Ramón López-Cózar¹•Institutions (1)

University of Granada¹

01 May 2008-Speech Communication

TL;DR: It is proposed to automatically include the history of user-system interaction and the neutral speaking style of users in the annotation of emotions, making use of novel techniques for acoustic normalization and dialogue context annotation.

Journal Article•DOI•

Supervised and unsupervised learning of multidimensionally varying non-native speech categories

[...]

Martijn Goudbeek¹, Anne Cutler¹, Roel Smits¹•Institutions (1)

Max Planck Society¹

01 Feb 2008-Speech Communication

TL;DR: In four experiments in which listeners were presented with novel categories based on vowels of Dutch, feedback was either available or not; this comparison showed supervised learning to be significantly superior to unsupervised learning.

Journal Article•DOI•

A new approach for the adaptation of HMMs to reverberation and background noise

[...]

Hans-Günter Hirsch, Harald Finster

TL;DR: A new approach is presented to adapt the energy and spectral parameters of HMMs as well as their time derivatives to the modifications by the speech input in a reverberant environment to combine the adaptation to background noise and unknown frequency characteristics.

Journal Article•DOI•

A method for fundamental frequency estimation and voicing decision: Application to infant utterances recorded in real acoustical environments

[...]

Tomohiro Nakatani¹, Shigeaki Amano¹, Toshio Irino², Kentaro Ishizuka¹, Tadahisa Kondo¹•Institutions (2)

Nippon Telegraph and Telephone¹, Wakayama University²

TL;DR: The ripple-enhanced power spectrum based method (REPS) and the use of instantaneous frequency (IF) enables us to refine the accuracy of the F0 estimates, and the degree of dominance defined based on the IF is introduced as a robust voicing decision measure.

Journal Article•DOI•

Adapting speaking after evidence of misrecognition: Local and global hyperarticulation

[...]

Amanda Stent¹, Marie K. Huffman¹, Susan E. Brennan¹•Institutions (1)

Stony Brook University¹

TL;DR: This paper reports the results of an experiment in which speakers spoke to a simulated speech recognizer and received text feedback about what had been "recognized", with the speech coded for adaptations associated with hyperarticulate speech: speaking rate and phonetically clear speech.

Journal Article•DOI•

Issues with uncertainty decoding for noise robust automatic speech recognition

[...]

Hank Liao¹, Mark J. F. Gales¹•Institutions (1)

University of Cambridge¹

TL;DR: It is shown that a model-based joint uncertainty decoding approach does not suffer from this limitation, like these front-end forms do, and is more computationally attractive.

Journal Article•DOI•

European Portuguese MRI based speech production studies

[...]

Paula Martins¹, Inês Carbone¹, Alda Pinto², Augusto Silva¹, Antônio Lúcio Teixeira¹•Institutions (2)

University of Aveiro¹, University of Coimbra²

01 Nov 2008-Speech Communication

TL;DR: A recently acquired magnetic resonance imaging database including almost all classes of European Portuguese sounds, excluding taps and trills, is presented and analyzed; European Portuguese stops proved less resistant to coarticulatory effects than fricatives.

Journal Article•DOI•

Implicit processing of emotional prosody in a foreign versus native language

[...]

Marc D. Pell¹, Vera Skorup¹•Institutions (1)

McGill University¹

TL;DR: Results indicated that English listeners automatically detect the emotional significance of prosody when expressed in a foreign language, although activation of emotional meanings in a foreign language may require more exposure to prosodic information than when listening to the native language.

Journal Article•DOI•

Automatic extraction of paralinguistic information using prosodic features related to F0, duration and voice quality

[...]

Carlos Toshinori Ishi, Hiroshi Ishiguro, Norihiro Hagita

TL;DR: Experimental results indicated that the classical prosodic features, i.e., F0 and duration, were effective for discriminating groups of paralinguistic information expressing intentions, and accounted for 57% of the global detection rate, in a task of discriminating seven groups of paralinguistic information.

Journal Article•DOI•

Phonemic segmentation using the generalised Gamma distribution and small sample Bayesian information criterion

[...]

George Almpanidis¹, Constantine Kotropoulos¹•Institutions (1)

Aristotle University of Thessaloniki¹

01 Jan 2008-Speech Communication

TL;DR: This work presents a text-independent automatic phone segmentation algorithm based on the Bayesian Information Criterion, and uses a computationally inexpensive maximum likelihood approach for parameter estimation to evaluate the efficiency and demonstrate that the proposed adjustments yield significant performance improvement in noisy environments.

Journal Article•DOI•

Predicting the quality and usability of spoken dialogue services

[...]

Sebastian Möller¹, Klaus-Peter Engelbrecht¹, Robert Schleicher¹•Institutions (1)

Technical University of Berlin¹

TL;DR: It is shown that - although an accurate prediction of individual ratings is not yet possible with such models - they may still be used for taking decisions on component optimization, and are thus helpful tools for the system developer.

Journal Article•DOI•

The vocal communication of different kinds of smile

[...]

Amy Drahota¹, Alan Costall¹, Vasudevi Reddy¹•Institutions (1)

University of Portsmouth¹

TL;DR: The study established that listeners can discriminate different smile types and indicated that listeners utilize prototypical ideals to discern whether a person is smiling, regardless of whether the speaker is actually smiling.

Journal Article•DOI•

Recovering capitalization and punctuation marks for automatic speech recognition: Case study for Portuguese broadcast news

[...]

Fernando Batista¹, Diamantino Caseiro², Nuno J. Mamede², Isabel Trancoso²•Institutions (2)

INESC-ID¹, Technical University of Lisbon²

01 Oct 2008-Speech Communication

TL;DR: A study of recovering punctuation marks and capitalization information from European Portuguese broadcast news speech transcriptions, using finite state transducers automatically built from language models, and maximum entropy models.

Journal Article•DOI•

Low-frequency vocal modulations in vowels produced by Parkinsonian subjects

[...]

Laurence Cnockaert¹, Jean Schoentgen¹, Pascal Auzou, Canan Ozsancak, L. Defebvre, Francis Grenez¹•Institutions (1)

Université libre de Bruxelles¹

TL;DR: The objective is to discover differences between speaker groups in F0 low-frequency modulations and show that Parkinson's disease has different effects on the voice of male and female speakers.

Journal Article•DOI•

A three-layered model for expressive speech perception

[...]

Chun-Fang Huang¹, Masato Akagi¹•Institutions (1)

Japan Advanced Institute of Science and Technology¹

01 Oct 2008-Speech Communication

TL;DR: A three-layer model is introduced: five categories of expressive speech constitute the top layer, semantic primitives the middle layer, and acoustic features the bottom layer. Results show significant relationships between expressive speech, semantic primitives, and acoustic features.

Journal Article•DOI•

A Reinforcement Learning approach to evaluating state representations in spoken dialogue systems

[...]

Joel Tetreault¹, Diane J. Litman²•Institutions (2)

Princeton University¹, University of Pittsburgh²

TL;DR: This work investigates how to create and evaluate the best state space representations for a Reinforcement Learning model to learn an optimal dialogue control strategy and presents three metrics for evaluating the impact of different state models.

Journal Article•DOI•

Stream weight estimation for multistream audio-visual speech recognition in a multispeaker environment

[...]

Xu Shao¹, Jon Barker¹•Institutions (1)

University of Sheffield¹