Showing papers in "Speech Communication in 2007"
••
TL;DR: A noisy speech corpus suitable for evaluating speech enhancement algorithms is developed, covering four classes of algorithms: spectral subtractive, subspace, statistical-model-based, and Wiener-type algorithms.
634 citations
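The spectral-subtractive class mentioned above can be sketched minimally as follows; the over-subtraction factor, spectral floor, and function name are illustrative assumptions, not the corpus authors' implementation.

```python
# Minimal magnitude spectral subtraction sketch (hypothetical parameters).
import numpy as np

def spectral_subtract(noisy_stft, noise_mag, alpha=2.0, beta=0.01):
    """Subtract an estimated noise magnitude spectrum from each STFT frame.

    noisy_stft: 2-D complex array (frames x bins)
    noise_mag:  1-D array, estimated noise magnitude per bin
    alpha:      over-subtraction factor (assumed value)
    beta:       spectral floor to limit musical noise (assumed value)
    """
    mag = np.abs(noisy_stft)
    phase = np.angle(noisy_stft)
    clean_mag = mag - alpha * noise_mag            # subtract noise estimate
    clean_mag = np.maximum(clean_mag, beta * mag)  # apply spectral floor
    return clean_mag * np.exp(1j * phase)          # reattach the noisy phase
```

Statistical-model-based and Wiener-type methods replace the subtraction rule with a gain derived from estimated a priori and a posteriori SNRs, but operate on the same STFT representation.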
••
TL;DR: Current advances in automatic speech recognition (ASR) and spoken language systems are outlined, along with deficiencies in dealing with the variation naturally present in speech.
507 citations
••
TL;DR: An image-based, text-free evaluation system is presented that provides intuitive assessment of emotion primitives and yields high inter-evaluator agreement; speaker-dependent modeling of emotion expression is proposed, since the emotion primitives are particularly suited for capturing dynamics and intrinsic variations in emotion expression.
309 citations
••
TL;DR: This research aims to improve the automatic perception of vocal emotion in two ways: by comparing two emotional speech data sources, natural spontaneous emotional speech and acted or portrayed emotional speech, and by examining two classification methods that have not previously been applied: stacked generalisation and unweighted vote.
305 citations
••
TL;DR: It was found that, compared with British adult-directed speech, vowels were equivalently hyperarticulated in infant- and foreigner-directed speech, and that these linguistic modifications are independent of vocal pitch and affective valence.
215 citations
••
TL;DR: The results show that the proposed approach always outperforms the use of transformations in the feature space and yields even better results when combined with linear input transformations.
173 citations
••
TL;DR: The development of a gender-independent laugh detector, aimed at enabling automatic emotion recognition, is described. Acoustic measurements showed differences between laughter and speech in mean pitch and in the ratio of unvoiced to voiced durations, indicating that these prosodic features are indeed useful for discriminating laughter from speech.
169 citations
••
TL;DR: It is shown that the SP-SDW-MWF is more robust against signal model errors than the GSC, and that the block-structured step size matrix yields faster convergence and better tracking performance than the diagonal step size matrix, at only a slightly higher computational cost.
167 citations
••
TL;DR: The implementation and evaluation of an open-domain unit selection speech synthesis engine designed to be flexible enough to encourage further unit selection research and allow rapid voice development by users with minimal speech synthesis knowledge and experience are presented.
161 citations
••
TL;DR: This study significantly improved the intelligibility of dysarthric vowels of one speaker from 48% to 54%, as evaluated by a vowel identification task using 64 CVC stimuli judged by 24 listeners.
161 citations
••
TL;DR: The robustness of approaches to the automatic classification of emotions in speech is addressed and it is suggested that existing approaches are efficient enough to handle larger amounts of training data without any reduction in classification accuracy.
••
TL;DR: The use of several methods for speaker adaptive acoustic modeling to cope with inter-speaker spectral variability and to improve recognition performance for children proved to be effective in recognition of read speech with a vocabulary of about 11k words.
••
TL;DR: An overview of past and present efforts to link human and automatic speech recognition research is provided and an overview of the literature describing the performance difference between machines and human listeners is presented.
••
TL;DR: The research findings indicated that Persian apologies are as formulaic in pragmatic structure as English apologies, and that the values assigned to the two context-external variables have a significant effect on the frequency of intensifiers in different situations.
••
TL;DR: Using a single set of speaker-independent, noise-level-independent parameters, the model was able to predict not only the intelligibility of individual speakers to a remarkable degree, but could also account for most of the token-wise intelligibilities of the letter keywords.
••
TL;DR: Overall results indicate that SNR and SSNR improvements for the proposed approach are comparable to those of the Ephraim-Malah filter, with BWT enhancement giving the best results of all methods in the noisiest (-10 dB and -5 dB input SNR) conditions.
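The SSNR metric reported above is a per-frame average of clamped SNRs; the frame length and clamping range below are common conventions, assumed here rather than taken from the paper.

```python
# Sketch of segmental SNR (SSNR); frame_len, floor, and ceil are assumptions.
import numpy as np

def segmental_snr(clean, enhanced, frame_len=256, floor=-10.0, ceil=35.0):
    """Average per-frame SNR in dB, with each frame clamped to [floor, ceil]."""
    n_frames = len(clean) // frame_len
    snrs = []
    for i in range(n_frames):
        s = clean[i * frame_len:(i + 1) * frame_len]
        e = enhanced[i * frame_len:(i + 1) * frame_len]
        noise_energy = np.sum((s - e) ** 2) + 1e-12  # avoid divide-by-zero
        snr = 10.0 * np.log10(np.sum(s ** 2) / noise_energy + 1e-12)
        snrs.append(np.clip(snr, floor, ceil))      # clamp outlier frames
    return float(np.mean(snrs))
```

Unlike the global SNR, the per-frame clamping keeps silent or near-perfect frames from dominating the average, which is why SSNR is often preferred for enhancement evaluation.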
••
TL;DR: One application of chirp group delay in feature extraction for automatic speech recognition (ASR) is presented, and it is shown that chirp group delay representations, which can be guaranteed to be spike-free, are potentially useful for improving ASR performance.
••
TL;DR: Results confirm that lexical masking occurs only when some words in the babble are detectable, and suggest that different levels of linguistic information can be extracted from background babble and cause different types of linguistic competition for target-word identification.
••
TL;DR: The present study derives the MMSE estimator under speech presence uncertainty and a Laplacian statistical model, and demonstrates that the assumed distribution of the DFT coefficients can have a significant effect on the quality of the enhanced speech.
••
TL;DR: It is demonstrated that speech recognition error rates for interactive read aloud can be reduced by more than 50% through a combination of advances in both statistical language and acoustic modeling.
••
TL;DR: It is argued that progress is hampered by the fragmentation of the field across many different disciplines, coupled with a failure to create an integrated view of the fundamental mechanisms that underpin one organism's ability to communicate with another.
••
TL;DR: It is shown that the "decision-directed" approach to speech spectral variance estimation can have a significant bias at low SNRs, which generally leads to too much speech suppression.
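The decision-directed estimator discussed above combines the previous frame's clean-speech estimate with the current a posteriori SNR; this sketch uses the standard form, with the smoothing factor and variable names as assumptions.

```python
# Sketch of the decision-directed a priori SNR estimate (Ephraim-Malah style).
import numpy as np

def decision_directed_xi(prev_clean_amp, noise_var, gamma, alpha=0.98):
    """Estimate the a priori SNR xi for the current frame.

    prev_clean_amp: clean-speech amplitude estimate from the previous frame
    noise_var:      noise power estimate per frequency bin
    gamma:          a posteriori SNR of the current frame
    alpha:          smoothing weight (assumed 0.98); values near 1 bias xi
                    toward the past frame, the low-SNR bias discussed above
    """
    return (alpha * (prev_clean_amp ** 2) / noise_var
            + (1.0 - alpha) * np.maximum(gamma - 1.0, 0.0))
```

The max(gamma - 1, 0) term keeps the instantaneous contribution non-negative; the heavy weighting of the past term is what produces the over-suppression at low SNRs noted in the abstract.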
••
TL;DR: The results provide support for an autosegmental-metrical account of the intonational phonology of French in which the early rise is a bitonal (LH) phrase accent that serves as a cue to content word beginnings.
••
TL;DR: The results suggest that in addition to content cues, voice cues can be used by Chinese listeners to release speech from masking by other talkers.
••
TL;DR: This study proposes a new feature vector that will allow better classification of emotional/stressed states and achieves good discrimination between neutral, angry, loud and Lombard states for the simulated domain of the Speech Under Simulated and Actual Stress (SUSAS) database.
••
TL;DR: This paper reviews the progress of Thai speech technology in five areas of research: fundamental analyses and tools, text-to-speech synthesis (TTS), automatic speech recognition (ASR), speech applications, and language resources.
••
TL;DR: Results show that, compared with a conventional fragment generation approach, the proposed system produces more coherent fragments across different conditions, which results in significantly better recognition accuracy.
••
TL;DR: A probabilistic algorithm for phrase stress assignment accounts for both prominence and constituency prosodic relations by considering the coupling between a dependency-grammar system of markers and constituent-size constraints, which copes with intra- and inter-speaker prosodic variability.
••
TL;DR: Continuous Korean-English speech recognition experiments show that the proposed method achieves an average word error rate reduction of 12.75% compared with a speech recognition system whose baseline acoustic models were trained on native speech.
••
TL;DR: Results from the analysis of Japanese vowel data suggested that contraction and relaxation of the three subdivisions of the genioglossus play a dominant role in forming tongue shapes for vowels.