
Showing papers in "Speech Communication in 1997"


Journal ArticleDOI
TL;DR: This paper focuses on the task of automatically routing telephone calls based on a user's fluently spoken response to the open-ended prompt of “How may I help you?”.

664 citations
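
As a rough illustration of the task only (not the method of this paper), routing can be viewed as scoring the transcribed caller utterance against salient phrases associated with each destination; the phrases, weights and route names below are invented for the example.

```python
# Hypothetical phrase table and routes, for illustration only.
SALIENT_PHRASES = {
    "collect":  {"collect call": 2.0, "call collect": 2.0},
    "billing":  {"my bill": 2.0, "credit card": 1.5, "charge this": 1.5},
    "operator": {"talk to an operator": 2.0, "speak to someone": 1.0},
}

def route_call(transcript: str) -> str:
    """Pick the destination whose salient phrases best cover the transcript."""
    text = transcript.lower()
    scores = {dest: sum(w for phrase, w in table.items() if phrase in text)
              for dest, table in SALIENT_PHRASES.items()}
    best, best_score = max(scores.items(), key=lambda kv: kv[1])
    # Fall back to a human operator when nothing salient was recognised.
    return best if best_score > 0 else "operator"

print(route_call("yes I would like to make a collect call please"))  # -> collect
```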


Journal ArticleDOI
TL;DR: Comparisons suggest that the human-machine performance gap can be reduced by basic research on improving low-level acoustic-phonetic modeling, on improving robustness to noise and channel variability, and on more accurately modeling spontaneous speech.

606 citations


Journal ArticleDOI
TL;DR: Experimental results show that MMIE optimisation of system structure and parameters can yield useful increases in recognition accuracy, and that the use of lattices makes MMIE training practicable for very complex recognition systems and large training sets.

203 citations
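
For context, the MMIE criterion maximised in this kind of training can be written (in generic notation, not necessarily that of the paper) as the ratio below; the denominator sum over all competing word sequences w is what word lattices make tractable.

```latex
\mathcal{F}_{\mathrm{MMIE}}(\lambda) \;=\;
\sum_{r=1}^{R} \log
\frac{p_{\lambda}(\mathbf{O}_r \mid \mathcal{M}_{w_r})\, P(w_r)}
     {\sum_{w} p_{\lambda}(\mathbf{O}_r \mid \mathcal{M}_{w})\, P(w)}
```

Here O_r is the r-th training utterance, w_r its reference transcription, M_w the composite model for word sequence w, and P(w) the language model probability.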


Journal ArticleDOI
TL;DR: Findings of “motor-equivalent” trading relations between the contributions of two constrictions to the same acoustic transfer function provide preliminary support for the idea that segmental control is based on acoustic or auditory-perceptual goals.

191 citations


Journal ArticleDOI
TL;DR: In this paper, a target-based control model of speech production using Feldman's Equilibrium Point Hypothesis is presented; it is evaluated through simulations of articulatory movements during vowel-to-vowel sequences with a 2D biomechanical tongue model.

141 citations


Journal ArticleDOI
TL;DR: None of the acoustic features examined can by itself clearly discriminate between the two speaking styles, and the performance of both the speakers and the listeners varied enormously.

118 citations


Journal ArticleDOI
TL;DR: An outline of the properties of the human voice source in connected speech is developed on the basis of the transformed LF-model and its frequency-domain correspondences, which allows maximal descriptive power with a limited number of parameters.

115 citations


Journal ArticleDOI
TL;DR: Voiced/unvoiced and unvoiced/voiced error rates and pitch estimation errors are reported in detail for the proposed PDA and the reference system on three speech databases.

106 citations
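
As a minimal sketch of what a PDA does (a generic autocorrelation detector, not the algorithm proposed in the paper), the voicing decision and pitch estimate for one frame can be computed as follows; the threshold and search range are arbitrary.

```python
import numpy as np

def autocorr_pda(frame, fs, fmin=60.0, fmax=400.0, voicing_threshold=0.3):
    """Toy autocorrelation pitch detector with a voiced/unvoiced decision.

    Returns (is_voiced, f0_hz).  Threshold and lag search range are arbitrary.
    """
    frame = frame - np.mean(frame)
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    if ac[0] <= 0:
        return False, 0.0
    ac /= ac[0]                                   # normalise so ac[0] == 1
    lo, hi = int(fs / fmax), min(int(fs / fmin), len(ac) - 1)
    lag = lo + int(np.argmax(ac[lo:hi + 1]))
    if ac[lag] < voicing_threshold:               # weak periodicity -> unvoiced
        return False, 0.0
    return True, fs / lag

# Example: a voiced 150 Hz frame vs. a noise frame.
fs = 16000
t = np.arange(int(0.04 * fs)) / fs
print(autocorr_pda(np.sin(2 * np.pi * 150 * t), fs))   # (True, ~150 Hz)
print(autocorr_pda(np.random.randn(len(t)), fs))       # usually (False, 0.0)
```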


Journal ArticleDOI
TL;DR: The perception of voicing in stops was found to rely strongly on phase information, while the perception of place of articulation was mainly determined by amplitude information; this demonstrates that phonetically different signals can be constructed by combining the same short-time amplitude spectra with different phase spectra.

101 citations


Journal ArticleDOI
TL;DR: It is suggested that many of the advantages to be gained from interaction between speech production and speech recognition communities will develop from integrating production models with the probabilistic analysis-by-synthesis strategy currently used by the technology community.

98 citations


Journal ArticleDOI
TL;DR: A new frequency-domain parameter, the Parabolic Spectral Parameter (PSP), for the quantification of the glottal volume velocity waveform is presented, and its performance is compared to three commonly used time-based parameters and to one previously developed frequency-domain method.
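
A rough sketch of the underlying idea, fitting a parabola to the low-frequency harmonic levels of one glottal flow period; the specific normalisation that defines the actual PSP is not reproduced here.

```python
import numpy as np

def parabolic_spectral_fit(one_period, n_harmonics=8):
    """Sketch: take the spectrum of ONE pitch period of the glottal flow, so
    FFT bin k is the k-th harmonic, and fit y(k) = a*k^2 + b to the harmonic
    levels in dB.  The curvature 'a' (dB per harmonic^2) quantifies how fast
    the low-frequency spectrum decays.  Illustration only, not the PSP itself.
    """
    x = np.asarray(one_period, dtype=float)
    mag = np.abs(np.fft.rfft(x))
    k = np.arange(1, n_harmonics + 1)
    level_db = 20.0 * np.log10(mag[k] + 1e-12)
    A = np.column_stack([k.astype(float) ** 2, np.ones(len(k))])
    (a, b), *_ = np.linalg.lstsq(A, level_db, rcond=None)
    return a, b

# Example: one period (100 samples) of a crude glottal-pulse-like waveform.
pulse = np.maximum(0.0, np.sin(np.pi * np.arange(100) / 60.0)) ** 2
print(parabolic_spectral_fit(pulse))
```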

Journal ArticleDOI
TL;DR: This paper shows how an articulatory model, able to produce acoustic signals from articulatory motion, can learn to speak, i.e. coordinate its movements in such a way that it utters meaningful sequences of sounds belonging to a given language.

Journal ArticleDOI
TL;DR: A general formulation of the multigram model is presented, applicable to single or multiple parallel strings of data with either discrete or continuous values, and is used to infer a set of variable-length acoustic units directly from speech data.
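
The decoding step of such a model can be illustrated with a small dynamic-programming sketch that segments a discrete string into variable-length units so as to maximise the product of unit probabilities; EM training of those probabilities, and the continuous-valued case, are not shown.

```python
import math

def best_multigram_segmentation(sequence, unit_probs, max_len=4):
    """Find the segmentation of `sequence` into variable-length units (keys of
    `unit_probs`, tuples mapped to probabilities) that maximises the product
    of unit probabilities, via Viterbi-style dynamic programming.
    """
    n = len(sequence)
    best = [(-math.inf, None)] * (n + 1)     # (log-prob, backpointer)
    best[0] = (0.0, None)
    for end in range(1, n + 1):
        for length in range(1, min(max_len, end) + 1):
            unit = tuple(sequence[end - length:end])
            p = unit_probs.get(unit)
            if p is None or best[end - length][0] == -math.inf:
                continue
            score = best[end - length][0] + math.log(p)
            if score > best[end][0]:
                best[end] = (score, end - length)
    if best[n][0] == -math.inf:
        return None
    units, pos = [], n                       # backtrace
    while pos > 0:
        prev = best[pos][1]
        units.append(tuple(sequence[prev:pos]))
        pos = prev
    return list(reversed(units))

units = {("a",): 0.3, ("b",): 0.2, ("a", "b"): 0.4, ("b", "a", "a"): 0.1}
print(best_multigram_segmentation("abbaa", units))  # [('a','b'), ('b','a','a')]
```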

Journal ArticleDOI
TL;DR: Bifurcations in two-mass models of the vocal folds are analyzed, and the effect of incorporating the vocal tract on the bifurcation diagrams is studied, in order to relate these features to the underlying nonlinear dynamical system.
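
For orientation, a two-mass model couples two damped mass-spring oscillators (the lower and upper portions of the fold) through a coupling stiffness and drives them aerodynamically; in generic form (notation mine, not the paper's):

```latex
m_1 \ddot{x}_1 + r_1 \dot{x}_1 + k_1 x_1 + k_c\,(x_1 - x_2) = F_1(x_1, x_2, P_s), \\
m_2 \ddot{x}_2 + r_2 \dot{x}_2 + k_2 x_2 + k_c\,(x_2 - x_1) = F_2(x_1, x_2, P_s).
```

The driving forces F_i follow from the Bernoulli pressure in the glottal channel, with P_s the subglottal pressure; sweeping control parameters such as P_s or left-right asymmetry of masses and stiffnesses is what produces bifurcation diagrams of the kind studied here.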

Journal ArticleDOI
TL;DR: Whether the first words are similar to babbling in all respects was evaluated in four subjects, using a database of 152 hours of audio recordings; a tendency towards increasing use of labial consonants relative to alveolar consonants was observed.

Journal ArticleDOI
TL;DR: The need for standardisation in speech synthesizers, and how this will help system builders make better use of synthesis, is discussed, along with the features of SSML (which is based on SGML, the Standard Generalized Markup Language).

Journal ArticleDOI
TL;DR: It is argued that spoken language interfaces (SLIs) are essential to making this vision of ubiquitous access to multimedia communication services between people and machines a reality.

Journal ArticleDOI
TL;DR: The study reveals that in human-machine interactions, both discourse segmentation and utterance purpose can have particular prosodic correlates, although speakers also mark this information through choice of wording.

Journal ArticleDOI
Anne Cutler
TL;DR: The study of spoken-language processing by human listeners requires cross-linguistic comparison, and aspects of the universal processing model are revealed by analysis of language-specific effects.

Journal ArticleDOI
TL;DR: The decision rule is implemented in a multi-level approach combining a state-of-the-art speech recognizer with an N-best search algorithm, which is also described in this article.

Journal ArticleDOI
TL;DR: The results obtained prove that HMM adaptation and preprocessing techniques can be advantageously combined to improve Automatic Speech Recognition (ASR) robustness and show that spectral subtraction improves speech detection under noisy GSM conditions.
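
A minimal sketch of magnitude spectral subtraction (a generic textbook variant, not necessarily the exact preprocessing used in the paper): the noise magnitude spectrum is estimated from an assumed speech-free stretch at the start of the signal, subtracted from each frame, and the frames are resynthesised with the noisy phase.

```python
import numpy as np

def spectral_subtraction(noisy, fs, noise_seconds=0.25,
                         frame_len=256, hop=128, floor=0.02, oversub=1.0):
    """Generic magnitude spectral subtraction with overlap-add resynthesis.

    Assumes the first `noise_seconds` of `noisy` contain no speech, and that
    the signal is at least a few frames long.
    """
    win = np.hanning(frame_len)
    starts = range(0, len(noisy) - frame_len, hop)
    spectra = [np.fft.rfft(noisy[i:i + frame_len] * win) for i in starts]
    n_noise = max(1, int(noise_seconds * fs) // hop)
    noise_mag = np.mean([np.abs(s) for s in spectra[:n_noise]], axis=0)

    out = np.zeros(len(noisy))
    for start, s in zip(starts, spectra):
        mag, phase = np.abs(s), np.angle(s)
        # Subtract the noise estimate, keeping a small spectral floor.
        clean_mag = np.maximum(mag - oversub * noise_mag, floor * mag)
        out[start:start + frame_len] += np.fft.irfft(
            clean_mag * np.exp(1j * phase), n=frame_len)
    # Hann analysis window at 50% overlap sums to ~1, so plain overlap-add
    # approximately reconstructs the signal.
    return out
```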

Journal ArticleDOI
TL;DR: The Dial-Your-Disc system is presented, an interactive system that supports browsing through a large database of musical information and generates a spoken monologue once a musical composition has been selected.

Journal ArticleDOI
TL;DR: The analysis of the average long-term spectrum of the successfully filtered sequences reveals a combined effect of equalization and band selection that provides insight into TSSP filtering; it is also shown that, when supplementary differential parameters are not used, the recognition rate can be improved even for clean speech simply by properly filtering the TSSPs.
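
Reading TSSP as the time sequence of each spectral parameter (an assumption about the acronym), the kind of filtering involved can be sketched as applying one FIR filter along the time axis of a feature matrix; the filter below is an arbitrary zero-DC band-pass, not one evaluated in the paper.

```python
import numpy as np

def filter_parameter_tracks(features, b):
    """Apply the same FIR filter `b` along the time axis of a
    (frames x coefficients) feature matrix, i.e. filter each spectral
    parameter's time trajectory independently.
    """
    feats = np.asarray(features, dtype=float)
    out = np.empty_like(feats)
    for j in range(feats.shape[1]):
        out[:, j] = np.convolve(feats[:, j], b, mode="same")
    return out

# Example: an anti-symmetric filter (zero DC gain) removes a constant
# per-utterance offset (slowly varying channel) from fake cepstral features.
frames = np.random.randn(200, 13) + 5.0
b = np.array([1.0, 0.5, 0.0, -0.5, -1.0]) / 2.0
filtered = filter_parameter_tracks(frames, b)
print(abs(filtered.mean()) < abs(frames.mean()))   # offset is attenuated
```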

Journal ArticleDOI
TL;DR: The RailTel system developed at LIMSI to provide vocal access to static train timetable information in French is described, and a field trial carried out to assess the technical adequacy of available speech technology for interactive services is described.

Journal ArticleDOI
TL;DR: It is shown empirically that an automated spoken questionnaire could successfully collect and recognize census data, and that subjects preferred the spoken system to written questionnaires.

Journal ArticleDOI
TL;DR: It is shown that a small set of modality properties is surprisingly powerful in justifying, supporting and correcting the set of claims, and it is argued that their power could be made available to systems and interface designers who have to make modality choices during the early design of speech-related systems and interfaces.

Journal ArticleDOI
TL;DR: The results indicate the validity of the prominence based approach as an interface between linguistics and acoustics, and two algorithms to transform prominence values to prosodic parameters are evaluated.

Journal ArticleDOI
TL;DR: A discriminant analysis of jitter time series extracted from 279 sustained vocoids shows that the jitter features which separately describe the predictable and random components better characterise healthy and dysphonic speakers than a traditional jitter feature.
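
A sketch of the kind of measurements involved: a traditional local jitter value and a crude split of the period perturbation into a predictable part (a least-squares autoregressive fit of the period track) and a random residual. The decomposition actually used in the paper is not reproduced here.

```python
import numpy as np

def jitter_features(periods_ms, order=2):
    """From a sequence of fundamental period durations (ms), return
    (traditional local jitter, predictable perturbation, random perturbation),
    all relative to the mean period.  Illustration only.
    """
    T = np.asarray(periods_ms, dtype=float)
    mean_T = T.mean()

    # Traditional local jitter: mean absolute difference of consecutive
    # periods, relative to the mean period.
    local_jitter = np.mean(np.abs(np.diff(T))) / mean_T

    # Predict each period from its `order` predecessors by least squares.
    X = np.column_stack(
        [T[order - k - 1: len(T) - k - 1] for k in range(order)]
        + [np.ones(len(T) - order)])
    y = T[order:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ coef
    residual = y - fitted

    predictable = np.std(fitted) / mean_T     # slowly varying, predictable part
    random_part = np.std(residual) / mean_T   # remaining random perturbation
    return local_jitter, predictable, random_part

# Example: a slow (predictable) drift plus small random perturbations.
rng = np.random.default_rng(0)
T = 10.0 + 0.2 * np.sin(0.2 * np.arange(200)) + 0.02 * rng.standard_normal(200)
print(jitter_features(T))
```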

Journal ArticleDOI
TL;DR: Several modifications to the Klatt synthesizer may improve synthesis of pathological voices, including providing jitter and shimmer parameters; updating synthesis parameters as a function of period, rather than absolute time; modeling diplophonia with independent parameters for fundamental frequency and amplitude variations; providing a parameter to increase low-frequency energy; and adding more pole-zero pairs.
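
To make the first two proposed parameters concrete, the sketch below draws per-period F0 and amplitude values with a given jitter (in percent) and shimmer (in dB), i.e. the perturbations are updated once per fundamental period rather than at fixed time steps. It is an illustration only, not Klatt-synthesizer code.

```python
import numpy as np

def perturb_per_period(f0_hz, amp, n_periods, jitter_pct=1.0, shimmer_db=0.5,
                       rng=None):
    """Generate per-period F0 and amplitude tracks with random jitter
    (percent of F0) and shimmer (dB of amplitude)."""
    rng = np.random.default_rng() if rng is None else rng
    f0_track = f0_hz * (1.0 + (jitter_pct / 100.0)
                        * rng.standard_normal(n_periods))
    amp_track = amp * 10.0 ** ((shimmer_db / 20.0)
                               * rng.standard_normal(n_periods))
    return f0_track, amp_track

f0s, amps = perturb_per_period(120.0, 1.0, n_periods=50,
                               jitter_pct=2.0, shimmer_db=1.0)
print(f0s[:3], amps[:3])
```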

Journal ArticleDOI
TL;DR: The performance of telephone speech recognition using Bayesian adaptation is shown to be superior to that using maximum-likelihood adaptation and the affine transformation is also demonstrated to be significantly better than the bias transformation.
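
To illustrate the difference between the two transform families (not the paper's Bayesian estimation), the sketch below estimates a full affine transform y ≈ Ax + b between adaptation frames and a reference by plain least squares; a bias-only transform would fix A = I and estimate only b.

```python
import numpy as np

def estimate_affine(source, target):
    """Least-squares estimate of y ~ A x + b mapping source frames to target
    frames (rows are frames).  A Bayesian (MAP) variant would add a prior
    over (A, b); that is not shown here.
    """
    X = np.column_stack([source, np.ones(len(source))])   # append 1 for bias
    W, *_ = np.linalg.lstsq(X, target, rcond=None)         # (d+1) x d
    return W[:-1].T, W[-1]                                  # A, b

# Toy check: recover a known affine distortion of random "cepstral" frames.
rng = np.random.default_rng(0)
x = rng.standard_normal((500, 4))
A_true = np.eye(4) + 0.1 * rng.standard_normal((4, 4))
b_true = rng.standard_normal(4)
y = x @ A_true.T + b_true
A_hat, b_hat = estimate_affine(x, y)
print(np.allclose(A_hat, A_true), np.allclose(b_hat, b_true))  # True True
```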