
Showing papers in "Speech Communication in 1999"


Journal ArticleDOI
TL;DR: A set of simple new procedures has been developed to enable the real-time manipulation of speech parameters by using pitch-adaptive spectral analysis combined with a surface reconstruction method in the time–frequency region.

1,741 citations
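
As a rough illustration of the pitch-adaptive analysis idea (not the method proposed in the paper itself), the sketch below sets the analysis window at each frame to a fixed number of local pitch periods before computing the spectrum. The frame hop, the assumed F0-track input, and the three-period window are illustrative choices only.

```python
import numpy as np

def pitch_adaptive_spectra(x, fs, f0_track, hop=0.005, periods_per_window=3):
    """Illustrative pitch-adaptive analysis: at each frame the window length is set
    to a fixed number of local pitch periods before taking the FFT, so spectral
    smoothing follows the fundamental frequency instead of a fixed window size.
    f0_track is an assumed input: one F0 value (Hz) per analysis frame."""
    hop_n = int(hop * fs)
    spectra = []
    for i, f0 in enumerate(f0_track):
        center = i * hop_n
        win_len = int(periods_per_window * fs / max(f0, 50.0))  # guard against F0 = 0
        start, stop = max(center - win_len // 2, 0), min(center + win_len // 2, len(x))
        frame = x[start:stop] * np.hanning(stop - start)
        spectra.append(np.abs(np.fft.rfft(frame, n=2048)))
    return np.array(spectra)
```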


Journal ArticleDOI
TL;DR: Systematic analysis of pronunciation variation in a corpus of spontaneous English discourse (Switchboard) demonstrates that the variation observed is more systematic at the level of the syllable than at the phonetic-segment level, and syllabic onsets are realized in canonical form far more frequently than either coda or nuclear constituents.

373 citations


Journal ArticleDOI
TL;DR: A method and apparatus are provided for automatically acquiring grammar fragments for recognizing and understanding fluently spoken language.

334 citations


Journal ArticleDOI
TL;DR: This contribution provides an overview of the publications on pronunciation variation modeling in automatic speech recognition, paying particular attention to the papers in this special issue and the papers presented at 'the Rolduc workshop'.

259 citations


Journal ArticleDOI
TL;DR: A novel method is proposed which finds accurate alignments between source and target speaker utterances and uses them to modify the utterance of a source speaker so that it sounds like speech from the target speaker.

181 citations
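
A minimal sketch of one common way to align source and target utterances frame by frame: dynamic time warping over per-frame feature vectors (e.g. MFCCs). This is a generic DTW, not the paper's alignment procedure, and the feature matrices are assumed inputs.

```python
import numpy as np

def dtw_align(src_feats, tgt_feats):
    """Align two utterances with dynamic time warping over per-frame feature
    vectors; returns a list of matched frame index pairs (i, j)."""
    n, m = len(src_feats), len(tgt_feats)
    dist = np.linalg.norm(src_feats[:, None, :] - tgt_feats[None, :, :], axis=-1)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i, j] = dist[i - 1, j - 1] + min(cost[i - 1, j],
                                                  cost[i, j - 1],
                                                  cost[i - 1, j - 1])
    # Backtrack the cheapest path to recover the frame-to-frame alignment
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]
```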


Journal ArticleDOI
TL;DR: This work argues that pronunciations in spontaneous speech are dynamic and that ASR systems should change models in accordance with contextual factors, and confirms the intuition that variations in these factors correlate with changes in ASR system performance for both the Switchboard and Broadcast News corpora.

152 citations


Journal ArticleDOI
TL;DR: Several approaches were described, including a hybrid approach in which a decision-tree model was used to automatically phonetically transcribe a much larger speech corpus than ICSI, and then the multiword approach was used to construct an ASR recognition pronunciation lexicon.

142 citations


Journal ArticleDOI
TL;DR: A real-time beat-tracking system that detects a hierarchical beat structure in musical audio signals without drum sounds is proposed, together with a method of detecting chord changes that does not require chord names to be identified.

136 citations
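
The sketch below shows the generic ingredients of audio beat tracking (an onset-strength envelope and a tempo estimate from its autocorrelation); it is not the hierarchical, drumless system described above, and the STFT sizes and tempo range are assumed values.

```python
import numpy as np
from scipy.signal import stft

def onset_strength(x, fs, nperseg=1024, hop=512):
    """Spectral-flux onset-strength envelope and its frame rate (frames/second)."""
    _, _, X = stft(x, fs, nperseg=nperseg, noverlap=nperseg - hop)
    mag = np.abs(X)
    flux = np.maximum(np.diff(mag, axis=1), 0.0).sum(axis=0)
    return flux, fs / hop

def estimate_tempo(envelope, env_rate, min_bpm=60, max_bpm=180):
    """Pick the autocorrelation lag of the onset envelope inside a plausible tempo range."""
    env = envelope - envelope.mean()
    ac = np.correlate(env, env, mode="full")[len(env) - 1:]
    min_lag = int(env_rate * 60.0 / max_bpm)
    max_lag = int(env_rate * 60.0 / min_bpm)
    lag = min_lag + int(np.argmax(ac[min_lag:max_lag]))
    return 60.0 * env_rate / lag  # tempo in beats per minute
```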


Journal ArticleDOI
TL;DR: Most of the useful linguistic information is in modulation frequency components from the range between 1 and 16 Hz, with the dominant component at around 4 Hz, and in some realistic environments, the use of components from the range below 2 Hz or above 16 Hz can degrade the recognition accuracy.

135 citations
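
A minimal sketch of exploiting this finding: band-pass filtering log-spectral trajectories in the modulation-frequency domain so that roughly the 1-16 Hz components are kept. The 100 Hz frame rate and the Butterworth design below are assumptions, not the paper's filtering scheme.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass_modulation(log_spec, frame_rate=100.0, low_hz=1.0, high_hz=16.0, order=4):
    """Band-pass filter log-spectral trajectories (frames x bands) in the
    modulation-frequency domain, keeping roughly the 1-16 Hz components that
    carry most of the linguistic information."""
    nyq = frame_rate / 2.0
    b, a = butter(order, [low_hz / nyq, high_hz / nyq], btype="band")
    return filtfilt(b, a, log_spec, axis=0)  # zero-phase filtering along time, per band

# Illustrative use on a random "spectrogram" of log energies (500 frames x 20 bands)
log_spec = np.random.randn(500, 20)
filtered = bandpass_modulation(log_spec)
```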


Journal ArticleDOI
TL;DR: There is evidence that the underlying assumption of additive (mutually independent) contributions from a number of frequency bands is not optimal and may lead to erroneous prediction of the intelligibility for conditions with a limited or discontinuous frequency transfer.

105 citations
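
To make the additivity assumption concrete, the toy function below predicts intelligibility as a weighted sum of independent per-band audibilities, in the spirit of AI/SII-style indices; the band-importance weights are invented for illustration. Two conditions with very different (discontinuous) band layouts can receive the same predicted score, which is the kind of case where the additivity assumption is reported to break down.

```python
import numpy as np

def additive_intelligibility_index(band_audibility, band_importance):
    """Illustrative additive prediction: intelligibility modeled as a weighted sum
    of independent per-band audibility contributions (clipped to [0, 1])."""
    band_importance = np.asarray(band_importance, dtype=float)
    band_importance /= band_importance.sum()
    return float(np.sum(band_importance * np.clip(band_audibility, 0.0, 1.0)))

weights = [0.1, 0.2, 0.3, 0.2, 0.2]          # invented band-importance weights
print(additive_intelligibility_index([1, 1, 0, 0, 0], weights))  # 0.3
print(additive_intelligibility_index([0, 0, 1, 0, 0], weights))  # also 0.3
```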


Journal ArticleDOI
TL;DR: It is demonstrated that the model can replicate listeners' perception of interleaved melodies, and is also able to segregate melodic lines from polyphonic, multi-timbral audio recordings.

Journal ArticleDOI
TL;DR: The acoustic results suggest that articulatory reduction will decrease the intelligibility of consonants and vowels in comparable ways.

Journal ArticleDOI
TL;DR: This paper proposes a process in which the periodic sounds are canceled in turn (multistep cancellation model) or simultaneously (joint cancellation model), which is guaranteed to find all periods, except in certain situations for which the stimulus is inherently ambiguous.
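
A toy version of the cancellation principle, assuming a single-channel signal whose pitch periods fall inside the chosen lag range: a one-tap cancellation filter removes a candidate periodicity, the lag with the smallest residual is taken as an estimated period, and the multistep variant repeats the process on the residual. This is only a sketch of the idea, not the authors' model.

```python
import numpy as np

def cancellation_residual(x, lag):
    """Residual power after a one-tap cancellation filter y[n] = x[n] - x[n - lag]."""
    d = x[lag:] - x[:-lag]
    return np.mean(d ** 2)

def estimate_period(x, min_lag=40, max_lag=400):
    """Pick the lag whose cancellation residual is smallest (single-period case)."""
    lags = np.arange(min_lag, max_lag)
    residuals = np.array([cancellation_residual(x, lag) for lag in lags])
    return int(lags[np.argmin(residuals)])

def multistep_cancellation(x, n_sources=2, **kw):
    """Toy multistep variant: estimate a period, cancel it, then repeat on the residual."""
    periods = []
    for _ in range(n_sources):
        period = estimate_period(x, **kw)
        periods.append(period)
        x = x[period:] - x[:-period]  # cancel the detected periodicity before the next pass
    return periods
```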

Journal ArticleDOI
TL;DR: The objective is to selectively enhance the high signal-to-noise ratio (SNR) regions in the noisy speech in the temporal and spectral domains, without causing significant distortion in the resulting enhanced speech.
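
The sketch below shows a generic SNR-weighted spectral gain (noise estimated from the first few frames, a Wiener-like gain per time-frequency bin) as one simple way to favour high-SNR regions; it is not the enhancement scheme proposed in the paper, and the FFT size, noise-frame count, and gain floor are assumed values.

```python
import numpy as np
from scipy.signal import stft, istft

def snr_weighted_enhance(noisy, fs, noise_frames=10, floor=0.1):
    """Generic SNR-weighted spectral gain: bins with high estimated SNR are kept,
    low-SNR bins are attenuated toward a spectral floor to limit distortion."""
    f, t, X = stft(noisy, fs, nperseg=512)
    noise_psd = np.mean(np.abs(X[:, :noise_frames]) ** 2, axis=1, keepdims=True)
    snr = np.maximum(np.abs(X) ** 2 / (noise_psd + 1e-12) - 1.0, 0.0)  # crude SNR estimate
    gain = np.maximum(snr / (1.0 + snr), floor)                        # Wiener-like gain
    _, enhanced = istft(gain * X, fs, nperseg=512)
    return enhanced
```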

Journal ArticleDOI
TL;DR: The perceptual importance of modulations in speech resonances is investigated and it is shown that amplitude modulation patterns are both speaker and phone dependent.

Journal ArticleDOI
TL;DR: How the performance of a Dutch continuous speech recognizer was improved by modeling pronunciation variation is described, which consists of adding pronunciation variants to the lexicon, retraining phone models and using language models to which the pronunciation variants have been added.

Journal ArticleDOI
TL;DR: An adaptive method for template matching that can cope with variability in musical sounds is proposed; it is applicable to real performances of ensemble music, and musical context integration based on Bayesian probabilistic networks is discussed.

Journal ArticleDOI
TL;DR: A preliminary investigation supports the argument that successful scene analysis must exploit such abstract knowledge at every level.

Journal ArticleDOI
TL;DR: To measure the need for variants, the authors define the variant2+ rate: the percentage of words in the corpus that are not aligned with their most common phonemic transcription, which may be indicative of the need for pronunciation variants in the recognition system.
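
A small sketch of how such a rate could be computed from an aligned corpus, assuming a list of (word, realized transcription) tokens; the data and the exact definition here are illustrative.

```python
from collections import Counter, defaultdict

def variant2plus_rate(aligned_tokens):
    """aligned_tokens: list of (word, phonemic_transcription) pairs from a corpus
    alignment. Returns the percentage of tokens whose transcription is NOT the
    word's most common (modal) transcription."""
    by_word = defaultdict(Counter)
    for word, trans in aligned_tokens:
        by_word[word][trans] += 1
    non_modal = sum(sum(counts.values()) - counts.most_common(1)[0][1]
                    for counts in by_word.values())
    return 100.0 * non_modal / len(aligned_tokens)

tokens = [("the", "dh ah"), ("the", "dh ah"), ("the", "dh iy"),
          ("and", "ae n d"), ("and", "ah n")]
print(variant2plus_rate(tokens))  # 2 of 5 tokens deviate from the modal form -> 40.0
```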

Journal ArticleDOI
TL;DR: Experimental results show that the method reduces the spectrum distortions and the fundamental frequency errors compared to an existing monaural system, and that it can segregate three simultaneous harmonic streams with only two microphones.

Journal ArticleDOI
TL;DR: A joint solution is presented to the related problems of learning a unit inventory and a corresponding lexicon from data; on a speaker-independent read-speech task with a 1k vocabulary, the proposed algorithm outperforms phone-based systems at both high and low complexities.

Journal ArticleDOI
TL;DR: The findings suggest that a lexical stress detector is of little use in a single-pass decoder of an automatic speech recognition (ASR) system, but could still play a useful role as an additional knowledge source in a multi-pass decoder.

Journal ArticleDOI
TL;DR: A maximum-likelihood-based algorithm is presented for fully automatic, data-driven modelling of pronunciation, given a set of subword hidden Markov models (HMMs) and acoustic tokens of a word, creating a consistent framework for the optimisation of automatic speech recognition systems.

Journal ArticleDOI
TL;DR: A method is presented for upgrading initially simple pronunciation models to new models that can explain several pronunciation variants of each word; the introduction of such variants in a segment-based recognizer significantly improves the recognition accuracy.

Journal ArticleDOI
Mehryar Mohri1, Michael Riley1
TL;DR: Two new algorithms are described: weighted determinization and minimization, which transform recognition labeled networks into equivalent ones that require much less time and space in large-vocabulary speech recognition.

Journal ArticleDOI
Sangho Lee1, Yung-Hwan Oh1
TL;DR: To evaluate tree-based modeling of prosodic phrasing, pause duration between phrases, and segmental duration for Korean TTS systems, trees were trained and tested on the output of the text analyzer and their effectiveness was measured.

Journal ArticleDOI
TL;DR: It is concluded that hiatus and diphthong are two phonetic categories which can be described on the basis of their acoustic characteristics and are subject, like any other phonetic category, to modifications due to a change in the communicative situation.

Journal ArticleDOI
TL;DR: This paper proposes a method for automatically generating a pronunciation dictionary based on a pronunciation neural network that can predict plausible pronunciations from the canonical pronunciation; the resulting dictionary gives consistently higher recognition rates than a conventional dictionary.

Journal ArticleDOI
TL;DR: A new representation of speech that is invariant to noise is introduced, and the proposed features are shown to be superior to other robust representations and compensation techniques.

Journal ArticleDOI
TL;DR: It is found that cry vocalizations of hearing-impaired infants differ from those of their counterparts with normal hearing abilities due to the lack of auditory feedback, and melodic and rhythmic parameters are extracted which differ significantly for the two infant groups.