
Showing papers in "Speech Communication in 1999"


Journal ArticleDOI
TL;DR: A set of simple new procedures has been developed to enable the real-time manipulation of speech parameters by using pitch-adaptive spectral analysis combined with a surface reconstruction method in the time–frequency region.

1,741 citations
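
As a rough illustration of the pitch-adaptive analysis idea (not the method proposed in the paper itself), the sketch below sets the analysis window at each frame to a fixed number of local pitch periods before computing the spectrum. The frame hop, the assumed F0-track input, and the three-period window are illustrative choices only.

```python
import numpy as np

def pitch_adaptive_spectra(x, fs, f0_track, hop=0.005, periods_per_window=3):
    """Illustrative pitch-adaptive analysis: at each frame the window length is set
    to a fixed number of local pitch periods before taking the FFT, so spectral
    smoothing follows the fundamental frequency instead of a fixed window size.
    f0_track is an assumed input: one F0 value (Hz) per analysis frame."""
    hop_n = int(hop * fs)
    spectra = []
    for i, f0 in enumerate(f0_track):
        center = i * hop_n
        win_len = int(periods_per_window * fs / max(f0, 50.0))  # guard against F0 = 0
        start, stop = max(center - win_len // 2, 0), min(center + win_len // 2, len(x))
        frame = x[start:stop] * np.hanning(stop - start)
        spectra.append(np.abs(np.fft.rfft(frame, n=2048)))
    return np.array(spectra)
```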


Journal ArticleDOI
TL;DR: Systematic analysis of pronunciation variation in a corpus of spontaneous English discourse (Switchboard) demonstrates that the variation observed is more systematic at the level of the syllable than at the phonetic-segment level, and syllabic onsets are realized in canonical form far more frequently than either coda or nuclear constituents.

373 citations


Journal ArticleDOI
TL;DR: A method and apparatus are provided for automatically acquiring grammar fragments for recognizing and understanding fluently spoken language.

334 citations


Journal ArticleDOI
TL;DR: This contribution provides an overview of the publications on pronunciation variation modeling in automatic speech recognition, paying particular attention to the papers in this special issue and the papers presented at 'the Rolduc workshop'.

259 citations


Journal ArticleDOI
TL;DR: A novel method is proposed which finds accurate alignments between source and target speaker utterances and uses them to modify the utterance of a source speaker so that it sounds like speech from the target speaker.

181 citations
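
A minimal sketch of one common way to align source and target utterances frame by frame: dynamic time warping over per-frame feature vectors (e.g. MFCCs). This is a generic DTW, not the paper's alignment procedure, and the feature matrices are assumed inputs.

```python
import numpy as np

def dtw_align(src_feats, tgt_feats):
    """Align two utterances with dynamic time warping over per-frame feature
    vectors; returns a list of matched frame index pairs (i, j)."""
    n, m = len(src_feats), len(tgt_feats)
    dist = np.linalg.norm(src_feats[:, None, :] - tgt_feats[None, :, :], axis=-1)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i, j] = dist[i - 1, j - 1] + min(cost[i - 1, j],
                                                  cost[i, j - 1],
                                                  cost[i - 1, j - 1])
    # Backtrack the cheapest path to recover the frame-to-frame alignment
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]
```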


Journal ArticleDOI
TL;DR: This work argues that pronunciations in spontaneous speech are dynamic and that ASR systems should change models in accordance with contextual factors, and confirms the intuition that variations in these factors correlate with changes in ASR system performance for both the Switchboard and Broadcast News corpora.

152 citations


Journal ArticleDOI
TL;DR: Several approaches were described, including a hybrid approach in which a decision-tree model was used to automatically phonetically transcribe a much larger speech corpus than ICSI, and then the multiword approach was used to construct an ASR recognition pronunciation lexicon.

142 citations


Journal ArticleDOI
TL;DR: A real-time beat-tracking system that detects a hierarchical beat structure in musical audio signals without drum sounds is proposed, together with a method of detecting chord changes that does not require chord names to be identified.

136 citations
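
The sketch below shows the generic ingredients of audio beat tracking (an onset-strength envelope and a tempo estimate from its autocorrelation); it is not the hierarchical, drumless system described above, and the STFT sizes and tempo range are assumed values.

```python
import numpy as np
from scipy.signal import stft

def onset_strength(x, fs, nperseg=1024, hop=512):
    """Spectral-flux onset-strength envelope and its frame rate (frames/second)."""
    _, _, X = stft(x, fs, nperseg=nperseg, noverlap=nperseg - hop)
    mag = np.abs(X)
    flux = np.maximum(np.diff(mag, axis=1), 0.0).sum(axis=0)
    return flux, fs / hop

def estimate_tempo(envelope, env_rate, min_bpm=60, max_bpm=180):
    """Pick the autocorrelation lag of the onset envelope inside a plausible tempo range."""
    env = envelope - envelope.mean()
    ac = np.correlate(env, env, mode="full")[len(env) - 1:]
    min_lag = int(env_rate * 60.0 / max_bpm)
    max_lag = int(env_rate * 60.0 / min_bpm)
    lag = min_lag + int(np.argmax(ac[min_lag:max_lag]))
    return 60.0 * env_rate / lag  # tempo in beats per minute
```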


Journal ArticleDOI
TL;DR: Most of the useful linguistic information is in modulation frequency components from the range between 1 and 16 Hz, with the dominant component at around 4 Hz, and in some realistic environments, the use of components from the range below 2 Hz or above 16 Hz can degrade the recognition accuracy.

135 citations
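
A minimal sketch of exploiting this finding: band-pass filtering log-spectral trajectories in the modulation-frequency domain so that roughly the 1-16 Hz components are kept. The 100 Hz frame rate and the Butterworth design below are assumptions, not the paper's filtering scheme.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass_modulation(log_spec, frame_rate=100.0, low_hz=1.0, high_hz=16.0, order=4):
    """Band-pass filter log-spectral trajectories (frames x bands) in the
    modulation-frequency domain, keeping roughly the 1-16 Hz components that
    carry most of the linguistic information."""
    nyq = frame_rate / 2.0
    b, a = butter(order, [low_hz / nyq, high_hz / nyq], btype="band")
    return filtfilt(b, a, log_spec, axis=0)  # zero-phase filtering along time, per band

# Illustrative use on a random "spectrogram" of log energies (500 frames x 20 bands)
log_spec = np.random.randn(500, 20)
filtered = bandpass_modulation(log_spec)
```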


Journal ArticleDOI
TL;DR: There is evidence that the underlying assumption of additive (mutually independent) contributions from a number of frequency bands is not optimal and may lead to erroneous prediction of the intelligibility for conditions with a limited or discontinuous frequency transfer.

105 citations
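
To make the additivity assumption concrete, the toy function below predicts intelligibility as a weighted sum of independent per-band audibilities, in the spirit of AI/SII-style indices; the band-importance weights are invented for illustration. Two conditions with very different (discontinuous) band layouts can receive the same predicted score, which is the kind of case where the additivity assumption is reported to break down.

```python
import numpy as np

def additive_intelligibility_index(band_audibility, band_importance):
    """Illustrative additive prediction: intelligibility modeled as a weighted sum
    of independent per-band audibility contributions (clipped to [0, 1])."""
    band_importance = np.asarray(band_importance, dtype=float)
    band_importance /= band_importance.sum()
    return float(np.sum(band_importance * np.clip(band_audibility, 0.0, 1.0)))

weights = [0.1, 0.2, 0.3, 0.2, 0.2]          # invented band-importance weights
print(additive_intelligibility_index([1, 1, 0, 0, 0], weights))  # 0.3
print(additive_intelligibility_index([0, 0, 1, 0, 0], weights))  # also 0.3
```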


Journal ArticleDOI
TL;DR: It is demonstrated that the model can replicate listeners' perception of interleaved melodies, and is also able to segregate melodic lines from polyphonic, multi-timbral audio recordings.

Journal ArticleDOI
TL;DR: The acoustic results suggest that articulatory reduction will decrease the intelligibility of consonants and vowels in comparable ways.

Journal ArticleDOI
TL;DR: This paper proposes a process in which the periodic sounds are canceled in turn (multistep cancellation model) or simultaneously (joint cancellation model), which is guaranteed to find all periods, except in certain situations for which the stimulus is inherently ambiguous.
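
A toy version of the cancellation principle, assuming a single-channel signal whose pitch periods fall inside the chosen lag range: a one-tap cancellation filter removes a candidate periodicity, the lag with the smallest residual is taken as an estimated period, and the multistep variant repeats the process on the residual. This is only a sketch of the idea, not the authors' model.

```python
import numpy as np

def cancellation_residual(x, lag):
    """Residual power after a one-tap cancellation filter y[n] = x[n] - x[n - lag]."""
    d = x[lag:] - x[:-lag]
    return np.mean(d ** 2)

def estimate_period(x, min_lag=40, max_lag=400):
    """Pick the lag whose cancellation residual is smallest (single-period case)."""
    lags = np.arange(min_lag, max_lag)
    residuals = np.array([cancellation_residual(x, lag) for lag in lags])
    return int(lags[np.argmin(residuals)])

def multistep_cancellation(x, n_sources=2, **kw):
    """Toy multistep variant: estimate a period, cancel it, then repeat on the residual."""
    periods = []
    for _ in range(n_sources):
        period = estimate_period(x, **kw)
        periods.append(period)
        x = x[period:] - x[:-period]  # cancel the detected periodicity before the next pass
    return periods
```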

Journal ArticleDOI
TL;DR: The objective is to selectively enhance the high signal-to-noise ratio (SNR) regions in the noisy speech in the temporal and spectral domains, without causing significant distortion in the resulting enhanced speech.
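
The sketch below shows a generic SNR-weighted spectral gain (noise estimated from the first few frames, a Wiener-like gain per time-frequency bin) as one simple way to favour high-SNR regions; it is not the enhancement scheme proposed in the paper, and the FFT size, noise-frame count, and gain floor are assumed values.

```python
import numpy as np
from scipy.signal import stft, istft

def snr_weighted_enhance(noisy, fs, noise_frames=10, floor=0.1):
    """Generic SNR-weighted spectral gain: bins with high estimated SNR are kept,
    low-SNR bins are attenuated toward a spectral floor to limit distortion."""
    f, t, X = stft(noisy, fs, nperseg=512)
    noise_psd = np.mean(np.abs(X[:, :noise_frames]) ** 2, axis=1, keepdims=True)
    snr = np.maximum(np.abs(X) ** 2 / (noise_psd + 1e-12) - 1.0, 0.0)  # crude SNR estimate
    gain = np.maximum(snr / (1.0 + snr), floor)                        # Wiener-like gain
    _, enhanced = istft(gain * X, fs, nperseg=512)
    return enhanced
```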

Journal ArticleDOI
TL;DR: The perceptual importance of modulations in speech resonances is investigated and it is shown that amplitude modulation patterns are both speaker and phone dependent.

Journal ArticleDOI
TL;DR: How the performance of a Dutch continuous speech recognizer was improved by modeling pronunciation variation is described, which consists of adding pronunciation variants to the lexicon, retraining phone models and using language models to which the pronunciation variants have been added.

Journal ArticleDOI
TL;DR: An adaptive method for template matching that can cope with variability in musical sounds is proposed; it is applicable to real performances of ensemble music, and musical context integration based on Bayesian probabilistic networks is discussed.

Journal ArticleDOI
TL;DR: A preliminary investigation supports the argument that successful scene analysis must exploit such abstract knowledge at every level.

Journal ArticleDOI
TL;DR: To measure the need for variants, the authors define the variant2+ rate: the percentage of words in the corpus that are not aligned with their most common phonemic transcription, which may be indicative of the need for pronunciation variants in the recognition system.
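
A small sketch of how such a rate could be computed from an aligned corpus, assuming a list of (word, realized transcription) tokens; the data and the exact definition here are illustrative.

```python
from collections import Counter, defaultdict

def variant2plus_rate(aligned_tokens):
    """aligned_tokens: list of (word, phonemic_transcription) pairs from a corpus
    alignment. Returns the percentage of tokens whose transcription is NOT the
    word's most common (modal) transcription."""
    by_word = defaultdict(Counter)
    for word, trans in aligned_tokens:
        by_word[word][trans] += 1
    non_modal = sum(sum(counts.values()) - counts.most_common(1)[0][1]
                    for counts in by_word.values())
    return 100.0 * non_modal / len(aligned_tokens)

tokens = [("the", "dh ah"), ("the", "dh ah"), ("the", "dh iy"),
          ("and", "ae n d"), ("and", "ah n")]
print(variant2plus_rate(tokens))  # 2 of 5 tokens deviate from the modal form -> 40.0
```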

Journal ArticleDOI
TL;DR: Experimental results show that the method reduces the spectrum distortions and the fundamental frequency errors compared to an existing monaural system, and that it can segregate three simultaneous harmonic streams with only two microphones.

Journal ArticleDOI
TL;DR: A joint solution is presented to the related problems of learning a unit inventory and a corresponding lexicon from data; on a speaker-independent read-speech task with a 1k vocabulary, the proposed algorithm outperforms phone-based systems at both high and low complexities.

Journal ArticleDOI
TL;DR: The findings suggest that a lexical stress detector is of little use in a single-pass decoder of an automatic speech recognition (ASR) system, but could still play a useful role as an additional knowledge source in a multi-pass decoder.

Journal ArticleDOI
TL;DR: A maximum-likelihood-based algorithm is presented for fully automatic, data-driven modelling of pronunciation, given a set of subword hidden Markov models (HMMs) and acoustic tokens of a word, creating a consistent framework for the optimisation of automatic speech recognition systems.

Journal ArticleDOI
TL;DR: A method is presented for upgrading initially simple pronunciation models to new models that can explain several pronunciation variants of each word; the introduction of such variants in a segment-based recognizer significantly improves the recognition accuracy.

Journal ArticleDOI
Mehryar Mohri1, Michael Riley1
TL;DR: Two new algorithms are described: weighted determinization and minimization, which transform recognition labeled networks into equivalent ones that require much less time and space in large-vocabulary speech recognition.

Journal ArticleDOI
Sangho Lee1, Yung-Hwan Oh1
TL;DR: To evaluate tree-based modeling of prosodic phrasing, pause duration between phrases, and segmental duration for Korean TTS systems, trees were trained and tested on the output of the text analyzer and their effectiveness was measured.

Journal ArticleDOI
TL;DR: It is concluded that hiatus and diphthong are two phonetic categories which can be described on the basis of their acoustic characteristics and are subject, like any other phonetic category, to modifications due to a change in the communicative situation.

Journal ArticleDOI
TL;DR: This paper proposes a method for automatically generating a pronunciation dictionary based on a pronunciation neural network that can predict plausible pronunciations from the canonical pronunciation; the resulting dictionary gives consistently higher recognition rates than a conventional dictionary.

Journal ArticleDOI
TL;DR: A new representation of speech that is invariant to noise is introduced, and the proposed features are shown to be superior to other robust representations and compensation techniques.

Journal ArticleDOI
TL;DR: It is found that cry vocalizations of hearing-impaired infants differ from those of their counterparts with normal hearing abilities due to the lack of auditory feedback, and melodic and rhythmic parameters are extracted which differ significantly for the two infant groups.