Showing papers in "Speech Communication in 1990"


Journal ArticleDOI
TL;DR: Several recently proposed algorithms for improving the voice quality of text-to-speech synthesis by concatenation of acoustical units, all based on the pitch-synchronous overlap-add approach, are reviewed in a common framework.

1,438 citations


Journal ArticleDOI
TL;DR: The experiences of researchers at MIT in collecting two large speech databases, TIMIT and VOYAGER, which have somewhat complementary objectives, are described.

570 citations


Journal ArticleDOI
TL;DR: A large-scale Japanese speech database is described; it has been used to develop algorithms in speech recognition and synthesis studies and to find acoustic, phonetic and linguistic evidence that will serve as basic data for speech technologies.

282 citations


Journal ArticleDOI
TL;DR: A hidden Markov model isolated word recogniser using full likelihood scoring for each word model can be treated as a recurrent ‘neural’ network and can use back-propagation of partial derivatives to hill-climb on a measure of discriminability between words.

143 citations


Journal ArticleDOI
TL;DR: This paper introduces Hidden Markov Modelling techniques, analyzes the reason for their success, and describes some improvements to the standard HMM used in SPHINX.

96 citations


Journal ArticleDOI
TL;DR: In this paper, the authors presented a microphone array adaptive beamformer with a dual function, which is suited to transmission as well as to use as input to speech recognition systems. But the performance of the beamformer was limited.

84 citations


Journal ArticleDOI
TL;DR: It is found that speakers do indeed attempt to mark word boundaries in clear (though not in normal) speech; moreover, they differentiate between word boundaries in a way which suggests they are sensitive to listener needs.

76 citations


Journal ArticleDOI
TL;DR: Experimental tests using data from the DARPA Resource Management Task confirm a prediction that DP scoring overestimates substitution errors and underestimates insertion and deletion errors; a new figure of merit, weighted total errors, takes all three kinds of errors into account and minimises bias.

49 citations


Journal ArticleDOI
TL;DR: The LTS was systematically related to the affective dimensions in certain frequency ranges and no significant sex or ethnic group effects were found.

46 citations


Journal ArticleDOI
TL;DR: This work attempted multi-talker, connected recognition of the spoken American English letter names b, d, e and v, using a recurrent neural network as the speech recognizer.

34 citations


Journal ArticleDOI
TL;DR: Results of the French test of intelligibility of text-to-speech synthesizers show that the “SAM” methodology is efficient for the assessment of TTS systems, as it allows comparisons of prosodic, coding, semantic and feed-forward factors between synthesizers.

Journal ArticleDOI
TL;DR: A monosyllabic corpus for use in testing the consonant intelligibility of synthesized speech differs from those used in other tests in that it spans a wide variety of English sounds and is thus useful for diagnosis as well as for comparative assessment.

Journal ArticleDOI
TL;DR: A selective overview is given of methods used for the evaluation of text-to-speech (TTS) systems, with some comments on their advantages and disadvantages.

Journal ArticleDOI
TL;DR: A simple neural network for isolated word recognition constructed under consideration of neurobiological and psychoacoustical observations is described, showing that the different stages of preprocessing of the speech signal increase recognition rates significantly and are essential to achieve faultless recognition of a small vocabulary.

Journal ArticleDOI
W. N. Campbell
TL;DR: Back-propagation has been used to train a small network for the prediction of syllable-level duration in a text-to-speech system and the net performs a multiple regression function.

Journal ArticleDOI
TL;DR: This work proposes an alternative approach that expands the signal into a combination of a finite set of basic time functions, chosen to take into account the point-like and non-linear character of the acoustic voice source.

Journal ArticleDOI
TL;DR: The vowel sub-component of a speaker-independent phoneme classification system, based on an ear model followed by a set of Multi-Layered Neural Networks, has good generalization capabilities over new speakers and new sounds.

Journal ArticleDOI
TL;DR: In this article, the Self-Organizing Feature Maps (SOMM) were applied to vector-quantize speech into a sequence of phoneme labels a centisecond apart, which were converted into a phoneme string using a multi-layered feed-forward network trained with error back propagation.

Journal ArticleDOI
TL;DR: This paper addresses how knowledge of domain semantics, dialog, communication conventions and problem solving behavior are used to enhance automatic speech recognition and understanding and explains why the heuristics are effective.

Journal ArticleDOI
TL;DR: Two new methods are presented here for the detection of the glottal closure instant from the speech waveform: the first is based on the maximization of the likelihood ratio, while the second uses a divergence convexity test.

Journal ArticleDOI
TL;DR: The classical theory of speech production proves the validity of the EWSM parameters; their modifications yield well-localized time-frequency transformations, including frequency compression/expansion, pitch, formant and noise modification.

Journal ArticleDOI
TL;DR: Evidence is presented that listeners deploy both low-level grouping mechanisms and knowledge specific to speech when separating speech from other sounds.

Journal ArticleDOI
TL;DR: An experimental-phonetic approach to the study of speech melody, developed at the Institute for Perception Research (IPO), leads to intonation models which are helpful for the interpretation of acoustic and physiological data on pitch in natural speech.

Journal ArticleDOI
TL;DR: Calcium-enriched orange juice made by the process can exhibit taste and color characteristics similar to non-calcium-enriched orange juice.

Journal ArticleDOI
TL;DR: A new vowel production theory is formulated and it is proposed that the production of vowels and consonants is based on these geometric and acoustic properties, since the eight regions can be linked to morphological and articulatory properties of the vocal tract.

Journal ArticleDOI
TL;DR: This paper serves a double purpose: to review the coding methods which have been introduced during the past decade in the 4.8–9.6 kbps range, and to discuss the most recent research trends.

Journal ArticleDOI
TL;DR: A multilayer perceptron has been trained to perform an analogue mapping from the power spectra of vowels and nasal consonants, spoken by a single speaker, to the control parameters of a speech synthesiser based on an acoustic tube model.

Journal ArticleDOI
TL;DR: An evaluation exercise carried out on the sentence-accent assignment rules of the CSTR system is presented, based on just such an abstract representation of prosodic features that has been useful in improving the rules.

Journal ArticleDOI
TL;DR: A prototype pocket-sized portable device has been constructed and the real-time software transferred to it, which will provide the basis for a new generation of signal processing hearing aids for the profoundly and totally deaf.

Journal ArticleDOI
TL;DR: The result shows that the best performance is achieved with the two ANN-classifiers and indicates that a Kohonen map does not deteriorate the information presented to the second layer in the network and hence can be used instead of a first hidden layer.