Showing papers in "Speech Communication in 1990"

Speech database development at MIT: Timit and beyond

TL;DR: Several recently proposed algorithms for improving the voice quality of text-to-speech synthesis based on concatenation of acoustical units, using a pitch-synchronous overlap-add approach, are reviewed in a common framework.

1,438 citations

Victor W. Zue¹, Stephanie Seneff¹, James Glass¹•Institutions (1)

Massachusetts Institute of Technology¹

ATR Japanese speech database as a tool of speech recognition and synthesis

TL;DR: The experiences of researchers at MIT in the collection of two large speech databases, TIMIT and VOYAGER, which have somewhat complementary objectives, are described.

570 citations

Akira Kurematsu, Kazuya Takeda, Yoshinori Sagisaka, Shigeru Katagiri, Hisao Kuwabara, Kiyohiro Shikano

Alpha-nets: a recurrent “neural” network architecture with a hidden Markov model interpretation

TL;DR: A large-scale Japanese speech database is described; it has been used to develop algorithms in speech recognition and synthesis studies and to find acoustic, phonetic and linguistic evidence that will serve as basic data for speech technologies.

282 citations

J. S. Bridle¹•Institutions (1)

University of St Andrews¹

Speech recognition using hidden Markov models: a CMU perspective

TL;DR: A hidden Markov model isolated word recogniser using full likelihood scoring for each word model can be treated as a recurrent ‘neural’ network and can use back-propagation of partial derivatives to hill-climb on a measure of discriminability between words.

143 citations

Kai-Fu Lee¹, Hsiao-Wuen Hon¹, Mei-Yuh Hwang¹, Xuedong Huang¹•Institutions (1)

Carnegie Mellon University¹

Speech recognition in noisy environments with the aid of microphone arrays

TL;DR: This paper introduces Hidden Markov Modelling techniques, analyzes the reason for their success, and describes some improvements to the standard HMM used in SPHINX.

96 citations

Dirk Van Compernolle¹, W Ma¹, Fei Xie¹, Marc Van Diest¹•Institutions (1)

Katholieke Universiteit Leuven¹

Durational cues to word boundaries in clear speech

TL;DR: The authors present a microphone-array adaptive beamformer with a dual function, suited both to transmission and to use as input to speech recognition systems, although its performance was limited.

84 citations

Anne Cutler, Sally Butterfield

Figures of merit for assessing connected-word recognisers

TL;DR: It is found that speakers do indeed attempt to mark word boundaries in clear (though not in normal) speech; moreover, they differentiate between word boundaries in a way which suggests they are sensitive to listener needs.

76 citations

Melvyn J. Hunt

The long-term spectrum and perceived emotion

TL;DR: Experimental tests using data from the DARPA Resource Management Task confirm a prediction that DP scoring overestimates substitution errors and underestimates insertion and deletion errors. A new figure of merit, weighted total errors, takes all three kinds of errors into account and minimises bias.

49 citations

Jeffery Pittam¹, Cindy Gallois¹, Victor J. Callan¹•Institutions (1)

University of Queensland¹

Connected recognition with a recurrent network

TL;DR: The long-term spectrum (LTS) was systematically related to the affective dimensions in certain frequency ranges, and no significant sex or ethnic group effects were found.

46 citations

Gary M. Kuhn¹, Raymond L. Watrous¹,², B. Ladendorf¹•Institutions (2)

Princeton University¹, University of Toronto²

An intelligibility test using semantically unpredictable sentences: towards the quantification of linguistic complexity

TL;DR: This work attempted multi-talker, connected recognition of the spoken American English letter names b, d, e and v, using a recurrent neural network as the speech recognizer.

34 citations

Christian Benoît¹•Institutions (1)

Stendhal University¹

Comprehensive assessment of the telephone intelligibility of synthesized and natural speech

TL;DR: Results of the French test of intelligibility of text-to-speech synthesizers show that the “SAM” methodology is efficient for the assessment of TTS systems, as it allows comparisons of prosodic, coding, semantic and feed-forward factors between synthesizers.


Murray F. Spiegel¹, Mary Jo Altom¹, Marian J. Macchi¹, Karen L. Wallace²•Institutions (2)

Telcordia Technologies¹, New York University²

Evaluating text-to-speech systems: Some methodological aspects

TL;DR: A monosyllabic corpus for use in testing the consonant intelligibility of synthesized speech differs from those used in other tests in that it spans a wide variety of English sounds and is thus useful for diagnosis as well as for comparative assessment.


Renée van Bezooijen¹, Louis C. W. Pols¹•Institutions (1)

University of Amsterdam¹

Recognition of isolated words based on psychoacoustics and neurobiology

TL;DR: A selective overview is given of methods used for the evaluation of text-to-speech (TTS) systems, with some comments on their advantages and disadvantages.


T. Gramms¹, Hans Werner Strube¹•Institutions (1)

University of Göttingen¹

Analog I/O nets for syllable timing

TL;DR: A simple neural network for isolated word recognition constructed under consideration of neurobiological and psychoacoustical observations is described, showing that the different stages of preprocessing of the speech signal increase recognition rates significantly and are essential to achieve faultless recognition of a small vocabulary.


W. N. Campbell¹•Institutions (1)

IBM¹

Non-linear signal representation and its application to the modelling of the glottal waveform

TL;DR: Back-propagation has been used to train a small network for the prediction of syllable-level duration in a text-to-speech system and the net performs a multiple regression function.


Jean Schoentgen¹•Institutions (1)

Free University of Brussels¹

Phonetically-based multi-layered neural networks for classification

TL;DR: This work proposes an alternative approach that expands the signal into a combination of a finite set of basic time functions, chosen to take into account the point-like and non-linear character of the acoustic voice source.


P. Cosi, Yoshua Bengio¹, R. De Mori¹•Institutions (1)

McGill University¹

Using self-organizing maps and multi-layered feed-forward nets to obtain phonemic transcriptions of spoken utterances

TL;DR: The vowel sub-component of a speaker-independent phoneme classification system, based on an ear model followed by a set of Multi-Layered Neural Networks, has good generalization capabilities over new speakers and new sounds.


Mikko Kokkonen¹, Kari Torkkola¹•Institutions (1)

Helsinki University of Technology¹

Use of dialogue, pragmatics and semantics to enhance speech recognition

TL;DR: Self-Organizing Feature Maps were applied to vector-quantize speech into a sequence of phoneme labels a centisecond apart, which were then converted into a phoneme string using a multi-layered feed-forward network trained with error back-propagation.


Sheryl R. Young¹•Institutions (1)

Carnegie Mellon University¹

Detection of the glottal closure by jumps in the statistical properties of the speech signal

TL;DR: This paper addresses how knowledge of domain semantics, dialog, communication conventions and problem-solving behavior is used to enhance automatic speech recognition and understanding, and explains why the heuristics are effective.


Eric Moulines¹, R. Di Francesco¹•Institutions (1)

Centre national d'études des télécommunications¹

Time-frequency speech transformation based on an elementary waveform representation

TL;DR: Two new methods are presented for the detection of the glottal closure instant from the speech waveform: the first is based on the maximization of the likelihood ratio, while the second uses a divergence convexity test.


Christophe d'Alessandro¹•Institutions (1)

Centre national de la recherche scientifique¹

Speech perception seen through the ear

TL;DR: The classical theory of speech production proves the validity of the EWSM parameters; their modifications yield well-localized time-frequency transformations, including frequency compression/expansion, pitch, formant and noise modification.


Christopher J. Darwin¹, John F. Culling¹•Institutions (1)

University of Sussex¹

On the perceptual analysis of intonation

TL;DR: Evidence is presented that both low-level grouping mechanisms and knowledge specific to speech are deployed in solving the problem of listeners' ability to separate speech from other sounds.


René Collier

Speech research in perspective

TL;DR: An experimental-phonetic approach to the study of speech melody, developed at the Institute for Perception Research (IPO), leads to intonation models which are helpful for the interpretation of acoustic and physiological data on pitch in natural speech.


Gunnar Fant

Comments on “distinctive regions and modes: a new theory of speech production” by M. Mrayati, R. Carré and B. Guérin



Louis-Jean Boë¹, Pascal Perrier¹•Institutions (1)

Stendhal University¹

CELP and sinusoidal coders: two solutions for speech coding at 4.8–9.6 KBPS

TL;DR: A new vowel production theory is formulated and it is proposed that the production of vowels and consonants is based on these geometric and acoustic properties, since the eight regions can be linked to morphological and articulatory properties of the vocal tract.


Isabel Trancoso, Jorge S. Marques, Carlos M. Ribeiro

Estimation of vocal tract filter parameters using a neural net

TL;DR: This paper serves a double purpose: to review the coding methods which have been introduced during the past decade in the 4.8–9.6 kbps range, and to discuss the most recent research trends.


M. G. Rahim¹, C. C. Goodyear¹•Institutions (1)

University of Liverpool¹

Symbolic output as the basis for evaluating intonation in text-to-speech systems

TL;DR: A multilayer perceptron has been trained to perform an analogue mapping from the power spectra of vowels and nasal consonants, spoken by a single speaker, to the control parameters of a speech synthesiser based on an acoustic tube model.


A. I. C. Monaghan¹, D.R. Ladd¹•Institutions (1)

University of Edinburgh¹

Real-time portable multi-layer perceptron voice fundamental-period extractor for hearing aids and cochlear implants

TL;DR: An evaluation exercise carried out on the sentence-accent assignment rules of the CSTR system is presented; it is based on an abstract representation of prosodic features and has been useful in improving the rules.


J. R. Walliker¹,², Ian S. Howard¹•Institutions (2)

University College London¹, Guy's Hospital²

Classification of vowels in continuous speech using MLP and a hybrid net

TL;DR: A prototype pocket-sized portable device has been constructed and the real-time software transferred to it, which will provide the basis for a new generation of signal processing hearing aids for the profoundly and totally deaf.


P. Knagenhjelm¹, P. Brauer¹•Institutions (1)

Chalmers University of Technology¹