Proceedings ArticleDOI

An improved sub-word based speech recognizer

TL;DR: The authors describe a system for speaker-dependent speech recognition based on acoustic subword units that showed results comparable to those of whole-word-based systems.
Abstract: The authors describe a system for speaker-dependent speech recognition based on acoustic subword units. Several strategies for automatic generation of an acoustic lexicon are outlined. Preliminary tests have been performed on a small vocabulary. In these tests, the proposed system showed results comparable to those of whole-word-based systems.
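
The general recipe behind such systems is to segment training utterances at points of spectral change, cluster the segments into an inventory of acoustic subword units, and transcribe each vocabulary word as a sequence of unit labels. The sketch below illustrates only that recipe; the threshold-based segmentation, the mean-vector segment representation, and all function names are our assumptions, not the authors' algorithm.

```python
import numpy as np

def segment_utterance(features, threshold=2.0):
    """Split a (frames x dims) feature matrix at frames of large
    spectral change (a crude stand-in for a real segmentation
    criterion) and represent each segment by its mean vector."""
    jumps = np.linalg.norm(np.diff(features, axis=0), axis=1)
    cuts = [0] + [i + 1 for i, d in enumerate(jumps) if d > threshold]
    cuts.append(len(features))
    return [features[a:b].mean(axis=0)
            for a, b in zip(cuts[:-1], cuts[1:]) if b > a]

def build_lexicon(utterances, words, codebook):
    """Transcribe each training token as its sequence of nearest
    subword-unit (codebook) indices; the acoustic lexicon keeps the
    candidate transcriptions observed for each word."""
    lexicon = {}
    for feats, word in zip(utterances, words):
        segments = segment_utterance(np.asarray(feats, dtype=float))
        units = [int(np.argmin(((codebook - seg) ** 2).sum(axis=1)))
                 for seg in segments]
        lexicon.setdefault(word, []).append(units)
    return lexicon
```
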
Citations
Journal ArticleDOI
TL;DR: Outlines current advances in automatic speech recognition (ASR) and spoken language systems, together with their deficiencies in dealing with the variation naturally present in speech.

507 citations


Cites background from "An improved sub-word based speech r..."

  • ...In Tyagi et al. (2005), the usual assumption is made that the piecewise quasi-stationary segments (QSS) of the speech signal can be modeled by a Gaussian autoregressive (AR) process of a fixed order p as in Andre-Obrecht (1988), Svendsen et al. (1989), Svendsen and Soong (1987)....
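
The fixed-order Gaussian AR assumption quoted above can be made concrete: each quasi-stationary segment x(t) is modeled as x(t) = -a_1 x(t-1) - ... - a_p x(t-p) + e(t), with e(t) zero-mean Gaussian noise. The sketch below fits such a model to one segment using the autocorrelation method and a Levinson-Durbin recursion; it is a generic illustration under those assumptions, not code from any of the cited papers.

```python
import numpy as np

def fit_gaussian_ar(segment, p=10):
    """Fit a fixed-order Gaussian AR model to one quasi-stationary
    segment: sample autocorrelation + Levinson-Durbin recursion.
    Returns the AR polynomial [1, a_1, ..., a_p] and the innovation
    variance of the driving Gaussian noise e(t)."""
    x = np.asarray(segment, dtype=float)
    x = x - x.mean()
    n = len(x)
    r = np.array([np.dot(x[:n - k], x[k:]) for k in range(p + 1)]) / n
    a = np.zeros(p + 1)
    a[0] = 1.0
    err = r[0]                    # zeroth-order prediction-error variance
    for i in range(1, p + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err  # reflection coeff.
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= 1.0 - k * k        # shrink prediction-error variance
    return a, err
```
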

01 Jan 1999
TL;DR: Problems with the phoneme as the basic subword unit in speech recognition are raised, suggesting that finer-grained control is needed to capture the sort of pronunciation variability observed in spontaneous speech.
Abstract: The notion that a word is composed of a sequence of phone segments, sometimes referred to as ‘beads on a string’, has formed the basis of most speech recognition work for over 15 years. However, as more researchers tackle spontaneous speech recognition tasks, that view is being called into question. This paper raises problems with the phoneme as the basic subword unit in speech recognition, suggesting that finer-grained control is needed to capture the sort of pronunciation variability observed in spontaneous speech. We offer two different alternatives – automatically derived subword units and linguistically motivated distinctive feature systems – and discuss current work in these directions. In addition, we look at problems that arise in acoustic modeling when trying to incorporate higher-level structure with these two strategies.

151 citations


Cites background from "An improved sub-word based speech r..."

  • ...ASWUs were proposed several years ago [10, 11, 12, 13], but they faded from view as speaker-independent recognition became the primary goal, because of the difficulty of distinguishing speaker variability from real pronunciation differences....


Journal ArticleDOI
TL;DR: This paper presents a complete probabilistic formulation for the automatic design of subword units and dictionary, given only the acoustic data and their transcriptions, and permits easy incorporation of external sources of information, such as the spellings of words in terms of a nonideographic script.
Abstract: Large vocabulary continuous speech recognition (LVCSR) systems traditionally represent words in terms of smaller subword units. Both during training and during recognition, they require a mapping table, called the dictionary, which maps words into sequences of these subword units. The performance of the LVCSR system depends critically on the definition of the subword units and the accuracy of the dictionary. In current LVCSR systems, both these components are manually designed. While manually designed subword units generalize well, they may not be the optimal units of classification for the specific task or environment for which an LVCSR system is trained. Moreover, when human expertise is not available, it may not be possible to design good subword units manually. There is clearly a need for data-driven design of these LVCSR components. In this paper, we present a complete probabilistic formulation for the automatic design of subword units and dictionary, given only the acoustic data and their transcriptions. The proposed framework permits easy incorporation of external sources of information, such as the spellings of words in terms of a nonideographic script.
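
In general terms (the notation here is ours, not necessarily the paper's), such a formulation seeks the unit inventory U and dictionary D that jointly maximize the likelihood of the training acoustics A given the word-level transcriptions W:

\[
(\hat{U}, \hat{D}) \;=\; \arg\max_{U,\,D} \; P(A \mid W, U, D)
\]

where D maps every word of W to a sequence of units from U. The maximization is typically carried out by alternating between re-deriving word baseforms given the current unit models and re-estimating unit models given the current baseforms.
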

82 citations


Cites background from "An improved sub-word based speech r..."

  • ...The problem of automatic identification of subword units has been addressed by several researchers in the past [1]–[6]....


Journal ArticleDOI
TL;DR: Presents a joint solution to the related problems of learning a unit inventory and a corresponding lexicon from data; on a speaker-independent read-speech task with a 1k vocabulary, the proposed algorithm outperforms phone-based systems at both high and low complexities.

66 citations


Cites background or methods from "An improved sub-word based speech r..."

  • ...Cluster centroids therefore directly represent unit models and clustering addresses both the inventory and model design problems, whereas in (Svendsen et al., 1989; Paliwal, 1990; Holter and Svendsen, 1997a) unit model parameters had to be estimated in a separate step from the data partition defined by clustering....


  • ...Over the last decade, a number of researchers have looked into this problem and found algorithms that automatically define model inventories and estimate unit model parameters (Lee et al., 1989; Svendsen et al., 1989; Bahl et al., 1993; Bacchiani et al., 1996)....


  • ...The clustering algorithm used here differs from that used in (Svendsen et al., 1989; Paliwal, 1990; Holter and Svendsen, 1997a) in that maximum likelihood is used as an objective rather than minimum Euclidean distance....


  • ...The two basic steps of any unit inventory design algorithm are an acoustic segmentation step followed by a clustering step (e.g., Lee et al., 1989; Svendsen et al., 1989; Bacchiani et al., 1996; Holter and Svendsen, 1997a)....

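
The contrast drawn in these excerpts, clustering segments under a maximum-likelihood objective rather than minimum Euclidean distance, can be sketched as follows. The per-cluster diagonal Gaussian, the fixed iteration count, and the variance floor are our simplifying assumptions, not details taken from the cited papers.

```python
import numpy as np

def ml_cluster(segments, n_units, n_iter=20, var_floor=1e-3):
    """K-means-style clustering of segment feature vectors, but with a
    diagonal Gaussian per cluster and log-likelihood as the objective
    (instead of minimum Euclidean distance to a centroid)."""
    X = np.asarray(segments, dtype=float)        # (n_segments, dims)
    rng = np.random.default_rng(0)
    mu = X[rng.choice(len(X), n_units, replace=False)]
    var = np.ones_like(mu)
    for _ in range(n_iter):
        # Assignment step: give each segment to the unit that scores
        # it with the highest Gaussian log-likelihood.
        ll = -0.5 * (((X[:, None, :] - mu) ** 2) / var
                     + np.log(2 * np.pi * var)).sum(axis=2)
        assign = ll.argmax(axis=1)
        # Update step: re-estimate each unit's Gaussian from its members.
        for k in range(n_units):
            members = X[assign == k]
            if len(members):
                mu[k] = members.mean(axis=0)
                var[k] = np.maximum(members.var(axis=0), var_floor)
    return mu, var, assign
```

Because each cluster model is itself a distribution, the centroids double as unit models, which is exactly the property the excerpts highlight.
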

Journal ArticleDOI
TL;DR: A maximum-likelihood-based algorithm for fully automatic, data-driven modelling of pronunciation, given a set of subword hidden Markov models (HMMs) and acoustic tokens of a word, which creates a consistent framework for optimisation of automatic speech recognition systems.

63 citations

References
Journal ArticleDOI
TL;DR: An efficient and intuitive algorithm is presented for the design of vector quantizers based either on a known probabilistic model or on a long training sequence of data.
Abstract: An efficient and intuitive algorithm is presented for the design of vector quantizers based either on a known probabilistic model or on a long training sequence of data. The basic properties of the algorithm are discussed and demonstrated by examples. Quite general distortion measures and long blocklengths are allowed, as exemplified by the design of parameter vector quantizers of ten-dimensional vectors arising in Linear Predictive Coded (LPC) speech compression with a complicated distortion measure arising in LPC analysis that does not depend only on the error vector.
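
The procedure described here is the generalized Lloyd algorithm, widely known as LBG after this paper's authors. Below is a minimal sketch under a squared-error distortion (the paper itself admits far more general distortion measures), growing the codebook by splitting; the target size is assumed to be a power of two.

```python
import numpy as np

def lbg(training, codebook_size, eps=1e-3, perturb=0.01):
    """LBG / generalized Lloyd vector-quantizer design on a training
    sequence: split every centroid, then run Lloyd iterations until
    the relative distortion improvement drops below eps."""
    X = np.asarray(training, dtype=float)
    codebook = X.mean(axis=0, keepdims=True)     # start from 1 centroid
    while len(codebook) < codebook_size:
        # Split every centroid into a slightly perturbed pair.
        codebook = np.concatenate([codebook * (1 + perturb),
                                   codebook * (1 - perturb)])
        prev = np.inf
        while True:
            # Nearest-neighbour partition of the training data.
            d = ((X[:, None, :] - codebook) ** 2).sum(axis=2)
            nearest = d.argmin(axis=1)
            distortion = d[np.arange(len(X)), nearest].mean()
            # Centroid condition: each nonempty cell gets its mean.
            for k in range(len(codebook)):
                cell = X[nearest == k]
                if len(cell):
                    codebook[k] = cell.mean(axis=0)
            if prev - distortion < eps * distortion:
                break
            prev = distortion
    return codebook
```
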

7,935 citations

Journal ArticleDOI
TL;DR: The purpose of this tutorial paper is to give an introduction to the theory of Markov models, and to illustrate how they have been applied to problems in speech recognition.
Abstract: The basic theory of Markov chains has been known to mathematicians and engineers for close to 80 years, but it is only in the past decade that it has been applied explicitly to problems in speech processing. One of the major reasons why speech models, based on Markov chains, have not been developed until recently was the lack of a method for optimizing the parameters of the Markov model to match observed signal patterns. Such a method was proposed in the late 1960's and was immediately applied to speech processing in several research institutions. Continued refinements in the theory and implementation of Markov modelling techniques have greatly enhanced the method, leading to a wide range of applications of these models. It is the purpose of this tutorial paper to give an introduction to the theory of Markov models, and to illustrate how they have been applied to problems in speech recognition.
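
A core computation in this framework is the probability of an observation sequence given a model, obtained efficiently with the forward algorithm. A minimal discrete-observation sketch (array shapes and names are ours):

```python
import numpy as np

def forward(A, B, pi, obs):
    """Forward algorithm for a discrete-observation HMM.
    A:  (N, N) transition matrix, A[i, j] = P(state j at t+1 | state i at t)
    B:  (N, M) emission matrix,   B[i, k] = P(symbol k | state i)
    pi: (N,)   initial state distribution
    obs: sequence of observation-symbol indices
    Returns P(obs | model)."""
    alpha = pi * B[:, obs[0]]              # initialization
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]      # induction
    return alpha.sum()                     # termination

# Toy usage: a 2-state, 2-symbol model.
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.6, 0.4])
print(forward(A, B, pi, [0, 1, 0]))
```
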

4,546 citations

Journal ArticleDOI
TL;DR: This paper has found that a bandpass "liftering" process reduces the variability of the statistical components of LPC-based spectral measurements and hence it is desirable to use such a liftering process in a speech recognizer.
Abstract: In a template-based speech recognition system, distortion measures that compute the distance or dissimilarity between two spectral representations have a strong influence on the performance of the recognizer. Accordingly, extensive comparative studies have been conducted to determine good distortion measures for improved recognition accuracy. Previous studies have shown that the log likelihood ratio measure, the likelihood ratio measure, and the truncated cepstral measures all gave good recognition performance (comparable accuracy) for isolated word recognition tasks. In this paper we extend the interpretation of distortion measures, based upon the observation that measurements of speech spectral envelopes (as normally obtained from standard analysis procedures such as LPC or filter banks) are prone to statistical variations due to window position fluctuations, excitation interference, measurement noise, etc., and may not accurately characterize the true speech spectrum because of analysis model constraints. We have found that these undesirable spectral measurement variations can be partially controlled (i.e., reduced in the level of variation) by appropriate signal processing techniques. In particular, we have found that a bandpass "liftering" process reduces the variability of the statistical components of LPC-based spectral measurements and hence it is desirable to use such a liftering process in a speech recognizer. We have applied this liftering process to several speech recognition tasks: in particular, single frame vowel recognition and isolated word recognition. Using the liftering process, we have been able to achieve an average digit error rate of 1 percent in a speaker-independent isolated digit test. This error rate is about one-half that obtained without the liftering process.
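
The lifter in question is usually given as a raised-sine weighting of the cepstral coefficients, w(n) = 1 + (L/2) sin(pi n / L) for n = 1, ..., L, which de-emphasizes both the low-order coefficients (sensitive to spectral slope and transmission effects) and the high-order ones (dominated by measurement noise). The sketch below applies that weighting to LPC-derived cepstra; the raised-sine form is the one commonly associated with this work, but treat it as an assumption rather than a quotation of the paper.

```python
import numpy as np

def bandpass_lifter(cepstra, L=None):
    """Apply a raised-sine (bandpass) lifter to cepstral coefficients.
    cepstra: (frames, L) array of c_1..c_L (c_0 excluded).
    Assumed weighting: w(n) = 1 + (L/2) * sin(pi * n / L), n = 1..L."""
    c = np.atleast_2d(np.asarray(cepstra, dtype=float))
    L = L or c.shape[1]
    n = np.arange(1, L + 1)
    w = 1.0 + (L / 2.0) * np.sin(np.pi * n / L)
    return c * w
```

Distance computations between two liftered cepstral vectors then weight mid-quefrency structure most heavily, which is the variance-reduction effect the abstract describes.
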

291 citations

Journal ArticleDOI
TL;DR: This paper discusses word recognition as a classical pattern-recognition problem and shows how some fundamental concepts of signal processing, information theory, and computer science can be combined to give us the capability of robust recognition of isolated words and simple connected word sequences.
Abstract: The art and science of speech recognition have been advanced to the state where it is now possible to communicate reliably with a computer by speaking to it in a disciplined manner using a vocabulary of moderate size. It is the purpose of this paper to outline two aspects of speech-recognition research. First, we discuss word recognition as a classical pattern-recognition problem and show how some fundamental concepts of signal processing, information theory, and computer science can be combined to give us the capability of robust recognition of isolated words and simple connected word sequences. We then describe methods whereby these principles, augmented by modern theories of formal language and semantic analysis, can be used to study some of the more general problems in speech recognition. It is anticipated that these methods will ultimately lead to accurate mechanical recognition of fluent speech under certain controlled conditions.
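
Template-based isolated-word recognizers of this era typically compared an input token against stored reference templates with dynamic time warping (DTW). The abstract does not name the method, so the following is a generic sketch of that classical approach rather than this paper's exact algorithm.

```python
import numpy as np

def dtw_distance(template, token):
    """Dynamic time warping distance between two (frames x dims)
    feature sequences, using the standard symmetric local path
    (diagonal match, insertion, deletion)."""
    T, U = len(template), len(token)
    D = np.full((T + 1, U + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, T + 1):
        for j in range(1, U + 1):
            cost = np.linalg.norm(template[i - 1] - token[j - 1])
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    return D[T, U] / (T + U)    # length-normalized accumulated distance

# Recognition: pick the vocabulary word whose template warps closest
# to the input token, e.g.
#   best = min(templates, key=lambda w: dtw_distance(templates[w], token))
```
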

246 citations

Proceedings ArticleDOI
11 Apr 1988
TL;DR: An automatic technique for constructing Markov word models is described and results are included of experiments with speaker-dependent and speaker-independent models on several isolated-word recognition tasks.
Abstract: The Speech Recognition Group at IBM Research has developed a real-time, isolated-word speech recognizer called Tangora, which accepts natural English sentences drawn from a vocabulary of 20000 words. Despite its large vocabulary, the Tangora recognizer requires only about 20 minutes of speech from each new user for training purposes. The accuracy of the system and its ease of training are largely attributable to the use of hidden Markov models in its acoustic match component. An automatic technique for constructing Markov word models is described and results are included of experiments with speaker-dependent and speaker-independent models on several isolated-word recognition tasks.

245 citations