Lexicon-building methods for an acoustic sub-word based speech recognizer

doi:10.1109/ICASSP.1990.115888

Proceedings Article•DOI•

Kuldip K. Paliwal¹•Institutions (1)

Tata Institute of Fundamental Research¹

03 Apr 1990-Vol. 1990, pp 729-732

TL;DR: The use of an acoustic subword unit (ASWU)-based speech recognition system for the recognition of isolated words is discussed and it is shown that the use of a modified k-means algorithm on the likelihoods derived through the Viterbi algorithm provides the best deterministic-type of word lexicon.

Abstract: The use of an acoustic subword unit (ASWU)-based speech recognition system for the recognition of isolated words is discussed. Some methods are proposed for generating the deterministic and the statistical types of word lexicon. It is shown that the use of a modified k-means algorithm on the likelihoods derived through the Viterbi algorithm provides the best deterministic-type of word lexicon. However, the ASWU-based speech recognizer leads to better performance with the statistical type of word lexicon than with the deterministic type. Improving the design of the word lexicon makes it possible to narrow the gap in the recognition performances of the whole word unit (WWU)-based and the ASWU-based speech recognizers considerably. Further improvements are expected by designing the word lexicon better.
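
As a rough illustration of building a deterministic lexicon entry from likelihoods, the sketch below picks one representative pronunciation per word. It is not the paper's modified k-means procedure, whose details are not given on this page. It shows only a single-cluster simplification, and it assumes a hypothetical table `loglik[i][j]` holding the Viterbi log-likelihood of training utterance `i` scored against candidate ASWU pronunciation `j`.

```python
def pick_representative(loglik):
    """Return the index of the candidate pronunciation whose summed
    log-likelihood over all training utterances is highest.

    loglik[i][j] is assumed (hypothetically) to be the Viterbi
    log-likelihood of utterance i given candidate pronunciation j.
    """
    n_candidates = len(loglik[0])
    # Total score of each candidate across every utterance of the word.
    totals = [sum(row[j] for row in loglik) for j in range(n_candidates)]
    return max(range(n_candidates), key=totals.__getitem__)


# Illustrative scores for three utterances and three candidates.
example_scores = [
    [-10.0, -12.0, -30.0],
    [-11.0, -9.0, -25.0],
    [-14.0, -10.0, -28.0],
]
best = pick_representative(example_scores)
```

A statistical lexicon, by contrast, would keep a distribution over pronunciations rather than a single winner; that case is not sketched here.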

Citations

Posted Content•

Feature Trajectory Dynamic Time Warping for Clustering of Speech Segments.

Lerato Lerato¹, Thomas Niesler¹•Institutions (1)

Stellenbosch University¹

30 Oct 2018-arXiv: Sound

TL;DR: A modification to DTW that performs individual and independent pairwise alignment of feature trajectories is proposed that is applied as a similarity measure in the agglomerative hierarchical clustering of speech segments.

Abstract: Dynamic time warping (DTW) can be used to compute the similarity between two sequences of generally differing length. We propose a modification to DTW that performs individual and independent pairwise alignment of feature trajectories. The modified technique, termed feature trajectory dynamic time warping (FTDTW), is applied as a similarity measure in the agglomerative hierarchical clustering of speech segments. Experiments using MFCC and PLP parametrisations extracted from TIMIT and from the Spoken Arabic Digit Dataset (SADD) show consistent and statistically significant improvements in the quality of the resulting clusters in terms of F-measure and normalised mutual information (NMI).
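
To make the comparison concrete, here is a minimal sketch of classic DTW next to a per-trajectory variant. The second function is only one reading of "individual and independent pairwise alignment of feature trajectories": it aligns each feature dimension separately and sums the costs. Any normalisation used in the FTDTW paper is not reproduced, and the function names are illustrative.

```python
import math


def dtw(a, b, dist):
    """Classic DTW cost between sequences a and b for a given frame distance."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            # Extend the cheapest of the three allowed predecessor cells.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def euclidean(x, y):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def per_trajectory_dtw(a, b):
    """Sum of DTW costs computed independently for each feature dimension
    (an assumed simplification, not the published FTDTW definition)."""
    total = 0.0
    for k in range(len(a[0])):
        traj_a = [frame[k] for frame in a]
        traj_b = [frame[k] for frame in b]
        total += dtw(traj_a, traj_b, lambda x, y: abs(x - y))
    return total
```

In the classic form all feature dimensions share one warping path; in the per-trajectory form each dimension may follow its own path.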

5 citations

Journal Article•DOI•

Subunits Inference and Lexicon Development Based on Pairwise Comparison of Utterances and Signs

Sandrine Tornay, Mathew Magimai.-Doss

26 Sep 2019-Information-an International Interdisciplinary Journal

TL;DR: A hidden Markov model-based abstract methodology to extract subword units given only pairwise comparison between utterances (or realizations of words in the mode of communication), i.e., whether two utterances correspond to the same word or not is developed.

Abstract: Communication languages convey information through the use of a set of symbols or units. Typically, this unit is word. When developing language technologies, as words in a language do not have the same prior probability, there may not be sufficient training data for each word to model. Furthermore, the training data may not cover all possible words in the language. Due to these data sparsity and word unit coverage issues, language technologies employ modeling of subword units or subunits, which are based on prior linguistic knowledge. For instance, development of speech technologies such as automatic speech recognition system presume that there exists a phonetic dictionary or at least a writing system for the target language. Such knowledge is not available for all languages in the world. In that direction, this article develops a hidden Markov model-based abstract methodology to extract subword units given only pairwise comparison between utterances (or realizations of words in the mode of communication), i.e., whether two utterances correspond to the same word or not. We validate the proposed methodology through investigations on spoken language and sign language. In the case of spoken language, we demonstrate that the proposed methodology can lead up to discovery of phone set and development of phonetic dictionary. In the case of sign language, we demonstrate how hand movement information can be effectively modeled for sign language processing and synthesized back to gain insight about the derived subunits.

5 citations

Cites methods from "Lexicon-building methods for an aco..."

...These methods typically involved [20,21]: (a) segmentation of speech utterances based on an acoustic similarity measure; (b) clustering of the segments using methods such as k-means to a pre-set number of subword units/clusters; and (c) finding pronunciations for each word from the occurrences of subword units in the training data followed by another clustering step to select representative pronunciations....

Dissertation•

Design of Detectors for Automatic Speech Recognition

Alfonso M. Canterla

01 Jan 2012

TL;DR: This thesis presents methods and results for optimizing subword detectors in continuous speech, and proposes a detection-based automatic speech recognition system based on an MLP/Viterbi decoder.

Abstract: This thesis presents methods and results for optimizing subword detectors in continuous speech. Speech detectors are useful within areas like detection-based ASR, pronunciation training, phonetic analysis, word spotting, etc. Firstly, we propose a structure suitable for subword detection. This structure is based on the standard HMM framework, but in each detector the MFCC feature extractor and the models are trained for the specific detection problem. Our experiments in the TIMIT database validate the effectiveness of this structure for detection of phones and articulatory features. Secondly, two discriminative training techniques are proposed for detector training. The first one is a modification of Minimum Classification Error training. The second one, Minimum Detection Error training, is the adaptation of Minimum Phone Error to the detection problem. Both methods are used to train HMMs and filterbanks in the detectors, isolated or jointly. MDE has the advantage that any detection performance criterion can be optimized directly. F-score and class accuracy optimization experiments show that MDE training is superior to the MCE-based method. The optimized filterbanks reflect some acoustical properties of the detection classes. Moreover, some changes are consistent over classes with similar acoustical properties. In addition, MDE-training of filterbanks results in filters significantly different from those in the standard filterbank. In fact, some filters extract information from different critical bands. Finally, we propose a detection-based automatic speech recognition system. Detectors are built with the proposed HMM-based detection structure and trained discriminatively. The linguistic merger is based on an MLP/Viterbi decoder.

3 citations

Journal Article•DOI•

Research Proposal Paper on Sanskrit Voice Engine: Convert Text-to-Audio in Sanskrit/Hindi

Piyush Mishra, Jainendra Shukla

31 May 2013-International Journal of Computer Applications

TL;DR: The proposed system is capable of teaching “Sanskrit Language” with the help of “Hindi Language” and its main motivation is to utilize the similarities of the two languages to add on to the Sanskrit learning environments in accordance to the “Modern Education Scenario”.

Abstract: This paper presents the methodology, application area, and some results obtained in association with the proposal of an “Automated System”. Our proposed system is capable of teaching “Sanskrit Language” with the help of “Hindi Language”. The system would have two major modules under its consideration, i.e., “Teaching” and “Evaluation”. The system’s main motivation is to utilize the similarities of the two languages and add on to the Sanskrit learning environments in accordance with the “Modern Education Scenario”.

3 citations

Additional excerpts

...[8] K.K. Paliwal,1990,”Lexicon-Building Methods For An Acoustic Sub-Word Based Speech Recognizer”, CH2847-2/90/0000-0729,IEEE....

Dissertation•

Using Sub-Phonemic Units for HMM Based Phone Recognition

Jarle Bauck Hamar

01 Jan 2013

TL;DR: A common way to construct a large vocabulary continuous speech recogniser LVCSR is to use 3 state HMMs to model phonemic units, and this dissertation aims to improve this standard phone mode.

Abstract: A common way to construct a large vocabulary continuous speech recogniser LVCSR is to use 3 state HMMs to model phonemic units. In this dissertation the focus is to improve this standard phone mode ...

3 citations

Cites methods from "Lexicon-building methods for an aco..."

...A segmentation approach based on dynamic programming (DP) was proposed in [39], and used later in [35, 40, 41, 42, 43]....

References

Journal Article•DOI•

A tutorial on hidden Markov models and selected applications in speech recognition

Lawrence R. Rabiner¹•Institutions (1)

Bell Labs¹

01 Feb 1989

TL;DR: In this paper, the authors provide an overview of the basic theory of hidden Markov models (HMMs) as originated by L.E. Baum and T. Petrie (1966) and give practical details on methods of implementation of the theory along with a description of selected applications of HMMs to distinct problems in speech recognition.

Abstract: This tutorial provides an overview of the basic theory of hidden Markov models (HMMs) as originated by L.E. Baum and T. Petrie (1966) and gives practical details on methods of implementation of the theory along with a description of selected applications of the theory to distinct problems in speech recognition. Results from a number of original sources are combined to provide a single source of acquiring the background required to pursue further this area of research. The author first reviews the theory of discrete Markov chains and shows how the concept of hidden states, where the observation is a probabilistic function of the state, can be used effectively. The theory is illustrated with two simple examples, namely coin-tossing, and the classic balls-in-urns system. Three fundamental problems of HMMs are noted and several practical techniques for solving these problems are given. The various types of HMMs that have been studied, including ergodic as well as left-right models, are described.
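
One of the three fundamental problems, finding the most likely hidden state sequence, is commonly solved with the Viterbi algorithm. The sketch below is a minimal log-domain version for a discrete HMM. The two-coin model and its probabilities are invented for illustration and are not taken from the tutorial.

```python
import math


def viterbi(obs, pi, A, B):
    """Most likely state path and its log-probability for a discrete HMM.

    obs: list of observation symbol indices
    pi:  initial state probabilities
    A:   A[r][s] = probability of moving from state r to state s
    B:   B[s][o] = probability of emitting symbol o in state s
    """
    def log(p):
        return math.log(p) if p > 0 else -math.inf

    n_states = len(pi)
    delta = [log(pi[s]) + log(B[s][obs[0]]) for s in range(n_states)]
    backptr = []
    for o in obs[1:]:
        prev, delta, ptr = delta, [], []
        for s in range(n_states):
            # Best predecessor state for arriving in state s at this step.
            best = max(range(n_states), key=lambda r: prev[r] + log(A[r][s]))
            ptr.append(best)
            delta.append(prev[best] + log(A[best][s]) + log(B[s][o]))
        backptr.append(ptr)
    # Trace the best path backwards from the best final state.
    state = max(range(n_states), key=delta.__getitem__)
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path, max(delta)


# Hypothetical coin-tossing model: state 0 is a fair coin, state 1 is a
# coin biased towards heads. Symbol 0 is heads, symbol 1 is tails.
COIN_PI = [0.5, 0.5]
COIN_A = [[0.9, 0.1], [0.1, 0.9]]
COIN_B = [[0.5, 0.5], [0.9, 0.1]]
path, log_prob = viterbi([0, 0, 0, 0], COIN_PI, COIN_A, COIN_B)
```

Working in the log domain avoids numerical underflow on long observation sequences.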

21,819 citations

Journal Article•DOI•

An Algorithm for Vector Quantizer Design

Y. Linde¹, A. Buzo², Robert M. Gray³•Institutions (3)

Codex Corporation¹, National Autonomous University of Mexico², Stanford University³

01 Jan 1980-IEEE Transactions on Communications

TL;DR: An efficient and intuitive algorithm is presented for the design of vector quantizers based either on a known probabilistic model or on a long training sequence of data.

Abstract: An efficient and intuitive algorithm is presented for the design of vector quantizers based either on a known probabilistic model or on a long training sequence of data. The basic properties of the algorithm are discussed and demonstrated by examples. Quite general distortion measures and long blocklengths are allowed, as exemplified by the design of parameter vector quantizers of ten-dimensional vectors arising in Linear Predictive Coded (LPC) speech compression with a complicated distortion measure arising in LPC analysis that does not depend only on the error vector.
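
For the training-sequence case, codebook design by splitting and iterative refinement can be sketched as follows. This is a simplified illustration and makes several assumptions: it uses squared-error distortion rather than the LPC distortion measure mentioned in the abstract, runs a fixed number of refinement passes instead of a distortion-based stopping rule, and expects a codebook size that is a power of two. All function names are illustrative.

```python
def nearest(codebook, x):
    """Index of the codeword closest to x under squared error."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((xi - ci) ** 2 for xi, ci in zip(x, codebook[k])),
    )


def centroid(points):
    """Component-wise mean of a non-empty list of vectors."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def lbg(data, size, eps=0.01, iters=20):
    """Grow a codebook by repeated splitting followed by Lloyd-style updates."""
    codebook = [centroid(data)]
    while len(codebook) < size:
        # Split every codeword into two slightly perturbed copies.
        codebook = [[c + s for c in cw] for cw in codebook for s in (eps, -eps)]
        for _ in range(iters):
            # Assign each training vector to its nearest codeword.
            cells = [[] for _ in codebook]
            for x in data:
                cells[nearest(codebook, x)].append(x)
            # Move each codeword to the centroid of its cell; keep the old
            # codeword when a cell is empty.
            codebook = [
                centroid(cell) if cell else cw
                for cell, cw in zip(cells, codebook)
            ]
    return codebook
```

For example, `lbg([[0.0], [0.2], [9.8], [10.0]], 2)` separates the two low values from the two high ones and returns codewords near 0.1 and 9.9.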

7,935 citations

Journal Article•DOI•

A Maximum Likelihood Approach to Continuous Speech Recognition

Lalit R. Bahl¹, Frederick Jelinek¹, Robert Leroy Mercer¹•Institutions (1)

IBM¹

01 Feb 1983-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: This paper describes a number of statistical models for use in speech recognition, with special attention to determining the parameters for such models from sparse data, and describes two decoding methods appropriate for constrained artificial languages and one appropriate for more realistic decoding tasks.

Abstract: Speech recognition is formulated as a problem of maximum likelihood decoding. This formulation requires statistical models of the speech production process. In this paper, we describe a number of statistical models for use in speech recognition. We give special attention to determining the parameters for such models from sparse data. We also describe two decoding methods, one appropriate for constrained artificial languages and one appropriate for more realistic decoding tasks. To illustrate the usefulness of the methods described, we review a number of decoding results that have been obtained with them.
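
The decoding formulation is usually written as below. The symbols are assumptions introduced here, since the listing gives no notation: W is a candidate word sequence and A is the observed acoustic evidence.

```latex
\hat{W} = \arg\max_{W} P(W \mid A)
        = \arg\max_{W} \frac{P(W)\, P(A \mid W)}{P(A)}
        = \arg\max_{W} P(W)\, P(A \mid W)
```

The denominator P(A) does not depend on W, so it can be dropped from the maximisation. This leaves a language model term P(W) and an acoustic model term P(A | W).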

1,637 citations

Proceedings Article•DOI•

Acoustic Markov models used in the Tangora speech recognition system

Lalit R. Bahl¹, Peter Fitzhugh Brown¹, P.V. de Souza¹, Michael Picheny¹•Institutions (1)

IBM¹

11 Apr 1988

TL;DR: An automatic technique for constructing Markov word models is described and results are included of experiments with speaker-dependent and speaker-independent models on several isolated-word recognition tasks.

Abstract: The Speech Recognition Group at IBM Research has developed a real-time, isolated-word speech recognizer called Tangora, which accepts natural English sentences drawn from a vocabulary of 20000 words. Despite its large vocabulary, the Tangora recognizer requires only about 20 minutes of speech from each new user for training purposes. The accuracy of the system and its ease of training are largely attributable to the use of hidden Markov models in its acoustic match component. An automatic technique for constructing Markov word models is described and results are included of experiments with speaker-dependent and speaker-independent models on several isolated-word recognition tasks.

245 citations

Journal Article•DOI•

A modified K-means clustering algorithm for use in isolated work recognition

Jay G. Wilpon¹, Lawrence R. Rabiner²•Institutions (2)

Bell Labs¹, AT&T²

01 Jun 1985-IEEE Transactions on Acoustics, Speech, and Signal Processing

TL;DR: A clustering algorithm based on a standard K-means approach which requires no user parameter specification is presented and experimental data show that this new algorithm performs as well or better than the previously used clustering techniques when tested as part of a speaker-independent isolated word recognition system.

Abstract: Studies of isolated word recognition systems have shown that a set of carefully chosen templates can be used to bring the performance of speaker-independent systems up to that of systems trained to the individual speaker. The earliest work in this area used a sophisticated set of pattern recognition algorithms in a human-interactive mode to create the set of templates (multiple patterns) for each word in the vocabulary. Not only was this procedure time consuming but it was impossible to reproduce exactly because it was highly dependent on decisions made by the experimenter. Subsequent work led to an automatic clustering procedure which, given only a set of clustering parameters, clustered patterns with the same performance as the previously developed supervised algorithms. The one drawback of the automatic procedure was that the specification of the input parameter set was found to be somewhat dependent on the vocabulary type and size of population to be clustered. Since a naive user of such a statistical clustering algorithm could not be expected, in general, to know how to choose the word clustering parameters, even this automatic clustering algorithm was not appropriate for a completely general word recognition system. It is the purpose of this paper to present a clustering algorithm based on a standard K-means approach which requires no user parameter specification. Experimental data show that this new algorithm performs as well or better than the previously used clustering techniques when tested as part of a speaker-independent isolated word recognition system.

218 citations