Journal ArticleDOI

Minimum prediction residual principle applied to speech recognition

F. Itakura
01 Feb 1975-IEEE Transactions on Acoustics, Speech, and Signal Processing (IEEE)-Vol. 23, Iss: 1, pp 154-158
TL;DR: A computer system is described in which isolated words, spoken by a designated talker, are recognized by calculating a minimum prediction residual, obtained by optimally registering the reference LPC onto the input autocorrelation coefficients using a dynamic programming algorithm.
Abstract: A computer system is described in which isolated words, spoken by a designated talker, are recognized through calculation of a minimum prediction residual. A reference pattern for each word to be recognized is stored as a time pattern of linear prediction coefficients (LPC). The total log prediction residual of an input signal is minimized by optimally registering the reference LPC onto the input autocorrelation coefficients using the dynamic programming algorithm (DP). The input signal is recognized as the reference word which produces the minimum prediction residual. A sequential decision procedure is used to reduce the amount of computation in DP. A frequency normalization with respect to the long-time spectral distribution is used to reduce effects of variations in the frequency response of telephone connections. The system has been implemented on a DDP-516 computer for the 200-word recognition experiment. The recognition rate for a designated male talker is 97.3 percent for telephone input, and the recognition time is about 22 times real time.
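For illustration, the sketch below computes the frame-level log prediction residual at the core of this method, assuming the standard autocorrelation-method LPC (Levinson-Durbin recursion) and the log ratio of the reference predictor's residual to the input frame's own minimum residual; the function names are illustrative and this is not the DDP-516 implementation described above.

```python
import numpy as np
from scipy.linalg import toeplitz

def autocorrelation(frame, order):
    """Biased autocorrelation r[0..order] of one windowed speech frame."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    return r[:order + 1]

def lpc_from_autocorrelation(r):
    """Levinson-Durbin recursion: returns a = [1, a1, ..., ap], the predictor
    that minimizes the prediction residual for autocorrelation sequence r."""
    p = len(r) - 1
    a = np.zeros(p + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, p + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        new_a = a.copy()
        new_a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a

def log_prediction_residual(a_ref, r_in):
    """Log residual of the reference predictor a_ref applied to an input frame,
    normalized by the input frame's own minimum residual."""
    R = toeplitz(r_in)                     # Toeplitz autocorrelation matrix
    a_in = lpc_from_autocorrelation(r_in)  # the input frame's own predictor
    return np.log((a_ref @ R @ a_ref) / (a_in @ R @ a_in))
```

Frame-by-frame values of this distance would then be accumulated along an optimal registration path by dynamic programming (a recursion of the kind sketched under the Sakoe and Chiba entry below), and the input is assigned to the reference word giving the smallest total residual.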
Citations
Journal ArticleDOI
TL;DR: An efficient and intuitive algorithm is presented for the design of vector quantizers based either on a known probabilistic model or on a long training sequence of data.
Abstract: An efficient and intuitive algorithm is presented for the design of vector quantizers based either on a known probabilistic model or on a long training sequence of data. The basic properties of the algorithm are discussed and demonstrated by examples. Quite general distortion measures and long blocklengths are allowed, as exemplified by the design of parameter vector quantizers of ten-dimensional vectors arising in Linear Predictive Coded (LPC) speech compression with a complicated distortion measure arising in LPC analysis that does not depend only on the error vector.
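As a rough sketch of the design iteration described above, the code below runs a generalized Lloyd (k-means-style) loop on a training sequence with squared-error distortion and random initialization; the paper itself admits far more general distortion measures (such as the LPC distortion mentioned in the abstract) and is commonly paired with a splitting initialization, neither of which is reproduced here.

```python
import numpy as np

def design_vq(training, codebook_size, iters=50, seed=0):
    """Generalized Lloyd iteration on a training sequence, squared-error distortion."""
    rng = np.random.default_rng(seed)
    # Initialize the codebook from randomly chosen training vectors.
    codebook = training[rng.choice(len(training), codebook_size, replace=False)]
    for _ in range(iters):
        # Nearest-neighbor partition of the training sequence.
        d = ((training[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Centroid update: each codeword becomes the mean of its cell.
        for k in range(codebook_size):
            cell = training[labels == k]
            if len(cell):
                codebook[k] = cell.mean(axis=0)
    return codebook

# Example: ten-dimensional vectors, as in the LPC application mentioned above.
train = np.random.default_rng(1).normal(size=(2000, 10))
cb = design_vq(train, codebook_size=16)
```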

7,935 citations

Journal ArticleDOI
H. Sakoe, S. Chiba
TL;DR: This paper reports on an optimum dynamic programming (DP) based time-normalization algorithm for spoken word recognition, in which the slope of the warping function is restricted so as to improve discrimination between words in different categories.
Abstract: This paper reports on an optimum dynamic programming (DP) based time-normalization algorithm for spoken word recognition. First, a general principle of time-normalization is given using a time-warping function. Then, two time-normalized distance definitions, called symmetric and asymmetric forms, are derived from the principle. These two forms are compared with each other through theoretical discussions and experimental studies, and the superiority of the symmetric form is established. A new technique, called slope constraint, is introduced, in which the slope of the warping function is restricted so as to improve discrimination between words in different categories. The characteristics of the slope constraint are qualitatively analyzed, and the optimum slope constraint condition is determined through experiments. The optimized algorithm is then extensively compared experimentally with various DP algorithms previously applied to spoken word recognition by different research groups. The experiments show that the present algorithm gives no more than about two-thirds of the errors of even the best conventional algorithm.
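A minimal sketch of symmetric-form, time-normalized DP matching is given below, assuming Euclidean local distances and the basic unconstrained step pattern; the slope-constrained step patterns and adjustment window that the paper optimizes are not reproduced.

```python
import numpy as np

def dtw_symmetric(A, B):
    """Symmetric-form DTW distance between feature sequences A (N x d) and B (M x d)."""
    N, M = len(A), len(B)
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)  # local distances
    g = np.full((N, M), np.inf)
    g[0, 0] = 2 * d[0, 0]
    for i in range(N):
        for j in range(M):
            if i == 0 and j == 0:
                continue
            candidates = []
            if i > 0 and j > 0:
                candidates.append(g[i - 1, j - 1] + 2 * d[i, j])  # diagonal step, weight 2
            if i > 0:
                candidates.append(g[i - 1, j] + d[i, j])          # vertical step, weight 1
            if j > 0:
                candidates.append(g[i, j - 1] + d[i, j])          # horizontal step, weight 1
            g[i, j] = min(candidates)
    return g[N - 1, M - 1] / (N + M)  # time-normalized distance
```

The returned value is the accumulated weighted distance divided by N + M, which is the normalization factor of the symmetric form.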

5,906 citations


Cites methods from "Minimum prediction residual principle applied to speech recognition"

  • ...As a further investigation, the optimized algorithm is experimentally compared with several varieties of the DP-algorithm, which have been applied to spoken word recognition by some research groups [3]-[6]. The optimized algorithm superiority is established, indicating the validity of this investigation....


  • ...Four typical ones, including those proposed by Sakoe and Chiba [3], Velichko and Zagoruyko [4], White and Neely [5], and Itakura [6], were subjected to comparison with the algorithms described in this paper....


Journal ArticleDOI
TL;DR: In this article, several parametric representations of the acoustic signal were compared with regard to word recognition performance in a syllable-oriented continuous speech recognition system, and the emphasis was on the ability to retain phonetically significant acoustic information in the face of syntactic and duration variations.
Abstract: Several parametric representations of the acoustic signal were compared with regard to word recognition performance in a syllable-oriented continuous speech recognition system. The vocabulary included many phonetically similar monosyllabic words, therefore the emphasis was on the ability to retain phonetically significant acoustic information in the face of syntactic and duration variations. For each parameter set (based on a mel-frequency cepstrum, a linear frequency cepstrum, a linear prediction cepstrum, a linear prediction spectrum, or a set of reflection coefficients), word templates were generated using an efficient dynamic warping method, and test data were time registered with the templates. A set of ten mel-frequency cepstrum coefficients computed every 6.4 ms resulted in the best performance, namely 96.5 percent and 95.0 percent recognition with each of two speakers. The superior performance of the mel-frequency cepstrum coefficients may be attributed to the fact that they better represent the perceptually relevant aspects of the short-term speech spectrum.
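A minimal sketch of computing mel-frequency cepstrum coefficients for one windowed frame is shown below; the triangular filterbank and DCT formulation follow common practice and are not guaranteed to match the paper's exact filter spacing, so treat the specific constants as assumptions.

```python
import numpy as np
from scipy.fft import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr, fmin=0.0, fmax=None):
    """Triangular filters spaced uniformly on the mel scale."""
    fmax = fmax or sr / 2.0
    mel_pts = np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for k in range(1, n_filters + 1):
        lo, ctr, hi = bins[k - 1], bins[k], bins[k + 1]
        for b in range(lo, ctr):            # rising edge of the triangle
            fb[k - 1, b] = (b - lo) / max(ctr - lo, 1)
        for b in range(ctr, hi):            # falling edge of the triangle
            fb[k - 1, b] = (hi - b) / max(hi - ctr, 1)
    return fb

def mfcc(frame, sr, n_filters=20, n_coeffs=10):
    """Mel-frequency cepstrum coefficients of one windowed frame."""
    spectrum = np.abs(np.fft.rfft(frame)) ** 2                 # power spectrum
    energies = mel_filterbank(n_filters, len(frame), sr) @ spectrum
    return dct(np.log(energies + 1e-10), norm="ortho")[:n_coeffs]

# Example: ten coefficients from a 25.6 ms Hamming-windowed frame at 10 kHz.
sr = 10000
frame = np.hamming(256) * np.random.default_rng(0).normal(size=256)
coeffs = mfcc(frame, sr)
```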

4,822 citations

Journal ArticleDOI
David J. Thomson
01 Sep 1982
TL;DR: In this article, a local eigenexpansion is proposed to estimate the spectrum of a stationary time series from a finite sample of the process; the method is computationally equivalent to using the weighted average of a series of direct-spectrum estimates based on orthogonal data windows to treat both bias and smoothing problems.
Abstract: In the choice of an estimator for the spectrum of a stationary time series from a finite sample of the process, the problems of bias control and consistency, or "smoothing," are dominant. In this paper we present a new method based on a "local" eigenexpansion to estimate the spectrum in terms of the solution of an integral equation. Computationally this method is equivalent to using the weighted average of a series of direct-spectrum estimates based on orthogonal data windows (discrete prolate spheroidal sequences) to treat both the bias and smoothing problems. Some of the attractive features of this estimate are: there are no arbitrary windows; it is a small sample theory; it is consistent; it provides an analysis-of-variance test for line components; and it has high resolution. We also show relations of this estimate to maximum-likelihood estimates, show that the estimation capacity of the estimate is high, and show applications to coherence and polyspectrum estimates.
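A minimal sketch of the multitaper idea follows: average direct spectrum estimates computed with orthogonal discrete prolate spheroidal (Slepian) tapers. Equal weights are used for brevity; Thomson's adaptive weighting and the analysis-of-variance test for line components are omitted, and the parameter choices below are assumptions.

```python
import numpy as np
from scipy.signal.windows import dpss

def multitaper_psd(x, nw=4.0, n_tapers=7, fs=1.0):
    """Equal-weight average of eigenspectra from orthogonal Slepian tapers."""
    n = len(x)
    tapers = dpss(n, nw, n_tapers)                              # shape (n_tapers, n)
    eigenspectra = np.abs(np.fft.rfft(tapers * x, axis=1)) ** 2  # one direct estimate per taper
    psd = eigenspectra.mean(axis=0) / fs
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, psd

# Example: spectrum of a noisy sinusoid, 1024 samples at fs = 1 kHz.
rng = np.random.default_rng(0)
t = np.arange(1024) / 1000.0
x = np.sin(2 * np.pi * 60.0 * t) + rng.normal(scale=0.5, size=t.size)
freqs, psd = multitaper_psd(x, fs=1000.0)
```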

3,921 citations

Book
01 Jan 2000
TL;DR: This book takes an empirical approach to language processing, based on applying statistical and other machine-learning algorithms to large corpora, to demonstrate how the same algorithm can be used for speech recognition and word-sense disambiguation.
Abstract: From the Publisher: This book takes an empirical approach to language processing, based on applying statistical and other machine-learning algorithms to large corpora. Methodology boxes are included in each chapter. Each chapter is built around one or more worked examples to demonstrate the main idea of the chapter. Covers the fundamental algorithms of various fields, whether originally proposed for spoken or written language, to demonstrate how the same algorithm can be used for speech recognition and word-sense disambiguation. Emphasis on web and other practical applications. Emphasis on scientific evaluation. Useful as a reference for professionals in any of the areas of speech and language processing.

3,794 citations

References
Journal ArticleDOI
TL;DR: Application of this method to efficient transmission and storage of speech signals, as well as procedures for determining other speech characteristics, such as formant frequencies and bandwidths, the spectral envelope, and the autocorrelation function, are discussed.
Abstract: A method of representing the speech signal by time‐varying parameters relating to the shape of the vocal tract and the glottal‐excitation function is described. The speech signal is first analyzed and then synthesized by representing it as the output of a discrete linear time‐varying filter, which is excited by a suitable combination of a quasiperiodic pulse train and white noise. The output of the linear filter at any sampling instant is a linear combination of the past output samples and the input. The optimum linear combination is obtained by minimizing the mean‐squared error between the actual values of the speech samples and their predicted values based on a fixed number of preceding samples. A 10th‐order linear predictor was found to represent the speech signal band‐limited to 5 kHz with sufficient accuracy. The 10 coefficients of the predictor are shown to determine both the frequencies and bandwidths of the formants. Two parameters relating to the glottal‐excitation function and the pitch period are determined from the prediction error signal. Speech samples synthesized by this method will be demonstrated.
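As a rough sketch of the analysis-synthesis idea described above, the code below fits a 10th-order linear predictor by least squares, forms the prediction-error (residual) signal from which the excitation parameters would be estimated, and resynthesizes by driving the all-pole filter with an excitation. The function names are illustrative, and the paper's pitch and voicing estimation is not reproduced.

```python
import numpy as np
from scipy.signal import lfilter

def lpc_fit(x, order=10):
    """Least-squares linear predictor: each sample is predicted from the
    'order' preceding samples."""
    # Regression x[n] ~ sum_k a[k] * x[n-k], for k = 1..order.
    X = np.column_stack([x[order - k: len(x) - k] for k in range(1, order + 1)])
    a, *_ = np.linalg.lstsq(X, x[order:], rcond=None)
    return a

def prediction_error(x, a):
    """Residual e[n] = x[n] - sum_k a[k] * x[n-k], from which excitation
    parameters (pitch period, voicing) would be estimated."""
    return lfilter(np.concatenate(([1.0], -a)), [1.0], x)

def synthesize(excitation, a):
    """Drive the all-pole filter 1 / (1 - sum_k a[k] z^-k) with an excitation
    (quasiperiodic pulses for voiced speech, white noise for unvoiced)."""
    return lfilter([1.0], np.concatenate(([1.0], -a)), excitation)
```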

1,124 citations

Journal ArticleDOI
TL;DR: The logarithmic characteristics of the acoustic signal in five bands are extracted as features, and the measure of similarity between standard and test words is calculated by maximizing a definite functional using dynamic programming.
Abstract: Experiments on the automatic recognition of 203 Russian words are described. The experimental vocabulary includes terms of the language ALGOL-60 together with others. The logarithmic characteristics of the acoustic signal in five bands are extracted as features. The measure of similarity between standard and test words is calculated by maximizing a definite functional using dynamic programming. The average reliability of recognition for one speaker, obtained in experiments using 5000 words, is 0.95. The computational time for recognition is 2-4 s.

214 citations