
Showing papers on "Codebook published in 1987"


Patent
07 Jan 1987
TL;DR: In this article, a system and apparatus are described for encoding a set of input vector components by initiating a sequential search through a codebook memory to put out a series of associated error code vectors, which are compared in sequence over a period of time in order to select the minimum-error code vector (best fit).
Abstract: A system and apparatus for encoding a set of input vector components by initiating a sequential search through a codebook memory to put out a series of associated error code vectors which can be compared in sequence over a period of time in order to select the minimum error code vector (best fit). A clocking-sequencing means enables an output latch to hold (after termination of the sequence period) the index number which represents the particular error code vector presently having the minimum distortion. Each new set of input vector components will be sequenced to search for the minimum error code vector (and index) for that particular set of input vector components.

206 citations
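The search loop this patent describes, a sequential pass over the codebook that keeps latching the index of the lowest-distortion entry seen so far, can be sketched in a few lines. The squared-error distortion and all names below are illustrative assumptions, not taken from the patent text.

```python
import numpy as np

def encode_vector(x, codebook):
    """Sequentially search `codebook` (shape [N, K]) for the entry with minimum
    distortion against input vector `x` (shape [K]) and return its index,
    mimicking the latch-and-compare sequencing described above."""
    best_index, best_error = -1, np.inf
    for i, c in enumerate(codebook):
        error = np.sum((x - c) ** 2)        # assumed squared-error distortion
        if error < best_error:              # latch the new minimum and its index
            best_index, best_error = i, error
    return best_index

# Example: 8 random code vectors of dimension 4
rng = np.random.default_rng(0)
codebook = rng.standard_normal((8, 4))
print(encode_vector(rng.standard_normal(4), codebook))
```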


Proceedings ArticleDOI
01 Apr 1987
TL;DR: The paper describes a related scheme which allows real-time implementation on current DSP chips; the very efficient search procedure in the codebook is achieved by means of a new technique called "backward filtering" and the use of algebraic codes.
Abstract: Code-Excited Linear Prediction (CELP) produces high quality synthetic speech at low bit rate. However the basic scheme leads to huge computational loads. The paper describes a related scheme, which allows real-time implementation on current DSP chips. The very efficient search procedure in the codebook is achieved by means of a new technique called "backward filtering" and the use of algebraic codes. Signal-to-noise ratio (RSB) performance is reported for a variety of conditions.

196 citations


PatentDOI
TL;DR: In this article, a vector adaptive predictive coder is proposed, which approximates each vector of K speech samples by using each of M fixed vectors in a first codebook to excite a time-varying synthesis filter and picking the vector that minimizes distortion.
Abstract: A real-time vector adaptive predictive coder which approximates each vector of K speech samples by using each of M fixed vectors in a first codebook to excite a time-varying synthesis filter and picking the vector that minimizes distortion. Predictive analysis for each frame determines parameters used for computing from vectors in the first codebook zero-state response vectors that are stored at the same address (index) in a second codebook. Encoding of input speech vectors sn is then carried out using the second codebook. When the vector that minimizes distortion is found, its index is transmitted to a decoder which has a codebook identical to the first codebook of the encoder. There the index is used to read out a vector that is used to synthesize an output speech vector sn. The parameters used in the encoder are quantized, for example by using a table, and the indices are transmitted to the decoder where they are decoded to specify transfer characteristics of filters used in producing the vector sn from the receiver codebook vector selected by the vector index transmitted.

153 citations


Journal ArticleDOI
TL;DR: Although the approach is intrinsically text-independent, the system can be easily extended to text-dependent operation for improved performance and security by encoding specified training word utterances to form word prototypes.

118 citations


Proceedings ArticleDOI
01 Apr 1987
TL;DR: An algorithm for calculating a noise-to-mask ratio is presented which helps to identify where quantization noise produced by the OCF coder could be audible.
Abstract: Optimum Coding in the Frequency domain (OCF) uses entropy coding of quantized spectral coefficients to efficiently code high quality sound signals with 3 bits/sample. In an iterative algorithm, psychoacoustic weighting is used to get the quantization noise masked in every critical band. The coder itself uses iterative quantizer control to code each data block with a fixed number of bits. Details about the OCF coder are presented together with information about the codebook needed and the training for the entropy coder. An algorithm for calculating a noise-to-mask ratio is presented which helps to identify where quantization noise could be audible.

95 citations


Proceedings ArticleDOI
06 Apr 1987
TL;DR: A new way of using vector quantization for improving recognition performance for a 60,000 word vocabulary speaker-trained isolated word recognizer using a phonemic Markov model approach to speech recognition is proposed.
Abstract: This paper proposes a new way of using vector quantization for improving recognition performance for a 60,000 word vocabulary speaker-trained isolated word recognizer using a phonemic Markov model approach to speech recognition. We show that we can effectively increase the codebook size by dividing the feature vector into two vectors of lower dimensionality, and then quantizing and training each vector separately. For a small codebook size, integration of the results of the two parameter vectors provides significant improvement in recognition performance as compared to the quantizing and training of the entire feature set together. Even for a codebook size as small as 64, the results obtained when using the new quantization procedure are quite close to those obtained when using Gaussian distribution of the parameter vectors.

89 citations
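The splitting idea in the abstract, two lower-dimensional codebooks whose index pair addresses an effective codebook as large as their product, might look roughly like the sketch below. The 12-dimensional features, the 64-entry sub-codebooks, and the plain Lloyd (k-means) training are assumptions for illustration.

```python
import numpy as np

def train_codebook(data, size, iters=20, seed=0):
    """Plain Lloyd/k-means codebook training; a stand-in for whatever training
    procedure the recognizer actually used."""
    rng = np.random.default_rng(seed)
    codebook = data[rng.choice(len(data), size, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((data[:, None] - codebook[None]) ** 2).sum(-1), axis=1)
        for k in range(size):
            members = data[labels == k]
            if len(members):
                codebook[k] = members.mean(axis=0)
    return codebook

def split_vq_encode(x, cb_a, cb_b, split):
    """Quantize the two halves of a feature vector independently; the index pair
    addresses an effective codebook of size len(cb_a) * len(cb_b)."""
    ia = np.argmin(((cb_a - x[:split]) ** 2).sum(-1))
    ib = np.argmin(((cb_b - x[split:]) ** 2).sum(-1))
    return ia, ib

rng = np.random.default_rng(1)
feats = rng.standard_normal((2000, 12))      # assumed 12-dim feature vectors
cb_a = train_codebook(feats[:, :6], 64)      # 64 entries per sub-codebook
cb_b = train_codebook(feats[:, 6:], 64)
print(split_vq_encode(feats[0], cb_a, cb_b, split=6))
```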


Journal ArticleDOI
TL;DR: A class of adaptive vector quantizers that can dynamically adjust the "gain" or amplitude scale of code vectors according to the input signal level are introduced.
Abstract: The generalization of gain adaptation to vector quantization (VQ) is explored in this paper and a comprehensive examination of alternative techniques is presented. We introduce a class of adaptive vector quantizers that can dynamically adjust the "gain" or amplitude scale of code vectors according to the input signal level. The encoder uses a gain estimator to determine a suitable normalization of each input vector prior to VQ encoding. The normalized vectors have reduced dynamic range and can then be more efficiently coded. At the receiver, the VQ decoder output is multiplied by the estimated gain. Both forward and backward adaptation are considered and several different gain estimators are compared and evaluated. Gain-adaptive VQ can be used alone for "vector PCM" coding (i.e., direct waveform VQ) or as a building block in other vector coding schemes. The design algorithm for generating the appropriate gain-normalized VQ codebook is introduced. When applied to speech coding, gain-adaptive VQ achieves significant performance improvement over fixed VQ with a negligible increase in complexity.

68 citations
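A minimal forward-adaptive sketch of the gain-adaptive VQ described above: estimate a gain for each input vector, normalize before the codebook search, and rescale the decoded code vector. The RMS gain estimator, the codebook size, and leaving the gain unquantized are simplifying assumptions.

```python
import numpy as np

def gain_adaptive_encode(x, codebook, eps=1e-8):
    """Forward gain adaptation: normalize the input by its estimated gain,
    then VQ-encode the normalized (reduced dynamic range) vector."""
    gain = np.sqrt(np.mean(x ** 2)) + eps        # assumed RMS gain estimator
    index = np.argmin(((codebook - x / gain) ** 2).sum(axis=1))
    return index, gain                           # in practice the gain is also quantized

def gain_adaptive_decode(index, gain, codebook):
    """Decoder output is the selected code vector rescaled by the gain."""
    return gain * codebook[index]

rng = np.random.default_rng(2)
codebook = rng.standard_normal((256, 8))         # assumed 256-entry, 8-dim codebook
x = 5.0 * rng.standard_normal(8)                 # high-level input vector
idx, g = gain_adaptive_encode(x, codebook)
x_hat = gain_adaptive_decode(idx, g, codebook)
```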


Proceedings ArticleDOI
01 Apr 1987
TL;DR: A real-time 4.8 kb/s Pulse Excitation VXC coder (PVXC) is presented which achieves high reconstructed speech quality and incorporates new techniques which reduce the codebook search complexity to only 0.55 MFlops.
Abstract: In Vector Excitation Coding (VXC), speech is represented by applying a sequence of excitation vectors to a time-varying speech production filter with each vector chosen from a codebook using a perceptually-based performance measure. Although VXC is a powerful technique for achieving natural and high quality speech compression at low bit-rates, it suffers as other excitation coders do from a very high computational complexity. Recent research has shown that codebook search computation can be reduced to approximately 40 MFlops without compromising speech quality. However, this operation count still prohibits a practical real-time implementation of the coder using today's DSP chips. We present a real-time 4.8 kb/s Pulse Excitation VXC coder (PVXC) which achieves high reconstructed speech quality and incorporates new techniques which reduce the codebook search complexity to only 0.55 MFlops. The coder utilizes an optimized excitation codebook and a promising new interframe vector predictive LPC parameter quantization scheme. A preliminary implementation using a single floating-point signal processor is described.

67 citations


Proceedings ArticleDOI
W. Equitz1
01 Apr 1987
TL;DR: A data structure (k-d trees, developed by Bentley) is demonstrated to be appropriate for implementing exact nearest neighbor searching in time logarithmic in codebook size and is generalizable to any vector quantization application with the appropriate distortion measure.
Abstract: Two methods for reducing the computation involved in vector quantization picture coding are presented. First, a data structure (k-d trees, developed by Bentley) is demonstrated to be appropriate for implementing exact nearest neighbor searching in time logarithmic in codebook size. Second, the Pairwise Nearest Neighbor (PNN) algorithm is presented as an alternative to the generalized Lloyd (Linde-Buzo-Gray) algorithm. The PNN algorithm derives a vector quantization codebook in a diminishingly small fraction of the time previously required, without sacrificing performance. Simulations on a variety of images coded at 1/2 bit per pixel indicate that PNN codebooks can be developed in roughly 5% of the time required by the LBG algorithm. The PNN algorithm can be used with squared error and weighted squared error distortion measures. These results are generalizable to any vector quantization application with the appropriate distortion measure.

58 citations
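A brute-force sketch of the Pairwise Nearest Neighbor rule for the squared-error measure: start with every training vector as a one-member cluster and repeatedly merge the pair whose merger raises total distortion the least, until the target codebook size is reached. The paper pairs this with k-d tree searches for speed; this quadratic-time version only illustrates the merge criterion and makes its own assumptions about the details.

```python
import numpy as np

def pnn_codebook(data, target_size):
    """Pairwise Nearest Neighbor codebook design (brute-force illustration).
    Each cluster is (centroid, count); merging clusters i and j increases the
    squared-error distortion by n_i*n_j/(n_i+n_j) * ||c_i - c_j||^2."""
    clusters = [(x.astype(float), 1) for x in data]
    while len(clusters) > target_size:
        best = (np.inf, 0, 1)
        for i in range(len(clusters)):
            ci, ni = clusters[i]
            for j in range(i + 1, len(clusters)):
                cj, nj = clusters[j]
                cost = ni * nj / (ni + nj) * np.sum((ci - cj) ** 2)
                if cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        (ci, ni), (cj, nj) = clusters[i], clusters[j]
        merged = ((ni * ci + nj * cj) / (ni + nj), ni + nj)   # weighted centroid
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return np.array([c for c, _ in clusters])

rng = np.random.default_rng(3)
training = rng.standard_normal((200, 4))      # small synthetic training set
codebook = pnn_codebook(training, 16)
```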


Journal ArticleDOI
TL;DR: Several vector quantization approaches to the problem of text-dependent speaker verification are described in this paper, where a source codebook is designed to represent a particular speaker saying a particular utterance; this same utterance is later spoken by a speaker to be verified and is encoded in the source codebook representing the speaker whose identity was claimed.
Abstract: Several vector quantization approaches to the problem of text-dependent speaker verification are described. In each of these approaches, a source codebook is designed to represent a particular speaker saying a particular utterance. Later, this same utterance is spoken by a speaker to be verified and is encoded in the source codebook representing the speaker whose identity was claimed. The speaker is accepted if the verification utterance's quantization distortion is less than a prespecified speaker-specific threshold. The best approach achieved a 0.7 percent false acceptance rate and a 0.6 percent false rejection rate on a speaker population comprising 16 admissible speakers and 111 casual imposters. The approaches are described, and detailed experimental results are presented and discussed.

58 citations
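The acceptance rule described above, encode the verification utterance in the claimed speaker's codebook and compare the resulting quantization distortion against a speaker-specific threshold, amounts to the few lines below. The per-frame squared-error measure, the averaging over frames, and the sizes used are assumptions about details the abstract leaves open.

```python
import numpy as np

def verify_speaker(utterance_frames, speaker_codebook, threshold):
    """Encode each feature frame in the claimed speaker's codebook and accept
    the claim if the mean quantization distortion falls below the threshold."""
    dists = [((speaker_codebook - frame) ** 2).sum(axis=1).min()
             for frame in utterance_frames]
    return float(np.mean(dists)) < threshold

rng = np.random.default_rng(4)
codebook = rng.standard_normal((64, 10))      # assumed 64-entry codebook of LPC features
frames = rng.standard_normal((120, 10))       # assumed 120 analysis frames of the utterance
print(verify_speaker(frames, codebook, threshold=8.0))
```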


PatentDOI
TL;DR: In this article, a finite impulse response linear predictive coding (LPC) filter and an overlapping codebook are used to determine the candidate excitation vector from the codebook that best matches the target excitation vector, after searching the entire codebook for the best match.
Abstract: Apparatus for encoding speech using a code excited linear predictive (CELP) encoder using a recursive computational unit. In response to a target excitation vector that models a present frame of speech, the computational unit utilizes a finite impulse response linear predictive coding (LPC) filter and an overlapping codebook to determine a candidate excitation vector from the codebook that matches the target excitation vector after searching the entire codebook for the best match. For each candidate excitation vector accessed from the overlapping codebook, only one sample of the accessed vector and one sample of the previously accessed vector must have arithmetic operations performed on them to evaluate the new vector rather than all of the samples as is normal for CELP methods. For increased performance, a stochastically excited linear predictive (SELP) encoder is used in series with the adaptive CELP encoder. The SELP encoder is responsive to the difference between the target excitation vector and the best matched candidate excitation vector to search its own overlapping codebook in a recursive manner to determine a candidate excitation vector that provides the best match. Both of the best matched candidate vectors are used in speech synthesis.

PatentDOI
Copperi Maurizio1
TL;DR: In this paper, a linear-prediction inverse filter followed by a shaping filter whose coefficients are chosen out of a codebook of quantized filter coefficient vectors, obtaining a residual signal subdivided into vectors.
Abstract: This method provides a filtering of blocks of digital samples of a speech signal by a linear-prediction inverse filter followed by a shaping filter, whose coefficients are chosen out of a codebook of quantized filter coefficient vectors, obtaining a residual signal subdivided into vectors. Each vector is classified by an index q depending on the zero-crossing frequency and r.m.s. value; it is then normalized first on the basis of the quantized r.m.s. value and then on the basis of a vector of quantized short-term mean values; the mean-square error made in quantizing said vectors with vectors contained in a codebook and forming excitation waveforms is computed. In this codebook the search is limited to a subset of vectors determined by index q and by index p of the short-term mean vector. The coded signal consists of the index of the filter coefficient vector, of indices q and p, of the quantization index m of the r.m.s. value, and of the index of the excitation-waveform vector which has generated the minimum weighted mean-square error (FIG. 1).

PatentDOI
Bruce Alan Fette1
TL;DR: A method of operating a vocoder so that a variable frame rate results while maintaining a constant bit rate is disclosed, and vector quantization operations producing larger, more spectrally accurate outputs are selected when fewer subframes are included within a base frame.
Abstract: A method of operating a vocoder so that a variable frame rate results while maintaining a constant bit rate is disclosed. A base frame rate remains constant. However, spectral change is measured to determine an appropriate subframe rate within the base frame. Only one of a plurality of separate vector quantization processes is selected in response to spectral change for performance of a vector quantization operation on LPC spectrum coefficients. Each of the plurality of vector quantization operations utilizes its own codebook that contains a different quantity of reference patterns from that contained in the other codebooks. Each vector quantization operation produces a reference pattern descriptor code that contains a different number of bits from that produced by the other vector quantization processes. Vector quantization operations producing smaller, less spectrally accurate outputs are selected when more subframes are included within a base frame, and vector quantization operations producing larger, more spectrally accurate outputs are selected when fewer subframes are included within a base frame.

Proceedings ArticleDOI
D. Lin1
01 Apr 1987
TL;DR: In this article, pseudo-stochastic block codes are constructed so as to maximize the complementarity of adjacent codewords in an innovation codebook, and efficient exhaustive search procedures that exploit this structure are used to find the minimum-error codeword in each block.
Abstract: In stochastic coding, the speech signal is represented as the product code of several acoustically independent vector elements. The short-time spectral and pitch redundancies in the speech signal are modeled by a set of time-varying linear filters. The unpredictable part of the speech signal (the residual) is modeled by a codebook of white innovation signals. Schroeder and Atal showed the effectiveness of probabilistically generated codes in providing the optimum innovation sequence in each block. However, the computational complexity of exhaustive search block coding using stochastic codebooks is extremely high. We describe in this paper speech coding using efficient pseudo-stochastic block codes. The pseudo-stochastic codes refer to stochastically populated block codes in which the adjacent codewords in an innovation codebook are non-independent. The pseudo-stochastic codewords are constructed so as to maximize their "complementarity." We discuss efficient exhaustive search procedures for determining the minimum-error pseudo-stochastic code in a block. These procedures exploit the dependencies of the adjacent codewords to efficiently obtain the ensemble of filter outputs. The computational cost of stochastic coding can be reduced by more than an order of magnitude using the pseudo-stochastic block codes.

Proceedings ArticleDOI
Yair Shoham1
06 Apr 1987
TL;DR: Experimental results indicate a prediction gain in the range of 9 to 13 dB and an average log-spectral distance of 1.3 to 1.7 dB, and Informal listening tests suggest that replacing the conventional scalar quantizer in a 4.8 Kbits/s CELP coder by a VPQ system allows a reduction of the rate assigned to the LPC data without any obvious difference in the perceptual quality.
Abstract: Vector Predictive Quantization (VPQ) is proposed for coding the short-term spectral envelope of speech. The proposed VPQ scheme predicts the current spectral envelope from several past spectra, using a predictor codebook. The residual spectrum is coded by a residual codebook. The system operates in the log-spectral domain using a sampled version of the spectral envelope. Experimental results indicate a prediction gain in the range of 9 to 13 dB and an average log-spectral distance of 1.3 to 1.7 dB. Informal listening tests suggest that replacing the conventional scalar quantizer in a 4.8 Kbits/s CELP coder by a VPQ system allows a reduction of the rate assigned to the LPC data from 1.8 Kbits/s to 1.0 Kbits/s without any obvious difference in the perceptual quality.
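A rough sketch of the VPQ encoder described above: each entry of a predictor codebook predicts the current (log-)spectral vector, the prediction residual is coded with a residual codebook, and the two indices are chosen to minimize squared error. Modeling each predictor as a single matrix applied to one past spectrum is an assumption; the paper predicts from several past spectra.

```python
import numpy as np

def vpq_encode(x, prev, predictor_cb, residual_cb):
    """Vector Predictive Quantization sketch: pick the predictor matrix and the
    residual codeword that together minimize the squared coding error."""
    best = (np.inf, 0, 0)
    for k, A in enumerate(predictor_cb):
        residual = x - A @ prev
        j = np.argmin(((residual_cb - residual) ** 2).sum(axis=1))
        err = np.sum((residual - residual_cb[j]) ** 2)
        if err < best[0]:
            best = (err, k, j)
    _, k, j = best
    x_hat = predictor_cb[k] @ prev + residual_cb[j]      # decoder reconstruction
    return k, j, x_hat

rng = np.random.default_rng(5)
dim = 16                                                  # assumed sampled log-envelope dimension
predictor_cb = 0.1 * rng.standard_normal((8, dim, dim))   # assumed 8 predictor matrices
residual_cb = rng.standard_normal((64, dim))              # assumed 64 residual codewords
x, prev = rng.standard_normal(dim), rng.standard_normal(dim)
k, j, x_hat = vpq_encode(x, prev, predictor_cb, residual_cb)
```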

PatentDOI
TL;DR: In this article, a stochastically excited linear predictive (SELP) encoder is used in series with the adaptive CELP encoder to search its own overlapping codebook in a recursive manner to determine a candidate vector that provides the best match.
Abstract: Apparatus for encoding speech using a code excited linear predictive (CELP) encoder using a virtual searching technique during speech transitions such as from unvoiced to voiced regions of speech. The encoder compares candidate excitation vectors stored in a codebook with a target excitation vector representing a frame of speech to determine the candidate vector that best matches the target vector by repeating a first portion of each candidate vector into a second portion of each candidate vector. For increased performance, a stochastically excited linear predictive (SELP) encoder is used in series with the adaptive CELP encoder. The SELP encoder is responsive to the difference between the target vector and the best matched candidate vector to search its own overlapping codebook in a recursive manner to determine a candidate vector that provides the best match. Both of the best matched candidate vectors are used in speech synthesis.

Proceedings ArticleDOI
01 Apr 1987
TL;DR: A vocal cord and tract model for speech coding at bit rates below 4.8 kb/s is proposed; a linked codebook of vocal-tract configurations is intended to provide good starting values for an iterative optimization, thus alleviating the problem of locking on to a locally optimum solution.
Abstract: This paper proposes the use of a vocal cord and tract model for speech coding at bit rates below 4.8 kb/s. For this, a key requirement is the ability to derive model parameters from an input speech signal. Our approach to this problem employs an acoustic analysis front-end, a linked codebook of vocal-tract configurations and related acoustic characteristics, and an optimizing articulatory synthesizer. While the acoustic front-end is relatively straightforward, involving LPC, pitch, and voicing analyses, the codebook design and usage, as well as the specific method for optimizing the model parameters, are new. The codebook is intended to provide good starting values for an iterative optimization, thus alleviating the problem of locking on to a locally optimum solution. In a first stage of optimization, the best vocal tract configuration found in the codebook is refined by varying only the vocal tract parameters. Then, in a second stage of optimization, the best match is found between the glottal waveform of the model and the inverse filtered input speech.

Proceedings ArticleDOI
D.B. Roe1
01 Apr 1987
TL;DR: Two methods are reported for improving the accuracy of an LPC vector-quantization speech recognizer by adapting the vector codebook to noisy conditions: changing each codebook vector to reflect the way people speak in noise, and adding the estimated spectrum of the background noise to the codebook vectors.
Abstract: Speech recognizers trained in quiet conditions but operating in noise usually have poor accuracy. This paper reports two methods for improving the accuracy of an LPC vector-quantization speech recognizer by adapting the vector codebook to noisy conditions. First, each codebook vector is changed to reflect the way people speak in noise. Second, the estimated spectrum of the background noise is added to the codebook vectors. These ideas have been tested on a total of 2400 utterances of digits recorded in a car by 4 speakers. A baseline word spotter similar to NTT's SPLIT system was modified by adapting its vector codebook to noise. This adapted codebook, when used with a new word decision criterion, yields error rates at least 4 times lower for noisy conditions. The accuracy is significantly better than without codebook adaptation techniques.
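The second adaptation method in the abstract, adding the estimated background-noise spectrum to the codebook vectors, is sketched below. Operating directly on power spectra is an assumption; the recognizer's codebook holds LPC-derived vectors, so in practice a conversion to and from the spectral domain would be involved.

```python
import numpy as np

def adapt_codebook_to_noise(codebook_power_spectra, noise_power_spectrum):
    """Add the estimated background-noise power spectrum to every codebook
    entry so the codebook matches the spectra seen in noisy conditions."""
    return codebook_power_spectra + noise_power_spectrum[None, :]

rng = np.random.default_rng(6)
clean_cb = rng.random((64, 32))          # assumed 64 codebook power spectra with 32 bins
noise = 0.05 * np.ones(32)               # assumed flat background-noise estimate
noisy_cb = adapt_codebook_to_noise(clean_cb, noise)
```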

Proceedings ArticleDOI
06 Apr 1987
TL;DR: This work presents a very efficient search method for vector quantization and other nearest neighbor search problems, in which the saving in the number of multiplications can be over 99% compared with the conventional full search method.
Abstract: In this work, we present a very efficient search method useful for vector quantization, and other nearest neighbor search problems. The algorithm first finds a small area around the input vector with one codevector on its boundary. After finding such an area, the codebook is searched to determine whether there is any other codeword inside this area or not. This search is performed employing two tests, avoiding distortion calculation for those codewords which fail these tests. Using this algorithm the saving in the number of multiplications can be over 99%, in comparison with the conventional full search method, with the number of additions being reduced by as much as 82%. The price paid is a moderate increase in the number of comparisons.
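One plausible reading of the two elimination tests is sketched below: the current best distance defines a region around the input vector, candidates are first screened with a multiplication-free hypercube test and then with a running partial-distance test, and the full distortion is evaluated only for survivors. The exact tests used in the paper may differ.

```python
import numpy as np

def fast_nn_search(x, codebook):
    """Elimination-style nearest-neighbour search: cheap tests reject most
    codewords before any full distortion computation."""
    best_idx = 0
    best_dist = np.sum((x - codebook[0]) ** 2)
    for i in range(1, len(codebook)):
        c = codebook[i]
        radius = np.sqrt(best_dist)
        # Test 1: hypercube check around x, no multiplications needed
        if np.any(np.abs(c - x) > radius):
            continue
        # Test 2: partial distance, abandoned as soon as it exceeds the best
        partial = 0.0
        for d in range(len(x)):
            partial += (c[d] - x[d]) ** 2
            if partial >= best_dist:
                break
        else:
            best_idx, best_dist = i, partial
    return best_idx

rng = np.random.default_rng(7)
codebook = rng.standard_normal((1024, 8))
print(fast_nn_search(rng.standard_normal(8), codebook))
```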

Patent
01 Apr 1987
TL;DR: In this paper, the vector quantization method is used for vector codebook generation and for coding for transmission and/or storage purposes, where the n-dimensional samples are transformed into coordinates in one-dimensional space, preferably in accordance with Hilbert's curve.
Abstract: The method of and the device for digital signal coding by vector quantization exploit representation of signal portions in a one-dimensional space for vector codebook generation and for coding for transmission and/or storage purposes. The n-dimensional sampling is transformed into coordinates in one-dimensional space preferably in accordance with Hilbert's curve.
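The space-filling-curve mapping the patent relies on can be illustrated with the standard two-dimensional Hilbert-curve index computation below; the patent works with n-dimensional sample vectors, so this 2-D version is only an illustrative assumption.

```python
def hilbert_index(order, x, y):
    """Map integer coordinates (x, y) on a 2**order by 2**order grid to their
    position along the Hilbert curve (classic bitwise formulation)."""
    n = 2 ** order
    d, s = 0, n // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                  # rotate/flip the quadrant so the curve stays continuous
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

# Positions of a few points of an 8 x 8 grid along the curve
for px, py in [(0, 0), (1, 0), (1, 1), (7, 0)]:
    print((px, py), hilbert_index(3, px, py))
```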

Proceedings ArticleDOI
01 Apr 1987
TL;DR: The bits-per-pixel rates for TV-quality composite color image encoding using Multi-Stage Vector Quantization are reported; the VQ/MSVQ implementation requires two processors, an inner-product processor and a comparator-address generator.
Abstract: The implementation of real-time image encoding requires a high and constant throughput rate not achievable by a SISD machine. With a reasonable-size codebook and a SIMD machine architecture, the Vector Quantization algorithm can be implemented in real time. But single-stage Vector Quantization requires a fairly large codebook for good quality image encoding. Multi-Stage Vector Quantization with codebooks of moderate size at each stage has been shown to be a viable alternative approach. The bits-per-pixel rates for TV-quality composite color image encoding using Multi-Stage Vector Quantization are reported. The VQ/MSVQ implementation requires two processors, an inner-product processor and a comparator-address generator. The implementation details of the processors and their throughput rate are described.
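A minimal sketch of the multi-stage quantizer itself (the split into an inner-product processor and a comparator-address generator is not modelled): each stage quantizes the residual left by the previous stages, so several moderate-size codebooks stand in for one very large single-stage codebook. The codebook sizes and the flattened 4 x 4 block dimension are assumptions.

```python
import numpy as np

def msvq_encode(x, stage_codebooks):
    """Multi-Stage VQ: each stage quantizes the residual of the previous ones."""
    indices, residual = [], x.copy()
    for cb in stage_codebooks:
        i = int(np.argmin(((cb - residual) ** 2).sum(axis=1)))
        indices.append(i)
        residual = residual - cb[i]
    return indices

def msvq_decode(indices, stage_codebooks):
    """Reconstruction is the sum of the selected codewords from every stage."""
    return sum(cb[i] for i, cb in zip(indices, stage_codebooks))

rng = np.random.default_rng(10)
stages = [rng.standard_normal((64, 16)) for _ in range(3)]   # three 64-entry stage codebooks
x = rng.standard_normal(16)                                  # e.g. a flattened 4 x 4 block
x_hat = msvq_decode(msvq_encode(x, stages), stages)
```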

Proceedings ArticleDOI
01 Apr 1987
TL;DR: This scheme uses mean/residual vector quantization with a binary-tree search algorithm to vector-quantize the Y component and the low-passed I and Q components of color images, and incorporates an objective criterion for codebook replenishment.
Abstract: We present a new adaptive vector-quantization scheme with codebook replenishment for color image-sequence compression in this paper. This scheme uses mean/residual vector quantization with a binary-tree search algorithm to vector-quantize the Y component and the low-passed I and Q components of color images. It also incorporates an objective criterion for codebook replenishment. The mean and label replenishment technique is also applied to further reduce the bit rate. Computer simulations on two 60-frame, 512 × 512 color image-sequences have demonstrated the effectiveness of the proposed technique.
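The mean/residual step of the scheme might look like the sketch below: the block mean is scalar-quantized and removed, and the zero-mean residual is vector-quantized. The binary-tree codebook search and the replenishment logic are omitted, and the quantizer sizes are assumptions.

```python
import numpy as np

def mrvq_encode(block, residual_codebook, mean_levels):
    """Mean/residual VQ: scalar-quantize the block mean, then vector-quantize
    the mean-removed residual with a full search (the paper uses a binary tree)."""
    m_idx = int(np.argmin(np.abs(mean_levels - block.mean())))
    residual = block - mean_levels[m_idx]
    r_idx = int(np.argmin(((residual_codebook - residual) ** 2).sum(axis=1)))
    return m_idx, r_idx

def mrvq_decode(m_idx, r_idx, residual_codebook, mean_levels):
    return mean_levels[m_idx] + residual_codebook[r_idx]

rng = np.random.default_rng(8)
residual_cb = rng.standard_normal((256, 16))   # assumed 256 codewords for flattened 4 x 4 blocks
mean_levels = np.linspace(0.0, 255.0, 64)      # assumed 6-bit uniform mean quantizer
block = rng.uniform(0, 255, 16)
m, r = mrvq_encode(block, residual_cb, mean_levels)
```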

Proceedings ArticleDOI
01 Apr 1987
TL;DR: A vector quantization approach to speaker adaptation is evaluated; comparison of the performance of the two methods shows that a new speaker's codebook is not necessary to represent the new speaker.
Abstract: In view of designing a speaker-independent large vocabulary recognition system, we evaluate a vector quantization approach to speaker adaptation. Only one speaker (the reference speaker) pronounces the application vocabulary. He also pronounces a small vocabulary called the adaptation vocabulary. Each new speaker then merely pronounces the adaptation vocabulary. Two adaptation methods are investigated, establishing a correspondence between the codebooks of these two speakers. This allows us to transform the reference utterances of the reference speaker into suitable references for the new speaker. Method I uses a transposed codebook to represent the new speaker during the recognition process whereas Method II uses a codebook which is obtained by clustering on the new speaker's pronunciation of the adaptation vocabulary. Experiments were carried out on a 20-speaker database (10 male, 10 female). The adaptation vocabulary contains 136 words; the application one has 104 words. The mean recognition error rate without adaptation is 22.3% for inter-speaker experiments; after one of the two methods has been implemented the mean recognition error rate is 10.5%. Comparison of performance of the two methods shows that a new speaker's codebook is not necessary to represent the new speaker.

Proceedings ArticleDOI
01 Apr 1987
TL;DR: Recognition experiments indicate that the performance of the weighted cepstral distance with vector quantized spectral data is considerably different from that previously reported for unquantized data.
Abstract: This paper extends the use of weighted cepstral distance measures to speaker independent word recognizers based on vector quantization. Recognition results were obtained for two recognition methods: dynamic time-warping of vector codes and hidden Markov modeling. The experiments were carried out on a vocabulary of the ten digits and the word "oh". Two kinds of spectral analysis were considered: LPC, and a recently proposed, low dimensional, perceptually based representation (PLP). The effects of analysis order and varying degrees of quantization in the spectral representation were also considered. Recognition experiments indicate that the performance of the weighted cepstral distance with vector quantized spectral data is considerably different from that previously reported for unquantized data. Comparison of recognition rates shows wide variations due to interaction of the distance measure with the analysis technique and with vector quantization. The best recognition scores were obtained by the combination of weighted cepstral distance and low order PLP analysis. This combination maintained good recognition rates down to very low (16 or 8 codes) codebook sizes.

Proceedings ArticleDOI
B. Atal1
01 Apr 1987
TL;DR: A stochastic model of LPC-derived log areas is described that eliminates training of the codebook by constructing codebook entries from random sequences; vector quantization using such random codebooks can provide an SNR of 20 dB in quantizing 10 log-area parameters with 28 bits/frame.
Abstract: Accurate quantization of LPC parameters with a minimum number of bits is necessary for synthesizing high quality speech at low bit rates. Earlier work by Juang and Gray has shown that vector quantization can provide a significant reduction in the bit rate needed to quantize the LPC parameters. Previous work on vector quantization of LPC parameters employed trained codebooks. In this paper, we describe a stochastic model of LPC-derived log areas that eliminates training of the codebook by constructing codebook entries from random sequences. Each vector of LPC parameters is modelled as a sample function of a zero-mean Gaussian stochastic process with known covariances. We generate an ensemble of Gaussian codewords with a specified distribution, where the number of codewords in the ensemble is determined by the number of bits used to quantize the LPC parameter vector. The optimum codeword is selected by exhaustive search to minimize the Euclidean distance between the original and quantized parameters. Our results show that vector quantization using random codebooks can provide an SNR of 20 dB in quantizing 10 log-area parameters with 28 bits/frame. An important advantage of random codebooks is that they provide robust performance across different speakers and speech recording conditions.
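The untrained, random-codebook quantizer described above reduces to drawing Gaussian codewords with an assumed covariance and exhaustively picking the nearest one. A 12-bit codebook and an identity-scaled covariance are used below to keep the example small; the paper works with measured covariances and 28 bits/frame.

```python
import numpy as np

def random_codebook_quantize(log_areas, bits, cov, seed=0):
    """Quantize an LPC-derived log-area vector with a random Gaussian codebook:
    2**bits codewords drawn from a zero-mean Gaussian (no training), nearest
    codeword chosen by exhaustive Euclidean search."""
    rng = np.random.default_rng(seed)            # fixed seed so encoder and decoder agree
    codebook = rng.multivariate_normal(np.zeros(len(log_areas)), cov, size=2 ** bits)
    idx = int(np.argmin(((codebook - log_areas) ** 2).sum(axis=1)))
    return idx, codebook[idx]

dim = 10                                          # 10 log-area parameters, as in the paper
cov = 0.5 * np.eye(dim)                           # assumed covariance matrix
rng = np.random.default_rng(9)
x = rng.multivariate_normal(np.zeros(dim), cov)
idx, x_hat = random_codebook_quantize(x, bits=12, cov=cov)
```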

Proceedings ArticleDOI
Y. Matsuyama1
06 Apr 1987
TL;DR: The presented design method is a nontrivial extension of ordinary vector quantizer design, which contains the classical Lloyd-Max algorithm, and is applicable to any data, including speech and image data, as long as the quantization distortion is defined.
Abstract: Algorithms for vector quantization of variable region data are given. The design iteration is proved to converge. An important issue here is the optimization step of the region shape with respect to the vector quantization codebook. Thus, the presented design method is a nontrivial extension of ordinary vector quantizer design, which contains the classical Lloyd-Max algorithm. First, the main algorithm is given without introducing any physical entity. Therefore, the method is applicable to any data, including speech and image data, as long as the quantization distortion is defined. In the speech coding case, which is the main body of this paper, the region shape optimization is interpreted as epoch interval adjustment. The selection of the adjusted epochs with respect to the vector quantization codebook considerably reduces the quantizing distortion. This enables very-low-rate speech compression. Then, the image coding case is formulated and a convergence problem is discussed.

Patent
20 Mar 1987
TL;DR: In this paper, a signal processing arrangement uses a codebook of first vector quantized speech feature signals formed responsive to a large collection of speech feature signals, which is altered by combining the first speech feature signals of the codebook with second speech feature signals generated responsive to later input speech patterns during normal speech processing.
Abstract: A signal processing arrangement uses a codebook of first vector quantized speech feature signals formed responsive to a large collection of speech feature signals. The codebook is altered by combining the first speech feature signals of the codebook with second speech feature signals generated responsive to later input speech patterns during normal speech processing. The first set of vector quantized feature signals are thereby adaptively modified to second speech feature signals obtained from normally processed speech patterns.
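One way the patent's "combining" step could be realized is sketched below: each feature vector from newly processed speech nudges its nearest codeword toward itself. The nearest-neighbour assignment and the update weight are assumptions, not details taken from the patent.

```python
import numpy as np

def update_codebook(codebook, new_features, weight=0.1):
    """Adapt an existing VQ codebook with feature vectors from later speech:
    move the nearest codeword a small step toward each new vector."""
    cb = codebook.copy()
    for f in new_features:
        i = int(np.argmin(((cb - f) ** 2).sum(axis=1)))
        cb[i] = (1.0 - weight) * cb[i] + weight * f
    return cb

rng = np.random.default_rng(11)
initial_cb = rng.standard_normal((128, 12))    # assumed codebook from the large training collection
new_feats = rng.standard_normal((500, 12))     # assumed features from normally processed speech
adapted_cb = update_codebook(initial_cb, new_feats)
```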

Proceedings ArticleDOI
01 Apr 1987
TL;DR: A dynamic time-warping processor based on a word dictionary, in which each word is represented as a time-sequence of the universal codebook elements (SPLIT method), then resolves the choice among the remaining word candidates.
Abstract: This paper proposes a new VQ (Vector Quantization)- based preprocessor for use in a method which reduces the amount of computation necessary in speaker-independent large vocabulary isolated word recognition. A speech wave is analyzed by time functions of instantaneous cepstrum coefficients and short-time regression coefficients for both cepstrum coefficients and logarithmic energy. A universal VQ codebook for these time functions is constructed based on a multi-speaker, multi-word database. Next, a separate codebook is designed as a subset of the universal codebook for each word in the vocabulary. These word-specific codebooks are used for front-end preprocessing to eliminate word candidates whose distance scores are large. A dynamic time-warping processor based on a word dictionary, in which each word is represented as a time-sequence of the universal codebook elements (SPLIT method), then resolves the choice among the remaining word candidates. Effectiveness of this method has been ascertained by recognition experiments using a database consisting of words from a vocabulary of 100 Japanese city names uttered by 20 male speakers.

Proceedings ArticleDOI
01 Apr 1987
TL;DR: The results indicate that good performance can be achieved with relatively small codebooks for the quantizers and that predictive coding with eigenvector rotation gives a small but consistent improvement over direct vector quantization.
Abstract: Stochastic coders provide a way of encoding the excitation to the synthesis filter at bit rates of about 2 kbit/s, thus leading to the possibility of high quality speech coding at 4.8 kbit/s. In these coders, the excitation is encoded as an index into a codebook of random excitation waveforms and the coder transmits the parameters of a short-term filter (LPC all-pole predictor), the parameters of a long-term filter (pitch predictor) and the excitation gain to the receiver. Although the coders give excellent speech quality with unquantized parameters, the output degrades significantly when the filter parameters are coarsely quantized. For a 4.8 kbit/s coder, the short-term filter parameters have to be quantized at 1 kbit/s or less, and conventional scalar quantizers at this bit rate result in severe degradation of output speech. In this paper we describe the performance of stochastic coders when the short-term filter parameters are quantized using direct vector quantization and vector quantization with predictive coding and eigenvector rotation. Our results indicate that good performance can be achieved with relatively small codebooks for the quantizers and that predictive coding with eigenvector rotation gives a small but consistent improvement over direct vector quantization.

Proceedings ArticleDOI
01 Apr 1987
TL;DR: This paper studies some possibilities of coding speech using the multipulse model but as efficiently as stochastic coders do.
Abstract: Both multipulse and stochastic coders render telephonic, or close to telephonic, quality for speech signals. Stochastic coders are more efficient from the point of view of bit rate, and multipulse coders are the most efficient ones if we are looking for low computational cost. In this paper we study some possibilities of coding speech using the multipulse model but as efficiently as stochastic coders do. Synthetic speech is generated from a multipulse sequence, obtained from a codebook, through an LPC synthesis filter. The model for the speech signal is thus the multipulse model; it resembles the stochastic one in that the input sequence is obtained from a codebook, but differs in that it does not use a pitch filter.