
Showing papers on "Linear predictive coding published in 1987"


Journal ArticleDOI
TL;DR: An efficient computer program is developed that will serve as a tool for investigating whether articulatory speech synthesis may achieve this low bit rate.
Abstract: High quality speech at low bit rates (e.g., 2400 bits/s) is one of the important objectives of current speech research. As part of long range activity on this problem, we have developed an efficient computer program that will serve as a tool for investigating whether articulatory speech synthesis may achieve this low bit rate. At a sampling frequency of 8 kHz, the most comprehensive version of the program, including nasality and frication, runs at about twice real time on a Cray-1 computer.

243 citations


Proceedings ArticleDOI
01 Apr 1987
TL;DR: The paper describes a related scheme that allows real-time implementation on current DSP chips; the very efficient codebook search is achieved by means of a new technique called "backward filtering" and the use of algebraic codes.
Abstract: Code-Excited Linear Prediction (CELP) produces high quality synthetic speech at low bit rate. However, the basic scheme leads to huge computational loads. The paper describes a related scheme, which allows real-time implementation on current DSP chips. The very efficient search procedure in the codebook is achieved by means of a new technique called "backward filtering" and the use of algebraic codes. Signal-to-noise ratio (SNR) performance is reported for a variety of conditions.

196 citations


Proceedings ArticleDOI
06 Apr 1987
TL;DR: Simplifications of Atal's technique for decomposing speech into phone-length temporal events in terms of overlapping and interacting articulatory gestures are reported, with applications to acoustic-phonetic synthesis.
Abstract: Atal [1] introduced a technique for decomposing speech into phone-length temporal events in terms of overlapping and interacting articulatory gestures. This paper reports on simplifications of this technique with applications to acoustic-phonetic synthesis. Spectral evolution is represented by time-indexed trajectories in the p-dimensional space of log-area ratios y_i = \ln((1 + k_i)/(1 - k_i)), where the k_i are the reflection coefficients obtained from short-time stationary LPC analysis. The vocal tract configuration (spectral vector) associated with each interpolation function belongs to a finite set of articulatory targets (a vector quantization codebook). A set of speech segments ("polysons") has been encoded using this technique. It includes diphones, demi-syllables, and other units that are difficult to segment. Temporal decomposition using target spectra can break the complex encoding of these segments. In particular, coarticulation effects are analytically explained and modeled. It is demonstrated that these new tools provide an adequate environment in our search for better rules in acoustic speech synthesis.
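As a side note, the log-area-ratio transform quoted above is easy to state concretely. The sketch below (Python/NumPy; the 10th-order coefficient values are invented for the example and are not from the paper) converts reflection coefficients to log-area ratios and back:

```python
import numpy as np

def reflection_to_lar(k):
    """Convert LPC reflection coefficients k_i (|k_i| < 1) to
    log-area ratios y_i = ln((1 + k_i) / (1 - k_i))."""
    k = np.asarray(k, dtype=float)
    return np.log((1.0 + k) / (1.0 - k))

def lar_to_reflection(y):
    """Inverse transform: k_i = (exp(y_i) - 1) / (exp(y_i) + 1) = tanh(y_i / 2)."""
    return np.tanh(np.asarray(y, dtype=float) / 2.0)

# Example: a 10th-order set of reflection coefficients from a hypothetical
# short-time stationary LPC analysis frame.
k = np.array([0.9, -0.6, 0.3, -0.2, 0.1, -0.05, 0.04, -0.02, 0.01, 0.0])
y = reflection_to_lar(k)
assert np.allclose(lar_to_reflection(y), k)
```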

179 citations


Proceedings ArticleDOI
01 Apr 1987
TL;DR: An improved Vector APC (VAPC) speech coder at 4800 bps produces speech with very good communications quality while maintaining a complexity low enough to allow a real-time implementation with at most two commercially available DSP chips.
Abstract: An improved Vector APC (VAPC) speech coder at 4800 bps produces speech with very good communications quality while maintaining a complexity low enough to allow a real-time implementation with at most two commercially available DSP chips. The VAPC algorithm combines APC with vector quantization and incorporates analysis-by-synthesis, perceptual noise weighting, and adaptive postfiltering. A novel adaptive postfiltering technique helps to achieve an essentially inaudible level of coding noise. Real-time software has been developed for an implementation using the AT&T DSP32 floating-point processor chip. The overall complexity of the implemented VAPC system is about 3 million multiply-adds/second of computation and 6 kwords of memory.

158 citations


Proceedings ArticleDOI
P. Kroon, B. Atal
01 Apr 1987
TL;DR: This paper addresses the problem of finding and encoding the excitation parameters with a limited bit rate, such that high quality speech coding in the 4.8 - 7.2 kb/s range becomes feasible.
Abstract: Past research on CELP (Code-Excited Linear Predictive) coders has mainly concentrated on the feasibility of the CELP concept and on the reduction of the computational complexity. In this paper we address the problem of finding and encoding the excitation parameters with a limited bit rate, such that high quality speech coding in the 4.8 - 7.2 kb/s range becomes feasible. First, we examine the effect of the various excitation parameters such as code book size, code book population, order of the long-term predictor and update rate on the quality of the reconstructed speech. Second, we investigate procedures for designing and incorporating quantizers for the parameters involved. Finally, using both scalar and vector quantization techniques for the LPC coefficients, we simulated 4.8 kb/s and 7.2 kb/s coders. We also report on the use of postfiltering to further improve the performance of the CELP coder.
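For readers unfamiliar with CELP-style excitation coding, the following minimal sketch shows the generic analysis-by-synthesis codebook search such coders perform; the codebook size, subframe length, gain rule, and the placeholder weighting filter are illustrative assumptions, not the configuration evaluated in the paper.

```python
import numpy as np
from scipy.signal import lfilter

def search_codebook(target, codebook, weighted_num, weighted_den):
    """Pick the excitation vector (and gain) whose filtered output best
    matches the weighted target subframe in the mean-squared sense."""
    best = (None, 0.0, np.inf)  # (index, gain, error)
    for i, c in enumerate(codebook):
        y = lfilter(weighted_num, weighted_den, c)        # filtered codevector
        g = np.dot(target, y) / max(np.dot(y, y), 1e-12)  # optimal gain for this vector
        err = np.dot(target - g * y, target - g * y)
        if err < best[2]:
            best = (i, g, err)
    return best

# Toy example: 64 random Gaussian codevectors of 40 samples (5 ms at 8 kHz).
rng = np.random.default_rng(0)
codebook = rng.standard_normal((64, 40))
target = rng.standard_normal(40)
# Placeholder weighted synthesis filter; the coefficients are purely illustrative.
index, gain, err = search_codebook(target, codebook, [1.0], [1.0, -0.9])
```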

158 citations


Proceedings ArticleDOI
06 Apr 1987
TL;DR: Three different approaches for automatically segmenting speech into phonetic units are described: one based on template matching, one based on detecting the spectral changes that occur at the boundaries between phonetic units, and one based on a constrained-clustering vector quantization approach.
Abstract: For large vocabulary and continuous speech recognition, the sub-word-unit-based approach is a viable alternative to the whole-word-unit-based approach. For preparing a large inventory of subword units, automatic segmentation is preferable to manual segmentation, as it substantially reduces the work associated with the generation of templates and gives more consistent results. In this paper we discuss some methods for automatically segmenting speech into phonetic units. Three different approaches are described: one based on template matching, one based on detecting the spectral changes that occur at the boundaries between phonetic units, and one based on a constrained-clustering vector quantization approach. An evaluation of the performance of the automatic segmentation methods is given.
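As a rough illustration of the spectral-change approach mentioned above, the snippet below marks local peaks of frame-to-frame spectral distance as candidate boundaries; the use of cepstral vectors, the Euclidean distance, and the fixed threshold are assumptions made for the example rather than the paper's actual procedure.

```python
import numpy as np

def spectral_change_boundaries(cepstra, threshold):
    """Given a (frames x coeffs) array of short-time cepstral (or LPC-derived)
    vectors, mark frames where the local spectral change peaks above a
    threshold as candidate phonetic-unit boundaries."""
    d = np.linalg.norm(np.diff(cepstra, axis=0), axis=1)  # frame-to-frame distance
    boundaries = []
    for t in range(1, len(d) - 1):
        if d[t] > threshold and d[t] >= d[t - 1] and d[t] >= d[t + 1]:
            boundaries.append(t + 1)  # boundary between frame t and frame t+1
    return boundaries
```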

156 citations



Patent
02 Sep 1987
TL;DR: In this article, a speech analyzer and synthesizer system is described that uses sinusoidal encoding and decoding techniques for voiced frames and noise excitation or multiple-pulse excitation for unvoiced frames.
Abstract: A speech analyzer and synthesizer system using sinusoidal encoding and decoding techniques for voiced frames and noise excitation or multiple-pulse excitation for unvoiced frames. For voiced frames, the analyzer (100) transmits the pitch, a value for each harmonic frequency (defined as its offset from the corresponding integer multiple of the fundamental frequency), the total frame energy, and linear predictive coding (LPC) coefficients (FIG. 1). The synthesizer (200) is responsive to that information to determine the phase of the fundamental frequency and each harmonic, based on the transmitted pitch and harmonic offset information, and to determine the amplitudes of the harmonics utilizing the total frame energy and LPC coefficients (FIG. 2). Once the phases and amplitudes have been determined for the fundamental and harmonic frequencies, the sinusoidal synthesis is performed for voiced frames. For each frame, the determined frequencies and amplitudes are defined at the center of the frame, and linear interpolation is used by the synthesizer to determine continuous frequency and amplitude signals for the fundamental and the harmonics throughout the entire frame. In addition, the analyzer initially adjusts the pitch so that the harmonics are evenly distributed around integer multiples of this pitch.
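A loose sketch of the interpolation idea described in this abstract is given below: harmonic frequencies (multiples of the pitch plus transmitted offsets) and amplitudes are defined at frame centers and linearly interpolated across the frame. The function names, the 8 kHz sampling rate, and the phase-accumulation details are assumptions for illustration, not the patent's implementation.

```python
import numpy as np

def synthesize_voiced_frame(f0_a, f0_b, amps_a, amps_b, offs_a, offs_b,
                            frame_len, fs=8000.0, phase0=None):
    """Sum-of-harmonics synthesis for one voiced frame. Values with suffix
    _a belong to the previous frame center, _b to the current one; both
    frequency and amplitude tracks are linearly interpolated in between."""
    n_harm = len(amps_a)
    out = np.zeros(frame_len)
    phase = np.zeros(n_harm) if phase0 is None else phase0.copy()
    for k in range(n_harm):
        f_a = (k + 1) * f0_a + offs_a[k]
        f_b = (k + 1) * f0_b + offs_b[k]
        freq = np.linspace(f_a, f_b, frame_len)             # interpolated frequency track
        amp = np.linspace(amps_a[k], amps_b[k], frame_len)  # interpolated amplitude track
        inst_phase = phase[k] + 2 * np.pi * np.cumsum(freq) / fs
        out += amp * np.cos(inst_phase)
        phase[k] = inst_phase[-1] % (2 * np.pi)             # carry phase into the next frame
    return out, phase
```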

88 citations


Journal ArticleDOI
TL;DR: The stability and performance of pitch filters in speech coding when pitch prediction is combined with formant prediction are analyzed, and it is observed that the quality of decoded speech improves significantly when stable synthesis filters are employed.
Abstract: This paper analyzes the stability and performance of pitch filters in speech coding when pitch prediction is combined with formant prediction. A computationally simple stability test based on a sufficient condition is formulated for pitch synthesis filters. For typical orders of pitch filters, this sufficient test is very tight. Based on the test, a simple stabilization technique that minimizes the loss in prediction gain of the pitch predictor is employed to generate stable synthesis filters. Finally, it is observed that the quality of decoded speech improves significantly when stable synthesis filters are employed.
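The paper's own sufficient condition is tighter than what is shown here, but a simple sufficient stability check of the same flavor, together with a scaling-based stabilization, can be sketched as follows (the 3-tap coefficients and the margin value are made up for the example):

```python
import numpy as np

def pitch_filter_is_stable_sufficient(b):
    """Sufficient (not necessary) stability check for a pitch synthesis filter
    1 / (1 - sum_i b_i z^{-(M+i)}): if sum_i |b_i| < 1, the denominator cannot
    vanish on or outside the unit circle, so the filter is stable."""
    return float(np.sum(np.abs(b))) < 1.0

def stabilize_by_scaling(b, margin=0.999):
    """If the sufficient test fails, shrink the taps uniformly so the
    magnitude sum falls just below 1; this trades a small loss in pitch
    prediction gain for a guaranteed-stable synthesis filter."""
    b = np.asarray(b, dtype=float)
    s = float(np.sum(np.abs(b)))
    return b if s < 1.0 else (margin / s) * b

b = np.array([0.4, 0.9, 0.3])  # illustrative 3-tap pitch predictor coefficients
if not pitch_filter_is_stable_sufficient(b):
    b = stabilize_by_scaling(b)
```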

84 citations


Proceedings ArticleDOI
01 Apr 1987
TL;DR: A real-time 4.8 kb/s Pulse Excitation VXC coder (PVXC) is presented which achieves high reconstructed speech quality and incorporates new techniques which reduce the codebook search complexity to only 0.55 MFlops.
Abstract: In Vector Excitation Coding (VXC), speech is represented by applying a sequence of excitation vectors to a time-varying speech production filter, with each vector chosen from a codebook using a perceptually-based performance measure. Although VXC is a powerful technique for achieving natural and high quality speech compression at low bit-rates, it suffers, as other excitation coders do, from very high computational complexity. Recent research has shown that codebook search computation can be reduced to approximately 40 MFlops without compromising speech quality. However, this operation count still prohibits a practical real-time implementation of the coder using today's DSP chips. We present a real-time 4.8 kb/s Pulse Excitation VXC coder (PVXC) which achieves high reconstructed speech quality and incorporates new techniques which reduce the codebook search complexity to only 0.55 MFlops. The coder utilizes an optimized excitation codebook and a promising new interframe vector predictive LPC parameter quantization scheme. A preliminary implementation using a single floating-point signal processor is described.

67 citations


Proceedings ArticleDOI
01 Apr 1987
TL;DR: This work proposed a model that is capable of expressing a wide range of voice source characteristics, and demonstrated that source and vocal-tract parameters can be well separated and correctly estimated, for vowel and vowel-like sounds, by combining the proposed source model with the linear predictive analysis.
Abstract: Conventional speech analysis methods based on linear prediction often fail to separate and estimate the source and vocal-tract characteristics, especially in the case of voiced sounds, because of oversimplified assumptions regarding the voice source. We have already proposed a model that is capable of expressing a wide range of voice source characteristics, and demonstrated that source and vocal-tract parameters can be well separated and correctly estimated, for vowel and vowel-like sounds, by combining the proposed source model with the linear predictive analysis. The present paper extends our approach to apply to a wider variety of speech sounds including nasal vowels and nasal consonants, by combining the proposed source model with the ARMA analysis. The validity of the system was demonstrated by analysis of synthetic and natural speech.

Proceedings ArticleDOI
06 Apr 1987
TL;DR: A novel spectral distance measure based on the smoothed LPC group delay spectrum gives stable recognition performance under variable frequency transfer characteristics, additive noise, and varying signal-to-noise ratio.
Abstract: We present a novel spectral distance measure based on the smoothed LPC group delay spectrum which gives stable recognition performance under variable frequency transfer characteristics and additive noise. The weight of the n-th cepstral coefficient in our measure is given by W_n = n^s \exp(-n^2 / 2\tau^2), which can be adjusted by selecting proper values of s and τ. In order to optimize the parameters of this distance measure, extensive experiments are carried out in a speaker-dependent isolated word recognition system using a standard dynamic time warping technique. The input speech data used here is a set of 68 phonetically very similar Japanese city-name pairs spoken by male speakers. The experimental results show that our distance measure gives a robust recognition rate in spite of variation in frequency characteristics and signal-to-noise ratio (SNR). In noisy conditions at a segmental SNR of 20 dB, the recognition rate was more than 13% higher than that obtained by using the standard Euclidean cepstral distance measure. Finally, it is shown that the optimum value of s is approximately 1, and the optimum range of τΔT is about 1 ms.
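The weighting above translates directly into a liftered cepstral distance. The sketch below is a generic weighted Euclidean cepstral distance using the stated weights; the default τ value and the choice of cepstra (the paper uses cepstra derived from the smoothed LPC group delay spectrum) are placeholders, not the paper's tuned settings.

```python
import numpy as np

def lifter_weights(n_coeffs, s=1.0, tau=5.0):
    """Cepstral weights W_n = n^s * exp(-n^2 / (2 * tau^2)) for n = 1..n_coeffs
    (s ~ 1 per the paper; tau here is an arbitrary placeholder)."""
    n = np.arange(1, n_coeffs + 1, dtype=float)
    return n**s * np.exp(-n**2 / (2.0 * tau**2))

def weighted_cepstral_distance(c_ref, c_test, s=1.0, tau=5.0):
    """Weighted Euclidean distance between two cepstral vectors
    (coefficient 0, the gain term, is assumed to be excluded)."""
    w = lifter_weights(len(c_ref), s, tau)
    diff = np.asarray(c_ref, dtype=float) - np.asarray(c_test, dtype=float)
    return float(np.sqrt(np.sum((w * diff)**2)))
```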

PatentDOI
TL;DR: In this article, a method was proposed to determine if a portion of speech corresponds to a speech pattern by time aligning both the speech and a plurality of speech pattern models against a common time-aligning model.
Abstract: A method determines if a portion of speech corresponds to a speech pattern by time aligning both the speech and a plurality of speech pattern models against a common time-aligning model. This compensates for speech variation between the speech and the pattern models. The method then compares the resulting time-aligned speech model against the resulting time-aligned pattern models to determine which of the patterns most probably corresponds to the speech. Preferably there are a plurality of time-aligning models, each representing a group of somewhat similar sound sequences which occur in different words. Each of these time-aligning models is scored for similarity against a portion of speech, and the time-aligned speech model and time-aligned pattern models produced by time alignment with the best scoring time-aligning model are compared to determine the likelihood that each speech pattern corresponds to the portion of speech. This is performed for each successive portion of speech. When a portion of speech appears to correspond to a given speech pattern model, a range of likely start times is calculated for the vocabulary word associated with that model, and a word score is calculated to indicate the likelihood of that word starting in that range. The method uses a more computationally intensive comparison between the speech and selected vocabulary words, so as to more accurately determine which words correspond with which portions of the speech. When this more intensive comparison indicates the ending of a word at a given point in the speech, the method selects the best scoring vocabulary words whose range of start times overlaps that ending time, and performs the computationally intensive comparison on those selected words starting at that point in the speech.

PatentDOI
TL;DR: In this article, a finite impulse response linear predictive coding (LPC) filter and an overlapping codebook are used to determine a candidate excitation vector from the codebook that matches the target excitation vectors after searching the entire codebook for the best match.
Abstract: Apparatus for encoding speech using a code excited linear predictive (CELP) encoder using a recursive computational unit. In response to a target excitation vector that models a present frame of speech, the computational unit utilizes a finite impulse response linear predictive coding (LPC) filter and an overlapping codebook to determine a candidate excitation vector from the codebook that matches the target excitation vector after searching the entire codebook for the best match. For each candidate excitation vector accessed from the overlapping codebook, only one sample of the accessed vector and one sample of the previously accessed vector must have arithmetic operations performed on them to evaluate the new vector rather than all of the samples as is normal for CELP methods. For increased performance, a stochastically excited linear predictive (SELP) encoder is used in series with the adaptive CELP encoder. The SELP encoder is responsive to the difference between the target excitation vector and the best matched candidate excitation vector to search its own overlapping codebook in a recursive manner to determine a candidate excitation vector that provides the best match. Both of the best matched candidate vectors are used in speech synthesis.

PatentDOI
TL;DR: In this article, the quality of speech in a voice communication system is evaluated by performing a Mahalanobis D2 calculation on a variance-covariance matrix of standard/distorted spectral pairs, yielding D2 data which represent an estimate of the quality of speech in the sample file.
Abstract: A method of evaluating the quality of speech in a voice communication system is used in a speech processor. A digital file of undistorted speech representative of a speech standard for a voice communication system is recorded. A sample file of possibly distorted speech carried by said voice communication system is also recorded. The file of standard speech and the file of possibly distorted speech are passed through a set of critical band filters to provide power spectra which include distorted-standard speech pairs. A variance-covariance matrix is calculated from said pairs, and a Mahalanobis D2 calculation is performed on said matrix, yielding D2 data which represents an estimation of the quality of speech in the sample file.
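For reference, the Mahalanobis D2 computation itself is standard. The sketch below assumes a 16-channel critical-band analysis and forms the variance-covariance matrix from the standard/distorted differences; exactly how the patent forms the matrix from the pairs is not detailed in this abstract, so that step is an assumption.

```python
import numpy as np

def mahalanobis_d2(x, y, cov):
    """Mahalanobis D^2 between two critical-band power-spectrum vectors,
    given a variance-covariance matrix."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(d @ np.linalg.solve(cov, d))

# Toy example with a hypothetical 16-channel critical-band analysis.
rng = np.random.default_rng(1)
standard = rng.random((200, 16))                     # frames of the undistorted reference
distorted = standard + 0.05 * rng.standard_normal((200, 16))
cov = np.cov((standard - distorted).T)               # covariance of the per-frame differences
score = np.mean([mahalanobis_d2(s, t, cov) for s, t in zip(standard, distorted)])
```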

Journal ArticleDOI
TL;DR: It is shown that the vocal tract characteristics of voiced sounds uttered by females or children can be estimated accurately by the sample-selective linear prediction (SSLP) method proposed by the authors.
Abstract: The conventional linear prediction analysis has difficulties in estimating the vocal tract characteristics of voiced sounds uttered by females or children. This paper shows that the vocal tract characteristics of those speech signals can be estimated accurately by the sample-selective linear prediction (SSLP) method proposed by the authors. The SSLP is a two-stage linear prediction analysis employing only relevant sample values in the second stage analysis, while the conventional linear prediction method employs all the sample values with equal weights as predicted values. The accuracy of the proposed method in estimating formant frequencies is examined on synthetic vowels of short pitch periods. The validity of the method is confirmed by inspecting the estimated spectral envelopes and distributions of the estimated formant frequencies of natural vowels uttered by a female.

Proceedings ArticleDOI
01 Apr 1987
TL;DR: A new technique is described for coding the sine-wave amplitudes based on the idea of a pitch-adaptive channel vocoder and operating at a total bit rate of 4.8 kbps, it was possible to code and transmit enough phase information so that very intelligible, natural sounding speech could be synthesized.
Abstract: It has been shown [1] that an analysis/synthesis system based on a sinusoidal representation leads to synthetic speech that is essentially indistinguishable from the original. By exploiting the peak-to-peak correlation of the sine-wave amplitudes [2], a harmonic model for the sine-wave frequencies, and a predictive model for the sine-wave phases [3], it has also been shown that the sine-wave parameters can be coded at 8 kbps. In this paper a new technique is described for coding the sine-wave amplitudes based on the idea of a pitch-adaptive channel vocoder. Using this amplitude-coding strategy and operating at a total bit rate of 4.8 kbps, it was possible to code and transmit enough phase information so that very intelligible, natural sounding speech could be synthesized. This 4.8 kbps system has been implemented in real-time and has achieved a Diagnostic Rhyme Test (DRT) score of 95. At 2.4 kbps no explicit phase information could be coded, but by phase-locking all of the sine waves to the fundamental, by adding a pitch-adaptive quadratic phase, and by adding a voicing dependent random phase to each sine wave, natural sounding synthetic speech could be obtained. This new system is currently being implemented in real-time so that intelligibility tests can be performed.

Proceedings ArticleDOI
Yair Shoham
06 Apr 1987
TL;DR: Experimental results indicate a prediction gain in the range of 9 to 13 dB and an average log-spectral distance of 1.3 to 1.7 dB, and Informal listening tests suggest that replacing the conventional scalar quantizer in a 4.8 Kbits/s CELP coder by a VPQ system allows a reduction of the rate assigned to the LPC data without any obvious difference in the perceptual quality.
Abstract: Vector Predictive Quantization (VPQ) is proposed for coding the short-term spectral envelope of speech. The proposed VPQ scheme predicts the current spectral envelope from several past spectra, using a predictor codebook. The residual spectrum is coded by a residual codebook. The system operates in the log-spectral domain using a sampled version of the spectral envelope. Experimental results indicate a prediction gain in the range of 9 to 13 dB and an average log-spectral distance of 1.3 to 1.7 dB. Informal listening tests suggest that replacing the conventional scalar quantizer in a 4.8 Kbits/s CELP coder by a VPQ system allows a reduction of the rate assigned to the LPC data from 1.8 Kbits/s to 1.0 Kbits/s without any obvious difference in the perceptual quality.

Proceedings ArticleDOI
01 Apr 1987
TL;DR: This work investigates the performance of a recent algorithm for linear predictive (LP) modeling of speech signals, which have been degraded by uncorrelated additive noise, as a front-end processor in a speech recognition system.
Abstract: We investigate the performance of a recent algorithm for linear predictive (LP) modeling of speech signals, which have been degraded by uncorrelated additive noise, as a front-end processor in a speech recognition system. The system is speaker dependent, and recognizes isolated words, based on dynamic time warping principles. The LP model for the clean speech is estimated through appropriate composite modeling of the noisy speech. This is done by minimizing the Itakura-Saito distortion measure between the sample spectrum of the noisy speech and the power spectral density of the composite model. This approach results in a "filtering-modeling" scheme in which the filter for the noisy speech, and the LP model for the clean speech, are alternatively optimized. The proposed system was tested using the 26 word English alphabet, the ten English digits, and the three command words, "stop," "error," and "repeat," which were contaminated by additive white noise at 5-20 dB signal to noise ratios (SNR's). By replacing the standard LP analysis with the proposed algorithm, during training on the clean speech and testing on the noisy speech, we achieve an improvement in recognition accuracy equivalent to an increase in input SNR of approximately 10 dB.
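The Itakura-Saito distortion that drives the composite-model fit has a simple discrete form, sketched below; the averaging over frequency bins and the regularization constant are implementation choices, not taken from the paper.

```python
import numpy as np

def itakura_saito(p_noisy, p_model, eps=1e-12):
    """Itakura-Saito distortion between a sample power spectrum and a model
    power spectral density, averaged over frequency bins:
    d_IS = mean( P/Q - log(P/Q) - 1 ), which is zero only when P == Q."""
    r = (np.asarray(p_noisy, dtype=float) + eps) / (np.asarray(p_model, dtype=float) + eps)
    return float(np.mean(r - np.log(r) - 1.0))

# Sketch of the alternating "filtering-modeling" idea described above:
# 1) fix the clean-speech LP model and update the filter for the noisy speech;
# 2) fix the filter and re-estimate the LP model by minimizing itakura_saito(...);
# repeat until the distortion stops decreasing.
```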

PatentDOI
TL;DR: In this paper, an analog to digital converter for a speech signal is implemented in modules to allow for changes in bit rate and bit stream length according to requirements of the digital transmission system.
Abstract: An analog to digital converter for a speech signal is implemented in modules to allow for changes in bit rate and changes in bit stream length according to requirements of the digital transmission system. A pre-emphasis circuit provides an array of pre-emphasized speech samples which are stored in memory. A linear predictive coder provides an array of reflection coefficients and an array of filter coefficients. A pulse processor receives the speech samples and filter coefficients and generates speech amplitude and location signals. These signals are multiplied to generate quantized speech samples. The quantized speech samples and reflection coefficients are provided to a buffer which provides an output signal of a proper bit stream length and bit rate for the digital transmission system.

Proceedings ArticleDOI
01 Apr 1987
TL;DR: A vocal cord and tract model for speech coding at bit rates below 4.8 kb/s is proposed, intended to provide good starting values for an iterative optimization, thus alleviating the problem of locking on to a locally optimum solution.
Abstract: This paper proposes the use of a vocal cord and tract model for speech coding at bit rates below 4.8 kb/s. For this, a key requirement is the ability to derive model parameters from an input speech signal. Our approach to this problem employs an acoustic analysis front-end, a linked codebook of vocal-tract configurations and related acoustic characteristics, and an optimizing articulatory synthesizer. While the acoustic front-end is relatively straightforward, involving LPC, pitch, and voicing analyses, the codebook design and usage, as well as the specific method for optimizing the model parameters, are new. The codebook is intended to provide good starting values for an iterative optimization, thus alleviating the problem of locking on to a locally optimum solution. In a first stage of optimization, the best vocal tract configuration found in the codebook is refined by varying only the vocal tract parameters. Then, in a second stage of optimization, the best match is found between the glottal waveform of the model and the inverse-filtered input speech.

Proceedings ArticleDOI
01 Apr 1987
TL;DR: This paper presents an approach to applying the analysis-by-synthesis technique to sinusoidal speech modelling in an attempt to increase the ability of the model to accurately represent the speech waveform.
Abstract: In recent years the concept of analysis-by-synthesis has been applied very successfully to improving the performance of LPC-based models. At the same time, new speech models have been introduced based on representing speech by a sum of amplitude- and frequency-modulated sinusoids, which have been shown to successfully represent the non-linear, time-varying and quasi-periodic nature of speech. In this paper we present an approach to applying the analysis-by-synthesis technique to sinusoidal speech modelling in an attempt to increase the ability of the model to accurately represent the speech waveform.

Proceedings ArticleDOI
01 Apr 1987
TL;DR: Through these experiments, the SEV is shown to be a low complexity, simply implemented speech coder that is competitive with the other coders in this class in producing high quality speech at low bit rates.
Abstract: This paper presents a formal objective and subjective comparison of a number of LPC vocoders which operate at bit rates around 4800 bps. In this work, particular emphasis is placed on the Self Excited Vocoder (SEV), a new speech coding approach which was introduced by the authors at ICASSP86 [1]. Many members of a class of LPC vocoders of which the SEV, the well known Multiple Pulse Excited Linear Predictive Coder (MPLPC) [2], and Code Excited Linear Predictive Coder (CELPC) [3] are members, are simulated and compared. Through these experiments, the SEV is shown to be a low complexity, simply implemented speech coder that is competitive with the other coders in this class in producing high quality speech at low bit rates.

Journal ArticleDOI
TL;DR: Experiments with real, connected speech indicate that the speech waveforms can be accurately represented using the analysis-synthesis approach presented here.
Abstract: A new modeling technique for voiced speech is introduced. Salient features are detailed modeling of speech waveforms and the use of improved parameter estimation techniques. The ideas of pitch-synchronous analysis are extended to make two subintervals synchronous with regions of approximately closed and approximately open glottis. Two LPC models are used in each pitch period, and the model parameters are changed at estimated times of transition from open-to-closed and closed-to-open glottis. The excitation is provided by changing initial conditions at these transition instants. Experiments with real, connected speech indicate that the speech waveforms can be accurately represented using the analysis-synthesis approach presented here.

PatentDOI
TL;DR: A speech coding system includes apparatus for generating a variable threshold dependent upon the power of an input speech signal, and a comparator to generate a discriminating signal that distinguishes a period when speech continues from a period when speech pauses.
Abstract: A speech coding system includes apparatus for generating a variable threshold dependent upon the power of an input speech signal, and a comparator for comparing the power of the input speech signal with the variable threshold value to generate a discriminating signal for discriminating between a period when speech continues and a period when speech pauses, so as to change the coding operation for the input speech signal in accordance with the level of the discriminating signal, thereby forming voiced and unvoiced frames independently of each other.
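A minimal sketch of this kind of power-based speech/pause discrimination is shown below; the frame-power definition, the adaptive noise-floor rule, and the 10 dB offset are assumptions for illustration, since the patent abstract does not specify how the variable threshold is derived.

```python
import numpy as np

def frame_power_db(frame):
    """Mean-square frame power in dB, with a small floor to avoid log(0)."""
    return 10.0 * np.log10(np.mean(np.asarray(frame, dtype=float)**2) + 1e-12)

def speech_pause_flags(frames, offset_db=10.0, floor_alpha=0.995):
    """Flag each frame as speech (True) or pause (False) by comparing its power
    to a threshold that tracks a slowly adapting noise-floor estimate."""
    floor = frame_power_db(frames[0])
    flags = []
    for f in frames:
        p = frame_power_db(f)
        # Drop the floor immediately when power falls, raise it only slowly.
        floor = min(p, floor_alpha * floor + (1.0 - floor_alpha) * p)
        flags.append(p > floor + offset_db)
    return flags
```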

Proceedings ArticleDOI
06 Apr 1987
TL;DR: Methods for reducing computational and storage requirements of the segment vocoder are described and an algorithm that is implementable in real-time on hardware containing several Digital Signal Processing chips is presented.
Abstract: In previous papers, we have described the segment vocoder, which transmits intelligible speech at 300 b/s in speaker-independent mode, i.e., new users need not train the system. As expected for vector quantizers, the storage and computational requirements of the segment vocoder are significantly larger than those of the standard LPC-10 vocoder. In this paper, we describe methods for reducing computational and storage requirements of the segment vocoder and present an algorithm that is implementable in real-time on hardware containing several Digital Signal Processing chips. The DRT score of the simplified algorithm is 78%.

Proceedings ArticleDOI
J. Picone, G. Doddington
01 Apr 1987
TL;DR: A low rate speech coding system which uses contour quantization to encode the LPC excitation is presented; contour quantization is shown to be extremely robust and efficient in encoding the pitch and energy parameters of the LPC vocoder.
Abstract: Vector quantization-based approaches to speech coding have generated new interest in very low bit rate speech coding, that is, speech coded at bit rates below 1200 bits/sec. To achieve such low bit rates, it is necessary to quantize the pitch and energy parameters at rates below 100 bits/sec. Contour quantization is introduced as a technique in which the contour of a given parameter is normalized by a nominal value and vector quantized. Contour quantization is shown to be extremely robust and efficient in encoding the pitch and energy parameters of the LPC vocoder. In this paper, a low rate speech coding system which uses contour quantization to encode the LPC excitation is presented. The system is a fixed bit rate system which is intended to operate at bit rates ranging from 400 bits/s to 800 bits/s. The overall system delay varies from 300 ms at 800 bits/s to 400 ms at 400 bits/s. At 800 bits/s, the system achieved a score of 89 on a three male speaker DRT, and a score of 81 on a three female speaker DRT.
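A bare-bones version of contour quantization as described above might look like the following; the choice of the contour mean as the nominal value and the plain Euclidean codebook search are assumptions, not details from the paper.

```python
import numpy as np

def contour_quantize(contour, codebook):
    """Quantize a parameter contour (e.g., per-frame pitch or energy over a
    superframe) by (1) normalizing it by a nominal value and (2) picking the
    nearest shape from a contour codebook. Returns (nominal, codebook index)."""
    contour = np.asarray(contour, dtype=float)
    nominal = float(np.mean(contour))          # nominal value; using the mean is an assumption
    shape = contour / max(nominal, 1e-12)
    errs = np.sum((codebook - shape)**2, axis=1)
    return nominal, int(np.argmin(errs))

def contour_dequantize(nominal, index, codebook):
    """Reconstruct the contour by rescaling the selected codebook shape."""
    return nominal * codebook[index]
```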

Proceedings ArticleDOI
01 Apr 1987
TL;DR: The results of an extensive evaluation of a speaker verification system for access control using a 200 speaker population and over 40,000 impostor attempts, both performed on line, over a 4-month period are presented.
Abstract: The results of an extensive evaluation of a speaker verification system for access control are presented. The system employs an algorithm based on the Principal Spectral Components representation derived from the short term spectrum of the speech signal. This system designed for access control applications has been evaluated using a 200 speaker population and a total of over 13,000 true speaker attempts and over 40,000 impostor attempts, both performed on line, over a 4-month period. A true speaker rejection rate of less than 1% and an impostor acceptance rate of less than 0.1% have been obtained.

Journal ArticleDOI
TL;DR: A new class of codes for data compression is described that combines permutations with the fast Hadamard transform (FHT), invented for digital speech compression based on linear predictive coding (LPC), but may be useful for other data compression applications.
Abstract: A new class of codes for data compression is described that combines permutations with the fast Hadamard transform (FHT). It was invented for digital speech compression based on linear predictive coding (LPC), but may be useful for other data compression applications. One particular code with rate 1/2 is considered: a 16-bit code for a block length of 32 samples. All coding and decoding steps are fast, so that real-time applications with cheap hardware can be anticipated.
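The fast Hadamard transform at the heart of such a code is the standard butterfly recursion; a small sketch (just the transform, not the paper's coder) is given below for a block length of 32 samples.

```python
import numpy as np

def fht(x):
    """Fast (Walsh-)Hadamard transform of a length-2^k vector, using
    O(N log N) butterflies instead of an N x N matrix multiply."""
    x = np.asarray(x, dtype=float).copy()
    n = len(x)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            a = x[i:i + h].copy()
            b = x[i + h:i + 2 * h].copy()
            x[i:i + h] = a + b
            x[i + h:i + 2 * h] = a - b
        h *= 2
    return x

# Block length of 32 samples, as in the code described above; applying the
# transform twice recovers the input up to a factor of 32.
v = np.arange(32, dtype=float)
assert np.allclose(fht(fht(v)) / 32.0, v)
```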

Journal ArticleDOI
TL;DR: Two types of adaptive predictor-control schemes are proposed in which the prediction error at each pel can be obtained at or close to a minimum level and performance of the 2D/3-D ladder filters, their adaptive control schemes, and variations in coding methods are evaluated.
Abstract: This paper presents several adaptive linear predictive coding techniques based upon extension of recursive ladder filters to two and three dimensions (2-D/3-D). A 2-D quarter-plane autoregressive ladder filter is developed using a least square criterion in an exact recursive fashion. The 2-D recursive ladder filter is extended to a 3-D case which can adaptively track the variation of both spatial and temporal changes of moving images. Using the 2-D/3-D ladder filters and a previous frame predictor, two types of adaptive predictor-control schemes are proposed in which the prediction error at each pel can be obtained at or close to a minimum level. We also investigate several modifications of the basic encoding methods. Performance of the 2-D/3-D ladder filters, their adaptive control schemes, and variations in coding methods are evaluated by computer simulations on two real sequences and compared to the results of motion compensation and frame differential coders. As a validity test of the ladder filters developed, the error signals for the different predictors are compared and the visual quality of output images is verified.