
Showing papers on "Linear predictive coding published in 1987"


Journal ArticleDOI
TL;DR: An efficient computer program is developed that will serve as a tool for investigating whether articulatory speech synthesis may achieve this low bit rate.
Abstract: High quality speech at low bit rates (e.g., 2400 bits/s) is one of the important objectives of current speech research. As part of long range activity on this problem, we have developed an efficient computer program that will serve as a tool for investigating whether articulatory speech synthesis may achieve this low bit rate. At a sampling frequency of 8 kHz, the most comprehensive version of the program, including nasality and frication, runs at about twice real time on a Cray-1 computer.

243 citations


Proceedings ArticleDOI
01 Apr 1987
TL;DR: The paper describes a related scheme that allows real-time implementation on current DSP chips; the very efficient codebook search is achieved by means of a new technique called "backward filtering" and the use of algebraic codes.
Abstract: Code-Excited Linear Prediction (CELP) produces high quality synthetic speech at low bit rate. However, the basic scheme leads to huge computational loads. The paper describes a related scheme, which allows real-time implementation on current DSP chips. The very efficient search procedure in the codebook is achieved by means of a new technique called "backward filtering" and the use of algebraic codes. Signal-to-noise ratio (SNR) performance is reported for a variety of conditions.

196 citations


Proceedings ArticleDOI
06 Apr 1987
TL;DR: Simplifications of Atal's technique for decomposing speech into phone-length temporal events in terms of overlapping and interacting articulatory gestures are reported, with applications to acoustic-phonetic synthesis.
Abstract: Atal [1] introduced a technique for decomposing speech into phone-length temporal events in terms of overlapping and interacting articulatory gestures. This paper reports on simplifications of this technique with applications to acoustic-phonetic synthesis. Spectral evolution is represented by time-indexed trajectories in the p-dimensional space of log-area ratios y_i = \ln((1 + k_i)/(1 - k_i)), where the k_i are the reflection coefficients obtained from short-time stationary LPC analysis. The vocal tract configuration (spectral vector) associated with each interpolation function belongs to a finite set of articulatory targets (a vector quantization codebook). A set of speech segments ("polysons") has been encoded using this technique. It includes diphones, demi-syllables, and other units that are difficult to segment. Temporal decomposition using target spectra can break the complex encoding of these segments. In particular, coarticulation effects are analytically explained and modeled. It is demonstrated that these new tools provide an adequate environment in our search for better rules in acoustic speech synthesis.
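As a side note, the log-area-ratio transform quoted above is easy to state concretely. The sketch below (Python/NumPy; the 10th-order coefficient values are invented for the example and are not from the paper) converts reflection coefficients to log-area ratios and back:

```python
import numpy as np

def reflection_to_lar(k):
    """Convert LPC reflection coefficients k_i (|k_i| < 1) to
    log-area ratios y_i = ln((1 + k_i) / (1 - k_i))."""
    k = np.asarray(k, dtype=float)
    return np.log((1.0 + k) / (1.0 - k))

def lar_to_reflection(y):
    """Inverse transform: k_i = (exp(y_i) - 1) / (exp(y_i) + 1) = tanh(y_i / 2)."""
    return np.tanh(np.asarray(y, dtype=float) / 2.0)

# Example: a 10th-order set of reflection coefficients from a hypothetical
# short-time stationary LPC analysis frame.
k = np.array([0.9, -0.6, 0.3, -0.2, 0.1, -0.05, 0.04, -0.02, 0.01, 0.0])
y = reflection_to_lar(k)
assert np.allclose(lar_to_reflection(y), k)
```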

179 citations


Proceedings ArticleDOI
01 Apr 1987
TL;DR: An improved Vector APC (VAPC) speech coder at 4800 bps produces speech with very good communications quality while maintaining a complexity low enough to allow a real-time implementation with at most two commercially available DSP chips.
Abstract: An improved Vector APC (VAPC) speech coder at 4800 bps produces speech with very good communications quality while maintaining a complexity low enough to allow a real-time implementation with at most two commercially available DSP chips. The VAPC algorithm combines APC with vector quantization and incorporates analysis-by-synthesis, perceptual noise weighting, and adaptive postfiltering. A novel adaptive postfiltering technique helps to achieve an essentially inaudible level of coding noise. Real-time software has been developed for an implementation using the AT&T DSP32 floating-point processor chip. The overall complexity of the implemented VAPC system is about 3 million multiply-adds/second of computation and 6 kwords of memory.

158 citations


Proceedings ArticleDOI
P. Kroon, B. Atal
01 Apr 1987
TL;DR: This paper addresses the problem of finding and encoding the excitation parameters with a limited bit rate, such that high quality speech coding in the 4.8 - 7.2 kb/s range becomes feasible.
Abstract: Past research on CELP (Code-Excited Linear Predictive) coders has mainly concentrated on the feasibility of the CELP concept and on the reduction of the computational complexity. In this paper we address the problem of finding and encoding the excitation parameters with a limited bit rate, such that high quality speech coding in the 4.8 - 7.2 kb/s range becomes feasible. First, we examine the effect of the various excitation parameters such as code book size, code book population, order of the long-term predictor and update rate on the quality of the reconstructed speech. Second, we investigate procedures for designing and incorporating quantizers for the parameters involved. Finally, using both scalar and vector quantization techniques for the LPC coefficients, we simulated 4.8 kb/s and 7.2 kb/s coders. We also report on the use of postfiltering to further improve the performance of the CELP coder.
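For readers unfamiliar with CELP-style excitation coding, the following minimal sketch shows the generic analysis-by-synthesis codebook search such coders perform; the codebook size, subframe length, gain rule, and the placeholder weighting filter are illustrative assumptions, not the configuration evaluated in the paper.

```python
import numpy as np
from scipy.signal import lfilter

def search_codebook(target, codebook, weighted_num, weighted_den):
    """Pick the excitation vector (and gain) whose filtered output best
    matches the weighted target subframe in the mean-squared sense."""
    best = (None, 0.0, np.inf)  # (index, gain, error)
    for i, c in enumerate(codebook):
        y = lfilter(weighted_num, weighted_den, c)        # filtered codevector
        g = np.dot(target, y) / max(np.dot(y, y), 1e-12)  # optimal gain for this vector
        err = np.dot(target - g * y, target - g * y)
        if err < best[2]:
            best = (i, g, err)
    return best

# Toy example: 64 random Gaussian codevectors of 40 samples (5 ms at 8 kHz).
rng = np.random.default_rng(0)
codebook = rng.standard_normal((64, 40))
target = rng.standard_normal(40)
# Placeholder weighted synthesis filter; the coefficients are purely illustrative.
index, gain, err = search_codebook(target, codebook, [1.0], [1.0, -0.9])
```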

158 citations


Proceedings ArticleDOI
06 Apr 1987
TL;DR: Three different approaches for automatically segmenting speech into phonetic units are described: one based on template matching, one based on detecting the spectral changes that occur at the boundaries between phonetic units, and one based on a constrained-clustering vector quantization approach.
Abstract: For large vocabulary and continuous speech recognition, the sub-word-unit-based approach is a viable alternative to the whole-word-unit-based approach. For preparing a large inventory of subword units, automatic segmentation is preferable to manual segmentation, as it substantially reduces the work associated with the generation of templates and gives more consistent results. In this paper we discuss some methods for automatically segmenting speech into phonetic units. Three different approaches are described: one based on template matching, one based on detecting the spectral changes that occur at the boundaries between phonetic units, and one based on a constrained-clustering vector quantization approach. An evaluation of the performance of the automatic segmentation methods is given.
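As a rough illustration of the spectral-change approach mentioned above, the snippet below marks local peaks of frame-to-frame spectral distance as candidate boundaries; the use of cepstral vectors, the Euclidean distance, and the fixed threshold are assumptions made for the example rather than the paper's actual procedure.

```python
import numpy as np

def spectral_change_boundaries(cepstra, threshold):
    """Given a (frames x coeffs) array of short-time cepstral (or LPC-derived)
    vectors, mark frames where the local spectral change peaks above a
    threshold as candidate phonetic-unit boundaries."""
    d = np.linalg.norm(np.diff(cepstra, axis=0), axis=1)  # frame-to-frame distance
    boundaries = []
    for t in range(1, len(d) - 1):
        if d[t] > threshold and d[t] >= d[t - 1] and d[t] >= d[t + 1]:
            boundaries.append(t + 1)  # boundary between frame t and frame t+1
    return boundaries
```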

156 citations



Patent
02 Sep 1987
TL;DR: In this article, a speech analyzer and synthesizer system is described that uses sinusoidal encoding and decoding techniques for voiced frames and noise excitation or multiple-pulse excitation for unvoiced frames.
Abstract: A speech analyzer and synthesizer system using sinusoidal encoding and decoding techniques for voiced frames and noise excitation or multiple-pulse excitation for unvoiced frames. For voiced frames, the analyzer (100) transmits the pitch, a value for each harmonic frequency (defined as its offset from the corresponding integer multiple of the fundamental frequency), the total frame energy, and linear predictive coding (LPC) coefficients (FIG. 1). The synthesizer (200) is responsive to that information to determine the phase of the fundamental frequency and each harmonic, based on the transmitted pitch and harmonic offset information, and to determine the amplitudes of the harmonics utilizing the total frame energy and LPC coefficients (FIG. 2). Once the phases and amplitudes have been determined for the fundamental and harmonic frequencies, the sinusoidal synthesis is performed for voiced frames. For each frame, the determined frequencies and amplitudes are defined at the center of the frame, and linear interpolation is used by the synthesizer to determine continuous frequency and amplitude signals for the fundamental and the harmonics throughout the entire frame. In addition, the analyzer initially adjusts the pitch so that the harmonics are evenly distributed around integer multiples of this pitch.
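A loose sketch of the interpolation idea described in this abstract is given below: harmonic frequencies (multiples of the pitch plus transmitted offsets) and amplitudes are defined at frame centers and linearly interpolated across the frame. The function names, the 8 kHz sampling rate, and the phase-accumulation details are assumptions for illustration, not the patent's implementation.

```python
import numpy as np

def synthesize_voiced_frame(f0_a, f0_b, amps_a, amps_b, offs_a, offs_b,
                            frame_len, fs=8000.0, phase0=None):
    """Sum-of-harmonics synthesis for one voiced frame. Values with suffix
    _a belong to the previous frame center, _b to the current one; both
    frequency and amplitude tracks are linearly interpolated in between."""
    n_harm = len(amps_a)
    out = np.zeros(frame_len)
    phase = np.zeros(n_harm) if phase0 is None else phase0.copy()
    for k in range(n_harm):
        f_a = (k + 1) * f0_a + offs_a[k]
        f_b = (k + 1) * f0_b + offs_b[k]
        freq = np.linspace(f_a, f_b, frame_len)             # interpolated frequency track
        amp = np.linspace(amps_a[k], amps_b[k], frame_len)  # interpolated amplitude track
        inst_phase = phase[k] + 2 * np.pi * np.cumsum(freq) / fs
        out += amp * np.cos(inst_phase)
        phase[k] = inst_phase[-1] % (2 * np.pi)             # carry phase into the next frame
    return out, phase
```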

88 citations


Journal ArticleDOI
TL;DR: The stability and performance of pitch filters in speech coding when pitch prediction is combined with formant prediction are analyzed, and it is observed that the quality of decoded speech improves significantly when stable synthesis filters are employed.
Abstract: This paper analyzes the stability and performance of pitch filters in speech coding when pitch prediction is combined with formant prediction. A computationally simple stability test based on a sufficient condition is formulated for pitch synthesis filters. For typical orders of pitch filters, this sufficient test is very tight. Based on the test, a simple stabilization technique that minimizes the loss in prediction gain of the pitch predictor is employed to generate stable synthesis filters. Finally, it is observed that the quality of decoded speech improves significantly when stable synthesis filters are employed.
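The paper's own sufficient condition is tighter than what is shown here, but a simple sufficient stability check of the same flavor, together with a scaling-based stabilization, can be sketched as follows (the 3-tap coefficients and the margin value are made up for the example):

```python
import numpy as np

def pitch_filter_is_stable_sufficient(b):
    """Sufficient (not necessary) stability check for a pitch synthesis filter
    1 / (1 - sum_i b_i z^{-(M+i)}): if sum_i |b_i| < 1, the denominator cannot
    vanish on or outside the unit circle, so the filter is stable."""
    return float(np.sum(np.abs(b))) < 1.0

def stabilize_by_scaling(b, margin=0.999):
    """If the sufficient test fails, shrink the taps uniformly so the
    magnitude sum falls just below 1; this trades a small loss in pitch
    prediction gain for a guaranteed-stable synthesis filter."""
    b = np.asarray(b, dtype=float)
    s = float(np.sum(np.abs(b)))
    return b if s < 1.0 else (margin / s) * b

b = np.array([0.4, 0.9, 0.3])  # illustrative 3-tap pitch predictor coefficients
if not pitch_filter_is_stable_sufficient(b):
    b = stabilize_by_scaling(b)
```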

84 citations


Proceedings ArticleDOI
01 Apr 1987
TL;DR: A real-time 4.8 kb/s Pulse Excitation VXC coder (PVXC) is presented which achieves high reconstructed speech quality and incorporates new techniques which reduce the codebook search complexity to only 0.55 MFlops.
Abstract: In Vector Excitation Coding (VXC), speech is represented by applying a sequence of excitation vectors to a time-varying speech production filter, with each vector chosen from a codebook using a perceptually-based performance measure. Although VXC is a powerful technique for achieving natural and high quality speech compression at low bit-rates, it suffers, as other excitation coders do, from very high computational complexity. Recent research has shown that codebook search computation can be reduced to approximately 40 MFlops without compromising speech quality. However, this operation count still prohibits a practical real-time implementation of the coder using today's DSP chips. We present a real-time 4.8 kb/s Pulse Excitation VXC coder (PVXC) which achieves high reconstructed speech quality and incorporates new techniques which reduce the codebook search complexity to only 0.55 MFlops. The coder utilizes an optimized excitation codebook and a promising new interframe vector predictive LPC parameter quantization scheme. A preliminary implementation using a single floating-point signal processor is described.

67 citations


Proceedings ArticleDOI
01 Apr 1987
TL;DR: This work proposed a model that is capable of expressing a wide range of voice source characteristics, and demonstrated that source and vocal-tract parameters can be well separated and correctly estimated, for vowel and vowel-like sounds, by combining the proposed source model with the linear predictive analysis.
Abstract: Conventional speech analysis methods based on linear prediction often fail to separate and estimate the source and vocal-tract characteristics, especially in the case of voiced sounds, because of oversimplified assumptions regarding the voice source. We have already proposed a model that is capable of expressing a wide range of voice source characteristics, and demonstrated that source and vocal-tract parameters can be well separated and correctly estimated, for vowel and vowel-like sounds, by combining the proposed source model with the linear predictive analysis. The present paper extends our approach to apply to a wider variety of speech sounds including nasal vowels and nasal consonants, by combining the proposed source model with the ARMA analysis. The validity of the system was demonstrated by analysis of synthetic and natural speech.

Proceedings ArticleDOI
06 Apr 1987
TL;DR: A novel spectral distance measure based on the smoothed LPC group delay spectrum gives stable recognition performance under variable frequency transfer characteristics, additive noise, and varying signal-to-noise ratio.
Abstract: We present a novel spectral distance measure based on the smoothed LPC group delay spectrum which gives stable recognition performance under variable frequency transfer characteristics and additive noise. The weight of the n-th cepstral coefficient in our measure is given by W_n = n^s \exp(-n^2 / 2\tau^2), which can be adjusted by selecting proper values of s and τ. In order to optimize the parameters of this distance measure, extensive experiments are carried out in a speaker-dependent isolated word recognition system using a standard dynamic time warping technique. The input speech data used here is a set of 68 phonetically very similar Japanese city-name pairs spoken by male speakers. The experimental results show that our distance measure gives a robust recognition rate in spite of variation in frequency characteristics and signal-to-noise ratio (SNR). In noisy conditions at a segmental SNR of 20 dB, the recognition rate was more than 13% higher than that obtained by using the standard Euclidean cepstral distance measure. Finally, it is shown that the optimum value of s is approximately 1, and the optimum range of τΔT is about 1 ms.
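The weighting above translates directly into a liftered cepstral distance. The sketch below is a generic weighted Euclidean cepstral distance using the stated weights; the default τ value and the choice of cepstra (the paper uses cepstra derived from the smoothed LPC group delay spectrum) are placeholders, not the paper's tuned settings.

```python
import numpy as np

def lifter_weights(n_coeffs, s=1.0, tau=5.0):
    """Cepstral weights W_n = n^s * exp(-n^2 / (2 * tau^2)) for n = 1..n_coeffs
    (s ~ 1 per the paper; tau here is an arbitrary placeholder)."""
    n = np.arange(1, n_coeffs + 1, dtype=float)
    return n**s * np.exp(-n**2 / (2.0 * tau**2))

def weighted_cepstral_distance(c_ref, c_test, s=1.0, tau=5.0):
    """Weighted Euclidean distance between two cepstral vectors
    (coefficient 0, the gain term, is assumed to be excluded)."""
    w = lifter_weights(len(c_ref), s, tau)
    diff = np.asarray(c_ref, dtype=float) - np.asarray(c_test, dtype=float)
    return float(np.sqrt(np.sum((w * diff)**2)))
```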

PatentDOI
TL;DR: In this article, a method was proposed to determine if a portion of speech corresponds to a speech pattern by time aligning both the speech and a plurality of speech pattern models against a common time-aligning model.
Abstract: A method determines if a portion of speech corresponds to a speech pattern by time aligning both the speech and a plurality of speech pattern models against a common time-aligning model. This compensates for speech variation between the speech and the pattern models. The method then compares the resulting time-aligned speech model against the resulting time-aligned pattern models to determine which of the patterns most probably corresponds to the speech. Preferably there are a plurality of time-aligning models, each representing a group of somewhat similar sound sequences which occur in different words. Each of these time-aligning models is scored for similarity against a portion of speech, and the time-aligned speech model and time-aligned pattern models produced by time alignment with the best scoring time-aligning model are compared to determine the likelihood that each speech pattern corresponds to the portion of speech. This is performed for each successive portion of speech. When a portion of speech appears to correspond to a given speech pattern model, a range of likely start times is calculated for the vocabulary word associated with that model, and a word score is calculated to indicate the likelihood of that word starting in that range. The method uses a more computationally intensive comparison between the speech and selected vocabulary words, so as to more accurately determine which words correspond with which portions of the speech. When this more intensive comparison indicates the ending of a word at a given point in the speech, the method selects the best scoring vocabulary words whose range of start times overlaps that ending time, and performs the computationally intensive comparison on those selected words starting at that point in the speech.

PatentDOI
TL;DR: In this article, a finite impulse response linear predictive coding (LPC) filter and an overlapping codebook are used to determine a candidate excitation vector from the codebook that matches the target excitation vectors after searching the entire codebook for the best match.
Abstract: Apparatus for encoding speech using a code excited linear predictive (CELP) encoder using a recursive computational unit. In response to a target excitation vector that models a present frame of speech, the computational unit utilizes a finite impulse response linear predictive coding (LPC) filter and an overlapping codebook to determine a candidate excitation vector from the codebook that matches the target excitation vector after searching the entire codebook for the best match. For each candidate excitation vector accessed from the overlapping codebook, only one sample of the accessed vector and one sample of the previously accessed vector must have arithmetic operations performed on them to evaluate the new vector rather than all of the samples as is normal for CELP methods. For increased performance, a stochastically excited linear predictive (SELP) encoder is used in series with the adaptive CELP encoder. The SELP encoder is responsive to the difference between the target excitation vector and the best matched candidate excitation vector to search its own overlapping codebook in a recursive manner to determine a candidate excitation vector that provides the best match. Both of the best matched candidate vectors are used in speech synthesis.

PatentDOI
TL;DR: In this article, the quality of speech in a voice communication system is evaluated by performing a Mahalanobis D2 calculation on a variance-covariance matrix of standard/distorted spectral pairs, yielding D2 data which represent an estimate of the quality of speech in the sample file.
Abstract: A method of evaluating the quality of speech in a voice communication system is used in a speech processor. A digital file of undistorted speech representative of a speech standard for a voice communication system is recorded. A sample file of possibly distorted speech carried by said voice communication system is also recorded. The file of standard speech and the file of possibly distorted speech are passed through a set of critical band filters to provide power spectra which include distorted-standard speech pairs. A variance-covariance matrix is calculated from said pairs, and a Mahalanobis D2 calculation is performed on said matrix, yielding D2 data which represents an estimation of the quality of speech in the sample file.
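For reference, the Mahalanobis D2 computation itself is standard. The sketch below assumes a 16-channel critical-band analysis and forms the variance-covariance matrix from the standard/distorted differences; exactly how the patent forms the matrix from the pairs is not detailed in this abstract, so that step is an assumption.

```python
import numpy as np

def mahalanobis_d2(x, y, cov):
    """Mahalanobis D^2 between two critical-band power-spectrum vectors,
    given a variance-covariance matrix."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(d @ np.linalg.solve(cov, d))

# Toy example with a hypothetical 16-channel critical-band analysis.
rng = np.random.default_rng(1)
standard = rng.random((200, 16))                     # frames of the undistorted reference
distorted = standard + 0.05 * rng.standard_normal((200, 16))
cov = np.cov((standard - distorted).T)               # covariance of the per-frame differences
score = np.mean([mahalanobis_d2(s, t, cov) for s, t in zip(standard, distorted)])
```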

Journal ArticleDOI
TL;DR: It is shown that the vocal tract characteristics of voiced sounds uttered by females or children can be estimated accurately by the sample-selective linear prediction (SSLP) method proposed by the authors.
Abstract: The conventional linear prediction analysis has difficulties in estimating the vocal tract characteristics of voiced sounds uttered by females or children. This paper shows that the vocal tract characteristics of those speech signals can be estimated accurately by the sample-selective linear prediction (SSLP) method proposed by the authors. The SSLP is a two-stage linear prediction analysis employing only relevant sample values in the second stage analysis, while the conventional linear prediction method employs all the sample values with equal weights as predicted values. The accuracy of the proposed method in estimating formant frequencies is examined on synthetic vowels of short pitch periods. The validity of the method is confirmed by inspecting the estimated spectral envelopes and distributions of the estimated formant frequencies of natural vowels uttered by a female.

Proceedings ArticleDOI
01 Apr 1987
TL;DR: A new technique is described for coding the sine-wave amplitudes based on the idea of a pitch-adaptive channel vocoder and operating at a total bit rate of 4.8 kbps, it was possible to code and transmit enough phase information so that very intelligible, natural sounding speech could be synthesized.
Abstract: It has been shown [1] that an analysis/synthesis system based on a sinusoidal representation leads to synthetic speech that is essentially indistinguishable from the original. By exploiting the peak-to-peak correlation of the sine-wave amplitudes [2], a harmonic model for the sine-wave frequencies, and a predictive model for the sine-wave phases [3], it has also been shown that the sine-wave parameters can be coded at 8 kbps. In this paper a new technique is described for coding the sine-wave amplitudes based on the idea of a pitch-adaptive channel vocoder. Using this amplitude-coding strategy and operating at a total bit rate of 4.8 kbps, it was possible to code and transmit enough phase information so that very intelligible, natural sounding speech could be synthesized. This 4.8 kbps system has been implemented in real-time and has achieved a Diagnostic Rhyme Test (DRT) score of 95. At 2.4 kbps no explicit phase information could be coded, but by phase-locking all of the sine waves to the fundamental, by adding a pitch-adaptive quadratic phase, and by adding a voicing dependent random phase to each sine wave, natural sounding synthetic speech could be obtained. This new system is currently being implemented in real-time so that intelligibility tests can be performed.

Proceedings ArticleDOI
Yair Shoham
06 Apr 1987
TL;DR: Experimental results indicate a prediction gain in the range of 9 to 13 dB and an average log-spectral distance of 1.3 to 1.7 dB, and Informal listening tests suggest that replacing the conventional scalar quantizer in a 4.8 Kbits/s CELP coder by a VPQ system allows a reduction of the rate assigned to the LPC data without any obvious difference in the perceptual quality.
Abstract: Vector Predictive Quantization (VPQ) is proposed for coding the short-term spectral envelope of speech. The proposed VPQ scheme predicts the current spectral envelope from several past spectra, using a predictor codebook. The residual spectrum is coded by a residual codebook. The system operates in the log-spectral domain using a sampled version of the spectral envelope. Experimental results indicate a prediction gain in the range of 9 to 13 dB and an average log-spectral distance of 1.3 to 1.7 dB. Informal listening tests suggest that replacing the conventional scalar quantizer in a 4.8 Kbits/s CELP coder by a VPQ system allows a reduction of the rate assigned to the LPC data from 1.8 Kbits/s to 1.0 Kbits/s without any obvious difference in the perceptual quality.

Proceedings ArticleDOI
01 Apr 1987
TL;DR: This work investigates the performance of a recent algorithm for linear predictive (LP) modeling of speech signals, which have been degraded by uncorrelated additive noise, as a front-end processor in a speech recognition system.
Abstract: We investigate the performance of a recent algorithm for linear predictive (LP) modeling of speech signals, which have been degraded by uncorrelated additive noise, as a front-end processor in a speech recognition system. The system is speaker dependent, and recognizes isolated words, based on dynamic time warping principles. The LP model for the clean speech is estimated through appropriate composite modeling of the noisy speech. This is done by minimizing the Itakura-Saito distortion measure between the sample spectrum of the noisy speech and the power spectral density of the composite model. This approach results in a "filtering-modeling" scheme in which the filter for the noisy speech, and the LP model for the clean speech, are alternatively optimized. The proposed system was tested using the 26 word English alphabet, the ten English digits, and the three command words, "stop," "error," and "repeat," which were contaminated by additive white noise at 5-20 dB signal to noise ratios (SNR's). By replacing the standard LP analysis with the proposed algorithm, during training on the clean speech and testing on the noisy speech, we achieve an improvement in recognition accuracy equivalent to an increase in input SNR of approximately 10 dB.
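The Itakura-Saito distortion that drives the composite-model fit has a simple discrete form, sketched below; the averaging over frequency bins and the regularization constant are implementation choices, not taken from the paper.

```python
import numpy as np

def itakura_saito(p_noisy, p_model, eps=1e-12):
    """Itakura-Saito distortion between a sample power spectrum and a model
    power spectral density, averaged over frequency bins:
    d_IS = mean( P/Q - log(P/Q) - 1 ), which is zero only when P == Q."""
    r = (np.asarray(p_noisy, dtype=float) + eps) / (np.asarray(p_model, dtype=float) + eps)
    return float(np.mean(r - np.log(r) - 1.0))

# Sketch of the alternating "filtering-modeling" idea described above:
# 1) fix the clean-speech LP model and update the filter for the noisy speech;
# 2) fix the filter and re-estimate the LP model by minimizing itakura_saito(...);
# repeat until the distortion stops decreasing.
```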

PatentDOI
TL;DR: In this paper, an analog to digital converter for a speech signal is implemented in modules to allow for changes in bit rate and bit stream length according to requirements of the digital transmission system.
Abstract: An analog to digital converter for a speech signal is implemented in modules to allow for changes in bit rate and changes in bit stream length according to requirements of the digital transmission system. A pre-emphasis circuit provides an array of pre-emphasized speech samples which are stored in memory. A linear predictive coder provides an array of reflection coefficients and an array of filter coefficients. A pulse processor receives the speech samples and filter coefficients and generates speech amplitude and location signals. These signals are multiplied to generate quantized speech samples. The quantized speech samples and reflection coefficients are provided to a buffer which provides an output signal of a proper bit stream length and bit rate for the digital transmission system.

Proceedings ArticleDOI
01 Apr 1987
TL;DR: A vocal cord and tract model for speech coding at bit rates below 4.8 kb/s is proposed, intended to provide good starting values for an iterative optimization, thus alleviating the problem of locking on to a locally optimum solution.
Abstract: This paper proposes the use of a vocal cord and tract model for speech coding at bit rates below 4.8 kb/s. For this, a key requirement is the ability to derive model parameters from an input speech signal. Our approach to this problem employs an acoustic analysis front-end, a linked codebook of vocal-tract configurations and related acoustic characteristics, and an optimizing articulatory synthesizer. While the acoustic front-end is relatively straightforward, involving LPC, pitch, and voicing analyses, the codebook design and usage, as well as the specific method for optimizing the model parameters, are new. The codebook is intended to provide good starting values for an iterative optimization, thus alleviating the problem of locking on to a locally optimum solution. In a first stage of optimization, the best vocal tract configuration found in the codebook is refined by varying only the vocal tract parameters. Then, in a second stage of optimization, the best match is found between the glottal waveform of the model and the inverse-filtered input speech.

Proceedings ArticleDOI
01 Apr 1987
TL;DR: This paper presents an approach to applying the analysis-by-synthesis technique to sinusoidal speech modelling in an attempt to increase the ability of the model to accurately represent the speech waveform.
Abstract: In recent years the concept of analysis-by-synthesis has been applied very successfully to improving the performance of LPC-based models. At the same time, new speech models have been introduced based on representing speech by a sum of amplitude- and frequency-modulated sinusoids, which have been shown to successfully represent the non-linear, time-varying and quasi-periodic nature of speech. In this paper we present an approach to applying the analysis-by-synthesis technique to sinusoidal speech modelling in an attempt to increase the ability of the model to accurately represent the speech waveform.

Proceedings ArticleDOI
01 Apr 1987
TL;DR: Through these experiments, the SEV is shown to be a low complexity, simply implemented speech coder that is competitive with the other coders in this class in producing high quality speech at low bit rates.
Abstract: This paper presents a formal objective and subjective comparison of a number of LPC vocoders which operate at bit rates around 4800 bps. In this work, particular emphasis is placed on the Self Excited Vocoder (SEV), a new speech coding approach which was introduced by the authors at ICASSP86 [1]. Many members of a class of LPC vocoders of which the SEV, the well known Multiple Pulse Excited Linear Predictive Coder (MPLPC) [2], and Code Excited Linear Predictive Coder (CELPC) [3] are members, are simulated and compared. Through these experiments, the SEV is shown to be a low complexity, simply implemented speech coder that is competitive with the other coders in this class in producing high quality speech at low bit rates.

Journal ArticleDOI
TL;DR: Experiments with real, connected speech indicate that the speech waveforms can be accurately represented using the analysis-synthesis approach presented here.
Abstract: A new modeling technique for voiced speech is introduced. Salient features are detailed modeling of speech waveforms and the use of improved parameter estimation techniques. The ideas of pitch-synchronous analysis are extended to make two subintervals synchronous with regions of approximately closed and approximately open glottis. Two LPC models are used in each pitch period, and the model parameters are changed at estimated times of transition from open-to-closed and closed-to-open glottis. The excitation is provided by changing initial conditions at these transition instants. Experiments with real, connected speech indicate that the speech waveforms can be accurately represented using the analysis-synthesis approach presented here.

PatentDOI
TL;DR: A speech coding system includes apparatus for generating a variable threshold dependent upon the power of an input speech signal, and a comparator to generate a discriminating signal that distinguishes a period when speech continues from a period when speech pauses.
Abstract: A speech coding system includes apparatus for generating a variable threshold dependent upon the power of an input speech signal, and a comparator for comparing the power of the input speech signal with the variable threshold value to generate a discriminating signal for discriminating between a period when speech continues and a period when speech pauses, so as to change the coding operation for the input speech signal in accordance with the level of the discriminating signal, thereby forming voiced and unvoiced frames independently of each other.
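A minimal sketch of this kind of power-based speech/pause discrimination is shown below; the frame-power definition, the adaptive noise-floor rule, and the 10 dB offset are assumptions for illustration, since the patent abstract does not specify how the variable threshold is derived.

```python
import numpy as np

def frame_power_db(frame):
    """Mean-square frame power in dB, with a small floor to avoid log(0)."""
    return 10.0 * np.log10(np.mean(np.asarray(frame, dtype=float)**2) + 1e-12)

def speech_pause_flags(frames, offset_db=10.0, floor_alpha=0.995):
    """Flag each frame as speech (True) or pause (False) by comparing its power
    to a threshold that tracks a slowly adapting noise-floor estimate."""
    floor = frame_power_db(frames[0])
    flags = []
    for f in frames:
        p = frame_power_db(f)
        # Drop the floor immediately when power falls, raise it only slowly.
        floor = min(p, floor_alpha * floor + (1.0 - floor_alpha) * p)
        flags.append(p > floor + offset_db)
    return flags
```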

Proceedings ArticleDOI
06 Apr 1987
TL;DR: Methods for reducing computational and storage requirements of the segment vocoder are described and an algorithm that is implementable in real-time on hardware containing several Digital Signal Processing chips is presented.
Abstract: In previous papers, we have described the segment vocoder, which transmits intelligible speech at 300 b/s in speaker-independent mode, i.e., new users need not train the system. As expected for vector quantizers, the storage and computational requirements of the segment vocoder are significantly larger than those of the standard LPC-10 vocoder. In this paper, we describe methods for reducing computational and storage requirements of the segment vocoder and present an algorithm that is implementable in real-time on hardware containing several Digital Signal Processing chips. The DRT score of the simplified algorithm is 78%.

Proceedings ArticleDOI
J. Picone, G. Doddington
01 Apr 1987
TL;DR: A low rate speech coding system which uses contour quantization to encode the LPC excitation is presented; contour quantization is shown to be extremely robust and efficient in encoding the pitch and energy parameters of the LPC vocoder.
Abstract: Vector quantization-based approaches to speech coding have generated new interest in very low bit rate speech coding, that is, speech coded at bit rates below 1200 bits/sec. To achieve such low bit rates, it is necessary to quantize the pitch and energy parameters at rates below 100 bits/sec. Contour quantization is introduced as a technique in which the contour of a given parameter is normalized by a nominal value and vector quantized. Contour quantization is shown to be extremely robust and efficient in encoding the pitch and energy parameters of the LPC vocoder. In this paper, a low rate speech coding system which uses contour quantization to encode the LPC excitation is presented. The system is a fixed bit rate system which is intended to operate at bit rates ranging from 400 bits/s to 800 bits/s. The overall system delay varies from 300 ms at 800 bits/s to 400 ms at 400 bits/s. At 800 bits/s, the system achieved a score of 89 on a three male speaker DRT, and a score of 81 on a three female speaker DRT.
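A bare-bones version of contour quantization as described above might look like the following; the choice of the contour mean as the nominal value and the plain Euclidean codebook search are assumptions, not details from the paper.

```python
import numpy as np

def contour_quantize(contour, codebook):
    """Quantize a parameter contour (e.g., per-frame pitch or energy over a
    superframe) by (1) normalizing it by a nominal value and (2) picking the
    nearest shape from a contour codebook. Returns (nominal, codebook index)."""
    contour = np.asarray(contour, dtype=float)
    nominal = float(np.mean(contour))          # nominal value; using the mean is an assumption
    shape = contour / max(nominal, 1e-12)
    errs = np.sum((codebook - shape)**2, axis=1)
    return nominal, int(np.argmin(errs))

def contour_dequantize(nominal, index, codebook):
    """Reconstruct the contour by rescaling the selected codebook shape."""
    return nominal * codebook[index]
```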

Proceedings ArticleDOI
01 Apr 1987
TL;DR: The results of an extensive evaluation of a speaker verification system for access control using a 200 speaker population and over 40,000 impostor attempts, both performed on line, over a 4-month period are presented.
Abstract: The results of an extensive evaluation of a speaker verification system for access control are presented. The system employs an algorithm based on the Principal Spectral Components representation derived from the short term spectrum of the speech signal. This system designed for access control applications has been evaluated using a 200 speaker population and a total of over 13,000 true speaker attempts and over 40,000 impostor attempts, both performed on line, over a 4-month period. A true speaker rejection rate of less than 1% and an impostor acceptance rate of less than 0.1% have been obtained.

Journal ArticleDOI
TL;DR: A new class of codes for data compression is described that combines permutations with the fast Hadamard transform (FHT), invented for digital speech compression based on linear predictive coding (LPC), but may be useful for other data compression applications.
Abstract: A new class of codes for data compression is described that combines permutations with the fast Hadamard transform (FHT). It was invented for digital speech compression based on linear predictive coding (LPC), but may be useful for other data compression applications. One particular code with rate 1/2 is considered: a 16-bit code for a block length of 32 samples. All coding and decoding steps are fast, so that real-time applications with cheap hardware can be anticipated.
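The fast Hadamard transform at the heart of such a code is the standard butterfly recursion; a small sketch (just the transform, not the paper's coder) is given below for a block length of 32 samples.

```python
import numpy as np

def fht(x):
    """Fast (Walsh-)Hadamard transform of a length-2^k vector, using
    O(N log N) butterflies instead of an N x N matrix multiply."""
    x = np.asarray(x, dtype=float).copy()
    n = len(x)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            a = x[i:i + h].copy()
            b = x[i + h:i + 2 * h].copy()
            x[i:i + h] = a + b
            x[i + h:i + 2 * h] = a - b
        h *= 2
    return x

# Block length of 32 samples, as in the code described above; applying the
# transform twice recovers the input up to a factor of 32.
v = np.arange(32, dtype=float)
assert np.allclose(fht(fht(v)) / 32.0, v)
```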

Journal ArticleDOI
TL;DR: Two types of adaptive predictor-control schemes are proposed in which the prediction error at each pel can be obtained at or close to a minimum level and performance of the 2D/3-D ladder filters, their adaptive control schemes, and variations in coding methods are evaluated.
Abstract: This paper presents several adaptive linear predictive coding techniques based upon extension of recursive ladder filters to two and three dimensions (2-D/3-D). A 2-D quarter-plane autoregressive ladder filter is developed using a least square criterion in an exact recursive fashion. The 2-D recursive ladder filter is extended to a 3-D case which can adaptively track the variation of both spatial and temporal changes of moving images. Using the 2-D/3-D ladder filters and a previous frame predictor, two types of adaptive predictor-control schemes are proposed in which the prediction error at each pel can be obtained at or close to a minimum level. We also investigate several modifications of the basic encoding methods. Performance of the 2-D/3-D ladder filters, their adaptive control schemes, and variations in coding methods are evaluated by computer simulations on two real sequences and compared to the results of motion compensation and frame differential coders. As a validity test of the ladder filters developed, the error signals for the different predictors are compared and the visual quality of output images is verified.