Showing papers on "Linear predictive coding" published in 1993

Book•

Discrete-Time Processing of Speech Signals

J. R. Deller, John G. Proakis, John H. L. Hansen

01 Mar 1993

TL;DR: The preface to the IEEE Edition explains the background to speech production, coding, and quality assessment and introduces the Hidden Markov Model, the Artificial Neural Network, and Speech Enhancement.

Abstract: Preface to the IEEE Edition. Preface. Acronyms and Abbreviations. SIGNAL PROCESSING BACKGROUND. Propaedeutic. SPEECH PRODUCTION AND MODELLING. Fundamentals of Speech Science. Modeling Speech Production. ANALYSIS TECHNIQUES. Short-Term Processing of Speech. Linear Prediction Analysis. Cepstral Analysis. CODING, ENHANCEMENT AND QUALITY ASSESSMENT. Speech Coding and Synthesis. Speech Enhancement. Speech Quality Assessment. RECOGNITION. The Speech Recognition Problem. Dynamic Time Warping. The Hidden Markov Model. Language Modeling. The Artificial Neural Network. Index.

2,761 citations

Journal Article•DOI•

Signal modeling techniques in speech recognition

Joseph Picone¹•Institutions (1)

Texas Instruments¹

01 Sep 1993

TL;DR: A tutorial on signal processing in state-of-the-art speech recognition systems is presented, reviewing those techniques most commonly used, and three important trends that have developed in the last five years in speech recognition are examined.

Abstract: A tutorial on signal processing in state-of-the-art speech recognition systems is presented, reviewing those techniques most commonly used. The four basic operations of signal modeling, i.e. spectral shaping, spectral analysis, parametric transformation, and statistical modeling, are discussed. Three important trends that have developed in the last five years in speech recognition are examined. First, heterogeneous parameter sets that mix absolute spectral information with dynamic, or time-derivative, spectral information, have become common. Second, similarity transform techniques, often used to normalize and decorrelate parameters in some computationally inexpensive way, have become popular. Third, the signal parameter estimation problem has merged with the speech recognition process so that more sophisticated statistical models of the signal's spectrum can be estimated in a closed-loop manner. The signal processing components of these algorithms are reviewed.

792 citations

Journal Article•DOI•

Efficient vector quantization of LPC parameters at 24 bits/frame

Kuldip K. Paliwal¹, B. Atal¹•Institutions (1)

Bell Labs¹

01 Jan 1993-IEEE Transactions on Speech and Audio Processing

TL;DR: It is shown that the split vector quantizer can quantize LPC information in 24 bits/frame with an average spectral distortion of 1 dB and less than 2% of the frames having spectral distortion greater than 2 dB.

Abstract: For low bit rate speech coding applications, it is important to quantize the LPC parameters accurately using as few bits as possible. Though vector quantizers are more efficient than scalar quantizers, their use for accurate quantization of linear predictive coding (LPC) information (using 24-26 bits/frame) is impeded by their prohibitively high complexity. A split vector quantization approach is used here to overcome the complexity problem. An LPC vector consisting of 10 line spectral frequencies (LSFs) is divided into two parts, and each part is quantized separately using vector quantization. Using the localized spectral sensitivity property of the LSF parameters, a weighted LSF distance measure is proposed. With this distance measure, it is shown that the split vector quantizer can quantize LPC information in 24 bits/frame with an average spectral distortion of 1 dB and less than 2% of the frames having spectral distortion greater than 2 dB. The effect of channel errors on the performance of this quantizer is also investigated and results are reported.
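
The split approach described in this abstract can be illustrated with a minimal Python sketch. The 4+6 split point, the 12+12 bit allocation used in the complexity arithmetic, the toy codebooks, and the uniform weights are all assumptions made for illustration; the abstract states only that the 10-dimensional LSF vector is divided into two parts and that a weighted distance is used.

```python
# Sketch of split vector quantization with a weighted squared-error distance.
# Split point, codebooks, and weights are illustrative assumptions.

def weighted_dist(x, y, w):
    """Weighted squared Euclidean distance between two equal-length vectors."""
    return sum(wi * (xi - yi) ** 2 for xi, yi, wi in zip(x, y, w))

def nearest(x, codebook, w):
    """Index of the codevector closest to x under the weighted distance."""
    return min(range(len(codebook)), key=lambda i: weighted_dist(x, codebook[i], w))

def split_vq_encode(lsf, weights, cb_low, cb_high, split=4):
    """Quantize the low and high parts of an LSF vector independently."""
    i_low = nearest(lsf[:split], cb_low, weights[:split])
    i_high = nearest(lsf[split:], cb_high, weights[split:])
    return i_low, i_high

def split_vq_decode(indices, cb_low, cb_high):
    """Concatenate the two selected codevectors into one LSF vector."""
    i_low, i_high = indices
    return list(cb_low[i_low]) + list(cb_high[i_high])

# Search-size arithmetic: one 24-bit codebook versus two 12-bit codebooks
# (the 12+12 allocation is an assumption, not stated in the abstract).
full_search_entries = 2 ** 24               # codevectors in a single 24-bit codebook
split_search_entries = 2 ** 12 + 2 ** 12    # codevectors searched by the split scheme
```

Under these assumptions the split search examines 8,192 codevectors instead of 16,777,216, which is the kind of complexity reduction the abstract refers to.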

665 citations

Proceedings Article•DOI•

Mel-cepstral distance measure for objective speech quality assessment

R. Kubichek

19 May 1993

TL;DR: A perceptually motivated modification to the cepstral distance measure (CD), based on the mel frequency scale and critical-band filtering, is proposed; its improved correlation with subjective DAM scores indicates that critical-band filtering (and frequency warping) allows better modeling of perceived quality.

Abstract: The author proposes a perceptually motivated modification to the cepstral distance measure (CD) based on the mel frequency scale and critical-band filtering. The new objective parameter is referred to as the mel cepstral distance (MCD). The author measures and compares the performance of the CD and MCD algorithms by applying them to a dataset representing low-bit-rate code-excited linear prediction (CELP)-coded speech with simulated channel conditions. The improvement in correlation with subjective DAM scores indicates that critical band filtering (and frequency warping) allows better modeling of perceived quality.
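
As a rough illustration of how a mel cepstral distance is computed in practice, the sketch below uses a widely used formulation, (10 / ln 10) * sqrt(2 * sum of squared coefficient differences), applied to mel-cepstral coefficient vectors that are assumed to have been extracted already. This formulation may differ in detail from the definition in the paper, and the function names are hypothetical.

```python
import math

def mel_cepstral_distance(c_ref, c_test):
    """Per-frame distance in dB; coefficient 0 (energy term) is excluded."""
    if len(c_ref) != len(c_test):
        raise ValueError("coefficient vectors must have the same length")
    sq = sum((a - b) ** 2 for a, b in zip(c_ref[1:], c_test[1:]))
    return (10.0 / math.log(10.0)) * math.sqrt(2.0 * sq)

def mean_mcd(frames_ref, frames_test):
    """Average the per-frame distance over time-aligned frame pairs."""
    dists = [mel_cepstral_distance(r, t) for r, t in zip(frames_ref, frames_test)]
    return sum(dists) / len(dists)
```

Identical coefficient vectors give a distance of 0 dB; larger values indicate greater spectral-envelope mismatch between reference and coded speech.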

435 citations

Patent•DOI•

Reconstruction of wideband speech from narrowband speech using codebooks

Masanobu Abe¹, Yuki Yoshida¹•Institutions (1)

Nippon Telegraph and Telephone¹

29 Sep 1993-Journal of the Acoustical Society of America

TL;DR: In this article, a wideband speech signal (8 kHz) of high quality is reconstructed from a narrowband speech signal (300 Hz to 3.4 kHz) by LPC-analyzing the input to obtain spectrum information parameters.

Abstract: A wideband speech signal (8 kHz, for example) of high quality is reconstructed from a narrowband speech signal (300 Hz to 3.4 kHz). The input narrowband speech signal is LPC-analyzed to obtain spectrum information parameters, and the parameters are vector-quantized using a narrowband speech signal codebook. For each code number of the narrowband speech signal codebook, the wideband speech waveform corresponding to the codevector concerned is extracted by one pitch for voiced speech and by one frame for unvoiced speech and prestored in a representative waveform codebook. Representative waveform segments corresponding to the respective output codevector numbers of the quantizer are extracted from the representative waveform codebook. Voiced speech is synthesized by pitch-synchronous overlapping of the extracted representative waveform segments and unvoiced speech is synthesized by randomly using waveforms of one frame length. By this, a wideband speech signal is produced. Then, frequency components below 300 Hz and above 3.4 kHz are extracted from the wideband speech signal and are added to an up-sampled version of the input narrowband speech signal to thereby reconstruct the wideband speech signal.

219 citations

Journal Article•DOI•

Efficient search and design procedures for robust multi-stage VQ of LPC parameters for 4 kb/s speech coding

W.P. LeBlanc¹, B. Bhattacharya, S.A. Mahmoud, V. Cuperman•Institutions (1)

Carleton University¹

01 Oct 1993-IEEE Transactions on Speech and Audio Processing

TL;DR: It is shown experimentally that as the number of stages is increased above the optimal performance/complexity tradeoff, the quantizer robustness and outlier performance can be improved at the expense of a slight increase in rate.

Abstract: A tree-searched multistage vector quantization (VQ) scheme for linear prediction coding (LPC) parameters which achieves spectral distortion lower than 1 dB with low complexity and good robustness using rates as low as 22 b/frame is presented. The M-L search is used, and it is shown that it achieves performance close to that of the optimal search for a relatively small M. A joint codebook design strategy for multistage VQ which improves convergence speed and the VQ performance measures is presented. The best performance/complexity tradeoffs are obtained with relatively small size codebooks cascaded in a 3-6 stage configuration. It is shown experimentally that as the number of stages is increased above the optimal performance/complexity tradeoff, the quantizer robustness and outlier performance can be improved at the expense of a slight increase in rate. Results for log area ratio (LAR) and line spectral pair (LSP) parameters are presented. A training technique that reduces outliers at the expense of a slight average performance degradation is introduced. The method significantly outperforms the split codebook approach.
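
The tree search mentioned in this abstract can be sketched as follows: at each stage, every surviving path is extended by every codevector of that stage and only the M best partial reconstructions are kept. This is a minimal Python sketch under assumptions; the plain squared-error distortion, the function names, and the toy codebooks in the usage note are illustrative and are not taken from the paper.

```python
def sq_err(x, y):
    """Squared Euclidean error between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def msvq_ml_search(x, stages, m):
    """Multistage VQ search that keeps the m best paths after every stage.

    stages is a list of codebooks; the reconstruction is the sum of one
    codevector per stage. Returns (index tuple, reconstruction).
    """
    paths = [((), [0.0] * len(x))]  # (indices chosen so far, partial reconstruction)
    for codebook in stages:
        candidates = []
        for idx, recon in paths:
            for j, cv in enumerate(codebook):
                new_recon = [r + c for r, c in zip(recon, cv)]
                candidates.append((sq_err(x, new_recon), idx + (j,), new_recon))
        candidates.sort(key=lambda t: t[0])  # lowest distortion first
        paths = [(idx, recon) for _, idx, recon in candidates[:m]]
    return paths[0]
```

With the toy one-dimensional codebooks `[[[0.0], [0.9]], [[1.0], [0.5]]]` and target `[1.0]`, a greedy search (m = 1) commits to 0.9 in the first stage and ends at 1.4, while m = 2 keeps the initially worse path and reaches 1.0 exactly, which shows why a small M can approach the optimal search.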

201 citations

Journal Article•DOI•

Formant location from LPC analysis data

R.C. Snell, F. Milinazzo

01 Apr 1993-IEEE Transactions on Speech and Audio Processing

TL;DR: The estimation of formant frequencies and bandwidths from the filter coefficients obtained through linear-predictive-coding analysis of speech is discussed from several viewpoints and a method for locating roots within the unit circle is derived.

Abstract: The estimation of formant frequencies and bandwidths from the filter coefficients obtained through linear-predictive-coding (LPC) analysis of speech is discussed from several viewpoints. A method for locating roots within the unit circle is derived. This algorithm is particularly well suited to computations carried out in fixed-point arithmetic using specialized signal processing hardware.
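
The relation between an LPC pole and a formant can be sketched in Python using the standard mapping from a pole at radius r and angle θ to a centre frequency θ·fs/(2π) and a bandwidth −ln(r)·fs/π. This sketch does not reproduce the root-location method derived in the paper; it substitutes the closed-form solution of a single second-order section, and the function names are hypothetical.

```python
import cmath
import math

def pole_to_formant(pole, fs):
    """Map a complex pole inside the unit circle to (frequency Hz, bandwidth Hz)."""
    freq = abs(cmath.phase(pole)) * fs / (2.0 * math.pi)
    bandwidth = -math.log(abs(pole)) * fs / math.pi
    return freq, bandwidth

def second_order_poles(a1, a2):
    """Roots of z^2 + a1*z + a2 = 0, the poles of 1 / (1 + a1 z^-1 + a2 z^-2)."""
    disc = cmath.sqrt(a1 * a1 - 4.0 * a2)
    return (-a1 + disc) / 2.0, (-a1 - disc) / 2.0

# Build a resonator with a known formant, then recover it from the coefficients.
fs = 8000.0
r = math.exp(-math.pi * 80.0 / fs)       # radius for an 80 Hz bandwidth
theta = 2.0 * math.pi * 500.0 / fs       # angle for a 500 Hz centre frequency
a1, a2 = -2.0 * r * math.cos(theta), r * r
freq, bandwidth = pole_to_formant(second_order_poles(a1, a2)[0], fs)
```

A full LPC polynomial of order 10 or more would need a general root finder in place of the quadratic formula; the pole-to-formant mapping stays the same.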

171 citations

Journal Article•DOI•

Optimal quantization of LSP parameters

F.K. Soong¹, Biing-Hwang Juang¹•Institutions (1)

Bell Labs¹

01 Jan 1993-IEEE Transactions on Speech and Audio Processing

TL;DR: A globally optimal scalar quantizer is designed for each differential LSP frequency, which achieves a 1-dB average log spectral distortion, a commonly accepted level for reproducing perceptually transparent spectral information.

Abstract: Two nonuniform aspects of the line spectrum pair (LSP) linear predictive coding (LPC) parameters are investigated, including nonuniform statistical distributions and spectral sensitivities of adjacent LSP frequency differences. Based upon these two nonuniform properties, a globally optimal scalar quantizer is designed for each differential LSP frequency. The design algorithm is dynamic programming based and minimization of a nontrivial data dependent spectral distortion is adopted as the optimality criterion. At 32 bits/frame, the new LSP quantizer achieves a 1-dB average log spectral distortion, a commonly accepted level for reproducing perceptually transparent spectral information. The quantization performance has also been shown to be robust across different speakers and databases.

151 citations

Journal Article•DOI•

Encoding speech using prototype waveforms

Willem Bastiaan Kleijn¹•Institutions (1)

Bell Labs¹

01 Oct 1993-IEEE Transactions on Speech and Audio Processing

TL;DR: The coding method is easily combined with existing LP-based speech coders, such as CELP, for unvoiced signals and excellent voiced speech quality is obtained at rates between 3.0 and 4.0 kb/s.

Abstract: Voiced speech is interpreted as a concatenation of slowly evolving pitch-cycle waveforms. This signal can be reconstructed by interpolation from a downsampled sequence of pitch-cycle waveforms with a rate of one prototype waveform per 20-30 ms interval. The prototype waveform is described by a set of linear-prediction (LP) filter coefficients describing the formant structure and a prototype excitation waveform, quantized with analysis-by-synthesis procedures. The speech signal is reconstructed by filtering an excitation signal consisting of the concatenation of (infinitesimal) sections of the instantaneous excitation waveforms. To obtain the correct level of periodicity, the short-term and the long-term correlations between the instantaneous excitation waveforms can be controlled explicitly. Thus, distortions such as noise, reverberation, and buzziness can be prevented. The coding method is easily combined with existing LP-based speech coders, such as CELP, for unvoiced signals. Excellent voiced speech quality is obtained at rates between 3.0 and 4.0 kb/s.

133 citations

Journal Article•DOI•

Voiced-unvoiced-silence classifications of speech using hybrid features and a network classifier

Yingyong Qi¹, B.R. Hunt¹•Institutions (1)

University of Arizona¹

01 Apr 1993-IEEE Transactions on Speech and Audio Processing

TL;DR: Voiced-unvoiced-silence classification of speech was done using a multilayer feedforward network; results indicated that the network performance was not significantly affected by the size of the training set, and a classification rate as high as 96% was obtained.

Abstract: Voiced-unvoiced-silence classification of speech was done using a multilayer feedforward network. The network performance was evaluated and compared to that of a maximum-likelihood classifier. Results indicated that the network performance was not significantly affected by the size of the training set and a classification rate as high as 96% was obtained.

128 citations

Patent•DOI•

Enhancement of speech coding in background noise for low-rate speech coder

Yu-Jih Liu¹•Institutions (1)

Wilmington University¹

12 May 1993-Journal of the Acoustical Society of America

TL;DR: In this paper, a speech coding system employs measurements of robust features of speech frames whose distribution is not strongly affected by noise/levels to make voicing decisions for input speech occurring in a noisy environment.

Abstract: A speech coding system employs measurements of robust features of speech frames whose distribution is not strongly affected by noise/levels to make voicing decisions for input speech occurring in a noisy environment. Linear programming analysis of the robust features and respective weights are used to determine an optimum linear combination of these features. The input speech vectors are matched to a vocabulary of codewords in order to select the corresponding, optimally matching codeword. Adaptive vector quantization is used in which a vocabulary of words obtained in a quiet environment is updated based upon a noise estimate of a noisy environment in which the input speech occurs, and the "noisy" vocabulary is then searched for the best match with an input speech vector. The corresponding clean codeword index is then selected for transmission and for synthesis at the receiver end. The results are better spectral reproduction and significant intelligibility enhancement over prior coding approaches. Robust features found to allow robust voicing decisions include: low-band energy; zero-crossing counts adapted for noise level; AMDF ratio (speech periodicity) measure; low-pass filtered backward correlation; low-pass filtered forward correlation; inverse-filtered backward correlation; and inverse-filtered pitch prediction gain measure.

Journal Article•DOI•

Lossless compression of waveform data for efficient storage and transmission

S.D. Stearns¹, Li Tan², N. Magotra²•Institutions (2)

Sandia National Laboratories¹, University of New Mexico²

01 May 1993-IEEE Transactions on Geoscience and Remote Sensing

TL;DR: Applications of the two-stage technique to typical seismic data indicate that an average number of compressed bits per sample close to the lower bound is achievable in practical situations.

Abstract: A two-stage technique for lossless waveform data compression is described. The first stage is a modified form of linear prediction with discrete coefficients, and the second stage is bilevel sequence coding. The linear predictor generates an error or residue sequence in a way such that exact reconstruction of the original data sequence can be accomplished with a simple algorithm. The residue sequence is essentially white Gaussian with seismic or other similar waveform data. Bilevel sequence coding, in which two sample sizes are chosen and the residue sequence is encoded into subsequences that alternate from one level to the other, further compresses the residue sequence. The algorithm is lossless, allowing exact, bit-for-bit recovery of the original data sequence. The performance of the algorithm at each stage is analyzed. Applications of the two-stage technique to typical seismic data indicate that an average number of compressed bits per sample close to the lower bound is achievable in practical situations.
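
The first stage described in this abstract, linear prediction arranged so that the original samples can be recovered exactly, can be sketched in Python with integer samples and integer predictor coefficients. The fixed coefficients `[2, -1]` (a second-order difference predictor) are an assumption for illustration; the paper's modified predictor and its bilevel sequence coding stage are not reproduced here.

```python
def lp_residual(samples, coeffs):
    """Residual e[n] = x[n] - sum_k coeffs[k] * x[n-1-k], with zero history before n = 0."""
    residual = []
    for n, x in enumerate(samples):
        pred = sum(c * samples[n - 1 - k] for k, c in enumerate(coeffs) if n - 1 - k >= 0)
        residual.append(x - pred)
    return residual

def lp_reconstruct(residual, coeffs):
    """Invert lp_residual exactly by re-running the same predictor on decoded samples."""
    samples = []
    for n, e in enumerate(residual):
        pred = sum(c * samples[n - 1 - k] for k, c in enumerate(coeffs) if n - 1 - k >= 0)
        samples.append(e + pred)
    return samples

data = [10, 12, 15, 19, 24, 30]
residue = lp_residual(data, [2, -1])         # [10, -8, 1, 1, 1, 1]
restored = lp_reconstruct(residue, [2, -1])  # equals data exactly
```

Because all arithmetic is on integers, the decoder's prediction matches the encoder's bit for bit, and the residue has a much smaller range than the data once the predictor has history to work with.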

Proceedings Article•DOI•

Performance of noise excitation for unvoiced speech

Gernot Kubin¹, B.S. Atal, W.B. Kleijn•Institutions (1)

Bell Labs¹

13 Oct 1993

TL;DR: This paper addresses the question of what perceptual quality can be achieved for unvoiced speech by a linear model with white noise excitation and demonstrates that this linear model results in unvoiced speech of high perceptual quality.

Abstract: Recent interest in nonlinear modeling of speech has brought up the need to re-assess the performance limitations of linear speech models. While nonlinearity is essential in the production mechanism of speech, it need not be reflected in a speech-signal model. This paper addresses the question of what perceptual quality can be achieved for unvoiced speech by a linear model with white noise excitation. Formal MOS test results demonstrate that this linear model results in unvoiced speech of high perceptual quality.

Patent•DOI•

Speech processing system and method for enhancing a speech signal in a noisy environment

Sangil Park¹, Ed F. Martinez¹, Dae-Hee Youn¹•Institutions (1)

Motorola¹

30 Apr 1993-Journal of the Acoustical Society of America

TL;DR: In this paper, an adaptive filter such as a finite impulse response (FIR) filter receives a digital accelerometer input signal, adjusts filter coefficients according to an estimation error signal, and provides an enhanced speech signal as an output.

Abstract: A speech processing system (30) operates in a noisy environment (20) by performing adaptive prediction between inputs from two sensors positioned to transduce speech from a speaker, such as an accelerometer and a microphone. An adaptive filter (37) such as a finite impulse response (FIR) filter receives a digital accelerometer input signal, adjusts filter coefficients according to an estimation error signal, and provides an enhanced speech signal as an output. The estimation error signal is a difference between a digital microphone input signal and the enhanced speech signal. In one embodiment, the adaptive filter (37) selects a maximum one of a first predicted speech signal based on a relatively-large smoothing parameter and a second predicted speech signal based on a relatively-small smoothing parameter, with which to normalize a predicted signal power. The predicted signal power is then used to adapt the filter coefficients.
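
One way to read the adaptive prediction described in this abstract is as a normalized LMS filter whose input is the accelerometer signal, whose desired signal is the microphone signal, and whose step size is normalized by the larger of two smoothed input-power estimates. The Python sketch below follows that reading; the tap count, step size, both smoothing constants, and the exact normalization are assumptions and are not taken from the patent.

```python
def nlms_enhance(accel, mic, taps=4, mu=0.5, beta_slow=0.99, beta_fast=0.8, eps=1e-8):
    """Adaptive FIR prediction of the microphone signal from the accelerometer signal.

    Returns (enhanced output samples, final filter weights).
    """
    w = [0.0] * taps        # FIR coefficients
    buf = [0.0] * taps      # most recent accelerometer samples, newest first
    p_slow = p_fast = 0.0   # power estimates with large and small smoothing
    out = []
    for a, d in zip(accel, mic):
        buf = [a] + buf[:-1]
        y = sum(wi * xi for wi, xi in zip(w, buf))   # enhanced speech estimate
        e = d - y                                    # estimation error vs. microphone
        p_slow = beta_slow * p_slow + (1.0 - beta_slow) * a * a
        p_fast = beta_fast * p_fast + (1.0 - beta_fast) * a * a
        power = max(p_slow, p_fast)                  # pick the larger estimate
        step = mu / (taps * power + eps)
        w = [wi + step * e * xi for wi, xi in zip(w, buf)]
        out.append(y)
    return out, w
```

Taking the maximum of the two estimates keeps the normalization from collapsing when the input power rises suddenly, since the fast estimate reacts within a few samples while the slow one provides a stable floor.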

Proceedings Article•DOI•

Immittance spectral pairs (ISP) for speech encoding

Yuval Bistritz¹, S. Peller¹•Institutions (1)

Tel Aviv University¹

27 Apr 1993

TL;DR: In quantization experiments ISP has been found to compare favorably with LSP, and a study of interframe differentiation coding for ISP and LSP demonstrates the respective performances of the two sets.

Abstract: Immittance spectral pairs (ISPs) form a new set of parameters for representing the linear predictive coding (LPC) filter. For a filter of order n, ISP consists of a gain and n-1 frequency parameters, instead of n frequency parameters as is the case for line spectrum pairs (LSPs). In regarding LPC as a pseudo-model for the vocal tract, ISP can represent the immittance at the glottis without imposing, like LSP, artificial boundary conditions. In quantization experiments ISP has been found to compare favorably with LSP. A study of interframe differentiation coding for ISP and LSP demonstrates the respective performances of the two sets.

Journal Article•DOI•

Variable rate vector quantization for speech, image, and video compression

T. Lookabaugh¹, Eve A. Riskin¹, Philip A. Chou¹, Robert M. Gray¹•Institutions (1)

Stanford University¹

01 Jan 1993-IEEE Transactions on Communications

TL;DR: Three variable-rate vector quantizer systems are applied to speech, image, and video sources and compared to standard vector quantization and noiseless variable-rate coding approaches, providing significant performance improvements for subband speech coding, predictive image coding, and motion-compensated video.

Abstract: The performance of a vector quantizer can be improved by using a variable-rate code. Three variable-rate vector quantization systems are applied to speech, image, and video sources and compared to standard vector quantization and noiseless variable-rate coding approaches. The systems range from a simple and flexible tree-based vector quantizer to a high-performance, but complex, jointly optimized vector quantizer and noiseless code. The systems provide significant performance improvements for subband speech coding, predictive image coding, and motion-compensated video, but provide only marginal improvements for vector quantization of linear predictive coefficients in speech and direct vector quantization of images. Criteria are suggested for determining when variable-rate vector quantization may provide significant performance improvement over standard approaches.

Book•DOI•

Speech and Audio Coding for Wireless and Network Applications

Bishnu S. Atal, Vladimir Cuperman, Allen Gersho

01 Jan 1993

TL;DR: An edited volume covering low-delay speech coding, speech quality, speech coding for wireless transmission, audio coding, speech coding for noisy transmission channels, and further topics in speech coding.

Abstract: I: Introduction. II: Low Delay Speech Coding. III: Speech Quality. IV: Speech Coding for Wireless Transmission. V: Audio Coding. VI: Speech Coding for Noisy Transmission Channels. VII: Topics in Speech Coding. Author Index. Index.

Journal Article•DOI•

Recovery of missing speech packets using the short-time energy and zero-crossing measurements

Nurgun Erdol¹, C. Castelluccia¹, Ali Zilouchian¹•Institutions (1)

Florida Atlantic University¹

01 Jul 1993-IEEE Transactions on Speech and Audio Processing

TL;DR: In this paper, a waveform substitution technique using interpolation based on the slowly varying speech parameters of short-time energy and zero-crossing information is developed for a packetized speech communication system.

Abstract: A waveform substitution technique using interpolation based on the slowly varying speech parameters of short-time energy and zero-crossing information is developed for a packetized speech communication system. The system uses 64-kb/s conventional pulse code modulation (PCM) for encoding and takes advantage of active talkspurts and silence intervals to increase the efficiency of utilizing a digital link. The short-time energy and information on the zero-crossings needed for the purpose of determining talkspurts are transmitted in a preceding packet. Hence, when a packet is pronounced lost, its envelope and frequency characteristics are obtained from a previous packet and used to synthesize a substitution waveform which is free of annoying sounds that are due to abrupt changes in amplitude.
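
The two per-packet measurements named in this abstract can be computed as in the Python sketch below. The frame length, the use of mean rather than summed energy, and the function names are assumptions for illustration; the substitution-waveform synthesis itself is not shown.

```python
def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)

def zero_crossings(frame):
    """Number of sign changes between consecutive samples (zero counts as positive)."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))

def frame_features(signal, frame_len):
    """(energy, zero-crossing count) for each complete, non-overlapping frame."""
    return [
        (short_time_energy(signal[i:i + frame_len]), zero_crossings(signal[i:i + frame_len]))
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]
```

For example, `frame_features([1, -1, 1, -1, 2, 2, 2, 2], 4)` returns `[(1.0, 3), (4.0, 0)]`: the first frame has low energy and many crossings, the second has higher energy and none.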

Proceedings Article•DOI•

Integrated data and speech transmission using packet reservation multiple access

Wai-Choong Wong¹, David J. Goodman²•Institutions (2)

National University of Singapore¹, Rutgers University²

23 May 1993

TL;DR: The proposed scheme is shown to provide equitable access to channel resources for both types of users, yielding improvements in overall system performance while significantly increasing data throughput compared to a system without data packet reservation.

Abstract: The authors propose an integrated packet reservation multiple access (IPRMA) protocol for transmitting both speech and data information. While speech users are allowed to contend for reservation slots on a frame-by-frame basis, data users may reserve multiple slots across a frame to increase throughput. The protocol includes a priority mechanism which ensures that speech users have greater access to idle slots since speech packets have a more demanding delay constraint. The proposed scheme is shown to provide equitable access to channel resources for both types of users, yielding improvements in overall system performance while significantly increasing data throughput compared to a system without data packet reservation.

Proceedings Article•DOI•

QCELP: The North American CDMA Digital Cellular Variable Rate Speech Coding Standard

Andrew P. Dejaco¹, W. Gardner, P. Jacobs, Chong Lee•Institutions (1)

Qualcomm¹

13 Oct 1993

TL;DR: This chapter describes the “QCELP” algorithm, which has recently been selected as the North American CDMA digital cellular variable rate speech coding standard.

Abstract: Digital cellular telephone systems require efficient encoding of speech to achieve capacity improvements required of the next generation of cellular systems. The use of a variable rate speech coder can reduce the average data rate required to transmit conversational speech by a factor of two or more, while providing many other advantages. This reduction in average data rate leads to a factor of two increase in the capacity of a Code Division Multiple Access, or CDMA, based digital cellular telephone system by decreasing the mutual interference among users. This chapter describes the “QCELP” algorithm, which has recently been selected as the North American CDMA digital cellular variable rate speech coding standard [1, 2].

Book•

Visual representations of speech signals

Martin Cooke¹, S.W. Beet¹, Malcolm Crawford¹•Institutions (1)

University of Sheffield¹

04 Jun 1993

TL;DR: Advanced Time-Frequency Representations for Speech Processing Auditory-Based Wavelet Representation Distortion Maps for Speech Analysis Phase Representations of Acoustic Speech Waveforms Speech Analysis Using Higher Order Statistics Group Delay Processing of Speech Signals Contributors.

Abstract: Advanced Time-Frequency Representations for Speech Processing Auditory-Based Wavelet Representation Distortion Maps for Speech Analysis Phase Representations of Acoustic Speech Waveforms Speech Analysis Using Higher Order Statistics Group Delay Processing of Speech Signals Contributors The Sheffield Signals Index.

Proceedings Article•DOI•

Speech enhancement using the dual excitation speech model

John C. Hardwick¹, Chang D. Yoo¹, Jae Lim¹•Institutions (1)

Massachusetts Institute of Technology¹

27 Apr 1993

TL;DR: Preliminary evidence shows that the dual excitation (DE) speech model may be able to improve the intelligibility of noisy speech for hearing impaired listeners.

Abstract: The dual excitation (DE) speech model is applied to the problem of speech enhancement. The use of this model and its novel decomposition of speech into coexisting voiced and unvoiced components allow removal of additive wideband noise from the degraded speech with only the knowledge of the power spectrum of the noise. The unique properties of each component are exploited to improve the performance of the enhancement system. Informal comparisons between the DE speech enhancement system and a traditional spectral subtraction algorithm show a clear preference for the DE enhancement system. Although the amount of noise reduction in the two systems was similar, the DE system did not contain the tonal artifacts which were present in the spectral subtraction system. Preliminary evidence shows that the DE speech enhancement system may be able to improve the intelligibility of noisy speech for hearing impaired listeners.

Proceedings Article•DOI•

A multi-mode variable rate CELP coder based on frame classification

P. Lupini¹, N.B. Cox, Vladimir Cuperman•Institutions (1)

Simon Fraser University¹

23 May 1993

TL;DR: The authors present the results of informal MOS tests which show that the variable-rate system running at an average rate of 8 kb/s achieves subjective speech quality close to that of the 16-kb/s fixed-rate system.

Abstract: The authors present a modular CELP (code-excited linear prediction) coder which can switch bit-rates in response to local speech characteristics (source-controlled mode) or external network conditions (network-controlled mode). The coder is capable of operating at several bit-rates and is optimized for 16 kb/s, 8 kb/s, and 4 kb/s. A 925-b/s configuration is included for silent frames. The authors present the results of informal MOS tests which show that the variable-rate system running at an average rate of 8 kb/s achieves subjective speech quality close to that of the 16-kb/s fixed-rate system (a difference of less than 0.1 on the MOS scale).

Proceedings Article•DOI•

A 5.85 kb/s CELP algorithm for cellular applications

Willem Bastiaan Kleijn¹, Peter Kroon¹, L. Cellario², Daniele Sereno²•Institutions (2)

Bell Labs¹, CSELT²

27 Apr 1993

TL;DR: Two versions of the RCELP (relaxation-type code excited linear prediction) algorithm are described and it is shown that only using the perceptual weighting where needed results in reduction of computational effort and can increase speech quality.

Abstract: Two versions of the RCELP (relaxation-type code excited linear prediction) algorithm are described. They show that the generalized analysis-by-synthesis paradigm provides increased coding efficiency in practical applications, and that it can be implemented in a variety of ways. A novel pitch-period extraction algorithm which improves the effectiveness of RCELP is described. The computational requirements of RCELP are similar to or less than those of an equivalent conventional CELP algorithm. Existing fast procedures for the fixed-codebook contribution to the excitation can be used. At 5.85 kbit/s the two versions of RCELP provide a speech quality which is equal to the current 13 kbit/s GSM speech-coding standard. It is shown that only using the perceptual weighting where needed results in reduction of computational effort and can increase speech quality.

Patent•

Block adaptive linear predictive coding with multi-dimensional adaptive gain and bias

James R. Sullivan¹, Craig M. Smith¹•Institutions (1)

Eastman Kodak Company¹

24 May 1993

TL;DR: A block adaptive linear predictive coding method for encoding a signal having multi-dimensional correlation, such as an image signal, is improved by employing multi-dimensional blocks of error signals for making the prediction.

Abstract: A block adaptive linear predictive coding method for encoding a signal having multi-dimensional correlation, such as an image signal, is improved by employing multi-dimensional blocks of error signals for making the prediction. As a preferred mode, two statistical quantities are employed to select a quantizer from a set of minimum square error two-variable quantizers based on probability models of statistical quantities.

Proceedings Article•DOI•

An 8-kbit/s speech coder based on conjugate structure CELP

A. Kataoka, Takehiro Moriya, S. Hayashi

27 Apr 1993

TL;DR: A high-quality 8-kbit/s speech coder based on CS-CELP (conjugate structure code excited linear prediction) with 10 ms frame length is presented and it is found that the proposed coder is robust against random bit errors.

Abstract: A high-quality 8 kbit/s speech coder based on CS-CELP (conjugate structure code excited linear prediction) with a 10 ms frame length is presented. To provide high quality in both error-free and error conditions, it uses four schemes: LSP (line spectrum pair) quantization using interframe correlation, preselection in the codebook search, a conjugate structure (CS), and backward adaptation of the VQ (vector quantization) gain. LSP parameters are quantized by multistage VQ with MA prediction. The preselection of the codebook reduces computational complexity and improves robustness. The CS improves the ability to handle random bit errors and reduces memory requirements. The backward adaptation of the VQ gain provides high quality and robustness without having to transmit input speech power information. Subjective testing indicates that the quality of the proposed coder is equivalent to that of 32 kbit/s ADPCM (adaptive differential pulse code modulation) under error-free conditions. It is also found that the proposed coder is robust against random bit errors.

Proceedings Article•DOI•

Adaptive predictive coding of speech by means of Volterra predictors

Enzo Mumolo¹, D. Francescato•Institutions (1)

University of Trieste¹

17 Jan 1993

TL;DR: The main result is that, by using this type of predictor, lower variance error signals can be obtained, as compared to the classical, linear, case.

Abstract: In this paper a waveform coder configuration based on nonlinear adaptive prediction is described. The coder relies on the ability of Volterra predictors to model nonlinear phenomena and to gather information about the periodicity of the signal via high-order statistical moments. The main result is that, by using this type of predictor, lower-variance error signals can be obtained than in the classical linear case.

Patent•

Neural network speech recognition apparatus recognizing the frequency of successively input identical speech data sequences

Mitsuhiro Inazumi¹•Institutions (1)

Epson¹

06 Aug 1993

TL;DR: A speech recognition non-layered neural network unit determines whether the input speech data sequence matches at least one predetermined speech data sequence; because the network is reset each time a recognition signal is output, accurate detection can be achieved even when the speech data sequence to be recognized is input successively.

Abstract: The speech recognition apparatus recognizes a frequency of successively input identical speech data sequences. The speech recognition apparatus includes a speech recognition non-layered neural network unit. A speech data sequence is input as feature vectors from a feature extracting unit. The neural network performs speech recognition and determines whether the input speech data sequence matches at least one predetermined speech data sequence. The neural network generates a speech recognition signal when the input speech data sequence matches the at least one predetermined speech data sequence. A recognition signal detecting unit outputs a reset instruction signal each time the neural network generates the speech recognition signal. An internal state value setting unit resets the neural network unit to an initial state each time the recognition signal detecting unit outputs the reset instruction signal. Since the neural network unit is reset each time the speech recognition signal is output, accurate detection can be achieved even when the speech data sequence to be recognized is input successively.

Patent•DOI•

Time-frequency interpolation with application to low rate speech coding

Shoham Yair¹•Institutions (1)

AT&T¹

30 Sep 1993-Journal of the Acoustical Society of America

TL;DR: Time-frequency interpolation (TFI) is proposed as a method for high-quality speech coding that offers advantages over conventional CELP (code-excited linear predictive) algorithms for low-rate coding.

Abstract: A new method for high-quality speech coding, Time-Frequency Interpolation (TFI), offers advantages over conventional CELP (code-excited linear predictive) algorithms for low-rate coding. The method provides a perceptually advantageous framework for voiced speech processing. The general formulation of the TFI technique is described.

Patent•DOI•

Code excitation linear predictive (CELP) encoder and decoder and code excitation linear predictive coding method

Hosoda Kenichiro¹, Aoyagi Hiromi¹, Hiroshi Katsuragawa¹, Ariyama Yoshihiro¹•Institutions (1)

Oki Electric Industry¹

10 Jun 1993-Journal of the Acoustical Society of America

TL;DR: There is provided a code excitation linear predictive coding or decoding apparatus in which a code vector, which is transmitted by a codebook such as a stochastic codebook, is converted adaptively in accordance with vocal tract analysis information (LPC) so that a high quality reproduction speech is obtained at a low coding rate.

Abstract: There is provided a code excitation linear predictive (CELP) coding or decoding apparatus in which a code vector, which is transmitted by a codebook such as a stochastic codebook, is converted adaptively in accordance with vocal tract analysis information (LPC) so that a high quality reproduction speech is obtained at a low coding rate. Further, in order to obtain a similar effect, a pulse-like excitation codebook formed of an isolated impulse is provided in addition to the adaptive excitation codebook and stochastic excitation codebook so that either the stochastic excitation codebook or the pulse-like excitation codebook is selectively used to provide a vocal tract parameter as a linear spectrum pair parameter.
