Topic

Linear predictive coding

About: Linear predictive coding (LPC) is a research topic. Over its lifetime, 6565 publications have been published within this topic, receiving 142991 citations. The topic is also known as LPC.

Papers

Journal Article

Theoretical analysis of the high-rate vector quantization of LPC parameters

William R. Gardner¹, Bhaskar D. Rao

Qualcomm¹

01 Sep 1995-IEEE Transactions on Speech and Audio Processing

TL;DR: A theoretical analysis of high-rate vector quantization (VQ) systems that use suboptimal, mismatched distortion measures is presented, and the application of the analysis to the problem of quantizing the linear predictive coding (LPC) parameters in speech coding systems is described.

Abstract: The paper presents a theoretical analysis of high-rate vector quantization (VQ) systems that use suboptimal, mismatched distortion measures, and describes the application of the analysis to the problem of quantizing the linear predictive coding (LPC) parameters in speech coding systems. First, it is shown that in many high-rate VQ systems the quantization distortion approaches a simple quadratically weighted error measure, where the weighting matrix is a "sensitivity matrix" that is an extension of the concept of the scalar sensitivity. The approximate performance of VQ systems that train and quantize using mismatched distortion measures is derived, and is used to construct better distortion measures. Second, these results are used to determine the performance of LPC vector quantizers, as measured by the log spectral distortion (LSD) measure, which have been trained using other error measures, such as mean-squared error (MSE) or weighted mean-squared error (WMSE) measures of LPC parameters, reflection coefficients and transforms thereof, and line spectral pair (LSP) frequencies. Computationally efficient algorithms for computing the sensitivity matrices of these parameters are described. In particular, it is shown that the sensitivity matrix for the LSP frequencies is diagonal, implying that a WMSE measure of LSP frequencies converges to the LSD measure in high-rate VQ systems. Experimental results to support the theoretical performance estimates are provided.
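The two distortion measures named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the function names, the sign convention A(z) = 1 - sum_k a[k] z^-k, the unit filter gain, and the uniform frequency grid are all assumptions made for illustration, and the sensitivity-matrix computation itself is not shown.

```python
import cmath
import math


def lpc_log_spectrum_db(a, n_freqs=256):
    """Log power spectrum in dB of the all-pole filter 1/A(z), unit gain.

    `a` holds predictor coefficients a[1..p] under the assumed convention
    A(z) = 1 - sum_k a[k] z^-k (some texts use the opposite sign).
    """
    spectrum = []
    for i in range(n_freqs):
        w = math.pi * (i + 0.5) / n_freqs  # grid strictly inside (0, pi)
        A = 1 - sum(ak * cmath.exp(-1j * w * (k + 1)) for k, ak in enumerate(a))
        spectrum.append(-20.0 * math.log10(abs(A)))  # 10*log10(1/|A|^2)
    return spectrum


def log_spectral_distortion(a_ref, a_quant, n_freqs=256):
    """RMS difference in dB between the log spectra of two LPC filters."""
    s_ref = lpc_log_spectrum_db(a_ref, n_freqs)
    s_q = lpc_log_spectrum_db(a_quant, n_freqs)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(s_ref, s_q)) / n_freqs)


def weighted_mse(x, x_hat, weights):
    """Diagonally weighted squared error: sum_i w_i * (x_i - x_hat_i)^2."""
    return sum(w * (u - v) ** 2 for w, u, v in zip(weights, x, x_hat))
```

A diagonal sensitivity matrix, as the abstract reports for LSP frequencies, corresponds to the `weighted_mse` form with one weight per parameter; the LSD function gives the reference measure against which such a weighting would be judged.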

182 citations

Patent

Predictive coding of speech signals

Bishnu S. Atal¹

Bell Labs¹

19 Aug 1968

TL;DR: An adaptive predictor, readjusted periodically to match the time-varying characteristics of a speech signal, is used to reduce the channel capacity required to transmit the signal with specified fidelity.

Abstract: Predictive coding of signals, i.e., the reduction of redundancy in a signal by subtracting from it that part which can be predicted from its past, is a well-known technique for reducing the channel capacity required to transmit a signal with specified fidelity. It has been widely applied to signals, such as television signals, which have regularly repeating intervals of information, but has not been satisfactorily applied to signals, such as speech, which exhibit characteristics that vary from speaker to speaker and from time to time for one speaker. According to this invention, an adaptive predictor is employed which is readjusted periodically to match the time-varying characteristics of a speech signal.
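The idea of a periodically readjusted predictor can be sketched in Python as follows. This is an illustrative open-loop version, not the patented system: the autocorrelation method with the Levinson-Durbin recursion, the frame length, the predictor order, and all function names are assumptions, and the quantizer and feedback loop of a real coder are omitted.

```python
def autocorrelation(frame, order):
    """Autocorrelation of one frame for lags 0..order."""
    return [sum(frame[n] * frame[n - lag] for n in range(lag, len(frame)))
            for lag in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations; returns predictor coefficients a[1..p]
    for the prediction x_hat[n] = sum_k a[k] * x[n-k]."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or degenerate frame: keep remaining taps at 0
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]


def adaptive_prediction_residual(signal, order=2, frame_len=80):
    """Refit the predictor on every frame and return the prediction error,
    i.e. the signal minus the part predictable from its past."""
    residual = []
    for start in range(0, len(signal), frame_len):
        frame = signal[start:start + frame_len]
        a = levinson_durbin(autocorrelation(frame, order), order)
        for n in range(start, start + len(frame)):
            pred = sum(a[k] * signal[n - 1 - k]
                       for k in range(order) if n - 1 - k >= 0)
            residual.append(signal[n] - pred)
    return residual
```

For a signal that is well modeled by a low-order predictor, the residual has much lower energy than the signal itself, which is the source of the channel-capacity saving described in the abstract.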

181 citations

Journal Article

Significance of the Modified Group Delay Feature in Speech Recognition

Rajesh M. Hegde¹, Hema A. Murthy¹, Venkata Ramana Rao Gadde²

Indian Institute of Technology Madras¹, SRI International²

01 Jan 2007-IEEE Transactions on Audio, Speech, and Language Processing

TL;DR: The group delay function is modified to overcome its failure to capture the short-time spectral structure of speech, a failure caused by zeros close to the unit circle in the z-plane and by pitch periodicity effects; cepstral features extracted from the modified function are called the modified group delay feature (MODGDF).

Abstract: Spectral representation of speech is complete when both the Fourier transform magnitude and phase spectra are specified. In conventional speech recognition systems, features are generally derived from the short-time magnitude spectrum. Although the importance of Fourier transform phase in speech perception has been realized, few attempts have been made to extract features from it. This is primarily because the resonances of the speech signal, which manifest as transitions in the phase spectrum, are completely masked by the wrapping of the phase spectrum. Hence, an alternative to processing the Fourier transform phase for extracting speech features is to process the group delay function, which can be directly computed from the speech signal. The group delay function has been used in earlier efforts to extract pitch and formant information from the speech signal. In all these efforts, no attempt was made to extract features from the speech signal and use them for speech recognition applications. This is primarily because the group delay function fails to capture the short-time spectral structure of speech owing to zeros that are close to the unit circle in the z-plane and also due to pitch periodicity effects. In this paper, the group delay function is modified to overcome these effects. Cepstral features are extracted from the modified group delay function and are called the modified group delay feature (MODGDF). The MODGDF is used for three speech recognition tasks, namely speaker, language, and continuous-speech recognition. Based on the results of feature and performance evaluation, the significance of the MODGDF as a new feature for speech recognition is discussed.
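The statement that the group delay function can be computed directly from the signal, without phase unwrapping, can be illustrated with the standard identity tau(w) = (X_R Y_R + X_I Y_I) / |X|^2, where X is the Fourier transform of x[n] and Y is the Fourier transform of n x[n]. The Python sketch below shows only this unmodified group delay; the modification proposed in the paper is not reproduced, and the function names, the plain DFT, and the `eps` guard are assumptions.

```python
import cmath
import math


def dft(x, n_fft):
    """Plain O(N^2) discrete Fourier transform, zero-padded to n_fft bins."""
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                for n in range(len(x)))
            for k in range(n_fft)]


def group_delay(x, n_fft=64, eps=1e-12):
    """Group delay in samples at each DFT bin, without phase unwrapping.

    Uses tau = Re(Y * conj(X)) / |X|^2 with Y = DFT of n * x[n].
    `eps` only guards against division by zero at exact spectral nulls.
    """
    X = dft(x, n_fft)
    Y = dft([n * v for n, v in enumerate(x)], n_fft)
    return [(Xk.real * Yk.real + Xk.imag * Yk.imag) / max(abs(Xk) ** 2, eps)
            for Xk, Yk in zip(X, Y)]
```

The division by |X|^2 makes the sensitivity mentioned in the abstract visible: wherever zeros near the unit circle drive |X| toward zero, the unmodified group delay produces large spikes.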

181 citations

Proceedings Article

Modeling spectral speech transitions using temporal decomposition techniques

G. Ahlbom, Frédéric Bimbot, Gérard Chollet

06 Apr 1987

TL;DR: The paper reports simplifications of Atal's technique for decomposing speech into phone-length temporal events in terms of overlapping and interacting articulatory gestures, with applications to acoustic-phonetic synthesis.

Abstract: Atal [1] introduced a technique for decomposing speech into phone-length temporal events in terms of overlapping and interacting articulatory gestures. This paper reports on simplifications of this technique with applications to acoustic-phonetic synthesis. Spectral evolution is represented by time-indexed trajectories in the p-dimensional space of Log-Area Ratios $y_i = \ln\left((1+k_i)/(1-k_i)\right)$, where $k_i$ are the reflection coefficients obtained from short-time stationary LPC analysis. The vocal tract configuration (spectral vector) associated with each interpolation function belongs to a finite set of articulatory targets (vector quantization code book). A set of speech segments ("polysons") has been encoded using this technique. It includes diphones, demi-syllables, and other units that are difficult to segment. Temporal decomposition using target spectra can break the complex encoding of these segments. In particular, coarticulation effects are analytically explained and modeled. It is demonstrated that these new tools provide an adequate environment in our search for better rules in acoustic speech synthesis.
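The Log-Area Ratio mapping in the abstract is simple enough to show directly. The Python sketch below implements the stated formula and its inverse, k = tanh(y/2), which follows from y = 2 atanh(k); the function names are illustrative, and the code assumes every reflection coefficient satisfies |k| < 1.

```python
import math


def log_area_ratios(reflection_coeffs):
    """Map reflection coefficients k_i (|k_i| < 1) to y_i = ln((1+k_i)/(1-k_i))."""
    return [math.log((1.0 + k) / (1.0 - k)) for k in reflection_coeffs]


def reflection_from_lar(lars):
    """Inverse mapping: k_i = tanh(y_i / 2)."""
    return [math.tanh(y / 2.0) for y in lars]
```

The mapping stretches the region near |k| = 1, where the spectrum is most sensitive to small changes in k, which is one common reason for interpolating or quantizing in this domain rather than on the reflection coefficients directly.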

179 citations

Proceedings Article

A segment vocoder at 150 b/s

S. Roucos¹, Richard Schwartz, J. Makhoul

BBN Technologies¹

01 Apr 1983

TL;DR: The random quantizer used in the original segment vocoder is shown to be near-optimal by comparison with quantizers that use clustering algorithms for quantizing speech segments.

Abstract: In this paper we investigate several methods for reducing the bit rate of a segment vocoder [1] by 35% to 150 b/s. In the original vocoder we used a random sample of vectors as a set of templates for vector quantization. We demonstrate in this paper that this random quantizer is near-optimal by comparing it with quantizers that use clustering algorithms for quantizing speech segments. The reduction of the bit rate of the segment vocoder was achieved primarily by using a segment network, i.e., not all segment templates are allowed to follow a given segment template. The spectral continuity of speech is used to determine the subset of templates that can be used to quantize an input segment. To achieve the low rate of 150 b/s, we also reduced the bit rate for coding pitch, gain, and segment duration. Finally, we present the bit allocation used for transmitting speech at 150 b/s as a single-speaker segment vocoder.
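The segment-network idea, restricting which templates may follow a given template so that fewer bits are needed per segment index, can be sketched in Python under strong simplifying assumptions. Nothing here is taken from the paper's implementation: the continuity rule (distance between a template's last frame and a candidate's first frame), the fixed `fan_out`, equal-length segments, and all function names are hypothetical.

```python
import math


def sq_dist(u, v):
    """Squared Euclidean distance between two spectral vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def build_segment_network(templates, fan_out):
    """For each template, list the `fan_out` templates allowed to follow it.

    Assumed continuity rule: successors are the templates whose first frame
    is closest to the current template's last frame.
    """
    network = []
    for t in templates:
        ranked = sorted(range(len(templates)),
                        key=lambda j: sq_dist(t[-1], templates[j][0]))
        network.append(ranked[:fan_out])
    return network


def quantize_segment(segment, templates, allowed):
    """Pick the best-matching template index among the allowed subset."""
    return min(allowed,
               key=lambda j: sum(sq_dist(f, g)
                                 for f, g in zip(segment, templates[j])))


def index_bits(n_choices):
    """Bits needed to send one index out of n_choices alternatives."""
    return math.ceil(math.log2(n_choices))
```

With a full codebook of N templates each index costs `index_bits(N)` bits, while the network reduces this to `index_bits(fan_out)` because the decoder knows the previous template and therefore the allowed subset.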

178 citations

Metrics

Papers: 6,598
Citations: 148,119

No. of papers in the topic in previous years
Year	Papers
2023	9
2022	25
2021	26
2020	42
2019	25
2018	37