Linear predictive coding

About: Linear predictive coding, also known as LPC, is a research topic. Over its lifetime, 6565 publications have been published within this topic, receiving 142991 citations.


Papers
Journal Article
TL;DR: Instrumental measures predict that incorporating uncertain prior information on the phase can improve the quality and intelligibility of processed speech over both traditional phase-insensitive approaches and approaches that treat prior information on the phase as deterministic.
Abstract: While most short-time discrete Fourier transform-based single-channel speech enhancement algorithms only modify the noisy spectral amplitude, interest in phase processing has increased in recent years. The goal of this paper is twofold. First, we derive Bayesian probability density functions and estimators for the clean speech phase when different amounts of prior knowledge about the speech and noise amplitudes are given. Second, we derive a joint Bayesian estimator of the clean speech amplitudes and phases when uncertain a priori knowledge of the phase is available. Instrumental measures predict that incorporating uncertain prior information on the phase can improve the quality and intelligibility of processed speech over both traditional phase-insensitive approaches and approaches that treat prior information on the phase as deterministic.

47 citations

Patent
31 Oct 1975
TL;DR: The autocorrelation function of a digital signal representing the speech signal is determined by a circuit that uses simple combinational logic and an up-down counter. A signal representing the speech energy is obtained by summing the digital speech signals over a predetermined time interval, and intervals of silence are detected by comparing that energy with a predetermined or adaptively determined threshold.
Abstract: Apparatus for the real-time analysis of speech signals in which a digital signal representative of the speech signal is adaptive-threshold center-clipped and infinite peak-clipped to form a signal comprising three logic states (+1, 0, -1). The autocorrelation function of this signal is determined by a circuit which employs simple combinational logic and an up-down counter circuit. Pitch period and voiced-unvoiced indication are determined from the location and magnitude of the peak value of the autocorrelation function. Additionally, a signal representative of the speech energy is provided by summing the digital speech signals over a predetermined time interval, and intervals of silence are detected by comparing the speech energy in an interval of time with a predetermined or adaptively determined threshold energy.

47 citations
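
The processing chain described in the patent abstract above (three-level clipping, autocorrelation, peak picking, energy-based silence detection) can be sketched in software. This is only an illustrative sketch under stated assumptions, not the patented circuit: the function names, the clipping ratio of 0.3, the 60-400 Hz pitch search range, the voicing ratio, and the silence level are all hypothetical choices, and the speech energy is approximated here by the mean absolute sample value.

```python
import math


def three_level_clip(frame, clip_ratio=0.3):
    """Center-clip and infinite-peak-clip a frame to the states +1, 0, -1."""
    # Assumption: the adaptive threshold is a fixed fraction of the frame peak.
    threshold = clip_ratio * max(abs(x) for x in frame)
    return [1 if x > threshold else -1 if x < -threshold else 0 for x in frame]


def autocorrelation(clipped, max_lag):
    """Autocorrelation of the three-level signal for lags 0..max_lag.

    Every product is +1, 0, or -1, which is why hardware can accumulate
    the sum with an up-down counter instead of a multiplier.
    """
    n = len(clipped)
    return [sum(clipped[i] * clipped[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def analyze_frame(frame, sample_rate, min_hz=60.0, max_hz=400.0,
                  voicing_ratio=0.3, silence_level=0.01):
    """Return silence, voiced/unvoiced, and pitch estimates for one frame.

    The frame must be longer than sample_rate / min_hz samples.
    """
    # Assumption: "energy" is the mean absolute value over the frame.
    energy = sum(abs(x) for x in frame) / len(frame)
    if energy < silence_level:
        return {"silence": True, "voiced": False, "pitch_hz": None}

    clipped = three_level_clip(frame)
    min_lag = int(sample_rate / max_hz)
    max_lag = int(sample_rate / min_hz)
    r = autocorrelation(clipped, max_lag)

    # Pitch period: location of the autocorrelation peak in the search range.
    peak_lag = max(range(min_lag, max_lag + 1), key=lambda k: r[k])
    # Voicing: peak magnitude relative to the zero-lag value.
    voiced = r[0] > 0 and r[peak_lag] >= voicing_ratio * r[0]
    pitch_hz = sample_rate / peak_lag if voiced else None
    return {"silence": False, "voiced": voiced, "pitch_hz": pitch_hz}


if __name__ == "__main__":
    rate = 8000
    tone = [math.sin(2 * math.pi * 100 * n / rate) for n in range(400)]
    print(analyze_frame(tone, rate))
```

The usage lines at the bottom feed a synthetic 100 Hz tone sampled at 8 kHz through the sketch; real speech would need framing and threshold tuning that the abstract does not specify.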

Journal Article
TL;DR: Analytical results from statistical room acoustics are utilized to analyze the AR modeling of speech under reverberant conditions. It is demonstrated that at each individual source-microphone position (without spatial expectation), the M-channel AR coefficients provide the best approximation to the clean speech coefficients when microphones are closely spaced.
Abstract: Hands-free speech input is required in many modern telecommunication applications that employ autoregressive (AR) techniques such as linear predictive coding. When the hands-free input is obtained in enclosed reverberant spaces such as typical office rooms, the speech signal is distorted by the room transfer function. This paper utilizes theoretical results from statistical room acoustics to analyze the AR modeling of speech under these reverberant conditions. Three cases are considered: (i) AR coefficients calculated from a single observation; (ii) AR coefficients calculated jointly from an M-channel observation (M > 1); and (iii) AR coefficients calculated from the output of a delay-and-sum beamformer. The statistical analysis, with supporting simulations, shows that the spatial expectations of the AR coefficients for cases (i) and (ii) are approximately equal to the coefficients of the original speech, while for case (iii) there is a discrepancy, due to spatial correlation between the microphones, which can be significant. It is subsequently demonstrated that at each individual source-microphone position (without spatial expectation), the M-channel AR coefficients from case (ii) provide the best approximation to the clean speech coefficients when microphones are closely spaced (< 0.3 m).

47 citations
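
For readers unfamiliar with how AR (linear prediction) coefficients are obtained from a single observation, as in case (i) of the abstract above, the following is a minimal sketch of the standard autocorrelation method solved with the Levinson-Durbin recursion. It is a textbook procedure, not the analysis from the paper, and the function names `autocorr`, `levinson_durbin`, and `lpc` are hypothetical.

```python
def autocorr(x, order):
    """Autocorrelation r[0..order] of one frame (no windowing applied)."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k))
            for k in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations for the AR predictor coefficients.

    Returns (a, err), where the prediction is
        x_hat[n] = sum over k = 1..order of a[k - 1] * x[n - k]
    and err is the final prediction error energy.
    """
    if r[0] == 0:
        raise ValueError("frame has zero energy")
    a = []
    err = r[0]
    for m in range(1, order + 1):
        # Reflection coefficient for model order m.
        acc = r[m] - sum(a[j] * r[m - 1 - j] for j in range(m - 1))
        k = acc / err
        # Update the lower-order coefficients and append the new one.
        a = [a[j] - k * a[m - 2 - j] for j in range(m - 1)] + [k]
        err *= 1.0 - k * k
    return a, err


def lpc(frame, order):
    """AR coefficients of one frame via the autocorrelation method."""
    return levinson_durbin(autocorr(frame, order), order)


if __name__ == "__main__":
    # Impulse response of the first-order AR system x[n] = 0.5 * x[n - 1].
    frame = [0.5 ** n for n in range(50)]
    coeffs, error = lpc(frame, 2)
    print(coeffs, error)
```

In the usage lines, the frame is generated by a known first-order AR system, so the first estimated coefficient should come out close to 0.5 and the second close to zero. How the paper combines M channels or a beamformer output is not shown here.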

Journal Article
TL;DR: This work overviews some recently proposed discrete Fourier transform (DFT)- and discrete wavelet packet transform (DWPT)-based speech parameterization methods and compares their performance against traditional techniques, such as the Mel-frequency cepstral coefficients (MFCC) and perceptual linear predictive (PLP) cepstral coefficients, which presently dominate the speech recognition field.
Abstract: In the present work, we overview some recently proposed discrete Fourier transform (DFT)- and discrete wavelet packet transform (DWPT)-based speech parameterization methods and evaluate their performance on the speech recognition task. Specifically, in order to assess the practical value of these less studied speech parameterization methods, we evaluate them in a common experimental setup and compare their performance against traditional techniques, such as the Mel-frequency cepstral coefficients (MFCC) and perceptual linear predictive (PLP) cepstral coefficients, which presently dominate the speech recognition field. In particular, utilizing the well-established TIMIT speech corpus and employing the Sphinx-III speech recognizer, we present comparative results for 8 different speech parameterization techniques.

47 citations

Journal Article
TL;DR: The results of perceptual evaluation indicated that listeners generally preferred the alaryngeal speech samples enhanced by the modified conversions over the original samples.
Abstract: Two existing speech conversion algorithms were modified and used to enhance alaryngeal speech. The modifications were aimed at reducing the spectral distortion (bandwidth increase) in a vector-quantization (VQ) based system and the spectral discontinuity in a linear multivariate regression (LMR) based system. Spectral distortion was compensated for by formant enhancement using the chirp z-transform and cepstral weighting. Spectral discontinuity was alleviated using overlapping clusters during the construction of the conversion mapping function. The results of perceptual evaluation indicated that listeners generally preferred the alaryngeal speech samples enhanced by the modified conversions over the original samples.

47 citations


Network Information
Related Topics (5)
Signal processing: 73.4K papers, 983.5K citations, 86% related
Noise: 110.4K papers, 1.3M citations, 81% related
Feature extraction: 111.8K papers, 2.1M citations, 81% related
Feature vector: 48.8K papers, 954.4K citations, 80% related
Filter (signal processing): 81.4K papers, 1M citations, 79% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2023    9
2022    25
2021    26
2020    42
2019    25
2018    37