Topic

Linear predictive coding

About: Linear predictive coding (LPC) is a research topic. Over its lifetime, 6,565 publications have been published within this topic, receiving 142,991 citations.

Papers

Proceedings Article

A simple correlation-based model of intelligibility for nonlinear speech enhancement and separation

Jesper B. Boldt, Daniel P. W. Ellis¹

Columbia University¹

24 Aug 2009

TL;DR: A measure based on the similarity between the time-varying spectral envelopes of target speech and system output, as measured by correlation, can provide a more meaningful evaluation measure for nonlinear speech enhancement systems, as well as providing a transparent objective function for the optimization of such systems.

Abstract: Applying a binary mask to a pure noise signal can result in speech that is highly intelligible, despite the absence of any of the target speech signal. Therefore, to estimate the intelligibility benefit of highly nonlinear speech enhancement techniques, we contend that SNR is not useful; instead we propose a measure based on the similarity between the time-varying spectral envelopes of target speech and system output, as measured by correlation. As with previous correlation-based intelligibility measures, our system can broadly match subjective intelligibility for a range of enhanced signals. Our system, however, is notably simpler and we explain the practical motivation behind each stage. This measure, freely available as a small Matlab implementation, can provide a more meaningful evaluation measure for nonlinear speech enhancement systems, as well as providing a transparent objective function for the optimization of such systems.

43 citations
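
The idea in this abstract, comparing the time-varying spectral envelopes of target and output by correlation, can be sketched roughly as follows. This is not the authors' Matlab implementation: the function names, the equal-width band grouping, the frame sizes, and the log compression are all assumptions chosen to keep the example short.

```python
import numpy as np

def band_envelopes(x, frame_len=256, hop=128, n_bands=8):
    """Return an (n_bands, n_frames) array of log band energies over time."""
    x = np.asarray(x, dtype=float)
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2   # (n_frames, n_bins)
    # Assumption: equal-width bands, not a perceptual filterbank.
    bands = np.array_split(power, n_bands, axis=1)
    energy = np.stack([b.sum(axis=1) for b in bands])  # (n_bands, n_frames)
    return np.log(energy + 1e-10)

def envelope_correlation(target, output, **kwargs):
    """Mean over bands of the Pearson correlation between envelope tracks."""
    et = band_envelopes(target, **kwargs)
    eo = band_envelopes(output, **kwargs)
    et = et - et.mean(axis=1, keepdims=True)
    eo = eo - eo.mean(axis=1, keepdims=True)
    num = (et * eo).sum(axis=1)
    den = np.sqrt((et ** 2).sum(axis=1) * (eo ** 2).sum(axis=1)) + 1e-12
    return float(np.mean(num / den))
```

Both signals are assumed to be time-aligned and of equal length. A score near 1 means the output follows the target's envelope closely, regardless of whether the fine structure comes from speech or from masked noise.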

Investigation of Combined use of MFCC and LPC Features in Speech Recognition Systems

K. R. Aida-Zade, Cemal Ardil, Samir Rustamov

02 Nov 2017

TL;DR: The authors propose the combined use of Mel Frequency Cepstral Coefficients (MFCC) and Linear Predictive Coding (LPC) coefficients, which express the basic speech features, to improve the reliability of a speech recognition system.

Abstract: The paper states the automatic speech recognition problem, the purpose of speech recognition, and its application fields. With reference to Azerbaijani speech, it investigates the principles of building a speech recognition system and the problems that arise in such a system. The algorithms for computing speech features, which are the main part of a speech recognition system, are analyzed, and algorithms for determining the Mel Frequency Cepstral Coefficients (MFCC) and Linear Predictive Coding (LPC) coefficients that express the basic speech features are developed. Combined use of MFCC and LPC cepstra is suggested to improve the reliability of the recognition system. To this end, the recognition system is divided into MFCC-based and LPC-based recognition subsystems. Training and recognition are carried out in each subsystem separately, and the system accepts a decision only when both subsystems return the same result, which decreases the error rate during recognition. Training and recognition are performed by artificial neural networks, which are trained by the conjugate gradient method. The paper investigates the problems that the number of speech features causes when training the neural networks of the MFCC-based and LPC-based subsystems, and analyzes how results vary across networks trained from different initial points. A methodology for combining neural networks trained from different initial points is suggested to improve the reliability and quality of recognition, and practical results are shown.

43 citations
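
For readers unfamiliar with how LPC coefficients are obtained, here is a minimal sketch of the standard autocorrelation method with the Levinson-Durbin recursion, followed by a small agreement rule in the spirit of the two-subsystem decision described above. The abstract does not give the paper's exact algorithms, so the function names `lpc` and `combined_decision` and the choice to return `None` on disagreement are assumptions for illustration.

```python
import numpy as np

def lpc(frame, order):
    """Autocorrelation-method LPC via the Levinson-Durbin recursion.

    Returns (a, err) with a[0] == 1, where the predictor is
    x_hat[n] = -sum(a[k] * x[n - k] for k in 1..order)
    and err is the final prediction error energy.
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0:
            break  # silent or degenerate frame: stop early
        acc = r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])
        k = -acc / err                      # reflection coefficient
        a_prev = a.copy()
        a[1:i] = a_prev[1:i] + k * a_prev[i - 1 : 0 : -1]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

def combined_decision(mfcc_label, lpc_label):
    """Accept a label only when both subsystems agree (hypothetical rule)."""
    return mfcc_label if mfcc_label == lpc_label else None
```

In practice the frame would be windowed before calling `lpc`, and the coefficients are often converted to cepstral form before being fed to a classifier.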

Proceedings Article

Robust speech recognition using a voiced-unvoiced feature

András Zolnay¹, Ralf Schlüter, Hermann Ney

RWTH Aachen University¹

01 Jan 2002

TL;DR: A voiced-unvoiced measure was combined with the standard Mel Frequency Cepstral Coefficients using linear discriminant analysis (LDA) to choose the most relevant features for continuous speech recognition.

Abstract: In this paper, a voiced-unvoiced measure is used as acoustic feature for continuous speech recognition. The voiced-unvoiced measure was combined with the standard Mel Frequency Cepstral Coefficients (MFCC) using linear discriminant analysis (LDA) to choose the most relevant features. Experiments were performed on the SieTill (German digit strings recorded over telephone line) and on the SPINE (English spontaneous speech under different simulated noisy environments) corpus. The additional voiced-unvoiced measure results in improvements in word error rate (WER) of up to 11% relative to using MFCC alone with the same overall number of parameters in the system.

43 citations
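
The abstract does not define its voiced-unvoiced measure. One common way to build such a feature, shown here purely as an assumed stand-in, is the peak of the normalized autocorrelation over plausible pitch lags. The function name and the 50 to 400 Hz pitch range are illustrative choices.

```python
import numpy as np

def voicing_measure(frame, sample_rate, f0_min=50.0, f0_max=400.0):
    """Peak normalized autocorrelation over pitch lags, roughly in [-1, 1].

    Values near 1 suggest a periodic (voiced) frame; values near 0
    suggest a noise-like (unvoiced) frame.
    """
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()
    if np.dot(frame, frame) == 0.0:
        return 0.0
    lag_min = max(1, int(sample_rate / f0_max))
    lag_max = min(int(sample_rate / f0_min), len(frame) - 1)
    best = -1.0
    for lag in range(lag_min, lag_max + 1):
        a, b = frame[:-lag], frame[lag:]
        denom = np.sqrt(np.dot(a, a) * np.dot(b, b))
        if denom > 0:
            best = max(best, np.dot(a, b) / denom)
    return float(best)
```

A scalar like this could be appended to each MFCC vector before a transform such as LDA selects the most relevant dimensions, which is the kind of combination the abstract describes.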

Journal Article

R/D optimal linear prediction

Paolo Prandoni¹, Martin Vetterli

École Normale Supérieure¹

01 Nov 2000, IEEE Transactions on Speech and Audio Processing

TL;DR: An algorithm which determines the optimal segmentation with respect to a cost function relating prediction error to modeling cost is presented, whereby the segmentation is implicitly computed while minimizing the modelization distortion for a given modelization cost.

Abstract: A common technique to extend linear prediction to nonstationary signals is time segmentation: the signal is split into small portions and the modelization is carried out locally. The accuracy of the analysis is, however, dependent on the window size and on the signal characteristics, so that the problem of finding a good segmentation is crucial to the entire modeling scheme. In this paper, we present an algorithm which determines the optimal segmentation with respect to a cost function relating prediction error to modeling cost. The proposed approach casts the problem in a rate/distortion (R/D) framework, whereby the segmentation is implicitly computed while minimizing the modelization distortion for a given modelization cost. The algorithm is implemented by means of dynamic programming and takes the form of a trellis-based Lagrangian minimization. The optimal linear predictor, when applied to speech coding, dramatically reduces the number of bits per second devoted to the modeling parameters in comparison to fixed-window schemes.

43 citations
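
To make the rate/distortion idea concrete, the sketch below picks segment boundaries by dynamic programming so that the total of prediction error plus a Lagrangian penalty, J = D + lambda * R, is minimal. It is a simplified stand-in for the trellis-based algorithm in the paper, not a reproduction of it. The assumptions: boundaries are restricted to a fixed grid of `step` samples, the distortion is the residual energy of a least-squares predictor, and the rate is a hypothetical one unit per coefficient per segment.

```python
import numpy as np

def prediction_error(seg, order):
    """Residual energy of a least-squares linear predictor fitted to seg."""
    seg = np.asarray(seg, dtype=float)
    if len(seg) <= 2 * order:
        return float(np.dot(seg, seg))
    # Column k-1 holds the samples delayed by k, aligned with y.
    X = np.column_stack([seg[order - k : len(seg) - k]
                         for k in range(1, order + 1)])
    y = seg[order:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(np.dot(resid, resid))

def optimal_segmentation(x, order=2, step=100, lam=25.0):
    """Return boundaries minimizing sum(distortion) + lam * sum(rate).

    Samples beyond the last full grid block are ignored.
    """
    x = np.asarray(x, dtype=float)
    n_blocks = len(x) // step
    rate = order  # hypothetical modeling cost per segment
    best = [0.0] + [np.inf] * n_blocks
    back = [0] * (n_blocks + 1)
    for j in range(1, n_blocks + 1):
        for i in range(j):
            seg_cost = prediction_error(x[i * step : j * step], order)
            cost = best[i] + seg_cost + lam * rate
            if cost < best[j]:
                best[j], back[j] = cost, i
    # Trace the chosen boundaries back from the end.
    bounds = [n_blocks]
    while bounds[-1] > 0:
        bounds.append(back[bounds[-1]])
    return [b * step for b in reversed(bounds)]
```

Raising `lam` makes each extra segment more expensive and yields fewer, longer segments; lowering it lets the segmentation follow the signal more closely at a higher modeling cost.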

Journal Article

Content-Dependent Watermarking Scheme in Compressed Speech With Identifying Manner and Location of Attacks

Oscal T.-C. Chen¹, Chia-Hsiung Liu¹

National Chung Cheng University¹

01 Jul 2007, IEEE Transactions on Audio, Speech, and Language Processing

TL;DR: A content-dependent watermarking scheme for codebook-excited linear prediction (CELP)-based speech codecs that ensures the integrity of compressed speech data.

Abstract: As speech compression technologies have advanced, digital recording devices have become increasingly popular. However, data formats used in popular speech codecs are known a priori, such that compressed data can be modified easily via insertion, deletion, and replacement. This work proposes a content-dependent watermarking scheme suitable for codebook-excited linear prediction (CELP)-based speech codec that ensures the integrity of compressed speech data. Speech data are initially partitioned into many groups, each of which includes multiple speech frames. The watermark embedded in each frame is then generated according to the line spectrum frequency (LSF) feature in the current frame, the pitch extracted from the succeeding frame, the watermark embedded in the preceding frame, and the group index which is determined by the location of the current frame. Finally, some of the least significant bits (LSBs) of the indices indicating the excitation pulse positions or excitation vectors are substituted for the watermark. Conventional watermarking schemes can only detect whether compressed speech data are intact. They cannot determine where compressed speech data are altered by insertion, deletion, or replacement, whereas the proposed scheme can. Experiments established that the proposed scheme used in the G.723.1 6.3 kb/s speech codecs embeds 12 bits in each compressed speech frame with 189 bits, and only decreases the perceptual evaluation of speech quality (PESQ) by 0.11. Additionally, its accuracy in detecting the locations of attacked frames is very high, with only two normal frames mistaken as attacked frames. Therefore, the proposed watermarking scheme effectively ensures the integrity of compressed speech data.

43 citations
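
The final embedding step in this abstract, substituting watermark bits for the least significant bits of codec indices, can be illustrated with a short sketch. How the watermark itself is derived from the LSF feature, pitch, preceding watermark, and group index is not covered here, and the function names and the plain list-of-integers representation are assumptions. For scale, 12 watermark bits in a 189-bit frame is about 6.3% of the frame.

```python
def embed_lsb(indices, bits, n_lsb=1):
    """Replace the n_lsb least significant bits of each index with payload bits."""
    if len(bits) != len(indices) * n_lsb:
        raise ValueError("need exactly n_lsb bits per index")
    mask = (1 << n_lsb) - 1
    out = []
    for pos, idx in enumerate(indices):
        chunk = bits[pos * n_lsb : (pos + 1) * n_lsb]
        payload = int("".join(str(b) for b in chunk), 2)
        out.append((idx & ~mask) | payload)
    return out

def extract_lsb(indices, n_lsb=1):
    """Read back the payload bits, most significant bit of each chunk first."""
    mask = (1 << n_lsb) - 1
    bits = []
    for idx in indices:
        payload = idx & mask
        bits.extend((payload >> s) & 1 for s in range(n_lsb - 1, -1, -1))
    return bits
```

A verifier would recompute the expected watermark from the decoded frame content and compare it with the extracted bits; a mismatch flags the frame as possibly altered.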

Network Information

Performance Metrics

Papers: 6,598
Citations: 148,119

No. of papers in the topic in previous years
Year	Papers
2023	9
2022	25
2021	26
2020	42
2019	25
2018	37
