Topic

Linear predictive coding

About: Linear predictive coding is a research topic. Over its lifetime, 6,565 publications have been published within this topic, receiving 142,991 citations. The topic is also known as LPC.


Papers
Proceedings Article
17 Oct 2013
TL;DR: Experimental results, including signal-to-noise ratio, key space, key sensitivity tests, statistical analysis, chosen/known plaintext attack analysis and time analysis, show that the proposed method for speech scrambling performs efficiently and can be applied for secure real-time speech communications.
Abstract: This paper presents a chaos-based speech scrambling system. Chaotic maps have been successfully used for large-scale data encryption, such as image, audio and video data, due to their good properties such as pseudo-randomness, sensitivity to changes in initial conditions and system parameters, and aperiodicity. This paper uses two chaotic maps, the circle map and the logistic map, for speech confusion and diffusion, respectively. In the confusion stage, speech samples are divided into small segments, and the indices of the sorted sequence generated by the circle map are used to shuffle the positions of the speech signal segments. In the diffusion stage, a one-time pad generated by the logistic map is applied. Experimental results, including signal-to-noise ratio, key space, key sensitivity tests, statistical analysis, chosen/known plaintext attack analysis and time analysis, show that the proposed method for speech scrambling performs efficiently and can be applied for secure real-time speech communications.
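The confusion and diffusion stages described in this abstract can be sketched roughly as follows. This is a minimal illustration and not the paper's implementation: the map parameters (`theta0`, `omega`, `k`, `x0`, `r`), the segment length, the 16-bit quantization of the logistic orbit, and the use of XOR as the pad operation are all assumptions made for the example.

```python
import numpy as np

def circle_map(n, theta0=0.3, omega=0.5, k=2.5):
    """Iterate the standard circle map n times (parameter values are hypothetical)."""
    out = np.empty(n)
    theta = theta0
    for i in range(n):
        theta = (theta + omega - k / (2 * np.pi) * np.sin(2 * np.pi * theta)) % 1.0
        out[i] = theta
    return out

def logistic_keystream(n, x0=0.7, r=3.99):
    """Iterate the logistic map and quantize each state to 16 bits (assumed scheme)."""
    out = np.empty(n, dtype=np.uint16)
    x = x0
    for i in range(n):
        x = r * x * (1.0 - x)
        out[i] = int(x * 65536) % 65536
    return out

def scramble(samples, seg_len=4):
    """Confusion: shuffle segments by the sort order of the circle-map orbit.
    Diffusion: XOR the shuffled samples with the logistic-map keystream."""
    samples = np.asarray(samples, dtype=np.int16)
    if samples.size % seg_len != 0:
        raise ValueError("length must be a multiple of seg_len in this sketch")
    n_seg = samples.size // seg_len
    perm = np.argsort(circle_map(n_seg))
    shuffled = samples.reshape(n_seg, seg_len)[perm].reshape(-1)
    pad = logistic_keystream(shuffled.size)
    return (shuffled.view(np.uint16) ^ pad).view(np.int16)

def descramble(scrambled, seg_len=4):
    """Invert scramble(): remove the pad, then undo the segment permutation."""
    scrambled = np.asarray(scrambled, dtype=np.int16)
    pad = logistic_keystream(scrambled.size)
    shuffled = (scrambled.view(np.uint16) ^ pad).view(np.int16)
    n_seg = shuffled.size // seg_len
    perm = np.argsort(circle_map(n_seg))
    segs = np.empty((n_seg, seg_len), dtype=np.int16)
    segs[perm] = shuffled.reshape(n_seg, seg_len)
    return segs.reshape(-1)
```

In this sketch the initial conditions and map parameters play the role of the secret key: both sides must use identical values, because the permutation and the pad are regenerated from them at the receiver.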

28 citations

Proceedings Article
R.C. Rose, Hong Kook Kim
30 Nov 2003
TL;DR: A hybrid procedure for barge-in detection is proposed and evaluated that combines a feature-based voice activity detection (VAD) algorithm with a model-based approach for verifying hypothesized speech segments and is found to obtain better detection performance than procedures that rely on the speech recognition decoder to detect speech.
Abstract: This paper investigates techniques designed to allow the users of human-machine dialog systems to interrupt or barge-in over machine generated speech messages. An experimental study was performed on utterances collected from a telephone based dialog system to analyze the effect of barge-in performance on users' speech. One result of this study was that excessive barge-in latencies resulted in disfluencies appearing in over half of users' utterances. A hybrid procedure for barge-in detection is proposed and evaluated on the utterances collected from the same domain. The procedure combines a feature-based voice activity detection (VAD) algorithm with a model-based approach for verifying hypothesized speech segments. The procedure is shown in the paper to obtain better detection performance than procedures that rely on the speech recognition decoder to detect speech. It is also found to have latencies that are comparable to those obtained by low delay feature-based speech detection algorithms.

28 citations

Journal Article
TL;DR: A network architecture is proposed, which allows model adaptation to different formant frequency ranges that were not seen at training time and which compares favorably with alternative methods for formant estimation and tracking.
Abstract: Formant frequency estimation and tracking are among the most fundamental problems in speech processing. In the estimation task, the input is a stationary speech segment, such as the middle part of a vowel, and the goal is to estimate the formant frequencies, whereas in the tracking task the input is a series of speech frames, and the goal is to track the trajectory of the formant frequencies throughout the signal. The use of supervised machine learning techniques trained on an annotated corpus of read speech is proposed for these tasks. Two deep network architectures were evaluated for estimation, feed-forward multilayer perceptrons and convolutional neural networks, and, correspondingly, two architectures for tracking, recurrent and convolutional recurrent networks. The inputs to the former are composed of linear predictive coding-based cepstral coefficients with a range of model orders and pitch-synchronous cepstral coefficients, whereas the inputs to the latter are raw spectrograms. The performance of the methods compares favorably with alternative methods for formant estimation and tracking. A network architecture is further proposed that allows model adaptation to formant frequency ranges that were not seen at training time. The adapted networks were evaluated on three datasets, and their performance was further improved.
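The LPC-based cepstral coefficients mentioned in this abstract can be computed along the following lines. This is a generic textbook sketch (autocorrelation method with the Levinson-Durbin recursion, followed by the standard LPC-to-cepstrum recursion), not the authors' feature pipeline. The function names `lpc` and `lpc_to_cepstrum` are hypothetical, and the sign convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p is an assumption of this example.

```python
import numpy as np

def lpc(frame, order):
    """Autocorrelation-method LPC via Levinson-Durbin.
    Returns (a, err) with a[0] == 1 and A(z) = sum_k a[k] z^-k.
    Assumes a non-silent frame (r[0] > 0)."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation lags 0..order
    r = np.correlate(frame, frame, mode="full")[n - 1 : n + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])
        k = -acc / err                      # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1 : 0 : -1]
        a[i] = k
        err *= 1.0 - k * k                  # updated prediction error energy
    return a, err

def lpc_to_cepstrum(a, n_ceps):
    """Cepstrum c[1..n_ceps] of the all-pole model 1/A(z) (gain term omitted)."""
    p = len(a) - 1
    c = np.zeros(n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = sum(m * c[m] * a[n - m] for m in range(max(1, n - p), n))
        a_n = a[n] if n <= p else 0.0
        c[n] = -a_n - acc / n
    return c[1:]
```

Calling `lpc` with several values of `order` on the same frame and concatenating the resulting cepstra would be one way to form features "with a range of model orders", though the exact orders and frame settings used in the paper are not stated here.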

28 citations

Proceedings Article
01 Apr 1986
TL;DR: Preliminary results indicate that high quality reproduction can be obtained with this speech coding system for both clean and noisy speech without the "buzziness" and severe degradation in noise typically associated with vocoder speech.
Abstract: A 9.6 kbps speech coding system based on a new speech model is presented. In this model, the short-time spectrum of speech is modeled as the product of an excitation spectrum and a spectral envelope. The spectral envelope is some smoothed version of the speech spectrum and the excitation spectrum is represented by a fundamental frequency, a voiced/unvoiced (V/UV) decision for each harmonic of the fundamental, and the phase of each harmonic declared voiced. In speech analysis, the model parameters are estimated by explicit comparison between the original speech spectrum and the synthetic speech spectrum. Preliminary results indicate that high quality reproduction can be obtained with this speech coding system for both clean and noisy speech without the "buzziness" and severe degradation in noise typically associated with vocoder speech.
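The model in this abstract can be written compactly as below. The symbols are assumptions introduced for illustration: S_w is the windowed short-time speech spectrum, H_w the spectral envelope, E_w the excitation spectrum, and \hat{S}_w the synthetic spectrum. The squared-error criterion is one plausible reading of "explicit comparison"; the abstract does not state the exact error measure.

```latex
% Synthetic spectrum as the product of envelope and excitation (assumed notation)
\hat{S}_w(\omega) = H_w(\omega)\, E_w(\omega)

% One possible analysis criterion: choose the model parameters that minimize
\varepsilon = \frac{1}{2\pi} \int_{-\pi}^{\pi}
  \bigl| S_w(\omega) - \hat{S}_w(\omega) \bigr|^{2} \, d\omega
```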

28 citations

Journal Article
TL;DR: A spectral domain speech enhancement algorithm is developed, and hidden Markov model (HMM) based MMSE estimators for speech periodogram coefficients are derived under this gamma assumption in both a high uniform resolution and a reduced-resolution Mel domain.
Abstract: The derivation of MMSE estimators for the DFT coefficients of speech signals, given an observed noisy signal and super-Gaussian prior distributions, has received a lot of interest recently. In this letter, we look at the distribution of the periodogram coefficients of different phonemes, and show that they have a gamma distribution with shape parameters less than one. This verifies that the DFT coefficients for not only the whole speech signal but also for individual phonemes have super-Gaussian distributions. We develop a spectral domain speech enhancement algorithm, and derive hidden Markov model (HMM) based MMSE estimators for speech periodogram coefficients under this gamma assumption in both a high uniform resolution and a reduced-resolution Mel domain. The simulations show that the performance is improved using a gamma distribution compared to the exponential case. Moreover, we show that, even though beneficial in some aspects, the Mel-domain processing does not lead to better results than the algorithms in the high-resolution domain.
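For reference, the gamma density referred to in this abstract has the standard form below, with shape k and scale \theta (symbol names chosen here for illustration). Setting k = 1 gives the exponential density, which is the distribution of a periodogram coefficient when the underlying DFT coefficient is modeled as complex Gaussian. A shape k < 1 puts more probability mass near zero than the exponential case does.

```latex
p(x) = \frac{x^{k-1}\, e^{-x/\theta}}{\Gamma(k)\, \theta^{k}}, \qquad x > 0

% Special case k = 1 (exponential):
p(x) = \frac{1}{\theta}\, e^{-x/\theta}
```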

28 citations


Network Information
Related Topics (5)
Signal processing: 73.4K papers, 983.5K citations, 86% related
Noise: 110.4K papers, 1.3M citations, 81% related
Feature extraction: 111.8K papers, 2.1M citations, 81% related
Feature vector: 48.8K papers, 954.4K citations, 80% related
Filter (signal processing): 81.4K papers, 1M citations, 79% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    9
2022    25
2021    26
2020    42
2019    25
2018    37