scispace - formally typeset
Search or ask a question
Topic

Speech coding

About: Speech coding is a research topic. Over the lifetime, 14245 publications have been published within this topic receiving 271964 citations.


Papers
More filters
PatentDOI
Kazunori Ozawa1
TL;DR: In this article, a speech coding method in which spectrum parameters representing a spectrum envelope and a pitch parameter representing a pitch are obtained from an input discrete speech signal is presented. And a frame interval is divided into subintervals in accordance with the pitch parameter.
Abstract: A speech coding method in which spectrum parameter representing a spectrum envelope and a pitch parameter representing a pitch are obtained from an input discrete speech signal. A frame interval is divided into subintervals in accordance with the pitch parameter. A sound source signal in one of the subintervals is obtained by obtaining a multipulse with respect to a difference signal obtained by performing prediction on the basis of a past sound source signal. Correction information for correcting at least one of the amplitude and the phase of the sound source signal are obtained and output in other pitch intervals in the frame.

183 citations

Patent
16 Dec 1997
TL;DR: A subband audio coder employs perfect/nonperfect reconstruction filters, predictive/non-predictive subband encoding, transient analysis, and psycho-acoustic/minimum mean square error (mmse) bit allocation over time, frequency and the multiple audio channels to encode/decode a data stream to generate high fidelity reconstructed audio as mentioned in this paper.
Abstract: A subband audio coder employs perfect/non-perfect reconstruction filters, predictive/non-predictive subband encoding, transient analysis, and psycho-acoustic/minimum mean-square-error (mmse) bit allocation over time, frequency and the multiple audio channels to encode/decode a data stream to generate high fidelity reconstructed audio. The audio coder windows the multi-channel audio signal such that the frame size, i.e. number of bytes, is constrained to lie in a desired range, and formats the encoded data so that the individual subframes can be played back as they are received thereby reducing latency. Furthermore, the audio coder processes the baseband portion (0-24 kHz) of the audio bandwidth for sampling frequencies of 48 kHz and higher with the same encoding/decoding algorithm so that audio coder architecture is future compatible.

183 citations

Proceedings ArticleDOI
Jing Huang1, Brian Kingsbury1
26 May 2013
TL;DR: This work uses DBNs for audio-visual speech recognition; in particular, it uses deep learning from audio and visual features for noise robust speech recognition and test two methods for using DBN’s in a multimodal setting.
Abstract: Deep belief networks (DBN) have shown impressive improvements over Gaussian mixture models for automatic speech recognition. In this work we use DBNs for audio-visual speech recognition; in particular, we use deep learning from audio and visual features for noise robust speech recognition. We test two methods for using DBNs in a multimodal setting: a conventional decision fusion method that combines scores from single-modality DBNs, and a novel feature fusion method that operates on mid-level features learned by the single-modality DBNs. On a continuously spoken digit recognition task, our experiments show that these methods can reduce word error rate by as much as 21% relative over a baseline multi-stream audio-visual GMM/HMM system.

182 citations

Journal ArticleDOI
TL;DR: A theoretical analysis of high-rate vector quantization (VQ) systems that use suboptimal, mismatched distortion measures is presented, and the application of the analysis to the problem of quantizing the linear predictive coding (LPC) parameters in speech coding systems is described.
Abstract: The paper presents a theoretical analysis of high-rate vector quantization (VQ) systems that use suboptimal, mismatched distortion measures, and describes the application of the analysis to the problem of quantizing the linear predictive coding (LPC) parameters in speech coding systems. First, it is shown that in many high-rate VQ systems the quantization distortion approaches a simple quadratically weighted error measure, where the weighting matrix is a "sensitivity matrix" that is an extension of the concept of the scalar sensitivity. The approximate performance of VQ systems that train and quantize using mismatched distortion measures is derived, and is used to construct better distortion measures. Second, these results are used to determine the performance of LPC vector quantizers, as measured by the log spectral distortion (LSD) measure, which have been trained using other error measures, such as mean-squared (MSE) or weighted mean-squared error (WMSE) measures of LEPC parameters, reflection coefficients and transforms thereof, and line spectral pair (LSP) frequencies. Computationally efficient algorithms for computing the sensitivity matrices of these parameters are described. In particular, it is shown that the sensitivity matrix for the LSP frequencies is diagonal, implying that a WMSE measured LSP frequencies converges to the LSD measure in high-rate VQ systems. Experimental results to support the theoretical performance estimates are provided. >

182 citations

Proceedings ArticleDOI
07 May 2001
TL;DR: In this paper, the authors present several mechanisms that enable effective spread-spectrum audio watermarking systems: prevention against detection desynchronization, cepstrum filtering, and chess watermarks.
Abstract: We present several mechanisms that enable effective spread-spectrum audio watermarking systems: prevention against detection desynchronization, cepstrum filtering, and chess watermarks. We have incorporated these techniques into a system capable of reliably detecting a watermark in an audio clip that has been modified using a composition of attacks that degrade the original audio characteristics well beyond the limit of acceptable quality. Such attacks include: fluctuating scaling in the time and frequency domain, compression, addition and multiplication of noise, resampling, requantization, normalization, filtering, and random cutting and pasting of signal samples.

182 citations


Network Information
Related Topics (5)
Signal processing
73.4K papers, 983.5K citations
86% related
Decoding methods
65.7K papers, 900K citations
84% related
Fading
55.4K papers, 1M citations
80% related
Feature vector
48.8K papers, 954.4K citations
80% related
Feature extraction
111.8K papers, 2.1M citations
80% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
202338
202284
202170
202062
201977
2018108