scispace - formally typeset
Topic

Speech coding

About: Speech coding is a research topic. Over its lifetime, 14,245 publications have been published within this topic, receiving 271,964 citations.


Papers
Patent
25 Jun 1992
TL;DR: A low-bit-rate speech decoder is proposed that operates in two modes depending on the received mode bit; pitch prefiltering and global postfiltering are employed to enhance the synthesized speech.
Abstract: Code excited linear prediction (CELP) is performed using two sets of windows, voiced and unvoiced, each of which is used both for linear prediction and pitch determination. The accompanying degradation in voice quality is comparable to the IS54 standard 8.0 kbps voice coder employed in U.S. digital cellular systems. This is accomplished by using the same parametric model used in traditional CELP coders but determining, quantizing, encoding, and updating these parameters differently. The low bit rate speech decoder is like most CELP decoders except that it operates in two modes depending on the received mode bit. Both pitch prefiltering and global postfiltering are employed for enhancement of the synthesized speech. In addition, built-in error detection and error recovery schemes are used that help mitigate the effects of any uncorrectable transmission errors.

113 citations

Journal Article
TL;DR: This paper introduces a novel non-parametric, exemplar-based method for reconstructing clean speech from noisy observations, based on techniques from the field of Compressive Sensing, which can impute missing features using larger time windows such as entire words.
Abstract: An effective way to increase the noise robustness of automatic speech recognition is to label noisy speech features as either reliable or unreliable (missing), and to replace (impute) the missing ones by clean speech estimates. Conventional imputation techniques employ parametric models and impute the missing features on a frame-by-frame basis. At low signal-to-noise ratios (SNRs), these techniques fail, because too many time frames may contain few, if any, reliable features. In this paper, we introduce a novel non-parametric, exemplar-based method for reconstructing clean speech from noisy observations, based on techniques from the field of Compressive Sensing. The method, dubbed sparse imputation, can impute missing features using larger time windows such as entire words. Using an overcomplete dictionary of clean speech exemplars, the method finds the sparsest combination of exemplars that jointly approximate the reliable features of a noisy utterance. That linear combination of clean speech exemplars is used to replace the missing features. Recognition experiments on noisy isolated digits show that sparse imputation outperforms conventional imputation techniques at SNR = -5 dB when using an ideal 'oracle' mask. With error-prone estimated masks, sparse imputation performs slightly worse than the best conventional technique.

113 citations
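
The sparse imputation idea in the abstract above can be illustrated with a minimal sketch. It assumes a tiny, made-up exemplar dictionary and uses plain greedy matching pursuit as a stand-in for the sparse solver; the abstract does not say which solver the authors use, so the function name `sparse_impute`, the `n_atoms` parameter, and the data layout are all hypothetical.

```python
def sparse_impute(observed, reliable, dictionary, n_atoms=2):
    """Replace unreliable features with a sparse combination of exemplars.

    observed   -- list of feature values (unreliable entries may hold anything)
    reliable   -- list of booleans, True where the feature is trusted
    dictionary -- list of clean exemplar vectors, same length as `observed`
    n_atoms    -- hypothetical cap on greedy selection steps
    NOTE: greedy matching pursuit is an assumption made for illustration,
    not the solver described in the paper.
    """
    idx = [i for i, ok in enumerate(reliable) if ok]
    residual = [observed[i] for i in idx]
    weights = [0.0] * len(dictionary)

    for _ in range(n_atoms):
        best_k, best_score, best_coef = None, 0.0, 0.0
        for k, atom in enumerate(dictionary):
            # Compare exemplars with the residual on reliable features only.
            part = [atom[i] for i in idx]
            norm2 = sum(v * v for v in part)
            if norm2 == 0.0:
                continue
            dot = sum(r * v for r, v in zip(residual, part))
            score = dot * dot / norm2
            if score > best_score:
                best_k, best_score, best_coef = k, score, dot / norm2
        if best_k is None or best_score < 1e-12:
            break  # nothing left to explain
        weights[best_k] += best_coef
        chosen = dictionary[best_k]
        residual = [r - best_coef * chosen[i] for r, i in zip(residual, idx)]

    # The same linear combination, evaluated on every feature, fills the gaps.
    estimate = [
        sum(w * atom[i] for w, atom in zip(weights, dictionary))
        for i in range(len(observed))
    ]
    imputed = [
        observed[i] if reliable[i] else estimate[i]
        for i in range(len(observed))
    ]
    return imputed, weights
```

The weights are fitted only on the features marked reliable, and the resulting combination of whole exemplars supplies the missing ones, which is why the window can span more than a single frame.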

Patent
Steven D. Curtin
11 Apr 2000
TL;DR: A digital wireless premises audio system, a method of operating it, and a home theater system incorporating the audio system or the method are presented. The audio system includes a digital audio encoder/transmitter, located on the premises, that accepts an audio channel in digital form, encodes the channel into a stream of digital data, and wirelessly transmits the stream about the premises.
Abstract: A digital wireless premises audio system, a method of operating the same and a home theater system incorporating the audio system or the method. In one embodiment, the audio system includes: (1) a digital audio encoder/transmitter, located on the premises, that accepts an audio channel in digital form, encodes the channel into a stream of digital data and wirelessly transmits the stream about the premises and (2) a speaker module, located on the premises, couplable to a power source and including, in series, a digital audio receiver/decoder, an audio amplifier and a speaker, that receives the stream, decodes the audio channel therefrom, converts the audio channel to analog form and employs power from the power source to amplify the audio channel and drive the speaker therewith.

113 citations

Proceedings Article
12 May 1998
TL;DR: Performance of the modified Bark spectral distortion is reported in terms of frame sizes, speech classes, and spectral regions; the high-frequency region appears to play an important role in human perception of speech quality.
Abstract: The modified Bark spectral distortion (MBSD), used for an objective speech quality measure, was presented previously (see IEEE Speech Coding Workshop, p.55-6, 1997). The MBSD measure takes into account the noise masking threshold in order to use only audible distortions in the calculation of the distortion measure. Preliminary simulation results have shown improvement of the MBSD over the conventional BSD. In this paper, performance of the MBSD is reported in terms of frame sizes, speech classes, and spectral regions. The performance of the MBSD is not very sensitive to the frame size. The performance of the MBSD for voiced speech is almost the same as for non-silent speech. The high frequency region appears to play an important role in human perception of speech quality.

113 citations
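
The central idea in the MBSD abstract above, counting only distortions that exceed the noise masking threshold, can be sketched as follows. This is a rough illustration under stated assumptions: the per-band loudness values and masking thresholds are taken as given inputs, because the abstract does not specify how they are computed, and the function name `mbsd_sketch` and its argument layout are hypothetical.

```python
def mbsd_sketch(ref_loudness, deg_loudness, masking_threshold):
    """Average per-frame audible distortion between two signals.

    Each argument is a list of frames; each frame is a list with one value
    per Bark band. How loudness and the masking threshold are obtained is
    outside this sketch (assumed to be precomputed).
    """
    if not ref_loudness:
        raise ValueError("at least one frame is required")

    total = 0.0
    for ref, deg, thr in zip(ref_loudness, deg_loudness, masking_threshold):
        frame_distortion = 0.0
        for l_ref, l_deg, t in zip(ref, deg, thr):
            diff = abs(l_ref - l_deg)
            # Only differences above the masking threshold are treated as
            # audible and contribute to the measure.
            if diff > t:
                frame_distortion += diff
        total += frame_distortion
    return total / len(ref_loudness)
```

With one frame of three bands, reference `[1.0, 2.0, 3.0]`, degraded `[1.1, 2.5, 1.0]`, and a threshold of `0.2` in every band, the first band's difference is masked and only the other two are accumulated.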

Proceedings Article
01 Apr 1985
TL;DR: In vector quantization schemes, speech- and speaker-dependent codebooks are usually applied in order to achieve good speech quality at medium bit rates, but this paper deals with another approach: the speech waveforms are transformed into signals which ideally no longer contain speech- and speaker-specific features.
Abstract: In vector quantization schemes, speech- and speaker-dependent codebooks are usually applied in order to achieve good speech quality at medium bit rates. This paper deals with another approach: the speech waveforms are transformed into signals which ideally no longer contain speech- and speaker-specific features. Thus these signals can be encoded by a universal vector quantizer. This concept is realized by a system called RELP-VQ. The performance of this RELP-VQ scheme was evaluated by SNR measurements as well as by informal listening tests including female and male English and German speakers.

112 citations
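
The quantizer step mentioned in the abstract above can be sketched as a nearest-codeword search against one fixed codebook. This shows only generic vector quantization, not the RELP-VQ system itself: the transform that removes speech- and speaker-specific features is not described here, and the codebook values and function names are made up for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vq_encode(vectors, codebook):
    """Map each input vector to the index of its nearest codeword."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(v, codebook[k]))
        for v in vectors
    ]


def vq_decode(indices, codebook):
    """Reconstruct vectors by looking the indices up in the same codebook."""
    return [list(codebook[k]) for k in indices]


# Hypothetical universal codebook shared by all speakers.
CODEBOOK = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]

indices = vq_encode([[0.9, 1.2], [-0.1, 0.05], [-0.8, 0.7]], CODEBOOK)
reconstructed = vq_decode(indices, CODEBOOK)
```

Only the indices need to be transmitted; because the codebook is shared and fixed, the decoder can rebuild the vectors without any speaker-specific side information.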


Network Information
Related Topics (5)
Signal processing: 73.4K papers, 983.5K citations, 86% related
Decoding methods: 65.7K papers, 900K citations, 84% related
Fading: 55.4K papers, 1M citations, 80% related
Feature vector: 48.8K papers, 954.4K citations, 80% related
Feature extraction: 111.8K papers, 2.1M citations, 80% related
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2023  38
2022  84
2021  70
2020  62
2019  77
2018  108