Topic

Speech coding

About: Speech coding is a research topic. Over its lifetime, 14,245 publications have been published within this topic, receiving 271,964 citations.


Papers
Patent
Yang Gao
24 Aug 1999
TL;DR: A multi-rate speech codec supports a number of encoding bit rate modes by adaptively selecting among them to match communication channel restrictions. To support the lower bit rate modes, a variety of techniques are applied, many of which involve the classification of the input signal.
Abstract: A multi-rate speech codec supports a number of encoding bit rate modes by adaptively selecting encoding bit rate modes to match communication channel restrictions. In higher bit rate encoding modes, an accurate representation of speech through CELP (code-excited linear prediction) and other associated modeling parameters are generated for higher quality decoding and reproduction. To achieve high quality in lower bit rate encoding modes, the speech encoder departs from the strict waveform matching criteria of regular CELP coders and strives to identify significant perceptual features of the input signal. To support lower bit rate encoding modes, a variety of techniques are applied, many of which involve the classification of the input signal. For each of the bit rate modes selected, a number of fixed or innovation sub-codebooks are selected for use in generating innovation vectors.

77 citations
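
The adaptive mode selection described in the abstract above can be illustrated with a minimal sketch. The mode rates and the function name `select_mode` are hypothetical, chosen only for illustration; the patent abstract does not list the actual rates or the selection rule. The sketch assumes the simplest policy: pick the highest bit rate mode that fits within the channel limit, and fall back to the lowest mode when none fits.

```python
from typing import Sequence

# Hypothetical set of encoding bit rate modes in kbit/s (not from the patent).
HYPOTHETICAL_MODES_KBPS = (4.0, 6.0, 8.0, 11.0)


def select_mode(channel_limit_kbps: float,
                modes: Sequence[float] = HYPOTHETICAL_MODES_KBPS) -> float:
    """Return the highest mode rate that does not exceed the channel limit.

    If even the lowest mode exceeds the limit, return the lowest mode,
    since the codec still has to pick something to transmit.
    """
    feasible = [rate for rate in modes if rate <= channel_limit_kbps]
    if not feasible:
        return min(modes)
    return max(feasible)


if __name__ == "__main__":
    for limit in (2.0, 9.0, 20.0):
        print(f"channel limit {limit} kbit/s -> mode {select_mode(limit)} kbit/s")
```

A real codec would likely also weigh signal classification and channel error conditions, which this sketch leaves out.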

Proceedings Article
01 Jan 1997
TL;DR: In this paper, a review of algorithms for perceptually transparent coding of high-fidelity (CD-quality) digital audio signals is presented, including algorithms which manipulate transform components and subband signal decompositions.
Abstract: Considerable research has been devoted to the development of algorithms for perceptually transparent coding of high-fidelity (CD-quality) digital audio. As a result, many algorithms have been proposed and several have now become international and/or commercial product standards. This paper reviews algorithms for perceptually transparent coding of CD-quality digital audio, including both research and standardization activities. First, psychoacoustic principles are described with the MPEG psychoacoustic signal analysis model 1 discussed in some detail. Then, we review methodologies which achieve perceptually transparent coding of FM- and CD-quality audio signals, including algorithms which manipulate transform components and subband signal decompositions. The discussion concentrates on architectures and applications of those techniques which utilize psychoacoustic models to exploit efficiently masking characteristics of the human receiver. Several algorithms which have become international and/or commercial standards are also presented, including the ISO/MPEG family and the Dolby AC-3 algorithms. The paper concludes with a brief discussion of future research directions.

77 citations

Proceedings Article
17 Sep 2000
TL;DR: An algorithm is presented that generates wideband speech from narrowband speech using as little as 500 bit/s of side information; the result has enhanced quality compared to narrowband speech.
Abstract: Wireless telephone speech is usually limited to the 300-3400 Hz band, which reduces its quality. There is thus a growing demand for wideband speech systems that transmit from 50 Hz to 8000 Hz. This paper presents an algorithm to generate wideband speech from narrowband speech using as little as 500 bit/s of side information. The 50-300 Hz band is predicted from the narrowband signal. A source-excitation model is used for the 3400-8000 Hz band, where the excitation is extrapolated at the receiver, and the spectral envelope is transmitted. Though some artifacts are present, the resulting wideband speech has enhanced quality compared to narrowband speech.

77 citations

Proceedings Article
Juin-Hwey Chen
03 Apr 1990
TL;DR: A high-quality 16-kb/s speech coder which has a one-way coding delay of less than 2 ms is presented, and formal subjective tests indicate that this coder produces high-quality speech comparable to that of the CCITT G.721 32-kb/s ADPCM standard.
Abstract: A high-quality 16-kb/s speech coder which has a one-way coding delay of less than 2 ms is presented. The coder is basically a backward-adaptive version of the code-excited linear prediction (CELP) coder. The low coding delay is achieved by using a backward-adaptive predictor and gain and by using an excitation vector size as small as five samples. The pitch predictor in conventional CELP coders is eliminated, and the linear predictive coding (LPC) predictor order is increased from 10 to 50. The excitation gain is updated by a tenth-order adaptive logarithmic gain predictor. This log-gain predictor and the LPC predictor are updated by performing LPC analysis on previous log-gain and coded speech, respectively. The excitation codebook is closed-loop optimized and the codebook index is Gray-coded to improve the robustness against channel errors. Formal subjective tests indicate that this coder produces high-quality speech comparable to that of the CCITT G.721 32-kb/s ADPCM standard.

77 citations
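
Two figures in the abstract above lend themselves to a short worked example: the bit budget per excitation vector and the Gray coding of the codebook index. The sketch below assumes an 8 kHz sampling rate (typical for telephone speech, but not stated in the abstract) and uses the standard binary-reflected Gray code purely as an illustration; the paper's actual index assignment is not given in the abstract and may differ.

```python
BIT_RATE_BPS = 16_000      # from the abstract: 16 kb/s
VECTOR_SIZE = 5            # from the abstract: five samples per excitation vector
SAMPLE_RATE_HZ = 8_000     # assumption: telephone-band sampling rate

# Bits available for each excitation vector, and how long one vector lasts.
bits_per_vector = BIT_RATE_BPS * VECTOR_SIZE // SAMPLE_RATE_HZ
vector_duration_ms = 1000 * VECTOR_SIZE / SAMPLE_RATE_HZ


def to_gray(index: int) -> int:
    """Binary-reflected Gray code of a non-negative integer."""
    return index ^ (index >> 1)


def from_gray(code: int) -> int:
    """Inverse of to_gray: XOR together all right shifts of the code."""
    index = 0
    while code:
        index ^= code
        code >>= 1
    return index


if __name__ == "__main__":
    print(f"{bits_per_vector} bits per vector, {vector_duration_ms} ms per vector")
    for i in range(4):
        print(i, format(to_gray(i), f"0{bits_per_vector}b"))
```

Neighbouring indices differ in exactly one bit after Gray coding. If the codebook is ordered so that neighbouring indices hold similar codevectors, that property is one way to limit the damage a single-bit channel error can do.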

Journal Article
TL;DR: A new method is described for quantifying the quality degradation introduced by wide-band speech codecs via a one-dimensional impairment factor, based on auditory listening-only tests, which may be used for predicting speech quality in an instrumental way.
Abstract: A new method is described for quantifying the quality degradation introduced by wide-band speech codecs via a one-dimensional impairment factor. The method is based on auditory listening-only tests, but the resulting impairment factors may be used for predicting speech quality in an instrumental way, e.g., for network planning purposes. Following the method, auditory test results are first transformed to an overall quality rating scale, and then adjusted to rule out test-specific effects. The derived impairment factors fit into the common framework which is defined by the E-model for narrow-band telephone networks, and which is hereby extended towards wide-band speech transmission. This paper presents the necessary auditory test data, describes the derivation and adjustment methodology, and provides numerical values for a range of wide-band speech codecs. The values are tested for their robustness in the case of codec tandems and adjusted to represent the effects of packet loss.

77 citations
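
To show how an impairment factor of the kind described in the abstract above is used, here is a minimal sketch of the narrow-band E-model relationship from ITU-T G.107: a codec's impairment factor is subtracted from a base transmission rating R, and R is then mapped to an estimated mean opinion score (MOS). The base rating of 93.2 is the G.107 default with all other impairments left at their defaults, and the impairment value of 20 is a made-up example, not a value from the paper. The wide-band extension proposed in the paper uses its own scale and values, which the abstract does not give.

```python
DEFAULT_R0 = 93.2  # narrow-band G.107 rating with all parameters at default


def rating(impairment_factor: float, base_rating: float = DEFAULT_R0) -> float:
    """Transmission rating R after subtracting a codec impairment factor.

    Simplification: all other impairments are assumed to be already
    folded into base_rating.
    """
    return base_rating - impairment_factor


def r_to_mos(r: float) -> float:
    """Map a narrow-band rating R to an estimated MOS (ITU-T G.107 mapping)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    mos = 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6
    # The raw polynomial dips marginally below 1 for very small R, so clamp.
    return max(1.0, mos)


if __name__ == "__main__":
    hypothetical_ie = 20.0  # illustrative impairment factor, not from the paper
    print(round(r_to_mos(rating(0.0)), 2))
    print(round(r_to_mos(rating(hypothetical_ie)), 2))
```

Because impairment factors are additive on the R scale, a network planner can compare codecs by subtraction before converting to MOS, which is the property the abstract relies on when it fits the new factors into the E-model framework.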


Network Information
Related Topics (5)
Signal processing: 73.4K papers, 983.5K citations, 86% related
Decoding methods: 65.7K papers, 900K citations, 84% related
Fading: 55.4K papers, 1M citations, 80% related
Feature vector: 48.8K papers, 954.4K citations, 80% related
Feature extraction: 111.8K papers, 2.1M citations, 80% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    38
2022    84
2021    70
2020    62
2019    77
2018    108