Topic
Adaptive Multi-Rate audio codec
About: Adaptive Multi-Rate audio codec is a research topic. Over its lifetime, 1467 publications have appeared within this topic, receiving 19736 citations. The topic is also known as AMR and Adaptive Multi-Rate.
Papers
••
TL;DR: A single channel PCM codec is described which is being manufactured for use in the new British Post Office range of digital PABXs and meets all the relevant CCITT Recommendations with significant operating margins.
Abstract: A single channel PCM codec is described which is being manufactured for use in the new British Post Office range of digital PABXs. The codec works on the principle of converting to and from PCM using an intermediate delta sigma modulation code format. This technique allows relatively simple analog circuitry to be used in conjunction with a digital LSI chip to perform the conversion between the intermediate code and PCM. The codec meets all the relevant CCITT Recommendations with significant operating margins.
37 citations
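The conversion path in the abstract above (analog signal to an intermediate delta-sigma code, then to PCM) can be illustrated with a minimal sketch. It assumes a first-order modulator and a plain block-averaging decimator; the paper does not specify its modulator order, filter, or companding law, so the function names and parameters here are hypothetical and the output is linear PCM rather than CCITT companded PCM.

```python
def delta_sigma_modulate(samples):
    """First-order delta-sigma modulator (assumed structure, not the paper's).

    Input samples must lie in (-1, 1); output is a list of +1.0 / -1.0 values.
    """
    integrator = 0.0
    feedback = 0.0
    bits = []
    for x in samples:
        # Integrate the difference between the input and the previous output bit.
        integrator += x - feedback
        feedback = 1.0 if integrator >= 0.0 else -1.0
        bits.append(feedback)
    return bits


def decimate_to_pcm(bits, osr, n_bits=8):
    """Average each block of `osr` bits and quantize to signed linear PCM."""
    full_scale = 2 ** (n_bits - 1) - 1
    pcm = []
    for start in range(0, len(bits) - osr + 1, osr):
        mean = sum(bits[start:start + osr]) / osr
        pcm.append(round(mean * full_scale))
    return pcm
```

For a constant input of 0.25, the density of +1 bits settles so that the average of the bit stream tracks the input, and each decimated PCM word lands near 0.25 of full scale. A practical codec would use a proper decimation filter instead of a block average.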
••
TL;DR: This new audio codec allows efficient transform-domain audio indexing for three different applications, namely beat tracking, chord recognition, and musical genre classification and is compared with the two standard MP3 and AAC codecs in terms of performance and computation time.
Abstract: Indexing audio signals directly in the transform domain can potentially save a significant amount of computation when working on a large database of signals stored in a lossy compression format, without having to fully decode the signals. Here, we show that the representations used in standard transform-based audio codecs (e.g., MDCT for AAC, or hybrid PQF/MDCT for MP3) have a sufficient time resolution for some rhythmic features, but a poor frequency resolution, which prevents their use in tonality-related applications. Alternatively, a recently developed audio codec based on a sparse multi-scale MDCT transform has a good resolution both for time- and frequency-domain features. We show that this new audio codec allows efficient transform-domain audio indexing for three different applications, namely beat tracking, chord recognition, and musical genre classification. We compare results obtained with this new audio codec and the two standard MP3 and AAC codecs, in terms of performance and computation time.
37 citations
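The time and frequency resolution trade-off mentioned in the abstract above follows from the MDCT frame length: a frame of 2N samples yields N coefficients spaced fs/(2N) apart. The following is a minimal sketch of a sine-windowed MDCT with overlap-add reconstruction, written as a direct O(N^2) sum for clarity. It is not the sparse multi-scale transform of the paper, and all function names are illustrative.

```python
import math


def sine_window(n_half):
    """Sine window of length 2N; satisfies w[n]^2 + w[n+N]^2 = 1."""
    return [math.sin(math.pi / (2 * n_half) * (n + 0.5)) for n in range(2 * n_half)]


def _basis(n, k, n_half):
    return math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))


def mdct(frame, window):
    """2N windowed samples -> N coefficients."""
    n_half = len(frame) // 2
    return [
        sum(frame[n] * window[n] * _basis(n, k, n_half) for n in range(2 * n_half))
        for k in range(n_half)
    ]


def imdct(coeffs, window):
    """N coefficients -> 2N windowed, time-aliased samples."""
    n_half = len(coeffs)
    return [
        window[n] * (2.0 / n_half)
        * sum(coeffs[k] * _basis(n, k, n_half) for k in range(n_half))
        for n in range(2 * n_half)
    ]


def analyze_synthesize(signal, n_half):
    """Run MDCT/IMDCT on 50%-overlapped frames and overlap-add the result."""
    window = sine_window(n_half)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n_half + 1, n_half):
        frame = signal[start:start + 2 * n_half]
        for n, v in enumerate(imdct(mdct(frame, window), window)):
            out[start + n] += v
    return out


def bin_width_hz(sample_rate, n_half):
    """Spacing between adjacent MDCT coefficients in Hz."""
    return sample_rate / (2.0 * n_half)
```

Samples covered by two overlapping frames are reconstructed exactly (up to rounding) because the aliasing terms cancel; the first and last N samples are not. Doubling N halves `bin_width_hz` but doubles the time span of each frame, which is the trade-off a multi-scale transform tries to sidestep.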
••
TL;DR: Two-stage noise feedback coding (TSNFC) as discussed by the authors combines two predictors into a single composite predictor and derives appropriate filters for use in a conventional single-stage NFC codec structure.
Abstract: Codec structures are described for achieving two-stage prediction and two-stage noise spectral shaping at the same time, resulting in a Two-Stage Noise Feedback Coding (TSNFC) method. One approach combines two predictors into a single composite predictor and derives appropriate filters for use in a conventional single-stage NFC codec structure. Another approach duplicates a conventional single-stage NFC codec structure in a nested manner, thereby decoupling the operations of the long-term prediction and long-term noise spectral shaping from the operations of the short-term prediction and short-term noise spectral shaping.
37 citations
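The first approach in the abstract above, folding two predictors into one composite predictor, can be sketched as follows. The sketch assumes the composite is formed by cascading the short-term and long-term prediction error filters, so that 1 - P(z) = (1 - Ps(z))(1 - Pl(z)), and that the long-term predictor has a single tap; the abstract does not spell out the derivation, and the function names are hypothetical.

```python
def convolve(a, b):
    """Polynomial multiplication of two coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def error_filter(pred):
    """Predictor taps (pred[i] weights x[n-1-i]) -> A(z) = 1 - P(z)."""
    return [1.0] + [-p for p in pred]


def composite_predictor(short_pred, pitch_lag, pitch_gain):
    """Combine a short-term predictor and a one-tap long-term predictor.

    Assumes the cascade 1 - P(z) = (1 - Ps(z)) * (1 - Pl(z)).
    """
    long_pred = [0.0] * (pitch_lag - 1) + [pitch_gain]
    a = convolve(error_filter(short_pred), error_filter(long_pred))
    return [-c for c in a[1:]]


def fir(coeffs, x):
    """Causal FIR filter with zero initial state."""
    return [
        sum(c * x[n - i] for i, c in enumerate(coeffs) if n - i >= 0)
        for n in range(len(x))
    ]
```

Filtering a signal with `error_filter(composite_predictor(...))` gives the same residual as applying the short-term and long-term error filters one after the other, which is what lets a single-stage structure stand in for the two-stage one under this assumption.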
••
TL;DR: This article is an overview of the standardization, architecture, and performance of the new ITU-T Recommendation G.718, an embedded variable bit rate codec providing a scalable solution for compression of 8 and 16 kHz sampled speech and audio signals at rates between 8 kb/s and 32 kb/s.
Abstract: This article is an overview of the standardization, architecture, and performance of the new ITU-T Recommendation G.718. G.718 is an embedded variable bit rate codec providing a scalable solution for compression of 8 and 16 kHz sampled speech and audio signals at rates between 8 kb/s and 32 kb/s. It comprises five layers where higher-layer bitstreams can be discarded without affecting the lower layers' decoding. The codec also has an optional core layer interoperable with ITU-T G.722.2 (3GPP AMR-WB) at 12.65 kb/s. G.718 was designed to provide high speech quality at low bit rates and to be robust to significant rates of frame erasures or packet losses. It is also targeting good quality for generic audio at higher rates.
36 citations
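The embedded property described in the abstract above, where higher layers can be dropped without affecting lower-layer decoding, amounts to keeping a prefix of each frame. The sketch below assumes cumulative layer rates of 8, 12, 16, 24, and 32 kb/s and a 20 ms frame; these values are recalled from the Recommendation rather than stated in the abstract, and the function names are hypothetical.

```python
# Assumed cumulative bit rates of the five layers, in bits per second.
LAYER_RATES_BPS = [8000, 12000, 16000, 24000, 32000]
# Assumed frame duration in milliseconds.
FRAME_MS = 20


def cumulative_bits_per_frame():
    """Number of bits in a frame when decoding up to each layer."""
    return [rate * FRAME_MS // 1000 for rate in LAYER_RATES_BPS]


def truncate_frame(frame_bits, max_rate_bps):
    """Keep the longest layer prefix whose rate does not exceed max_rate_bps."""
    usable = [
        bits
        for bits, rate in zip(cumulative_bits_per_frame(), LAYER_RATES_BPS)
        if rate <= max_rate_bps
    ]
    if not usable:
        raise ValueError("max_rate_bps is below the core layer rate")
    return frame_bits[:usable[-1]]
```

A network node can call `truncate_frame` on a full 32 kb/s frame to fit a narrower link without re-encoding, since the remaining prefix is still a valid lower-rate frame under these assumptions.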
••
07 May 2001 TL;DR: A new technique for 16 kHz wideband speech and audio coding, whereby analysis and synthesis are performed using a linear phase gammatone filter bank, based upon well-known models of the auditory system, is highly scalable, and has moderate complexity.
Abstract: Considerable research attention has been directed towards speech and audio coding algorithms capable of producing high quality coded speech and audio; however, few of these use signal representations that account for temporal as well as spectral detail. This paper presents a new technique for 16 kHz wideband speech and audio coding, whereby analysis and synthesis are performed using a linear phase gammatone filter bank. The outputs of these critical band filters are processed to obtain a series of pulse trains that represent neural firing. Auditory masking is then applied to reduce the number of pulses, producing a more compact time-frequency parameterization. The critical band gains and pulse amplitudes and positions are then coded using a combination of non-uniform quantization, arithmetic coding and vector quantization. This coding paradigm produces high quality coded speech and audio, is based upon well-known models of the auditory system, is highly scalable, and has moderate complexity.
36 citations
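The critical band filters in the abstract above can be illustrated with the standard gammatone impulse response. This is a minimal sketch of one causal filter, using the commonly cited Glasberg and Moore ERB formula and a bandwidth factor of 1.019; the paper's linear phase variant and its exact parameters are not given in the abstract, so everything here is an assumption.

```python
import math


def erb_hz(fc):
    """Equivalent rectangular bandwidth at centre frequency fc (Hz)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)


def gammatone_ir(fc, fs, duration=0.025, order=4):
    """Peak-normalized gammatone impulse response sampled at fs.

    g(t) = t^(order-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t), with b = 1.019 * ERB(fc).
    """
    b = 1.019 * erb_hz(fc)
    n_samples = round(duration * fs)
    ir = []
    for k in range(n_samples):
        t = k / fs
        envelope = t ** (order - 1) * math.exp(-2.0 * math.pi * b * t)
        ir.append(envelope * math.cos(2.0 * math.pi * fc * t))
    peak = max(abs(v) for v in ir)
    return [v / peak for v in ir]


def filter_bank(centre_freqs, fs):
    """One impulse response per centre frequency."""
    return {fc: gammatone_ir(fc, fs) for fc in centre_freqs}
```

Convolving the input with each response in `filter_bank` gives the band signals from which a pulse-based representation could be derived; a linear phase design would need symmetric or forward-backward filtering, which this sketch does not attempt.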