Topic
Adaptive Multi-Rate audio codec
About: Adaptive Multi-Rate audio codec is a research topic. Over its lifetime, 1467 publications have appeared within this topic, receiving 19736 citations. The topic is also known as: AMR & Adaptive Multi-Rate.
Papers
21 Oct 1996
TL;DR: A speech coding/decoding algorithm that combines the MBE and LPC speech models is proposed; it can operate at 2.4 kbps with much higher synthesised-speech quality than LPC-10e and lower computational complexity than CELP, VSELP, and similar coders.
Abstract: A speech coding/decoding algorithm that combines the MBE and LPC speech models is proposed. In this model, the spectral envelope is represented using Linear Prediction Coefficients, which are coded as Line Spectrum Frequencies (LSFs). It can operate at 2.4 kbps with much higher synthesised-speech quality than LPC-10e and lower computational complexity than CELP, VSELP, and similar coders. It is therefore particularly attractive for VLSI implementation.
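As a rough illustration of the linear-prediction step mentioned in the abstract above, the sketch below estimates LPC coefficients with the autocorrelation method and the Levinson-Durbin recursion. It is not the paper's implementation: the function names, the prediction order, and the toy input signal are assumptions made for illustration, and the conversion to LSFs and the MBE excitation model are left out.

```python
def autocorrelation(signal, max_lag):
    """Biased autocorrelation r[0..max_lag] of a finite frame."""
    n = len(signal)
    return [
        sum(signal[i] * signal[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve the LPC normal equations.

    Returns (a, err) with a[0] == 1, where the predictor is
    x[n] ~= -sum(a[k] * x[n - k] for k in 1..order)
    and err is the final prediction-error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        # Update the lower-order coefficients, then append the new one.
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


if __name__ == "__main__":
    # Toy frame (assumption): a decaying exponential, i.e. a first-order
    # autoregressive signal with pole at 0.9.
    frame = [0.9 ** n for n in range(200)]
    coeffs, residual_energy = levinson_durbin(autocorrelation(frame, 2), 2)
    print(coeffs, residual_energy)
```

For this toy frame the recovered coefficients should come out close to [1, -0.9, 0], since a single pole at 0.9 fully describes the signal. A real coder would window each speech frame, use a higher order (10 is a common choice for narrowband speech), and then quantise the coefficients in the LSF domain.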
12 May 2011
TL;DR: This work focuses on optimizing the C code by implementing the Basic Operators (BASOP) in assembly, which significantly reduces the complexity and execution time of the G.711.1 codec on an ARM processor.
Abstract: The ITU-T wideband speech codec G.711.1 [1] extends and interoperates with the widely used narrowband G.711 codec. This allows G.711.1 to be deployed efficiently over existing G.711-based VoIP to provide higher voice quality. However, the processing capacity of commercial processors is very limited for implementing a wideband codec, so the codec needs to be optimized for real-time processing. This work focuses on optimizing the C code by implementing the Basic Operators (BASOP) [2] in assembly, which significantly reduces the complexity and execution time of the G.711.1 codec on an ARM processor.
04 Jun 2023
TL;DR: This article proposes an open-source, streamable, real-time neural audio codec that achieves strong performance along all three axes: it can reconstruct highly natural sounding 48 kHz speech signals while operating at only 12 kbps and running with less than 6 ms (GPU)/10 ms (CPU) latency.
Abstract: A good audio codec for live applications such as telecommunication is characterized by three key properties: (1) compression, i.e. the bitrate that is required to transmit the signal should be as low as possible; (2) latency, i.e. encoding and decoding the signal needs to be fast enough to enable communication without or with only minimal noticeable delay; and (3) reconstruction quality of the signal. In this work, we propose an open-source, streamable, and real-time neural audio codec that achieves strong performance along all three axes: it can reconstruct highly natural sounding 48 kHz speech signals while operating at only 12 kbps and running with less than 6 ms (GPU)/10 ms (CPU) latency. An efficient training paradigm is also demonstrated for developing such neural audio codecs for real-world scenarios. Both objective and subjective evaluations using the VCTK corpus are provided. To sum up, AudioDec is a well-developed plug-and-play benchmark for audio codec applications.
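To put the compression figure quoted above in perspective, the short sketch below works out what 12 kbps at a 48 kHz sampling rate means per sample. The comparison against uncompressed audio assumes 16-bit mono PCM as the reference, which the abstract does not state; the function names are also illustrative only.

```python
def bits_per_sample(bitrate_bps, sample_rate_hz):
    """Average number of coded bits spent on each input sample."""
    return bitrate_bps / sample_rate_hz


def compression_ratio(bitrate_bps, sample_rate_hz, pcm_bits=16, channels=1):
    """Ratio of raw PCM bitrate to coded bitrate.

    pcm_bits and channels describe the assumed uncompressed reference
    (16-bit mono here); they are not taken from the abstract.
    """
    raw_bps = sample_rate_hz * pcm_bits * channels
    return raw_bps / bitrate_bps


if __name__ == "__main__":
    print(bits_per_sample(12_000, 48_000))    # 0.25 bits per sample
    print(compression_ratio(12_000, 48_000))  # 64.0 under the 16-bit mono assumption
```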
07 Jul 2022
TL;DR: The authors present Neural End-2-End Speech Codec (NESC), a robust, scalable end-to-end neural speech codec for high-quality wideband speech coding at 3 kbps.
Abstract: Neural networks have proven to be a formidable tool for tackling the problem of speech coding at very low bit rates. However, the design of a neural coder that can be operated robustly under real-world conditions remains a major challenge. Therefore, we present Neural End-2-End Speech Codec (NESC), a robust, scalable end-to-end neural speech codec for high-quality wideband speech coding at 3 kbps. The encoder uses a new architecture configuration, which relies on our proposed Dual-PathConvRNN (DPCRNN) layer, while the decoder architecture is based on our previous work, Streamwise-StyleMelGAN. Our subjective listening tests on clean and noisy speech show that NESC is particularly robust to unseen conditions and signal perturbations.
16 Apr 1990
TL;DR: A 4.8 kb/s voice codec based on a coding algorithm using pitch-synchronous DFT (discrete Fourier transform) is described, and adaptive postspectral emphasis was shown to be as effective as adaptive postfiltering for enhancing the output signal quality.
Abstract: A 4.8 kb/s voice codec based on a coding algorithm using pitch-synchronous DFT (discrete Fourier transform) is described. This algorithm combines waveform repetition with pitch-synchronous DFT spectral coding in which the phase information and nonharmonic spectral components are ignored. This implementation introduces adaptive postspectral emphasis for the enhancement of the reproduced speech quality. The prototype codec uses three Fujitsu MB86232 digital signal processor (DSP) chips. The reproduced speech signal quality was evaluated, and the performance of adaptive postspectral emphasis was compared with that of adaptive postfiltering. The reproduced speech was found to be good. Pitch-synchronous DFT thus expresses the power spectra well. Adaptive postspectral emphasis was shown to be as effective as adaptive postfiltering for enhancing the output signal quality. This does not add complexity to frequency-domain coding.