
Showing papers on "Adaptive Multi-Rate audio codec" published in 1998


Patent
09 Jun 1998
TL;DR: A method and apparatus for enhancing source coding systems, employing bandwidth reduction (101) prior to or in the encoder, followed by spectral-band replication (105) at the decoder.
Abstract: The present invention proposes a new method and apparatus for the enhancement of source coding systems. The invention employs bandwidth reduction (101) prior to or in the encoder (103), followed by spectral-band replication (105) at the decoder (107). This is accomplished by the use of new transposition methods, in combination with spectral envelope adjustments. Reduced bitrate at a given perceptual quality or an improved perceptual quality at a given bitrate is offered. The invention is preferably integrated in a hardware or software codec, but can also be implemented as a separate processor in combination with a codec. The invention offers substantial improvements practically independent of codec type and technological progress.

488 citations


Journal ArticleDOI
TL;DR: This work chronicles the development of rate-distortion theory and provides an overview of its influence on the practice of lossy source coding.
Abstract: Lossy coding of speech, high-quality audio, still images, and video is commonplace today. However, in 1948, few lossy compression systems were in service. Shannon introduced and developed the theory of source coding with a fidelity criterion, also called rate-distortion theory. For the first 25 years of its existence, rate-distortion theory had relatively little impact on the methods and systems actually used to compress real sources. Today, however, rate-distortion theoretic concepts are an important component of many lossy compression techniques and standards. We chronicle the development of rate-distortion theory and provide an overview of its influence on the practice of lossy source coding.

213 citations


Patent
TL;DR: An audio signal is decomposed into lower and upper sub-bands, and at least the noise component of the upper sub-band is encoded; the decoder uses a synthesised noise excitation signal and a filter to reproduce the noise component in the upper sub-band.
Abstract: An audio signal is decomposed into lower and upper sub-bands and at least the noise component of the upper sub-band is encoded. At the decoder the audio signal is synthesised by a decoding means which utilises a synthesised noise excitation signal and a filter to reproduce the noise component in the upper sub-band.

160 citations


Proceedings ArticleDOI
S.A. Ramprashad1
12 May 1998
TL;DR: A two-stage hybrid embedded speech/audio coding structure is proposed, in which a speech coder serves as the core, providing the minimal bitrate and acceptable performance on speech inputs, and a second-stage transform coder uses a modified discrete cosine transform and perceptual coding principles.
Abstract: A two stage hybrid embedded speech/audio coding structure is proposed. The structure uses a speech coder as a core to provide the minimal bitrate and an acceptable performance on speech inputs. The second stage is a transform coder using a modified discrete cosine transform (MDCT) and perceptual coding principles. This stage is itself embedded both in complexity and bitrate, and provides various levels of enhancement of the core output, particularly for general audio signals like music. Informal A-B comparison tests show that the performance of the structure at 16 kb/s is between that of the GSM enhanced full rate coder at 12.2 kb/s, and the G.728 LD-CELP coder at 16 kb/s.

69 citations


PatentDOI
TL;DR: An audio coder/decoder suited to real-time applications because of its reduced computational complexity, together with a novel adaptive sparse vector quantization (ASVQ) scheme and algorithms for general-purpose data quantization; the codec provides low bit-rate compression for music and speech and is also applicable to higher bit-rate audio compression.
Abstract: An audio coder/decoder ("codec") that is suitable for real-time applications due to reduced computational complexity, and a novel adaptive sparse vector quantization (ASVQ) scheme and algorithms for general purpose data quantization. The codec provides low bit-rate compression for music and speech, while being applicable to higher bit-rate audio compression. The codec includes an in-path implementation of psychoacoustic spectral masking, and frequency domain quantization using the novel ASVQ scheme and algorithms specific to audio compression. More particularly, the inventive audio codec employs frequency domain quantization with critically sampled subband filter banks to maintain time domain continuity across frame boundaries. The input audio signal is transformed into the frequency domain in which in-path spectral masking can be directly applied. This in-path spectral masking usually results in sparse vectors. The ASVQ scheme is a vector quantization algorithm that is particularly effective for quantizing sparse signal vectors. In the preferred embodiment, ASVQ adaptively classifies signal vectors into six different types of sparse vector quantization, and performs quantization accordingly. The ASVQ technique applies to general purpose data quantization as well as to quantization in the context of audio compression. The invention also includes a "soft clipping" algorithm in the decoder as a post-processing stage. The soft clipping algorithm preserves the waveform shapes of the reconstructed time domain audio signal in a frame- or block-oriented stateless manner while maintaining continuity across frame or block boundaries. The invention includes related methods, apparatus, and computer programs.

68 citations


Patent
25 Sep 1998
TL;DR: An Internet telephony gateway and a method for operating it are disclosed. The gateway has a port supporting a predefined maximum number of audio data channels and enough processing throughput to run a first, high-quality audio codec on a subset of those channels, or a second, lower-quality codec on a greater number of them.
Abstract: An Internet telephony gateway and method for operating a gateway are disclosed. The gateway is designed with a port to support a predefined maximum number of audio data channels. The gateway contains sufficient processing throughput to operate a first, high quality audio codec on a subset of the channels. However, this throughput is sufficient to operate a second, lower quality audio codec on a greater number of the channels, preferably all of them. The first and second codecs are designed to produce compressed audio data streams that are interoperably decompressable. In operation, the gateway host processor assigns new calls to either the first or second codec, depending on the current traffic being handled by the gateway. If new calls would result in the gateway's processing throughput being exceeded, the host processor may reassign a channel from the first codec to the second codec in order to create processing headroom for the addition of a new channel. Because the codecs are interoperably decompressable, no renegotiation need occur with the far end of the communication channel when a codec is reassigned. This gateway offers the potential for high-quality communication over the maximum number of channels possible, with a natural degradation as the gateway reaches its full channel capacity, using modest processing resources.
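The assignment policy described in this abstract can be sketched as follows. This is only a minimal illustration under stated assumptions, not the patented method: the `Gateway` class, the per-channel costs `HQ_COST` and `LQ_COST`, and the capacity figure are hypothetical, chosen so that the high-quality codec fits on a subset of the channels while the lower-quality codec fits on all of them.

```python
from dataclasses import dataclass, field

# Hypothetical per-channel processing costs in arbitrary units (not from the patent).
HQ_COST = 4  # first, high-quality codec
LQ_COST = 2  # second, lower-quality codec with an interoperable bit stream


@dataclass
class Gateway:
    max_channels: int
    capacity: int  # total processing throughput available
    calls: dict = field(default_factory=dict)  # call id -> "HQ" or "LQ"

    def load(self) -> int:
        """Processing throughput currently in use."""
        return sum(HQ_COST if c == "HQ" else LQ_COST for c in self.calls.values())

    def add_call(self, call_id: str) -> str:
        if len(self.calls) >= self.max_channels:
            raise RuntimeError("no free channel")
        # Prefer the high-quality codec while throughput allows.
        if self.load() + HQ_COST <= self.capacity:
            self.calls[call_id] = "HQ"
            return "HQ"
        # Otherwise reassign existing HQ calls to LQ until a new LQ call fits.
        # No renegotiation is modelled, since the streams are assumed interoperable.
        for cid, codec in self.calls.items():
            if self.load() + LQ_COST <= self.capacity:
                break
            if codec == "HQ":
                self.calls[cid] = "LQ"
        if self.load() + LQ_COST > self.capacity:
            raise RuntimeError("insufficient throughput")
        self.calls[call_id] = "LQ"
        return "LQ"
```

With `Gateway(max_channels=4, capacity=8)`, the first two calls receive the high-quality codec, and later calls trigger reassignments until all four channels run the lower-quality codec, which mirrors the "natural degradation" the abstract mentions.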

62 citations


Proceedings Article
01 Jan 1998
TL;DR: It is observed that by selectively combining the cepstral streams representing the LPC parameters and the residual signal it is possible to obtain recognition accuracy directly from the coded parameters that equals or exceeds the recognition accuracy obtained from the reconstructed waveforms.
Abstract: Speech coding affects speech recognition performance, with recognition accuracy deteriorating as the coded bit rate decreases. Virtually all systems that recognize coded speech reconstruct the speech waveform from the coded parameters, and then perform recognition (after possible noise and/or channel compensation) using conventional techniques. In this paper we compare the recognition accuracy of coded speech obtained by reconstructing the speech waveform with the speech recognition accuracy obtained when using cepstral features derived from the coding parameters. We focus our efforts on speech that has been coded using the 13-kbps full-rate GSM codec, a Regular Pulse Excited Long Term Prediction (RPE-LTP) codec. The GSM codec develops separate representations for the linear prediction (LPC) filter and the residual signal components of the coded speech. We measure the effects of quantization and coding on the accuracy with which these parameters are represented, and present two different methods for recombining them for speech recognition purposes. We observe that by selectively combining the cepstral streams representing the LPC parameters and the residual signal it is possible to obtain recognition accuracy directly from the coded parameters that equals or exceeds the recognition accuracy obtained from the reconstructed waveforms.

56 citations


Proceedings ArticleDOI
A. Uvliden1, S. Bruhn, R. Hagen
01 Nov 1998
TL;DR: This work reviews the general AMR system concept, discusses the capacity and quality benefits that can be achieved, and describes an example solution for GSM including speech coding, channel coding, inband signaling, and the adaptation scheme.
Abstract: Adaptive multi-rate (AMR) is an emerging speech service currently being standardized in the ETSI for the GSM system. The new AMR standard will be flexible by adapting the error protection level and the allocated radio resources. A trade-off between speech quality and system capacity can be achieved for a variety of radio channel and operating conditions. The adaptation of the protection level will be fast and speech service specific. Besides the basic source and channel codec for speech signal payload, the AMR system concept further includes channel state tracking and inband transmission of adaptation data. We review the general AMR system concept and discuss the capacity and quality benefits that can be achieved. An example solution for GSM is described including speech coding, channel coding, inband signaling, and the adaptation scheme.

33 citations


Proceedings ArticleDOI
12 May 1998
TL;DR: This paper describes a wideband (7 kHz) speech compression scheme operating at a bit rate of 13.0 kbit/s, i.e. 0.8 bit per sample, using a split-band technique, where the 0-6 kHz band is critically subsampled and coded by an ACELP approach.
Abstract: This paper describes a wideband (7 kHz) speech compression scheme operating at a bit rate of 13.0 kbit/s, i.e. 0.8 bit per sample. We apply a split-band (SB) technique, where the 0-6 kHz band is critically subsampled and coded by an ACELP approach. The high frequency signal components (6-7 kHz) are generated by an improved high-frequency-resynthesis (HFR) at the decoder such that no additional information has to be transmitted. In informal listening tests, the subjective speech quality was rated to be comparable to the CCITT G.722 wideband codec at 48 kbit/s.
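The figure of 0.8 bit per sample follows from dividing the bit rate by the sampling rate. The abstract does not state the sampling rate; the short sketch below assumes 16 kHz, the usual choice for 7 kHz wideband speech, and the function name `bits_per_sample` is illustrative only.

```python
def bits_per_sample(bit_rate_bps: float, sample_rate_hz: float) -> float:
    """Average number of coded bits spent on each input sample."""
    return bit_rate_bps / sample_rate_hz


# Assumption: 16 kHz sampling (not stated in the abstract).
SAMPLE_RATE_HZ = 16_000

proposed = bits_per_sample(13_000, SAMPLE_RATE_HZ)  # 13.0 kbit/s scheme
g722 = bits_per_sample(48_000, SAMPLE_RATE_HZ)      # G.722 at 48 kbit/s

print(f"proposed: {proposed} bit/sample, G.722: {g722} bit/sample")
```

Under that assumption the proposed scheme spends 0.8125 bit per sample, which the abstract rounds to 0.8, against 3 bits per sample for G.722 at 48 kbit/s.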

29 citations



PatentDOI
TL;DR: A method for providing full-duplex audio communication using a half-duplex audio circuit in an audio communication system, comprising the steps of configuring an idle state, a listen state, and a talk state, receiving an event from the half-duplex audio circuit, and transitioning between states in response to the event.
Abstract: The present invention discloses a method for providing full-duplex audio communication utilizing a half-duplex audio circuit in an audio communication system. The method comprises the steps of: (1) configuring an idle state, a listen state, and a talk state; (2) receiving an event triggered by one of an incoming speech, an outgoing speech, and a talk request from the half-duplex audio circuit; and (3) transitioning from one of the states to any one of the states in response to the event to provide full duplex communication.
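The three states and three events named in this abstract map naturally onto a table-driven state machine. The sketch below is an illustration only: the abstract does not give the transition mapping, so every entry in `TRANSITIONS` is a hypothetical choice, and the names `State`, `Event`, and `step` are not from the patent.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    LISTEN = auto()
    TALK = auto()


class Event(Enum):
    INCOMING_SPEECH = auto()
    OUTGOING_SPEECH = auto()
    TALK_REQUEST = auto()


# Hypothetical transition table: the abstract names the states and events
# but not which event leads to which state.
TRANSITIONS = {
    (State.IDLE, Event.INCOMING_SPEECH): State.LISTEN,
    (State.IDLE, Event.OUTGOING_SPEECH): State.TALK,
    (State.IDLE, Event.TALK_REQUEST): State.TALK,
    (State.LISTEN, Event.TALK_REQUEST): State.TALK,
    (State.TALK, Event.INCOMING_SPEECH): State.LISTEN,
}


def step(state: State, event: Event) -> State:
    """Return the next state; pairs not in the table leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


# Example: the far end speaks, then the local user requests the floor.
state = State.IDLE
for event in (Event.INCOMING_SPEECH, Event.TALK_REQUEST):
    state = step(state, event)
print(state.name)
```

Returning to the idle state is not modelled here, because the abstract lists no silence or timeout event that would trigger it.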

Proceedings ArticleDOI
12 May 1998
TL;DR: T-tests show that the proposed speech codec, based on multi-pulse CELP (MP-CELP) coding and convolutional coding algorithms for the ETSI adaptive multi-rate (AMR) standard, meets about 80% of the seventeen requirements selected from the AMR standard study report.
Abstract: This paper proposes a speech codec based on the multi-pulse based CELP (MP-CELP) coding and convolutional coding algorithms for the ETSI adaptive multi-rate (AMR) standard. The codec operates at several speech coding rates, maintaining a fixed gross rate including speech and channel coding for the full-rate (FR) and half-rate (HR) channel modes. MP-CELP makes it easy to change the speech coding rate by controlling parameters such as the number of pulses. Subjective tests show that the proposed AMR codec in the FR channel mode achieves higher performance than the enhanced FR codec, and that the proposed codec in the HR channel mode gives coding quality comparable to that of the full-rate codec, by selecting an optimal coding rate for each channel condition. T-tests based on the test results also show that the proposed speech codec meets about 80% of the seventeen requirements, which are selected from the AMR standard study report. Therefore, the proposed codec is promising for the AMR standard.

Proceedings ArticleDOI
12 May 1998
TL;DR: This paper describes a multi-rate codec family developed as a potential candidate for the GSM adaptive multi-rate (AMR) codec standard, consisting of the GSM enhanced full rate (EFR) codec and lower bit-rate extensions thereof.
Abstract: This paper describes a multi-rate codec family developed as a potential candidate for the GSM adaptive multi-rate (AMR) codec standard. The codec family consists of the GSM enhanced full rate (EFR) codec and lower bit-rate extensions thereof. The codec family consists of several codecs, i.e., modes that have different bit-rate partitionings between source coding and error protection. All the source codecs use the same ACELP-method (algebraic code excited linear predictive coding) used also in the GSM EFR codec. The codec operates at gross bit-rates of 22.8 kbit/s in the GSM full rate (FR) channel and 11.4 kbit/s in the GSM half rate (HR) channel. In the full rate channel, the codec provides improved error robustness over the GSM enhanced full rate (EFR) codec. It extends wireline quality (equal to or better than G.726-32 ADPCM) to poor channel error conditions with low C/I-ratios of 7 dB or even below. When operated in the half rate channel, the codec provides improved channel capacity while still providing wireline quality at high C/I-ratios above 16-19 dB.

Patent
21 May 1998
TL;DR: An improved telecommunication system supports both an enhanced audio transmission mode and a conventional PCM waveform encoding mode; the enhanced mode is governed by an audio coding protocol, while the PCM mode is governed by a PCM protocol such as μ-law encoding.
Abstract: An improved telecommunication system is capable of supporting an enhanced audio transmission mode and a conventional PCM waveform encoding mode. The enhanced audio transmission mode is governed by an audio coding protocol, while the PCM mode is governed by a PCM protocol such as μ-law encoding. The telecommunication system performs an in-band signaling routine during a first communication session in accordance with the PCM protocol. The in-band signaling routine employs a form of robbed bit signaling to transmit information between the calling codec and the called codec. The signaling information is utilized to determine whether the called codec is compatible with the enhanced audio coding mode and, as necessary, to initiate the transition between the PCM mode and the audio coding mode. The audio coding mode transmits signals using a wider bandwidth than that used during the PCM mode. The use of a wider bandwidth results in a higher quality sound that better resembles person-to-person speech.

Proceedings ArticleDOI
08 Sep 1998
TL;DR: The aim is to show the gain provided by an AMR system compared with an existing GSM system using second generation EFR and HR (half rate) coders, and show that there is a trade-off between capacity increase and speech quality degradation.
Abstract: The AMR (adaptive multi-rate) is an emerging speech codec cellular standard in the ETSI. This standard should be ready during as a speech GSM evolution. It is a new concept for achieving a high speech quality maintaining an efficient spectrum usage. According to the channel quality and the traffic load, the radio resource algorithm allocates a half-rate or a full-rate channel in order to obtain the best balance between quality and capacity. Within this channel, the codec is quickly adapted to track changes in the radio link. An AMR system model has been developed to show the impact on speech quality by varying the capacity from only full-rate channels to only half-rate channels. The aim is also to show the gain provided by an AMR system compared with an existing GSM system using second generation EFR (enhanced full rate) and HR (half rate) coders. The results show that there is a trade-off between capacity increase and speech quality degradation. It is also very clear that there is a potential gain in quality by using AMR compared to existing speech codecs in GSM systems.

Proceedings ArticleDOI
Schuyler Quackenbush1
12 May 1998
TL;DR: This paper presents an overview of the MPEG-4 natural audio coding framework and each of its component coding techniques.
Abstract: MPEG-4 standardizes natural audio coding at bit rates ranging from 2 kbit/s, suitable for intelligible speech coding, to 64 kbit/s per channel, suitable for high-quality audio coding. Within this range, three categories of coding are defined: parametric coding, code excited linear predictive coding (CELP) and time/frequency (T/F) coding. The unique contribution of MPEG-4 audio is that not only does it scale across a wide range of bit rates, but it also scales across a broad set of other parameters, such as sampling rate, bandwidth, voice pitch and complexity. This paper presents an overview of the MPEG-4 natural audio coding framework and each of its component coding techniques.

Proceedings ArticleDOI
18 Nov 1998
TL;DR: The Robust Audio Tool (RAT) is presented, methods of real-time multimedia delivery are discussed, issues of particular importance for music transmission over the Internet are identified, and Internet performance is illustrated in terms of packet loss and variable transit delays.
Abstract: The Robust Audio Tool (RAT) allows users to achieve real-time multiway communication over the Internet. It was initially intended for use in multiway conferences, but is also being used as an Internet audio broadcast application by radio stations in the US and elsewhere. RAT can also be used in a point-to-point manner, and as a transcoder between networks of differing capabilities, e.g. for mobile access to the Internet. The emphasis of work in RAT has been on maximising the audio quality despite inherent problems of packet transport, processor scheduling and audio capabilities of the end system. The important features of RAT, in comparison to other Internet audio tools, are that it supports multirate processing, has no restrictions on audio frame duration, and supports multi-channel audio and both fixed and variable size audio frames. We discuss methods of real-time multimedia delivery, and identify issues of particular importance for music transmission over the Internet. For music coding researchers interested in using RAT to exploit their research, we present an overview of the architecture of the RAT and specifically focus on codec integration. Finally, we present some off-line performance measurements of a public domain MPEG1 music codec that has been integrated into the RAT, and illustrate the Internet performance in terms of packet loss and variable transit delays.

Proceedings ArticleDOI
Y. Naito1, I. Kuroda
12 May 1998
TL;DR: Fast algorithms, such as a fast motion estimation algorithm and a low complexity noise reduction filter, are proposed to implement an H.263 video codec on a single DSP chip while maintaining sufficient picture quality; on a 50 MIPS, 100 mW DSP, the codec encodes and decodes 7.5 QCIF frames per second.
Abstract: This paper describes an H.263 video codec implementation based on a low power consumption general purpose DSP. Fast algorithms, such as a fast motion estimation algorithm and a low complexity noise reduction filter, are proposed to implement the video codec on a single DSP chip maintaining sufficient picture quality. By using a 50 MIPS, 100 mW DSP, the developed codec encodes and decodes 7.5 QCIF frames per second, which is sufficient performance for low bit-rate video compression, typically below 64 kbps.

Patent
19 Oct 1998
TL;DR: In this article, a network-based CODEC (coder-decoder) includes an echo canceler, which detects the presence of a data call by detecting predefined signaling portions of a modem handshaking process and uses a stored channel model for performing echo cancellation during the data call.
Abstract: A network-based CODEC (coder-decoder) includes an echo canceler. This CODEC recognizes the presence of a data call by detecting predefined signaling portions of a modem handshaking process. For each detected data call, the CODEC uses a stored channel model for performing echo cancellation during the data call. The CODEC trains off-line during selected segments of the modem call and then stores the new channel model for use in a future data call.

Patent
18 Dec 1998
TL;DR: An encoder for compressing video data to allow for its transmission over a narrow bandwidth is described. The encoder comprises a multiformat video codec for real-time compression of digital data and a dynamic random access memory which operates as a temporary storage device, storing compressed data while the codec is compressing data.
Abstract: An encoder for compressing video data to allow for its transmission over a narrow bandwidth. The encoder comprises a multiformat video codec for real-time compression of digital data and a dynamic random access memory which operates as a temporary storage device, storing compressed data while the codec is compressing data. A digital signal processor adjusts the data compression ratio for the codec while the codec is compressing video data. An EPROM, which is connected to the digital signal processor, contains the software to run the digital signal processor. A programmable gate array operates as an interface between the codec and an external processor. The array includes a read write controller which provides a read signal to the codec to allow compressed video data to be read from the codec to a parallel to serial shift register within the array. The write control signals which allow data to be written into and shifted through the register are also generated by the read write controller. The array includes a FIFO flush data controller which is used to flush data from a FIFO within the codec whenever the codec supplies a service request signal to the programmable gate array. The service request signal is provided to the array whenever an overflow condition is about to occur within the FIFO of the codec.

Proceedings ArticleDOI
R.C.F. Tucker1
18 Nov 1998
TL;DR: This work proposes encoding just the noise component of the upper frequency band of the original signal using about 500 bits/sec, which greatly enhances contemporary music and close-microphone speech, but has little effect on classical music.
Abstract: There are now a number of applications, most notably streamed Internet audio, which require audio and speech to be encoded at a low bit rate, typically 16 kbit/sec or below. To achieve an acceptable quality, the original signal is normally low-pass filtered to somewhere between 4 and 5.5 kHz before encoding. Rather than discard the upper frequency band completely, we propose encoding just the noise component of it using about 500 bits/sec. This greatly enhances contemporary music and close-microphone speech, but has little effect on classical music. The process can be used to enhance any audio or speech codec, knowing only its encoding/decoding delay.

Proceedings Article
01 Sep 1998
TL;DR: An extremely low delay perceptual audio codec is presented based on warped linear prediction which inherently utilizes auditory frequency resolution and frequency masking characteristics of hearing using backward adaptive lattice methods.
Abstract: In this paper an extremely low delay perceptual audio codec is presented. The codec is based on warped linear prediction which inherently utilizes auditory frequency resolution and frequency masking characteristics of hearing. In the current version of the codec the coding delay is the minimum. This is achieved using backward adaptive lattice methods where waveform modeling is completely based on already transmitted data. The coding technique is applied separately to the two channels, but the quantization processes are unified to gain more bit rate reduction.

Journal ArticleDOI
TL;DR: Two fast algorithms for the time-frequency transform, one for memory economization and the other for time domain subsampling, are presented, and the current performance status of the AC-3 decoder is stated.
Abstract: AC-3 audio coding technology is a kind of perceptual audio coder (PAC) developed by the Dolby Company. Up to 5 full-bandwidth channels and one subwoofer channel (cutoff at 120 Hz) are available in AC-3 to provide multi-channel, low bit rate, and high perceptual quality of audio. This explains why AC-3 has been adopted as the audio format in many international standards. In this paper, we focus on the real-time software implementation issues of AC-3. Two fast algorithms for the time-frequency transform, one for memory economization and the other for time domain subsampling, are presented. Meanwhile, we state the current performance status of our AC-3 decoder.

Proceedings ArticleDOI
30 Mar 1998
TL;DR: A simple, lossless audio codec called AudioPaK is presented, which uses only a small number of integer arithmetic operations on both the coder and the decoder side and performs as well as, or even better than, most lossless audio codecs.
Abstract: We designed a simple, lossless audio codec, called AudioPaK, which uses only a small number of integer arithmetic operations on both the coder and the decoder side. The main operations of this codec are polynomial prediction and Golomb-Rice coding, and are done on a frame basis. Our coder performs as well as, or even better than, most lossless audio codecs.
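The two operations named in this abstract, polynomial prediction and Golomb-Rice coding, can be illustrated with a short integer-only sketch. It is not AudioPaK's actual implementation: the fixed predictors of orders 0 to 3 are the standard polynomial ones (computed by repeated differencing), samples before the frame are assumed to be zero, and the function names, the order-selection rule, and the bit-string output format are all choices made for this example.

```python
def residuals(frame, order):
    """Residual of the fixed polynomial predictor of the given order.

    Order p is equivalent to differencing the frame p times; samples before
    the frame start are taken as zero (a simplification for this sketch).
    """
    e = list(frame)
    for _ in range(order):
        e = [e[0]] + [e[i] - e[i - 1] for i in range(1, len(e))]
    return e


def reconstruct(res, order):
    """Invert residuals() by taking a running sum `order` times."""
    x = list(res)
    for _ in range(order):
        acc, out = 0, []
        for v in x:
            acc += v
            out.append(acc)
        x = out
    return x


def best_order(frame, max_order=3):
    """Pick the order with the smallest sum of absolute residuals."""
    return min(range(max_order + 1),
               key=lambda p: sum(abs(v) for v in residuals(frame, p)))


def zigzag(e):
    """Map signed integers to non-negative ones: 0, -1, 1, -2, ... -> 0, 1, 2, 3, ..."""
    return 2 * e if e >= 0 else -2 * e - 1


def unzigzag(u):
    return u >> 1 if u % 2 == 0 else -((u + 1) >> 1)


def rice_encode(values, k):
    """Golomb-Rice code: unary quotient, a '0' terminator, then k remainder bits."""
    bits = []
    for e in values:
        u = zigzag(e)
        q, r = u >> k, u & ((1 << k) - 1)
        bits.append("1" * q + "0")
        if k:
            bits.append(format(r, f"0{k}b"))
    return "".join(bits)


def rice_decode(bits, k, count):
    out, pos = [], 0
    for _ in range(count):
        q = 0
        while bits[pos] == "1":
            q += 1
            pos += 1
        pos += 1  # skip the '0' terminator
        r = int(bits[pos:pos + k], 2) if k else 0
        pos += k
        out.append(unzigzag((q << k) | r))
    return out


# Round trip on one small frame.
frame = [10, 12, 14, 15, 15, 14]
order = best_order(frame)
res = residuals(frame, order)
code = rice_encode(res, k=2)
assert reconstruct(rice_decode(code, 2, len(res)), order) == frame
```

Everything here uses only integer additions, subtractions, shifts, and masks, which is consistent with the low arithmetic complexity the abstract claims; how the real codec chooses the Rice parameter `k` per frame is not covered by this sketch.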


Proceedings ArticleDOI
Wen Xu1
08 Nov 1998
TL;DR: The basic idea is to convert the residual redundancy of the source encoded parameters into the bit redundancy such that it can be more efficiently utilized in the channel decoding and the resulting parameters are less vulnerable to digital errors.
Abstract: The optimal binary mappings for converting the signal redundancy of the zero-th order (nonuniformity) and the first order (correlation) into individual bits are described. By employing a mapping matched to the residual redundancy inherent in the source-encoded parameters further gains can be obtained in the joint source-channel coding. The basic idea is to convert the residual redundancy of the source encoded parameters into the bit redundancy such that it can be more efficiently utilized in the channel decoding and the resulting parameters are less vulnerable to digital errors. The approach is successfully applied to the GSM full rate (FR) codec to achieve a more reliable transmission of speech signals.

Dissertation
01 Jan 1998
TL;DR: The MPEG standard is improved and enhanced by introducing new algorithmic and architectural enhancements while staying compliant with the standard, and a new, real-time lossless audio codec is designed and implemented, which is optimized for Internet transmission because of its low instruction complexity and good compression performance.
Abstract: The focus of this thesis is the development of novel and practical algorithms to encode, transmit, and decode in real-time digital compact disc quality audio over the Internet. More precisely, we improve and enhance the widely accepted international MPEG audio standard, which defines a lossy compression codec, and we innovate in the area of lossless audio coding, which we believe is likely to play an important part in audio transmission over the Internet in conjunction with the lossy technologies. We enhance the MPEG standard by introducing new algorithmic and architectural enhancements while staying compliant with the standard. Also, we design a decoding process so that it can adapt to varying computational characteristics. Finally, we transcode the compressed bit stream for streaming over packet networks to several users through paths with heterogeneous characteristics. In the area of lossless audio compression, we survey and classify state-of-the-art lossless audio codecs, and we design and implement a new, real-time lossless audio codec (AudioPaK), which is optimized for Internet transmission because of its low instruction complexity and good compression performance.

Proceedings ArticleDOI
10 Feb 1998
TL;DR: A VLSI implementation of the H.324 audiovisual codec is described, using 0.35 μm CMOS 4LM technology; it contains a total of 420 K transistors and dissipates 224.32 mW from a single 3.3 V supply.
Abstract: A VLSI implementation of the H.324 audiovisual codec is described. A number of sophisticated low-power architectures have been devised specifically for mobile use. A set of specific functional units, each corresponding to a process of the H.263 video codec, is employed to relieve different performance bottlenecks. A compact DSP core composed of two MAC units is used for both the ACELP and MP-MLQ coding schemes of the G.723.1 speech codec. The proposed audiovisual codec core has been implemented in 0.35 μm CMOS 4LM technology; it contains a total of 420 K transistors and dissipates 224.32 mW from a single 3.3 V supply.


Patent
23 Nov 1998
TL;DR: An interpolation digital filter for an audio CODEC uses a bit-serial method with a 256 FS clock signal to convert a 32-bit data signal at a sampling frequency of 1 FS into a 32-bit data signal at 8 FS, reducing system size and cost.
Abstract: An interpolation digital filter for an audio CODEC uses a bit serial method for an audio CODEC system with a clock signal of 256 FS. The interpolation digital filter converts a 32-bit data signal of sampling frequency of 1 FS to a 32-bit data signal of the sampling frequency of 8 FS using a clock signal of 256 FS in a filter unit. Therefore, the present invention reduces the size of the system and reduces the cost.
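The clock figures in this abstract imply a simple cycle budget, worked through in the sketch below. The reading that one bit of each output word is handled per clock cycle is an inference from the numbers, not something the abstract states, and the function name is illustrative.

```python
def clocks_per_output_sample(clock_multiple: int, output_rate_multiple: int) -> int:
    """Clock cycles available per output sample, with both rates given in units of FS."""
    return clock_multiple // output_rate_multiple


WORD_LENGTH_BITS = 32  # from the abstract: 32-bit data signals

cycles = clocks_per_output_sample(256, 8)  # 256 FS clock, 8 FS output rate
print(cycles, cycles == WORD_LENGTH_BITS)
```

A 256 FS clock and an 8 FS output rate leave 32 clock cycles per output sample, which matches the 32-bit word length and so would allow a bit-serial data path to process one bit per cycle.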