
Showing papers on "Adaptive Multi-Rate audio codec" published in 2004


01 Dec 2004
TL;DR: This memo defines an Experimental Protocol for the Internet community that enables graceful speech quality degradation in the case of lost frames, which occurs in connection with lost or delayed IP packets.
Abstract: This document specifies a speech codec suitable for robust voice communication over IP. The codec is developed by Global IP Sound (GIPS). It is designed for narrow band speech and results in a payload bit rate of 13.33 kbit/s for 30 ms frames and 15.20 kbit/s for 20 ms frames. The codec enables graceful speech quality degradation in the case of lost frames, which occurs in connection with lost or delayed IP packets. This memo defines an Experimental Protocol for the Internet community.

104 citations
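
As a quick check on the figures quoted in the abstract above, the payload size of one frame follows from multiplying the bit rate by the frame duration. The sketch below is only illustrative arithmetic: the function name is hypothetical, and rounding to the nearest bit is an assumption made because 13.33 kbit/s is itself a rounded figure.

```python
def payload_bits_per_frame(bit_rate_kbps: float, frame_ms: float) -> int:
    """Return the payload size of one frame in bits, rounded to the nearest bit."""
    # kbit/s multiplied by ms gives bits, because the factors of 1000 cancel.
    return round(bit_rate_kbps * frame_ms)


# The two operating points quoted in the abstract.
for rate_kbps, frame_ms in [(13.33, 30), (15.20, 20)]:
    bits = payload_bits_per_frame(rate_kbps, frame_ms)
    print(f"{rate_kbps} kbit/s at {frame_ms} ms -> {bits} bits ({bits // 8} bytes)")
```

Under these assumptions the 30 ms frame carries 400 bits (50 bytes) and the 20 ms frame carries 304 bits (38 bytes).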


Proceedings ArticleDOI
17 May 2004
TL;DR: The paper presents a novel scalable audio coding scheme where the bitrates vary continuously between a minimal and a maximal value, allowing free modification of the bitrate.
Abstract: Networks are getting more and more heterogeneous. Scalable codecs are especially suited for such a context as they permit the bitrate to be lowered in a simple way, at any point of the transmission, for adaptation to network conditions and to terminal capacities. Classically, scalable codecs are organised in layers and scalability is obtained by sending more or fewer layers to the decoder. The obtained granularity depends on the layer sizes, and the available bitrates are fixed and limited in number. The paper presents a novel scalable audio coding scheme where the bitrates vary continuously between a minimal and a maximal value, allowing free modification of the bitrate. With this novel approach, all bitrates are valid; sending even one more bit results in a different output signal with statistically growing quality. Test results show that this method provides quality as good as or even better than that of a non-scalable version.

60 citations


Proceedings ArticleDOI
20 Jun 2004
TL;DR: The results show that the new combined QoS control method achieved the best performance under different network congestion conditions compared to separate adaptive sender rate or packet priority marking method and the use of an objective MOS as the control parameter for the sender rate adaptation improves the overall perceived speech quality.
Abstract: Quality of service (QoS) control is an important issue in voice over IP (VoIP) applications because of the need to meet technical and commercial requirements. The main objective of this paper is to propose a new QoS control scheme that combines the strengths of adaptive rate and speech priority marking QoS control techniques to provide a superior QoS control performance, in terms of perceived speech quality. A second objective is to propose the use of an objective measure of perceived speech quality (i.e. objective MOS score) for adaptive control of sender behaviour as this provides a direct link to user-perceived speech quality, unlike individual network impairment parameters (e.g. packet loss and/or delay). Our results show that the new combined QoS control method achieved the best performance under different network congestion conditions compared to separate adaptive sender rate or packet priority marking method. Our results also show that the use of an objective MOS as the control parameter for the sender rate adaptation improves the overall perceived speech quality. The results reported here are based on a simulation platform that integrates DiffServ enabled NS-2 network simulator, a real speech codec (AMR codec) and the ITU-T standard speech quality evaluation tool (PESQ).

50 citations


Proceedings ArticleDOI
01 Jan 2004
TL;DR: This paper describes a real time prototype implementation of a system, which enables secure voice and data communication over the GSM voice channel, and a proprietary speech codec is used on the real time data channel to produce communication quality speech.
Abstract: This paper describes a real time prototype implementation of a system, which enables secure voice and data communication over the GSM voice channel. The security of GSM is not guaranteed, especially over the core network. The proposed system modulates digital data, which may be encrypted, onto speech-like waveforms. The modulated waveform is then transmitted over the GSM voice channel and can be demodulated and decrypted at the receiver. The real time prototype system has been tested on GSM-to-GSM voice calls, and a proprietary speech codec is used on the real time data channel to produce communication quality speech. A demonstration will be provided at the presentation. (4 pages)

49 citations


Journal ArticleDOI
TL;DR: A region-based video codec is presented, which is compatible with the H.263+ standard, and its associated rate control algorithm for low variable-bit-rate (VBR) video, which incorporates traditional block DCT coding as well as object-based coding.
Abstract: This paper presents a region-based video codec, which is compatible with the H.263+ standard, and its associated rate control algorithm for low variable-bit-rate (VBR) video. The proposed region-based coding scheme is a hybrid method that incorporates traditional block DCT coding as well as object-based coding. To achieve this, we adopt H.263+ as the platform, and develop a fast macroblock-based segmentation method to implement the new region-based codec. The associated rate control solution includes rate control in three levels: encoding frame selection, frame-layer rate control and macroblock-layer rate control. The goal is to improve human visual perceptual quality at low bit rates. The efficiency of the proposed rate control algorithm applied to the region-based H.263+ codec is demonstrated via several typical test sequences.

47 citations


01 Jan 2004
TL;DR: This paper shows the efficacy of enhancing a standards-compliant base layer codec (H.264/AVC) with a PRISM refinement bit stream, resulting in an overall robust spatio-temporal scalable video codec that is backward-compatible with the existing standards-based baseline codec.
Abstract: A video coding paradigm called PRISM (Power-efficient, Robust, hIgh compression, Syndrome-based Multimedia coding) built on distributed source coding principles has been recently proposed in [1]. In this paper, we study a scalable version of PRISM which addresses both spatial and temporal scalability. The proposed codec inherits the main attributes of the PRISM architecture, namely increased robustness to channel losses and more flexible sharing of computational complexity between encoder and decoder, while adding scalability as an additional feature. In this paper, we show the efficacy of enhancing a standards-compliant base layer codec (H.264/AVC) with a PRISM refinement bit stream, resulting in an overall robust spatio-temporal scalable video codec that is backward-compatible with the existing standards-based baseline codec.

33 citations


Journal Article
TL;DR: This paper describes the delay sources and magnitude of the most common audio codecs and thus provides a guideline for the choice of the most suitable codec for a given application.
Abstract: Digital audio processing has been revolutionized by perceptual audio coding in the past decade. The main parameter to benchmark different codecs is the audio quality at a certain bit-rate. For many applications, however, delay is another key parameter which varies between only a few and hundreds of milliseconds depending on the algorithmic properties of the codec. Latest research results in low delay audio coding can significantly improve the performance of applications such as communications, digital microphones, and wireless loudspeakers with lip synchronicity to a video signal. This paper describes the delay sources and magnitude of the most common audio codecs and thus provides a guideline for the choice of the most suitable codec for a given application.

32 citations


Patent
Jari Mäkinen, Janne Vainio
03 Aug 2004
TL;DR: In this article, a method for multi-rate encoding in a communication system is presented, which comprises the step of providing a codec with sets of tuning parameters for use in selection of codec modes.
Abstract: According to an embodiment of the invention there is provided a method for multi-rate encoding in a communication system. The method comprises the step of providing a codec with sets of tuning parameters for use in selection of codec modes. Each set of tuning parameters provides an average bit rate. A bit rate target is received for encoding a signal by the codec, the bit rate target having any value between the minimum and maximum average bit rate of the codec. An encoding mode is then selected based on the bit rate target and the sets of tuning parameters, and the signal is encoded by means of the selected encoding mode. A multi-rate codec comprising a selector for selecting an encoding mode from a set of encoding modes based on a bit rate target is also provided.

30 citations
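
The selection step described in the patent abstract above can be sketched as follows. This is a minimal illustration, not the patented method: the `TuningSet` class, the set names, the bit rate values, and the rule "pick the highest average bit rate that does not exceed the target" are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TuningSet:
    name: str            # hypothetical label for a set of tuning parameters
    avg_bit_rate: float  # average bit rate (kbit/s) this set yields


def select_tuning_set(target_kbps: float, sets: List[TuningSet]) -> TuningSet:
    """Pick the set with the highest average bit rate not exceeding the target."""
    lowest = min(s.avg_bit_rate for s in sets)
    highest = max(s.avg_bit_rate for s in sets)
    if not lowest <= target_kbps <= highest:
        raise ValueError("target must lie between the minimum and maximum average bit rate")
    eligible = [s for s in sets if s.avg_bit_rate <= target_kbps]
    return max(eligible, key=lambda s: s.avg_bit_rate)


# Made-up tuning sets; real values would come from the codec design.
SETS = [
    TuningSet("set_a", 6.0),
    TuningSet("set_b", 8.0),
    TuningSet("set_c", 10.5),
    TuningSet("set_d", 12.2),
]

print(select_tuning_set(9.0, SETS).name)
```

Other selection rules (nearest neighbour, or mixing two sets over time to hit the target on average) would fit the same interface.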


Proceedings ArticleDOI
17 May 2004
TL;DR: The paper describes the basic elements of the codec, points out envisaged applications, and gives an outline of the standardization process.
Abstract: Lossless coding is to become the latest extension of the MPEG-4 audio standard. In response to a call for proposals, many companies have submitted lossless audio codecs for evaluation. The codec of the Technical University of Berlin was chosen as reference model for MPEG-4 audio lossless coding (ALS), attaining working draft status in July 2003. The encoder is based on linear prediction, which enables high compression even with moderate complexity, while the corresponding decoder is straightforward. The paper describes the basic elements of the codec, points out envisaged applications, and gives an outline of the standardization process.

29 citations


Patent
Frank Berreth
04 Aug 2004
TL;DR: In this article, the authors describe a system and methods for interfacing with codec(s) on an architecture optimized for audio, where a device driver accesses an application programming interface (API), which facilitates communications between the device driver and one or more codecs via a controller coupled to the codecs.
Abstract: Systems and methods for interfacing with codec(s) on an architecture optimized for audio are described. In one aspect, a device driver accesses an application programming interface (API). The API facilitates communications between the device driver and one or more codec(s) via a controller coupled to the codec(s). The codec(s) and the controller are implemented in an environment that is substantially optimized for audio. Such communication includes, for example, registering for event(s), transferring data to or from the codec(s), obtaining information about the capability of the codec(s), and/or managing bus or codec resources.

26 citations


Patent
05 Jan 2004
TL;DR: In this article, a system and method for channel decoding speech frames in a receiver capable of multiple (M) codec modes, wherein channel encoded speech frames include an inband bit portion and a speech portion, was presented.
Abstract: Disclosed is a system and method for channel decoding speech frames in a receiver capable of multiple (M) codec modes, wherein channel encoded speech frames include an inband bit portion and a speech portion. An inband bit decoder decodes the inband bit portion (700) of a received frame to obtain confidence levels associated with each of the M codec modes. Using these confidence levels, the codec modes are ordered from most to least likely. The speech frame is then decoded by a channel decoder using the most likely codec mode (704). A frame determination check (720) is performed to determine the quality of the decoded speech frame. If the decoded speech frame is determined to be of poor quality, then the channel decoding process is repeated using the next most likely codec mode (736) corresponding to the next highest inband bit decoding confidence level. This process is repeated until a good speech frame is decoded or an exit criterion is reached.
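
The retry loop in the abstract above can be sketched in a few lines. Everything named here is hypothetical: `channel_decode` and `frame_is_good` stand in for the channel decoder and the frame quality check, and `max_attempts` is one possible exit criterion, since the abstract does not say which criterion is used.

```python
from typing import Any, Callable, Optional, Sequence, Tuple


def decode_with_mode_fallback(
    confidences: Sequence[float],
    channel_decode: Callable[[int], Any],
    frame_is_good: Callable[[Any], bool],
    max_attempts: Optional[int] = None,
) -> Optional[Tuple[int, Any]]:
    """Try codec modes from most to least likely until a good frame is decoded."""
    # Order mode indices by inband-bit confidence, highest first.
    order = sorted(range(len(confidences)), key=lambda m: confidences[m], reverse=True)
    if max_attempts is not None:
        order = order[:max_attempts]  # assumed exit criterion: cap the number of tries
    for mode in order:
        frame = channel_decode(mode)
        if frame_is_good(frame):
            return mode, frame
    return None  # no mode produced an acceptable frame


# Toy stand-ins: only mode 2 yields a frame that passes the quality check.
result = decode_with_mode_fallback(
    confidences=[0.1, 0.7, 0.2],
    channel_decode=lambda mode: f"frame-{mode}",
    frame_is_good=lambda frame: frame == "frame-2",
)
print(result)
```

In this toy run mode 1 is tried first because it has the highest confidence, fails the check, and mode 2 is accepted on the second attempt.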

Journal ArticleDOI
TL;DR: This paper focuses on source coding; subjective mean opinion score methods, based on scores given by listeners, are used to measure speech perception in low bit rate coding.
Abstract: This paper describes the objective of speech coding: to achieve toll quality performance at minimum bit rate, to improve the efficiency of transmission and storage, to reduce cost, and to increase security and robustness in transmission. Speech coding can be achieved based on two facts: redundancy in speech signals, and the perception properties of human ears. The performance of speech coding is evaluated by speech quality, coding rate, algorithm complexity and delay. The two speech coding methods, waveform coding and parametric coding, use a synthetic model for speech analysis and reconstruction. This paper focuses on source coding; subjective mean opinion score methods, based on scores given by listeners, are used to measure speech perception in low bit rate coding.

Proceedings ArticleDOI
17 May 2004
TL;DR: The paper gives an overview of the source-controlled variable-rate multimode wideband (VMR-WB) codec and the methodologies that enable high quality wideband coding at average data rates ranging from TIA/EIA/IS-733 ADR (average data rate) to that of TIA/.
Abstract: The description and design of the source-controlled variable-rate multimode wideband (VMR-WB) codec recently selected by the 3rd Generation Partnership Project 2 (3GPP2) for the cdma2000® system in Rate-Set II are presented. The paper gives an overview of the codec and the methodologies that enable high quality wideband coding at average data rates ranging from TIA/EIA/IS-733 ADR (average data rate) to that of TIA/EIA/IS-127. The codec has three modes of operation at different average data rates and a fourth mode that is interoperable with 3GPP/AMR-WB (ITU-T G.722.2). Despite the interoperability constraint, the codec is capable of meeting the aggressive performance requirements through the use of novel techniques such as noise suppression, efficient signal classification, new coding types optimized for stable voiced and unvoiced frames, novel post-processing technique for periodicity enhancement in the lower frequency band, and improved frame erasure concealment mechanisms.

Proceedings ArticleDOI
17 May 2004
TL;DR: For GSM mobile radio networks employing the adaptive multi-rate (AMR) speech codec, a new instrumental and non-intrusive speech quality metric is proposed and a novel AMR mode switching procedure is introduced which outperforms the recommended method of the GSM standard.
Abstract: In the competitive environment of cellular radio networks, the speech quality experienced by the end user is of vital importance for the network operator. Therefore, the operator needs reliable measures for monitoring and controlling the speech quality. For GSM mobile radio networks employing the adaptive multi-rate (AMR) speech codec, a new instrumental and non-intrusive speech quality metric is proposed. 'Non-intrusive' means that the metric is based only on received transmission parameters, such as the frame erasure rate, and needs neither the original nor the transmitted speech samples. However, for the validation of a new metric, standardized intrusive speech quality measures like PESQ, link-level simulations and field measurements are used. Furthermore, for an improved control of speech quality, a novel AMR mode switching procedure is introduced which outperforms the recommended method of the GSM standard.

Patent
06 Dec 2004
TL;DR: In this paper, a speech coding/decoding apparatus and method is provided, in which the input speech signals are classified into several classes in accordance with characteristics of the inputs, and the inputs are coded using frame sizes, quantizer structures, and bit assignment methods corresponding to the determined classes.
Abstract: There is provided a speech coding/decoding apparatus and method, in which the input speech signals are classified into several classes in accordance with characteristics of the input speech signals and the input speech signals are coded using frame sizes, quantizer structures, and bit assignment methods corresponding to the determined classes, or in which the frame sizes can be adjusted in accordance with network conditions or the codec type of a counterpart. Therefore, by optimally adjusting the frame size, the quantizer structure, and the bit assignment method in accordance with the characteristics of input speech, it is possible to improve the performance of the speech coding apparatus, and by adjusting the frame size in accordance with the speech codec type of a counterpart, it is also possible to reduce the total end-to-end delay.

Patent
14 Oct 2004
TL;DR: In this paper, an apparatus for variable bitrate wideband speech and audio coding is described, which utilizes a speech and audio divider for dividing signals inputted to a CODEC into speech or audio signals.
Abstract: An apparatus for coding of variable bitrate wideband speech and audio is described. The apparatus utilizes: a) a speech and audio divider for dividing signals inputted to a CODEC into speech or audio signals; b) a narrowband coder for performing narrowband coding, in the case the divided input signals are speech signals; c) a bitrate modifier for modifying a bitrate for coding of low frequency band and a bitrate for coding of a high frequency band, in the case the divided input signals are audio signals; and d) a wideband coder for performing coding by the modified bitrate in the bitrate modifier.

Patent
02 Dec 2004
TL;DR: In this paper, a communication system in which a profile of codecs is loaded at the originating end is presented, and packets are coded and decoded using only the codec identified by a bit map that is available to the DSP channel at both the originating and the terminating end.
Abstract: A communication system in which a profile of codecs is loaded at the originating end. The originating end sends the profile of codecs to the terminating end and receives in return an indication of which codec to use. The originating end and the terminating end create a reduced profile that includes the codec identified by the terminating side and its peers. The peers are codecs that use the same amount of resources as the selected codec or that use fewer resources than the selected codec. Even though the reduced profile includes more than one codec, packets are coded and decoded using only the codec identified by a bit map that is available to the DSP channel at both the originating end and the terminating end. At a later time a change can be made to a different codec in the reduced profile by changing the bitmap and without closing and re-opening the channel.
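
A rough sketch of the reduced-profile idea described above, under stated assumptions: the codec names and resource costs are invented, a single integer stands in for "amount of resources", and the bitmap is modelled as one set bit indexing the active codec in the reduced profile. None of these details come from the patent itself.

```python
from typing import Dict, List


def reduced_profile(profile: Dict[str, int], selected: str) -> List[str]:
    """Keep the selected codec and every peer whose resource cost is no higher."""
    limit = profile[selected]
    return [name for name, cost in profile.items() if cost <= limit]


def bitmap_for(reduced: List[str], active: str) -> int:
    """Return a bitmap with exactly one bit set: the index of the active codec."""
    return 1 << reduced.index(active)


# Hypothetical codecs with made-up resource costs.
PROFILE = {"codec_a": 10, "codec_b": 25, "codec_c": 25, "codec_d": 40}

reduced = reduced_profile(PROFILE, "codec_b")    # codec chosen by the terminating end
bitmap = bitmap_for(reduced, "codec_b")          # both ends code with codec_b
switched = bitmap_for(reduced, "codec_a")        # later switch: only the bitmap changes
print(reduced, bin(bitmap), bin(switched))
```

Because every codec in the reduced profile fits within the resources already reserved for the selected one, switching among them only requires updating the bitmap, which matches the abstract's point about not closing and re-opening the channel.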

Patent
02 Jul 2004
TL;DR: In this paper, an encoder/decoder is configured to cause the coding/decoding algorithms used by the codec to dynamically adapt according to the available computational resources in response to actual complexity measurements performed at run-time.
Abstract: An encoder/decoder (codec) is configured to cause the coding/decoding algorithms used by the codec to dynamically adapt according to the available computational resources in response to actual complexity measurements performed at run-time. In a system in which multiple video codecs have to operate simultaneously in real-time, sharing the system's available resources, this invention provides a way to increase the number of simultaneous codecs that can co-exist. The principles of the invention are also applicable to an individual encoder and an individual decoder.

Proceedings ArticleDOI
17 May 2004
TL;DR: A new approach to reduce environmental background noise by modifying the codec parameters is discussed, which can be done as pre-processing before speech encoding or in the network by decoding the bitstream.
Abstract: The transmission of speech in mobile or packet networks requires the use of a speech codec. In order to improve the quality of speech in a noisy environment, a noise reduction algorithm is used. This noise reduction can either be done as pre-processing before speech encoding or in the network by decoding the bitstream, performing the speech enhancement in the time and/or frequency domain and re-encoding the speech. Both methods are computationally expensive. In this paper a new approach to reduce environmental background noise by modifying the codec parameters is discussed.

Proceedings ArticleDOI
J. Makinen, J. Vainio
05 Apr 2004
TL;DR: A source signal based rate adaptation algorithm for AMR codec in GSM system can be used to increase the system capacity and further increase the robustness of GSM AMR Codec.
Abstract: The adaptive multirate (AMR) codec was standardised for GSM in 1999. AMR offers substantial improvement over previous GSM speech codecs in error robustness by adapting speech and channel coding depending on channel conditions. However, the current standard does not exploit the multirate capability of the AMR codec in source signal based adaptation that would optimise the average bit-rate vs. quality trade-off. This paper presents a source signal based rate adaptation algorithm for the AMR codec in the GSM system. Together with fast power control, it can be used to increase the system capacity and further increase the robustness of the GSM AMR codec.

Proceedings ArticleDOI
07 Nov 2004
TL;DR: A wideband (44.1 kHz sampling rate) audio and speech coder is presented that combines two different strategies, namely, parametric and waveform coding, and is competitive, in terms of quality, with standardized coders tuned at a particular bit rate.
Abstract: In this paper we present a wideband (44.1 kHz sampling rate) audio and speech coder that combines two different strategies, namely, parametric and waveform coding. It is shown how this approach can be used to design a layered bit stream scalable coder offering a wide variety of decoding bit rates with little scalability loss. Moreover, the bit rates associated with the different layers are competitive, in terms of quality, with those of standardized coders (MP3, AAC) tuned at a particular bit rate.

Proceedings ArticleDOI
26 Sep 2004
TL;DR: Several optimization techniques are presented for efficient implementation of ITU G.729 standard (CS-ACELP, conjugate structure algebraic code excited linear prediction) of 8 Kbit/s bit rate on a real time digital signal processor (DSP), with the aim of overcoming the limitation of computational burden and also scaling this application for enhanced speed to process more channels.
Abstract: Spectral efficiency is the most important aspect in wireless communication systems and cellular mobile radio. As speech transmissions are the most used form of communications in the personal communication systems, the low bit rate speech codecs play an important role in determining the system's spectral efficiency. A toll quality low bit rate speech codec that was proposed to meet the personal communication system's requirement is the CS-ACELP speech codec. The speech codec has high robustness to withstand high bit error rates and performs well in tandeming conditions, hence leading to efficient bandwidth utilization and increased channel capacity. In this paper several optimization techniques are presented for efficient implementation of ITU G.729 standard (CS-ACELP, conjugate structure algebraic code excited linear prediction) of 8 Kbit/s bit rate on a real time digital signal processor (DSP), with the aim of overcoming the limitation of computational burden and also scaling this application for enhanced speed to process more channels. These techniques are in general applicable to any speech codec and DSP processor platform.

Patent
19 May 2004
TL;DR: In this article, two sets of channelization codes are defined, each adapted for use with a corresponding first spreading factor and second spreading factor, and one of the first and second AMR codec sets may be selected based on a comparison of a given number of available channels for a voice call to a threshold, so as to encode voice data for transmission over an air interface.
Abstract: In a method for improving capacity for voice users in a communication network and/or a method of avoiding channelization code starvation in the downlink when establishing a voice call with one or more users in the wireless network, two sets of channelization codes may be defined, each adapted for use with a corresponding first spreading factor and second spreading factor. One of the first and second AMR codec sets may be selected based on a comparison of a given number of available channelization codes for a voice call to a threshold, so as to encode voice data for transmission over an air interface.

Patent
12 Aug 2004
TL;DR: In this paper, a broadcast system dynamically selects the codec that performs the best for the audio in any given packet, and the packet produced by the encoder of the best-performing codec devices is selected to be broadcasted/transmitted.
Abstract: Several encoders at a broadcast system encode the same audio content. Packets from the resulting streams are immediately decoded and compared against the packets of the original audio stream. The broadcast system dynamically selects the codec that performs the best for the audio in any given packet. The packet produced by the encoder of the best-performing codec devices is selected to be broadcasted/transmitted.
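
The per-packet selection described above can be sketched with toy codecs. The assumptions are significant and worth stating: the "codecs" are plain uniform quantizers rather than real audio codecs, mean squared error is used as the comparison metric (the abstract does not name one), and all function and codec names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Codec = Tuple[Callable[[List[float]], List[int]], Callable[[List[int]], List[float]]]


def make_quantizer(step: float) -> Codec:
    """Build a toy (encode, decode) pair: a uniform quantizer with the given step."""
    def encode(samples: List[float]) -> List[int]:
        return [round(s / step) for s in samples]

    def decode(codes: List[int]) -> List[float]:
        return [c * step for c in codes]

    return encode, decode


def mean_squared_error(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def pick_best_codec(packet: List[float], codecs: Dict[str, Codec]) -> Tuple[str, List[int]]:
    """Encode with every codec, decode immediately, keep the closest match."""
    best = None
    for name, (encode, decode) in codecs.items():
        payload = encode(packet)
        error = mean_squared_error(packet, decode(payload))
        if best is None or error < best[2]:
            best = (name, payload, error)
    return best[0], best[1]


CODECS = {"step3": make_quantizer(3.0), "step4": make_quantizer(4.0)}

name, payload = pick_best_codec([0.0, 4.0, 8.0], CODECS)
print(name, payload)
```

A different packet can favour a different codec, which is the point of deciding per packet; a real system would also need to signal the chosen codec to the receiver and would likely weigh payload size as well as fidelity.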

Proceedings ArticleDOI
07 Nov 2004
TL;DR: This paper presents a new method for speech-codec detection providing reliable information about the perceived quality of a telephone link by specific decomposition of the short-time speech-spectrum into a harmonic and a noise component.
Abstract: This paper presents a new method for speech-codec detection providing reliable information about the perceived quality of a telephone link. Our specific decomposition of the short-time speech-spectrum into a harmonic and a noise component reveals codec-specific distortions of the speech signal. In conjunction with an appropriate classifier, we thus can identify five different codecs with an error rate of less than 8%.

Patent
Vivek Gupta
30 Jun 2004
TL;DR: In this paper, an electronic device is described in which audio codec nodes and audio application nodes communicate audio information with each other over a packet-based communications bus, and power is selectively controlled to selected ones of the audio codec nodes and the audio application nodes, based on whether or not a node is of use at a particular time.
Abstract: In an embodiment, an electronic device includes one or more audio codec nodes and one or more audio application nodes, which communicate audio information with each other over a communications bus. The communications bus is a packet-based bus, in an embodiment. Power is selectively controlled to selected ones of the audio codec node and the audio application nodes, based on whether or not a node is of use at a particular time.

Journal ArticleDOI
TL;DR: The research presented here investigates the application of the narrow-band adaptive multirate speech codec and the wide-band AMR (WB-AMR) codec, both originally designed for the 200 kHz GSM channel, in the TDMA (TIA/EIA-136) 30-kHz system.
Abstract: A new system enhancement method is proposed for the EIA/TIA-136 system offering both channel operational range extension and improved performance within the current operational range. The existing time-division multiple-access (TDMA) (136) speech codec, the IS-641 enhanced full rate vocoder, operates at a fixed bit rate and does not allow the reallocation of bits to channel error protection as channel conditions degrade. The research presented here investigates the application of the narrow-band adaptive multirate (NB-AMR) speech codec and the wide-band AMR (WB-AMR) codec, both originally designed for the 200 kHz GSM channel, in the TDMA (TIA/EIA-136) 30-kHz system. In particular, we investigate adaptively allocating bits between NB/WB speech coding and error control coding within the limited channel bandwidth. Four modes out of 17 have been carefully chosen for the new TDMA/AMR system. Switching between codec rates as channel conditions change produces range extension below a C/I of 15 dB while also improving performance in the existing operational range above 15 dB. We keep the time slot formats unchanged so that our method is completely compatible with existing 136 systems.

Proceedings ArticleDOI
25 Jul 2004
TL;DR: In this paper, a location-variable content-dependent watermark generating mechanism is proposed to detect where in the recording the content is being replaced, inserted, or deleted, and an attempt is made to store partial reconstruction data in the LSBs of excitation signals in the G.723.1 speech codec, so that the original speech content may be reconstructed after counterfeiting.
Abstract: The wide use of digital speech recorders becomes a serious matter when they are involved in assisting with court rulings. How to distinguish whether recorded content is valid becomes a life-or-death question. In light of this concern, the least significant bits (LSBs) of excitation signals are used as fragile watermarks in the hybrid speech vocoder. In addition, a location-variable content-dependent watermark generating mechanism is proposed. Such a location-variable content-based watermark allows users to detect where in the recording the content has been replaced, inserted, or deleted. Lastly, an attempt is made to store partial reconstruction data in the LSBs of excitation signals in the G.723.1 speech codec, so that the original speech content may be reconstructed after counterfeiting. The proposed system is demonstrated to be reliable, with test results showing that a recording with watermarks has a perceptual evaluation of speech quality (PESQ) value lower by 0.2, while the accuracy in detecting faked regions can be up to 97.45%.

Proceedings ArticleDOI
05 Oct 2004
TL;DR: The continuously variable slope delta (CVSD) speech coding method, which is widely used in Bluetooth applications, is presented; it is a nonlinear sampled data feedback system which accepts a band limited analog signal and encodes it into binary form for transmission through a digital channel.
Abstract: Speech coding is a procedure used to represent a digital speech signal by as few bits as possible, maintaining at the same time a reasonable level of speech quality. Due to increasing demand for speech communications, speech coding technology has received a lot of research interest. Speech coding is performed using numerous steps or operations specified as an algorithm. This paper presents the continuously variable slope delta (CVSD) speech coding method which is widely used in Bluetooth applications. It is a nonlinear sampled data feedback system which accepts a band limited analog signal and encodes it into binary form for transmission through a digital channel. At the receiver, the binary signal is decoded into a close approximation of the original analog signal. This coding technique is implemented in Matlab Simulink and the simulation results show that the coding has good noise immunity. Further, shown in this work is the fact that damaged packets are discarded straight away without having a serious impact on the speech quality, making it more suitable for two-way audio communications.
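
To make the feedback structure described above concrete, here is a simplified CVSD-style sketch in Python. It is not the paper's Simulink model and does not use the Bluetooth parameter values: the step limits, decay, leak factor, run length, and the additive step growth rule are all illustrative assumptions. The key idea it shows is that encoder and decoder run the same state update, so the decoder can rebuild the encoder's estimate from the bit stream alone.

```python
import math
from typing import List


class CvsdState:
    """Shared predictor state; encoder and decoder update it identically."""

    def __init__(self, step_min=0.01, step_max=0.5, step_decay=0.98, leak=0.97, run=3):
        # All parameter values are illustrative assumptions.
        self.step_min, self.step_max = step_min, step_max
        self.step_decay, self.leak, self.run = step_decay, leak, run
        self.estimate = 0.0      # leaky integrator output
        self.step = step_min     # current slope (step size)
        self.history: List[int] = []

    def update(self, bit: int) -> float:
        self.history = (self.history + [bit])[-self.run:]
        if len(self.history) == self.run and len(set(self.history)) == 1:
            # A run of identical bits signals slope overload: grow the step.
            self.step = min(self.step + self.step_min, self.step_max)
        else:
            # Otherwise let the step decay back toward its minimum.
            self.step = max(self.step * self.step_decay, self.step_min)
        delta = self.step if bit else -self.step
        self.estimate = self.leak * self.estimate + delta
        return self.estimate


def cvsd_encode(samples: List[float]) -> List[int]:
    """Emit one bit per sample: 1 if the input is at or above the running estimate."""
    state, bits = CvsdState(), []
    for sample in samples:
        bit = 1 if sample >= state.estimate else 0
        bits.append(bit)
        state.update(bit)
    return bits


def cvsd_decode(bits: List[int]) -> List[float]:
    """Replay the same state updates to rebuild the encoder's estimate."""
    state = CvsdState()
    return [state.update(bit) for bit in bits]


samples = [0.4 * math.sin(2 * math.pi * n / 64) for n in range(64)]
bits = cvsd_encode(samples)
decoded = cvsd_decode(bits)
```

Each sample costs exactly one bit, and the adaptive step lets the estimate follow steep signal segments that a fixed-step delta modulator would lag behind.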

Patent
Cheng-mean Liu, Noel R. Cross
14 Jun 2004
TL;DR: In this article, the authors describe systems and methods for parsing flexible audio codec topologies, in which codec topology information obtained from an audio codec includes a list of processing node connections and configuration default data for respective processing nodes associated with the audio codec.
Abstract: Systems and methods for parsing flexible audio codec topologies are described. In one aspect, codec topology information is obtained from an audio codec. The codec topology information includes a list of processing node connections and configuration default data for respective processing nodes associated with the audio codec. The codec topology information is evaluated to identify valid audio codec topologies based on a set of valid connection path criteria.