
Showing papers on "Adaptive Multi-Rate audio codec published in 2010"


Journal ArticleDOI
TL;DR: This work presents a coder (RCELP) that uses a generalization of the analysis-by-synthesis paradigm, which relaxes the waveform-matching constraints without affecting speech quality.
Abstract: At bit rates between 4 and 16 kbit/s, many state-of-the-art speech coding algorithms fall into the class of linear-prediction based analysis-by-synthesis (LPAS) speech coders. At the lower bit rates, the waveform matching on which LPAS coders rely constrains the speech quality. To overcome this drawback, we present a coder (RCELP) that uses a generalization of the analysis-by-synthesis paradigm. This generalization relaxes the waveform-matching constraints without affecting speech quality. We describe several implementations at bit rates between 4 and 6 kbit/s. MOS tests show that a 6 kbit/s RCELP has quality similar to or better than that of the 13 kbit/s GSM full-rate coder, and that a 4.4 kbit/s RCELP has speech quality significantly better than that of the 4.8 kbit/s FS1016 standard.

90 citations


Journal ArticleDOI
TL;DR: This work proposes a codec that simultaneously addresses both high quality and low delay, with a delay of only 8.7 ms at 44.1 kHz, and uses gain-shape algebraic vector quantization in the frequency domain with time-domain pitch prediction.
Abstract: With increasing quality requirements for multimedia communications, audio codecs must maintain both high quality and low delay. Typically, audio codecs offer either low delay or high quality, but rarely both. We propose a codec that simultaneously addresses both of these requirements, with a delay of only 8.7 ms at 44.1 kHz. It uses gain-shape algebraic vector quantization in the frequency domain with time-domain pitch prediction. We demonstrate that the proposed codec operating at 48 kb/s and 64 kb/s outperforms both G.722.1C and MP3 and has quality comparable to AAC-LD, despite having less than one fourth of the algorithmic delay of these codecs.

78 citations
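
The gain-shape quantization idea used in this codec (each frequency band is coded as an overall gain plus a unit-norm "shape" vector) can be sketched in a few lines of Python. The random codebook and exhaustive nearest-neighbour search below are stand-ins for illustration only; the actual codec uses algebraic codebooks with a structured fast search.

    import numpy as np

    def gain_shape_quantize(band, shape_codebook):
        """Split a band of transform coefficients into a scalar gain and a
        unit-norm shape, then pick the closest codebook shape (illustrative
        exhaustive search; the real codec uses an algebraic codebook)."""
        gain = float(np.linalg.norm(band))
        if gain == 0.0:
            return 0.0, 0
        shape = band / gain
        # for unit-norm codewords, maximizing the inner product minimizes distance
        index = int(np.argmax(shape_codebook @ shape))
        return gain, index

    def gain_shape_dequantize(gain, index, shape_codebook):
        return gain * shape_codebook[index]

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        # hypothetical 16-dimensional band and 256-entry random unit-norm codebook
        codebook = rng.standard_normal((256, 16))
        codebook /= np.linalg.norm(codebook, axis=1, keepdims=True)
        band = rng.standard_normal(16)
        g, i = gain_shape_quantize(band, codebook)
        rec = gain_shape_dequantize(g, i, codebook)
        snr = 10 * np.log10(np.sum(band ** 2) / np.sum((band - rec) ** 2))
        print("index:", i, " SNR (dB):", round(float(snr), 2))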


Proceedings Article
01 Aug 2010
TL;DR: This paper introduces the VISNET II DVC codec, which achieves very high RD performance thanks to the efficient combination of many state-of-the-art coding tools into a fully practical video codec.
Abstract: This paper introduces the VISNET II DVC codec. This codec achieves very high RD performance thanks to the efficient combination of many state-of-the-art coding tools into a fully practical video codec. Experimental results show that the proposed DVC codec consistently outperforms H.264/AVC Intra. For sequences with coherent motion, it even surpasses H.264/AVC zero-motion. Finally, it is also always better than the DISCOVER DVC codec. Therefore, it is expected that the proposed high performing DVC codec will be used by other researchers in the field as a reference to benchmark their results.

39 citations


Journal ArticleDOI
TL;DR: This new audio codec allows efficient transform-domain audio indexing for three different applications, namely beat tracking, chord recognition, and musical genre classification, and it is compared with the standard MP3 and AAC codecs in terms of performance and computation time.
Abstract: Indexing audio signals directly in the transform domain can potentially save a significant amount of computation when working on a large database of signals stored in a lossy compression format, without having to fully decode the signals. Here, we show that the representations used in standard transform-based audio codecs (e.g., MDCT for AAC, or hybrid PQF/MDCT for MP3) have a sufficient time resolution for some rhythmic features, but a poor frequency resolution, which prevents their use in tonality-related applications. Alternatively, a recently developed audio codec based on a sparse multi-scale MDCT transform has a good resolution both for time- and frequency-domain features. We show that this new audio codec allows efficient transform-domain audio indexing for three different applications, namely beat tracking, chord recognition, and musical genre classification. We compare results obtained with this new audio codec and the two standard MP3 and AAC codecs, in terms of performance and computation time.

37 citations
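
As a rough illustration of the transform-domain indexing idea above, the sketch below computes an onset-strength (positive spectral flux) curve directly from per-frame transform magnitudes, the kind of rhythmic feature the paper extracts without fully decoding to PCM. The frame count, bin count, and synthetic data are assumptions, not taken from the paper.

    import numpy as np

    def onset_strength(frame_coeffs):
        """Positive spectral flux over a (frames x bins) matrix of transform-domain
        magnitudes, e.g. |MDCT| coefficients taken from a partially decoded stream."""
        mags = np.abs(frame_coeffs)
        flux = np.diff(mags, axis=0)
        return np.sum(np.maximum(flux, 0.0), axis=1)  # one value per frame transition

    if __name__ == "__main__":
        rng = np.random.default_rng(1)
        coeffs = rng.standard_normal((200, 576))      # 576 bins, as in an MP3 long block
        coeffs[::43] *= 5.0                           # inject a "beat" every 43 frames
        env = onset_strength(coeffs)
        ac = np.correlate(env, env, mode="full")[len(env) - 1:]   # lags 0, 1, 2, ...
        lag = int(np.argmax(ac[10:100])) + 10
        print("dominant inter-onset lag (frames):", lag)          # expected near 43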


Patent
23 Jun 2010
TL;DR: An encoding apparatus and a decoding apparatus for a High Quality Multi-channel Audio Codec (HQMAC) are provided; they can perform HQMAC-Channel Based (HQMAC-CB) encoding or decoding in accordance with the characteristics of the input audio signals to provide compatibility with a lower channel.
Abstract: Provided is an encoding apparatus for a High Quality Multi-channel Audio Codec (HQMAC) and a decoding apparatus for the HQMAC. The encoding/decoding apparatuses for the HQMAC may perform a High Quality Multi-channel Audio Codec-Channel Based (HQMAC-CB) encoding or an HQMAC-CB decoding in accordance with characteristics of inputted audio signals to provide compatibility with a lower channel.

36 citations


09 Sep 2010
TL;DR: SILK, a speech codec for real-time, packet-based voice communications, provides scalability in several dimensions through control of bitrate, packet rate, packet loss resilience and use of discontinuous transmission (DTX).
Abstract: This document describes SILK, a speech codec for real-time, packet-based voice communications. Targeting a diverse range of operating environments, SILK provides scalability in several dimensions. Four different sampling frequencies are supported for encoding the audio input signal. Adaptation to network characteristics is provided through control of bitrate, packet rate, packet loss resilience and use of discontinuous transmission (DTX). Several different complexity levels let SILK take advantage of available processing power without relying on it. Each of these properties can be adjusted during operation of the codec on a frame-by-frame basis.

30 citations


Patent
Jinwei Feng1, Chu Peter
01 Jul 2010
TL;DR: A scalable audio codec for a processing device determines first and second bit allocations for each frame of input audio; the allocations are made frame by frame based on the energy ratio between the two bands.
Abstract: A scalable audio codec for a processing device determines first and second bit allocations for each frame of input audio. First bits are allocated for a first frequency band, and second bits are allocated for a second frequency band. The allocations are made on a frame-by-frame basis based on the energy ratio between the two bands. For each frame, the codec transform codes both frequency bands into two sets of transform coefficients, which are then packetized based on the bit allocations. The packets are then transmitted with the processing device. Additionally, the frequency regions of the transform coefficients can be arranged in order of importance determined by power levels and perceptual modeling. Should bit stripping occur, the decoder at a receiving device can produce audio of suitable quality given that bits have been allocated between the bands and the regions of transform coefficients have been ordered by importance.

17 citations
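
A minimal sketch of the frame-by-frame bit split described in this patent abstract: the bit budget is divided between a low and a high band according to their energies. The proportional rule and the minimum-bit floor below are assumptions; the abstract only states that the allocation depends on the energy ratio.

    import numpy as np

    def allocate_bits(low_band, high_band, total_bits, min_bits=8):
        """Split total_bits between two bands in proportion to their energies
        (hypothetical rule; the patent only states that the split depends on
        the energy ratio between the bands)."""
        e_low = float(np.sum(low_band ** 2)) + 1e-12
        e_high = float(np.sum(high_band ** 2)) + 1e-12
        share = e_low / (e_low + e_high)
        low_bits = int(round(total_bits * share))
        low_bits = max(min_bits, min(total_bits - min_bits, low_bits))
        return low_bits, total_bits - low_bits

    if __name__ == "__main__":
        rng = np.random.default_rng(2)
        frame = rng.standard_normal(640)
        low, high = frame[:320], 0.25 * frame[320:]   # weaker high band gets fewer bits
        print(allocate_bits(low, high, total_bits=960))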


Journal ArticleDOI
TL;DR: A fully parametric audio coder, which decomposes the audio signal into sinusoids, transients and noise, is proposed here, and the performance of the proposed parametric audio coders is assessed in comparison to widely used audio coders operating at similar bit rates.
Abstract: This paper deals with the application of adaptive signal models for parametric audio coding. A fully parametric audio coder, which decomposes the audio signal into sinusoids, transients and noise, is proposed here. Adaptive signal models for sinusoidal, transient, and noise modeling are therefore included in the parametric scheme in order to achieve high-quality and low bit-rate audio coding. In this paper, a new sinusoidal modeling method based on a perceptual distortion measure is proposed. For transient modeling, a fast and effective method based on matching pursuit with a mixed dictionary is chosen. The residue of the previous models is analyzed as a noise-like signal. The proposed parametric audio coder allows high-quality coding of one-channel audio signals at 16 kbit/s (average bit rate). A bit-rate scalable version of the parametric audio coder is also proposed in this work. Bit-rate scalability is intended for audio streaming applications, which are in high demand nowadays. The performance of the proposed parametric audio coders (nonscalable and scalable coders) is assessed in comparison to widely used audio coders operating at similar bit rates.

14 citations


Proceedings ArticleDOI
01 Aug 2010
TL;DR: A new method for the bandwidth extension of telephone speech is presented; it uses only the information in the narrowband speech and improves speech quality compared with a previously published bandwidth extension method.
Abstract: The limited audio bandwidth used in telephone systems degrades both the quality and the intelligibility of speech. This paper presents a new method for the bandwidth extension of telephone speech. Frequency components are added to the frequency band 4–8 kHz using only the information in the narrowband speech. First, a wideband excitation is generated by spectral folding from the narrowband linear prediction residual. The highband of this signal is divided into four subbands with a filter bank, and a neural network is used to weight the subbands based on features calculated from the narrowband speech. Bandwidth-extended speech is obtained by summing the weighted subbands and the original narrowband signal. Listening tests show that this new method improves speech quality compared with a previously published bandwidth extension method.

13 citations
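
The spectral-folding step described in the abstract above can be illustrated directly: upsampling the narrowband LP residual by zero insertion, without low-pass filtering, mirrors the 0-4 kHz spectrum into 4-8 kHz. The scalar highband gain below is only a stand-in for the paper's neural-network subband weighting.

    import numpy as np

    def spectral_fold(residual_nb, gain_hb=1.0):
        """Create a 16 kHz excitation from an 8 kHz narrowband LP residual by zero
        insertion: upsampling by 2 without low-pass filtering mirrors the 0-4 kHz
        spectrum into 4-8 kHz. gain_hb stands in for the neural-network subband
        weighting described in the paper."""
        wideband = np.zeros(2 * len(residual_nb))
        wideband[::2] = residual_nb
        return gain_hb * wideband

    if __name__ == "__main__":
        fs_nb = 8000
        t = np.arange(fs_nb) / fs_nb
        residual = np.sin(2 * np.pi * 1000 * t)       # toy 1 kHz residual component
        wb = spectral_fold(residual)
        spectrum = np.abs(np.fft.rfft(wb))
        peaks = np.argsort(spectrum)[-2:] * 16000 / len(wb)
        print("components at (Hz):", np.sort(peaks))  # original 1000 Hz and its 7000 Hz image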


Patent
01 Sep 2010
TL;DR: A transceiver consisting of a codec, a microcontroller, and a radio is described; the microcontroller receives a first digital audio signal from the codec and packetizes it into a first packet for transmission over a TCP/IP network.
Abstract: A transceiver including a codec, a microcontroller, and a radio. The codec includes an analog-to-digital converter for receiving a first audio program and converting it to a first digital audio signal; a digital-to-analog converter for receiving a second digital audio signal and converting it to a second audio program; and a control function for managing characteristics of the codec. The microcontroller is in electrical communication with the codec: for receiving the first digital audio signal from the codec and packetizing it into a first packet for transmission over a TCP/IP network; for receiving a second packet from the network and converting it into the second digital audio signal and sending it to the codec; and for receiving control signals from the network. The radio is in electrical communication with the microcontroller for connection to the network, to transmit the first packet to the network and receive the second packet from the network.

12 citations


Journal ArticleDOI
TL;DR: The subjective and objective quality evaluations show that the reconstructed signal quality for the proposed FDLP codec compares well with state-of-the-art audio codecs in the 32-64 kbps range.
Abstract: We present a scalable medium bit-rate wide-band audio coding technique based on frequency-domain linear prediction (FDLP). FDLP is an efficient method for representing the long-term amplitude modulations of speech/audio signals using autoregressive models. For the proposed audio codec, relatively long temporal segments (1000 ms) of the input audio signal are decomposed into a set of critically sampled sub-bands using a quadrature mirror filter (QMF) bank. The technique of FDLP is applied to each sub-band to model the sub-band temporal envelope. The residual of the linear prediction, which represents the frequency modulations in the sub-band signal, is encoded and transmitted along with the envelope parameters. These steps are reversed at the decoder to reconstruct the signal. The proposed codec utilizes a simple, signal-independent, nonadaptive compression mechanism for a wide class of speech and audio signals. The subjective and objective quality evaluations show that the reconstructed signal quality for the proposed FDLP codec compares well with state-of-the-art audio codecs in the 32-64 kbps range.
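
A compact sketch of the FDLP idea behind this codec: fit an autoregressive model to the DCT of a long segment, and the resulting all-pole "spectrum" approximates the segment's temporal (Hilbert) envelope. The model order, envelope resolution, and test signal below are illustrative choices, not the paper's settings.

    import numpy as np
    from scipy.fft import dct
    from scipy.linalg import solve_toeplitz

    def fdlp_envelope(segment, order=30, n_points=200):
        """Approximate the temporal (Hilbert) envelope of a segment by frequency-
        domain linear prediction: fit an AR model to the DCT of the segment and
        evaluate the AR power spectrum, which now lives on the time axis."""
        y = dct(segment, type=2, norm="ortho")
        # autocorrelation of the DCT sequence, then solve the normal equations
        r = np.correlate(y, y, mode="full")[len(y) - 1:len(y) + order]
        a = solve_toeplitz(r[:order], r[1:order + 1])          # predictor coefficients
        gain = r[0] - np.dot(a, r[1:order + 1])
        # AR "spectrum" over half the unit circle == smooth temporal envelope
        w = np.exp(-2j * np.pi * np.outer(np.arange(n_points) / (2.0 * n_points),
                                          np.arange(1, order + 1)))
        return gain / np.abs(1.0 - w @ a) ** 2

    if __name__ == "__main__":
        fs, dur = 8000, 1.0                                    # 1000 ms segment, as in the paper
        t = np.arange(int(fs * dur)) / fs
        sig = np.sin(2 * np.pi * 300 * t) * np.exp(-3.0 * t)   # decaying tone
        env = fdlp_envelope(sig)
        print("envelope peak near t =", float(np.argmax(env)) / len(env) * dur, "s")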

Patent
14 Oct 2010
TL;DR: After a call is established with a codec negotiated during call setup, in-band signaling may be used between the two stations to change the codec in use; if the receiving station detects and reacts to the in-band signals, both stations switch to the second codec.
Abstract: After a call is established between two stations using a codec that has been negotiated during call setup, in-band signaling may be used between the two stations to change the codec that is to be used. The in-band signals indicate that the station transmitting them can operate with a second codec and are used to probe whether the receiving station can also operate with that second codec. If the receiving station detects and reacts to the in-band signals, then both stations change to communicate with the second codec. The second codec has packet sizes compatible with the deployed (originally negotiated) codec, without any need for infrastructure upgrades or quality compromises for legacy phone users (i.e., stations that cannot operate with the second codec).
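
The probe-and-switch behaviour described in the patent can be summarized with a toy state machine. The probe marker, codec names, and message handling below are invented purely for illustration; the abstract does not specify the actual in-band signal pattern.

    # Toy sketch of the probe-and-switch behaviour; the probe marker and codec
    # names are invented, and real implementations embed the probe in the media
    # stream in a codec-specific way not described in the abstract.

    PROBE_MARKER = b"\x7e\x7e"

    class Station:
        def __init__(self, name, supports_second_codec):
            self.name = name
            self.supports_second_codec = supports_second_codec
            self.codec = "negotiated-codec"

        def send_media(self, payload):
            # A capable station keeps probing in-band until the switch happens.
            if self.supports_second_codec and self.codec == "negotiated-codec":
                return PROBE_MARKER + payload
            return payload

        def receive_media(self, frame):
            if frame.startswith(PROBE_MARKER) and self.supports_second_codec:
                # Detect the probe, react, and switch locally.
                self.codec = "second-codec"
                return frame[len(PROBE_MARKER):], True
            return frame, False

    if __name__ == "__main__":
        a, b = Station("A", True), Station("B", True)
        frame = a.send_media(b"voice-frame")
        _, reacted = b.receive_media(frame)
        if reacted:                       # A observes B's reaction and switches too
            a.codec = "second-codec"
        print(a.codec, b.codec)           # both stations now use the second codec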

Patent
Adrian Fratila1
13 Oct 2010
TL;DR: Finite impulse response (FIR) filter coefficients characterizing the echo path between the local audio output and audio input are applied to the received/decompressed audio data, and the predicted echo is subtracted from the uplink signal.
Abstract: Duplex audio communications over a network use compressed audio data, with linear prediction coefficients (LPCs) and variances by which sample values differ from predictions. An adaptive echo canceller for a transceiver develops finite impulse response (FIR) filter coefficients characterizing the echo path between its local audio output and audio input. The received/decompressed audio data is applied to the FIR coefficients, and the predicted echo is subtracted from the uplink signal. Echo is detected as the cross-correlation of the receive signal versus the uplink/send signal over time. In one embodiment, the cross-correlation is determined using a pre-whitened receive signal, obtained using the variance values received over the network by the downlink codec. Apart from the uplink codec, no speech analysis filter or process is needed. The technique is suited to GSM, AMR and similar compressed audio communications.
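
A small numeric sketch of the two mechanisms in this abstract: predicting the echo by filtering the downlink signal with echo-path FIR coefficients and subtracting it from the uplink, and detecting echo via cross-correlation of the receive and send signals. Pre-whitening with the codec variances, as the patent describes, is omitted here; the echo path and signals are synthetic.

    import numpy as np

    def cancel_echo(received, uplink, fir_coeffs):
        """Predict the echo by filtering the received (downlink) signal with the
        estimated echo-path FIR coefficients and subtract it from the uplink."""
        predicted_echo = np.convolve(received, fir_coeffs)[:len(uplink)]
        return uplink - predicted_echo

    def echo_score(received, uplink, max_lag=400):
        """Peak normalized cross-correlation between receive and send signals over
        positive lags; the patent applies this to a pre-whitened receive signal."""
        r = (received - received.mean()) / (received.std() + 1e-12)
        u = (uplink - uplink.mean()) / (uplink.std() + 1e-12)
        xc = np.correlate(u, r, mode="full") / len(u)
        centre = len(r) - 1
        return float(np.max(np.abs(xc[centre:centre + max_lag])))

    if __name__ == "__main__":
        rng = np.random.default_rng(3)
        rx = rng.standard_normal(8000)
        path = np.zeros(240)
        path[120] = 0.6                                # toy echo path: 15 ms delay at 8 kHz
        tx = 0.1 * rng.standard_normal(8000) + np.convolve(rx, path)[:8000]
        print("echo score before:", round(echo_score(rx, tx), 3))
        clean = cancel_echo(rx, tx, path)              # assumes a perfectly estimated FIR
        print("echo score after: ", round(echo_score(rx, clean), 3))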

Journal ArticleDOI
TL;DR: A new technique for the class of code-excited linear prediction speech codecs designed to reduce error propagation after lost frames is presented, which consists in replacing the interframe long-term prediction with a glottal-shape codebook in the subframe containing the first glottal impulse in a given frame.
Abstract: This paper presents a new technique for the class of code-excited linear prediction speech codecs designed to reduce error propagation after lost frames. Its principle consists in replacing the interframe long-term prediction with a glottal-shape codebook in the subframe containing the first glottal impulse in a given frame. This technique, independent of previous frames, is of particular interest in voiced speech frames following transitions as these frames are the most sensitive to frame erasures. It is a basis of a structured coding scheme called transition coding (TC). The TC greatly improves codec performance in noisy channels while maintaining clean channel performance. It is a part of the new embedded speech and audio codec recently standardized as Recommendation G.718 by ITU-T.

01 Jan 2010
TL;DR: An experimental design and implementation of the controller is presented, based on the Philips specification for the I2C protocol and the DSP mode of operation of the CODEC, on a Cyclone-II EP2C35F72C6 FPGA in the Altera DE2 board.
Abstract: The trend in hardware design is towards implementing a complete system, intended for various applications, on a single chip. In order to implement any speech application on the Altera DE2 board, a controller is designed to control the CODEC and acquire digital data from it. This paper presents an experimental design and implementation of the controller, using the specification given by Philips for the I2C protocol and the DSP mode of operation of the CODEC, on a Cyclone-II EP2C35F72C6 FPGA in the Altera DE2 board. The controller was designed in VHDL and performs two operations: the I2C protocol operation to drive the Wolfson Codec WM8731, and sound fetching from the Wolfson Codec WM8731 to the FPGA in DSP mode. Altera Quartus II 9.0 SP2 Web Edition is used for synthesis of the VHDL logic on the FPGA, and ModelSim-Altera 6.5b (Quartus II 9.1) Starter Edition is used for simulation of the VHDL logic. Three modules have been created in the design: the I2C bus controller, the virtual sound fetcher, and the clock module. The FPGA communicates with the Wolfson codec via the I2C (Inter-Integrated Circuit) protocol using two pins: 'SDIN' (the data line) and 'SCLK' (the bus clock). The I2C bus controller modifies the internal settings of the codec, un-mutes the microphone input, boosts the microphone volume, and changes the default sound path so that the microphone is given priority over other inputs. After the codec digitizes the input, it puts the digital data on the digital audio interface; the DSP mode of operation of the codec is used in the design to fetch the data on DACDAT from the digital audio interface. DACDAT is the formatted digital audio data stream with the left and right channels multiplexed together. DACLRC (alignment clock) and BCLK (synchronization clock) are used to fetch the data on DACDAT, and this data can be used for any sound application. The clock module is designed to generate the different clocks required by the controller.

Proceedings ArticleDOI
01 Nov 2010
TL;DR: PESQ scores on 11 test sequences show that the proposed switching method introduces no additional noise and achieves higher objective audio quality than either single codec.
Abstract: This paper presents a dual-mode switching method between a time-domain codec and a transform-domain codec for audio coding. It is a key technique for unified speech and audio (music) coding, since the reproduced audio quality depends on suitable codec selection and smooth switching between the codecs. The proposed method consists of two steps: codec mode selection and switching. A binary decision tree (BDT) algorithm is used to make the mode-selection decision because of its high accuracy, low delay and low complexity. To smooth the transition between the two codecs, a pre-coding strategy is suggested in this paper. The classical speech codec, Algebraic Code Excited Linear Prediction (ACELP), and the Advanced Audio Coding (AAC) of MPEG are used to validate the proposed method. PESQ scores on 11 test sequences show that the proposed switching method introduces no additional noise and achieves higher objective audio quality than either single codec.

Patent
Yooseok Kim1, Kyoungjoung Kim1, Younghun Jang1, Youngkook Seo1, Hyeyoung Hong1 
15 Dec 2010

Proceedings ArticleDOI
14 Mar 2010
TL;DR: This work proposes a trellis-based approach to directly optimize the trade-off between the quality of the AAC core and the lossless compression performance of SLS, and shows that such optimization can in fact achieve an AAC core of superior perceptual quality while maintaining state-of-the-art Lossless compression, all this in compliance with the HD-AAC standard.
Abstract: MPEG-4 High-Definition Advanced Audio Coding (HD-AAC) enables scalable-to-lossless (SLS) audio coding with an Advanced Audio Coding (AAC) base layer, and fine-grained enhancements based on the MPEG SLS standard. While the AAC core offers better perceptual quality at lossy bit-rates, its inclusion has been observed to compromise the ultimate lossless compression performance as compared to the SLS ‘non-core’ (i.e., without an AAC base layer) codec. In contrast, the latter provides excellent lossless compression but with significantly degraded audio quality at low bit-rates. We propose a trellis-based approach to directly optimize the trade-off between the quality of the AAC core and the lossless compression performance of SLS. Simulations to test the effectiveness of the approach demonstrate the capability to adjust the trade-off to match application specific needs. Moreover, such optimization can in fact achieve an AAC core of superior perceptual quality while maintaining state-of-the-art (and surprisingly sometimes even better) lossless compression, all this in compliance with the HD-AAC standard.

Journal ArticleDOI
TL;DR: The verification results indicate that audio quality estimated by the proposed parametric packet-layer model has a high correlation with perceived audio quality.
Abstract: We propose a parametric packet-layer model for monitoring audio quality in multimedia streaming services such as Internet protocol television (IPTV). This model estimates audio quality of experience (QoE) on the basis of quality degradation due to coding and packet loss of an audio sequence. The input parameters of this model are audio bit rate, sampling rate, frame length, packet-loss frequency, and average burst length. Audio bit rate, packet-loss frequency, and average burst length are calculated from header information in received IP packets. For sampling rate, frame length, and audio codec type, the values or the names used in monitored services are input into this model directly. We performed a subjective listening test to examine the relationships between these input parameters and perceived audio quality. The codec used in this test was the Advanced Audio Codec-Low Complexity (AAC-LC), which is one of the international standards for audio coding. On the basis of the test results, we developed an audio quality evaluation model. The verification results indicate that audio quality estimated by the proposed model has a high correlation with perceived audio quality.
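
The abstract lists the model inputs but not the fitted mapping, so the sketch below only shows the general structure of a parametric packet-layer estimator: a coding-quality term driven by bit rate and a degradation term driven by packet-loss frequency and burst length. All coefficients and functional forms are invented placeholders, not the paper's model.

    import math

    def estimate_audio_mos(bitrate_kbps, sampling_khz, frame_ms,
                           loss_freq_per_s, avg_burst_len,
                           c=(1.2, 3.4, 0.05, 0.9, 0.25)):
        """Placeholder parametric packet-layer estimator. In the paper, the
        coefficient set would be fitted per codec/sampling rate/frame length
        from listening tests; here sampling_khz and frame_ms are unused."""
        c0, c1, c2, c3, c4 = c
        # coding quality: saturating function of bit rate (invented form)
        q_coding = c0 + c1 * (1.0 - math.exp(-c2 * bitrate_kbps))
        # degradation grows with loss frequency and burst length (invented form)
        degradation = c3 * loss_freq_per_s * (1.0 + c4 * (avg_burst_len - 1.0))
        return max(1.0, min(5.0, q_coding - degradation))

    if __name__ == "__main__":
        print(round(estimate_audio_mos(96, 48, 21.3, loss_freq_per_s=0.0, avg_burst_len=1.0), 2))
        print(round(estimate_audio_mos(96, 48, 21.3, loss_freq_per_s=1.0, avg_burst_len=3.0), 2))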

Proceedings ArticleDOI
18 Mar 2010
TL;DR: The paper presents a 1.5V, 10mW, full-featured stereo audio CODEC, optimized for low-voltage operation and low power consumption, that is integrated with a Bluetooth radio and a PMU on a single die.
Abstract: Low-power and full-featured stereo audio CODECs are increasingly needed in wireless devices, such as Bluetooth headsets and smart phones. These portable devices are usually powered by low-voltage batteries with limited capacities. It is of particular importance that such CODECs be optimized for low-voltage operation and low-power consumption. The paper presents a 1.5V 10mW full-featured stereo audio CODEC that is integrated with a Bluetooth radio and PMU on a single die. As depicted in Fig. 4.5.1, the CODEC contains microphone PGAs, audio ΔΣ ADCs and DACs, speaker drivers and microphone bias generators.

Patent
Haiting Li1
30 Dec 2010
TL;DR: A coding method, a decoding method, a coding-decoding (codec) method, a codec system, and relevant apparatuses are disclosed; the coding method includes obtaining an amplitude vector and a length vector corresponding to a vector to be coded.
Abstract: A coding method, a decoding method, a coding-decoding (codec) method, a codec system and relevant apparatuses are disclosed. The coding method includes: obtaining an amplitude vector and a length vector corresponding to a vector to be coded; sorting elements of the amplitude vector and elements of the length vector; and obtaining a position index value according to the sorted amplitude vector and the sorted length vector. A decoding method, a codec system, and relevant apparatuses are also provided.

Proceedings ArticleDOI
23 May 2010
TL;DR: A novel technique to identify the voice and silent regions of a speech stream, well suited to VoIP calls, is introduced; it uses an entropy measure based on the spacings of order statistics of speech frames to differentiate the silence zones from the speech zones.
Abstract: Realtime voice communication over the Internet has rapidly gained popularity. It is essential to reduce the total bandwidth consumption in order to use the available bandwidth efficiently, for subscribers with low-speed connectivity and otherwise. In this paper we introduce a novel technique to identify the voice and silent regions of a speech stream that is well suited to VoIP calls. We use an entropy measure, based on the spacings of order statistics of speech frames, to differentiate the silence zones from the speech zones. We developed an algorithm that uses adaptive thresholding to minimize misdetection. The performance of our approach is compared with the built-in VAD of the AMR codec. Our approach yields better bandwidth savings while maintaining good quality of the speech streams. Further, the proposed approach has improved voice detection compared to the AMR schemes under noisy conditions. The ideas presented in this paper have been identified as novel during the WIPO international patent search.
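
One plausible instance of the entropy measure the paper refers to is a Vasicek-style estimator built from the spacings of order statistics, shown below together with a much-simplified adaptive threshold. The estimator choice, frame size, and threshold rule are assumptions for illustration; they are not taken from the paper.

    import numpy as np

    def spacing_entropy(frame, m=None):
        """Vasicek-style entropy estimate from the spacings of order statistics of
        the samples in one frame (one plausible reading of the entropy measure
        named in the abstract; the exact estimator is not given there)."""
        x = np.sort(np.asarray(frame, dtype=float))
        n = len(x)
        m = m or max(1, int(round(np.sqrt(n))))
        upper = np.minimum(np.arange(n) + m, n - 1)
        lower = np.maximum(np.arange(n) - m, 0)
        spacings = np.maximum(x[upper] - x[lower], 1e-12)
        return float(np.mean(np.log(n * spacings / (2.0 * m))))

    def detect_voice(frames, margin=0.5):
        """Flag a frame as voice if its entropy exceeds a crude noise-floor
        estimate by `margin` (much simpler than the paper's adaptive threshold)."""
        ents = np.array([spacing_entropy(f) for f in frames])
        floor = np.percentile(ents, 20)
        return ents > floor + margin, ents

    if __name__ == "__main__":
        rng = np.random.default_rng(4)
        t = np.arange(160) / 8000.0                     # 20 ms frames at 8 kHz
        silence = [0.01 * rng.standard_normal(160) for _ in range(50)]
        speech = [np.sin(2 * np.pi * 200 * t) + 0.01 * rng.standard_normal(160)
                  for _ in range(50)]
        flags, _ = detect_voice(silence + speech)
        print("frames flagged as voice:", int(flags.sum()), "of", len(flags))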

Journal ArticleDOI
TL;DR: A complexity scalability design is proposed for the coding of the dynamic codebook search in the iLBC speech codec and results show that the computational complexity can be effectively reduced with imperceptible degradation of the speech quality.
Abstract: Unlike modern speech codecs, which use long-term prediction, the internet low bit rate codec (iLBC) standard encodes the residual of the linear predictive coding (LPC) independently, frame by frame. In this paper, a complexity scalability design is proposed for the coding of the dynamic codebook search in the iLBC speech codec. In addition, a trade-off between computational complexity and speech quality can be achieved by dynamically setting the parameter of the proposed approach. Simulation results show that the computational complexity can be effectively reduced with imperceptible degradation of the speech quality.

Journal ArticleDOI
TL;DR: An interval Type-2 fuzzy logic controlled scheme for VoIP services is presented; it infers network state from the average delivered perceived quality of service and its degradation due to network congestion, and updates the AMR codec mode to match voice quality to the available network bandwidth.
Abstract: Adaptive VoIP schemes have potentially suboptimal performance owing to imprecision in the metrics used to infer network state. An interval Type-2 fuzzy logic controlled scheme for VoIP services is presented. It infers network state from average delivered perceived quality of service and its degradation due to network congestion and updates an AMR codec mode to match voice quality to available network bandwidth. Tests showed that the scheme maximised delivered voice quality and outperformed an existing adaptive scheme. The scheme achieves robust performance in the presence of input imprecision and can be implemented in VoIP terminals, and the fuzzy rule base is easy to understand and change by non-experts because of its similarity to the human decision-making process.
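
The paper's controller is an interval Type-2 fuzzy system; the crisp rules below are only a stand-in to show the control loop it describes: delivered quality and its degradation in, AMR codec mode out. The thresholds are illustrative and not taken from the paper.

    # AMR-NB source rates; the controller below is a crisp stand-in for the
    # paper's interval Type-2 fuzzy inference, with invented thresholds.
    AMR_MODES_KBPS = [4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.2, 12.2]

    def adapt_amr_mode(current_index, avg_mos, mos_degradation):
        """Step the AMR mode down when delivered quality is degrading (congestion)
        and up when quality is consistently good (spare bandwidth)."""
        if mos_degradation > 0.3 or avg_mos < 3.0:
            return max(0, current_index - 1)            # reduce bit rate
        if mos_degradation < 0.05 and avg_mos > 3.8:
            return min(len(AMR_MODES_KBPS) - 1, current_index + 1)
        return current_index

    if __name__ == "__main__":
        mode = 7                                         # start at 12.2 kbit/s
        for mos, deg in [(4.1, 0.00), (3.5, 0.40), (3.2, 0.50), (3.9, 0.02)]:
            mode = adapt_amr_mode(mode, mos, deg)
            print(f"MOS={mos:.1f} degradation={deg:.2f} -> {AMR_MODES_KBPS[mode]} kbit/s")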

Patent
22 Apr 2010
TL;DR: A multi-bus architecture within a video codec that discretely and efficiently transports video components within the codec is presented; the various buses are designed to specifically address unique characteristics of the video components or parameters being processed.
Abstract: Embodiments of the present invention relate to a multi-bus architecture within a video codec that discretely and efficiently transports video components within the codec. This multi-bus architecture provides a relatively more efficient transport mechanism because the various buses are designed to specifically address unique characteristics of the video components or parameters being processed within the codec.

Journal ArticleDOI
TL;DR: This work revisits an original concept of speech coding in which the signal is separated into a carrier modulated by the signal envelope; the result is a codec that does not rely on the linear speech production model but rather uses the well-accepted concept of frequency-selective auditory perception.
Abstract: We revisit an original concept of speech coding in which the signal is separated into the carrier modulated by the signal envelope. A recently developed technique, called frequency-domain linear prediction (FDLP), is applied for the efficient estimation of the envelope. The processing in the temporal domain allows for a straightforward emulation of forward temporal masking. This, combined with an efficient nonuniform sub-band decomposition and the application of noise shaping in the spectral domain instead of the temporal domain (a technique to suppress artifacts in tonal audio signals), yields a codec that does not rely on the linear speech production model but rather uses the well-accepted concept of frequency-selective auditory perception. As such, the codec is not only suitable for coding speech but also well suited for coding other important acoustic signals such as music and mixed content. The quality of the proposed codec at 66 kbps is evaluated using objective and subjective quality assessments. The evaluation indicates competitive performance with the MPEG codecs operating at similar bit rates.

Journal ArticleDOI
TL;DR: Objective and subjective experimental results confirm that the proposed algorithm achieves better speech quality; the estimation of the pitch lag when consecutive frames are lost and the recovery of the codebook gain for good frames after continuous bad frames are also discussed.

Proceedings ArticleDOI
15 Nov 2010
TL;DR: This paper presents the design of a video acquisition and compression codec system that takes the dual-core TMS320DM6446 chip as its core and Linux, which can be trimmed down and ported, as its operating system.
Abstract: This paper presents the design of a video acquisition and compression codec system, which takes the dual-core TMS320DM6446 chip as its core and Linux as its operating system; Linux was chosen because it can be trimmed down and ported. The video capture device driver V4L2 and the Codec Engine are introduced in detail, and the video compression and decompression functions are realized with the H.264 algorithm. Relevant experiments show that the codec algorithm is highly error-resilient and that the video is clear and reliable after compression and decoding. Moreover, the amount of video data is greatly reduced.

Proceedings ArticleDOI
01 Nov 2010
TL;DR: The result is that AVS-M audio performance is on average no worse than that of AMR-WB+; a fixed-point version of the AVS-M codec is also implemented on a DSP platform.
Abstract: The AVS-M audio standard, which targets wireless networks and mobile equipment, is being independently drawn up in China. Its framework is similar to that of AMR-WB+. The performance of the AVS-M audio core algorithms is analyzed in this paper. In order to analyze its complexity, a fixed-point version of the AVS-M codec is implemented on a DSP platform. Finally, a performance evaluation between AVS-M and AMR-WB+ is discussed. The result is that AVS-M audio performance is on average no worse than that of AMR-WB+.

Proceedings ArticleDOI
03 Dec 2010
TL;DR: This paper proposes a low-complexity video codec based on two-dimensional Singular Value Decomposition (2D-SVD), which has higher coding efficiency than relevant existing low-complexity codecs and copes well with the packet loss that is unavoidable in error-prone transmission.
Abstract: In this paper, we propose a low-complexity video codec based on two-dimensional Singular Value Decomposition (2D-SVD). We exploit the common temporal characteristics of video without resorting to motion estimation. It has been demonstrated that this codec has higher coding efficiency than the relevant existing low-complexity codecs. Moreover, the proposed codec copes well with the packet loss that is unavoidable in error-prone transmission. It therefore offers advantages and good potential for wireless video applications such as mobile video calls and wireless surveillance.
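
The 2D-SVD transform at the heart of this codec can be sketched as follows: shared row and column bases are computed from the summed covariance matrices of a group of frames, and each frame is then represented by a small core matrix. Quantization, entropy coding, and the error-resilience mechanisms of the actual codec are omitted; the frame sizes and rank choices below are illustrative.

    import numpy as np

    def two_d_svd_encode(frames, k, l):
        """Shared row/column bases from the summed covariance matrices of a group
        of frames, plus a small k x l core matrix per frame (transform only)."""
        g_row = sum(a @ a.T for a in frames)
        g_col = sum(a.T @ a for a in frames)
        _, u = np.linalg.eigh(g_row)                    # eigenvectors, ascending order
        _, v = np.linalg.eigh(g_col)
        u, v = u[:, -k:], v[:, -l:]                     # keep the top-k / top-l bases
        cores = [u.T @ a @ v for a in frames]
        return u, v, cores

    def two_d_svd_decode(u, v, cores):
        return [u @ m @ v.T for m in cores]

    if __name__ == "__main__":
        rng = np.random.default_rng(5)
        scene = rng.standard_normal((72, 8)) @ rng.standard_normal((8, 88))  # low-rank "scene"
        frames = [scene + 0.05 * rng.standard_normal((72, 88)) for _ in range(8)]
        u, v, cores = two_d_svd_encode(frames, k=16, l=16)
        recon = two_d_svd_decode(u, v, cores)
        err = np.mean([np.mean((a - b) ** 2) for a, b in zip(frames, recon)])
        print("mean squared reconstruction error:", round(float(err), 4))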