Showing papers on "Adaptive Multi-Rate audio codec published in 2003"


MonographDOI
18 Apr 2003

257 citations


Patent
10 Oct 2003
TL;DR: In this paper, a source-controlled variable bit-rate multi-mode wideband (VMR-WB) speech codec is proposed for interoperation with an adaptive multi-rate wideband codec.
Abstract: A source-controlled Variable bit-rate Multi-mode WideBand (VMR-WB) speech codec, having a mode of operation that is interoperable with the Adaptive Multi-Rate wideband (AMR-WB) codec, the codec comprising: at least one Interoperable full-rate (I-FR) mode, having a first bit allocation structure based on one of the AMR-WB codec coding types; and at least one comfort noise generator (CNG) coding type for encoding inactive speech frames, having a second bit allocation structure based on the AMR-WB SID_UPDATE coding type. Methods for i) digitally encoding a sound using a source-controlled Variable bit rate multi-mode wideband (VMR-WB) speech codec for interoperation with an adaptive multi-rate wideband (AMR-WB) codec, ii) translating a Variable bit rate multi-mode wideband (VMR-WB) speech codec-signal frame into an Adaptive Multi-Rate wideband (AMR-WB) speech signal frame, iii) translating an Adaptive Multi-Rate wideband (AMR-WB) speech signal frame into a Variable bit rate multi-mode wideband (VMR-WB) speech signal frame, and iv) translating an Adaptive Multi-Rate wideband (AMR-WB) speech signal frame into a Variable bit rate multi-mode wideband (VMR-WB) speech signal frame are also provided.

89 citations


Book ChapterDOI
01 Jan 2003
TL;DR: In this chapter those parts of the H.263 standard that make this codec more efficient than its predecessors will be explained.
Abstract: The H.263 Recommendation specifies a coded representation that can be used for compressing the moving picture components of audio-visual services at low bit rates. Detailed specifications of the first generation of this codec under the test model (TM) to verify the performance and compliance of this codec were finalised in 1995. The basic configuration of the video source algorithm in this codec is based on ITU-T Recommendation H.261, which is a hybrid of interpicture prediction to utilise temporal redundancy and transform coding of the residual signal to reduce spatial redundancy. However, during the course of the development of H.261 and the subsequent advances on video coding in MPEG-1 and MPEG-2 video codecs, substantial experience was gained, which has been exploited to make H.263 an efficient encoder. In this chapter those parts of the H.263 standard that make this codec more efficient than its predecessors will be explained.

82 citations


Patent
28 Feb 2003
TL;DR: In this paper, a video coding-decoding (CODEC) method in an error resilient mode, a computer readable medium having a computer program for the video CODEC method, and a video CODEC apparatus are presented.
Abstract: A video coding-decoding (CODEC) method in an error resilient mode, a computer readable medium having a computer program for the video CODEC method, and a video CODEC apparatus. The video CODEC method provides more resilience against channel error such that communications are less affected by error under conditions in which errors are a serious problem, such as in a wireless communications channel. In the video CODEC method, a header data part (HDP) bit region, a motion vector data part (MVDP) bit region and a discrete cosine transform data part (DDP) bit region are partitioned from each macro block of the video data in an error resilient mode, and then the partitioned bit regions are variable-length-coded. Then, the bit regions selected from the variable-length-coded bit regions according to a predetermined priority for recovery are reversible-variable-length-coded, and markers are then inserted into the variable-length-coded or reversible-variable-length-coded bit regions.

70 citations


Patent
Yang Gao1, Adil Benyassine1, Jes Thyssen1, Eyal Shlomot1, Huan-Yu Su1 
08 Apr 2003
TL;DR: In this article, a speech compression system capable of encoding a speech signal into a bitstream for subsequent decoding to generate synthesized speech is disclosed, which optimizes the bandwidth consumed by the bitstream by balancing the desired average bit rate with the perceptual quality of the reconstructed speech.
Abstract: A speech compression system capable of encoding a speech signal into a bitstream for subsequent decoding to generate synthesized speech is disclosed. The speech compression system optimizes the bandwidth consumed by the bitstream by balancing the desired average bit rate with the perceptual quality of the reconstructed speech. The speech compression system comprises a full-rate codec, a half-rate codec, a quarter-rate codec and an eighth-rate codec. The codecs are selectively activated based on a rate selection. In addition, the full and half-rate codecs are selectively activated based on a type classification. Each codec is selectively activated to encode and decode the speech signals at different bit rates emphasizing different aspects of the speech signal to enhance overall quality of the synthesized speech.

64 citations


Patent
08 Jan 2003
TL;DR: In this paper, a method for transcoding a CELP-based compressed voice bitstream from a source codec to a destination codec is proposed, which includes processing a source codec input CELP bitstream to unpack one or more CELP parameters from the input bitstream and interpolating the unpacked CELP parameters.
Abstract: A method for transcoding a CELP-based compressed voice bitstream from a source codec to a destination codec. The method includes processing a source codec input CELP bitstream to unpack one or more CELP parameters from the input CELP bitstream and interpolating one or more of the plurality of unpacked CELP parameters from a source codec format to a destination codec format if a difference exists between one or more of a plurality of destination codec parameters, including a frame size, a subframe size, and/or sampling rate of the destination codec format, and one or more of a plurality of source codec parameters, including a frame size, a subframe size, or sampling rate of the source codec format. The method includes encoding the one or more CELP parameters for the destination codec and processing a destination CELP bitstream by at least packing the one or more CELP parameters for the destination codec.

50 citations


Patent
Yang Gao1, Adil Benyassine1, Jes Thyssen1, Eyal Shlomot1, Huan-Yu Su1 
08 Apr 2003
TL;DR: In this article, a speech compression system capable of encoding a speech signal into a bitstream for subsequent decoding to generate synthesized speech is disclosed, which optimizes the bandwidth consumed by the bitstream by balancing the desired average bit rate with the perceptual quality of the reconstructed speech.
Abstract: A speech compression system capable of encoding a speech signal into a bitstream for subsequent decoding to generate synthesized speech is disclosed. The speech compression system optimizes the bandwidth consumed by the bitstream by balancing the desired average bit rate with the perceptual quality of the reconstructed speech. The speech compression system comprises a full-rate codec, a half-rate codec, a quarter-rate codec and an eighth-rate codec. The codecs are selectively activated based on a rate selection. In addition, the full and half-rate codecs are selectively activated based on a type classification. Each codec is selectively activated to encode and decode the speech signals at different bit rates emphasizing different aspects of the speech signal to enhance overall quality of the synthesized speech.

25 citations


Patent
08 Jan 2003
TL;DR: In this article, the authors propose a method to transcode a CELP-based compressed voice bitstream from a source codec to a destination codec by unpacking the CELP parameters from the input CELP bitstream and interpolating the unpacked parameters if a difference between the destination codec parameters and the source codec parameters exists.
Abstract: Embodiments of a system and method relate to transcoding a CELP-based compressed voice bitstream from a source codec to a destination codec. The method includes processing a source codec input bitstream to unpack (1) CELP parameters from the input CELP bitstream and may interpolate (2) the unpacked CELP parameters if a difference between destination codec parameters and source codec parameters exists. If the method maps (4) CELP parameters from a source codec format to a destination codec format, the parameter mapping strategy may be singly preset or selected (3). The method includes encoding the CELP parameters for the destination codec and processing a destination CELP bitstream by packing (7) the CELP parameters for the destination codec.

24 citations


Proceedings ArticleDOI
03 Mar 2003
TL;DR: A new architecture for VASA, a single-chip MPEG-2 422P@HL CODEC LSI with multichip configuration for large scale processing beyond the HDTV level, is proposed and demonstrates its flexibility and usefulness.
Abstract: This paper proposes a new architecture for VASA, a single-chip MPEG-2 422P@HL CODEC LSI with multichip configuration for large scale processing beyond the HDTV level, and demonstrates its flexibility and usefulness. This architecture consists of triple encoding cores, a decoding core, a multiplexer/de-multiplexer core, and several dedicated application-specific hardware modules with a hierarchical flexible communication scheme for high-performance data transfer. VASA is the world's first single-chip full-specs MPEG-2 422P@HL CODEC LSI with a multi-chip configuration. VASA implements an MPEG-2 video and system CODEC with generic audio CODEC interfaces. An LSI incorporating the architecture was successfully fabricated using a 0.13 μm eight-metal CMOS process. The architecture not only provides an MPEG-2 422P@HL CODEC but also large scale processing beyond the HDTV level for digital cinema and multi-view/-angled live TV applications with a multi-chip configuration. The VASA implementations will lead to a new dimension in future high-quality, high-resolution digital multimedia entertainment.

24 citations


Patent
30 Dec 2003
TL;DR: In this article, a media gateway controller for call set-up processing when different codecs are used and a method therefor are provided. A storage unit stores a codec conversion table indicating a relationship between a first codec and a second codec in conversion from the first codec to the second codec.
Abstract: A media-gateway controller for call set-up processing when different codecs are used and a method therefor are provided. A storage unit stores a codec conversion table indicating a relationship between a first codec and a second codec in conversion from the first codec to the second codec. A receiver receives first call setting data including codec data of a caller from the caller and receives first call response data including codec data of a callee from the callee as a response to second call setting data having been transmitted to the callee. A data transformer searches the codec conversion table for a first codec using the caller's codec data as an index and adds a second codec corresponding to the searched first codec to the first call setting data to generate the second call setting data. In addition, the data transformer searches the codec conversion table for a second codec using the callee's codec data as an index and replaces the callee's codec data included in the first call response data with a first codec corresponding to the searched second codec to generate second call response data. A transmitter transmits the second call setting data to the callee and transmits the second call response data to the caller.

20 citations
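
The table lookup described in the media-gateway controller abstract above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patent's implementation: the table contents, the dictionary layout of the call data, and the function names are all hypothetical.

```python
# Hypothetical codec conversion table: first (caller-side) codec -> second (callee-side) codec.
# The codec pairs below are illustrative assumptions, not taken from the patent.
CODEC_TABLE = {"AMR": "G.729", "EVRC": "G.711"}

# Reverse index used on the response path (second codec -> first codec).
REVERSE_TABLE = {second: first for first, second in CODEC_TABLE.items()}


def make_second_setup(first_setup):
    """Add the second codec mapped from each caller codec to the call setting data."""
    codecs = list(first_setup["codecs"])
    for codec in first_setup["codecs"]:
        mapped = CODEC_TABLE.get(codec)
        if mapped is not None and mapped not in codecs:
            codecs.append(mapped)
    return {**first_setup, "codecs": codecs}


def make_second_response(first_response):
    """Replace each callee codec with the corresponding first codec, if one is listed."""
    codecs = [REVERSE_TABLE.get(codec, codec) for codec in first_response["codecs"]]
    return {**first_response, "codecs": codecs}


if __name__ == "__main__":
    print(make_second_setup({"caller": "alice", "codecs": ["AMR"]}))
    print(make_second_response({"callee": "bob", "codecs": ["G.729"]}))
```

Keeping a reverse index alongside the forward table is one possible design choice; it assumes the mapping is one-to-one, which the abstract does not state.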


Proceedings ArticleDOI
06 Jul 2003
TL;DR: A practical low-complexity real-time video codec for mobile devices is developed; it adopts several methods that significantly reduce the computational cost, including a predictive algorithm for motion estimation, the integer discrete cosine transform (IntDCT), and a DCT/quantizer bypass technique.
Abstract: Real-time software-based video codecs are widely used on PCs with relatively strong computing capability. However, mobile devices, such as pocket PCs and handheld PCs, still suffer from weak computational power, short battery lifetime and limited display capability. We developed a practical low-complexity real-time video codec for mobile devices. Several methods that can significantly reduce the computational cost are adopted in this codec and described in this paper, including a predictive algorithm for motion estimation, the integer discrete cosine transform (IntDCT), and a DCT/quantizer bypass technique. A real-time video communication implementation of the proposed codec is also introduced. Experiments show that substantial computation reduction is achieved while the loss in video quality is negligible. The proposed codec is very suitable for scenarios where low-complexity computing is required.

Journal ArticleDOI
TL;DR: Experimental results indicate that the coding scheme ensures transparent coding of one channel CD-quality audio signals at bit rates below 64 kbps for most audio signals, and the results confirm that the best way to achieve maximum compression rate and transparent coding is the usage of perceptual-entropy-based decompositions.

Book ChapterDOI
TL;DR: A new subjective, Internet-based MOS (Mean Opinion Score) test methodology which allows rapid assessment of voice quality and proposes novel conversational intrusive and non-intrusive speech quality measurement methods, based on the ITU PESQ and E-model to extend the applicability of existing methods.
Abstract: The need to evaluate voice quality in VoIP (Voice over IP) applications is an important requirement for technical and commercial reasons. This may involve subjective and/or objective voice quality measurements, but existing methods may not always be appropriate for VoIP applications. The aims of the study reported in the paper are to investigate new subjective and objective measurement methods for VoIP applications. The contributions of the paper are two-fold. First, we present a new subjective, Internet-based MOS (Mean Opinion Score) test methodology which allows rapid assessment of voice quality. We conducted MOS tests using the new method as well as traditional MOS tests under different VoIP network conditions and compared the results using objective measurement methods. Preliminary results show that the Internet-based MOS test compares well with traditional MOS test (correlation coefficients of 0.95). Second, we propose novel conversational intrusive and non-intrusive speech quality measurement methods, based on the ITU PESQ and E-model to extend the applicability of existing methods. We illustrate the application of the novel approach to the derivation of model parameters for a new codec for VoIP applications (the AMR codec).
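
For readers unfamiliar with the E-model mentioned in the abstract above, the following minimal sketch shows the standard mapping from the E-model rating factor R to an estimated MOS, as given in ITU-T Recommendation G.107. It does not reproduce the paper's own measurement methods or its derived AMR parameters; the function name `r_to_mos` is an assumption made for illustration.

```python
def r_to_mos(r):
    """Map an E-model rating factor R to an estimated MOS (ITU-T G.107 mapping)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    # Cubic mapping defined for 0 < R < 100.
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


if __name__ == "__main__":
    for rating in (50, 60, 70, 80, 93.2):
        print(f"R = {rating:5.1f} -> MOS = {r_to_mos(rating):.2f}")
```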

Patent
Jari Mäkinen1, Pasi Ojala1
30 Oct 2003
TL;DR: In this paper, a method is described for performing variable rate speech coding in a speech codec comprising a plurality of speech codec modes operating at different bit rates, the speech encoded by said speech codec being arranged for transmission in a telecommunications network.
Abstract: A method for performing variable rate speech coding in a speech codec comprising a plurality of speech codec modes operating at different bit rates, the speech encoded by said speech codec being arranged for transmission in a telecommunications network. Information on an active speech codec mode set to be supported is received from the telecommunications network, in response to which the supported speech codec modes that correspond to the active codec mode set determined in the telecommunications network will be activated. Thereafter, speech signals to be applied to the speech codec are encoded with the activated speech codec modes such that the speech codec mode of the substantially lowest bit rate is adapted to the speech frames comprised by the speech signals such that, in view of the channel conditions in the telecommunications network, the level of residual error in coding will be substantially minimized at the same time.

Book ChapterDOI
TL;DR: An experimental study that extends the current knowledge of the VAD/DTX codec influence in the transmission rate by considering the new GSM AMR codec as well as G.723.1 and G.729B.
Abstract: This article presents an experimental study that extends the current knowledge of the VAD/DTX codec influence on the transmission rate. It considers the new GSM AMR codec as well as G.723.1 and G.729B. The types of the encoded frames have been studied in order to determine the real bit rate and the effect of SID frames. The influence of the number of frames per packet has also been addressed, showing that there are optimal values that minimize packet bandwidth consumption.

Patent
Jung-Hoe Kim1, Sang-Wook Kim1
18 Dec 2003
TL;DR: In this paper, a scalable stereo audio coding and decoding method and apparatus are provided, which includes transforming first-channel and second-channel audio samples, quantizing the transformed samples, coding the quantized first-channel audio samples up to a predetermined transition layer, and then interleavingly coding the quantized first- and second-channel audio samples with an increasing layer index from the layer succeeding the transition layer until coding for a predetermined plurality of layers is finished.
Abstract: A scalable stereo audio coding and decoding method and apparatus are provided. The scalable stereo audio coding method includes transforming first-channel and second-channel audio samples; quantizing the transformed first-channel and second-channel audio samples; and coding the quantized first-channel audio samples up to a predetermined transition layer and then interleavingly coding the quantized first- and second-channel audio samples with an increasing layer index from the layer succeeding the transition layer, until coding for a predetermined plurality of layers is finished.

Proceedings ArticleDOI
06 Apr 2003
TL;DR: A family of 3GPP-standard noise suppressors based on the MMSE STSA algorithm, which enables continuous noise estimation even in the speech period by using weighted noisy speech, is presented together with a full set of evaluations specified by 3GPP.
Abstract: The paper presents a family of 3GPP-standard noise suppressors and the evaluation results. The family consists of a high-quality version and a low-complexity version. These noise suppressors are based on the MMSE STSA (minimum mean square error short time spectral amplitude) algorithm originally proposed by Y. Ephraim and D. Malah (see IEEE Trans. Acoust., Speech, Sig. Processing, vol.ASSP-32, no.6, p.1109-21, 1984). To meet the 3GPP requirements with better speech quality, weighted noise estimation, synthesis windowing, and pseudo noise injection are incorporated. Weighted noise estimation enables continuous noise estimation even in the speech period by using weighted noisy speech. The weight is controlled such that a higher estimated SNR gives a smaller weight. A synthesis window function is applied between inverse transform and overlap-add processing for smooth transition at frame boundaries. Pseudo noise injection, which is not available in the low-complexity version, modifies the spectral gain based on its nonlinearity. The whole family satisfies all the 3GPP requirements. Results of a full set of evaluations specified by 3GPP are presented for the high-quality version.

Proceedings ArticleDOI
06 Apr 2003
TL;DR: A novel transcoding algorithm for the adaptive multi rate (AMR) codec and the enhanced variable rate codec (EVRC) is proposed, which transcodes the parameters of one codec to the other without synthesizing the speech.
Abstract: A novel transcoding algorithm for the adaptive multi rate (AMR) codec and the enhanced variable rate codec (EVRC) is proposed. In contrast to the conventional tandem transcoding algorithm, the proposed algorithm transcodes the parameters of one codec to the other without synthesizing the speech. The proposed algorithm decodes the parameters of the source codec from the input bitstream and, based on frame classification and mode decision, appropriately transforms the parameters of the source codec to those of the target codec in the parametric domain. Finally, the transformed parameters are encoded into a bitstream that is decodable by the target codec. The parameters transcoded by the proposed algorithm are line-spectral pair (LSP), pitch delay, fixed codevector, codebook gains, and frame energy. Evaluation results show that while reducing both the computational complexity and delay by 50%, the proposed algorithm produces speech quality equivalent to that produced by the tandem transcoding algorithm. The general idea is not restricted to the AMR and EVRC but is applicable to various other code-excited linear prediction (CELP) based codecs.

Journal ArticleDOI
TL;DR: This work develops a progressive syntax-rich multichannel audio codec (PSMAC) that not only supports fine grain bit rate scalability for the multichannel audio bitstream but also provides several other desirable functionalities.
Abstract: Being able to transmit the audio bitstream progressively is a highly desirable property for network transmission. MPEG-4 version 2 audio supports fine grain bit rate scalability in the generic audio coder (GAC). It has a bit-sliced arithmetic coding (BSAC) tool, which provides scalability in the step of 1 Kbps per audio channel. There are also several other scalable audio coding methods, which have been proposed in recent years. However, these scalable audio tools are only available for mono and stereo audio material. Little work has been done on progressive coding of multichannel audio sources. MPEG advanced audio coding (AAC) is one of the most distinguished multichannel digital audio compression systems. Based on AAC, we develop in this work a progressive syntax-rich multichannel audio codec (PSMAC). It not only supports fine grain bit rate scalability for the multichannel audio bitstream but also provides several other desirable functionalities. A formal subjective listening test shows that the proposed algorithm achieves an excellent performance at several different bit rates when compared with MPEG AAC.

Journal ArticleDOI
01 Aug 2003
TL;DR: A flexible audio coding system for use in the continuous transmission of high-quality stereo-audio data streams, either live or recorded, that provides both lossless and variable-level lossy quality; quality is selectable to suit a full range of wide- and narrow-band IP networks.
Abstract: We present a flexible audio coding system for use in the continuous transmission of high-quality (192-kHz sampling rate, up to 24-bit digitization) stereo-audio data streams, either live or recorded. The system provides both lossless and variable-level lossy quality; quality is selectable to suit a full range of wide- and narrow-band IP networks. As input signals for transmission at the server PC, fewer than nine PCM sound files in different formats are simultaneously encodable, and the efficiency of simultaneous compression is very high. The system is realized by software that runs on a typical PC. This gives everyone on an IP network the ability to transmit high-quality sound anywhere within the network. An MPEG-4 audio codec provides the core of the lossy coding module.

Proceedings ArticleDOI
V. Gurkhe1
15 Oct 2003
TL;DR: The results indicate that stereo MP3 at 44 kHz and 128 kbps can be decoded using 27 MIPS on the ARM9TDMI, and the output of the decoder is fully bit-compliant with the standard on the ISO test vectors.
Abstract: MPEG-1/2 audio layer-3 (MP3) is the most popular format for playback of high quality compressed audio on portable devices such as audio players and mobile phones. Typically these devices are based on either DSP or RISC processors. While the DSP architecture is more efficient for implementing the MP3 algorithm, the challenges of a RISC implementation are less well understood. This paper describes the challenges and optimization techniques useful for implementing the MP3 decoder algorithm on the RISC-based ARM9TDMI processor. Some of these techniques are generic and hence applicable to any audio codec implementation on RISC-based platforms. Our results, which are among the best in the industry, indicate that stereo MP3 at 44 kHz and 128 kbps can be decoded using 27 MIPS on the ARM9TDMI. In addition, the output of our decoder is fully bit-compliant with the standard on the ISO test vectors.

Reference EntryDOI
15 Apr 2003
TL;DR: This article is focused on speech coding methods for achieving communication quality speech at bit rates of 4 kbit/s and lower, based on an all-pole model of the vocal tract.
Abstract: This article is focused on speech coding methods for achieving communication quality speech at bit rates of 4 kbit/s and lower. The speech coding techniques are based on an all-pole model of the vocal tract which may be implemented in the time domain with appropriately selected excitation functions or else may be fit to a spectral analysis of the speech signal. Three main types of coders are described below. Code-excited linear prediction (CELP) coders select their excitation from waveform codebooks using analysis-by-synthesis closed-loop techniques, which need to be supplemented by speech classification and open-loop parametric techniques for keeping up with quality at lower rates. The prototypical sinusoidal coder (SC) has a bank of oscillators for signal synthesis, driven by a model of the magnitude spectrum. However, phase regeneration is important in enhancing speech reconstruction at low rates. Waveform interpolation (WI) coders afford a wider time-frequency footprint for the representation of the excitation, showing a good potential for achieving toll quality at bit rates below 4 kbit/s. Keywords: low bit rate speech coding; vocoder; codec; rate-distortion function; code-excited linear prediction; CELP; algebraic CELP; ACELP; linear prediction; LP; linear predictive coding; LPC; sinusoidal coder; waveform interpolation; WI; complexity; bit rate; fidelity; distortion; speech synthesis

Proceedings ArticleDOI
18 Sep 2003
TL;DR: A fast codebook search method is proposed that is a simply modified version of the depth-first tree search used in the algebraic codebook search of the AMR codec; combined with exploiting the DSP architecture and managing the memory structure efficiently, it reduces the computational load of an AMR implementation.
Abstract: The adaptive multi-rate speech codec consists of eight source codecs with bit rates from 4.75 to 12.2 kbit/s. This paper presents an AMR implementation especially focused on reducing the computational complexity. In order to reduce the computational load, we propose a fast codebook search method that is a simply modified version of the depth-first tree search method used in the algebraic codebook search in the AMR codec. For the AMR implementation we designed a 16-bit fixed-point DSP based on the TeakLite DSP core, which was tailored for the AMR implementation. The implemented AMR codec requires only 19.6 MIPS of computation for the highest complexity mode of the AMR by using the fast search method, exploiting the DSP architecture, and managing the memory structure efficiently. It is verified with all the test vectors provided by 3GPP, and stable operation on the real-time testing board was also confirmed.
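
To make the "eight source codecs with bit rates from 4.75 to 12.2 kbit/s" concrete, the sketch below lists the eight narrowband AMR mode rates and the number of speech bits each produces per frame. The intermediate rates and the 20 ms frame length come from the 3GPP AMR specification rather than from this abstract, and the names `AMR_MODES_BPS` and `bits_per_frame` are assumptions made for illustration.

```python
FRAME_MS = 20  # AMR frame duration in milliseconds (3GPP AMR specification, not this abstract)

# The eight AMR narrowband source codec rates, in bit/s.
AMR_MODES_BPS = [4750, 5150, 5900, 6700, 7400, 7950, 10200, 12200]


def bits_per_frame(rate_bps, frame_ms=FRAME_MS):
    """Number of speech bits produced per frame at the given bit rate."""
    return rate_bps * frame_ms // 1000


if __name__ == "__main__":
    for rate in AMR_MODES_BPS:
        print(f"{rate / 1000:5.2f} kbit/s -> {bits_per_frame(rate)} bits per {FRAME_MS} ms frame")
```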

Proceedings ArticleDOI
30 Nov 2003
TL;DR: It is concluded that the use of dedicated speech recognition codecs, such as DSR, does not offer tangible benefits in real-world systems and services.
Abstract: In this paper, we investigate the usefulness of general-purpose speech codecs and dedicated speech recognition codecs for speech-enabled services. Specifically, we focus on 3rd generation WCDMA systems using the adaptive multi-rate (AMR) speech codec, in comparison with the distributed speech recognition (DSR) framework. Speech recognition experiments are carried out with the AMR speech codec in a simulated packet-switched network. The performance of the DSR codec is assumed to be unaffected by transmission errors. Experimental results in British English and Mandarin Chinese indicate that no significant performance difference can be observed between the AMR- and DSR-based recognition systems. The gain from using the dedicated DSR codec is unlikely to provide a perceptible improvement in terms of quality of service for the end-users. In the light of the experimental results achieved, and other implementation and economical issues, it is concluded that the use of dedicated speech recognition codecs, such as DSR, does not offer tangible benefits in real-world systems and services.

Proceedings ArticleDOI
02 Jul 2003
TL;DR: A linear predictive coding procedure is developed to allow its implementation with number theoretic transforms, and the use of the Fermat number transform can reduce the cost of implementing the linear predictive algorithm on a digital signal processor.
Abstract: This paper is about the reduction of the computational complexity of a speech codec. A linear predictive coding procedure is developed to allow its implementation with number theoretic transforms. The use of the Fermat number transform can significantly reduce the cost of implementing the linear predictive algorithm on a digital signal processor.
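
As background for the Fermat number transform mentioned above, here is a minimal sketch of a number theoretic transform modulo the Fermat number F_4 = 2^16 + 1, used for exact cyclic convolution of integer sequences. It is not the paper's linear predictive coding procedure; the transform length, the direct O(N^2) evaluation, and the function names are assumptions chosen for clarity. The appeal for DSP implementations is that the root is 2, so multiplications by its powers reduce to bit shifts in hardware (here `pow` is used instead for readability).

```python
MOD = 65537   # Fermat number F_4 = 2**16 + 1, which is prime
ROOT = 2      # 2 has multiplicative order 32 modulo F_4, since 2**16 = -1 (mod F_4)
N = 32        # transform length supported by this root


def fnt(x, root=ROOT):
    """Direct O(N^2) Fermat number transform of a length-N integer sequence."""
    return [
        sum(x[n] * pow(root, (n * k) % N, MOD) for n in range(N)) % MOD
        for k in range(N)
    ]


def ifnt(spectrum):
    """Inverse transform: use the inverse root and scale by the inverse of N."""
    inv_root = pow(ROOT, N - 1, MOD)   # ROOT**N = 1, so ROOT**(N-1) is its inverse
    inv_n = pow(N, MOD - 2, MOD)       # Fermat's little theorem, MOD is prime
    return [(v * inv_n) % MOD for v in fnt(spectrum, inv_root)]


def cyclic_convolve(a, b):
    """Exact cyclic convolution, valid while the true results stay below MOD."""
    fa, fb = fnt(a), fnt(b)
    return ifnt([(p * q) % MOD for p, q in zip(fa, fb)])


if __name__ == "__main__":
    a = [1, 2, 3] + [0] * (N - 3)
    b = [4, 5] + [0] * (N - 2)
    print(cyclic_convolve(a, b)[:5])
```

Because all arithmetic is modular integer arithmetic, the convolution has no rounding error, which is one reason such transforms suit fixed-point processors.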

Proceedings ArticleDOI
14 Oct 2003
TL;DR: Subband coding based MPEG-1 audio layer III (MP3) is now useful for any system with limited channel capacity for its high quality to bit rate ratio.
Abstract: Content-based audio feature extraction is key to obtaining important messages from audio information. Research in the past several years has focused on the use of speech recognition techniques that are not directly applicable to compressed audio bit streams. However, subband coding based MPEG-1 audio layer III (MP3) is now useful for any system with limited channel capacity for its high quality to bit rate ratio. It has been widely adopted in audio-on-demand, music link via ISDN and digital satellite broadcasting. Message collection is easier if audio content can be extracted directly in the subband domain. Several useful algorithms are proposed here to demonstrate this idea.

Proceedings ArticleDOI
06 Apr 2003
TL;DR: The algorithm uses robust pitch detection and efficient voicing analysis to split the LPC excitation into two bands, and a fixed phase spectrum from a voiced segment generated by a male speaker is added into the uniform harmonic modeling of the excitation signal.
Abstract: This paper presents an algorithm for encoding a speech signal at 2.3 kbit/s based on a uniform harmonic modeling of the excitation signal. The algorithm uses robust pitch detection and efficient voicing analysis to split the LPC excitation into two bands. The lower band is related to the voiced parts of speech, while the upper band represents unvoiced speech. A fixed phase spectrum from a voiced segment generated by a male speaker is added into the uniform harmonic modeling of the excitation signal. This kind of fixed phase reduced the buzz effectively and produced soft natural speech. A short-term post-filter is utilized at the decoder to enhance the quality of synthesized speech. Subjective testing in Chinese showed that the 2.3 kbit/s HE-LPC coder performs better than the federal standard 2.4 kbit/s MELP coder.

Proceedings ArticleDOI
06 Jul 2003
TL;DR: A two-channel predictive MD video codec architecture based on the recently proposed WYZE-PMD framework is presented, indicating that the proposed codec provides efficient, drift-free predictive MD coding.
Abstract: The main hindrance to the development of efficient low-latency multiple description (MD) video coders is the problem of predictive mismatch. In this paper, we present a two-channel predictive MD video codec architecture based on the recently proposed WYZE-PMD framework. The proposed codec transmits coset information to curtail error-propagation caused by predictive mismatch, without requiring high latency or restrictive channel assumptions. MD scalar quantizers are used to generate multiple descriptions, low-density parity check (LDPC) codes are used to generate coset information, and the H.263 video coding standard is used for efficient motion compensation. The proposed codec is used to code descriptions of CIF video for communication over two erasure channels with independent failure probabilities. Results indicate that the proposed codec provides efficient, drift-free predictive MD coding.

Proceedings Article
01 Jan 2003
TL;DR: Objective measurements have shown that the modified speech signal can be coded more efficiently than the original signal, and the proposed irrelevancy removal technique can be used at the front end of a speech coder to achieve enhanced coding efficiency.
Abstract: A masking model originally designed for audio signals is applied to narrowband speech. The model is used to detect and remove the perceptually irrelevant simultaneously masked frequency components of a speech signal. Objective measurements have shown that the modified speech signal can be coded more efficiently than the original signal. Furthermore, it has been confirmed through perceptual evaluation that the removal of these frequency components does not cause significant degradation of the speech quality but rather, it has consistently improved the output quality of two standardized speech codecs. Thus, the proposed irrelevancy removal technique can be used at the front end of a speech coder to achieve enhanced coding efficiency.

Journal ArticleDOI
TL;DR: A very low-complexity audio codec that provides audio playback quality similar to the MPEG-I/audio level 3 codec at 64 Kbps for a monophonic-channel signal and an adaptive arithmetic coder with multiplication-free adaptation is presented.