Showing papers on "Adaptive Multi-Rate audio codec published in 1994"


Journal Article
01 Jun 1994
TL;DR: Current activity in speech compression is dominated by research and development of a family of techniques commonly described as code-excited linear prediction (CELP) coding, which offer a quality versus bit rate tradeoff that significantly exceeds most prior compression techniques.
Abstract: Speech and audio compression has advanced rapidly in recent years spurred on by cost-effective digital technology and diverse commercial applications. Recent activity in speech compression is dominated by research and development of a family of techniques commonly described as code-excited linear prediction (CELP) coding. These algorithms exploit models of speech production and auditory perception and offer a quality versus bit rate tradeoff that significantly exceeds most prior compression techniques for rates in the range of 4 to 16 kb/s. Techniques have also been emerging in recent years that offer enhanced quality in the neighborhood of 2.4 kb/s over traditional vocoder methods. Wideband audio compression is generally aimed at a quality that is nearly indistinguishable from consumer compact-disc audio. Subband and transform coding methods combined with sophisticated perceptual coding techniques dominate in this arena with nearly transparent quality achieved at bit rates in the neighborhood of 128 kb/s per channel.

234 citations
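
To put the 128 kb/s figure in perspective, it can be compared with uncompressed CD audio, which is 16-bit linear PCM sampled at 44.1 kHz per channel (standard CD parameters, not taken from the abstract). The short Python sketch below does that arithmetic; the function names are illustrative only.

```python
# Standard CD audio parameters (Red Book), not figures from the abstract.
CD_SAMPLE_RATE_HZ = 44_100
CD_BITS_PER_SAMPLE = 16


def cd_rate_kbps_per_channel() -> float:
    """Uncompressed CD bit rate of one channel, in kb/s."""
    return CD_SAMPLE_RATE_HZ * CD_BITS_PER_SAMPLE / 1000.0


def compression_ratio(coded_kbps: float) -> float:
    """How many times smaller a coded per-channel rate is than CD PCM."""
    return cd_rate_kbps_per_channel() / coded_kbps


print(cd_rate_kbps_per_channel())          # 705.6
print(round(compression_ratio(128.0), 1))  # 5.5
```

That is roughly a 5.5:1 reduction per channel; a stereo CD stream of 1411.2 kb/s would drop to 256 kb/s.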


Journal Article
TL;DR: A toll-quality 8 kb/s speech codec suitable for future personal communications systems is presented; it can support a frame erasure rate of up to 3%, although the resulting performance degradation still exceeds what the ITU-T requirements allow.
Abstract: A toll quality speech codec at 8 kb/s suitable for the future personal communications system is presented. The codec is currently under standardization by the ITU-T (successor of CCITT), whose terms of reference for the codec were determined mainly with PCS applications in mind. The encoding algorithm is based on algebraic code-excited linear prediction (ACELP) and has a speech frame of 10 ms. Efficient pitch and codebook search strategies, along with efficient quantization procedures, have been developed to achieve toll quality encoded speech with a complexity implementable on current fixed-point DSP chips. Formal subjective listening tests, performed by ITU-T SG 12, showed that the codec quality is equivalent to that of G.726 ADPCM at 32 kb/s in error-free conditions and it outperforms G.726 under error conditions. The codec performs adequately under tandeming conditions and can support a frame erasure rate of up to 3%, although the resulting performance degradation still exceeds what the ITU-T requirements allow; this is one subject of study for the next phase. The algorithm has been implemented on a single fixed-point DSP for the ITU-T subjective test and required about 29 MIPS. An optimized version, however, requires 24 MIPS without any speech quality degradation.

110 citations
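
The 10 ms frame fixes the bit budget per frame at 8 kb/s, and the quoted DSP loads can be read the same way, as an instruction budget per frame. A minimal Python sketch of that arithmetic, with illustrative helper names; how the 80 bits are divided among LP, pitch and codebook parameters is not given in the abstract and is not modeled here.

```python
def bits_per_frame(rate_bps: int, frame_ms: int) -> int:
    """Bits available in one frame of a constant-rate codec."""
    total = rate_bps * frame_ms
    if total % 1000:
        raise ValueError("rate and frame length give a fractional bit count")
    return total // 1000


def instructions_per_frame(mips: int, frame_ms: int) -> int:
    """Processor instructions available per frame at a given MIPS load."""
    return mips * 1_000_000 * frame_ms // 1000


print(bits_per_frame(8000, 10))        # 80
print(instructions_per_frame(29, 10))  # 290000 (first implementation)
print(instructions_per_frame(24, 10))  # 240000 (optimized version)
```

This assumes the MIPS figures describe sustained load over every 10 ms frame.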


Journal Article
01 Aug 1994
TL;DR: In this article, an integrated image compression digital signal processor (called VDSP2) was developed to make an MPEG2 digital video codec compact; the codec is capable of processing MPEG2 main profile at main level in real time at broadcast resolutions.
Abstract: An MPEG2 digital video codec was developed. We estimated the amount of calculation power required for MPEG2 and designed the architecture of the codec. In order to make the codec compact, we developed an integrated image compression digital signal processor (called VDSP2). The VDSP2 integrates four different types of processors in the architecture that allows them to operate in parallel. The device is capable of both encoding and decoding the MPEG2-based algorithm by changing programs on the same chip. We also developed new dedicated hardware for motion estimation, which consists of two-pixel precision estimation and full and half pixel precision estimation. The codec is capable of processing MPEG2 main profile at main level in real time at broadcast resolutions.

52 citations


Journal Article
W.B. Kleijn, J. Haagen
TL;DR: The characteristic waveform is decomposed into a slowly evolving waveform and a rapidly evolving waveform, representing the quasi-periodic and other components of speech, respectively; this decomposition allows efficient coding of voiced and unvoiced speech at bit rates between 2 and 8 kb/s.
Abstract: The speech signal is represented by an evolving characteristic waveform. The characteristic waveform is decomposed into a slowly evolving waveform and a rapidly evolving waveform, representing the quasi-periodic and other components of speech, respectively. These two evolving waveforms have fundamentally different quantization requirements. The decomposition allows efficient coding of voiced and unvoiced speech at bit rates between 2 and 8 kb/s.

51 citations
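
The decomposition can be illustrated on a matrix of time-aligned characteristic waveforms, one row per update instant. In this sketch the slowly evolving waveform (SEW) is obtained by low-pass filtering each sample position along the evolution (time) axis, here with a simple centred moving average, and the rapidly evolving waveform (REW) is the remainder. The moving-average filter, its width and the function name are illustrative assumptions, not the filter used in the paper.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def decompose_cw(cws: Matrix, half_width: int = 2) -> Tuple[Matrix, Matrix]:
    """Split time-aligned characteristic waveforms into SEW and REW parts.

    cws[t][i] is sample i of the characteristic waveform at update instant t.
    Each sample position i is smoothed along t with a centred moving average
    (shortened at the edges); the smoothed part is the SEW and the residual
    is the REW, so SEW + REW reproduces the input up to rounding.
    """
    n_t, n_s = len(cws), len(cws[0])
    sew = []
    for t in range(n_t):
        lo, hi = max(0, t - half_width), min(n_t, t + half_width + 1)
        sew.append([sum(cws[u][i] for u in range(lo, hi)) / (hi - lo)
                    for i in range(n_s)])
    rew = [[cws[t][i] - sew[t][i] for i in range(n_s)] for t in range(n_t)]
    return sew, rew
```

A sequence of identical waveforms (steady, perfectly periodic voicing) ends up entirely in the SEW, while waveform shapes that change from one update instant to the next, as in noise-like unvoiced speech, land mostly in the REW.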


Proceedings Article
19 Apr 1994
TL;DR: A toll quality speech codec at 8 kbit/s with a 10 ms speech-frame currently under standardization by the CCITT is presented and initial subjective tests showed that the codec quality is equivalent to that of G.726 ADPCM in error-free conditions and it performs adequately under tandeming conditions.
Abstract: A toll quality speech codec at 8 kbit/s with a 10 ms speech-frame currently under standardization by the CCITT is presented. The encoding algorithm is based on algebraic code-excited linear prediction (ACELP). Efficient pitch and codebook search strategies, along with efficient quantization procedures, have been developed to achieve toll quality encoded speech with a complexity implementable on current fixed-point DSP chips. Initial subjective tests showed that the codec quality is equivalent to that of G.726 ADPCM at 32 kbit/s in error-free conditions and it outperforms G.726 under error conditions. The codec can support a frame erasure rate up to 3% with slight degradation and performs adequately under tandeming conditions. The algorithm has been implemented on a single fixed-point DSP for the CCITT qualification test. It requires about 24 MIPS.

45 citations


Proceedings Article
08 Jun 1994
TL;DR: The paper summarizes the standardized PSI-CELP algorithm and the techniques used to improve speech quality, to reduce computational complexity, and to reduce memory requirements.
Abstract: A pitch synchronous innovation-CELP (PSI-CELP), proposed by NTT DoCoMo in 1993, has been adopted as the Japanese half-rate PDC (personal digital cellular) speech codec standard. This algorithm is based on CELP (code excited linear prediction) with a pitch synchronized excitation source. It uses 3.45 kbit/s out of 5.6 kbit/s for speech coding and the remaining 2.15 kbit/s for error protection. The paper summarizes the standardized PSI-CELP algorithm. The techniques used to improve speech quality, to reduce computational complexity, and to reduce memory requirements are mentioned. A real-time prototype based on this algorithm is also described.

26 citations
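
The figures in the abstract fix how the 5.6 kbit/s half-rate channel is shared: about 62% of the transmitted rate carries speech parameters and the rest is redundancy for error protection, an overall ratio comparable to a channel code of rate roughly 0.62. A minimal sketch of that arithmetic; the helper name is illustrative, and how the 2.15 kbit/s is spread across protection classes is not given in the abstract.

```python
from typing import Tuple


def channel_split(total_kbps: float, speech_kbps: float) -> Tuple[float, float]:
    """Return (error-protection rate in kbit/s, speech share of the channel rate)."""
    if not 0 < speech_kbps <= total_kbps:
        raise ValueError("speech rate must be positive and no larger than the total")
    return total_kbps - speech_kbps, speech_kbps / total_kbps


protection, speech_share = channel_split(5.6, 3.45)
print(round(protection, 2))    # 2.15
print(round(speech_share, 3))  # 0.616
```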



Proceedings Article
L. Cellario, Daniele Sereno, M. Giani, Peter Blöcher, K. Hellwig
19 Apr 1994
TL;DR: This paper focuses on the design, implementation and testing of a variable rate (VR) CELP codec aimed to be used in the testbed of a RACE-II project, CoDiT (code division testbed).
Abstract: This paper focuses on the design, implementation and testing of a variable rate (VR) CELP codec aimed to be used in the testbed of a RACE-II project, CoDiT (code division testbed). The project was conceived to demonstrate the potential of CDMA for UMTS (universal mobile telecommunications system). Because CDMA makes it easy to convey the information stream over a VR physical channel, the fixed-rate constraint has been removed from the speech coding algorithm design in order to exploit the time-varying local character of speech. One major feature of the proposed algorithm is that the average rate can be either source-controlled or network-controlled. This is particularly appealing for cellular communications, in order to cope with areas or cells with highly time-varying congestion.

19 citations


Proceedings Article
31 Oct 1994
TL;DR: Speech-assisted interpolation and speech-assisted coding of talking head video are proposed for solving problems related to lip synchronization in videotelephony and multimedia.
Abstract: We utilize speech information to improve the quality of audio/visual communications, such as videotelephony, videoconferencing, and multimedia. In particular, combining speech processing and image processing can solve problems related to lip synchronization. The two main techniques proposed in this paper are speech-assisted interpolation and speech-assisted coding of talking-head video. Audio/video sequences are presented to demonstrate our techniques.

14 citations


Proceedings Article
19 Apr 1994
TL;DR: The ANT approach for the standardisation of the GSM half rate codec is presented and the use of error concealment at parameter level in the channel decoder as well as at signal level in the speech decoder is discussed.
Abstract: The ANT approach for the standardisation of the GSM half rate codec is presented. The speech codec uses efficient scalar LSP quantization, joint optimization of adaptive and fixed codebook signals and works in two modes using different bit rates (5.7 and 6.15 kbit/s). The advantage of this dual mode scheme is an increase of the average error robustness without degradation of the intrinsic speech quality. The channel codec is based on rate-compatible punctured codes. The channel decoder generates soft information for each decoded information bit. Bad frame detection is based exclusively on the exploitation of soft decision information. The use of error concealment at parameter level in the channel decoder as well as at signal level in the speech decoder is discussed. The complexity of the speech and channel codec is 3.8 times the GSM full rate codec complexity. Subjective tests showed that the average Q-value of all test conditions is 1.7 dB below the average GSM full rate Q-value.

11 citations
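
In a rate-compatible punctured convolutional (RCPC) scheme, a low-rate mother code is punctured periodically to reach higher rates, and the puncturing tables are nested so that every bit kept at a higher rate is also kept at each lower rate. The sketch below uses a textbook rate-1/2, constraint-length-3 code (generators 7 and 5 octal) and two hypothetical period-4 tables; the mother code and tables of the ANT proposal are not given in the abstract.

```python
from typing import List

Table = List[List[int]]


def conv_encode(bits: List[int]) -> List[List[int]]:
    """Rate-1/2, constraint-length-3 encoder, generators (7, 5) octal.

    Returns one [c1, c2] output pair per input bit; no tail bits are appended.
    """
    s1 = s2 = 0  # register: previous input bit, and the one before it
    out = []
    for b in bits:
        out.append([b ^ s1 ^ s2, b ^ s2])  # taps 111 and 101
        s1, s2 = b, s1
    return out


# Hypothetical period-4 puncturing tables (row = encoder output, column = time mod 4).
P_RATE_2_3: Table = [[1, 1, 1, 1], [1, 0, 1, 0]]  # keeps 6 of 8 bits -> rate 4/6
P_RATE_4_5: Table = [[1, 1, 1, 1], [1, 0, 0, 0]]  # keeps 5 of 8 bits -> rate 4/5


def puncture(coded: List[List[int]], table: Table) -> List[int]:
    """Transmit only the encoder outputs whose table entry is 1."""
    period = len(table[0])
    return [bit for t, pair in enumerate(coded)
            for j, bit in enumerate(pair) if table[j][t % period]]


def is_rate_compatible(higher: Table, lower: Table) -> bool:
    """True if every bit kept at the higher rate is also kept at the lower rate."""
    return all(lo >= hi for h_row, l_row in zip(higher, lower)
               for hi, lo in zip(h_row, l_row))
```

Because the tables are nested, the decoder can use one Viterbi decoder for the mother code at every rate, inserting neutral (zero-confidence) soft values at the punctured positions.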


Patent
21 Oct 1994
TL;DR: In this patent, the spectral components of the relevant short-time spectrum are formed for a data block with a given number of time input data, and the coded signal is formed from the spectral components of that data block by quantizing and coding them, with the bit distribution for the spectral components governed by a psycho-acoustic model.
Abstract: In a process for the cascade coding and decoding of audio data, the spectral components of the relevant short-time spectrum are formed for a data block with a given number of time input data, the coded signal is formed on the basis of the spectral components of said data block using a psycho-acoustic model of the bit distribution for the spectral components by quantizing and coding, whereupon time output data are obtained by decoding at the end of each codec stage. To prevent a deterioration in the sound quality in codec cascades with a plurality of stages, an identification signal is added to the coded signal at an initial stage to mark the start of the data block, whereby the subsequent codec stages undertake the classification of the data blocks to be coded on the basis of said identification signal.

Proceedings Article
13 Nov 1994
TL;DR: A still-image codec based on a new, efficient adaptive bit-plane run-length coding of wavelet transform coefficients outperforms JPEG at low bit rates; a very low bit-rate sub-band video codec with motion estimation/compensation, designed for colour video conferencing, is also presented.
Abstract: Novel wavelet transform based schemes for coding still images and image sequences are presented. The still image codec uses a new efficient adaptive bit-plane run-length coding of the wavelet transform coefficients of images. The main merit of this coding scheme is its simplicity: it requires no training or storage of codebooks, and it outperforms JPEG at low bit rates. For image sequence coding, a very low bit-rate sub-band motion estimation/compensation video codec designed for colour video conferencing applications is presented. A full system with buffer feedback control is designed. Simulation results of transmitting colour image sequences at 9.6 to 19.2 kbit/s are given.
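
The abstract does not spell out the adaptive bit-plane run-length coder, so the following is only a generic sketch of the underlying idea under stated assumptions: a one-level integer Haar transform (a hypothetical choice; the paper's wavelet filters are not stated) concentrates energy in a few coefficients, and each magnitude bit plane, from most to least significant, is then described by the run lengths of zeros before each 1 bit. Sign coding, adaptivity and entropy coding of the runs are omitted; note that nothing here needs a trained or stored codebook.

```python
from typing import List


def haar_1level(x: List[int]) -> List[int]:
    """One level of an integer Haar transform: pairwise averages, then differences."""
    if len(x) % 2:
        raise ValueError("length must be even")
    pairs = range(0, len(x), 2)
    return [(x[i] + x[i + 1]) // 2 for i in pairs] + [x[i] - x[i + 1] for i in pairs]


def bit_plane_runs(coeffs: List[int]) -> List[List[int]]:
    """Run-length description of each magnitude bit plane, MSB plane first.

    For every plane, each entry is the number of 0 bits preceding the next 1 bit;
    zeros after the last 1 are implied by the known coefficient count.
    """
    mags = [abs(c) for c in coeffs]
    planes = []
    for p in range(max(mags, default=0).bit_length() - 1, -1, -1):
        runs, run = [], 0
        for m in mags:
            if (m >> p) & 1:
                runs.append(run)
                run = 0
            else:
                run += 1
        planes.append(runs)
    return planes
```

For example, the row [10, 12, 11, 11, 50, 52, 51, 49] transforms to [11, 11, 51, 50, -2, 0, -2, 2]; the small detail coefficients contribute nothing to the top four bit planes, which is where run-length coding saves bits.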

Proceedings Article
08 Jun 1994
TL;DR: Speech coding negotiation schemes between the originating and terminating mobile stations (MS) for codec-bypassed connection control are proposed, together with a non-voice service connection control scheme needed to provide non-voice services with the same quality as full-rate coding.
Abstract: Multi-rate speech coding is an important subject in digital cellular systems using highly efficient speech coding methods. The paper describes network functions for applying multi-rate speech coding in the network. As an example of multi-rate coding, functions of a half rate speech coding method in personal digital cellular (PDC) systems are mainly discussed. One of the functions is a codec-bypassed connection control to reduce the degradation of voice quality in mobile-to-mobile communication. We propose speech coding negotiation schemes between the originating and terminating MS for the codec-bypassed connection control. We also propose a non-voice service connection control scheme, which is necessary to provide non-voice services with the same quality as full rate coding.

Journal Article
TL;DR: This emerging set of requirements for the next generation of very low-rate speech coding, based on the performance and characteristics of a wireline-quality 4-kb/s speech coding algorithm for network applications, is presented.
Abstract: The standardization of high-quality speech coding has intensified. In parallel, a number of novel applications are placing new demands on transmission efficiency and quality. In response to such challenges, standardization bodies have begun the definition of requirements for the next generation of very low-rate speech coding. Taking a lead in these activities, ANSI committee T1A1 and the ITU-T initiated the definition of the performance and characteristics of a wireline-quality 4-kb/s speech coding algorithm for network applications. This emerging set of requirements is presented.

Journal Article
TL;DR: A novel still image codec is presented that uses an efficient adaptive bit-plane run-length coding on the wavelet transform coefficients of images that outperforms the standard JPEG codec for low bitrate applications.
Abstract: A novel still image codec is presented that uses an efficient adaptive bit-plane run-length coding on the wavelet transform coefficients of images. The main attraction of this coding scheme is its simplicity in which no training and storage of codebooks are required. Also its high visual quality at high compression ratio outperforms the standard JPEG codec for low bitrate applications. A comparative performance between the new codec and the JPEG codec is given.

Proceedings Article
08 Jun 1994
TL;DR: A channel coding method for a half-rate speech codec is proposed; it includes new error control techniques that effectively integrate a speech codec with a channel codec, and its Viterbi decoder exploits the temporal correlation in encoded speech data as well as the redundancy of the convolutional code to improve decoding performance.
Abstract: A channel coding method for a half-rate speech codec is proposed. This method includes new error control techniques that effectively integrate a speech codec with a channel codec. It utilizes a bit swelling technique for efficient unequal error protection. Highly significant bits are swollen and combined with other significant bits. Combined bits are coded with a convolutional code. In addition, the Viterbi decoder employs temporal correlation in encoded speech data as well as the redundancy of the convolutional code to improve its decoding performance. The paper shows that the error correction ability of this method is higher than that of a conventional method. A subjective test shows that the speech quality of a half-rate speech codec combined with the proposed channel codec is considerably better than that of the full-rate VSELP codec on a fading channel.

Proceedings Article
19 Apr 1994
TL;DR: The design of a speech and channel codec for the North American TDMA digital cellular half rate channel which meets the eligibility requirements of the Telecommunications Industry Association (TIA) contest is described.
Abstract: The design of a speech and channel codec for the North American TDMA digital cellular half rate channel which meets the eligibility requirements of the Telecommunications Industry Association (TIA) contest is described. Its design objectives are largely shaped by the selection criteria. The codec has been implemented in real time on two Tiger C40 boards. Informal listening tests using this real-time hardware seem to indicate that this candidate comes close to the full rate standard.

Proceedings Article
01 Jan 1994
TL;DR: The statistical analysis of the outcome of subjective tests on the basic audio quality (BAQ) of several 5-channel audio bitrate reduction systems based on the ITU-R-Draft Recommendation “Methods for the Subjective Assessment of Small Impairments in Audio Systems including Multichannel Sound Systems” is discussed.
Abstract: The purpose of this paper is to discuss the statistical analysis of the outcome of subjective tests on the basic audio quality (BAQ) of several 5-channel audio bitrate reduction systems. The test conditions are based on the ITU-R-Draft Recommendation “Methods for the Subjective Assessment of Small Impairments in Audio Systems including Multichannel Sound Systems”. The report on the MPEG test contains a complete explanation and presentation of the results. The aims of the tests were the verification of the recent audio quality of 8 codec systems under test and the comparison between two types of codec systems: MPEG-codecs, backwards compatible (BC): Layer II and Layer III at 320 kbit/s; and non-backwards-compatible codecs (NBC): Dolby (denoted by NBC1) and AT&T (denoted by NBC2) at 320 kbit/s.


01 Jan 1994
TL;DR: An integrated image compression digital signal processor and dedicated hardware for motion estimation were developed, capable of both encoding and decoding the MPEG2-based algorithm by changing program on the same chip.
Abstract: A digital video codec was developed. To make the video codec practical and economical and to obtain better quality, an integrated image compression digital signal processor and dedicated hardware for motion estimation were developed. The newly developed image compression digital signal processor (called VDSP2) integrates four different types of processors in the architecture. These processors can operate in parallel. The VDSP2 is capable of both encoding and decoding the MPEG2-based algorithm by changing the program on the same chip. The newly developed dedicated hardware for motion estimation consists of the two-pixel precision estimator and the full & half pixel precision estimator. The codec is capable of processing MPEG2 main profile at main level in real time at broadcast resolution.

Proceedings Article
14 Nov 1994
TL;DR: This paper discusses the real-time implementation of a high-quality audio codec at a bit rate below 150 kbit/s per monophonic channel on hardware based on a 24-bit fixed-point DSP (Motorola DSP56002).
Abstract: This paper discusses the real-time implementation of a high-quality audio codec at a bit rate below 150 kbit/s per monophonic channel on hardware based on a 24-bit fixed-point DSP (Motorola DSP56002). The algorithm is an adaptive modified discrete cosine transform coding technique. Known human hearing characteristics are exploited in the adaptive bit allocation scheme. Both the hardware and software configurations are described, along with measured execution times and program and data memory usage. High-fidelity quality suitable for consumer applications has been achieved.
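
The abstract names an adaptive modified discrete cosine transform (MDCT) coder. The sketch below shows only the transform layer: a sine-windowed MDCT with 50% overlap and its inverse, whose overlap-add cancels the time-domain aliasing so that, without quantization, the input is recovered. The block length, the sine window and the 2/N inverse scaling are choices for this illustration; the paper's adaptive bit allocation and hearing model are not modeled, and the direct O(N^2) form is used for clarity rather than the fast algorithm a DSP implementation would use.

```python
import math
from typing import List


def sine_window(length: int) -> List[float]:
    """Sine window of length 2N; w[n]**2 + w[n + N]**2 == 1 (Princen-Bradley)."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def _basis(n: int, k: int, half: int) -> float:
    return math.cos(math.pi / half * (n + 0.5 + half / 2) * (k + 0.5))


def mdct(frame: List[float]) -> List[float]:
    """2N windowed samples -> N coefficients (direct O(N^2) form)."""
    half = len(frame) // 2
    return [sum(frame[n] * _basis(n, k, half) for n in range(2 * half))
            for k in range(half)]


def imdct(coeffs: List[float]) -> List[float]:
    """N coefficients -> 2N time-aliased samples, scaled by 2/N."""
    half = len(coeffs)
    return [2.0 / half * sum(coeffs[k] * _basis(n, k, half) for k in range(half))
            for n in range(2 * half)]


def round_trip(x: List[float], half: int) -> List[float]:
    """Windowed MDCT -> IMDCT -> overlap-add with hop `half`; no quantization."""
    if len(x) % half:
        raise ValueError("signal length must be a multiple of the hop size")
    w = sine_window(2 * half)
    padded = [0.0] * half + list(x) + [0.0] * half
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - half, half):
        coeffs = mdct([padded[start + n] * w[n] for n in range(2 * half)])
        # A coder would quantize `coeffs` here using its bit allocation.
        for n, v in enumerate(imdct(coeffs)):
            out[start + n] += v * w[n]
    return out[half:half + len(x)]
```

Each frame of 2N samples yields only N coefficients, so the 50% overlap costs no extra data; the aliasing introduced in each frame is cancelled by its neighbour during overlap-add.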


Proceedings Article
19 Apr 1994
TL;DR: A low bit-rate speech codec has been developed and the speech coder employs a PCELP (CELP with pulse codebook) algorithm, which uses a pulse-train excitation codebook to enhance the quality of voiced speech.
Abstract: A low bit-rate speech codec has been developed. The speech coder employs a PCELP (CELP with pulse codebook) algorithm, which uses a pulse-train excitation codebook to enhance the quality of voiced speech. The computational complexity of the pulse codebook search is greatly reduced by using truncated impulse responses of a weighted synthesis filter. An improved Viterbi decoding method using error detecting code information instead of tail bits has also been developed to increase the effective error-correction coding rate. A 5.6 kb/s speech codec has been implemented by incorporating the PCELP speech coding algorithm with the improved Viterbi decoding. Formal listening tests show that the synthetic speech quality of this codec is equivalent to that of 5.4-bit μ-law PCM in the absence of channel errors.
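
The complexity reduction rests on the fact that filtering a sparse pulse-train codevector through the weighted synthesis filter amounts to adding shifted copies of the filter's impulse response at the pulse positions. Truncating that impulse response to its first few taps makes each pulse cost only that many multiply-adds instead of up to the subframe length. A minimal sketch under stated assumptions: the impulse response, subframe length and truncation length are hypothetical, the ranking criterion is the usual analysis-by-synthesis one, and the paper's actual search procedure is not reproduced.

```python
from typing import List, Sequence


def filter_pulses(positions: Sequence[int], amps: Sequence[float],
                  h: Sequence[float], n: int, trunc: int) -> List[float]:
    """Zero-state response of a sparse pulse train through an impulse response h.

    Only the first `trunc` taps of h are used, so each pulse costs at most
    `trunc` multiply-adds instead of up to n; n is the subframe length.
    """
    y = [0.0] * n
    taps = min(trunc, len(h))
    for p, a in zip(positions, amps):
        for j in range(min(taps, n - p)):
            y[p + j] += a * h[j]
    return y


def search_score(target: Sequence[float], y: Sequence[float]) -> float:
    """Analysis-by-synthesis criterion (target . y)^2 / (y . y); larger is better."""
    num = sum(t * v for t, v in zip(target, y))
    den = sum(v * v for v in y)
    return num * num / den if den > 0.0 else 0.0
```

A codebook search would evaluate `search_score` for each candidate pulse train against the weighted target; truncation trades a small error in the filtered candidates for far fewer operations per candidate.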

01 Jan 1994
TL;DR: This paper acts as an introduction to speaker dependent coders highlighting their associated problems and possible techniques which may be used to accomplish single speaker coding.
Abstract: Speech coding is still a popular area for research and has received much interest over the past two decades. The thirst for low bit rate speech coding algorithms continues, fueled by the ever increasing number of subscribers to the mobile/domestic communication networks. This has biased the research conducted for speech coding towards this type of application, leaving less useful (or profitable?) applications in the background. Until the mid-eighties good quality synthesized speech at low bit rates eluded the speech coding fraternity. Even from the early days of speech coding, producing speech coders which could perform well when coding a variety of different speakers had been the main concern; little attention had been paid to a coder which could code just a single voice. The applications where such a coder could be employed are limited, but this does not mean that they are unimportant. This paper acts as an introduction to speaker dependent coders highlighting their associated problems. Possible techniques which may be used to accomplish single speaker coding are outlined and applications are described.

Proceedings Article
31 Oct 1994
TL;DR: The paper reviews current research into achieving higher speech quality at lower bit rates under a variety of environmental and transmission conditions and offers a perspective on where significant progress is expected in the short term.
Abstract: Personal communication systems will be judged primarily on the quality of speech communication they provide. The voice-call capacity depends directly on the bit-rate required to achieve the quality objectives of the specific service. To provide speech quality on wireless systems that is comparable to that attained on today's wireline systems, at least 16 kb/s are required. New algorithms are being considered for a standard to transmit speech at 8 kb/s in the presence of modest transmission errors. Mobile systems in service today, which have slightly lower quality requirements, employ 8 kb/s (IS-54 and IS-95) and 13 kb/s (GSM), respectively. Efforts are under way to halve these values and thereby double the capacity. The paper reviews current research into achieving higher speech quality at lower bit rates under a variety of environmental and transmission conditions. It provides a perspective on where significant progress is expected in the short term.

01 Jan 1994
TL;DR: In this article, two speech-coding algorithms at around 8 kbit/s were discussed, using pre-selection in the codebook search and training for the conjugate structured random codebook.
Abstract: This paper discusses two speech-coding algorithms at around 8 kbit/s. These algorithms both use pre-selection in the codebook search and training for the conjugate structured random codebook. One has short-delay time that can dispense with echo-control equipment. This algorithm is robust against channel errors by using LPC analysis separated from pitch excitation, although LPC parameters are derived from the locally decoded signal. The other algorithm has medium-delay time and meets all the requirements of the ITU-T (formerly CCITT) 8 kbit/s speech coding standard. The performance of these codecs is evaluated using signal to noise ratio (SNR), pair comparison tests with a modulated noise reference unit (MNRU), and Mean Opinion Score (MOS). The major applications of these codecs are personal handy phone systems and FPLMTS.
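
The modulated noise reference unit (MNRU) used in the pair comparison tests adds speech-correlated noise at a controlled ratio Q in dB. In its basic form (ITU-T Recommendation P.810) the output is y(n) = x(n)[1 + 10^(-Q/20) N(n)], where N(n) is random noise. The Python sketch below applies that formula; drawing N(n) as unit-variance Gaussian noise from a seeded standard-library generator is an illustrative choice, and the band-limiting filters of the full reference unit are omitted.

```python
import random
from typing import List


def mnru(x: List[float], q_db: float, seed: int = 0) -> List[float]:
    """Basic MNRU: y(n) = x(n) * (1 + 10**(-Q/20) * N(n)).

    N(n) is unit-variance Gaussian noise from a seeded generator, so the added
    noise is proportional to the signal and its power is, on average, Q dB
    below the signal power.
    """
    rng = random.Random(seed)
    gain = 10.0 ** (-q_db / 20.0)
    return [s * (1.0 + gain * rng.gauss(0.0, 1.0)) for s in x]
```

Because the noise scales with the signal, silent passages stay silent and the degradation is set by Q alone, which is what makes the MNRU a repeatable reference for calibrating listener judgments.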

Proceedings Article
A. Young
03 Aug 1994
TL;DR: The requirements and selection criteria of software CODEC algorithms in DVC are presented, together with an overview of CODEC algorithms for DVC such as Differential Pulse Code Modulation (DPCM), Discrete Cosine Transform (DCT), Wavelet, Vector Quantization, Fractal, Block Truncation Coding and Hybrid coding.
Abstract: The computational power of desktop computers (e.g. Intel 486 PC) has reached the stage where software solutions for desktop videoconferencing (DVC) are worth considering. Here, coder/decoder (CODEC) refers to video compression/decompression. In this paper, the requirements and selection criteria of software CODEC algorithms in DVC are presented. An overview of CODEC algorithms for DVC, such as Differential Pulse Code Modulation (DPCM), Discrete Cosine Transform (DCT), Wavelet, Vector Quantization, Fractal, Block Truncation Coding and Hybrid coding, gives insight into their feasibility in DVC, particularly when implemented in software. Some of the driving factors in the proliferation of DVC are: cost, platform, interoperability, communications bandwidth, availability and integrity, performance and quality, and last but not least, collaborative applications. Most of these factors, especially interoperability, are important in choosing a software CODEC algorithm, and so the standard H.261 is worth paying attention to. A quantitative analysis of some CODEC techniques is presented in a matrix.
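
Of the techniques listed, block truncation coding is simple enough to show in full. In its classic moment-preserving form, each block is sent as a one-bit-per-pixel bitmap plus two values chosen so that the block's mean and standard deviation are preserved. A minimal sketch; the choice of this particular BTC variant and the block layout are assumptions for illustration, not taken from the paper.

```python
import math
from typing import List, Tuple


def btc_encode(block: List[float]) -> Tuple[float, float, List[int]]:
    """Moment-preserving BTC of one block: returns (low, high, bitmap)."""
    m = len(block)
    mean = sum(block) / m
    std = math.sqrt(sum((v - mean) ** 2 for v in block) / m)
    bitmap = [1 if v >= mean else 0 for v in block]
    q = sum(bitmap)  # pixels at or above the mean
    if q in (0, m):  # flat block (q == 0 can only come from rounding)
        return mean, mean, bitmap
    low = mean - std * math.sqrt(q / (m - q))
    high = mean + std * math.sqrt((m - q) / q)
    return low, high, bitmap


def btc_decode(low: float, high: float, bitmap: List[int]) -> List[float]:
    """Rebuild the block: 1 bits take the high value, 0 bits the low value."""
    return [high if b else low for b in bitmap]
```

For a 4x4 block with 8-bit low and high values, the coded block takes 16 + 16 = 32 bits, or 2 bits per pixel, and decoding is a table lookup, which suits software implementation.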