
Showing papers on "Adaptive Multi-Rate audio codec published in 1997"


Journal Article
TL;DR: Common features as well as differences between MPEG-1 and MPEG-2 audio, other audio coding systems currently in use, and the new work for MPEG-4 audio will be presented.
Abstract: Since 1988 MPEG has been working on the standardization of high-quality, low-bit-rate audio coding. The MPEG-1 and MPEG-2 audio standards were completed in 1992 and 1994. Current work in MPEG includes MPEG-2 advanced audio coding (MPEG-2 AAC) of stereo or multichannel sound material and the audio part of MPEG-4. Common features as well as differences between MPEG-1 and MPEG-2 audio, other audio coding systems currently in use, and the new work on MPEG-2 AAC and MPEG-4 audio will be presented.

151 citations


Proceedings ArticleDOI
21 Apr 1997
TL;DR: The GSM enhanced full rate (EFR) speech codec that has been standardised for the GSM mobile communication system provides wireline quality not only for error-free conditions but also for the most typical error conditions.
Abstract: This paper describes the GSM enhanced full rate (EFR) speech codec that has been standardised for the GSM mobile communication system. The GSM EFR codec has been jointly developed by Nokia and the University of Sherbrooke. It provides speech quality at least equivalent to that of a wireline telephony reference (32 kbit/s ADPCM). The EFR codec uses 12.2 kbit/s for speech coding and 10.6 kbit/s for error protection. Speech coding is based on the ACELP algorithm (algebraic code excited linear prediction). The codec provides a substantial quality improvement over the existing GSM full rate and half rate codecs. The old GSM codecs lack wireline quality even in error-free channel conditions, while the EFR codec provides wireline quality not only in error-free conditions but also under the most typical error conditions. With the EFR codec, wireline quality is also sustained in the presence of background noise and in tandem connections (mobile-to-mobile calls).

84 citations
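The two rates in the abstract fix the per-frame bit budget of the channel. The sketch below assumes the 20 ms GSM speech frame (stated in the later EFR entry) and only converts each rate into bits per frame; it illustrates the arithmetic and is not taken from the codec specification.

```python
FRAME_MS = 20  # GSM speech frame length in milliseconds

def bits_per_frame(rate_kbps: float, frame_ms: int = FRAME_MS) -> int:
    """Bits carried in one frame: kbit/s is numerically bits per millisecond."""
    return round(rate_kbps * frame_ms)

speech = bits_per_frame(12.2)      # source (speech) coding
protection = bits_per_frame(10.6)  # channel coding / error protection
gross = speech + protection
print(speech, protection, gross)          # 244 212 456
print(gross / FRAME_MS, "kbit/s gross")   # 22.8 kbit/s gross
```

The 22.8 kbit/s gross rate matches the GSM full-rate traffic channel, which is why the EFR codec can use the existing channel.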


Proceedings ArticleDOI
03 Jun 1997
TL;DR: A technique to perform speech recognition directly from audio files encoded using the MPEG/Audio coding standard is described; results based on the recognition of speaker-dependent, small-vocabulary, continuously spoken sentences show accuracy as high as 99%.
Abstract: A technique to perform speech recognition directly from audio files encoded using the MPEG/Audio coding standard is described. The technique works in the compressed domain and does not require the MPEG/Audio file to be decompressed. Only the encoded subband samples are extracted and processed for training and recognition. The underlying speech recognition engine is based on the hidden Markov model. The technique is applicable to layers I and II of MPEG/Audio, and training under one layer can be used to recognize the other. Results based on the recognition of speaker-dependent, small-vocabulary, continuously spoken sentences show accuracy as high as 99% using this technique.

62 citations
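The abstract does not say which features are computed from the subband samples. As a hypothetical example, one log-energy value per subband per frame could serve as the HMM observation vector; the sketch assumes the samples have already been dequantized and grouped by the 32 MPEG-1 subbands (12 samples per subband in Layer I, 36 in Layer II).

```python
import math

SUBBANDS = 32  # MPEG-1 polyphase filter bank size

def subband_log_energies(frame, floor=1e-10):
    """frame: 32 lists, each holding one subband's dequantized samples for one
    audio frame. Returns one log-energy feature per subband (hypothetical feature)."""
    if len(frame) != SUBBANDS:
        raise ValueError("expected 32 subbands")
    feats = []
    for samples in frame:
        energy = sum(s * s for s in samples) / len(samples)
        feats.append(math.log(max(energy, floor)))  # floor avoids log(0) in silence
    return feats
```

A sequence of these 32-dimensional vectors, one per frame, would then be the observation sequence scored by the recognizer's HMMs.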


Proceedings ArticleDOI
21 Apr 1997
TL;DR: A new approach for optimum estimation of the speech codec parameters is developed, which can be applied to any speech codec standard if bit reliability information is provided by the demodulator or by the channel decoder.
Abstract: In digital mobile communication systems, error concealment techniques are needed to reduce the subjective effects of residual bit errors that have not been eliminated by channel decoding. Because most standards do not specify these algorithms bit-exactly, there is room for new solutions to improve the speech quality. This article develops a new approach for optimum estimation of the speech codec parameters. It can be applied to any speech codec standard if bit reliability information is provided by the demodulator (e.g., DECT) or by the channel decoder (e.g., the soft-output Viterbi algorithm (SOVA) in GSM). The proposed method includes an inherent muting mechanism leading to a graceful degradation of speech quality under adverse transmission conditions. In particular, additionally exploiting the residual source redundancy, i.e., a priori knowledge about the codec parameters, significantly enhances the output speech quality. In the case of an error-free channel, bit exactness as required by the standards can be preserved.

55 citations
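One common way to realize such an estimator is a minimum mean-square-error (MMSE) estimate over all quantizer indices, weighting each reconstruction value by the channel probability of its bit pattern and by its a priori probability. This is a generic sketch, not necessarily the authors' exact formulation; it assumes independent bit errors and LLRs defined as log(P(1)/P(0)).

```python
import math
from itertools import product

def bit_prob_one(llr):
    """P(bit = 1) from an LLR defined as log(P(1) / P(0)) (assumed convention)."""
    return 1.0 / (1.0 + math.exp(-llr))

def softbit_estimate(llrs, codebook, prior):
    """MMSE estimate of one quantized codec parameter.

    llrs     -- reliabilities of the received index bits, MSB first
    codebook -- reconstruction value for each index 0 .. 2**len(llrs) - 1
    prior    -- a priori probability of each index (residual source redundancy)
    """
    p1 = [bit_prob_one(l) for l in llrs]
    num = den = 0.0
    # product() yields bit tuples in index order: (0,0), (0,1), (1,0), (1,1), ...
    for index, bits in enumerate(product((0, 1), repeat=len(llrs))):
        p_bits = 1.0
        for b, p in zip(bits, p1):  # bits assumed independent after deinterleaving
            p_bits *= p if b else (1.0 - p)
        weight = p_bits * prior[index]
        num += weight * codebook[index]
        den += weight
    return num / den
```

When every bit is unreliable (LLR near 0), the estimate falls back to the prior mean, which is zero for a symmetric codebook; this is the kind of built-in muting the abstract describes.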



Proceedings ArticleDOI
21 Apr 1997
TL;DR: The enhanced full rate (EFR) speech codec that has recently been standardised for the North American TDMA digital cellular system (IS-136) offers speech quality close to that of wireline telephony and provides a substantial improvement over the quality of the current speech channel.
Abstract: In this paper, we describe the enhanced full rate (EFR) speech codec that has recently been standardised for the North American TDMA digital cellular system (IS-136). The EFR codec, specified in the IS-641 standard, has been jointly developed by Nokia and University of Sherbrooke. The codec consists of 7.4 kbit/s speech (source) coding and 5.6 kbit/s channel coding (error protection) resulting in a 13.0 kbit/s gross bit-rate in the channel. Speech coding is based on the ACELP algorithm (algebraic code excited linear prediction). The codec offers speech quality close to that of wireline telephony (G.726 32 kbit/s ADPCM used as a wireline reference) and provides a substantial improvement over the quality of the current speech channel. The improved speech quality is not only achieved in error-free conditions, but also in typical cellular operating conditions including transmission errors, environmental noise, and tandeming of speech codecs.

44 citations
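ACELP excitation comes from an algebraic codebook: a few signed unit pulses placed on interleaved position tracks within a subframe. The abstract does not give the IS-641 codebook layout, so the track layout below is an assumption patterned on the four-pulse, 17-bit algebraic codebook of G.729; the function names are hypothetical.

```python
SUBFRAME = 40  # samples per subframe (5 ms at 8 kHz)

# Interleaved pulse tracks (assumed G.729-style layout).
TRACKS = [
    list(range(0, 40, 5)),                                  # 8 positions
    list(range(1, 40, 5)),                                  # 8 positions
    list(range(2, 40, 5)),                                  # 8 positions
    sorted(list(range(3, 40, 5)) + list(range(4, 40, 5))),  # 16 positions
]

def algebraic_codevector(position_indices, signs):
    """Build the sparse excitation: one +/-1 pulse on each track."""
    c = [0] * SUBFRAME
    for track, idx, sign in zip(TRACKS, position_indices, signs):
        c[track[idx]] += sign
    return c

def codebook_bits():
    """Position bits (log2 of each power-of-two track size) plus one sign bit per pulse."""
    return sum(len(t).bit_length() - 1 for t in TRACKS) + len(TRACKS)
```

Because only four positions and four signs are coded, the codebook needs no stored vectors, and the search can work directly on pulse positions.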


Patent
09 Dec 1997
TL;DR: A portable multimedia data input/output processor whose main components are an audio codec for compressing and decompressing audio data, a video codec controller, and a multimedia processor that transmits audio data to a wireless communication controller and video data to the video codec controller and a graphic processor.
Abstract: A portable multimedia terminal that is small and consumes little power can process a large quantity of multimedia data such as video and audio data. The portable multimedia data input/output processor can be made smaller by using a pen as the input device, and can process a large quantity of multimedia data at high speed by adopting a PCI bus as the system's local bus. To retrieve, compress, and decompress multimedia data, the main components of this processor are an audio codec for compressing and decompressing audio data, a video codec controller for compressing and decompressing video data, and a multimedia processor for transmitting audio data to a wireless communication controller and video data to the video codec controller and a graphic processor. The method for retrieving multimedia data includes the steps of receiving data, de-interleaving the received data into audio, video, and graphic data, decompressing the data, and outputting it to an output device. The method for compressing data includes the steps of inputting video data to the video codec controller, compressing the video and audio data in the video codec controller and the audio codec, interleaving the compressed data, and transmitting it to a remote system. Decompression reverses the compression steps.

42 citations


Proceedings ArticleDOI
21 Apr 1997
TL;DR: This paper proposes a speech and audio coder that operates at 1 bit/sample, namely an 8 kbit/s coder for 8 kHz sampling or a 16 kbit/s coder for 16 kHz sampling, with a basic structure inherited from the TwinVQ high-quality audio coding scheme.
Abstract: This paper proposes a speech and audio coder that operates at 1 bit/sample, namely an 8 kbit/s coder for 8 kHz sampling or a 16 kbit/s coder for 16 kHz sampling. The basic structure is inherited from the TwinVQ (transform-domain weighted interleave vector quantization) high-quality audio coding scheme. A periodic component extraction scheme is newly added to the quantization of the MDCT coefficients. This scheme is found to be effective for reducing distortion and improving robustness against channel errors. For music signals, the quality at 8 kbit/s is better than that of G.729 at the same bit rate, while for clean speech it is worse. The quality at 16 kbit/s is comparable to or better than that of G.722 at 48 kbit/s.

34 citations
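The "interleave" in TwinVQ refers to splitting the spectrum into subvectors that each take every N-th coefficient, so every subvector spans the whole band; each subvector is then matched against a codebook under a perceptual weighting. The sketch shows only those two steps with caller-supplied weights and a toy codebook; TwinVQ's trained codebooks and weight derivation are not reproduced.

```python
def interleave(coeffs, n_sub):
    """Split coefficients into n_sub subvectors by taking every n_sub-th value,
    so each subvector samples the whole spectrum."""
    return [coeffs[i::n_sub] for i in range(n_sub)]

def weighted_vq(vec, weights, codebook):
    """Index of the codeword minimizing the weighted squared error."""
    def dist(cw):
        return sum(w * (x - c) ** 2 for x, w, c in zip(vec, weights, cw))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))
```

Spreading each subvector across the spectrum means a single corrupted index degrades many frequencies slightly instead of one band badly, which fits the robustness goal stated in the abstract.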


Journal ArticleDOI
TL;DR: The paper discusses how selective optimization of the codec structure allows robust performance with limited resources, highlights some of the problems inherent in translating the abstractions of the standard into assembly code, and points towards further investigation of real-time implementations of communications standards.
Abstract: The MPEG-1 audio standard (ISO/IEC 11172-3) establishes guidelines for the compression of high-quality digital audio signals. The standard dictates the function of an encoder/decoder pair (codec), leaving the form intentionally vague to allow for competing implementations. A typical approach to real-time operation is to design an application-specific integrated circuit (ASIC) dedicated to encoding, decoding, or both. We present an alternative codec that makes use of the general-purpose digital signal processing (DSP) chips that are now common in multimedia-capable workstations and personal computers. We discuss how selective optimization of the codec structure allows robust performance using limited resources, highlight some of the problems inherent in translating the abstractions of the standard into assembly code, and point towards further investigations of real-time implementations of communications standards.

30 citations


PatentDOI
TL;DR: An audio coder/decoder that is suitable for real-time applications due to its reduced computational complexity, together with a novel adaptive sparse vector quantization (ASVQ) scheme and algorithms for general-purpose data quantization; the codec provides low-bit-rate compression for music and speech while remaining applicable to higher-bit-rate audio compression.
Abstract: An audio coder/decoder ("codec") that is suitable for real-time applications due to reduced computational complexity, and a novel adaptive sparse vector quantization (ASVQ) scheme and algorithms for general purpose data quantization. The codec provides low bit-rate compression for music and speech, while being applicable to higher bit-rate audio compression. The codec includes an in-path implementation of psychoacoustic spectral masking, and frequency domain quantization using the novel ASVQ scheme and algorithms specific to audio compression. More particularly, the inventive audio codec employs frequency domain quantization with critically sampled subband filter banks to maintain time domain continuity across frame boundaries. The input audio signal is transformed into the frequency domain in which in-path spectral masking can be directly applied. This in-path spectral masking usually results in sparse vectors. The ASVQ scheme is a vector quantization algorithm that is particularly effective for quantizing sparse signal vectors. In the preferred embodiment, ASVQ adaptively classifies signal vectors into six different types of sparse vector quantization, and performs quantization accordingly. The ASVQ technique applies to general purpose data quantization as well as to quantization in the context of audio compression. The invention also includes a "soft clipping" algorithm in the decoder as a post-processing stage. The soft clipping algorithm preserves the waveform shapes of the reconstructed time domain audio signal in a frame- or block-oriented stateless manner while maintaining continuity across frame or block boundaries. The invention includes related methods, apparatus, and computer programs.

29 citations
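The abstract notes that in-path spectral masking "usually results in sparse vectors". The sketch below illustrates why, and shows one simple way to represent the result as (position, value) pairs. It is not the patented ASVQ scheme, whose six vector types are not described here; the threshold values are assumed inputs.

```python
def apply_mask(coeffs, thresholds):
    """Zero every frequency coefficient whose magnitude is below its masking threshold."""
    return [c if abs(c) >= t else 0.0 for c, t in zip(coeffs, thresholds)]

def to_sparse(vec):
    """Keep only the nonzero entries as (position, value) pairs."""
    return [(i, v) for i, v in enumerate(vec) if v != 0.0]

def from_sparse(pairs, length):
    """Rebuild the dense vector from (position, value) pairs."""
    out = [0.0] * length
    for i, v in pairs:
        out[i] = v
    return out
```

Inaudible (masked) coefficients become exact zeros, so only a few positions and values remain to be quantized.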


Patent
Kazuyoshi Kuwahara, Koichi Kaji
05 Feb 1997
TL;DR: A sound codec has an encoded signal input/output terminal that is selectively connected to a modem codec, so that a microphone and a speaker can be used as the transceiver of a speakerphone.
Abstract: A sound codec has an encoded signal input/output terminal. This terminal is selectively connected to a modem codec. Due to this selective connection, a microphone and a speaker, both connected to a sound signal input/output terminal of the sound codec, are used as a transceiver of the speaker phone, and the sound codec is controlled to function as a speaker phone codec.

Proceedings ArticleDOI
21 Apr 1997
TL;DR: A new approach to shaping the coding noise in speech and audio coders, called spectral amplitude warping (SAW), consists essentially of pre- and post-processing stages that apply a nonlinear transformation to the signal's short-term spectrum before and after encoding.
Abstract: In this paper, we present a new approach to shaping the coding noise in speech and audio coders. The approach, called spectral amplitude warping (SAW), consists essentially of pre- and post-processing stages that apply a nonlinear transformation to the signal's short-term spectrum before and after encoding. Since SAW can be viewed as an entity separate from the coder, the noise-shaping capability of an existing coder can be improved without modifying the coder itself. Using SAW as pre- and post-processing around the G.722 wideband speech coding standard, an informal listening test found that the quality of the 64 kb/s operating mode can be achieved at only 48 kb/s. The price to be paid is additional delay.
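The abstract does not specify the nonlinear transformation. As an assumed example, a power-law warp raises each spectral magnitude to a power γ < 1 before coding (keeping the phase) and applies 1/γ afterwards. To first order, noise added to a warped magnitude reappears scaled by about |X|^(1-γ)/γ, so after unwarping it follows the spectral envelope. The function names are hypothetical.

```python
import cmath

def warp_spectrum(spectrum, gamma):
    """Raise each bin's magnitude to the power gamma while keeping its phase."""
    return [cmath.rect(abs(x) ** gamma, cmath.phase(x)) for x in spectrum]

def unwarp_spectrum(spectrum, gamma):
    """Inverse warp applied after decoding."""
    return warp_spectrum(spectrum, 1.0 / gamma)
```

In a full system, the warped spectrum would be converted back to a time signal, passed through the unmodified coder (for example G.722), and unwarped frame by frame at the output. The added delay mentioned in the abstract comes from this frame-based analysis and synthesis.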

Patent
04 Apr 1997
TL;DR: A video transmission apparatus comprises a video codec for compression-encoding video data to be transmitted and decompression-decoding video data as received, and first and second audio codecs that each compression-encode audio data to be transmitted and decompression-decode audio data as received.
Abstract: A video transmission apparatus comprises a video codec for compression-encoding video data to be transmitted and decompression-decoding video data as received; a first audio codec for compression-encoding first audio data to be transmitted and decompression-decoding first audio data as received; a second audio codec for compression-encoding second audio data to be transmitted and decompression-decoding second audio data as received; a multiplexer-demultiplexer including a three-channel buffer memory and a processor; and an interface connectable to a communications satellite network and a ground network. In the multiplexer-demultiplexer, the video data, first audio data, and second audio data to be transmitted are multiplexed and converted to packets based on a predetermined communication format, and received packets are separated into video data, first audio data, and second audio data.
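The multiplexing step can be sketched with a hypothetical packet format, since the patent's "predetermined communication format" is not given: each packet carries a 1-byte channel identifier, a 2-byte big-endian payload length, and the payload. The demultiplexer then fills one buffer per channel, mirroring the three-channel buffer memory.

```python
import struct

VIDEO, AUDIO1, AUDIO2 = 0, 1, 2  # hypothetical channel identifiers
HEADER = struct.Struct(">BH")    # channel id (1 byte) + payload length (2 bytes)

def multiplex(chunks):
    """chunks: iterable of (channel_id, payload bytes) in transmission order."""
    return b"".join(HEADER.pack(ch, len(p)) + p for ch, p in chunks)

def demultiplex(stream):
    """Split a received byte stream back into one buffer per channel."""
    buffers = {VIDEO: bytearray(), AUDIO1: bytearray(), AUDIO2: bytearray()}
    pos = 0
    while pos < len(stream):
        ch, length = HEADER.unpack_from(stream, pos)
        pos += HEADER.size
        buffers[ch] += stream[pos:pos + length]
        pos += length
    return {ch: bytes(buf) for ch, buf in buffers.items()}
```

A length field keeps variable-size compressed chunks self-delimiting; a real satellite or ground-network format would add synchronization and error protection, which are omitted here.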

Proceedings ArticleDOI
21 Apr 1997
TL;DR: The WLP scheme is extended for processing complex valued signals (CWLP), and three different methods of converting a stereo signal to one complex valued signal are introduced.
Abstract: Bark-scale warped linear prediction (WLP) is a very promising core for a monophonic perceptual audio codec. In this paper the WLP scheme is extended to process complex-valued signals (CWLP). Three different methods of converting a stereo signal into one complex-valued signal are introduced. The philosophy behind the coding scheme is to integrate some aspects of modern wideband audio coding (e.g., perceptuality and stereo signal processing) into one computational element in order to find a more holistic and economical way of processing.
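Warped LP replaces each unit delay with a first-order allpass section, and the predictor is derived from a "warped" autocorrelation. The sketch extends this to complex input by conjugating the delayed branch. Mapping left to the real part and right to the imaginary part is an assumption (the abstract does not name its three methods). The warping factor λ is a caller-supplied assumption, and the Levinson-Durbin step that turns r into predictor coefficients is omitted.

```python
def allpass(x, lam):
    """First-order allpass D(z) = (z^-1 - lam) / (1 - lam z^-1)."""
    y, x_prev, y_prev = [], 0j, 0j
    for xn in x:
        yn = -lam * xn + x_prev + lam * y_prev
        y.append(yn)
        x_prev, y_prev = xn, yn
    return y

def stereo_to_complex(left, right):
    """One possible mapping (assumed): left as real part, right as imaginary part."""
    return [complex(l, r) for l, r in zip(left, right)]

def warped_autocorrelation(x, order, lam):
    """r[k] = sum_n x[n] * conj(y_k[n]), with y_k = x passed through k allpass stages."""
    r, y = [], list(x)
    for _ in range(order + 1):
        r.append(sum(a * b.conjugate() for a, b in zip(x, y)))
        y = allpass(y, lam)
    return r
```

With λ = 0 the allpass collapses to a plain delay and r reduces to the ordinary (complex) autocorrelation. A positive λ stretches low frequencies, which is how a Bark-like resolution is approximated.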

Proceedings ArticleDOI
08 Jun 1997
TL;DR: The GSM EFR speech codec provides substantial quality improvement compared to the current GSM full rate (FR) and half rate (HR) codecs, while the old GSM codecs lag far behind wireline quality even in error-free conditions.
Abstract: This paper describes the enhanced full rate (EFR) speech codec that was standardized by ETSI for the GSM mobile communications system in 1996. The codec was developed jointly by Nokia and the University of Sherbrooke. It operates at a speech (source) coding bit rate of 12.2 kbit/s and provides speech quality equivalent to that of wireline telephony (G.726 32 kbit/s ADPCM). The algorithm is based on algebraic code-excited linear prediction (ACELP) and uses 20 ms speech frames. The GSM EFR speech codec provides a substantial quality improvement over the current GSM full rate (FR) and half rate (HR) codecs. The old GSM codecs lag far behind wireline quality even in error-free conditions, while the EFR codec provides wireline quality also under the most typical error conditions. With the EFR codec, wireline quality is also sustained in the presence of background noise and in tandem connections (mobile-to-mobile calls). The codec was defined using fixed-point basic operators, with complexity estimated at 18 WMOPS (below that of the GSM half-rate codec).
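"Fixed-point basic operators" are saturating 16/32-bit arithmetic primitives in which bit-exact codecs are written, and WMOPS figures come from counting weighted calls to them. The sketch models four of them in Python. The names follow the ETSI/ITU-T basic operator set, but these are illustrative re-implementations of their saturation behaviour, not the reference C code.

```python
MAX_16, MIN_16 = 32767, -32768
MAX_32, MIN_32 = 2**31 - 1, -2**31

def saturate16(x):
    return max(MIN_16, min(MAX_16, x))

def saturate32(x):
    return max(MIN_32, min(MAX_32, x))

def add(a, b):
    """16-bit addition with saturation."""
    return saturate16(a + b)

def sub(a, b):
    """16-bit subtraction with saturation."""
    return saturate16(a - b)

def mult(a, b):
    """Q15 x Q15 -> Q15 product: (a*b) >> 15 with saturation."""
    return saturate16((a * b) >> 15)

def L_mult(a, b):
    """Q15 x Q15 -> Q31 product: (a*b) << 1 with saturation."""
    return saturate32((a * b) << 1)
```

The corner case -1 × -1 in Q15 is the reason saturation is needed: the exact result +1 is not representable, so it clips to the largest positive value.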

Patent
29 Dec 1997
Abstract: A coding method and a coder of a D-VHS codec system for recording/reproducing high-definition pictures and high-fidelity sound are disclosed. When recording compressed or processed digital data, the coding method and the coder of the D-VHS codec system record the digital data with forward error correction information added, in the track format used for forward error correction. Consequently, an enormous quantity of computation that cannot be handled by software is performed in real time by an optimal data-flow operation method, and the compressed or processed digital data is processed at high speed. In addition, the codec hardware, which conventionally occupies two or three circuit boards, is implemented as a single custom semiconductor chip, cutting the unit manufacturing cost of the D-VHS codec.

01 Jan 1997
TL;DR: This package is based on the UNIX operating system and is dedicated to the standardization committees and their efforts in bringing the advancement of technology to everyone.
Abstract: This document is provided for general informational use. No responsibility is assumed by the authors with regard to the accuracy of the information contained herein. The documentation described herein may be used, copied, or transcribed without fee or prior agreement provided that acknowledgement or reference is made of the source. Sun is a trademark of Sun Microsystems, Inc. UNIX is a trademark of AT&T. DECstation is a trademark of Digital Equipment Corporation. This package is based on the UNIX operating system. We appreciate any comments or corrections; they can be sent to Andy C. Hung at achung@cs.stanford.edu. We cannot guarantee, however, that any bugs will be corrected. Funded by the Defense Advanced Research Projects Agency. I am especially grateful to Hewlett Packard and Storm Technology for their financial support during the earlier stages of codec development. Any errors in the code and documentation are my own. The following people are acknowledged for their advice and assistance. We apologize if we have missed anyone from this list. Thanks, one and all. Dedicated to the standardization committees and their efforts in bringing the advancement of technology to everyone.

Proceedings Article
01 Jan 1997
TL;DR: Two types of G.729-scalable wideband codec are proposed, a separate one and a composite one, together with an additional adaptive codebook for predicting pitch while maintaining scalability with G.729; the 8 kbit/s core provides high quality for telephone-band speech.
Abstract: A wideband speech scalable codec is proposed for improving flexibility in telecommunication networks. This coder is scalable with G.729 (the ITU 8 kbit/s standard). Its decoder can process the incoming bitstream at three bit rates (8, 12, and 16 kbit/s) and provide a choice of speech types (wideband and telephone-band). The codec has a split-band structure in which both bands are coded by analysis-by-synthesis techniques. This paper proposes two types of scalable codec: a separate one and a composite one. It also proposes a new method (an additional adaptive codebook) for predicting pitch while maintaining scalability with the G.729 codec. Subjective testing for wideband speech showed that the quality of the proposed codec at 16 kbit/s is equivalent to that of G.722 at 64 kbit/s, and at 12 kbit/s is better than that of G.722 at 48 kbit/s. Testing further demonstrated that the 8 kbit/s coder provides high quality for telephone-band speech.
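Scalability means the decoder can discard trailing layers of each frame and still decode. The sketch assumes 10 ms frames (as in G.729) and an 8 kbit/s core followed by two 4 kbit/s enhancement layers, a split inferred from the three rates in the abstract rather than taken from the paper; the function names are hypothetical.

```python
FRAME_MS = 10                 # G.729 frame length, assumed for the whole codec
LAYER_RATES_KBPS = [8, 4, 4]  # core (G.729-compatible) + two enhancement layers (assumed)

def layer_sizes(frame_ms=FRAME_MS):
    """Bits per frame contributed by each layer."""
    return [rate * frame_ms for rate in LAYER_RATES_KBPS]

def truncate_frame(frame_bits, target_kbps):
    """Keep the leading layers whose cumulative rate fits target_kbps (8, 12 or 16)."""
    keep, total_rate, start = [], 0, 0
    for rate, size in zip(LAYER_RATES_KBPS, layer_sizes()):
        if total_rate + rate > target_kbps:
            break
        keep.extend(frame_bits[start:start + size])
        total_rate += rate
        start += size
    return keep
```

Because the core layer always comes first, a plain G.729 decoder can read the 8 kbit/s part of the stream unchanged.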

Proceedings ArticleDOI
07 Sep 1997
TL;DR: A novel hybrid speech coding scheme based on the excitation/filter model represents voiced speech with a frequency-domain harmonic model, unvoiced speech with noise-like excitation, and the transitory portions, which neither model represents adequately, with a time-domain analysis-by-synthesis scheme.
Abstract: We present a novel scheme for hybrid coding of speech signals. This hybrid codec utilizes the excitation/filter model used extensively for speech coding. Similar to other modern vocoders, voiced speech is represented by a frequency domain harmonic model and unvoiced speech by a "noise-like" excitation. However, an analysis-by-synthesis time domain scheme is employed for the transitory portions of the speech signal which cannot be adequately represented by either model. Switching between the time domain and the frequency domain models requires careful handling of the reconstructed linear phase. The structure of a 4 kbps speech codec, based on the hybrid model, is outlined. The new codec shows promise of achieving toll quality at 4 kbps.
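A frequency-domain harmonic model synthesizes voiced speech as a sum of cosines at multiples of the fundamental frequency. The sketch shows that synthesis with linear phase tied to the absolute sample index, so consecutive frames with the same f0 join without a phase jump. The paper's parameter quantization and its handling of switches to the time-domain coder are not reproduced; the 8 kHz rate is an assumption.

```python
import math

def harmonic_frame(f0_hz, amplitudes, phases, n_samples, fs=8000.0, n_start=0):
    """s[n] = sum_k A_k cos(k * w0 * n + phi_k), for k = 1 .. len(amplitudes).

    The phase grows linearly with the absolute index n = n_start + i, which keeps
    successive frames continuous when f0 and the harmonic parameters are unchanged.
    """
    w0 = 2.0 * math.pi * f0_hz / fs
    out = []
    for i in range(n_samples):
        n = n_start + i
        out.append(sum(a * math.cos((k + 1) * w0 * n + phi)
                       for k, (a, phi) in enumerate(zip(amplitudes, phases))))
    return out
```

Switching to or from the time-domain coder breaks this linear-phase track, which is why the abstract stresses careful handling of the reconstructed phase.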

01 Oct 1997
TL;DR: The authors consider using turbo channel coding to encode the 8 kbit/s output of the G.729 speech codec; the source- and channel-coded bits are then transmitted using a wideband Orthogonal Frequency Division Multiplexing (OFDM) system in the framework of the Mode-I FRAMES proposals.
Abstract: In this paper we have considered using turbo channel coding to encode the 8 kbit/s output from the G.729 speech codec. The source- and channel-coded bits are then transmitted using a wideband Orthogonal Frequency Division Multiplexing (OFDM) system in the framework of the Mode-I FRAMES proposals [1]. We illustrate the benefits of using OFDM with channel coding to alleviate some of the problems associated with wideband fading channels. Furthermore, we discuss how OFDM can be used in conjunction with the G.729 speech codec and half-rate channel coding in order to utilise one speech/data FRAMES subburst. Finally, some of the issues and problems associated with using turbo-coded OFDM in speech transmission systems are considered using the system characterised in Table 1. In Figure 6, a channel SNR of 6 dB appears sufficiently high under the stipulated system conditions for near-unimpaired speech transmission.
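The core OFDM step maps coded bits onto subcarrier symbols, takes an inverse DFT, and prepends a cyclic prefix that absorbs the channel's delay spread. The sketch uses Gray-mapped QPSK, a naive O(N²) DFT to stay dependency-free, and arbitrary sizes; the FRAMES Mode-I parameters and the turbo code are not modelled.

```python
import cmath

QPSK = {(0, 0): 1 + 1j, (0, 1): -1 + 1j, (1, 1): -1 - 1j, (1, 0): 1 - 1j}  # Gray mapping

def idft(symbols):
    n = len(symbols)
    return [sum(s * cmath.exp(2j * cmath.pi * k * t / n) for k, s in enumerate(symbols)) / n
            for t in range(n)]

def dft(samples):
    n = len(samples)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(samples))
            for k in range(n)]

def ofdm_modulate(bits, cp_len):
    """bits: even-length list of 0/1; returns one OFDM symbol with cyclic prefix."""
    symbols = [QPSK[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]
    body = idft(symbols)
    return body[len(body) - cp_len:] + body  # cyclic prefix: copy of the tail

def ofdm_demodulate(samples, cp_len):
    """Drop the cyclic prefix and return the per-subcarrier symbols."""
    return dft(samples[cp_len:])
```

Over a channel whose impulse response is shorter than the prefix, each subcarrier sees a single complex gain, so equalization reduces to one division per subcarrier.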

Proceedings ArticleDOI
07 Sep 1997
TL;DR: The proposed method exploits the residual source redundancy and includes an inherent muting mechanism leading to a graceful degradation of speech quality under adverse transmission conditions; for an error-free channel, bit exactness as required by the standards can be preserved.
Abstract: In digital mobile communication systems there is a need for error concealment techniques to reduce the subjective effects of residual bit errors that have not been eliminated by channel decoding. This contribution presents a new approach for optimum estimation of speech codec parameters. It can be applied to any speech codec standard if reliability information about the channel-decoded bits is available (e.g., from the soft-output Viterbi algorithm, SOVA). The proposed method exploits the residual source redundancy and includes an inherent muting mechanism leading to a graceful degradation of speech quality under adverse transmission conditions. In the case of an error-free channel, bit exactness as required by the standards can be preserved. The approach is applied here to the GSM full rate codec.

Proceedings Article
01 Jan 1997
TL;DR: A rule-based system maps segment identity and prosodic information to parameters suitable for driving a parallel formant speech synthesiser and Acoustic segment Hidden Markov Models (HMMs) are shown to perform as well as conventional phone HMMs during recognition.
Abstract: This paper describes a system for speech coding designed to operate at 300 bits/sec and below. A continuous speech recogniser is used to transcribe incoming speech as a sequence of sub-word units termed acoustic segments. Prosodic information is combined with segment identity to form a serial data stream suitable for transmission. A rule-based system maps segment identity and prosodic information to parameters suitable for driving a parallel formant speech synthesiser. Acoustic segment Hidden Markov Models (HMMs) are shown to perform as well as conventional phone HMMs during recognition. A segment error rate of 3.8% was achieved in a speaker-dependent, task-dependent configuration. An average data rate of 262 bits/sec was obtained. Speech from the synthesiser was better than that obtainable from a purely textual representation, though not as good as 2400 bit/sec Linear Predictive Coding (LPC) vocoded speech.
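The abstract does not define its segment error rate. A common convention, assumed here, computes it like word error rate: the minimum number of substitutions, deletions, and insertions needed to turn the recognized sequence into the reference (Levenshtein distance), divided by the reference length.

```python
def segment_error_rate(reference, hypothesis):
    """(substitutions + deletions + insertions) / len(reference), via edit distance.
    Assumes a non-empty reference sequence."""
    prev = list(range(len(hypothesis) + 1))  # distances for an empty reference prefix
    for i, r in enumerate(reference, start=1):
        cur = [i]
        for j, h in enumerate(hypothesis, start=1):
            cur.append(min(prev[j] + 1,              # deletion of r
                           cur[j - 1] + 1,           # insertion of h
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1] / len(reference)
```

Only two rows of the dynamic-programming table are kept, so memory grows with the hypothesis length rather than the product of both lengths.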

Proceedings ArticleDOI
21 Apr 1997
TL;DR: Informal subjective testing (MOS) indicates that the proposed variable-rate CELP codec, at an average rate of less than 3.2 kb/s, achieves better quality than fixed-rate standard codecs with rates in the range 4-4.8 kb/s.
Abstract: This paper presents a variable-rate CELP codec which achieves good communications speech quality at an average rate of about 3 kb/s. The codec operates as a source-controlled variable rate coder with rates of 4.9 kb/s for voiced and transition sounds, 3.0 kb/s for unvoiced sounds and 670 b/s for silent frames. New techniques used in the codec include prediction of the fixed codebook target vector and joint optimization of the adaptive and fixed codebook search. The prediction of the fixed codebook target vector is based on fixed codebook selections in previous subframes and a running estimate for the fundamental frequency. Informal subjective testing (MOS) indicates that the proposed codec, at an average rate of less than 3.2 kb/s, achieves better quality than fixed rate standard codecs with rates in the range 4-4.8 kb/s.
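For a source-controlled codec with equal-length frames, the average bit rate is the per-class rate weighted by the share of frames in each class. The class rates below come from the abstract; the class shares in the usage check are hypothetical, because the paper's speech material statistics are not given.

```python
RATES_BPS = {"voiced_or_transition": 4900, "unvoiced": 3000, "silence": 670}

def average_rate(class_fractions):
    """Average bit rate (b/s) for a given share of frames per class."""
    if abs(sum(class_fractions.values()) - 1.0) > 1e-9:
        raise ValueError("class fractions must sum to 1")
    return sum(RATES_BPS[c] * f for c, f in class_fractions.items())
```

For example, a hypothetical mix of 45% voiced/transition, 20% unvoiced, and 35% silent frames gives about 3040 b/s, in line with the "about 3 kb/s" reported.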

Proceedings ArticleDOI
07 Sep 1997
TL;DR: This paper describes a wideband speech/audio codec at 16/24 kbit/s with 10 ms frames; for speech signals, its performance is equivalent to G.722 at 48 and 56 kbit/s, respectively.
Abstract: This paper describes a wideband speech/audio codec at 16/24 kbit/s with 10 ms frames. The algorithm uses an ACELP model at 16 kbit/s and a switched ACELP/TCX model at 24 kbit/s. Adaptive preemphasis is used to improve performance at high frequencies, and a hybrid forward/backward LP filter is used to improve performance for stationary signals. Subjective tests showed that for speech signals, the codec performance at 16 and 24 kbit/s is equivalent to G.722 at 48 and 56 kbit/s, respectively. For music signals, the performance of the codec at 24 kbit/s was equivalent to that of G.722 at 48 kbit/s.
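Preemphasis is a first-order filter y[n] = x[n] - μ·x[n-1] that boosts high frequencies before coding, undone by the matching de-emphasis filter at the decoder. The abstract does not say how μ adapts. The rule below (μ follows the frame's normalized lag-1 autocorrelation, clamped to [0, 0.9]) is a hypothetical illustration, and the decoder is assumed to receive the same μ.

```python
def choose_mu(frame, mu_max=0.9):
    """Hypothetical adaptation rule: clamp the normalized lag-1 autocorrelation."""
    r0 = sum(x * x for x in frame)
    r1 = sum(a * b for a, b in zip(frame[1:], frame[:-1]))
    return 0.0 if r0 == 0 else max(0.0, min(mu_max, r1 / r0))

def preemphasize(frame, mu, prev=0.0):
    """y[n] = x[n] - mu * x[n-1]; prev is the last input sample of the previous frame."""
    out = []
    for x in frame:
        out.append(x - mu * prev)
        prev = x
    return out

def deemphasize(frame, mu, prev=0.0):
    """x[n] = y[n] + mu * x[n-1]; prev is the last output sample of the previous frame."""
    out = []
    for y in frame:
        prev = y + mu * prev
        out.append(prev)
    return out
```

Strongly low-pass frames (r1/r0 near 1) get the most high-frequency boost, which is where a fixed preemphasis would otherwise leave the coder spending too few bits.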

Proceedings ArticleDOI
Weisi Lin, K. Goh, B.J. Tye, G. Powell, T. Ohya, S. Adachi 
26 Oct 1997
TL;DR: The implementation scheme and additionally, some programming techniques to exploit the internal architecture of the multimedia video processor (MVP) and some strategies introduced to reduce the computational complexity whilst minimizing degradation of picture quality are described.
Abstract: The H.263 video codec is an improved version of H.261 and is a constituent part of the H.324 video-conferencing codec suite. A system implementing a real-time ITU-T H.263 codec using a multi-processor DSP system has been configured by the authors. This paper describes the implementation scheme and, additionally, some programming techniques to exploit the internal architecture of the multimedia video processor (MVP), and some strategies introduced to reduce the computational complexity while minimizing degradation of picture quality. The effects of these strategies have been tested, and the codec performance is reported. The video codec is integrated with G.723.1, H.223, and H.245 to form a full-duplex video-conferencing system.
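The abstract does not list the complexity-reduction strategies. One widely used option for H.263 motion estimation is shown here as an assumed example: abandon a candidate's sum of absolute differences (SAD) as soon as its partial sum can no longer beat the best match found so far. Frames are plain lists of rows of luminance values.

```python
def sad_early_exit(cur, ref, bx, by, dx, dy, size, best):
    """SAD of one candidate, abandoned once the partial sum reaches best."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        if total >= best:  # checked per row: this candidate can no longer win
            return total
    return total

def motion_search(cur, ref, bx, by, size=16, radius=7):
    """Full search within +/-radius; returns (dx, dy, sad) of the best match."""
    h, w = len(ref), len(ref[0])
    best = (0, 0, float("inf"))
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if not (0 <= by + dy and by + dy + size <= h and
                    0 <= bx + dx and bx + dx + size <= w):
                continue  # candidate block would leave the reference frame
            sad = sad_early_exit(cur, ref, bx, by, dx, dy, size, best[2])
            if sad < best[2]:
                best = (dx, dy, sad)
    return best
```

The search result is identical to plain full search; only the work spent on losing candidates shrinks, so picture quality is unaffected by this particular shortcut.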

Proceedings ArticleDOI
Pasi Ojala
21 Apr 1997
TL;DR: A source controlled variable-rate CELP type speech codec that produces toll quality speech equal to that of the 32 kbit/s ADPCM (G.726) standard.
Abstract: This paper presents a source controlled variable-rate CELP type speech codec. First, a voice activity detection block distinguishes active speech frames from silence and background noise. The active speech is further classified into voiced and unvoiced frames. The voiced frames have variable bit-rate pitch-lag quantization based on the characteristics of the speech, whereas the unvoiced frames are coded without pitch information. A variable bit-rate fixed codebook excitation with a variable number of excitation pulses is determined for each speech frame. The performance of the linear analysis part of the codec as well as the input speech characteristics determine the excitation bit-rate. The average bit-rate of the codec is around 7.0 kbit/s for active speech, and the overall bit-rate ranges from 0 to 7.85 kbit/s. The described variable-rate codec produces toll quality speech equal to that of the 32 kbit/s ADPCM (G.726) standard.
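The first two stages described (voice activity detection, then a voiced/unvoiced split) can be sketched as a frame classifier. The energy threshold, the voicing threshold, and the 20 to 147 sample pitch-lag range at 8 kHz are hypothetical values; the paper's actual detectors are not described in the abstract.

```python
import math

def classify_frame(frame, energy_thresh=1e-3, voicing_thresh=0.5, min_lag=20, max_lag=147):
    """Return 'silence', 'voiced' or 'unvoiced' for one frame (8 kHz samples assumed)."""
    energy = sum(x * x for x in frame) / len(frame)
    if energy < energy_thresh:  # crude energy-based voice activity decision
        return "silence"
    best = 0.0
    for lag in range(min_lag, min(max_lag, len(frame) - 1) + 1):
        a, b = frame[lag:], frame[:-lag]
        num = sum(x * y for x, y in zip(a, b))
        den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
        if den > 0:
            best = max(best, num / den)  # normalized correlation at this lag
    return "voiced" if best > voicing_thresh else "unvoiced"
```

Each class would then select its own bit allocation, for example no pitch information for unvoiced frames, as the abstract describes.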



Proceedings ArticleDOI
02 Dec 1997
TL;DR: This paper proposes an alternative approach to mixed speech/music coding that uses a discriminator to separate music signals from speech and codes them with the G.722 coder and a G.723.1-based speech coder, respectively.
Abstract: In multimedia applications such as videoconferencing, users are demanding higher-quality speech/audio transmission than POTS can offer. 7 kHz wideband speech/audio offers a good compromise between bandwidth and sound quality. It improves the intelligibility and naturalness of speech and adds a feeling of transparent communication. Currently, the only international standard for coding such signals is the G.722 wideband speech/audio coder. While its coding quality is satisfactory, its bit rate leaves much to be desired. The CELP-based approach has been very successful in telephone-bandwidth speech coding, but it is not suitable for coding non-speech signals because of its assumed signal production model. This paper proposes an alternative approach to mixed speech/music coding that uses a discriminator to separate music signals from speech and codes them with the G.722 coder and a G.723.1-based speech coder, respectively. Simulations show very promising results.
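The abstract does not describe the discriminator's features. A well-known heuristic, assumed here, is that speech alternates between voiced (low zero-crossing rate) and unvoiced (high zero-crossing rate) frames, so its frame-to-frame ZCR variance exceeds that of typical music. The threshold and the coder callbacks are hypothetical placeholders.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)) / (len(frame) - 1)

def is_speech(frames, var_thresh=0.01):
    """Hypothetical discriminator: high ZCR variance across frames suggests speech."""
    zcrs = [zero_crossing_rate(f) for f in frames]
    mean = sum(zcrs) / len(zcrs)
    var = sum((z - mean) ** 2 for z in zcrs) / len(zcrs)
    return var > var_thresh

def encode(frames, speech_coder, music_coder):
    """Route a block of frames to the speech coder or to the wideband music coder."""
    return speech_coder(frames) if is_speech(frames) else music_coder(frames)
```

In practice, a discriminator would combine several such features and smooth its decisions over time to avoid rapid switching between the two coders.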

Proceedings ArticleDOI
07 Sep 1997
TL;DR: The design, implementation, and performance of a high-quality, low-bit-rate CELP speech codec for wireless communication are presented; its techniques result in robustness to transmission errors and high quality across changing speech levels and background noise conditions.
Abstract: The design, implementation, and performance of a high-quality, low-bit-rate speech codec for wireless communication are presented. The codec is based on the CELP model. Generalized analysis-by-synthesis, algebraic fixed codebooks, and multistage LSF quantization techniques are used, resulting in robustness to transmission errors and high quality across changing speech levels and background noise conditions. The bit allocations for the quantization of the LSFs, pitch, and excitation are chosen in a mode-specific manner based on a robust mode classification scheme. A 4.8 kb/s version has been implemented, and subjective tests show speech quality equivalent to or better than that of most cellular standard codecs. Performance is also consistent across speech levels and transmission errors.
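Multistage vector quantization codes a vector with a cascade of small codebooks: each stage quantizes the residual left by the previous stages, and the decoder sums the selected codewords. The sketch uses a greedy stage-by-stage search and toy codebooks. Real LSF quantizers typically use trained codebooks, prediction, and M-best search, none of which is specified in the abstract.

```python
def nearest(vec, codebook):
    """Index of the codeword with the smallest squared error."""
    return min(range(len(codebook)),
               key=lambda i: sum((v - c) ** 2 for v, c in zip(vec, codebook[i])))

def msvq_encode(lsf, stages):
    """Greedy multistage VQ: each stage quantizes the previous stage's residual."""
    residual, indices = list(lsf), []
    for cb in stages:
        i = nearest(residual, cb)
        indices.append(i)
        residual = [r - c for r, c in zip(residual, cb[i])]
    return indices

def msvq_decode(indices, stages):
    """Sum the selected codeword from every stage."""
    out = [0.0] * len(stages[0][0])
    for i, cb in zip(indices, stages):
        out = [o + c for o, c in zip(out, cb[i])]
    return out
```

Splitting the bits across stages keeps each codebook small enough to search, at some loss compared with a single large codebook of the same total size.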