
Showing papers by "Allen Gersho published in 1993"


Proceedings ArticleDOI
27 Apr 1993
TL;DR: Simulation results demonstrate that, with picture-adaptive quantization tables designed by the proposed algorithm, the JPEG DCT (discrete cosine transform) coder is able to compress images with better rate-distortion performance than that achievable with conventional empirically designed quantization tables.
Abstract: A recursive algorithm is presented for generating quantization tables in JPEG (Joint Photographic Experts Group) baseline coders from the actual statistics of the input image. Starting from a quantization table with large step sizes, corresponding to low bit rate and high distortion, one entry of the quantization table is updated at a time so that, at each step, the ratio of decrease in distortion to increase in bit rate is approximately maximized. This procedure is repeated until a target bit rate is reached. Simulation results demonstrate that, with picture-adaptive quantization tables designed by the proposed algorithm, the JPEG DCT (discrete cosine transform) coder is able to compress images with better rate-distortion performance than that achievable with conventional empirically designed quantization tables.
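A minimal sketch of the kind of greedy table-update loop the abstract describes, assuming hypothetical helpers rate(Q) and dist(Q) that run a JPEG coder on the input image with candidate table Q; the halving update rule and the all-255 starting table are illustrative choices, not the paper's exact procedure.

```python
import numpy as np

def design_quant_table(rate, dist, target_rate, q_init=None):
    # Greedy picture-adaptive quantization-table design (sketch).
    # rate(Q) and dist(Q) are assumed helpers that run the JPEG coder on the
    # input image with table Q and return its bit rate and distortion.
    Q = np.full((8, 8), 255, dtype=int) if q_init is None else q_init.copy()
    r, d = rate(Q), dist(Q)
    while r < target_rate:
        best = None
        for i in range(8):
            for j in range(8):
                if Q[i, j] <= 1:
                    continue                       # entry already at its finest step
                Qc = Q.copy()
                Qc[i, j] = max(1, Q[i, j] // 2)    # candidate: halve one step size
                rc, dc = rate(Qc), dist(Qc)
                if rc <= r:
                    continue                       # update must actually spend bits
                gain = (d - dc) / (rc - r)         # distortion drop per extra bit
                if best is None or gain > best[0]:
                    best = (gain, Qc, rc, dc)
        if best is None:
            break
        _, Q, r, d = best                          # commit the best single-entry update
    return Q
```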

130 citations


Proceedings ArticleDOI
27 Apr 1993
TL;DR: An improved version of the voice activity detection scheme used in the GSM (Group Special Mobile) full-rate cellular standard tracks the bursty character of active speech.
Abstract: A speech coder based on variable rate phonetic segmentation (VRPS), operating at an average rate of 3 kbit/s and applicable to code division multiple access (CDMA) digital cellular systems, is presented. An improved version of the voice activity detection scheme used in the GSM (Group Special Mobile) full-rate cellular standard tracks the bursty character of active speech. Each frame of active speech is classified into one of a set of four phonetic categories. A distinct coding configuration and bit-rate are applied to each category. The tradeoff between subjective quality and average bit rate of VRPS for both clean speech and speech corrupted with vehicle noise is superior to that obtained by QCELP, a proposed TIA speech coding standard for wideband digital cellular networks. The average bit-rate of VRPS is generally lower than that of QCELP for the same input signals.

65 citations


BookDOI
01 Jan 1993
TL;DR: Speech Coding for Wireless Transmission, a Beginner's Guide to Speech Coding, and Topics in Speech Coding.
Abstract: I: Introduction. II: Low Delay Speech Coding. III: Speech Quality. IV: Speech Coding for Wireless Transmission. V: Audio Coding. VI: Speech Coding for Noisy Transmission Channels. VII: Topics in Speech Coding. Author Index. Index.

56 citations


Book ChapterDOI
01 Jan 1993
TL;DR: Variable rate coding can achieve a given level of quality at an average bit-rate R_a that is substantially less than the bit rate R_f that would be required by an equivalent quality fixed rate coder.
Abstract: A central objective in the design of a cellular network for mobile or personal communication is to maximize capacity while maintaining an acceptable level of voice quality under varying traffic and channel conditions. Conventional FDMA and TDMA techniques dedicate a channel or time slot to one unidirectional speech signal, even though a speaker is silent roughly 65% of the time in a two-way conversation. Furthermore, when speech is present, the short-term rate-distortion trade-off varies quite widely with the changing phonetic character. Thus, the number of bits needed to code a speech frame for a given perceived quality varies widely with time. The speech quality of coders operating at a fixed bit rate is largely determined by the worst-case speech segments, i.e., those that are the most difficult to code at that rate. Variable rate coding can achieve a given level of quality at an average bit-rate R_a that is substantially less than the bit rate R_f that would be required by an equivalent quality fixed rate coder. Efficient multiple-access systems, such as CDMA, directly translate this rate reduction into a corresponding increase in network capacity.
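A back-of-the-envelope illustration of the rate and capacity argument above; all per-class rates and time fractions are hypothetical, chosen only to show how silence and the changing phonetic character translate into an average rate R_a and a capacity gain of roughly R_f/R_a.

```python
# Hypothetical per-class rates and time fractions (not taken from the chapter):
rates = {"voiced": 8.0, "unvoiced": 4.0, "silence": 0.5}        # kbit/s
fractions = {"voiced": 0.20, "unvoiced": 0.15, "silence": 0.65}

R_f = 8.0                                                       # fixed-rate coder, kbit/s
R_a = sum(rates[c] * fractions[c] for c in rates)               # average variable rate
print(f"R_a = {R_a:.2f} kbit/s, capacity gain ~ R_f/R_a = {R_f / R_a:.1f}x")
# -> R_a is about 2.5 kbit/s, for a capacity gain of roughly 3x
```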

28 citations


Book ChapterDOI
01 Jan 1993
TL;DR: In speech coders based on linear prediction modeling it is important to accurately represent the spectral envelope of each frame to avoid degrading the quality of the synthesized speech.
Abstract: In speech coders based on linear prediction modeling it is important to accurately represent the spectral envelope of each frame to avoid degrading the quality of the synthesized speech. We generally aim for transparent quantization of the LPC parameters so that there is no audible difference between coded speech signals synthesized using quantized and unquantized LPC coefficients.
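The usual operational meaning of "transparent" here is expressed through the log spectral distortion between the unquantized and quantized LPC envelopes; the measure below is the standard definition, though the exact thresholds used in this chapter are not quoted in the abstract.

```latex
% Log spectral distortion (in dB) between the LPC envelope S(\omega) = 1/|A(e^{j\omega})|^2
% and its quantized counterpart \hat{S}(\omega).  A commonly used rule of thumb for
% transparency (an assumption here, not a quote from the chapter) is an average SD of
% about 1 dB, with very few frames between 2 and 4 dB and none above 4 dB.
\[
  \mathrm{SD} = \sqrt{\frac{1}{2\pi}\int_{-\pi}^{\pi}
    \left[ 10\log_{10} S(\omega) - 10\log_{10}\hat{S}(\omega) \right]^{2}\, d\omega }
  \quad \text{(dB)}
\]
```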

16 citations


Proceedings ArticleDOI
27 Apr 1993
TL;DR: The authors formulate encoding and decoding as two separate problems and examine some possible ways to enhance performance over previous algorithms.
Abstract: With the establishment of video coding standards and the expanding use of video codecs, it is necessary to consider the designs of encoders and decoders as two individual problems instead of the conventional concurrent design approach. The authors formulate the two separate problems and examine some possible ways to enhance performance over previous algorithms. Given the decoding process and the channel, the encoding problem can be formulated as a constrained optimization problem. The dual of the encoding problem, to decode the compressed video data generated by an existing encoder, can be formulated as an optimal estimation problem given the encoded bit stream. Examination of these two problems provides the pathway to the design of enhanced compatible encoding and decoding schemes.

16 citations


Proceedings ArticleDOI
22 Oct 1993
TL;DR: A substantial improvement over prior vector quantization based coders for multispectral data compression is obtained by applying a variable rate multistage vector quantizer to multispectral imagery.
Abstract: Multispectral satellite images of the earth consist of sets of images obtained by sensing electromagnetic radiation in different spectral bands for each geographical region. We have applied a variable rate multistage vector quantizer for the compression of multispectral imagery. Spectral and spatial correlation are simultaneously exploited by forming vectors from 3-dimensional data blocks. The wide variation in entropy across the data set is efficiently exploited by an adaptive bit allocation algorithm based on a greedy approach where the rate-distortion trade-off is locally optimized for each successive encoding stage. Simulation results on an image set acquired by a Thematic Mapper scanner are presented. A substantial improvement is obtained over prior vector quantization based coders for multispectral data compression.
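A sketch of a greedy, locally optimized bit allocation of the kind described above, assuming a hypothetical helper stage_gain(block, s) that returns the distortion reduction obtained by granting a block its next encoding stage s; the data structures and stopping rule are illustrative, not the paper's implementation.

```python
import heapq

def greedy_stage_allocation(blocks, stage_gain, stage_bits, bit_budget):
    # Greedy allocation of multistage-VQ stages across data blocks (sketch).
    # stage_bits[s] is the index length of stage s; at each step the candidate
    # with the best distortion-reduction-per-bit is granted until the budget runs out.
    heap = []                                        # max-heap via negated gain/bit
    for b in range(len(blocks)):
        g = stage_gain(blocks[b], 0)
        heapq.heappush(heap, (-g / stage_bits[0], b, 0))

    alloc, spent = [0] * len(blocks), 0
    while heap:
        neg_gpb, b, s = heapq.heappop(heap)
        if spent + stage_bits[s] > bit_budget:
            continue                                 # this candidate no longer fits
        spent += stage_bits[s]
        alloc[b] = s + 1                             # block b now uses s + 1 stages
        if s + 1 < len(stage_bits):                  # queue this block's next stage
            g = stage_gain(blocks[b], s + 1)
            heapq.heappush(heap, (-g / stage_bits[s + 1], b, s + 1))
    return alloc, spent
```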

12 citations


Proceedings ArticleDOI
01 Nov 1993
TL;DR: A comprehensive trade-off analysis of major compression algorithms as applied to multispectral imagery is provided; two approaches of particular interest which emerge are vector quantization (VQ) and transform coding (TC).
Abstract: We provide a comprehensive trade-off analysis of major compression algorithms as applied to multispectral imagery. Our goal is to identify a real-time, low cost codec for multispectral imagery. Hence the major criteria used for performance and ranking of various algorithms emphasize implementation complexity in addition to compression performance and robustness. Multispectral imagery codecs are designed by suitably extending and modifying the state-of-art still image codecs so that the resultant codec simultaneously benefits from the spatial and spectral correlations inherent in multispectral imagery. In particular, two different approaches of interest which emerge are vector quantization (VQ) and transform coding (TC). This paper describes and analyzes some recently developed multispectral imagery codecs utilizing the above two approaches. For completeness, a brief overview of alternate approaches is also included.

10 citations


Journal ArticleDOI
TL;DR: The LD-VXC coder integrates techniques such as vector quantization and analysis-by-synthesis coding, and avoids excessive buffering of speech samples by incorporating backward, rather than forward, adaptation for the linear predictors.

9 citations


Proceedings ArticleDOI
17 Jan 1993

9 citations


Proceedings ArticleDOI
22 Oct 1993
TL;DR: In this article, a variable resolution motion estimator is proposed for sub-pixel motion estimation, which offers a significantly reduced complexity by eliminating the explicit interpolation of pixel values at a finer sampling grid and avoiding the distortion evaluation for each candidate subpixel motion vector.
Abstract: In this paper, we present a novel method for sub-pixel motion estimation. Compared to traditional methods, our technique offers a significantly reduced complexity by eliminating the explicit interpolation of pixel values at a finer sampling grid and avoiding the distortion evaluation for each candidate sub-pixel motion vector. Simulation results confirm that its performance is virtually indistinguishable from that of a traditional sub-pixel motion estimator. To illustrate applications of the new method, we have developed a variable resolution motion estimator, whose performance is evaluated by substituting it for the fixed resolution motion estimator in Simulation Model 3 of the MPEG-1 video compression standard.
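The abstract does not spell out the estimator itself, so the sketch below is only a generic illustration of how sub-pixel accuracy can be obtained without interpolating pixel values: fit a parabola through the matching errors already computed at integer displacements and take its analytic minimum. The paper's actual method may differ.

```python
def subpel_offset(e_minus, e0, e_plus):
    # 1-D parabolic fit through three integer-pel matching errors; returns the
    # sub-pixel offset of the parabola's minimum relative to the centre sample,
    # without interpolating any pixel values.  Generic illustration only.
    denom = e_minus - 2.0 * e0 + e_plus
    if denom <= 0.0:                       # flat or degenerate error surface
        return 0.0
    off = 0.5 * (e_minus - e_plus) / denom
    return max(-0.5, min(0.5, off))

def refine_motion_vector(err, dx, dy):
    # Refine an integer-pel motion vector (dx, dy) using only the error values
    # err[dy][dx] already computed by the integer-pel search.
    fx = subpel_offset(err[dy][dx - 1], err[dy][dx], err[dy][dx + 1])
    fy = subpel_offset(err[dy - 1][dx], err[dy][dx], err[dy + 1][dx])
    return dx + fx, dy + fy
```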

Journal ArticleDOI
TL;DR: Computer simulations with both full-search VQ and pruned-tree-structured VQ encoders demonstrate that, compared to conventional VQ decoding, this new decoding technique reproduces images with not only higher SNR but also better perceptual quality.
Abstract: We present an improved decoding paradigm for vector quantization (VQ) of images. In this new decoding method, the dimension of the code vectors at the decoder is higher than the dimension of the input vectors at the encoder, so that the area covered by each output vector extends beyond the input block of pixels into its neighborhood. The image is reconstructed as an overlapping patchwork of output code vectors, where the pixel values in the lapped region are obtained by summing the corresponding elements of the overlapping code vectors. With a properly designed decoder code book, this lapped block-decoding technique is able to improve the performance of VQ by exploiting the interblock correlation at the decoder. We have developed a recursive algorithm for designing a locally optimal decoder code book from a training set of images, given a fixed VQ encoder. Computer simulations with both full-search VQ and pruned-tree-structured VQ encoders demonstrate that, compared to conventional VQ decoding, this new decoding technique reproduces images with not only higher SNR but also better perceptual quality.
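A minimal sketch of the lapped decoding step described above, assuming decoder code vectors (stored flattened) that cover a (block + 2*lap)-pixel-wide region around each encoder block. The summation in the lapped region follows the abstract; the specific block and overlap sizes, names, and layout are illustrative.

```python
import numpy as np

def lapped_vq_decode(indices, dec_codebook, img_shape, block=4, lap=2):
    # Rebuild the image as an overlapping patchwork of decoder code vectors,
    # summing samples where neighbouring regions overlap (sketch).
    H, W = img_shape
    big = block + 2 * lap
    out = np.zeros((H + 2 * lap, W + 2 * lap))
    for bi, row in enumerate(indices):                 # one index per encoder block
        for bj, idx in enumerate(row):
            cv = dec_codebook[idx].reshape(big, big)   # enlarged decoder code vector
            y, x = bi * block, bj * block              # top-left corner in padded image
            out[y:y + big, x:x + big] += cv            # overlap-add
    return out[lap:lap + H, lap:lap + W]               # crop the padding
```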

Proceedings ArticleDOI
30 Mar 1993
TL;DR: An algorithm paradigm for the joint design of the feature codebooks constituting a GPC is described and the performance improvements over conventional 'greedy' design are essentially 'free' as the only cost is a moderate increase in design complexity.
Abstract: With respect to the generalized product code (GPC) model for structured vector quantization, multistage VQ (MSVQ) and tree-structured VQ are members of a family of summation product codes (SPCs), defined by the prototypical synthesis function x = f_1 + ... + f_s, where f_i, i = 1, ..., s, are the residual vector features. The authors describe an algorithm paradigm for the joint design of the feature codebooks constituting a GPC. They specialize the paradigm to a joint design algorithm for the SPCs and exhibit experimental results for the MSVQ of simulated sources. The performance improvements over conventional 'greedy' design are essentially 'free' as the only cost is a moderate increase in design complexity.
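For concreteness, a sketch of the summation-product-code synthesis rule x = f_1 + ... + f_s and the conventional sequential stage-by-stage search it is usually paired with; the joint codebook design proposed in the paper changes how the codebooks are trained, not this synthesis rule. Names and array layout are illustrative.

```python
import numpy as np

def spc_encode(x, codebooks):
    # Sequential (greedy) encoding with a summation product code such as MSVQ:
    # the reproduction is x_hat = f_1 + ... + f_s, each feature f_i drawn from
    # the i-th (residual) codebook, given as a (K_i, dim) array.
    residual = np.asarray(x, dtype=float)
    indices, x_hat = [], np.zeros_like(residual)
    for cb in codebooks:
        errs = np.sum((cb - residual) ** 2, axis=1)    # nearest code vector
        k = int(np.argmin(errs))
        indices.append(k)
        x_hat += cb[k]                                 # accumulate the feature
        residual = residual - cb[k]                    # pass residual to the next stage
    return indices, x_hat
```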

Journal ArticleDOI
TL;DR: A novel scheme is proposed for the efficient coding and compression of high-definition television (HDTV) sequences for recording on digital video tape that achieves a constant 8:1 compression ratio by exploiting both intraframe and interframe compression techniques and removing both statistical and psychovisual redundancies during the coding process.
Abstract: A novel scheme is proposed for the efficient coding and compression of high-definition television (HDTV) sequences for recording on digital video tape. The scheme achieves a constant 8:1 compression ratio by exploiting both intraframe and interframe compression techniques and removing both statistical and psychovisual redundancies during the coding process. However, a perceptually lossless image quality is still maintained after both single and multiple generations. This scheme also allows the recognizable playback of recorded material at speeds up to 50 times normal in both the forward and reverse directions. This high-performance picture-in-shuttle mode of operation is accomplished by employing a hierarchical data structure and exploiting the head-positioning capabilities of conventional digital video tape recorders. The performance of the overall recording scheme is evaluated by simulating the compression and reconstruction of three HDTV test sequences at 1, 10, 20 and 50 times normal playback speed.

Proceedings ArticleDOI
13 Oct 1993
TL;DR: An improved version of the variable rate phonetic segmentation (VRPS) algorithm for speech coding, applicable to CDMA digital cellular systems, is presented, using a new class-dependent adaptive postfilter which attenuates the noise in the valleys of the speech spectrum.
Abstract: An improved version of the variable rate phonetic segmentation (VRPS) algorithm for speech coding, applicable to CDMA digital cellular systems, is presented. The coder performs activity detection to distinguish speech from background noise. Each frame of active speech is classified into one of four phonetic categories. A distinct coding configuration and bit rate is applied to each category. The perceptual quality of the coded speech has been improved through the use of a new class-dependent adaptive postfilter which attenuates the noise in the valleys of the speech spectrum. The robustness of the phonetic classification algorithm in the presence of background vehicle noise has also been improved, by adapting classifier parameters using properties of the noise. These two modifications combine to improve the perceptual quality of the coder, while reducing the average rate by 300-450 b/s.

Proceedings ArticleDOI
23 May 1993
TL;DR: An improved version of a previous 8-kb/s low-delay VXC (vector excitation coding) algorithm, which has been enhanced to maintain high-quality results under clean conditions while adding robustness to channel errors.
Abstract: The authors present an improved version of a previous 8-kb/s low-delay VXC (vector excitation coding) algorithm, which has been enhanced to maintain high-quality results under clean conditions while adding robustness to channel errors. Periodic resetting of the pitch predictor makes the coder more robust to channel errors but sacrifices some of the advantages of interframe coding in clean channels. A novel adaptive mean-removed pitch tracking scheme was introduced that provides robustness to bit errors while avoiding significant quality degradation under error-free conditions.

Book ChapterDOI
01 Jan 1993
TL;DR: The “state-of-the-art” in wideband audio coding is exemplified by the audio component in the International Standards Organization’s (ISO) Moving Picture Experts Group (MPEG) standard for the coding of audiovisual information.
Abstract: The “state-of-the-art” in wideband (e.g., 20 kHz) audio coding is exemplified by the audio component in the International Standards Organization’s (ISO) Moving Picture Experts Group (MPEG) standard for the coding of audiovisual information [1]. This standard follows a basic paradigm for audio coding that prevails today. The paradigm consists of (a) variable-rate coding of samples obtained from a time-frequency analysis of the audio signal, (b) bit allocation governed by an elaborate auditory masking model to control the time-frequency distribution of quantization distortion, and (c) a constant-rate channel bit stream maintained by a buffer control loop. Virtually “transparent” compact disc (CD) quality can be obtained at 128 kb/s per channel of full 20 kHz bandwidth audio. The MPEG coding algorithm employs entropy constrained scalar quantization, which can be quite efficient in terms of rate-distortion performance [2].
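A quick sanity check of the quoted figure: relative to 16-bit PCM at the CD sampling rate of 44.1 kHz, 128 kb/s per channel corresponds to roughly 5.5:1 compression.

```python
pcm_rate = 44_100 * 16 / 1000       # 16-bit CD-rate PCM: 705.6 kbit/s per channel
coded_rate = 128                    # kbit/s per channel, as quoted in the abstract
print(f"compression ratio ~ {pcm_rate / coded_rate:.1f}:1")   # about 5.5:1
```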

Journal ArticleDOI
TL;DR: A new scheme is proposed for the efficient coding of high-definition television (HDTV) sequences for digital video tape recording that allows the variable speed playback of recorded material at speeds up to 100 times normal in both the forward and reverse directions.
Abstract: A new scheme is proposed for the efficient coding of high-definition television (HDTV) sequences for digital video tape recording. This scheme allows the variable speed playback of recorded material at speeds up to 100 times normal in both the forward and reverse directions. An 8:1 level of compression of the original HDTV data is obtained by exploiting both intraframe and interframe redundancies during the coding process, although the perceptual quality of the reconstructed sequences is uncompromised. To allow the high-quality playback of the recorded data at very high speeds, a hierarchical data structure is employed and the head-positioning capabilities of conventional digital video tape recorders (DVTRs) are exploited to enable the recovery of perceptually vital information at high tape speeds. The performance of the resulting scheme is examined by simulating the compression and reconstruction of three HDTV sequences at 1, 10, 20, 50, and 100 times normal playback speed.

ReportDOI
01 Nov 1993
TL;DR: This assessment of recent data compression and coding research outside the United States examines fundamental and applied work in the basic areas of signal decomposition, quantization, lossless compression, and error control, as well as application development efforts in image/video compression and speech/audio compression.
Abstract: This assessment of recent data compression and coding research outside the United States examines fundamental and applied work in the basic areas of signal decomposition, quantization, lossless compression, and error control, as well as application development efforts in image/video compression and speech/audio compression. Seven computer scientists and engineers who are active in development of these technologies in US academia, government, and industry carried out the assessment. Strong industrial and academic research groups in Western Europe, Israel, and the Pacific Rim are active in the worldwide search for compression algorithms that provide good tradeoffs among fidelity, bit rate, and computational complexity, though the theoretical roots and virtually all of the classical compression algorithms were developed in the United States. Certain areas, such as segmentation coding, model-based coding, and trellis-coded modulation, have developed earlier or in more depth outside the United States, though the United States has maintained its early lead in most areas of theory and algorithm development. Researchers abroad are active in other currently popular areas, such as quantizer design techniques based on neural networks and signal decompositions based on fractals and wavelets, but, in most cases, either similar research is or has been going on in the United States, or the work has not led to useful improvements in compression performance. Because there is a high degree of international cooperation and interaction in this field, good ideas spread rapidly across borders (both ways) through international conferences, journals, and technical exchanges. Though there have been no fundamental data compression breakthroughs in the past five years--outside or inside the United States--there have been an enormous number of significant improvements in both places in the tradeoffs among fidelity, bit rate, and computational complexity.

Proceedings ArticleDOI
17 Jan 1993
TL;DR: By allowing different levels of the tree codebook to share a library of feature (residual) codebooks, this work achieved in its experiment a 4:1 reduction of storage without compromising rate-distortion performance.
Abstract: Clustering algorithms can be applied to the design of N codebooks to be shared by M sources, 1 < M < N. We previously introduced a constrained storage vector quantization algorithm for this design problem. In this work, we extend the algorithm to additionally design simple parametric expandor functions to enhance codebook sharing efficiency. We apply the particular case of scaling expandor functions to the compression of tree structured vector quantization codebooks. By allowing different levels of the tree codebook to share a library of feature (residual) codebooks, we were able to achieve in our experiment a 4:1 reduction of storage without compromising rate-distortion performance. For very deep trees, an earlier design method which effects sharing only within each level of the tree is more effective.
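A sketch of how a shared library plus per-level scaling expandors can stand in for private per-level codebooks in a tree-structured VQ; the data layout and names here are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

# level_map[level] = (library_id, scale): each tree level stores only a pointer
# into the shared library of residual codebooks plus a scalar expandor gain,
# instead of a private codebook of its own.
def csvq_codevector(shared_library, level_map, level, index):
    lib_id, scale = level_map[level]
    return scale * shared_library[lib_id][index]

# Rough storage comparison for a depth-L tree with K vectors of dimension d per
# level: private codebooks store L*K*d values, while sharing M << L library
# codebooks plus one (library_id, scale) pair per level stores about M*K*d + 2*L.
```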

Journal ArticleDOI
TL;DR: A new frequency-based sub-block activity measure is proposed as the basis of a bit allocation scheme for transform-based video coding algorithms, which provides a method for obtaining high multigeneration performance.
Abstract: The multigeneration performance of production-quality video compression algorithms has recently become an important area of research. Unfortunately, many existing fixed-rate video compression algorithms do not maintain sufficiently high image quality after multiple stages of generation. A new frequency-based sub-block activity measure is proposed as the basis of a bit allocation scheme for transform-based video coding algorithms. The scheme provides a method for obtaining high multigeneration performance which results in less than 1.0 dB drop in PSNR over 30 generations for both interframe and intraframe coded sequences.
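The abstract does not define the activity measure, so the following is only one plausible frequency-based choice (AC-coefficient energy of a sub-block's 2-D DCT) together with a simple proportional bit allocation, meant to illustrate the idea rather than reproduce the paper's scheme.

```python
import numpy as np
from scipy.fftpack import dct

def subblock_activity(block):
    # AC-coefficient energy of the sub-block's 2-D DCT (illustrative measure only).
    c = dct(dct(block, axis=0, norm="ortho"), axis=1, norm="ortho")
    c[0, 0] = 0.0                        # discard the DC term
    return float(np.sum(c * c))

def allocate_bits(blocks, total_bits, floor_bits=2):
    # Give each sub-block a share of the budget proportional to its log-activity.
    acts = np.array([np.log2(1.0 + subblock_activity(b)) for b in blocks])
    if acts.sum() == 0:
        return np.full(len(blocks), total_bits // len(blocks), dtype=int)
    share = acts / acts.sum()
    return np.maximum(floor_bits, np.round(share * total_bits)).astype(int)
```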

Proceedings ArticleDOI
08 Jun 1993
TL;DR: A new scheme is proposed for the efficient coding and compression of high-definition television sequences for recording on digital video tape that achieves a constant 8:1 level of compression while maintaining a subjectively lossless image quality at normal playback speed after both single and multiple generations.
Abstract: A new scheme is proposed for the efficient coding and compression of high-definition television (HDTV) sequences for recording on digital video tape. The scheme achieves a constant 8:1 level of compression while maintaining a subjectively lossless image quality at normal playback speed after both single and multiple generations. The proposed scheme also allows the recognizable playback of recorded material at speeds up to 50 times normal in both the forward and reverse directions.