
Showing papers in "IEEE Transactions on Circuits and Systems for Video Technology in 1996"


Journal Article•DOI•
TL;DR: The image coding results, calculated from actual file sizes and images reconstructed by the decoding algorithm, are either comparable to or surpass previous results obtained through much more sophisticated and computationally complex methods.
Abstract: Embedded zerotree wavelet (EZW) coding, introduced by Shapiro (see IEEE Trans. Signal Processing, vol.41, no.12, p.3445, 1993), is a very effective and computationally simple technique for image compression. We offer an alternative explanation of the principles of its operation, so that the reasons for its excellent performance can be better understood. These principles are partial ordering by magnitude with a set partitioning sorting algorithm, ordered bit plane transmission, and exploitation of self-similarity across different scales of an image wavelet transform. Moreover, we present a new and different implementation based on set partitioning in hierarchical trees (SPIHT), which provides even better performance than our previously reported extension of EZW that surpassed the performance of the original EZW. The image coding results, calculated from actual file sizes and images reconstructed by the decoding algorithm, are either comparable to or surpass previous results obtained through much more sophisticated and computationally complex methods. In addition, the new coding and decoding procedures are extremely fast, and they can be made even faster, with only small loss in performance, by omitting entropy coding of the bit stream by the arithmetic code.
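
For illustration, a minimal sketch of the magnitude-ordering / bit-plane idea that underlies EZW and SPIHT; the set-partitioning trees and sorting/refinement passes of the paper are omitted, and the function name and toy coefficients are purely illustrative.

```python
import numpy as np

def bitplane_significance(coeffs, num_planes=4):
    """Per bit plane (descending threshold 2^n), list the coefficients that
    first become significant; an embedded coder transmits these positions
    and signs before refining already-significant coefficients."""
    c = np.asarray(coeffs, dtype=float)
    n_max = int(np.floor(np.log2(np.max(np.abs(c)))))
    significant = np.zeros(c.shape, dtype=bool)
    passes = []
    for n in range(n_max, n_max - num_planes, -1):
        threshold = 2.0 ** n
        newly = (~significant) & (np.abs(c) >= threshold)
        passes.append((threshold, np.flatnonzero(newly).tolist()))
        significant |= newly
    return passes

# toy "wavelet coefficients"
for threshold, idx in bitplane_significance([63, -34, 49, 10, 7, 13, -12, 7]):
    print(f"threshold {threshold}: newly significant {idx}")
```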

5,890 citations


Journal Article•DOI•
TL;DR: Simulation results show that the proposed 4SS performs better than the well-known three-step search and has similar performance to the new three-step search (N3SS) in terms of motion compensation errors.
Abstract: Based on the center-biased motion vector distribution characteristic of real-world image sequences, a new four-step search (4SS) algorithm with a center-biased checking point pattern for fast block motion estimation is proposed in this paper. A halfway-stop technique is employed in the new algorithm, which uses 2 to 4 searching steps, and the total number of checking points varies from 17 to 27. Simulation results show that the proposed 4SS performs better than the well-known three-step search and has similar performance to the new three-step search (N3SS) in terms of motion compensation errors. In addition, the 4SS also reduces the worst-case computational requirement from 33 to 27 search points and the average computational requirement from 21 to 19 search points, as compared with N3SS.
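
A rough sketch of the search pattern described above, with the halfway-stop rule; `cost(dx, dy)` stands in for any block-matching error such as SAD, and the toy error surface at the end is only for demonstration.

```python
def four_step_search(cost, max_disp=7):
    """Center-biased search: up to three coarse 3x3 steps of size 2 (a 5x5
    window), stopping early when the minimum stays at the center, then one
    final 3x3 step of size 1."""
    def probe(center, step):
        best, best_cost = center, cost(*center)
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                cand = (center[0] + dx, center[1] + dy)
                if cand == center or max(abs(cand[0]), abs(cand[1])) > max_disp:
                    continue
                c = cost(*cand)
                if c < best_cost:
                    best, best_cost = cand, c
        return best, best_cost

    center = (0, 0)
    for _ in range(3):                      # coarse steps
        new_center, _ = probe(center, 2)
        if new_center == center:            # halfway stop: minimum stayed centered
            break
        center = new_center
    return probe(center, 1)                 # final fine step

# toy quadratic error surface with its minimum at displacement (3, -2)
print(four_step_search(lambda dx, dy: (dx - 3) ** 2 + (dy + 2) ** 2))  # ((3, -2), 0)
```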

1,619 citations


Journal Article•DOI•
Lurng-Kuo Liu, Ephraim Feig
TL;DR: A block-based gradient descent search (BBGDS) algorithm is proposed to perform block motion estimation in video coding; it provides competitive performance with reduced computational complexity.
Abstract: A block-based gradient descent search (BBGDS) algorithm is proposed in this paper to perform block motion estimation in video coding. The BBGDS evaluates the values of a given objective function starting from a small centralized checking block. The minimum within the checking block is found, and the gradient descent direction where the minimum is expected to lie is used to determine the search direction and the position of the new checking block. The BBGDS is compared with full search (FS), three-step search (TSS), one-at-a-time search (OTS), and new three-step search (NTSS). Experimental results show that the proposed technique provides competitive performance with reduced computational complexity.
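
A minimal sketch of the descent loop described above: evaluate a small 3×3 checking block, move to its minimum, and stop when the center itself is the minimum. The `cost(dx, dy)` callback is an assumed stand-in for the objective function (e.g., SAD of a candidate block).

```python
def bbgds(cost, max_disp=7):
    """Block-based gradient descent search over integer displacements."""
    pos, pos_cost = (0, 0), cost(0, 0)
    while True:
        best, best_cost = pos, pos_cost
        for dy in (-1, 0, 1):               # 3x3 checking block around `pos`
            for dx in (-1, 0, 1):
                cand = (pos[0] + dx, pos[1] + dy)
                if cand == pos or max(abs(cand[0]), abs(cand[1])) > max_disp:
                    continue
                c = cost(*cand)
                if c < best_cost:
                    best, best_cost = cand, c
        if best == pos:                     # center is the minimum: converged
            return pos, pos_cost
        pos, pos_cost = best, best_cost     # descend toward the minimum

print(bbgds(lambda dx, dy: (dx - 3) ** 2 + (dy + 2) ** 2))  # ((3, -2), 0)
```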

638 citations


Journal Article•DOI•
TL;DR: An efficient solution is proposed in which the optimum combination of macroblock modes and the associated mode parameters is jointly selected so as to minimize the overall distortion for a given bit-rate budget; the method is successfully applied to the emerging H.263 video coding standard.
Abstract: This paper addresses the problem of encoder optimization in a macroblock-based multimode video compression system. An efficient solution is proposed in which, for a given image region, the optimum combination of macroblock modes and the associated mode parameters is jointly selected so as to minimize the overall distortion for a given bit-rate budget. Conditions for optimizing the encoder operation are derived within a rate-constrained product code framework using a Lagrangian formulation. The instantaneous rate of the encoder is controlled by a single Lagrange multiplier, which makes the method amenable to mobile wireless networks with time-varying capacity. When rate and distortion dependencies are introduced between adjacent blocks (as is the case when the motion vectors are differentially encoded and/or overlapped block motion compensation is employed), the ensuing encoder complexity is surmounted using dynamic programming. Due to the generic nature of the algorithm, it can be successfully applied to the problem of encoder control in numerous video coding standards, including H.261, MPEG-1, and MPEG-2. Moreover, the strategy is especially relevant for very low bit rate coding over wireless communication channels, where the low dimensionality of the images associated with these bit rates makes real-time implementation very feasible. Accordingly, in this paper, the method is successfully applied to the emerging H.263 video coding standard, with excellent results at rates as low as 8.0 kb/s. Direct comparisons with the H.263 test model, TMN5, demonstrate that gains in peak signal-to-noise ratio (PSNR) are achievable over a wide range of rates.
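
The core of the rate-constrained mode decision can be illustrated with the usual Lagrangian cost J = D + λR; the mode names and numbers below are invented for demonstration, and the inter-block dependencies that the paper resolves with dynamic programming are ignored.

```python
def select_mode(candidates, lam):
    """candidates: iterable of (mode_name, distortion, rate_bits).
    Return the candidate minimizing J = D + lambda * R."""
    return min(candidates, key=lambda m: m[1] + lam * m[2])

modes = [("SKIP", 950.0, 1), ("INTER", 420.0, 96), ("INTRA", 310.0, 230)]
print(select_mode(modes, lam=1.0))   # small lambda favours distortion: INTER wins
print(select_mode(modes, lam=10.0))  # large lambda favours rate: SKIP wins
```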

408 citations


Journal Article•DOI•
Wei Ding, Bede Liu
TL;DR: A feedback re-encoding method with a rate-quantization model, which can be adapted to changes in picture activities, is developed and used for quantization parameter selection at the frame and slice level.
Abstract: For MPEG video coding and recording applications, it is important to select the quantization parameters at the slice and macroblock levels to produce consistent image quality for a given bit budget. A well-designed rate control strategy can improve the overall image quality for video transmission over a constant-bit-rate channel and fulfil the editing requirement of video recording, where a certain number of new pictures are encoded to replace consecutive frames on the storage media using, at most, the same number of bits. We developed a feedback re-encoding method with a rate-quantization model, which can be adapted to changes in picture activities. The model is used for quantization parameter selection at the frame and slice level. The extra computations needed are modest. Experiments show the accuracy of the model and the effectiveness of the proposed rate control method. A new bit allocation algorithm is also proposed for MPEG video coding.
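
As an illustration only, a generic feedback rate-quantization model of the R ≈ X/Q form (not the specific model derived in the paper); the activity parameter X is refit from the bits the encoder actually produced on the previous pass.

```python
class RQModel:
    """Toy rate-quantization model: bits ~ X / Q, with feedback adaptation."""
    def __init__(self, x_init=100_000.0):
        self.x = x_init                      # picture-activity parameter

    def choose_q(self, target_bits, q_min=1, q_max=31):
        q = self.x / max(float(target_bits), 1.0)
        return int(min(max(round(q), q_min), q_max))

    def update(self, actual_bits, q_used):
        # blend the old estimate with the one implied by the last encoding pass
        self.x = 0.5 * self.x + 0.5 * actual_bits * q_used

model = RQModel()
q = model.choose_q(target_bits=20_000)        # quantizer for the next slice/frame
model.update(actual_bits=23_500, q_used=q)    # re-encode feedback adapts the model
```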

377 citations


Journal Article•DOI•
TL;DR: This paper presents several bitstream scaling methods for the purpose of reducing the rate of constant bit rate (CBR) encoded bitstreams and shows typical performance trade-offs of the methods.
Abstract: The idea of moving picture expert group (MPEG) bitstream scaling relates to altering or scaling the amount of data in a previously compressed MPEG bitstream. The new scaled bitstream conforms to constraints that were neither known nor considered when the original precoded bitstream was constructed. Numerous applications for video transmission and storage are being developed based on the MPEG video coding standard. Applications such as video on demand, trick-play track on digital video tape recorders (VTR's), and extended-play recording on VTR's motivate the idea of bitstream scaling. In this paper, we present several bitstream scaling methods for the purpose of reducing the rate of constant bit rate (CBR) encoded bitstreams. The different methods have varying hardware implementation complexity and associated trade-offs in resulting image quality. Simulation results on MPEG test sequences demonstrate the typical performance trade-offs of the methods.

351 citations


Journal Article•DOI•
TL;DR: This paper compares the performance of these techniques (excluding temporal scalability) under various loss rates using realistic length material and discusses their relative merits.
Abstract: Transmission of compressed video over packet networks with nonreliable transport benefits when packet loss resilience is incorporated into the coding. One promising approach to packet loss resilience, particularly for transmission over networks offering dual priorities such as ATM networks, is based on layered coding, which uses at least two bitstreams to encode video. The base-layer bitstream, which can be decoded independently to produce a lower quality picture, is transmitted over a high priority channel. The enhancement-layer bitstream(s) contain the less important information, so that packet losses there are more easily tolerated. The MPEG-2 standard provides four methods to produce a layered video bitstream: data partitioning, signal-to-noise ratio (SNR) scalability, spatial scalability, and temporal scalability. Each was included in the standard in part for motivations other than loss resilience. This paper compares the performance of these techniques (excluding temporal scalability) under various loss rates using realistic-length material and discusses their relative merits. Nonlayered MPEG-2 coding gives generally unacceptable video quality for packet loss ratios of 10^-3 for small packet sizes. Better performance can be obtained using layered coding and dual-priority transmission. With data partitioning, cell loss ratios of 10^-4 in the low-priority layer are definitely acceptable, while for SNR scalable encoding, cell loss ratios of 10^-3 are generally invisible. Spatial scalable encoding can provide even better visual quality under packet losses; however, it has a high implementation complexity.

227 citations


Journal Article•DOI•
TL;DR: This work considers the transmission of QCIF resolution (176×144 pixels) video signals over wireless channels at transmission rates of 64 kb/s and below and proposes an automatic repeat request (ARQ) error control technique to retransmit erroneous data frames.
Abstract: We consider the transmission of QCIF resolution (176×144 pixels) video signals over wireless channels at transmission rates of 64 kb/s and below. The bursty nature of the errors on the wireless channel requires careful control of transmission performance without unduly increasing the overhead for error protection. A dual-rate source coder is presented that adaptively selects a coding rate according to the current channel conditions. An automatic repeat request (ARQ) error control technique is employed to retransmit erroneous data frames. The source coding rate is selected based on the occupancy level of the ARQ transmission buffer. Error detection followed by retransmission results in less overhead than forward error correction for the same quality. Simulation results are provided for the statistics of the frame-error bursts of the proposed system over code division multiple access (CDMA) channels with average bit error rates of 10^-3 to 10^-4.
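
A sketch of the buffer-occupancy-driven rate switch described above; the thresholds, rates, and hysteresis rule are illustrative assumptions rather than the paper's parameters.

```python
def select_coding_rate(buffer_occupancy, buffer_size, current_rate,
                       high_rate=48_000, low_rate=24_000,
                       upper=0.75, lower=0.25):
    """Pick the source-coder rate from the ARQ transmission-buffer fullness:
    a full buffer (many pending retransmissions) means a bad channel."""
    fill = buffer_occupancy / buffer_size
    if fill > upper:
        return low_rate          # channel is bad: back off to the low coding rate
    if fill < lower:
        return high_rate         # buffer drained: return to the high coding rate
    return current_rate          # in between: keep the current rate (hysteresis)
```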

176 citations


Journal Article•DOI•
TL;DR: The algorithm has been designed mainly for 50 Hz to 75 Hz frame rate up-conversion with applications in a multimedia environment, but it can also be used in advanced television receivers to remove artifacts due to low scan rate.
Abstract: A frame interpolation algorithm for frame rate up-conversion of progressive image sequences is proposed. The algorithm is based on simple motion compensation and linear interpolation. A motion vector is searched for each pixel in the interpolated image and the resulting motion field is median filtered to remove inconsistent vectors. Averaging along the motion trajectory is used to produce the interpolated pixel values. The main novelty of the proposed method is the motion compensation algorithm which has been designed with low computational complexity as an important criterion. Subsampled blocks are used in block matching and the vector search range is constrained to the most likely motion vectors. Simulation results show that good visual quality has been obtained with moderate complexity. The algorithm has been designed mainly for 50 Hz to 75 Hz frame rate up-conversion with applications in a multimedia environment, but it can also be used in advanced television receivers to remove artifacts due to low scan rate.
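
A simplified sketch of averaging along the motion trajectory for an interpolated frame at temporal position t; the block-matching search and the median filtering of the vector field described above are assumed to have been done already, and the array layout is an assumption.

```python
import numpy as np

def interpolate_frame(prev, nxt, mvs, t=0.5):
    """prev, nxt: grayscale frames; mvs[y, x] = (dy, dx) motion from prev to nxt,
    assigned to pixel (y, x) of the frame being interpolated at time t in (0, 1)."""
    H, W = prev.shape
    out = np.empty((H, W), dtype=float)
    for y in range(H):
        for x in range(W):
            dy, dx = mvs[y, x]
            # sample the trajectory backwards into prev and forwards into nxt
            py = min(max(int(round(y - t * dy)), 0), H - 1)
            px = min(max(int(round(x - t * dx)), 0), W - 1)
            ny = min(max(int(round(y + (1 - t) * dy)), 0), H - 1)
            nx = min(max(int(round(x + (1 - t) * dx)), 0), W - 1)
            out[y, x] = (1 - t) * prev[py, px] + t * nxt[ny, nx]
    return out
```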

169 citations


Journal Article•DOI•
TL;DR: Simulation results show that acceptable visual quality can be maintained when transmitting video sequences at low bit rates over wireless channels with high error rates, and that the distortion due to erroneous transmission of coded data can be effectively suppressed.
Abstract: Visual communication over wireless channels is becoming important in multimedia. Because of the limited bandwidth and high error rates of the wireless channel, the video codec should be designed to have high coding efficiency, maintaining acceptable visual quality at low bit rates, and robustness, suppressing the distortion due to transmission errors. The coding efficiency of a 3D subband video codec is optimized by removing not only the redundancy due to spatial and temporal correlation but also perceptually insignificant components from video signals. Unequal error protection is applied to the source code bits of different perceptual importance. An error concealment method is employed to hide the distortion due to erroneous transmission of perceptually important signals. The evaluation of each signal's perceptual importance is made first by measuring the just-noticeable distortion (JND) profile as the perceptual redundancy inherent in video signals, and then by allocating JND energy to signals of different subbands according to the sensitivity of human visual responses to spatio-temporal frequencies. Simulation results show that acceptable visual quality can be maintained in transmitting video sequences at low bit rates (<64 kbps) over wireless channels with high error rates (up to BER = 10^-2), and that the distortion due to erroneous transmission of coded data can be effectively suppressed. In the simulation, the noisy channel is assumed to be corrupted by random errors, whose rate depends on the average strength of the received wave, and by burst errors due to Rayleigh fading.

164 citations


Journal Article•DOI•
TL;DR: A new image coding approach in which a 4-ary arithmetic coder is used to represent significant coefficient values and the lengths of zero runs between coefficients, which involves much lower addressing complexity than other algorithms such as zerotree coding.
Abstract: We describe a new image coding approach in which a 4-ary arithmetic coder is used to represent significant coefficient values and the lengths of zero runs between coefficients. This algorithm works by raster scanning within subbands, and therefore involves much lower addressing complexity than other algorithms such as zerotree coding that require the creation and maintenance of lists of dependencies across different decomposition levels. Despite its simplicity, and the fact that these dependencies are not explicitly utilized, the algorithm presented here is competitive with the best enhancements of zerotree coding. In addition, it performs comparably with adaptive subband splitting approaches that involve much higher implementation complexity.
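
A sketch of the symbol stream such a coder consumes: raster-scan a quantized subband and emit zero-run lengths followed by significant values (the 4-ary arithmetic coder itself is not shown, and the pair representation is a simplification).

```python
import numpy as np

def run_length_symbols(subband):
    """Return (zero_run_length, value) pairs from a raster scan of one subband;
    a trailing all-zero run is emitted with value None."""
    symbols, run = [], 0
    for v in np.asarray(subband).ravel():        # raster scan within the subband
        if v == 0:
            run += 1
        else:
            symbols.append((run, int(v)))        # run of zeros, then a significant value
            run = 0
    if run:
        symbols.append((run, None))
    return symbols

print(run_length_symbols([[0, 0, 5], [0, -3, 0], [0, 0, 0]]))
# [(2, 5), (1, -3), (4, None)]
```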

Journal Article•DOI•
TL;DR: This paper selects the most representative pixels in each block, based on image content, for the matching criterion, exploiting the fact that high activity in the luminance signal, such as edges and texture, contributes most to the matching criterion.
Abstract: A new adaptive technique based on pixel decimation for the estimation of motion vectors is presented. In the traditional approach, uniform pixel decimation is used. Since some of the pixels in each block do not enter into the matching criterion, this approach limits the accuracy of the motion vector. In this paper, we select the most representative pixels in each block, based on image content, for the matching criterion. This exploits the fact that high activity in the luminance signal, such as edges and texture, contributes most to the matching criterion. Our approach compensates for the drawback of standard pixel decimation techniques. Computer simulations show that this technique approaches the performance of exhaustive search with a significant reduction in computation.
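
A possible sketch of content-adaptive pixel decimation: keep only the pixels with the highest local activity and evaluate the matching criterion on those. The gradient-magnitude activity measure and the fraction kept are assumptions for illustration, not the paper's selection rule.

```python
import numpy as np

def representative_pixels(block, keep_fraction=0.25):
    """Return the indices of the most 'active' pixels (edges/texture) in a block."""
    gy, gx = np.gradient(block.astype(float))
    activity = np.abs(gx) + np.abs(gy)
    k = max(1, int(keep_fraction * block.size))
    flat = np.argsort(activity.ravel())[-k:]       # k highest-activity pixels
    return np.unravel_index(flat, block.shape)

def decimated_sad(cur_block, ref_block, sel):
    """SAD evaluated only on the selected (representative) pixel positions."""
    return np.abs(cur_block[sel].astype(int) - ref_block[sel].astype(int)).sum()
```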

Journal Article•DOI•
TL;DR: This work presents a block motion estimation scheme which is based on matching of integral projections of motion blocks with those of the search area in the previous frame, and takes advantage of the similarity of motion vectors in adjacent blocks in typical imagery by subsampling the motion vector field.
Abstract: Several efficient techniques have previously been proposed to reduce the computational burden of block matching for motion estimation in video coding. The goal is efficient motion estimation with minimal error in the motion-compensated predicted image. We present a block motion estimation scheme which is based on matching of integral projections of motion blocks with those of the search area in the previous frame. Like many other techniques, ours operates in a sequence of decreasing search radii, but it performs an exhaustive search at each level of the hierarchy. The projection method is much less computationally costly than block matching and has a prediction accuracy of competitive quality with both full block matching and other efficient techniques. Our algorithm also takes advantage of the similarity of motion vectors in adjacent blocks in typical imagery by subsampling the motion vector field. It has the added advantage of allowing parallel computation of vertical and horizontal displacements.
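
A minimal sketch of matching by integral projections: a block and a candidate are compared through the SAD of their row sums and of their column sums rather than over all pixel pairs.

```python
import numpy as np

def projection_cost(block_a, block_b):
    """Matching cost from horizontal and vertical integral projections."""
    a = block_a.astype(int)
    b = block_b.astype(int)
    row_cost = np.abs(a.sum(axis=1) - b.sum(axis=1)).sum()   # horizontal projections
    col_cost = np.abs(a.sum(axis=0) - b.sum(axis=0)).sum()   # vertical projections
    return row_cost + col_cost
```

For a 16×16 block this reduces each candidate comparison from 256 pixel differences to 32 projection differences, once the projections of the search area have been computed.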

Journal Article•DOI•
TL;DR: An adaptive technique for scanning rate conversion and interpolation that performs better than the edge-based line average algorithm, especially for an image with more horizontal edges is proposed.
Abstract: An adaptive technique for scanning rate conversion and interpolation is proposed. This technique performs better than the edge-based line average algorithm, especially for images with more horizontal edges. Moreover, it is easy to implement, and a simple VLSI architecture is proposed. Computer simulation shows that a 37.0 dB image can be obtained with the proposed technique, while the edge-based line average algorithm achieves only 35.2 dB.

Journal Article•DOI•
C. Brown, K. Feher
TL;DR: The result is that an average video frame rate of up to 14 frames per second can be supported within a single TDMA time slot, doubling the number of conventional GMSK multimedia transmissions per GSM channel.
Abstract: A reconfigurable global mobile standard (GSM) compatible radio modem interface which doubles the number of simultaneous video and voice transmissions per channel is presented for personal communication systems (PCSs). The result is that an average video frame rate of up to 14 frames per second (fps) can be supported within a single TDMA time slot, doubling the number of conventional GMSK multimedia transmissions per GSM channel. The design employs an in-circuit reconfigurable (ICR) cross-correlated Feher's quadrature phase shift keyed (FQPSK) signal processing technique (see Englewood Cliffs, NJ: Prentice-Hall, 1995) to support a data rate of 357.5 kb/s in a 200 kHz bandlimited GSM channel. This network modem is reconfigured in-circuit for high bit rate data transmission or voice operation that is compatible with existing GSM equipment. Spectrum efficiency η_f (b/s/Hz) is investigated in a nonlinear amplified (NLA) environment, providing a 6 to 9 dB advantage in power efficiency for increased battery life of hand-held terminals. Results show that RF power efficient nonlinear amplified spectrum efficiency is increased from 1.35 b/s/Hz to 1.85 b/s/Hz. Bit error rate (BER) performance is evaluated in Gaussian and Rayleigh fading channels, and the merits of coherent demodulation in microcellular PCS are examined. Our results show that the GSM compatible configuration of a specific cross-correlated FQPSK-KF implementation offers up to 2 dB improvement in Eb/N0 over conventional GMSK with BTb = 0.3. The PCS network capacity η_T (Erlangs/Hz·m^2) may be increased 37% over GMSK with BTb = 0.3.

Journal Article•DOI•
TL;DR: This paper explores the use of the deformable mesh structure for motion/shape analysis and synthesis in an image sequence and presents algorithms for the analysis problem, including scene-adaptive mesh generation and node tracking over successive frames.
Abstract: For pt.I see ibid., vol.6, no.6, p.636-46 (1996). This paper explores the use of the deformable mesh structure for motion/shape analysis and synthesis in an image sequence. We present algorithms for the analysis problem, including scene-adaptive mesh generation and node tracking over successive frames. We also describe a region-based video coder that integrates the analysis and synthesis algorithms presented. The coder describes each region by an ensemble of connected quadrilateral elements embedded in a mesh structure. For each region, its shape and texture are described by the nodal positions and image functions of the elements in this region in an initial frame, while its motion (including shape deformation) is characterized by the nodal trajectories in the following frames, which are in turn specified by a few motion parameters. This coder has been applied to a typical common intermediate format (CIF) resolution, head-and-shoulder type sequence. The visual quality is significantly better than the H.263-TMN4 algorithm at about 50 kb/s (for the luminance component only, 30 Hz).

Journal Article•DOI•
Yao Wang, O. Lee
TL;DR: It is shown that the concepts of shape functions and master elements are crucial for developing computationally efficient algorithms for both the analysis and synthesis problems.
Abstract: This paper explores the use of a deformable mesh (also known as the control grid) structure for motion analysis and synthesis in an image sequence. We focus on the synthesis problem, i.e., how to interpolate an image function given nodal positions and values and how to predict a present image frame from a reference one given nodal displacements between the two images. For this purpose, we review the fundamental theory and numerical techniques that have been developed in the finite element method for function approximation and mapping using a mesh structure. Specifically, we focus on (i) the use of shape functions for node-based function interpolation and mapping; and (ii) the use of regular master elements to simplify numerical calculations involved in dealing with irregular mesh structures. In addition to a general introduction that is applicable to an arbitrary mesh structure, we also present specific results for triangular and quadrilateral mesh structures, which are the most useful two-dimensional (2-D) meshes. Finally, we describe how to apply the above results for motion compensated frame prediction and interpolation. It is shown that the concepts of shape functions and master elements are crucial for developing computationally efficient algorithms for both the analysis and synthesis problems.
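
For illustration, the standard bilinear shape functions on the quadrilateral master element [-1, 1] × [-1, 1], used for node-based interpolation as discussed above; this is a textbook finite-element construction, shown only as a sketch.

```python
import numpy as np

def quad_shape_functions(xi, eta):
    """Bilinear shape functions N1..N4 of the 4-node master element."""
    return 0.25 * np.array([(1 - xi) * (1 - eta),   # node 1 at (-1, -1)
                            (1 + xi) * (1 - eta),   # node 2 at (+1, -1)
                            (1 + xi) * (1 + eta),   # node 3 at (+1, +1)
                            (1 - xi) * (1 + eta)])  # node 4 at (-1, +1)

def interpolate(nodal_values, xi, eta):
    """Interpolate nodal values (or map nodal coordinates, if given a 4x2 array)
    at master-element coordinates (xi, eta)."""
    return quad_shape_functions(xi, eta) @ np.asarray(nodal_values, dtype=float)

print(interpolate([10.0, 20.0, 30.0, 40.0], 0.0, 0.0))   # element centroid: 25.0
```

The same routine maps a point of the master element into an arbitrary (possibly irregular) quadrilateral when the nodal values are the four node coordinates, which is what makes the master-element formulation convenient for deformed meshes.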

Journal Article•DOI•
TL;DR: A fast hierarchical feature matching-motion estimation scheme (HFM-ME) that can be used in H.263, H.261, MPEG 1, MPEG 2, and HDTV applications, where the sign truncated feature is defined and used for block template matching, as opposed to the pixel intensity values used in conventional block matching methods.
Abstract: This paper presents a fast hierarchical feature matching-motion estimation scheme (HFM-ME) that can be used in H.263, H.261, MPEG 1, MPEG 2, and HDTV applications. In the HFM-ME scheme, the sign truncated feature (STF) is defined and used for block template matching, as opposed to the pixel intensity values used in conventional block matching methods. The STF extraction process can be considered as a zero-crossing phase detection with the mean as the bias and binary sign pattern as the phase deviation. Using the STF definition, a data block can be represented by a mean and a set of binary features with a much reduced data set. The block matching motion estimation is then divided into mean matching and binary phase matching. The proposed technique enables a significant reduction in computational complexity compared with the conventional full-search block matching ME because binary phase matching only involves Boolean logic operations. This feature also significantly reduces the data transfer time between the frame buffer and motion estimator. The proposed HFM-ME algorithm is implemented and compared with the conventional full-search block matching schemes. Our test results using three full-motion MPEG sequences indicate that the performance of the HFM-ME is comparable with the full-search block matching under the same search ranges, however, HFM-ME can be implemented about 64 times faster than the conventional full-search schemes. The proposed scheme can be combined with other fast algorithms to further reduce the computational complexity, at the expense of picture quality.
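
A rough sketch of the sign-truncated-feature idea: summarize a block by its mean plus the binary pattern of pixels at or above the mean, then compare candidates by a mean difference plus a Hamming distance computed with Boolean operations. The weighting of the two terms is an illustrative assumption.

```python
import numpy as np

def stf(block):
    """Sign-truncated feature: (mean, binary sign pattern) of a 2-D block."""
    m = float(block.mean())
    return m, (block >= m)

def stf_cost(block_a, block_b, w_mean=1.0, w_sign=4.0):
    """Mean matching plus binary phase (sign-pattern) matching."""
    ma, sa = stf(block_a)
    mb, sb = stf(block_b)
    hamming = int(np.count_nonzero(sa ^ sb))    # Boolean XOR + popcount
    return w_mean * abs(ma - mb) + w_sign * hamming
```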

Journal Article•DOI•
TL;DR: This paper investigates a modified DCT computation scheme, to be called the subband DCT (SB-DCT), that provides a simple, efficient solution to the reduction of the block artifacts while achieving faster computation.
Abstract: The discrete cosine transform (DCT) is well known for its highly efficient coding performance and is widely used in many image compression applications. However, in low bit rate coding, it produces undesirable block artifacts that are visually not pleasing. In addition, in many practical applications, faster computation and easier VLSI implementation of DCT coefficients are also important issues. The removal of the block artifacts and faster DCT computation are therefore of practical interest. In this paper, we investigate a modified DCT computation scheme, to be called the subband DCT (SB-DCT), that provides a simple, efficient solution to the reduction of the block artifacts while achieving faster computation. We have applied the new approach for the low bit rate coding and decoding of images. Simulation results on real images have verified the improved performance obtained using the proposed method over the standard JPEG method.

Journal Article•DOI•
TL;DR: Data structures for highly scalable compressed video are described, which are able to support simple, generic scaling approaches for both constant bit rate and constant distortion scaling criteria, and the performance of the proposed scaling methodologies is experimentally investigated.
Abstract: Scalability refers to the ability to modify the resolution and/or bit rate associated with an already compressed data source in order to satisfy requirements which could not be foreseen at the time of compression. A number of researchers have already demonstrated the feasibility of efficient scalable image and video compression. The principal focus of this paper is to describe data structures for highly scalable compressed video, which are able to support simple, generic scaling approaches for both constant bit rate and constant distortion scaling criteria. Interactive video material presents particular challenges when the data stream is to be scaled to maintain an approximately constant level of distortion, rather than just a constant bit rate. Special attention is paid, therefore, to the development of generic, robust scaling algorithms for such applications. The data structures and scaling methodologies developed are particularly appealing for the distribution of highly scalable compressed video over heterogeneous media, because they simultaneously support both variable bit rate (VBR) and constant bit rate (CBR) services with a wide range of available service qualities, using only simple, generic mechanisms for scaling. The performance of the proposed scaling methodologies is experimentally investigated using a highly scalable video compression algorithm, which is able to achieve comparable compression performance to that of the inherently nonscalable MPEG-1 compression standard.

Journal Article•DOI•
TL;DR: This paper addresses the problem of noise attenuation for multichannel data by utilizing adaptively determined data-dependent coefficients based on a novel distance measure which combines vector directional with vector magnitude filtering.
Abstract: This paper addresses the problem of noise attenuation for multichannel data. The proposed filter utilizes adaptively determined data-dependent coefficients based on a novel distance measure which combines vector directional with vector magnitude filtering. The special case of color image processing is studied as an important example of multichannel signal processing.
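
One plausible way (illustrative, not the paper's exact rule) to combine a directional term and a magnitude term into a single dissimilarity, used vector-median-style to pick the output pixel of a filter window.

```python
import numpy as np

def combined_distance(a, b, eps=1e-9):
    """Dissimilarity of two color vectors: angle between them, scaled up when
    their magnitudes also differ."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    cos_ang = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps)
    angle = np.arccos(np.clip(cos_ang, -1.0, 1.0))             # directional term
    magnitude = abs(np.linalg.norm(a) - np.linalg.norm(b))     # magnitude term
    return angle * (1.0 + magnitude)

def filter_window(pixels):
    """Return the window sample with the smallest aggregate dissimilarity to the rest."""
    costs = [sum(combined_distance(p, q) for q in pixels) for p in pixels]
    return pixels[int(np.argmin(costs))]
```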

Journal Article•DOI•
S. Dutta, Wayne Wolf
TL;DR: A novel architecture that offers the flexibility of implementing widely varying motion-estimation algorithms by employing multiple processing elements which communicate with multiple memory banks via a multistage interconnection network is described.
Abstract: This paper describes a novel architecture that offers the flexibility of implementing widely varying motion-estimation algorithms. To achieve real-time performance, we employ multiple processing elements (PE's) which communicate with multiple memory banks via a multistage interconnection network. Three different block-matching algorithms-full search, three-step search, and conjugate-direction search-have been mapped onto this architecture to illustrate its programmability. We schedule the desired operations and design the required data-flow in such a way that processor utilization is high and memory bandwidth is at a feasible level. The details regarding the flow of the pixel data and the scheduling and allocation of the desired ALU operations (which pixels are processed on which processors in which clock cycles) are described in the paper. We analyze the performance of the proposed architecture for several different interconnection networks and data-memory organizations.

Journal Article•DOI•
TL;DR: Subjective results confirm the efficacy of the proposed classified coder over the RM8-based H.261 coder in two ways: it consistently produces better quality sequences and achieves a bit rate saving of 35% when measured at the same picture quality.
Abstract: A new technique of adaptively classifying the scene content of an image block has been developed in the proposed perceptual coder. It measures the texture masking energy of an image block and classifies it into one of four perceptual classes: flat, edge, texture, and fine-texture. Each class has an associated factor to adapt the quantizer with the aim of achieving constant quality across an image. A second feature of the perceptual coder is visual thresholding, a process that reduces bit rate by discarding subthreshold discrete cosine transform (DCT) coefficients without degrading the perceived image quality. Finally, further quality gain is achieved by an improved reference model 8 (RM8) intramode decision, which removes sticking noise artifacts from newly uncovered background found in H.261 coded sequences. Subjective viewing tests, guided by Rec. 500-5, were conducted with 30 subjects. Subjective results confirm the efficacy of the proposed classified coder over the RM8-based H.261 coder in two ways: (i) it consistently produces better quality sequences (with a mean opinion score, MOS, of approximately 2.0) when comparing at any fixed bit rate; and (ii) it achieves a bit rate saving of 35% when measuring at the same picture quality (i.e., the same MOS).
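
A heavily simplified sketch of the block-classification idea: estimate activity and edge measures for a block and map them to one of the four perceptual classes, each of which scales the quantizer. The measures, thresholds, and scaling factors below are illustrative assumptions, not the paper's values.

```python
import numpy as np

# hypothetical per-class quantizer scaling factors
CLASS_Q_FACTOR = {"flat": 0.8, "edge": 0.9, "texture": 1.1, "fine-texture": 1.3}

def classify_block(block):
    b = block.astype(float)
    ac_energy = ((b - b.mean()) ** 2).mean()        # overall activity
    gy, gx = np.gradient(b)
    edge_strength = np.hypot(gx, gy).mean()         # dominant-edge measure
    if ac_energy < 20:
        return "flat"
    if edge_strength > 12 and ac_energy < 300:
        return "edge"
    return "texture" if ac_energy < 1500 else "fine-texture"

def adapted_quantizer(base_q, block):
    """Coarser quantization where texture masking hides the noise."""
    return base_q * CLASS_Q_FACTOR[classify_block(block)]
```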

Journal Article•DOI•
Sheila S. Hemami
TL;DR: This paper presents and solves the dual problem: a block-based coding technique, namely a family of lapped orthogonal transforms (LOTs), is designed to maximize the reconstruction performance of a specified reconstruction algorithm.
Abstract: Wireless transmission of compressed visual information presents new challenges in image coding and reconstruction techniques. Wireless channels do not offer guaranteed transmission, and data loss over such channels can result in catastrophic errors in the decoded visual information. Visual data, however, can be reconstructed using lossy signal processing techniques. To date, reconstruction algorithms have been developed for fixed coding techniques. This paper presents and solves the dual problem: a block-based coding technique, namely a family of lapped orthogonal transforms (LOTs), is designed to maximize the reconstruction performance of a specified reconstruction algorithm. Mean reconstruction, in which a missing coefficient block is replaced with the average of its available neighbors, is selected for its simplicity and ease of implementation. A reconstruction criterion is defined as the equal distribution of reconstruction errors across all transform coefficients, and a family of LOTs is then designed to meet the reconstruction criterion as well as to consider the transform coding gain. Reconstruction capability and coding gain are traded off, and the LOT family consists of transforms that provide increasing reconstruction capability with lower coding gain. The reconstruction-optimized LOT family provides excellent reconstruction capability, and a transform can be selected based on the loss characteristics of the channel, the desired reconstruction performance, and the desired compression.
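
The mean-reconstruction rule the transforms are optimized for is simple to state: a lost coefficient block is replaced by the average of whichever of its four neighbors arrived. A small sketch, with the block storage layout as an assumption:

```python
import numpy as np

def mean_reconstruct(blocks, missing_key, block_shape=(8, 8)):
    """blocks: dict mapping (row, col) -> received coefficient block.
    Replace the missing block by the element-wise mean of its available
    north/south/west/east neighbors (zeros if none arrived)."""
    r, c = missing_key
    neighbors = [blocks[k] for k in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                 if k in blocks]
    if not neighbors:
        return np.zeros(block_shape)
    return np.mean(neighbors, axis=0)
```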

Journal Article•DOI•
Xiaoming Li, C.A. Gonzales
TL;DR: This locally quadratic functional model decomposes the motion estimation optimization at subpixel resolutions into a two-stage pipelinable process: full search at full-pixel resolution and interpolation at any subpixel resolution.
Abstract: Accurate motion estimation is essential to effective motion compensated video signal processing, and subpixel resolutions are required for high quality applications. It is observed that around the optimum point of the motion estimation process, the error criterion function is well modeled as a quadratic function with respect to the motion vector offsets. This locally quadratic functional model decomposes the motion estimation optimization at subpixel resolutions into a two-stage pipelinable process: full search at full-pixel resolution and interpolation at any subpixel resolution. Practical approximation formulas lead to explicit computations of both motion vectors and error criterion functional values at subpixel resolutions.
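
In one dimension, the locally quadratic model gives a closed-form sub-pixel offset from the matching errors at integer offsets -1, 0, +1 around the full-pel minimum; a small sketch of that interpolation step (the separable 2-D handling and the paper's exact approximation formulas are not reproduced).

```python
def subpel_offset(e_m1, e_0, e_p1):
    """Fit e(x) = a*x^2 + b*x + c through (-1, e_m1), (0, e_0), (+1, e_p1) and
    return (sub-pixel offset of the minimum, interpolated minimum error)."""
    denom = e_m1 - 2.0 * e_0 + e_p1            # equals 2a; positive for a convex fit
    if denom <= 0:
        return 0.0, e_0                         # degenerate fit: keep the full-pel result
    d = 0.5 * (e_m1 - e_p1) / denom             # -b / (2a)
    e_min = e_0 - 0.125 * (e_m1 - e_p1) ** 2 / denom
    return d, e_min

print(subpel_offset(12.0, 5.0, 9.0))   # ≈ (0.136, 4.898): minimum shifted toward +1
```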

Journal Article•DOI•
TL;DR: It is shown that the VLSI implementation of this class of DCT/IDCT algorithms can easily meet the high-speed requirements of high-definition television (HDTV) due to its modularity, regularity, local connectivity, and scalability.
Abstract: In this paper we present a full-custom VLSI design of a high-speed 2-D DCT/IDCT processor based on a new class of time-recursive algorithms and architectures that had not previously been implemented to demonstrate its performance. We show that the VLSI implementation of this class of DCT/IDCT algorithms can easily meet the high-speed requirements of high-definition television (HDTV) due to its modularity, regularity, local connectivity, and scalability. Our design of the 8×8 DCT/IDCT can operate at 50 MHz (a 50 MSamples/s throughput) based on a very conservative estimate under 1.2 µm CMOS technology. In comparison to existing designs, our approach offers many advantages that can be further explored for even higher performance.

Journal Article•DOI•
Mohammed Ghanbari
TL;DR: It is shown that although the picture quality due to cell loss is temporarily degraded, it is immediately brought back to its original quality upon the reception of the late cells, as if no loss has occurred.
Abstract: A method for preventing accumulation of image artifacts due to cell loss in packet video is presented. At each ATM switching node an auxiliary buffer is used to store the overflow traffic of the main switching buffer. The main buffer is served with an absolute priority over the auxiliary buffer. Decoded pictures are normally reconstructed from the cells of the main buffer. The late cells received from the auxiliary buffer are processed and properly added to the current decoded picture. Postprocessing of these cells for the standard video codecs such as H.261 and MPEG is presented. It is shown that although the picture quality due to cell loss is temporarily degraded, it is immediately brought back to its original quality upon the reception of the late cells, as if no loss had occurred.

Journal Article•DOI•
TL;DR: The proposed hardware architectures for the two-stage BMA and FS BMA are faster than conventional hardware architectures and have lower hardware complexity, and the functional validity of the proposed architecture is shown.
Abstract: We investigate hardware implementation of block matching algorithms (BMAs) for motion estimation of moving sequences. Using systolic arrays, we propose VLSI architectures for the two-stage BMA and the full search (FS) BMA. The two-stage BMA, using integral projections, greatly reduces the computational complexity while its performance remains comparable to that of the FS BMA. The proposed hardware architectures for the two-stage BMA and FS BMA are faster than conventional hardware architectures and have lower hardware complexity. Also, the proposed architecture for the first stage of the two-stage BMA is modeled in VHDL and simulated. Simulation results show the functional validity of the proposed architecture.

Journal Article•DOI•
TL;DR: The architecture of a highly parallel DSP (HiPAR-DSP) as a flexible and programmable processor for image and video processing is proposed, based on an analysis of image processing algorithms in terms of available parallelization resources, demands on program control, and required data access mechanisms.
Abstract: We propose the architecture of a highly parallel DSP (HiPAR-DSP) as a flexible and programmable processor for image and video processing. The design is based on an analysis of image processing algorithms in terms of available parallelization resources, demands on program control, and required data access mechanisms. This led to a very long instruction word (VLIW)-controlled ASIMD RISC architecture with four or sixteen data paths, employing data-level parallelism, parallel instructions, micro-instruction pipelining, and data transfer concurrent with data processing. Common data access patterns for image processing algorithms are supported by a shared on-chip memory with parallel matrix-type access patterns and a separate data cache per data path. By properly balancing processing and control capabilities as well as internal and external memory bandwidth, this approach is optimized to make the best use of currently available silicon resources. A high clock frequency is achieved by implementation of classic RISC features. The architecture fully supports high-level language programming. With the 16-data-path version and a 100 MHz clock, a sustained performance of more than 2 billion arithmetic operations per second (GOPS) is achieved for a wide range of algorithms. The examples show the parallel implementation of image processing algorithms such as histogramming, the Hough transform, and search in a sorted list with efficient use of the processor resources. A prototype of the architecture with four parallel data paths is available, using a 0.6 µm CMOS technology.

Journal Article•DOI•
TL;DR: An efficient video coding scheme is presented as an extension of the MPEG-2 standard to accommodate the transmission of multiple viewpoint sequences on bandwidth-limited channels, thus providing fast and accurate constructions of multiple perspectives.
Abstract: An efficient video coding scheme is presented as an extension of the MPEG-2 standard to accommodate the transmission of multiple viewpoint sequences on bandwidth-limited channels. With the goal of compression and speed, the proposed approach incorporates a variety of existing computer graphics tools and techniques. Construction of each viewpoint image is predicted using a combination of perspective projection of three-dimensional (3-D) models, texture mapping, and digital image warping. Immediate application of the coding specification is foreseeable in systems with hardware-based real-time rendering capabilities, thus providing fast and accurate constructions of multiple perspectives.