
Showing papers in "IEEE Transactions on Circuits and Systems for Video Technology in 1997"


Journal ArticleDOI
TL;DR: A closed form solution for the target bit allocation which includes the MPEG-2 TM5 rate control scheme as a special case; the fluctuations of the bit counts are significantly reduced, by 20-65% in the standard deviation of the bit count, while the picture quality remains the same.
Abstract: A new rate control scheme is used to calculate the target bit rate for each frame based on a quadratic formulation of the rate distortion function. The distortion measure is assumed to be the average quantization scale of a frame. The rate distortion function is modeled as a second-order function of the inverse of the distortion measure. We present a closed form solution for the target bit allocation which includes the MPEG-2 TM5 rate control scheme as a special case. The model parameters are estimated using statistical linear regression analysis. Since the estimation uses the past encoded frames of the same picture prediction type (I, P, B pictures), the proposed approach is a single pass rate control technique. Because of the improved accuracy of the rate distortion function, the fluctuations of the bit counts are significantly reduced by 20-65% in the standard deviation of the bit count while the picture quality remains the same. Thus, the buffer requirement is reduced at a small increase in complexity. This technique has been adopted by the MPEG committee as part of VM5.0 in November 1996.
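For illustration, the quadratic model described above comes down to a few lines: bits are modeled as R(Q) = a/Q + b/Q^2, the parameters a and b are fitted by linear regression over past frames of the same picture type, and the model is inverted to choose a quantization scale for a target bit budget. The sketch below is a minimal interpretation of that idea, not the VM5.0 rate control itself, and the example figures are invented.

```python
import numpy as np

def fit_rate_model(past_bits, past_qscales):
    """Least-squares fit of R(Q) = a/Q + b/Q^2 from past encoded frames of the
    same picture type (I, P, or B)."""
    q = np.asarray(past_qscales, dtype=float)
    r = np.asarray(past_bits, dtype=float)
    X = np.column_stack([1.0 / q, 1.0 / q**2])      # regressors: 1/Q and 1/Q^2
    (a, b), *_ = np.linalg.lstsq(X, r, rcond=None)
    return a, b

def qscale_for_target(a, b, target_bits):
    """Invert the model: solve b*x^2 + a*x - target = 0 for x = 1/Q, x > 0."""
    x = (-a + np.sqrt(a * a + 4.0 * b * target_bits)) / (2.0 * b)
    return 1.0 / x

# Invented example: three past P-frames, then a 40 kbit target for the next one.
a, b = fit_rate_model(past_bits=[60000, 36000, 26000], past_qscales=[8, 12, 16])
print(round(qscale_for_target(a, b, target_bits=40000), 1))   # roughly Q = 11
```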

711 citations


Journal ArticleDOI
TL;DR: The scope of the MPEG-4 video standard is described and the structure of the video verification model under development is outlined, to provide a fully defined core video coding algorithm platform for the development of the standard.
Abstract: The MPEG-4 standardization phase has the mandate to develop algorithms for audio-visual coding allowing for interactivity, high compression, and/or universal accessibility and portability of audio and video content. In addition to the conventional "frame"-based functionalities of the MPEG-1 and MPEG-2 standards, the MPEG-4 video coding algorithm will also support access and manipulation of "objects" within video scenes. The January 1996 MPEG Video Group meeting witnessed the definition of the first version of the MPEG-4 video verification model, a milestone in the development of the MPEG-4 standard. The primary intent of the video verification model is to provide a fully defined core video coding algorithm platform for the development of the standard. As such, the structure of the MPEG-4 video verification model already gives some indication about the tools and algorithms that will be provided by the final MPEG-4 standard. The paper describes the scope of the MPEG-4 video standard and outlines the structure of the MPEG-4 video verification model under development.

670 citations


Journal ArticleDOI
Minerva M. Yeung, Boon-Lock Yeo
TL;DR: This work proposes techniques to analyze video and build a compact pictorial summary for visual presentation and presents a set of video posters, each of which is a compact, visually pleasant, and intuitive representation of the story content.
Abstract: Digital video archives are likely to be accessible on distributed networks, which means that the data are subject to network congestion and bandwidth constraints. To enable new applications and services of digital video, it is not only important to develop tools to analyze and browse video, view query results, and formulate better searches, but also to deliver the essence of the material in compact forms. Video visualization describes the joint process of analyzing video and the subsequent derivation of representative visual presentation of the essence of the content. We propose techniques to analyze video and build a compact pictorial summary for visual presentation. A video sequence is thus condensed into a few images, each summarizing the dramatic incident taking place in a meaningful segment of the video. In particular, we present techniques to differentiate the dominance of the content in subdivisions of the segment based on analysis results, select a graphic layout pattern according to the relative dominances, and create a set of video posters, each of which is a compact, visually pleasant, and intuitive representation of the story content. The collection of video posters arranged in temporal order then forms a pictorial summary of the sequence to tell the underlying story. The techniques and compact presentations proposed offer valuable tools for new applications and services of digital video including video browsing, query, search, and retrieval in digital libraries and over the Internet.

369 citations


Journal ArticleDOI
TL;DR: A highly efficient system that can rapidly detect human face regions in MPEG video sequences by detecting faces directly in the compressed domain; since there is no need to carry out the inverse DCT, the algorithm can run faster than real time.
Abstract: Human faces provide a useful cue in indexing video content. We present a highly efficient system that can rapidly detect human face regions in MPEG video sequences. The underlying algorithm takes the inverse quantized discrete cosine transform (DCT) coefficients of MPEG video as the input, and outputs the locations of the detected face regions. The algorithm consists of three stages, where chrominance, shape, and frequency information are used, respectively. By detecting faces directly in the compressed domain, there is no need to carry out the inverse DCT, so the algorithm can run faster than real time. In our experiments, the algorithm detected 85-92% of the faces in three test sets, including both intraframe and interframe coded image frames from news video. The average run time ranges from 13-33 ms per frame. The algorithm can be applied to unconstrained JPEG images or motion JPEG video as well.
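As an illustration of the first (chrominance) stage only, the sketch below flags candidate skin-tone blocks directly from the DC terms of the dequantized chroma DCT blocks, since the DC coefficient of an orthonormal 8x8 DCT is simply eight times the block mean. The skin-tone thresholds and the scaling convention are illustrative assumptions, not the paper's values, and the shape and frequency stages are omitted.

```python
import numpy as np

def skin_block_mask(dc_cb, dc_cr, cb_range=(-40, 5), cr_range=(5, 55)):
    """dc_cb, dc_cr: 2-D arrays holding the DC coefficient of each 8x8 chroma
    block (orthonormal DCT scaling, chroma expressed relative to 128).
    Returns a boolean mask of candidate face blocks."""
    mean_cb = np.asarray(dc_cb, float) / 8.0      # DC = 8 * block mean
    mean_cr = np.asarray(dc_cr, float) / 8.0
    return ((mean_cb >= cb_range[0]) & (mean_cb <= cb_range[1]) &
            (mean_cr >= cr_range[0]) & (mean_cr <= cr_range[1]))

# Random stand-in data; real input would be the inverse-quantized chroma DC
# planes of an MPEG intra-coded frame.
rng = np.random.default_rng(0)
mask = skin_block_mask(rng.normal(0, 120, (18, 22)), rng.normal(0, 120, (18, 22)))
print(mask.sum(), "candidate blocks out of", mask.size)
```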

347 citations


Journal ArticleDOI
TL;DR: A matching-pursuit based motion residual coder which uses an inner-product search to decompose motion residual signals on an overcomplete dictionary of separable Gabor functions, providing detailed reconstructions without block artifacts.
Abstract: We present a video compression algorithm which performs well on generic sequences at very low bit rates. This algorithm was the basis for a submission to the November 1995 MPEG-4 subjective tests. The main novelty of the algorithm is a matching-pursuit based motion residual coder. The method uses an inner-product search to decompose motion residual signals on an overcomplete dictionary of separable Gabor functions. This coding strategy allows residual bits to be concentrated in the areas where they are needed most, providing detailed reconstructions without block artifacts. Coding results from the MPEG-4 Class A compression sequences are presented and compared to H.263. We demonstrate that the matching pursuit system outperforms the H.263 standard in both peak signal-to-noise ratio (PSNR) and visual quality.
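The matching-pursuit residual coder can be illustrated with a greedy inner-product search over a toy dictionary. The sketch below uses Gaussian-windowed cosines as a stand-in for the paper's separable Gabor dictionary and an exhaustive position search; a practical coder would use a much larger dictionary and faster correlation, so this is only a conceptual outline.

```python
import numpy as np

def atoms_1d(sizes=(4, 8), freqs=(0.0, 0.5)):
    """A tiny dictionary of unit-norm Gaussian-windowed cosine atoms (a crude
    stand-in for the paper's separable Gabor functions)."""
    atoms = []
    for n in sizes:
        t = np.arange(n) - (n - 1) / 2.0
        win = np.exp(-0.5 * (t / (0.35 * n)) ** 2)
        for f in freqs:
            a = win * np.cos(np.pi * f * t)
            if np.linalg.norm(a) > 1e-12:
                atoms.append(a / np.linalg.norm(a))
    return atoms

def matching_pursuit(residual, n_terms=10):
    """Greedily pick the separable 2-D atom and position with the largest inner
    product, subtract its contribution, and record (atom ids, position, coeff)."""
    r = residual.astype(float).copy()
    d1 = atoms_1d()
    atoms2d = [(i, j, np.outer(d1[i], d1[j]))
               for i in range(len(d1)) for j in range(len(d1))]
    terms = []
    for _ in range(n_terms):
        best = None
        for i, j, g in atoms2d:
            h, w = g.shape
            for y in range(r.shape[0] - h + 1):
                for x in range(r.shape[1] - w + 1):
                    c = float(np.sum(r[y:y+h, x:x+w] * g))
                    if best is None or abs(c) > abs(best[0]):
                        best = (c, i, j, y, x, g)
        c, i, j, y, x, g = best
        r[y:y+g.shape[0], x:x+g.shape[1]] -= c * g
        terms.append((i, j, y, x, round(c, 2)))
    return terms, r

# Decompose a small synthetic motion residual and report the energy removed.
rng = np.random.default_rng(0)
res = rng.normal(0, 1, (16, 16))
terms, rem = matching_pursuit(res, n_terms=10)
print(len(terms), "terms, remaining energy ratio",
      round(float(np.sum(rem**2) / np.sum(res**2)), 3))
```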

319 citations


Journal ArticleDOI
TL;DR: It is shown that the proposed algorithm is simple and efficient and requires about one half of the computation of the TSS while keeping the same regularity and good performance.
Abstract: The three-step search (TSS) algorithm for block-matching motion estimation, due to its simplicity, significant computational reduction, and good performance, has been widely used in real-time video applications. A new search algorithm is proposed for further reduction of the computational complexity of motion estimation. It is shown that the proposed algorithm is simple and efficient and requires about one half of the computation of the TSS while keeping the same regularity and good performance.
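The abstract does not detail the proposed half-complexity search, but the TSS baseline it halves is easy to state: evaluate a coarse 3x3 grid of candidate points, recentre on the best one, halve the step, and repeat down to single-pixel steps. Below is a sketch of that baseline TSS using SAD as the matching cost; the frame data in the example are synthetic.

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences between two blocks."""
    return int(np.abs(a.astype(int) - b.astype(int)).sum())

def three_step_search(cur, ref, bx, by, bsize=16, step=4):
    """Classic TSS: test the 3x3 grid of points around the current centre,
    move the centre to the best point, halve the step, repeat until step < 1."""
    block = cur[by:by+bsize, bx:bx+bsize]
    cx, cy = bx, by
    best = sad(block, ref[cy:cy+bsize, cx:cx+bsize])
    while step >= 1:
        best_pt = (cx, cy)
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                x, y = cx + dx, cy + dy
                if 0 <= y <= ref.shape[0] - bsize and 0 <= x <= ref.shape[1] - bsize:
                    cost = sad(block, ref[y:y+bsize, x:x+bsize])
                    if cost < best:
                        best, best_pt = cost, (x, y)
        cx, cy = best_pt
        step //= 2
    return cx - bx, cy - by, best       # motion vector (dx, dy) and its SAD

# Synthetic check on a smooth frame shifted by (dx, dy) = (3, 2).
ys, xs = np.mgrid[0:64, 0:64]
ref = (128 + 60 * np.sin(xs / 6.0) + 40 * np.cos(ys / 5.0)).astype(np.uint8)
cur = np.full_like(ref, 128)
cur[:-2, :-3] = ref[2:, 3:]             # current content comes from ref shifted
print(three_step_search(cur, ref, 16, 16))   # expect (3, 2, 0)
```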

302 citations


Journal ArticleDOI
TL;DR: A source model describing the relationship between bits, distortion, and quantization step sizes of a large class of block-transform video coders is proposed and the nonideal factors in real signals and systems are identified and their mathematical expressions are derived from empirical data.
Abstract: A source model describing the relationship between bits, distortion, and quantization step sizes of a large class of block-transform video coders is proposed. This model is initially derived from the rate-distortion theory and then modified to match the practical coders and real image data. The realistic constraints such as quantizer dead-zone and threshold coefficient selection are included in our formulation. The most attractive feature of this model is its simplicity in its final form. It enables us to predict the bits needed to encode a picture at a given distortion or to predict the quantization step size at a given bit rate. There are two aspects of our contribution: one, we extend the existing results of rate-distortion theory to the practical video coders, and two, the nonideal factors in real signals and systems are identified, and their mathematical expressions are derived from empirical data. One application of this model, as shown in the second part of this paper, is the buffer/quantizer control on a CCITT p×64 kbit/s coder with the advantage that the picture quality is nearly constant over the entire picture sequence.

297 citations


Journal ArticleDOI
TL;DR: The main idea is to effectively exploit the information obtained from the corresponding block at a coarser resolution level and spatio-temporal neighboring blocks at the same level in order to select a good set of initial MV candidates and then perform further local search to refine the MV result.
Abstract: We propose a new fast algorithm for block motion vector (MV) estimation based on the correlations of the MVs existing in spatially and temporally adjacent as well as hierarchically related blocks. We first establish a basic framework by introducing new algorithms based on spatial correlation and then spatio-temporal correlations before integrating them with a multiresolution scheme for the ultimate algorithm. The main idea is to effectively exploit the information obtained from the corresponding block at a coarser resolution level and spatio-temporal neighboring blocks at the same level in order to select a good set of initial MV candidates and then perform further local search to refine the MV result. We show with experimental results that, in comparison with the full search algorithm, the proposed algorithm achieves a speed-up factor ranging from 150 to 310 with only 2-7% mean square error (MSE) increase and a similar rate-distortion performance when applied to typical test video sequences.
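The candidate-plus-refinement structure described above can be sketched compactly: a handful of predictor MVs (spatial neighbours, the co-located temporal MV, and the coarser-level MV scaled up) are evaluated, and the winner is refined with a small local search. The predictor values in the example below are placeholders, and the multiresolution bookkeeping of the actual algorithm is omitted.

```python
import numpy as np

def sad(a, b):
    return int(np.abs(a.astype(int) - b.astype(int)).sum())

def predictive_search(cur, ref, bx, by, candidates, bsize=16):
    """Evaluate a small candidate MV set, then refine the winner with a greedy
    +-1 pixel local search."""
    block = cur[by:by+bsize, bx:bx+bsize]

    def cost(mv):
        x, y = bx + mv[0], by + mv[1]
        if 0 <= y <= ref.shape[0] - bsize and 0 <= x <= ref.shape[1] - bsize:
            return sad(block, ref[y:y+bsize, x:x+bsize])
        return float('inf')

    best = min(candidates, key=cost)
    improved = True
    while improved:
        improved = False
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                cand = (best[0] + dx, best[1] + dy)
                if cost(cand) < cost(best):
                    best, improved = cand, True
    return best

# Placeholder predictors (left/top neighbours, co-located temporal MV, coarser
# level MV scaled by two); in a real coder these come from earlier decisions.
left_mv, top_mv, temporal_mv, coarse_mv = (1, 0), (0, 1), (2, 1), (1, 0)
candidates = [(0, 0), left_mv, top_mv, temporal_mv,
              (2 * coarse_mv[0], 2 * coarse_mv[1])]
rng = np.random.default_rng(1)
ref = rng.integers(0, 256, (64, 64), dtype=np.uint8)
cur = np.roll(ref, (-1, -2), axis=(0, 1))      # best match in ref is at (+2, +1)
print(predictive_search(cur, ref, 16, 16, candidates))   # expect (2, 1)
```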

279 citations


Journal ArticleDOI
TL;DR: An algorithm and a hardware architecture for block-based motion estimation that involves transforming video sequences from a multibit to a one-bit/pixel representation and then applying conventional motion estimation search strategies results in substantial reductions in arithmetic and hardware complexity and reduced power consumption.
Abstract: We present an algorithm and a hardware architecture for block-based motion estimation that involves transforming video sequences from a multibit to a one-bit/pixel representation and then applying conventional motion estimation search strategies. This results in substantial reductions in arithmetic and hardware complexity and reduced power consumption, while maintaining good compression performance. Experimental results and a custom hardware design using a linear array of processing elements are also presented.
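The one-bit idea is easy to sketch: binarise each frame by comparing every pixel with a locally filtered version, then match blocks by counting differing bits (an XOR plus popcount in hardware). The box-filter binarisation below stands in for the paper's specific multi-bandpass kernel, and the test data are synthetic.

```python
import numpy as np

def box_mean(img, k=5):
    """Local k x k mean with edge padding (sum of shifted copies)."""
    pad = k // 2
    p = np.pad(img.astype(float), pad, mode='edge')
    out = np.zeros(img.shape, float)
    for dy in range(k):
        for dx in range(k):
            out += p[dy:dy+img.shape[0], dx:dx+img.shape[1]]
    return out / (k * k)

def one_bit_transform(frame, k=5):
    """1 bit/pixel representation: pixel brighter than its local mean.  (The
    original work uses a particular multi-bandpass filter kernel; a plain box
    mean is used here only for illustration.)"""
    return frame.astype(float) > box_mean(frame, k)

def binary_sad(b1, b2):
    """Matching cost on 1-bit blocks: number of differing bits."""
    return int(np.count_nonzero(b1 != b2))

# Full search on the 1-bit planes for one 16x16 block.
rng = np.random.default_rng(2)
ys, xs = np.mgrid[0:64, 0:64]
ref = 128 + 50 * np.sin(xs / 5.0) + 50 * np.cos(ys / 7.0) + rng.normal(0, 5, (64, 64))
cur = np.roll(ref, (-2, -1), axis=(0, 1))      # best match in ref is at (+1, +2)
bref, bcur = one_bit_transform(ref), one_bit_transform(cur)
bx = by = 24
block = bcur[by:by+16, bx:bx+16]
costs = {(dx, dy): binary_sad(block, bref[by+dy:by+dy+16, bx+dx:bx+dx+16])
         for dx in range(-4, 5) for dy in range(-4, 5)}
print(min(costs, key=costs.get))               # expect roughly (1, 2)
```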

273 citations


Journal ArticleDOI
TL;DR: This paper introduces a new approach to deblocking of JPEG compressed images using overcomplete wavelet representations; by exploiting cross-scale correlations among wavelet coefficients, it is capable of achieving the same peak signal-to-noise ratio (PSNR) improvement as the best iterative method while giving visually very pleasing images.
Abstract: This paper introduces a new approach to deblocking of JPEG compressed images using overcomplete wavelet representations. By exploiting cross-scale correlations among wavelet coefficients, edge information in the JPEG compressed images is extracted and protected, while blocky noise in the smooth background regions is smoothed out in the wavelet domain. Compared with the iterative methods reported in the literature, our simple wavelet-based method has much lower computational complexity, yet it is capable of achieving the same peak signal-to-noise ratio (PSNR) improvement as the best iterative method and giving visually very pleasing images as well.

231 citations


Journal ArticleDOI
TL;DR: This paper describes a hybrid motion-compensated wavelet transform coder designed for encoding video at very low bit rates that outperforms the VM of MPEG-4 for coding of I-frames and matches the performance of the VM for P-frames while providing a path to spatial scalability, object scalability, and bitstream scalability.
Abstract: This paper describes a hybrid motion-compensated wavelet transform coder designed for encoding video at very low bit rates. The coder and its components have been submitted to MPEG-4 to support the functionalities of compression efficiency and scalability. Novel features of this coder are the use of overlapping block motion compensation in combination with a discrete wavelet transform followed by adaptive quantization and zerotree entropy coding, plus rate control. The coder outperforms the VM of MPEG-4 for coding of I-frames and matches the performance of the VM for P-frames while providing a path to spatial scalability, object scalability, and bitstream scalability.

Journal ArticleDOI
TL;DR: To support a good interface between the FPA and the downstream signal processing stage, both conventional and CMOS readout techniques are presented and discussed, and future development directions including the smart focal plane concept are introduced.
Abstract: A discussion of CMOS readout technologies for infrared (IR) imaging systems is presented. First, a description of various types of IR detector materials and structures is given. Advances in detector fabrication technology and microelectronics process technology have led to the development of large-format arrays of IR imaging detectors. For such large IR focal plane arrays (FPAs), which are the critical component of advanced infrared imaging systems, general requirements and specifications are described. To support a good interface between the FPA and the downstream signal processing stage, both conventional and CMOS readout techniques are presented and discussed. Finally, future development directions, including the smart focal plane concept, are also introduced.

Journal ArticleDOI
TL;DR: The problem of robust video transmission in error prone environments is addressed utilizing a feedback channel between transmitter and receiver carrying acknowledgment information and a low complexity algorithm for real-time reconstruction of spatio-temporal error propagation is described in detail.
Abstract: In this paper we address the problem of robust video transmission in error prone environments. The approach is compatible with the ITU-T video coding standard H.263. Fading situations in mobile networks are tolerated and the image quality degradation due to spatio-temporal error propagation is minimized utilizing a feedback channel between transmitter and receiver carrying acknowledgment information. In a first step, corrupted groups of blocks (GOB's) are concealed to avoid annoying artifacts caused by decoding of an erroneous bit stream. The GOB and the corresponding frame number are reported to the transmitter via the back channel. The encoder evaluates the negative acknowledgments and reconstructs the spatial and temporal error propagation. A low complexity algorithm for real-time reconstruction of spatio-temporal error propagation is described in detail. Rapid error recovery is achieved by INTRA refreshing image regions (macroblocks) bearing visible distortion. The feedback channel method does not introduce additional delay and is particularly relevant for real-time conversational services in mobile networks. Experimental results with bursty bit error sequences simulating a Digital European Cordless Telecommunications (DECT) channel are presented with different combinations of forward error correction (FEC), automatic repeat request (ARQ), and the proposed error compensation technique. Compared to the case where FEC and ARQ are used for error correction, a gain of up to 3 dB peak signal-to-noise ratio (PSNR) is observed if error compensation is employed additionally.

Journal ArticleDOI
L. Chiariglione
TL;DR: The MPEG-4 standard, with its network-independent nature and application-level features, is poised to become the enabling technology for multimedia communications and will therefore contribute to solving the problems that are hindering multimedia communications.
Abstract: Digital television is a reality today, but multimedia communications, after years of hype, is still a catchword. Lack of suitable multi-industry standards supporting it is one reason for the unfulfilled promise. The MPEG committee which originated the MPEG-1 and MPEG-2 standards that made digital television possible is currently developing MPEG-4 with wide industry participation. This paper describes how the MPEG-4 standard, with its network-independent nature and application-level features, is poised to become the enabling technology for multimedia communications and will therefore contribute to solving the problems that are hindering multimedia communications.

Journal ArticleDOI
TL;DR: The goal is to improve video coding efficiency by exploiting the layering of video and to support content-based functionality using a sprite technique and an affine motion model on a per-object basis.
Abstract: A layered video object coding system is presented in this paper. The goal is to improve video coding efficiency by exploiting the layering of video and to support content-based functionality. These two objectives are accomplished using a sprite technique and an affine motion model on a per-object basis. Several novel algorithms have been developed for mask processing and coding, trajectory coding, sprite accretion and coding, locally affine motion compensation, error signal suppression, and image padding. Compared with conventional frame-based coding methods, better experimental results on both hybrid and natural scenes have been obtained using our coding scheme. We also demonstrate content-based functionality which can be easily achieved in our system.

Journal ArticleDOI
TL;DR: This work describes an alternative approach wherein the compressed stream is processed in the compressed, discrete cosine transform (DCT) domain without explicit decompression and spatial domain processing, so that the output compressed stream, corresponding to the output image, conforms to the standard syntax of 8×8 blocks.
Abstract: Straightforward techniques for spatial domain processing of compressed video via decompression and recompression are computationally expensive. We describe an alternative approach wherein the compressed stream is processed in the compressed, discrete cosine transform (DCT) domain without explicit decompression and spatial domain processing, so that the output compressed stream, corresponding to the output image, conforms to the standard syntax of 8×8 blocks. We propose computation schemes for downsampling and for inverse motion compensation that are applicable to any DCT-based compression method. Worst case estimates of computation savings vary between 37% and 50% depending on the task. For typically sparse DCT blocks, the reduction in computations is more dramatic. A by-product of the proposed approach is improvement in arithmetic precision.
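The key observation behind compressed-domain downsampling is that 2:1 averaging of four adjacent 8x8 blocks is a fixed linear map on their DCT coefficients: with an orthonormal DCT matrix D and the averaging operator split into left/right halves, the downsampled block's DCT is a sum of four matrix products with precomputable factors. The sketch below illustrates and numerically checks that identity; it is an illustration of the principle, not the paper's optimized scheme.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II matrix."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    d = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    d[0] /= np.sqrt(2.0)
    return d

D = dct_matrix()
A = np.zeros((8, 16))                     # 2:1 averaging of 16 samples down to 8
A[np.arange(8), 2 * np.arange(8)] = 0.5
A[np.arange(8), 2 * np.arange(8) + 1] = 0.5
AL, AR = A[:, :8], A[:, 8:]
PL, PR = D @ AL @ D.T, D @ AR @ D.T       # precomputable DCT-domain operators

def downsample_dct(X):
    """X[r][c]: the 8x8 DCT blocks of a 2x2 arrangement of pixel blocks.
    Returns the DCT of the 8x8 block obtained by 2x2 pixel averaging."""
    P = (PL, PR)
    return sum(P[r] @ X[r][c] @ P[c].T for r in (0, 1) for c in (0, 1))

# Numerical check against doing the averaging in the pixel domain.
rng = np.random.default_rng(0)
pix = [[rng.random((8, 8)) for _ in range(2)] for _ in range(2)]
X = [[D @ b @ D.T for b in row] for row in pix]        # forward DCTs
direct = D @ (A @ np.block(pix) @ A.T) @ D.T           # pixel-domain route
print(np.allclose(downsample_dct(X), direct))          # True
```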

Journal ArticleDOI
TL;DR: A new error concealment algorithm for recovering the lost or erroneously received motion vectors is presented, which combines the overlapped motion compensation and the side match criterion to make the effect of lost motion vectors subjectively imperceptible.
Abstract: Most video sequence coding systems use block motion compensation to remove temporal redundancy for video compression due to its regularity and simplicity. A new error concealment algorithm for recovering the lost or erroneously received motion vectors is presented. It combines the overlapped motion compensation and the side match criterion to make the effect of lost motion vectors subjectively imperceptible. The side match criterion takes advantage of the spatial contiguity and interpixel correlation of the image to select the best-fit replacement among the motion vectors of spatially contiguous candidate blocks. Particularly, to mask the blocking artifacts, we incorporate an overlapping technique to create a subjectively closer approximation to the true error-free image.
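The side match criterion itself is a small comparison: for a block whose motion vector was lost, each candidate MV is scored by how well its prediction lines up with the correctly received pixels just outside the block boundary. The sketch below implements only that selection; the overlapped-compensation blending that masks blocking artifacts is omitted, and the names in the commented call are placeholders.

```python
import numpy as np

def side_match_mv(ref, cur, bx, by, bsize, candidate_mvs):
    """Choose the candidate motion vector whose prediction from `ref` best
    matches the received pixels surrounding the lost block in `cur`."""
    def boundary_cost(mv):
        x, y = bx + mv[0], by + mv[1]
        if not (0 <= y and y + bsize <= ref.shape[0]
                and 0 <= x and x + bsize <= ref.shape[1]):
            return float('inf')
        pred = ref[y:y+bsize, x:x+bsize].astype(int)
        c = cur.astype(int)
        cost = 0
        if by > 0:
            cost += np.abs(pred[0] - c[by-1, bx:bx+bsize]).sum()          # top side
        if by + bsize < cur.shape[0]:
            cost += np.abs(pred[-1] - c[by+bsize, bx:bx+bsize]).sum()     # bottom side
        if bx > 0:
            cost += np.abs(pred[:, 0] - c[by:by+bsize, bx-1]).sum()       # left side
        if bx + bsize < cur.shape[1]:
            cost += np.abs(pred[:, -1] - c[by:by+bsize, bx+bsize]).sum()  # right side
        return int(cost)
    return min(candidate_mvs, key=boundary_cost)

# Typical candidate set: the zero vector plus the received MVs of the four
# neighbouring blocks (placeholder names):
# mv = side_match_mv(prev_frame, cur_frame, 32, 48, 16,
#                    [(0, 0), mv_above, mv_below, mv_left, mv_right])
```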

Journal ArticleDOI
TL;DR: A novel error concealment technique based on discrete cosine transform (DCT) coefficient recovery and its application to the MPEG-2 bit stream error, requiring much lower computational load and simpler hardware structure than existing algorithms, while providing adequate performance.
Abstract: This paper presents a novel error concealment technique based on discrete cosine transform (DCT) coefficient recovery and its application to the MPEG-2 bit stream error. Assuming a smoothness constraint on image intensity, an objective function which describes the intersample variations at the boundaries of the lost block and the adjacent blocks is defined, and the corrupted DCT coefficients are recovered by solving a linear equation. Our approach can be regarded as a special case of Wang et al.'s (1991). However, we show that the linear equation in the proposed algorithm can be decomposed into four independent subequations, requiring much lower computational load and simpler hardware structure than existing algorithms, while providing adequate performance. To develop a generic error concealment (EC) system, the blocks corrupted by the random bit errors are identified by a multistage error detection algorithm. Thus, the proposed EC system can be applied to more realistic environments, such as concealment of random bit error in MPEG-2 bit stream. Computer simulation results show that the quality of a recovered image is significantly improved even at a bit error rate as high as 10^-5.
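The core idea, that corrupted coefficients can be recovered by making the lost block's border pixels agree with its surroundings, amounts to a small linear least-squares problem. The sketch below is a plain least-squares illustration of that idea, not the paper's decomposition into four independent subequations; the example is synthetic and uses the block's own border rows and columns as stand-ins for the neighbours' adjacent pixels.

```python
import numpy as np

def dct_matrix(n=8):
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    d = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    d[0] /= np.sqrt(2.0)
    return d

D = dct_matrix()

def recover_coeffs(received, lost, top, bottom, left, right):
    """Recover the DCT coefficients flagged in the boolean mask `lost` by
    requiring the block's border pixels to match the given boundary samples in
    the least-squares sense; the other coefficients keep their values."""
    M = np.zeros((32, 64))                 # border pixels as a function of the coefficients
    for idx in range(64):
        e = np.zeros(64); e[idx] = 1.0
        blk = D.T @ e.reshape(8, 8) @ D    # inverse DCT of a unit coefficient
        M[:, idx] = np.concatenate([blk[0], blk[-1], blk[:, 0], blk[:, -1]])
    target = np.concatenate([top, bottom, left, right]).astype(float)
    m = lost.ravel()
    C = received.ravel().astype(float).copy()
    C[m] = 0.0
    rhs = target - M[:, ~m] @ C[~m]        # remove the known coefficients' contribution
    sol, *_ = np.linalg.lstsq(M[:, m], rhs, rcond=None)
    C[m] = sol
    return C.reshape(8, 8)

# Synthetic check: corrupt the low-frequency corner of a smooth block and
# conceal it from the boundary samples.
orig = np.add.outer(np.arange(8.0), np.arange(8.0))
C_true = D @ orig @ D.T
lost = np.zeros((8, 8), bool); lost[:4, :4] = True
rec = recover_coeffs(np.where(lost, 0.0, C_true), lost,
                     top=orig[0], bottom=orig[-1], left=orig[:, 0], right=orig[:, -1])
print(round(float(np.abs(D.T @ rec @ D - orig).max()), 2))   # residual concealment error
```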

Journal ArticleDOI
TL;DR: An object-based coding scheme is proposed for the coding of a stereoscopic image sequence using motion and disparity information and the use of the depth map information for the generation of intermediate views at the receiver is discussed.
Abstract: An object-based coding scheme is proposed for the coding of a stereoscopic image sequence using motion and disparity information. A hierarchical block-based motion estimation approach is used for initialization, while disparity estimation is performed using a pixel-based hierarchical dynamic programming algorithm. A split-and-merge segmentation procedure based on three-dimensional (3-D) motion modeling is then used to determine regions with similar motion parameters. The segmentation part of the algorithm is interleaved with the estimation part in order to optimize the coding performance of the procedure. Furthermore, a technique is examined for propagating the segmentation information with time. A 3-D motion-compensated prediction technique is used for both intensity and depth image sequence coding. Error images and depth maps are encoded using discrete cosine transform (DCT) and Huffman methods. Alternatively, an efficient wireframe depth modeling technique may be used to convey depth information to the receiver. Motion and wireframe model parameters are then quantized and transmitted to the decoder along with the segmentation information. As a straightforward application, the use of the depth map information for the generation of intermediate views at the receiver is also discussed. The performance of the proposed compression methods is evaluated experimentally and is compared to other stereoscopic image sequence coding schemes.

Journal ArticleDOI
TL;DR: This paper presents a generic video coding algorithm allowing the content-based manipulation of objects thanks to the definition of a spatiotemporal segmentation of the sequences and offers a good compromise between the ability to track and manipulate objects and the coding efficiency.
Abstract: This paper presents a generic video coding algorithm allowing the content-based manipulation of objects. This manipulation is possible thanks to the definition of a spatiotemporal segmentation of the sequences. The coding strategy relies on a joint optimization in the rate-distortion sense of the partition definition and of the coding techniques to be used within each region. This optimization creates the link between the analysis and synthesis parts of the coder. The analysis defines the time evolution of the partition, as well as the elimination or the appearance of regions that are homogeneous either spatially or in motion. The coding of the texture as well as of the partition relies on region-based motion compensation techniques. The algorithm offers a good compromise between the ability to track and manipulate objects and the coding efficiency.

Journal ArticleDOI
TL;DR: Simulation results show that the proposed video coding technique using the two-stage MC significantly outperforms H.263 under identical conditions, especially for sequences with fast camera motion.
Abstract: This paper describes a high-efficiency video coding method based on ITU-T H.263. To improve the coding efficiency of H.263, a two-stage motion compensation (MC) method is proposed, consisting of global MC (GMC) for predicting camera motion and local MC (LMC) for macroblock prediction. First, global motion such as panning, tilting, and zooming is estimated, and the global-motion-compensated image is produced for use as a reference in LMC. Next, LMC is performed both for the global-motion-compensated reference image and for the image without GMC. LMC employs an affine motion model in the context of H.263's overlapped block motion compensation. Using the overlapped block affine MC, rotation and scaling of small objects can be predicted, in addition to translational motion. In the proposed method, GMC is adaptively turned on/off for each macroblock since GMC cannot be used for prediction in all regions in a frame. In addition, either an affine or a translational motion model is adaptively selected in LMC for each macroblock. Simulation results show that the proposed video coding technique using the two-stage MC significantly outperforms H.263 under identical conditions, especially for sequences with fast camera motion. The performance improvements in peak signal-to-noise ratio (PSNR) are about 3 dB over the original H.263, which does not use the two-stage MC.
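A least-squares fit of six affine parameters to measured block motion vectors conveys the flavour of the GMC stage. The sketch below, with nearest-neighbour warping and a simple SAD comparison for the per-macroblock GMC on/off decision, is only an illustration of the idea; the paper's estimation and its combination with overlapped block MC are more elaborate, and the example numbers are synthetic.

```python
import numpy as np

def fit_affine(points, mvs):
    """Least-squares affine global motion (x, y) -> (a1 x + a2 y + a3,
    a4 x + a5 y + a6), fitted so that affine(p) is approximately p + mv."""
    p = np.asarray(points, float)
    v = np.asarray(mvs, float)
    X = np.column_stack([p[:, 0], p[:, 1], np.ones(len(p))])
    ax, *_ = np.linalg.lstsq(X, p[:, 0] + v[:, 0], rcond=None)
    ay, *_ = np.linalg.lstsq(X, p[:, 1] + v[:, 1], rcond=None)
    return ax, ay

def warp_affine(ref, ax, ay):
    """Nearest-neighbour warp of the reference with the global model (the
    global-motion-compensated reference picture)."""
    h, w = ref.shape
    ys, xs = np.mgrid[0:h, 0:w]
    sx = np.clip(np.rint(ax[0] * xs + ax[1] * ys + ax[2]), 0, w - 1).astype(int)
    sy = np.clip(np.rint(ay[0] * xs + ay[1] * ys + ay[2]), 0, h - 1).astype(int)
    return ref[sy, sx]

def prefer_gmc(cur_blk, ref_blk, gmc_blk):
    """Per-macroblock GMC on/off decision by comparing the SAD of the two predictions."""
    return np.abs(cur_blk - gmc_blk).sum() < np.abs(cur_blk - ref_blk).sum()

# Synthetic pan of 3 pixels to the right: every block MV is (3, 0).
pts = [(x, y) for y in range(8, 57, 16) for x in range(8, 57, 16)]
ax, ay = fit_affine(pts, [(3, 0)] * len(pts))
print(np.round(ax, 2), np.round(ay, 2))        # approximately [1 0 3] and [0 1 0]
```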

Journal ArticleDOI
Wei Ding
TL;DR: It is argued that a rate control scheme has to balance both issues of consistent video quality on the encoder side and bitstream smoothness for the SMG on the network side; the effectiveness of the proposed algorithm is shown.
Abstract: Rate control is considered an important issue in video coding, since it significantly affects video quality. We discuss joint encoder and channel rate control for variable bit-rate (VBR) video over packet-switched ATM networks. Since variable bit-rate traffic is allowed in such networks, an open-loop encoder without rate control can generate consistent quality video. However, in order to improve the statistical multiplexing gain (SMG), an encoder buffer is essential to meet traffic constraints imposed by networks and to smooth the highly variable video bitstream. Due to the finite buffer size, some forms of encoder rate control have to be enforced, and consequently, video quality varies. We argue that a rate control scheme has to balance both issues of consistent video quality on the encoder side and bitstream smoothness for the SMG on the network side. We present a joint encoder and channel rate control algorithm for ATM networks with leaky buckets as open-loop source flow control models. The algorithm considers constraints imposed by the encoder and decoder buffers, the leaky bucket control, traffic smoothing, and rate control. The encoder rate control is separated into a sustainable-rate control and a unidirectional instantaneous-rate control. It alleviates the leaky bucket saturation problem exhibited in previous works. Experimental results with MPEG video are presented. The results verify our analysis and show the effectiveness of the proposed algorithm.
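One piece of the scheme, the leaky-bucket constraint that the joint rate control must respect, can be sketched compactly: bits enter a bucket that drains at the sustainable rate, the bucket must never exceed its size, and each frame's bit target is clipped to the remaining headroom. This is a minimal illustration of the constraint, not the paper's two-part (sustainable plus instantaneous) control, and the rates in the example are arbitrary.

```python
class LeakyBucketRC:
    """Fluid-model leaky bucket: data enters the bucket, drains at the
    sustainable rate, and must never exceed the bucket size."""
    def __init__(self, sustainable_rate, bucket_size, frame_rate):
        self.drain_per_frame = sustainable_rate / frame_rate   # bits drained per frame interval
        self.bucket_size = bucket_size
        self.fullness = 0.0

    def clip_target(self, desired_bits):
        """Largest number of bits this frame may produce without overflowing."""
        headroom = self.bucket_size - max(self.fullness - self.drain_per_frame, 0.0)
        return min(desired_bits, headroom)

    def commit(self, actual_bits):
        """Update bucket fullness after the frame has been encoded."""
        self.fullness = max(self.fullness - self.drain_per_frame, 0.0) + actual_bits
        assert self.fullness <= self.bucket_size + 1e-9, "leaky bucket overflow"

# Example: 1 Mbit/s sustainable rate, 500 kbit bucket, 25 frames/s.
rc = LeakyBucketRC(1_000_000, 500_000, 25)
for desired in [300_000, 300_000, 300_000, 40_000]:
    target = rc.clip_target(desired)
    rc.commit(target)                      # pretend the encoder hit the target exactly
    print(int(target), int(rc.fullness))
```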

Journal ArticleDOI
TL;DR: A variable frame rate coding algorithm is developed that could predict the quality and bits of coded images without going through the entire real-coding process, and skip the right number of picture frames to accomplish the goal of constant image quality.
Abstract: In the first part of this paper, we derived a source model describing the relationship between bits, distortion, and quantization step size for transform coders. Based on this source model, a variable frame rate coding algorithm is developed. The basic idea is to select a proper picture frame rate to ensure a minimum picture quality for every frame. Because our source model can predict approximately the number of coded bits when a certain quantization step size is used, we could predict the quality and bits of coded images without going through the entire real-coding process. Therefore, we could skip the right number of picture frames to accomplish the goal of constant image quality. Our proposed variable frame rate coding schemes are simple but quite effective as demonstrated by simulation results. The results of using another variable frame rate scheme, Test Model for H.263 (TMN-5), and the results of using a fixed frame rate coding scheme, Reference Model 8 for H.261 (RM8), are also provided for comparison.

Journal ArticleDOI
TL;DR: This work develops a novel 8×8 two-dimensional (2-D) discrete cosine transform/inverse discrete cosine transform (DCT/IDCT) architecture based on the direct 2-D approach and the rotation technique.
Abstract: Among the various transform techniques for image compression, the discrete cosine transform (DCT) is the most popular and effective one in practical image and video coding applications, such as high-definition television (HDTV). We develop a novel 8×8 two-dimensional (2-D) discrete cosine transform/inverse discrete cosine transform (DCT/IDCT) architecture based on the direct 2-D approach and the rotation technique. The computational complexity is reduced by taking advantage of the special attribute of a complex number. Both the parallel and the folded architectures are proposed. Unlike other approaches, the proposed architecture is regular and economical for VLSI implementation. Compared to the row-column method, less internal wordlength is needed in order to meet the error requirement of the IDCT, and the throughput of the proposed architecture can reach twice that of the row-column method with a 30% increase in hardware.

Journal ArticleDOI
K.J. O'Connell
TL;DR: The rate-distortion comparisons presented show that the technique's adaptive nature allows it to operate efficiently over a wide range of rates and distortions and across a variety of input material, whereas other methods are efficient over more limited conditions.
Abstract: The paper presents a new technique for compactly representing the shape of a visual object within a scene. This method encodes the vertices of a polygonal approximation of the object's shape by adapting the representation to the dynamic range of the relative locations of the object's vertices and by exploiting an octant-based representation of each individual vertex. The object-level adaptation to the relative-location dynamic range provides the flexibility needed to efficiently encode objects of different sizes and with different allowed approximation distortion. At the vertex-level, the octant-based representation allows coding gains for vertices closely spaced relative to the object-level dynamic range. This vertex coding method may be used with techniques which code the polygonal approximation error for further gains in coding efficiency. Results are shown which demonstrate the effectiveness of the vertex encoding method. The rate-distortion comparisons presented show that the technique's adaptive nature allows it to operate efficiently over a wide range of rates and distortions and across a variety of input material, whereas other methods are efficient over more limited conditions.
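The two ideas named above, adapting the field width to the object's dynamic range and exploiting the octant of each relative location so that the minor component needs fewer bits than the major one, can be sketched as a simple bit-counting exercise. The syntax below is illustrative only (the absolute first vertex, the field-width signalling, and the polygon-error coding are omitted) and does not reproduce the paper's exact bitstream.

```python
def bits_for(v):
    """Fixed-length bits needed to represent magnitudes 0..v."""
    return v.bit_length() if v > 0 else 0

def encode_vertices(vertices):
    """Count the bits for a polygon's relative vertex locations coded as
    (3-bit octant, major magnitude, minor magnitude): the major field width is
    adapted to the object's dynamic range, and the minor field width to the
    already-known major value (the minor component never exceeds the major)."""
    deltas = [(x1 - x0, y1 - y0)
              for (x0, y0), (x1, y1) in zip(vertices, vertices[1:])]
    dyn = max(max(abs(dx), abs(dy)) for dx, dy in deltas)
    major_bits = bits_for(dyn)                     # object-level dynamic-range adaptation
    symbols, total = [], 0
    for dx, dy in deltas:
        octant = ((dx < 0) << 2) | ((dy < 0) << 1) | (abs(dy) > abs(dx))
        major, minor = max(abs(dx), abs(dy)), min(abs(dx), abs(dy))
        symbols.append((octant, major, minor))
        total += 3 + major_bits + bits_for(major)  # vertex-level octant gain
    return symbols, total

# A coarse polygonal approximation of an object's outline.
outline = [(10, 10), (40, 12), (55, 30), (42, 52), (12, 50), (5, 30), (10, 10)]
syms, bits = encode_vertices(outline)
print(bits, "bits for", len(syms), "relative vertex locations")
```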

Journal ArticleDOI
TL;DR: This paper discusses an approach for detecting edges in color images that can easily accommodate concepts, such as multiscale edge detection, as well as the latest developments in vector order statistics for color image processing.
Abstract: This paper discusses an approach for detecting edges in color images. A color image is represented by a vector field, and the color image edges are detected as differences in the local vector statistics. These statistical differences can include local variations in color or spatial image properties. The proposed approach can easily accommodate concepts, such as multiscale edge detection, as well as the latest developments in vector order statistics for color image processing. A distinction between the proposed approach and previous approaches for color edge detection using vector order statistics is that, besides the edge magnitude, the local edge direction is also provided. Note that edge direction information is a relevant feature to a variety of image analysis tasks (e.g., texture analysis).
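A much-simplified vector-based detector conveys the idea of measuring edges as differences between local colour vectors and of returning a direction as well as a magnitude. The sketch below compares the four opposing neighbour pairs of each pixel in RGB space; the paper's operators built on vector order statistics and its multiscale variants are richer than this.

```python
import numpy as np

def color_edges(img):
    """img: H x W x 3 array.  For every pixel, measure the RGB-space distance
    between the four opposing neighbour pairs; the largest distance is the edge
    magnitude and the angle of that pair's axis is the edge direction."""
    f = img.astype(float)
    H, W, _ = f.shape
    p = np.pad(f, ((1, 1), (1, 1), (0, 0)), mode='edge')
    pairs = [((0, 1), (2, 1), 90.0),      # above vs. below   (vertical axis)
             ((1, 0), (1, 2), 0.0),       # left vs. right    (horizontal axis)
             ((0, 0), (2, 2), 45.0),      # upper-left vs. lower-right
             ((0, 2), (2, 0), 135.0)]     # upper-right vs. lower-left
    mag = np.zeros((H, W))
    ang = np.zeros((H, W))
    for (ay, ax), (by, bx), theta in pairs:
        d = np.linalg.norm(p[ay:ay+H, ax:ax+W] - p[by:by+H, bx:bx+W], axis=2)
        upd = d > mag
        mag[upd], ang[upd] = d[upd], theta
    return mag, ang

# A vertical blue/red boundary: strongest change along the horizontal (0 deg) axis.
img = np.zeros((32, 32, 3))
img[:, :16] = (0, 0, 255)
img[:, 16:] = (255, 0, 0)
mag, ang = color_edges(img)
print(round(float(mag.max()), 1), ang[16, 16])
```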

Journal ArticleDOI
TL;DR: A new 128×128 CMOS APS with programmable multiresolution readout capability is demonstrated, achieving 80 dB of dynamic range while dissipating only 5 mW of power.
Abstract: The development of the CMOS active pixel sensor (APS) has, for the first time, permitted large scale integration of supporting circuitry and smart camera-functions on the same chip as a high-performance image sensor. This paper reports on the demonstration of a new 128×128 CMOS APS with programmable multiresolution readout capability. By placing signal processing circuitry on the imaging focal plane, the image sensor can output data at varying resolutions which can decrease the computational load of downstream image processing. For instance, software intensive image pyramid reconstruction can be eliminated. The circuit uses a passive switched capacitor network to average arbitrarily large neighborhoods of pixels which can then be read out at any user-defined resolution by configuring a set of digital shift registers. The full resolution frame rate is 30 Hz with higher rates for all other image resolutions. The sensor achieved 80 dB of dynamic range while dissipating only 5 mW of power. Circuit error was less than -34 dB and introduced no objectionable fixed pattern noise or other artifacts into the image.

Journal ArticleDOI
TL;DR: A new morphological spatio-temporal segmentation algorithm that incorporates luminance and motion information simultaneously and uses morphological tools such as morphological filters and the watershed algorithm is presented.
Abstract: This paper presents a new morphological spatio-temporal segmentation algorithm. The algorithm incorporates luminance and motion information simultaneously and uses morphological tools such as morphological filters and the watershed algorithm. The procedure toward complete segmentation consists of three steps: joint marker extraction, boundary decision, and motion-based region fusion. First, the joint marker extraction identifies the presence of homogeneous regions in both motion and luminance, where a simple joint marker extraction technique is proposed. Second, the spatio-temporal boundaries are decided by the watershed algorithm. For this purpose, a new joint similarity measure is proposed. Finally, an elimination of redundant regions is done using motion-based region fusion. By incorporating spatial and temporal information simultaneously, we can obtain visually meaningful segmentation results. Simulation results demonstrate the efficiency of the proposed method.

Journal ArticleDOI
TL;DR: This paper forms the design of good fast estimation algorithms based on motion vector candidate reduction into an optimization problem that involves the checking point pattern (CPP) design via minimizing the distance from the true motion vector to the closest checking point (DCCP).
Abstract: There are basically three approaches for carrying out fast block motion estimation: (1) fast search by a reduction of motion vector candidates; (2) fast block-matching distortion (BMD) computation; and (3) motion field subsampling. The first approach has been studied more extensively since different ways of reducing motion vector candidates may result in significantly different performance; while the second and third approaches can in general be integrated into the first one so as to further accelerate the estimation process. In this paper, we first formulate the design of good fast estimation algorithms based on motion vector candidate reduction into an optimization problem that involves the checking point pattern (CPP) design via minimizing the distance from the true motion vector to the closest checking point (DCCP). Then, we demonstrate through extensive studies on the statistical behavior of real-world motion vectors that the DCCP minimization can result in fast search algorithms that are very efficient as well as highly robust. To further utilize the spatiotemporal correlation of motion vectors, we develop an adaptive search scheme and a hybrid search idea that involves a fixed CPP and a variable CPP. Simulations are performed to confirm their advantages over conventional fast search algorithms.
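The design criterion is easy to evaluate for any given checking point pattern (CPP): draw true motion vectors from an assumed (typically centre-biased) distribution and measure the distance from each to its closest checking point. The sketch below compares two illustrative first-step patterns under a Laplacian motion model; the patterns, the distribution, and its scale are assumptions for illustration, not the paper's.

```python
import numpy as np

def dccp(pattern, true_mvs):
    """Mean and maximum Distance from the true motion vectors to the Closest
    Checking Point of the pattern; good patterns make this small."""
    P = np.asarray(pattern, float)                # (K, 2) checking point offsets
    V = np.asarray(true_mvs, float)               # (N, 2) true motion vectors
    d = np.linalg.norm(V[:, None, :] - P[None, :, :], axis=2).min(axis=1)
    return round(float(d.mean()), 3), round(float(d.max()), 3)

# Two illustrative first-step patterns for a +-7 search range: the 9 points of
# the TSS first step versus a centre-biased 9-point set.
tss_step1 = [(dx, dy) for dx in (-4, 0, 4) for dy in (-4, 0, 4)]
centre_biased = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1),
                 (-4, -4), (-4, 4), (4, -4), (4, 4)]

# Assumed centre-biased motion statistics (real-world MVs cluster near zero).
rng = np.random.default_rng(0)
mvs = np.clip(np.round(rng.laplace(0.0, 1.5, size=(2000, 2))), -7, 7)
print("TSS step 1    :", dccp(tss_step1, mvs))
print("centre-biased :", dccp(centre_biased, mvs))
```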

Journal ArticleDOI
TL;DR: The major features of the proposed MC are block-partitioning prediction and the utilization of two time-differential reference frames, which improves the image quality around the objects' boundaries and consequently reduces prediction errors.
Abstract: Several studies on very low bit rate video coding have been reported. One of the major goals of the studies is to improve the coding performance, which gives better subjective and objective quality than conventional coding methods at the same bit rate. As the shape and structure of an object in a picture are arbitrary, the performance of traditional coding with block-based motion compensation (MC) is not satisfactory. We present advanced MC schemes for very low bit-rate video coding. The major features of the proposed MC are block-partitioning prediction and the utilization of two time-differential reference frames. This coding scheme improves the image quality around the objects' boundaries and consequently reduces prediction errors. It also works well in the case of object occlusions. The combination of the proposed MC and the discrete cosine transform (DCT) shows better performance on several test sequences than full-spec H.263.