
Showing papers in "Signal Processing: Image Communication" in 2002


Journal ArticleDOI
TL;DR: A generic Fourier descriptor (GFD) is proposed to overcome the drawbacks of existing shape representation techniques by applying two-dimensional Fourier transform on a polar-raster sampled shape image.
Abstract: Shape description is one of the key parts of image content description for image retrieval. Most of the existing shape descriptors are usually either application dependent or non-robust, making them undesirable for generic shape description. In this paper, a generic Fourier descriptor (GFD) is proposed to overcome the drawbacks of existing shape representation techniques. The proposed shape descriptor is derived by applying two-dimensional Fourier transform on a polar-raster sampled shape image. The acquired shape descriptor is application independent and robust. Experimental results show that the proposed GFD outperforms common contour-based and region-based shape descriptors.
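
As a rough illustration of the GFD idea, the sketch below (NumPy, with illustrative sampling parameters and normalization, not the authors' exact formulation) polar-raster samples a binary shape image about its centroid, applies a 2D FFT, and returns normalized spectral magnitudes as the descriptor.

    import numpy as np

    def generic_fourier_descriptor(shape_img, n_radial=64, n_angular=64):
        ys, xs = np.nonzero(shape_img)                     # foreground pixel coordinates
        cy, cx = ys.mean(), xs.mean()                      # shape centroid
        r_max = np.sqrt((ys - cy) ** 2 + (xs - cx) ** 2).max()

        # polar-raster sampling of the shape image around the centroid
        radii = np.linspace(0, r_max, n_radial, endpoint=False)
        angles = np.linspace(0, 2 * np.pi, n_angular, endpoint=False)
        rr, aa = np.meshgrid(radii, angles, indexing="ij")
        sy = np.clip(np.round(cy + rr * np.sin(aa)).astype(int), 0, shape_img.shape[0] - 1)
        sx = np.clip(np.round(cx + rr * np.cos(aa)).astype(int), 0, shape_img.shape[1] - 1)
        polar = shape_img[sy, sx].astype(float)

        # 2D Fourier transform of the polar image; magnitudes give rotation robustness
        spectrum = np.abs(np.fft.fft2(polar))
        return (spectrum / (spectrum[0, 0] + 1e-12)).ravel()   # DC-normalized descriptor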

534 citations


Journal ArticleDOI
TL;DR: Although the JPEG 2000 standard only specifies the decoder and the codestream syntax, the discussion will span both encoder and decoder issues to provide a better understanding of the standard in various applications.
Abstract: In 1996, the JPEG committee began to investigate possibilities for a new still image compression standard to serve current and future applications. This initiative, which was named JPEG 2000, has resulted in a comprehensive standard (ISO/IEC 15444 | ITU-T Recommendation T.800) that is being issued in six parts. Part 1, in the same vein as the JPEG baseline system, is aimed at minimal complexity and maximal interchange and was issued as an International Standard at the end of 2000. Parts 2–6 define extensions to both the compression technology and the file format and are currently in various stages of development. In this paper, a technical description of Part 1 of the JPEG 2000 standard is provided, and the rationale behind the selected technologies is explained. Although the JPEG 2000 standard only specifies the decoder and the codestream syntax, the discussion will span both encoder and decoder issues to provide a better understanding of the standard in various applications.

528 citations


Journal ArticleDOI
TL;DR: The results show that the choice of the “best” standard depends strongly on the application at hand, but that JPEG 2000 supports the widest set of features among the evaluated standards, while providing superior rate-distortion performance in most cases.
Abstract: JPEG 2000, the new ISO/ITU-T standard for still image coding, has recently reached the International Standard (IS) status. Other new standards have been recently introduced, namely JPEG-LS and MPEG-4 VTC. This paper provides a comparison of JPEG 2000 with JPEG-LS and MPEG-4 VTC, in addition to older but widely used solutions, such as JPEG and PNG, and well established algorithms, such as SPIHT. Lossless compression efficiency, fixed and progressive lossy rate-distortion performance, as well as complexity and robustness to transmission errors, are evaluated. Region of Interest coding is also discussed and its behavior evaluated. Finally, the set of provided functionalities of each standard is also evaluated. In addition, the principles behind each algorithm are briefly described. The results show that the choice of the “best” standard depends strongly on the application at hand, but that JPEG 2000 supports the widest set of features among the evaluated standards, while providing superior rate-distortion performance in most cases.

149 citations


Journal ArticleDOI
TL;DR: This paper describes the embedded block coding algorithm at the heart of the JPEG 2000 image compression standard, and discusses key considerations which led to the development and adoption of this algorithm, and also investigates performance and complexity issues.
Abstract: This paper describes the embedded block coding algorithm at the heart of the JPEG 2000 image compression standard. The paper discusses key considerations which led to the development and adoption of this algorithm, and also investigates performance and complexity issues. The JPEG 2000 coding system achieves excellent compression performance, somewhat higher (and, in some cases, substantially higher) than that of SPIHT with arithmetic coding, a popular benchmark for comparison. The algorithm utilizes the same low complexity binary arithmetic coding engine as JBIG2. Together with careful design of the bit-plane coding primitives, this enables execution speed comparable to that observed with the simpler variant of SPIHT without arithmetic coding. The coder offers additional advantages including memory locality, spatial random access and ease of geometric manipulation.

130 citations


Journal ArticleDOI
TL;DR: A feature selection process is proposed that sorts the principal components, generated by principal component analysis, in the order of their importance for a specific recognition task; the results confirm that the choice of representation strongly influences the classification results and that a classifier has to be designed for a specific representation.
Abstract: The design of a recognition system requires careful attention to pattern representation and classifier design. Some statistical approaches choose those features, in a d-dimensional initial space, which allow sample vectors belonging to different categories to occupy compact and disjoint regions in a low-dimensional subspace. The effectiveness of the representation subspace is then determined by how well samples from different classes can be separated. In this paper, we propose a feature selection process that sorts the principal components, generated by principal component analysis, in the order of their importance to solve a specific recognition task. This method provides a low-dimensional representation subspace which has been optimized to improve the classification accuracy. We focus on the problem of facial expression recognition to demonstrate this technique. We also propose a decision tree-based classifier that provides a “coarse-to-fine” classification of new samples by successive projections onto more and more precise representation subspaces. Results confirm, first, that the choice of the representation strongly influences the classification results and, second, that a classifier has to be designed for a specific representation.
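
The following sketch illustrates the general idea of ranking principal components by their usefulness for a classification task rather than by explained variance. The Fisher-style separability score used here is an assumed stand-in for the paper's importance criterion.

    import numpy as np

    def rank_components_for_classification(X, y, n_components):
        Xc = X - X.mean(axis=0)
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)      # rows of Vt are principal axes
        proj = Xc @ Vt[:n_components].T                        # projections onto leading components

        classes = np.unique(y)
        scores = []
        for k in range(n_components):
            z = proj[:, k]
            class_means = np.array([z[y == c].mean() for c in classes])
            class_vars = np.array([z[y == c].var() + 1e-12 for c in classes])
            scores.append(class_means.var() / class_vars.mean())   # between- vs within-class spread
        order = np.argsort(scores)[::-1]                       # most discriminative components first
        return Vt[:n_components][order], np.asarray(scores)[order]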

129 citations


Journal ArticleDOI
TL;DR: A perceptual video quality system is proposed that uses a linear combination of three indicators (the “edginess” of the luminance, the normalized color error and the temporal decorrelation) and that showed the highest variance-weighted regression overall correlation of all models in the VQEG benchmark.
Abstract: Modern video coding systems such as ISO MPEG-1, MPEG-2 and MPEG-4 exploit properties of the human visual system to reduce the bit rate at which a video sequence is coded, given a certain required video quality. As a result, to the degree to which such exploitation is successful, accurate prediction of the quality of the output video of such systems should also take the human visual system into account. In this paper, we propose a perceptual video quality system that uses a linear combination of three indicators: the “edginess” of the luminance, the normalized color error and the temporal decorrelation. In the benchmark by the Video Quality Experts Group (VQEG), a combined ITU-T and ITU-R expert group, the model showed the highest variance-weighted regression overall correlation of all models.

114 citations


Journal ArticleDOI
TL;DR: An overview of these quantization methods is provided, including generalized uniform scalar dead-zone quantization and trellis coded quantization (TCQ).
Abstract: Quantization is instrumental in enabling the rich feature set of JPEG 2000. Several quantization options are provided within JPEG 2000. Part I of the standard includes only uniform scalar dead-zone quantization, while Part II allows both generalized uniform scalar dead-zone quantization and trellis coded quantization (TCQ). In this paper, an overview of these quantization methods is provided. Issues that arise when each of these methods is employed are discussed as well.
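
A minimal sketch of uniform scalar dead-zone quantization, the Part I option: the zero bin is twice as wide as the other bins. The reconstruction offset of 0.5 is a common decoder choice rather than a value mandated by the standard.

    import numpy as np

    def deadzone_quantize(coeffs, step):
        # zero bin spans (-step, step); all other bins have width `step`
        return np.sign(coeffs) * np.floor(np.abs(coeffs) / step)

    def deadzone_dequantize(indices, step, offset=0.5):
        recon = np.sign(indices) * (np.abs(indices) + offset) * step
        return np.where(indices == 0, 0.0, recon)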

97 citations


Journal ArticleDOI
TL;DR: It is shown that the visual tool sets in JPEG 2000 are much richer than what is achievable in JPEG, where only spatially invariant frequency weighting can be exploited.
Abstract: The human visual system plays a key role in the final perceived quality of the compressed images. It is therefore desirable to allow system designers and users to take advantage of the current knowledge of visual perception and models in a compression system. In this paper, we review the various tools in JPEG 2000 that allow its users to exploit many properties of the human visual system such as spatial frequency sensitivity, color sensitivity, and the visual masking effects. We show that the visual tool sets in JPEG 2000 are much richer than what is achievable in JPEG, where only spatially invariant frequency weighting can be exploited. As a result, the visually optimized JPEG 2000 images can usually have much better visual quality than the visually optimized JPEG images at the same bit rates. Some visual comparisons between different visual optimization tools, as well as some visual comparisons between JPEG 2000 and JPEG, will be shown.

90 citations


Journal ArticleDOI
TL;DR: The results in this paper show that the Maxshift method can be used to greatly increase the compression efficiency by lowering the quality of the background and that it also makes it possible to receive the ROI before the background, when transmitting the image.
Abstract: This paper describes the functionality in the JPEG 2000 Part 1 standard, for encoding images with predefined regions of interest (ROI) of arbitrary shape. The method described is called the Maxshift method. This method is based on scaling of the wavelet coefficients after the wavelet transformation and quantization. By sufficiently scaling the wavelet coefficients used to reconstruct the ROI, all the information pertaining to the ROI is placed before the information pertaining to the rest of the image (background), in the codestream. By varying the quantization of the image and by truncation of the codestream, different quality for the ROI and for the background can be obtained. A description is also given of how the wavelet coefficients that are used to reconstruct the ROI (ROI mask) can be found. Since the decoder uses only the number of significant bitplanes for each wavelet coefficient to determine whether it should be scaled back, an arbitrary set of wavelet coefficients can be scaled on the encoder side. This means that there is no need to encode or send the shape of the ROI. This paper also describes how this can be used to further enhance the ROI functionality. The results in this paper show that the Maxshift method can be used to greatly increase the compression efficiency by lowering the quality of the background and that it also makes it possible to receive the ROI before the background, when transmitting the image.
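
A simplified sketch of the Maxshift principle on quantized integer wavelet coefficients: the encoder upshifts ROI coefficients by s bitplanes, with s chosen so every nonzero shifted ROI coefficient exceeds the largest background coefficient, and the decoder downshifts anything above that threshold without needing the ROI shape. Codestream organization and bitplane coding are omitted.

    import numpy as np

    def maxshift_encode(coeffs, roi_mask):
        # coeffs: quantized integer wavelet coefficients, roi_mask: boolean ROI mask
        background = np.where(roi_mask, 0, coeffs)
        s = int(np.ceil(np.log2(np.abs(background).max() + 1)))   # bitplanes used by the background
        return np.where(roi_mask, coeffs * 2 ** s, coeffs), s

    def maxshift_decode(coeffs, s):
        is_roi = np.abs(coeffs) >= 2 ** s          # ROI coefficients identified by magnitude alone
        return np.where(is_roi, coeffs // 2 ** s, coeffs)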

89 citations


Journal ArticleDOI
TL;DR: The video analysis system described in this paper aims at facial expression recognition consistent with the MPEG4 standardized facial animation parameters (FAP), using an improved active contour algorithm and a Hidden Markov Model classifier.
Abstract: The video analysis system described in this paper aims at facial expression recognition consistent with the MPEG4 standardized parameters for facial animation, FAP. For this reason, two levels of analysis are necessary: low-level analysis to extract the MPEG4 compliant parameters and high-level analysis to estimate the expression of the sequence using these low-level parameters. The low-level analysis is based on an improved active contour algorithm that uses high level information based on principal component analysis to locate the most significant contours of the face (eyebrows and mouth), and on motion estimation to track them. The high-level analysis takes as input the FAP produced by the low-level analysis tool and, by means of a Hidden Markov Model classifier, detects the expression of the sequence.

89 citations


Journal ArticleDOI
TL;DR: A novel approach to the problem of compressing the significant quantity of data required to represent integral 3D images is presented; the proposed algorithm is found to improve rate-distortion performance compared to baseline JPEG and a previously reported 3D-DCT compression scheme, with respect to compression ratio and subjective and objective image quality.
Abstract: Integral imaging is employed as part of a three-dimensional imaging system, allowing the display of full colour images with continuous parallax within a wide viewing zone. A significant quantity of data is required to represent a captured integral 3D image with high resolution. A lossy compression scheme has been developed based on the use of a 3D-DCT, which makes possible efficient storage and transmission of such images while maintaining all information necessary to produce a high quality 3D display. In this paper, a novel approach to the problem of compressing the significant quantity of data required to represent integral 3D images is presented. The algorithm is based on using a variable number of microlens images (or sub-images) in the computation of the 3D-DCT. It involves segmentation of the planar mean image formed by the mean values of the microlens images, and it takes advantage of the high cross-correlation between the sub-images generated by the microlens array. The algorithm has been simulated on several integral 3D images. It was found that the proposed algorithm improves the rate-distortion performance compared to baseline JPEG and a previously reported 3D-DCT compression scheme, with respect to compression ratio and subjective and objective image quality.
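
As a minimal illustration of the core transform step only (not the paper's full scheme with adaptive sub-image grouping, segmentation, quantization and entropy coding), the sketch below applies a 3D DCT to a stack of adjacent microlens sub-images using SciPy.

    import numpy as np
    from scipy.fft import dctn, idctn

    def transform_subimage_stack(subimages):
        block = np.asarray(subimages, dtype=float)    # (K, H, W): K adjacent microlens sub-images
        return dctn(block, norm="ortho")              # 3D DCT across the stack and spatial axes

    def inverse_transform(coeffs):
        return idctn(coeffs, norm="ortho")            # exact inverse of the forward transform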

Journal ArticleDOI
TL;DR: An open, flexible and modular immersive TV system that is backwards compatible to today's 2D digital television and that is able to support a wide range of different 2D and 3D displays is introduced.
Abstract: Depth perception in images and video has been a relevant research issue for years, with the main focus on the basic idea of “stereoscopic” viewing. However, it is well known from the literature that stereovision is only one of the relevant depth cues and that motion parallax, as well as the color, brightness and geometric appearance of video objects, are at least of the same importance, their individual influence depending mainly on the object distance. Thus, for depth perception it may sometimes be sufficient to watch pictures or movies on large screens with brilliant quality or to provide head-motion parallax viewing on conventional 2D displays. Based on this observation we introduce an open, flexible and modular immersive TV system that is backwards compatible with today's 2D digital television and that is able to support a wide range of different 2D and 3D displays. The system is based on a three-stage concept and aims to add more and more depth cues at each additional layer.

Journal ArticleDOI
TL;DR: Two optimization techniques of weighted vector median filters, parametrized by a set of N weights, are proposed for colour image processing and evaluated by simulations related to the denoising of textured, or natural, colour images, in the presence of impulsive noise.
Abstract: Weighted vector median filters (WVMF) are a powerful tool for the non-linear processing of multi-components signals. These filters are parametrized by a set of N weights and, in this paper, we propose two optimization techniques of these weights for colour image processing. The first one is an adaptive optimization of the N−1 peripheral weights of the filter mask. The major and more difficult task is to get a mathematical expression for the derivative of the WVMF output with respect to its weights; two approximations are proposed to measure this filter output sensitivity. The second optimization technique corresponds to a global optimization of the central weight alone, the value of which is determined, in a noise reduction context, by an analytical expression depending upon the mask size and the probability of occurrence of an impulsive noise. Both approaches are evaluated by simulations related to the denoising of textured, or natural, colour images, in the presence of impulsive noise. Furthermore, as they are complementary, they are also tested when used together.
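
For reference, a minimal sketch of the WVMF output for a single filtering window of colour vectors: the output is the input vector that minimizes the weighted sum of distances to all vectors in the window. The L2 norm is used here for brevity; other norms are common, and the weight optimization itself is the subject of the paper.

    import numpy as np

    def weighted_vector_median(window_pixels, weights):
        # window_pixels: (N, 3) colour vectors in the filter window, weights: (N,)
        diffs = window_pixels[:, None, :] - window_pixels[None, :, :]
        dists = np.linalg.norm(diffs, axis=2)         # pairwise L2 distances between vectors
        costs = dists @ weights                       # weighted aggregate distance of each candidate
        return window_pixels[np.argmin(costs)]        # candidate with minimum weighted cost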

Journal ArticleDOI
TL;DR: An analytical approach to pilot-based synchronization algorithms for data hiding in still images is presented; the Levenberg–Marquardt method is proposed for nonlinear least-squares estimation, and it is shown how an estimate of the geometrical transformation parameters can be obtained.
Abstract: In this paper we present an analytical approach to pilot-based synchronization algorithms for data hiding in still images. A representative algorithm belonging to the family of those exploiting a regular structure in the spreading sequence is chosen for study. We improve it by proposing the use of the Levenberg–Marquardt method for nonlinear least-squares estimation, and show how an estimate of the geometrical transformation parameters can be obtained. A statistical model for the estimation error in the parameters is derived and theoretically justified. This allows us to quantify the resolution of the algorithm for a certain watermark structure. Moreover, the increase in the bit error probability of the hidden information for a given transformation and interpolation scheme is theoretically analyzed and quantified. Finally, we provide experimental results that support our analysis.
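
The sketch below conveys the estimation step in a hedged form: given matched reference and observed pilot positions, a rotation-scale-translation model (an assumed parametrization, not necessarily the one used in the paper) is fitted by nonlinear least squares with SciPy's Levenberg–Marquardt solver.

    import numpy as np
    from scipy.optimize import least_squares

    def estimate_geometric_transform(ref_pts, obs_pts):
        # ref_pts, obs_pts: (N, 2) matched pilot positions, N >= 2
        def residuals(p):
            s, theta, tx, ty = p
            c, si = np.cos(theta), np.sin(theta)
            rot = np.array([[c, -si], [si, c]])
            mapped = s * ref_pts @ rot.T + np.array([tx, ty])
            return (mapped - obs_pts).ravel()
        fit = least_squares(residuals, x0=[1.0, 0.0, 0.0, 0.0], method="lm")
        return fit.x                                   # estimated scale, angle, tx, ty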

Journal ArticleDOI
TL;DR: A novel approach to detecting features automatically using a statistical analysis of facial information is described, concerned not only with the location of the features but also with the shape of local features.
Abstract: There are two main processes in creating a 3D animatable facial model from photographs. The first is to extract features such as eyes, nose, mouth and chin curves from the photographs. The second is to create a 3D individualized facial model using the extracted feature information. The final facial model is expected to have an individualized shape, photograph-realistic skin color, and animatable structures. Here, we describe our novel approach to detecting features automatically using a statistical analysis of facial information. We are interested not only in the location of the features but also in the shape of local features. How to create 3D models from the detected features is also explained, and several resulting 3D facial models are illustrated and discussed.

Journal ArticleDOI
TL;DR: A new psycho-visual test method is proposed that is based on natural scenery stimuli and operates in the wavelet domain; it is used to evaluate the performance of various masking models under conditions found in real image processing applications such as compression.
Abstract: Various image processing applications exploit a model of the human visual system (HVS). One element of HVS-models describes the masking-effect, which is typically parameterized by psycho-visual experiments that employ superimposed sinusoidal stimuli. Those stimuli are oversimplified with respect to real images and can capture only very elementary masking-effects. To overcome these limitations a new psycho-visual test method is proposed. It is based on natural scenery stimuli and operates in the wavelet domain. The collected psycho-visual data is finally used to evaluate the performance of various masking models under conditions as found in real image processing applications like compression.

Journal ArticleDOI
TL;DR: The proposed algorithm is efficient and effective in reducing ringing artifacts as well as blocking artifacts in the low bit-rate block-based video coding.
Abstract: In this paper, we present a method to reduce blocking and ringing artifacts in low bit-rate block-based video coding. For each block, its DC value and the DC values of the surrounding eight neighbor blocks are exploited to predict low-frequency AC coefficients, which allows the spatial characteristics of a block to be inferred before the quantization stage in the encoding system. These predicted AC coefficients are used to classify each block into one of two categories, low-activity or high-activity. In the following post-processing stage, two kinds of low pass filters are adaptively applied according to the classification result for each block. This allows strong low pass filtering in low-activity regions, where the blocking artifacts are most noticeable, and weak low pass filtering in high-activity regions to reduce ringing noise as well as blocking artifacts without introducing undesired blur. In the former case, the blocking artifacts are reduced by one-dimensional (1-D) horizontal and vertical low pass filters. In the latter case, both deblocking and deringing are handled by using either a 3-tap or a 2-tap filter, which keeps the architecture simple. The TMN8 decoder for H.263+ is used to test the proposed method. The experimental results are evaluated both subjectively and objectively, and show that the proposed algorithm is efficient and effective in reducing ringing artifacts as well as blocking artifacts in low bit-rate block-based video coding.
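
A toy sketch of the classification step: low-frequency AC terms are estimated from the DC values of the 3x3 block neighbourhood and the block is labelled low- or high-activity. The prediction constants and the threshold are placeholders, not the values used in the paper.

    import numpy as np

    def classify_block(dc_grid, threshold=32.0):
        # dc_grid: 3x3 array of DC values; the centre entry is the current block
        ac01 = (dc_grid[1, 0] - dc_grid[1, 2]) / 8.0   # rough horizontal low-frequency AC estimate
        ac10 = (dc_grid[0, 1] - dc_grid[2, 1]) / 8.0   # rough vertical low-frequency AC estimate
        activity = abs(ac01) + abs(ac10)
        return "low-activity" if activity < threshold else "high-activity"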

Journal ArticleDOI
TL;DR: This paper aims to identify potential improvements to compression performance through improved decorrelation; two adaptive prediction schemes are presented that aim to provide the highest possible decorrelation of the prediction error data.
Abstract: Lossless image compression is often performed through decorrelation, context modelling and entropy coding of the prediction error. This paper aims to identify the potential improvements to compression performance through improved decorrelation. Two adaptive prediction schemes are presented that aim to provide the highest possible decorrelation of the prediction error data. Consequently, complexity is overlooked and a high degree of adaptivity is sought. The adaptation of the respective predictor coefficients is based on training of the predictors in a local causal area adjacent to the pixel to be predicted. The causal nature of the training means no transmission overhead is required and also enables lossless coding of the images. The first scheme is an adaptive neural network, trained on the actual data being coded enabling continuous updates of the network weights. This results in a highly adaptive predictor, with localised optimisation based on stochastic gradient learning. Training for the second scheme is based on the recursive LMS (RLMS) algorithm incorporating feedback of the prediction error. In addition to the adaptive prediction, the results presented here also incorporate an arithmetic coding scheme, producing results which are better than CALIC.
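
As a much simplified illustration of adaptive causal prediction (a plain LMS linear predictor, not the neural network or RLMS predictors described above), the sketch below updates a four-coefficient predictor from the prediction error at each pixel and returns the prediction-error image.

    import numpy as np

    def lms_prediction_errors(img, mu=1e-6):
        # Causal neighbours used for prediction: W, NW, N, NE
        img = img.astype(float)
        h, w = img.shape
        weights = np.full(4, 0.25)
        errors = np.zeros_like(img)                    # borders are left unpredicted for brevity
        for y in range(1, h):
            for x in range(1, w - 1):
                context = np.array([img[y, x - 1], img[y - 1, x - 1],
                                    img[y - 1, x], img[y - 1, x + 1]])
                pred = weights @ context
                err = img[y, x] - pred
                errors[y, x] = err
                weights += mu * err * context          # stochastic-gradient (LMS) weight update
        return errors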

Journal ArticleDOI
TL;DR: A technique for embedded coding of large images using zero trees with reduced memory requirements is devised, and a strip-based non-embedded coding scheme which uses a single-pass algorithm is described to handle high input data rates.
Abstract: Thanks to advances in sensor technology, today we have many applications (space-borne imaging, medical imaging, etc.) where images of large sizes are generated. Straightforward application of wavelet techniques to such images involves certain difficulties. Embedded coders such as EZW and SPIHT require that the wavelet transform of the full image be buffered for coding. Since the transform coefficients also need to be stored at high precision, buffering requirements for large images become prohibitively high. In this paper, we first devise a technique for embedded coding of large images using zero trees with reduced memory requirements. A 'strip buffer' capable of holding a few lines of wavelet coefficients from all the subbands belonging to the same spatial location is employed. A pipeline architecture for a line-based implementation of the above technique is then proposed. Further, an efficient algorithm to extract an encoded bitstream corresponding to a region of interest in the image has also been developed. Finally, the paper describes a strip-based non-embedded coding scheme which uses a single-pass algorithm to handle high input data rates.

Journal ArticleDOI
TL;DR: A novel procedure for fingerprint enhancement filter design is described that adapts the enhancement process to the input image characteristics to improve its efficiency, and that quantifies and justifies the functional relationship between image features and filter parameters.
Abstract: A novel procedure for fingerprint enhancement filter design is described. Fingerprints are widely used as unique and invariant identifiers of individuals. Identification of fingerprint images is based on matching the features obtained from a query image against those stored in a database. Poor quality fingerprint images cause serious problems for the performance of the subsequent matching process. The main contribution of this work is to quantify and justify the functional relationship between image features and filter parameters. In this work, the enhancement process is adapted to the input image characteristics to improve its efficiency. Experimental results show the superiority of the proposed enhancement algorithm compared to the best fingerprint enhancement procedures reported in the literature.

Journal ArticleDOI
TL;DR: The proposed algorithm allows one to impose desirable properties from each type of image coder, such as progressive transmission, the zerotree structure, and range-domain block decoding, as well as improving compression performance.
Abstract: In this paper, a hybrid fractal zerotree wavelet (FZW) image coding algorithm is proposed. The algorithm couples a zerotree-based encoder, such as the embedded zerotree wavelet (EZW) coder or set partitioning in hierarchical trees, and a fractal image coder; this coupling is done in the wavelet domain. Based on perceptually-weighted distortion-rate calculations, a fractal method is adaptively applied to the parts of an image that can be encoded more efficiently relative to an EZW coder at a given rate. In addition to improving compression performance, the proposed algorithm also allows one to impose desirable properties from each type of image coder, such as progressive transmission, the zerotree structure, and range-domain block decoding.

Journal ArticleDOI
TL;DR: A spline model for different orders is introduced, both for orthogonal and hexagonal lattices, and an expression for a least-squares approximation is derived which can be applied to convolution-based resampling.
Abstract: Resampling is a common operation in digital image processing systems. The standard procedure involves the (conceptual) reconstruction of a continuous image succeeded by sampling on the new lattice sites. When the reconstruction is done by classical interpolation functions, results might be sub-optimal because the information loss is not minimized. In the particular case of subsampling (i.e., resampling to a coarser lattice), aliasing artifacts might arise and produce disturbing moire patterns. This paper first introduces a spline model for different orders, both for orthogonal and hexagonal lattices. Next, an expression for a least-squares approximation is derived which can be applied to convolution-based resampling. Experimental results for a printing application demonstrate the feasibility of the proposed method and are compared against the standard approach. Our technique can be applied to general least-squares resampling between regular lattices.

Journal ArticleDOI
TL;DR: This paper describes the binary format, metadata architecture, and colorspace encoding architecture of the JPEG 2000 file format and shows how this format can be used as the basis for more advanced applications, such as the upcoming motion JPEG 2000 standard.
Abstract: While there exist many different image file formats, the JPEG committee concluded that none of those formats addressed a majority of the needs of tomorrow's complicated imaging applications. Many formats do not provide sufficient flexibility for the intelligent storage and maintenance of metadata. Others are very restrictive in terms of colorspace specification. Others provide flexibility, but with a very high cost because of complexity. The JPEG 2000 file format addresses these concerns by combining a simple binary container with flexible metadata architecture and a useful yet simple mechanism for encoding the colorspace of an image. This paper describes the binary format, metadata architecture, and colorspace encoding architecture of the JPEG 2000 file format. It also shows how this format can be used as the basis for more advanced applications, such as the upcoming motion JPEG 2000 standard.

Journal ArticleDOI
TL;DR: This paper presents a novel method in facial modeling and animation based on non-uniform rational B-spline curves used to model and animate human facial expressions.
Abstract: This paper presents a novel method in facial modeling and animation. Non-uniform rational B-spline (NURBS) curves are used to model and animate human facial expressions. Based on facial anatomy, control polygons are positioned, and sample points of the NURBS curves are associated with the facial mesh geometrically. Features of facial muscles are simulated based on fuzzy sets. Facial muscle movements can be modeled by changing the weights or moving the control points of the NURBS curves. The method is significantly different from the traditional parameterization methods in facial animation.
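
For background, a small sketch of evaluating a NURBS curve from control points, weights and a knot vector with SciPy: the rational form divides a weighted B-spline of the control points by a B-spline of the weights. The animation machinery described above (muscle features, fuzzy sets, mesh binding) is not modelled here.

    import numpy as np
    from scipy.interpolate import BSpline

    def eval_nurbs(control_pts, weights, knots, degree, ts):
        # control_pts: (N, d), weights: (N,), knots: (N + degree + 1,), ts: parameter values
        numerator = BSpline(knots, control_pts * weights[:, None], degree)(ts)
        denominator = BSpline(knots, weights, degree)(ts)
        return numerator / denominator[:, None]        # NURBS curve points, shape (len(ts), d)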

Journal ArticleDOI
TL;DR: The paper describes the principle of the image decomposition, the possibilities for recursive calculation, and its basic characteristics and modifications; modelling results demonstrate its potential in image coding systems.
Abstract: This paper presents a method for pyramidal image decomposition called “inverse” because of the order followed to obtain the pyramid levels: from top to bottom, in correspondence with the requirement for “progressive” image transmission. The pyramid top (level zero) is obtained by selecting the low-frequency coefficients of the discrete cosine transform of the image. The following pyramid levels are made up of low-frequency discrete cosine transform (DCT) coefficients of the subimages obtained by quadtree division at each level. The quadtree root coincides with the pyramid top. The first level is the difference between the image and its approximation obtained by inverse DCT. The following (second) level is again a difference, between the previous (first) level and its approximation obtained with the inverse DCT of every subimage in the first level, and so on. The paper describes the principle of the image decomposition, the possibilities for recursive calculation, and its basic characteristics and modifications. The block diagram and the generalised scheme of the decomposition are given, and modelling results demonstrate its potential in image coding systems.
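
A compact sketch of the pyramid top, assuming the retained low-frequency block is simply the upper-left corner of the 2D DCT (the block size here is illustrative): level zero is the low-frequency approximation and level one is the difference between the image and that approximation. The quadtree recursion over subimages is omitted.

    import numpy as np
    from scipy.fft import dctn, idctn

    def inverse_pyramid_top(image, keep=8):
        image = np.asarray(image, dtype=float)
        coeffs = dctn(image, norm="ortho")
        low = np.zeros_like(coeffs)
        low[:keep, :keep] = coeffs[:keep, :keep]       # retained low-frequency DCT coefficients (level zero)
        approx = idctn(low, norm="ortho")              # level-zero approximation of the image
        residual = image - approx                      # level-one difference image
        return approx, residual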

Journal ArticleDOI
TL;DR: This paper addresses the problem of error-correction for extremely noisy channels (BER from 0.1 to 0.5) and presents a coding scheme concatenating a repetition code with another one, and design rules in order to select these codes for a given watermarking application are developed.
Abstract: This paper addresses the problem of error-correction for extremely noisy channels (BER from 0.1 to 0.5), such as those obtained for image or video watermarking. Minimum distance arguments are used to identify a region for which no other code is as efficient as repetition codes, whatever the rate, at least when bounded decoding is considered. However, in order to obtain a reasonable and sufficiently low BER, repetition codes are not very efficient. We present a coding scheme concatenating a repetition code with another one, and develop design rules to select these codes for a given watermarking application. The repetition code lowers the huge channel BER, as no other code can do this part of the job. Then, the second, more powerful code working at a lower BER achieves a larger BER reduction. In this paper, this role is devoted to BCH codes, as members of a classical family. Thanks to their moderate decoding complexity, they turn out to be an interesting cost versus performance trade-off, while more efficient coding schemes based on soft decoding are far more complex. However, we also provide an idea of the solutions to consider for watermarking applications with fewer complexity limitations, for which more powerful decoding techniques can be implemented.
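
A minimal sketch of the inner repetition code: each (already BCH-encoded) bit is repeated r times before embedding, and the decoder takes a majority vote per group to pull the very high channel BER down to a level the outer code can correct; r is typically chosen odd so that votes cannot tie.

    import numpy as np

    def repetition_encode(bits, r):
        return np.repeat(np.asarray(bits), r)          # repeat every bit r times

    def repetition_decode(channel_bits, r):
        groups = np.asarray(channel_bits).reshape(-1, r)
        return (groups.sum(axis=1) > r / 2).astype(int)   # majority vote per group of r bits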

Journal ArticleDOI
TL;DR: This work proposes a novel face detection and semantic human object generation algorithm, which combines skin color segmentation and facial filters with a contour-based temporal tracking procedure to provide variant semantic video objects by using the same function.
Abstract: Automatic semantic video object extraction is an important step in providing content-based video coding, indexing and retrieval. However, it is very difficult to design a generic semantic video object extraction technique which can provide different semantic video objects using the same function. Since the presence or absence of persons in an image sequence provides important clues about video content, automatic face detection and human object generation are very attractive for content-based video database applications. For this reason, we propose a novel face detection and semantic human object generation algorithm. The homogeneous image regions with accurate boundaries are first obtained by integrating the results of color edge detection and region growing procedures. The human faces are detected from these homogeneous image regions by using skin color segmentation and facial filters. These detected faces are then used as object seeds for semantic human object generation. The correspondences of the detected faces and semantic human objects along the time axis are further exploited by a contour-based temporal tracking procedure.
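
As a rough illustration of the skin-colour segmentation step, the sketch below thresholds the chrominance channels of a YCbCr image with commonly used bounds; these bounds are an assumption for illustration, not the values used by the authors.

    import numpy as np

    def skin_mask_ycbcr(ycbcr_image):
        # ycbcr_image: (H, W, 3) with 8-bit Cb and Cr channels
        cb = ycbcr_image[..., 1]
        cr = ycbcr_image[..., 2]
        return (cb >= 77) & (cb <= 127) & (cr >= 133) & (cr <= 173)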

Journal ArticleDOI
Guang Deng
TL;DR: This paper is concerned with adaptive prediction for lossless image coding; a new predictor is proposed that constructs a good predictor for each pixel using the transform domain LMS algorithm and adaptively combines it with a set of fixed predictors.
Abstract: This paper is concerned with adaptive prediction for lossless image coding. A new predictor is proposed. This predictor involves two major steps: constructing a good predictor for each pixel using the transform domain LMS algorithm, and adaptively combining it with a set of fixed predictors. The first step targets areas where simple predictors do not perform well, while the second step is an effective method to reduce the modelling costs associated with the uncertainty of the models. When a context-based arithmetic encoder is used to encode the prediction error, the compression performance of the proposed algorithm is better than or comparable to that of other published algorithms.

Journal ArticleDOI
TL;DR: New adaptive H.263+ rate control algorithms are proposed for video streaming and interactive video applications over networks supporting bandwidth renegotiation; they communicate with end-users to accommodate time-varying bandwidth requests during data transmission, so the overall network utilization can be improved.
Abstract: This paper presents new adaptive H.263+ rate control algorithms for video streaming and interactive video applications over networks supporting bandwidth renegotiation, which can communicate with end-users to accommodate their time-varying bandwidth requests during data transmission. That is, the requests of end-users can be supported adaptively according to the availability of the network resources, and thus the overall network utilization can be improved simultaneously. The algorithms are especially suitable for the transmission of non-stationary video traffic. The proposed rate control algorithms communicate with the network to renegotiate the required bandwidth for the underlying video, which is measured based on motion change information, and choose their control strategies according to the renegotiation results. Unlike most conventional algorithms that control only the spatial quality by adjusting quantization parameters, the proposed algorithms treat both the spatial and temporal qualities at the same time to enhance human visual perceptual quality. Experimental results are provided to demonstrate that the proposed rate control algorithms can achieve superior performance to conventional ones with low computational complexity under networks supporting bandwidth renegotiation.

Journal ArticleDOI
TL;DR: An interactive authoring system is proposed for semi-automatic video object (VO) segmentation and annotation, which features a new contour interpolation algorithm, which enables the user to define the contour of a VO on multiple frames while the computer interpolates the missing contours of this object on every frame automatically.
Abstract: An interactive authoring system is proposed for semi-automatic video object (VO) segmentation and annotation. This system features a new contour interpolation algorithm, which enables the user to define the contour of a VO on multiple frames while the computer interpolates the missing contours of this object on every frame automatically. A typical active contour (snake) model is adapted, and the contour interpolation problem is decomposed into a two-directional contour tracking problem and a merging problem. In addition, new user interaction models are designed for the user to interact with the computer. Experiments indicate that this system offers a good balance between algorithm complexity and user interaction efficiency.