
Showing papers in "IEEE Transactions on Image Processing in 2004"


Journal ArticleDOI
TL;DR: In this article, a structural similarity index is proposed for image quality assessment based on the degradation of structural information; its performance is compared with subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000.
Abstract: Objective methods for assessing perceptual image quality traditionally attempted to quantify the visibility of errors (differences) between a distorted image and a reference image using a variety of known properties of the human visual system. Under the assumption that human visual perception is highly adapted for extracting structural information from a scene, we introduce an alternative complementary framework for quality assessment based on the degradation of structural information. As a specific example of this concept, we develop a structural similarity index and demonstrate its promise through a set of intuitive examples, as well as comparison to both subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000. A MATLAB implementation of the proposed algorithm is available online at http://www.cns.nyu.edu/~lcv/ssim/.

40,609 citations
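To make the structural comparison above concrete, here is a minimal NumPy sketch of the SSIM statistic computed globally over two grayscale images. It is not the authors' reference implementation, which applies the same statistic within a local sliding window and averages the resulting quality map; the constants follow the commonly used defaults K1 = 0.01, K2 = 0.03 with a dynamic range of 255 (assumptions of this sketch).

```python
import numpy as np

def ssim_global(x, y, data_range=255.0, k1=0.01, k2=0.03):
    """Global SSIM between two grayscale images (no sliding window)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1 = (k1 * data_range) ** 2   # stabilizes the luminance term
    c2 = (k2 * data_range) ** 2   # stabilizes the contrast/structure term

    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()

    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return num / den
```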


Journal ArticleDOI
TL;DR: The simultaneous propagation of texture and structure information is achieved by a single, efficient algorithm that combines the advantages of exemplar-based texture synthesis and inpainting, with computational efficiency provided by a block-based sampling process.
Abstract: A new algorithm is proposed for removing large objects from digital images. The challenge is to fill in the hole that is left behind in a visually plausible way. In the past, this problem has been addressed by two classes of algorithms: 1) "texture synthesis" algorithms for generating large image regions from sample textures and 2) "inpainting" techniques for filling in small image gaps. The former has been demonstrated for "textures" (repeating two-dimensional patterns with some stochasticity); the latter focus on linear "structures" which can be thought of as one-dimensional patterns, such as lines and object contours. This paper presents a novel and efficient algorithm that combines the advantages of these two approaches. We first note that exemplar-based texture synthesis contains the essential process required to replicate both texture and structure; the success of structure propagation, however, is highly dependent on the order in which the filling proceeds. We propose a best-first algorithm in which the confidence in the synthesized pixel values is propagated in a manner similar to the propagation of information in inpainting. The actual color values are computed using exemplar-based synthesis. In this paper, the simultaneous propagation of texture and structure information is achieved by a single, efficient algorithm. Computational efficiency is achieved by a block-based sampling process. A number of examples on real and synthetic images demonstrate the effectiveness of our algorithm in removing large occluding objects, as well as thin scratches. Robustness with respect to the shape of the manually selected target region is also demonstrated. Our results compare favorably to those obtained by existing techniques.

3,066 citations
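The best-first filling order described above can be sketched as follows. This is a hedged illustration rather than the paper's code: it assumes a grayscale image, a boolean mask of missing pixels, a per-pixel confidence map, a patch radius r, and a normalization constant of 255, and it shows only the priority computation, not the exemplar search or the copy step.

```python
import numpy as np

def patch_priorities(img, missing, confidence, r=4, alpha=255.0):
    """Priority P(p) = C(p) * D(p) for pixels on the fill front.

    img        : 2-D grayscale image (float array)
    missing    : boolean mask, True where pixels must be synthesized
    confidence : per-pixel confidence, e.g. 1 outside the hole, 0 inside
    """
    gy, gx = np.gradient(np.where(missing, 0.0, img))  # image gradient, hole zeroed
    my, mx = np.gradient(missing.astype(float))        # mask gradient ~ front normal

    # Fill front: missing pixels with at least one known 4-neighbor.
    known = ~missing
    touches_known = np.zeros_like(known)
    touches_known[1:, :] |= known[:-1, :]
    touches_known[:-1, :] |= known[1:, :]
    touches_known[:, 1:] |= known[:, :-1]
    touches_known[:, :-1] |= known[:, 1:]
    front = missing & touches_known

    priorities = {}
    for py, px in zip(*np.nonzero(front)):
        y0, y1 = max(py - r, 0), py + r + 1
        x0, x1 = max(px - r, 0), px + r + 1
        patch_known = known[y0:y1, x0:x1]
        # Confidence term: known confidence inside the patch / patch area.
        c_term = confidence[y0:y1, x0:x1][patch_known].sum() / patch_known.size
        # Data term: isophote (gradient rotated 90 degrees) dotted with the front normal.
        isophote = np.array([-gx[py, px], gy[py, px]])
        normal = np.array([my[py, px], mx[py, px]])
        norm = np.linalg.norm(normal)
        if norm > 0:
            normal = normal / norm
        d_term = abs(float(isophote @ normal)) / alpha
        priorities[(py, px)] = c_term * d_term
    return priorities
```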


Journal ArticleDOI
TL;DR: This paper proposes an alternate approach using L1 norm minimization and robust regularization based on a bilateral prior to deal with different data and noise models and demonstrates its superiority to other super-resolution methods.
Abstract: Super-resolution reconstruction produces one or a set of high-resolution images from a set of low-resolution images. In the last two decades, a variety of super-resolution methods have been proposed. These methods are usually very sensitive to their assumed model of data and noise, which limits their utility. This paper reviews some of these methods and addresses their shortcomings. We propose an alternate approach using L1 norm minimization and robust regularization based on a bilateral prior to deal with different data and noise models. This computationally inexpensive method is robust to errors in motion and blur estimation and results in images with sharp edges. Simulation results confirm the effectiveness of our method and demonstrate its superiority to other super-resolution methods.

2,175 citations
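Schematically, and with notation assumed here rather than quoted from the paper, the estimator combines an L1 data-fidelity term with a bilateral (shift-based) regularizer:

$$\hat{X} = \arg\min_{X}\left[\sum_{k}\left\lVert D_k H_k F_k X - Y_k \right\rVert_1 \;+\; \lambda \sum_{l=-P}^{P}\sum_{m=-P}^{P} \alpha^{|l|+|m|}\left\lVert X - S_x^{l} S_y^{m} X \right\rVert_1\right]$$

where Y_k are the observed low-resolution frames, F_k, H_k, and D_k model motion, blur, and decimation, S_x^l and S_y^m are horizontal and vertical shift operators, and 0 < α < 1 down-weights larger shifts.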


Journal ArticleDOI
TL;DR: Results indicate that the spatial, quad-based algorithm developed for color images allows for hiding the largest payload at the highest signal-to-noise ratio.
Abstract: A reversible watermarking algorithm with very high data-hiding capacity has been developed for color images. The algorithm allows the watermarking process to be reversed, which restores the exact original image. The algorithm hides several bits in the difference expansion of vectors of adjacent pixels. The required general reversible integer transform and the necessary conditions to avoid underflow and overflow are derived for any vector of arbitrary length. Also, the potential payload size that can be embedded into a host image is discussed, and a feedback system for controlling this size is developed. In addition, to maximize the amount of data that can be hidden into an image, the embedding algorithm can be applied recursively across the color components. Simulation results using spatial triplets, spatial quads, cross-color triplets, and cross-color quads are presented and compared with the existing reversible watermarking algorithms. These results indicate that the spatial, quad-based algorithm allows for hiding the largest payload at the highest signal-to-noise ratio.

1,149 citations
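To illustrate the reversible mechanism, here is a minimal sketch of difference expansion for a single pixel pair, i.e., the simplest two-pixel case of the generalized integer transform; the variable names are ours, and the underflow/overflow checks discussed in the paper are omitted.

```python
def de_embed(x, y, bit):
    """Embed one bit into the pixel pair (x, y) by expanding their difference."""
    l = (x + y) // 2          # integer average, recoverable after embedding
    h = x - y                 # difference
    h2 = 2 * h + bit          # expanded difference carrying the payload bit
    return l + (h2 + 1) // 2, l - h2 // 2

def de_extract(x2, y2):
    """Recover the embedded bit and the original pair exactly."""
    l = (x2 + y2) // 2
    h2 = x2 - y2
    bit = h2 & 1
    h = (h2 - bit) // 2
    return bit, l + (h + 1) // 2, l - h // 2
```

For example, de_embed(3, 1, 1) gives (5, 0), and de_extract(5, 0) returns (1, 3, 1), restoring both the bit and the original pair exactly.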


Journal ArticleDOI
TL;DR: Quantitative evaluation and comparison show that the proposed Bayesian framework for foreground object detection in complex environments provides much improved results.
Abstract: This paper addresses the problem of background modeling for foreground object detection in complex environments. A Bayesian framework that incorporates spectral, spatial, and temporal features to characterize the background appearance is proposed. Under this framework, the background is represented by the most significant and frequent features, i.e., the principal features, at each pixel. A Bayes decision rule is derived for background and foreground classification based on the statistics of principal features. Principal feature representation for both the static and dynamic background pixels is investigated. A novel learning method is proposed to adapt to both gradual and sudden "once-off" background changes. The convergence of the learning process is analyzed and a formula to select a proper learning rate is derived. Under the proposed framework, a novel algorithm for detecting foreground objects from complex environments is then established. It consists of change detection, change classification, foreground segmentation, and background maintenance. Experiments were conducted on image sequences containing targets of interest in a variety of environments, e.g., offices, public buildings, subway stations, campuses, parking lots, airports, and sidewalks. Good results of foreground detection were obtained. Quantitative evaluation and comparison with the existing method show that the proposed method provides much improved results.

1,120 citations
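In generic form (our notation, not the paper's), the classification step reduces to comparing posteriors: a pixel with feature vector v is labeled background b rather than foreground f when

$$P(b \mid v) > P(f \mid v) \;\Longleftrightarrow\; 2\,P(v \mid b)\,P(b) > P(v),$$

using P(v) = P(v|b)P(b) + P(v|f)P(f); the paper's contribution lies in how P(v|b) and P(b) are learned and maintained from the principal features at each pixel.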


Journal ArticleDOI
TL;DR: The basic idea is that local sharp variation points, denoting the appearing or vanishing of an important image structure, are utilized to represent the characteristics of the iris.
Abstract: Unlike other biometrics such as fingerprints and face, the distinct aspect of iris comes from randomly distributed features. This leads to its high reliability for personal identification, and at the same time, the difficulty in effectively representing such details in an image. This paper describes an efficient algorithm for iris recognition by characterizing key local variations. The basic idea is that local sharp variation points, denoting the appearing or vanishing of an important image structure, are utilized to represent the characteristics of the iris. The whole procedure of feature extraction includes two steps: 1) a set of one-dimensional intensity signals is constructed to effectively characterize the most important information of the original two-dimensional image; 2) using a particular class of wavelets, a position sequence of local sharp variation points in such signals is recorded as features. We also present a fast matching scheme based on exclusive OR operation to compute the similarity between a pair of position sequences. Experimental results on 2 255 iris images show that the performance of the proposed method is encouraging and comparable to the best iris recognition algorithm found in the current literature.

999 citations
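The exclusive-OR matching can be illustrated with a short sketch. The details here are assumptions rather than the paper's exact scheme: the position sequences are taken to be already expanded into fixed-length binary codes, similarity is one minus the normalized Hamming distance, and any rotation compensation is omitted.

```python
import numpy as np

def xor_similarity(code_a, code_b):
    """Similarity between two equal-length binary feature codes.

    Computed as 1 - normalized Hamming distance via exclusive OR:
    identical codes score 1.0, complementary codes score 0.0.
    """
    a = np.asarray(code_a, dtype=bool)
    b = np.asarray(code_b, dtype=bool)
    if a.shape != b.shape:
        raise ValueError("codes must have the same length")
    return 1.0 - np.count_nonzero(a ^ b) / a.size
```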


Journal ArticleDOI
TL;DR: The results suggest a general-purpose usefulness of the algorithm in improving compression ratios of unconstrained video; regions of interest are selected by a nonlinear integration of low-level visual cues, mimicking processing in primate occipital and posterior parietal cortex.
Abstract: We evaluate the applicability of a biologically-motivated algorithm to select visually-salient regions of interest in video streams for multiply-foveated video compression. Regions are selected based on a nonlinear integration of low-level visual cues, mimicking processing in primate occipital and posterior parietal cortex. A dynamic foveation filter then blurs every frame, increasingly with distance from salient locations. Sixty-three variants of the algorithm (varying number and shape of virtual foveas, maximum blur, and saliency competition) are evaluated against an outdoor video scene, using MPEG-1 and constant-quality MPEG-4 (DivX) encoding. Additional compression ratios of 1.1 to 8.5 are achieved by foveation. Two variants of the algorithm are validated against eye fixations recorded from four to six human observers on a heterogeneous collection of 50 video clips (over 45 000 frames in total). Significantly higher overlap than expected by chance is found between human and algorithmic foveations. With both variants, foveated clips are, on average, approximately half the size of unfoveated clips, for both MPEG-1 and MPEG-4. These results suggest a general-purpose usefulness of the algorithm in improving compression ratios of unconstrained video.

796 citations


Journal ArticleDOI
TL;DR: This work proposes an approach that incorporates appearance-adaptive models in a particle filter to realize robust visual tracking and recognition algorithms and demonstrates the effectiveness and robustness of the tracking algorithm.
Abstract: We present an approach that incorporates appearance-adaptive models in a particle filter to realize robust visual tracking and recognition algorithms. Tracking requires modeling interframe motion and appearance changes, whereas recognition requires modeling appearance changes between frames and gallery images. In conventional tracking algorithms, the appearance model is either fixed or rapidly changing, and the motion model is simply a random walk with fixed noise variance. Also, the number of particles is typically fixed. All these factors make the visual tracker unstable. To stabilize the tracker, we propose the following modifications: an observation model arising from an adaptive appearance model, an adaptive velocity motion model with adaptive noise variance, and an adaptive number of particles. The adaptive-velocity model is derived using a first-order linear predictor based on the appearance difference between the incoming observation and the previous particle configuration. Occlusion analysis is implemented using robust statistics. Experimental results on tracking visual objects in long outdoor and indoor video sequences demonstrate the effectiveness and robustness of our tracking algorithm. We then perform simultaneous tracking and recognition by embedding them in a particle filter. For recognition purposes, we model the appearance changes between frames and gallery images by constructing the intra- and extrapersonal spaces. Accurate recognition is achieved when confronted by pose and view variations.

742 citations


Journal ArticleDOI
TL;DR: A view-based approach to recognize humans from their gait by employing a hidden Markov model (HMM) and the statistical nature of the HMM lends overall robustness to representation and recognition.
Abstract: We propose a view-based approach to recognize humans from their gait. Two different image features have been considered: the width of the outer contour of the binarized silhouette of the walking person and the entire binary silhouette itself. To obtain the observation vector from the image features, we employ two different methods. In the first method, referred to as the indirect approach, the high-dimensional image feature is transformed to a lower dimensional space by generating what we call the frame-to-exemplar distance (FED). The FED vector captures both structural and dynamic traits of each individual. For compact and effective gait representation and recognition, the gait information in the FED vector sequences is captured in a hidden Markov model (HMM). In the second method, referred to as the direct approach, we work with the feature vector directly (as opposed to computing the FED) and train an HMM. We estimate the HMM parameters (specifically the observation probability B) based on the distance between the exemplars and the image features. In this way, we avoid learning high-dimensional probability density functions. The statistical nature of the HMM lends overall robustness to representation and recognition. The performance of the methods is illustrated using several databases.

579 citations


Journal ArticleDOI
TL;DR: A novel approach to multiresolution signal-level image fusion is presented for accurately transferring visual information from any number of input image signals, into a single fused image without loss of information or the introduction of distortion.
Abstract: A novel approach to multiresolution signal-level image fusion is presented for accurately transferring visual information from any number of input image signals, into a single fused image without loss of information or the introduction of distortion. The proposed system uses a "fuse-then-decompose" technique realized through a novel, fusion/decomposition system architecture. In particular, information fusion is performed on a multiresolution gradient map representation domain of image signal information. At each resolution, input images are represented as gradient maps and combined to produce new, fused gradient maps. Fused gradient map signals are processed, using gradient filters derived from high-pass quadrature mirror filters to yield a fused multiresolution pyramid representation. The fused output image is obtained by applying, on the fused pyramid, a reconstruction process that is analogous to that of conventional discrete wavelet transform. This new gradient fusion significantly reduces the amount of distortion artefacts and the loss of contrast information usually observed in fused images obtained from conventional multiresolution fusion schemes. This is because fusion in the gradient map domain significantly improves the reliability of the feature selection and information fusion processes. Fusion performance is evaluated through informal visual inspection and subjective psychometric preference tests, as well as objective fusion performance measurements. Results clearly demonstrate the superiority of this new approach when compared to conventional fusion systems.

536 citations


Journal ArticleDOI
TL;DR: The theoretical optimal shift that maximizes the quality of the authors' shifted linear interpolation is nonzero and close to 1/5, and the resulting quality is similar to that of the computationally more costly "high-quality" cubic convolution.
Abstract: We present a simple, original method to improve piecewise-linear interpolation with uniform knots: we shift the sampling knots by a fixed amount, while enforcing the interpolation property. We determine the theoretical optimal shift that maximizes the quality of our shifted linear interpolation. Surprisingly enough, this optimal value is nonzero and close to 1/5. We confirm our theoretical findings by performing several experiments: a cumulative rotation experiment and a zoom experiment. Both show a significant increase of the quality of the shifted method with respect to the standard one. We also observe that, in these results, we get a quality that is similar to that of the computationally more costly "high-quality" cubic convolution.
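A minimal one-dimensional sketch of the idea is given below. It follows from the interpolation condition for the shifted hat basis, (1 - tau) c[n] + tau c[n-1] = f[n]; the crude boundary initialization is an assumption of this sketch (the paper treats boundaries more carefully), and tau is set near the reported optimum of roughly 1/5.

```python
import numpy as np

def shifted_linear_interp(f, x, tau=0.21):
    """Interpolate samples f[0..N-1] at real-valued positions x.

    Step 1: solve (1 - tau) * c[n] + tau * c[n - 1] = f[n] with a causal
            first-order recursion so the shifted hat functions interpolate f.
    Step 2: evaluate sum_k c[k] * hat(x - tau - k), i.e. ordinary linear
            interpolation of the coefficients at the shifted position x - tau.
    """
    f = np.asarray(f, dtype=float)
    c = np.empty_like(f)
    c[0] = f[0]                              # crude initialization (assumption)
    for n in range(1, len(f)):
        c[n] = (f[n] - tau * c[n - 1]) / (1.0 - tau)

    u = np.asarray(x, dtype=float) - tau     # shifted evaluation positions
    i = np.clip(np.floor(u).astype(int), 0, len(f) - 2)
    frac = u - i
    return (1.0 - frac) * c[i] + frac * c[i + 1]
```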


Journal ArticleDOI
TL;DR: This paper proposes a wavelet-tree-based blind watermarking scheme for copyright protection that embeds each watermark bit in perceptually important frequency bands, which renders the mark more resistant to frequency-based attacks.
Abstract: This paper proposes a wavelet-tree-based blind watermarking scheme for copyright protection. The wavelet coefficients of the host image are grouped into so-called super trees. The watermark is embedded by quantizing super trees. The trees are quantized so that they exhibit a sufficiently large statistical difference, which will later be used for watermark extraction. Each watermark bit is embedded in perceptually important frequency bands, which renders the mark more resistant to frequency-based attacks. Also, the watermark is spread throughout large spatial regions. This yields more robustness against time-domain geometric attacks. Examples of various attacks will be given to demonstrate the robustness of the proposed technique.

Journal ArticleDOI
TL;DR: This paper proposes a local intensity normalization method to effectively handle lighting variations, followed by a Gabor transform to obtain local features, and finally a linear discriminant analysis (LDA) method for feature selection.
Abstract: In this paper, we present an approach to automatic detection and recognition of signs from natural scenes, and its application to a sign translation task. The proposed approach embeds multiresolution and multiscale edge detection, adaptive searching, color analysis, and affine rectification in a hierarchical framework for sign detection, with different emphases at each phase to handle the text in different sizes, orientations, color distributions and backgrounds. We use affine rectification to recover deformation of the text regions caused by an inappropriate camera view angle. The procedure can significantly improve text detection rate and optical character recognition (OCR) accuracy. Instead of using binary information for OCR, we extract features from an intensity image directly. We propose a local intensity normalization method to effectively handle lighting variations, followed by a Gabor transform to obtain local features, and finally a linear discriminant analysis (LDA) method for feature selection. We have applied the approach in developing a Chinese sign translation system, which can automatically detect and recognize Chinese signs as input from a camera, and translate the recognized text into English.

Journal ArticleDOI
TL;DR: A novel maximum a posteriori estimator for enhancing the spatial resolution of an image using co-registered high spatial-resolution imagery from an auxiliary sensor, focusing on the use of high-resolution panchromatic data to enhance hyperspectral imagery.
Abstract: This paper presents a novel maximum a posteriori estimator for enhancing the spatial resolution of an image using co-registered high spatial-resolution imagery from an auxiliary sensor. Here, we focus on the use of high-resolution panchromatic data to enhance hyperspectral imagery. However, the estimation framework developed allows for any number of spectral bands in the primary and auxiliary image. The proposed technique is suitable for applications where some correlation, either localized or global, exists between the auxiliary image and the image being enhanced. To exploit localized correlations, a spatially varying statistical model, based on vector quantization, is used. Another important aspect of the proposed algorithm is that it allows for the use of an accurate observation model relating the "true" scene with the low-resolution observations. Experimental results with hyperspectral data derived from the Airborne Visible/Infrared Imaging Spectrometer are presented to demonstrate the efficacy of the proposed estimator.

Journal ArticleDOI
TL;DR: A new watermarking system based on the principles of informed coding and informed embedding that encodes watermark messages with a modified trellis code in which a given message may be represented by a variety of different signals, with the embedded signal selected according to the cover image.
Abstract: We describe a new watermarking system based on the principles of informed coding and informed embedding. This system is capable of embedding 1380 bits of information in images with dimensions 240×368 pixels. Experiments on 2000 images indicate the watermarks are robust to significant valumetric distortions, including additive noise, low-pass filtering, changes in contrast, and lossy compression. Our system encodes watermark messages with a modified trellis code in which a given message may be represented by a variety of different signals, with the embedded signal selected according to the cover image. The signal is embedded by an iterative method that seeks to ensure the message will not be confused with other messages, even after addition of noise. Fidelity is improved by the incorporation of perceptual shaping into the embedding process. We show that each of these three components improves performance substantially.

Journal ArticleDOI
TL;DR: It is shown that the amplitude distribution of a complex wave whose real and imaginary components are distributed according to the α-stable distribution is a generalization of the Rayleigh distribution.
Abstract: Synthetic aperture radar (SAR) imagery has found important applications due to its clear advantages over optical satellite imagery, one of them being the ability to operate in various weather conditions. However, due to the physics of the radar imaging process, SAR images contain unwanted artifacts in the form of a granular look which is called speckle. The assumptions of the classical SAR image generation model lead to a Rayleigh distribution model for the histogram of the SAR image. However, some experimental data such as images of urban areas show impulsive characteristics that correspond to underlying heavy-tailed distributions, which are clearly non-Rayleigh. Some alternative distributions have been suggested, such as the Weibull, log-normal, and k-distribution, which have had varying degrees of success depending on the application. Recently, an alternative model, namely the α-stable distribution, has been suggested for modeling radar clutter. In this paper, we show that the amplitude distribution of a complex wave whose real and imaginary components are distributed according to the α-stable distribution is a generalization of the Rayleigh distribution. We demonstrate that the amplitude distribution is a mixture of Rayleighs, as is the k-distribution, in accordance with earlier work on modeling SAR images which showed that almost all successful SAR image models could be expressed as mixtures of Rayleighs. We also present parameter estimation techniques based on negative order moments for the new model. Finally, we test the performance of the model on urban images and compare with other models such as the Rayleigh, Weibull, and k-distribution.

Journal ArticleDOI
TL;DR: The proposed techniques have proven to be highly robust to all geometric manipulations, filtering, compression, and slight cropping performed as part of the StirMark attacks, as well as to noise addition, both Gaussian and salt & pepper.
Abstract: Surviving geometric attacks in image watermarking is considered to be of great importance. In this paper, the watermark is used in an authentication context. Two solutions are proposed for this problem. Both geometric and invariant moments are used in the proposed techniques. An invariant watermark is designed and tested against attacks performed by StirMark using the invariant moments. On the other hand, an image normalization technique is also proposed which creates a normalized environment for watermark embedding and detection. The proposed algorithms have the advantage of being robust and computationally efficient, and no overhead needs to be transmitted to the decoder side. The proposed techniques have proven to be highly robust to all geometric manipulations, filtering, compression, and slight cropping performed as part of the StirMark attacks, as well as to noise addition, both Gaussian and salt & pepper.

Journal ArticleDOI
TL;DR: A new approach to tracking using active contours is presented, which aims to find the region within the current image, such that the sample distribution of the interior of the region most closely matches the model distribution.
Abstract: A new approach to tracking using active contours is presented. The class of objects to be tracked is assumed to be characterized by a probability distribution over some variable, such as intensity, color, or texture. The goal of the algorithm is to find the region within the current image, such that the sample distribution of the interior of the region most closely matches the model distribution. Two separate criteria for matching distributions are examined, and the curve evolution equations are derived in each case. The flows are shown to perform well in experiments.

Journal ArticleDOI
TL;DR: An alternative formulation in which total variation is used as a constraint in a general convex programming framework is proposed, which places no limitation on the incorporation of additional constraints in the restoration process and the resulting optimization problem can be solved efficiently via block-iterative methods.
Abstract: Total variation has proven to be a valuable concept in connection with the recovery of images featuring piecewise smooth components. So far, however, it has been used exclusively as an objective to be minimized under constraints. In this paper, we propose an alternative formulation in which total variation is used as a constraint in a general convex programming framework. This approach places no limitation on the incorporation of additional constraints in the restoration process and the resulting optimization problem can be solved efficiently via block-iterative methods. Image denoising and deconvolution applications are demonstrated.
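Schematically, with notation assumed here rather than taken from the paper, the constrained formulation replaces the usual penalized problem with something of the form

$$\min_{x \in C} \; \tfrac{1}{2}\left\lVert H x - y \right\rVert^2 \quad \text{subject to} \quad \operatorname{tv}(x) \le \eta,$$

where H models the blur (the identity for pure denoising), C collects any additional convex constraints such as pixel-value ranges, and the bound η can often be set from prior knowledge of the image class more readily than a regularization weight.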

Journal ArticleDOI
TL;DR: It is demonstrated that an acceptable, expedient solution of the energy functional is possible through a search of the image-level lines: boundaries of connected components within the level sets obtained by threshold decomposition.
Abstract: We propose a cell detection and tracking solution using image-level sets computed via threshold decomposition. In contrast to existing methods where manual initialization is required to track individual cells, the proposed approach can automatically identify and track multiple cells by exploiting the shape and intensity characteristics of the cells. The capture of the cell boundary is considered as an evolution of a closed curve that maximizes image gradient along the curve enclosing a homogeneous region. An energy functional dependent upon the gradient magnitude along the cell boundary, the region homogeneity within the cell boundary and the spatial overlap of the detected cells is minimized using a variational approach. For tracking between frames, this energy functional is modified considering the spatial and shape consistency of a cell as it moves in the video sequence. The integrated energy functional complements shape-based segmentation with a spatial consistency based tracking technique. We demonstrate that an acceptable, expedient solution of the energy functional is possible through a search of the image-level lines: boundaries of connected components within the level sets obtained by threshold decomposition. The level set analysis can also capture multiple cells in a single frame rather than iteratively computing a single active contour for each individual cell. Results of cell detection using the energy functional approach and the level set approach are presented along with the associated processing time. Results of successful tracking of rolling leukocytes from a number of digital video sequences are reported and compared with the results from a correlation tracking scheme.

Journal ArticleDOI
TL;DR: The orthonormal version of Tchebichef moments is introduced, and the recursive procedure used for polynomial evaluation can be suitably modified to reduce the accumulation of numerical errors.
Abstract: Discrete orthogonal moments have several computational advantages over continuous moments. However, when the moment order becomes large, discrete orthogonal moments (such as the Tchebichef moments) tend to exhibit numerical instabilities. This paper introduces the orthonormal version of Tchebichef moments, and analyzes some of their computational aspects. The recursive procedure used for polynomial evaluation can be suitably modified to reduce the accumulation of numerical errors. The proposed set of moments can be used for representing image shape features and for reconstructing an image from its moments with a high degree of accuracy.

Journal ArticleDOI
TL;DR: The design of finite-size linear-phase separable kernels for differentiation of discrete multidimensional signals is described and a numerical procedure for optimizing the constraint is developed, which is used in constructing a set of example filters.
Abstract: We describe the design of finite-size linear-phase separable kernels for differentiation of discrete multidimensional signals. The problem is formulated as an optimization of the rotation-invariance of the gradient operator, which results in a simultaneous constraint on a set of one-dimensional low-pass prefilter and differentiator filters up to the desired order. We also develop extensions of this formulation to both higher dimensions and higher order directional derivatives. We develop a numerical procedure for optimizing the constraint, and demonstrate its use in constructing a set of example filters. The resulting filters are significantly more accurate than those commonly used in the image and multidimensional signal processing literature.
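To illustrate only the separable structure, the sketch below applies a one-dimensional prefilter along one axis and a differentiator along the other. The kernels shown are a generic smoothing filter and a central difference, not the jointly optimized taps derived in the paper.

```python
import numpy as np
from scipy.ndimage import correlate1d

# Placeholder kernels for illustration; the paper optimizes the pair jointly
# for rotation-invariance, and its taps differ from these.
prefilter = np.array([0.25, 0.5, 0.25])        # 1-D low-pass across the derivative
differentiator = np.array([-0.5, 0.0, 0.5])    # 1-D central difference

def gradient_separable(img):
    """Return (dI/dx, dI/dy) using separable prefilter/differentiator kernels."""
    img = np.asarray(img, dtype=float)
    dx = correlate1d(correlate1d(img, prefilter, axis=0), differentiator, axis=1)
    dy = correlate1d(correlate1d(img, prefilter, axis=1), differentiator, axis=0)
    return dx, dy
```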

Journal ArticleDOI
TL;DR: This work uses partial differential equation techniques to remove noise from digital images: a total-variation filter is used to smooth the normal vectors of the level curves of a noisy image, and finite difference schemes are used to solve the resulting equations.
Abstract: In this work, we use partial differential equation techniques to remove noise from digital images. The removal is done in two steps. We first use a total-variation filter to smooth the normal vectors of the level curves of a noisy image. After this, we try to find a surface to fit the smoothed normal vectors. For each of these two stages, the problem is reduced to a nonlinear partial differential equation. Finite difference schemes are used to solve these equations. A broad range of numerical examples are given in the paper.

Journal ArticleDOI
TL;DR: It is shown that parametric snakes can guarantee low-curvature curves, but only if they are described in the curvilinear abscissa, and a new internal energy term is introduced to enforce this configuration.
Abstract: Parametric active contour models are one of the preferred approaches for image segmentation because of their computational efficiency and simplicity. However, they have a few drawbacks which limit their performance. In this paper, we identify some of these problems and propose efficient solutions to get around them. The widely-used gradient magnitude-based energy is parameter dependent; its use will negatively affect the parametrization of the curve and, consequently, its stiffness. Hence, we introduce a new edge-based energy that is independent of the parameterization. It is also more robust since it takes into account the gradient direction as well. We express this energy term as a surface integral, thus unifying it naturally with the region-based schemes. The unified framework enables the user to tune the image energy to the application at hand. We show that parametric snakes can guarantee low curvature curves, but only if they are described in the curvilinear abscissa. Since normal curve evolution does not ensure constant arc length, we propose a new internal energy term that will force this configuration. The curve evolution can sometimes give rise to closed loops in the contour, which will adversely interfere with the optimization algorithm. We propose a curve evolution scheme that prevents this condition.

Journal ArticleDOI
TL;DR: An image retrieval framework that integrates efficient region-based representation in terms of storage and complexity and effective on-line learning capability and a region weighting strategy is introduced to optimally weight the regions and enable the system to self-improve.
Abstract: An image retrieval framework that integrates efficient region-based representation in terms of storage and complexity and effective on-line learning capability is proposed. The framework consists of methods for region-based image representation and comparison, indexing using modified inverted files, relevance feedback, and learning region weighting. By exploiting a vector quantization method, both compact and sparse (vector) region-based image representations are achieved. Using the compact representation, an indexing scheme similar to the inverted file technology and an image similarity measure based on Earth Mover's Distance are presented. Moreover, the vector representation facilitates a weighted query point movement algorithm and the compact representation enables a classification-based algorithm for relevance feedback. Based on users' feedback information, a region weighting strategy is also introduced to optimally weight the regions and enable the system to self-improve. Experimental results on a database of 10 000 general-purpose images demonstrate the efficiency and effectiveness of the proposed framework.

Journal ArticleDOI
TL;DR: The fundamental performance limits for the problem of image registration as derived from the Cramer-Rao inequality are presented, and the bias of the popular gradient-based estimator is derived and explored showing how widely used multiscale methods for improving performance can be explained with this bias expression.
Abstract: The task of image registration is fundamental in image processing. It often is a critical preprocessing step to many modern image processing and computer vision tasks, and many algorithms and techniques have been proposed to address the registration problem. Often, the performance of these techniques has been presented using a variety of relative measures comparing different estimators, leaving open the critical question of overall optimality. In this paper, we present the fundamental performance limits for the problem of image registration as derived from the Cramer-Rao inequality. We compare the experimental performance of several popular methods with respect to this performance bound, and explain the fundamental tradeoff between variance and bias inherent to the problem of image registration. In particular, we derive and explore the bias of the popular gradient-based estimator, showing how widely used multiscale methods for improving performance can be explained with this bias expression. Finally, we present experimental simulations showing the general rule-of-thumb performance limits for gradient-based image registration techniques.
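As a rough illustration of the kind of bound involved (the standard translation-only result under additive white Gaussian noise of variance σ²; the paper's treatment is more general), the Fisher information is built from the image gradients, giving

$$\operatorname{Cov}(\hat{\theta}) \;\succeq\; \sigma^{2}\left[\sum_{\mathbf{x}} \nabla I(\mathbf{x})\,\nabla I(\mathbf{x})^{\mathsf{T}}\right]^{-1}$$

for any unbiased estimator of the two-dimensional shift, so strongly textured images with large gradient energy admit tighter registration than smooth ones.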

Journal ArticleDOI
TL;DR: An unsupervised algorithm is presented for learning a finite mixture model from multivariate data; the mixture is based on the Dirichlet distribution, which offers high flexibility for modeling data.
Abstract: This paper presents an unsupervised algorithm for learning a finite mixture model from multivariate data. This mixture model is based on the Dirichlet distribution, which offers high flexibility for modeling data. The proposed approach for estimating the parameters of a Dirichlet mixture is based on the maximum likelihood (ML) and Fisher scoring methods. Experimental results are presented for the following applications: estimation of artificial histograms, summarization of image databases for efficient retrieval, and human skin color modeling and its application to skin detection in multimedia databases.
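For reference, the density being fitted is a finite mixture of Dirichlet components, written here in standard form with generic symbols (mixing weights p_j summing to one):

$$p(\mathbf{x} \mid \Theta) = \sum_{j=1}^{M} p_j \, \frac{\Gamma\!\left(\sum_{i=1}^{d} \alpha_{ji}\right)}{\prod_{i=1}^{d} \Gamma(\alpha_{ji})} \prod_{i=1}^{d} x_i^{\alpha_{ji}-1}, \qquad x_i > 0, \; \sum_{i=1}^{d} x_i = 1,$$

and the ML/Fisher-scoring procedure estimates the α_ji and p_j jointly.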

Journal ArticleDOI
Jie Zhou, Jinwei Gu
TL;DR: A model-based method for the computation of the fingerprint orientation field is proposed that has robust performance on different fingerprint images, and experiments show that the performance of a whole fingerprint recognition system can be improved by applying this algorithm instead of previous orientation estimation methods.
Abstract: As a global feature of fingerprints, the orientation field is very important for automatic fingerprint recognition. Many algorithms have been proposed for orientation field estimation, but their results are unsatisfactory, especially for poor quality fingerprint images. In this paper, a model-based method for the computation of the orientation field is proposed. First, a combination model is established for the representation of the orientation field by considering its smoothness except for several singular points, in which a polynomial model is used to describe the orientation field globally and a point-charge model is taken to improve the accuracy locally at each singular point. Once the coarse field has been computed using a gradient-based algorithm, a refined result is obtained by using the model for a weighted approximation. Due to the global approximation, this model-based orientation field estimation algorithm has robust performance on different fingerprint images. A further experiment shows that the performance of a whole fingerprint recognition system can be improved by applying this algorithm instead of previous orientation estimation methods.

Journal ArticleDOI
TL;DR: The diagonal slice of the fourth-order cumulants is shown to be proportional to the autocorrelation of a related noiseless sinusoidal signal with identical frequencies, and it is used to estimate a power spectrum from which the harmonic frequencies can be easily extracted.
Abstract: In this paper, a method of harmonic extraction from higher order statistics (HOS) is developed for texture decomposition. We show that the diagonal slice of the fourth-order cumulants is proportional to the autocorrelation of a related noiseless sinusoidal signal with identical frequencies. We propose to use this fourth-order cumulant slice to estimate a power spectrum from which the harmonic frequencies can be easily extracted. Hence, a texture can be decomposed into deterministic components and indeterministic components, as in a unified texture model, through a Wold-like decomposition procedure. Simulation and experimental results demonstrate that this method is effective for texture decomposition and that it performs better than traditional decomposition methods based on lower-order statistics.