
Showing papers in "IEEE Transactions on Image Processing in 2007"


Journal ArticleDOI
TL;DR: An image denoising algorithm based on an enhanced sparse representation in the transform domain, refined by a specially developed collaborative Wiener filtering, achieves state-of-the-art denoising performance in terms of both peak signal-to-noise ratio and subjective visual quality.
Abstract: We propose a novel image denoising strategy based on an enhanced sparse representation in transform domain. The enhancement of the sparsity is achieved by grouping similar 2D image fragments (e.g., blocks) into 3D data arrays which we call "groups." Collaborative filtering is a special procedure developed to deal with these 3D groups. We realize it using three successive steps: 3D transformation of a group, shrinkage of the transform spectrum, and inverse 3D transformation. The result is a 3D estimate that consists of the jointly filtered grouped image blocks. By attenuating the noise, the collaborative filtering reveals even the finest details shared by grouped blocks and, at the same time, it preserves the essential unique features of each individual block. The filtered blocks are then returned to their original positions. Because these blocks are overlapping, for each pixel, we obtain many different estimates which need to be combined. Aggregation is a particular averaging procedure which is exploited to take advantage of this redundancy. A significant improvement is obtained by a specially developed collaborative Wiener filtering. An algorithm based on this novel denoising strategy and its efficient implementation are presented in full detail; an extension to color-image denoising is also developed. The experimental results demonstrate that this computationally scalable algorithm achieves state-of-the-art denoising performance in terms of both peak signal-to-noise ratio and subjective visual quality.

7,912 citations
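
As a rough illustration of the grouping, collaborative filtering, and aggregation steps described above, here is a minimal NumPy/SciPy sketch that hard-thresholds a 3D DCT over a single group of similar blocks. It is only a toy: the published algorithm adds a second collaborative Wiener stage, dedicated 2D+1D transforms, and carefully weighted aggregation; the block size, search window, and threshold below are illustrative choices, not the paper's settings.

```python
import numpy as np
from scipy.fft import dctn, idctn

def collaborative_filter_one_group(noisy, ref_yx, block=8, search=16,
                                   n_match=16, thr=2.7, sigma=0.1):
    """Toy collaborative hard-thresholding for ONE reference block."""
    ry, rx = ref_yx
    ref = noisy[ry:ry + block, rx:rx + block]
    # 1) Grouping: collect the blocks most similar to the reference inside a
    #    local search window and stack them into a 3D array (a "group").
    cands = []
    for y in range(max(0, ry - search), min(noisy.shape[0] - block, ry + search)):
        for x in range(max(0, rx - search), min(noisy.shape[1] - block, rx + search)):
            blk = noisy[y:y + block, x:x + block]
            cands.append((np.sum((blk - ref) ** 2), y, x))
    cands.sort(key=lambda t: t[0])
    coords = [(y, x) for _, y, x in cands[:n_match]]
    group = np.stack([noisy[y:y + block, x:x + block] for y, x in coords])
    # 2) Collaborative filtering: 3D transform, shrinkage, inverse 3D transform.
    spec = dctn(group, norm='ortho')
    spec[np.abs(spec) < thr * sigma] = 0.0
    return idctn(spec, norm='ortho'), coords

# 3) Aggregation: overlapping block estimates are averaged back into the image.
rng = np.random.default_rng(0)
img = np.clip(np.kron(rng.random((8, 8)), np.ones((16, 16)))
              + 0.1 * rng.standard_normal((128, 128)), 0, 1)
acc, wgt = np.zeros_like(img), np.zeros_like(img)
est, coords = collaborative_filter_one_group(img, (40, 40))
for blk, (y, x) in zip(est, coords):
    acc[y:y + 8, x:x + 8] += blk
    wgt[y:y + 8, x:x + 8] += 1
denoised_patchwork = np.where(wgt > 0, acc / np.maximum(wgt, 1), img)
```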


Journal ArticleDOI
TL;DR: This paper introduces two-step IST (TwIST) algorithms, exhibiting a much faster convergence rate than IST for ill-conditioned problems, and introduces a monotonic version of TwIST (MTwIST); although the convergence proof does not apply to that case, the effectiveness of the new methods is experimentally confirmed on problems of image deconvolution and of restoration with missing samples.
Abstract: Iterative shrinkage/thresholding (IST) algorithms have been recently proposed to handle a class of convex unconstrained optimization problems arising in image restoration and other linear inverse problems. This class of problems results from combining a linear observation model with a nonquadratic regularizer (e.g., total variation or wavelet-based regularization). It happens that the convergence rate of these IST algorithms depends heavily on the linear observation operator, becoming very slow when this operator is ill-conditioned or ill-posed. In this paper, we introduce two-step IST (TwIST) algorithms, exhibiting a much faster convergence rate than IST for ill-conditioned problems. For a vast class of nonquadratic convex regularizers (ℓp norms, some Besov norms, and total variation), we show that TwIST converges to a minimizer of the objective function, for a given range of values of its parameters. For noninvertible observation operators, we introduce a monotonic version of TwIST (MTwIST); although the convergence proof does not apply to this scenario, we give experimental evidence that MTwIST exhibits similar speed gains over IST. The effectiveness of the new methods is experimentally confirmed on problems of image deconvolution and of restoration with missing samples.

1,870 citations
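
The two-step recursion itself is compact. The sketch below uses a circular Gaussian blur as a stand-in observation operator and plain soft thresholding as the regularizing step (an ℓ1 prior in the pixel basis); the values of lam, alpha, and beta are illustrative only, whereas the paper derives the proper settings of these parameters from the spectrum of the observation operator.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def soft(x, t):                                  # soft-thresholding (shrinkage)
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def A(x):                                        # stand-in observation operator:
    return gaussian_filter(x, sigma=2.0, mode='wrap')   # circular Gaussian blur (self-adjoint)

def ist_step(x, y, lam):                         # one IST step: gradient step + shrinkage
    return soft(x + A(y - A(x)), lam)

def twist(y, lam=0.02, alpha=1.8, beta=1.0, iters=200):
    """Two-step IST: each new iterate combines the two previous ones with the IST map."""
    x_prev = y.copy()
    x = ist_step(x_prev, y, lam)
    for _ in range(iters):
        x_new = (1 - alpha) * x_prev + (alpha - beta) * x + beta * ist_step(x, y, lam)
        x_prev, x = x, x_new
    return x

rng = np.random.default_rng(1)
truth = (rng.random((64, 64)) > 0.97).astype(float)       # sparse spikes
y = A(truth) + 0.01 * rng.standard_normal(truth.shape)    # blurred, noisy observation
x_hat = twist(y)
```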


Journal ArticleDOI
TL;DR: This paper adapts and expands kernel regression ideas for use in image denoising, upscaling, interpolation, fusion, and more, establishes key relationships with some popular existing methods, and shows how several of these algorithms are special cases of the proposed framework.
Abstract: In this paper, we make contact with the field of nonparametric statistics and present a development and generalization of tools and results for use in image processing and reconstruction. In particular, we adapt and expand kernel regression ideas for use in image denoising, upscaling, interpolation, fusion, and more. Furthermore, we establish key relationships with some popular existing methods and show how several of these algorithms, including the recently popularized bilateral filter, are special cases of the proposed framework. The resulting algorithms and analyses are amply illustrated with practical examples.

1,457 citations
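
To make the connection concrete, here is a short sketch of the simplest data-adapted member of such a framework: a zeroth-order locally weighted estimator whose weights combine a spatial Gaussian kernel with a photometric one, i.e., the bilateral filter that the paper identifies as a special case. The higher-order and steering-kernel variants developed in the paper are not reproduced here, and the bandwidths below are arbitrary.

```python
import numpy as np

def bilateral_kernel_regression(img, h_spatial=2.0, h_photo=0.1, radius=5):
    """Zeroth-order kernel regression with a data-dependent kernel:
    each pixel is the weighted average of its neighbours, with weights that
    decay with both spatial distance and intensity difference (bilateral filter)."""
    H, W = img.shape
    yy, xx = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    K_spatial = np.exp(-(xx**2 + yy**2) / (2 * h_spatial**2))
    pad = np.pad(img, radius, mode='reflect')
    out = np.empty_like(img)
    for i in range(H):
        for j in range(W):
            patch = pad[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            w = K_spatial * np.exp(-(patch - img[i, j])**2 / (2 * h_photo**2))
            out[i, j] = np.sum(w * patch) / np.sum(w)    # weighted fit of a local constant
    return out

rng = np.random.default_rng(2)
step_edge = np.where(np.arange(64)[None, :] < 32, 0.2, 0.8) * np.ones((64, 1))
denoised = bilateral_kernel_regression(step_edge + 0.05 * rng.standard_normal((64, 64)))
```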


Journal ArticleDOI
TL;DR: The experimental results for many standard test images show that prediction-error expansion doubles the maximum embedding capacity when compared to difference expansion, and there is a significant improvement in the quality of the watermarked image, especially at moderate embedding capacities.
Abstract: Reversible watermarking enables the embedding of useful information in a host signal without any loss of host information. Tian's difference-expansion technique is a high-capacity, reversible method for data embedding. However, the method suffers from undesirable distortion at low embedding capacities and lack of capacity control due to the need for embedding a location map. We propose a histogram shifting technique as an alternative to embedding the location map. The proposed technique improves the distortion performance at low embedding capacities and mitigates the capacity control problem. We also propose a reversible data-embedding technique called prediction-error expansion. This new technique better exploits the correlation inherent in the neighborhood of a pixel than the difference-expansion scheme. Prediction-error expansion and histogram shifting combine to form an effective method for data embedding. The experimental results for many standard test images show that prediction-error expansion doubles the maximum embedding capacity when compared to difference expansion. There is also a significant improvement in the quality of the watermarked image, especially at moderate embedding capacities.

1,229 citations
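
The core reversible step is easy to state in code. The sketch below embeds and extracts a single bit by expanding the prediction error of one pixel; the predictor (mean of the left and top neighbours) is a simplification chosen for brevity, and the paper's histogram shifting, overflow handling, and capacity control are omitted.

```python
import numpy as np

def predict(img, i, j):
    """Simplified predictor: mean of the left and top neighbours (illustration only)."""
    return (int(img[i, j - 1]) + int(img[i - 1, j])) // 2

def embed_bit(img, i, j, bit):
    p = predict(img, i, j)
    e = int(img[i, j]) - p
    img[i, j] = p + 2 * e + bit           # prediction-error expansion: e -> 2e + b
                                          # (overflow/underflow handling omitted)

def extract_bit(img, i, j):
    p = predict(img, i, j)                # same predictor as at embedding time
    e_exp = int(img[i, j]) - p
    bit = e_exp & 1                       # the embedded bit is the parity of the expanded error
    img[i, j] = p + (e_exp >> 1)          # restore the original pixel exactly (reversibility)
    return bit

img = np.array([[100, 102, 101],
                [ 99, 103, 105],
                [ 98, 101, 104]], dtype=np.int64)
marked = img.copy()
embed_bit(marked, 1, 1, 1)
assert extract_bit(marked, 1, 1) == 1 and marked[1, 1] == img[1, 1]
```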


Journal ArticleDOI
TL;DR: The proposed VSNR metric is generally competitive with current metrics of visual fidelity; it is efficient both in terms of its low computational complexity and in terms of its low memory requirements; and it operates based on physical luminances and visual angle (rather than on digital pixel values and pixel-based dimensions) to accommodate different viewing conditions.
Abstract: This paper presents an efficient metric for quantifying the visual fidelity of natural images based on near-threshold and suprathreshold properties of human vision. The proposed metric, the visual signal-to-noise ratio (VSNR), operates via a two-stage approach. In the first stage, contrast thresholds for detection of distortions in the presence of natural images are computed via wavelet-based models of visual masking and visual summation in order to determine whether the distortions in the distorted image are visible. If the distortions are below the threshold of detection, the distorted image is deemed to be of perfect visual fidelity (VSNR = ∞) and no further analysis is required. If the distortions are suprathreshold, a second stage is applied which operates based on the low-level visual property of perceived contrast, and the mid-level visual property of global precedence. These two properties are modeled as Euclidean distances in distortion-contrast space of a multiscale wavelet decomposition, and VSNR is computed based on a simple linear sum of these distances. The proposed VSNR metric is generally competitive with current metrics of visual fidelity; it is efficient both in terms of its low computational complexity and in terms of its low memory requirements; and it operates based on physical luminances and visual angle (rather than on digital pixel values and pixel-based dimensions) to accommodate different viewing conditions.

1,153 citations


Journal ArticleDOI
TL;DR: A new hypothesis for color constancy, namely the gray-edge hypothesis, is proposed, which assumes that the average edge difference in a scene is achromatic; based on this hypothesis, an algorithm for color constancy is derived from the derivative structure of images.
Abstract: Color constancy is the ability to measure colors of objects independent of the color of the light source. A well-known color constancy method is based on the gray-world assumption which assumes that the average reflectance of surfaces in the world is achromatic. In this paper, we propose a new hypothesis for color constancy namely the gray-edge hypothesis, which assumes that the average edge difference in a scene is achromatic. Based on this hypothesis, we propose an algorithm for color constancy. Contrary to existing color constancy algorithms, which are computed from the zero-order structure of images, our method is based on the derivative structure of images. Furthermore, we propose a framework which unifies a variety of known (gray-world, max-RGB, Minkowski norm) and the newly proposed gray-edge and higher order gray-edge algorithms. The quality of the various instantiations of the framework is tested and compared to the state-of-the-art color constancy methods on two large data sets of images recording objects under a large number of different light sources. The experiments show that the proposed color constancy algorithms obtain comparable results as the state-of-the-art color constancy methods with the merit of being computationally more efficient.

801 citations
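
A minimal version of the gray-edge estimator fits in a few lines: per channel, take the Minkowski p-norm of Gaussian-smoothed image derivatives and use the resulting vector as the illuminant estimate (in the unified framework, p = 1 on the derivatives gives the basic gray-edge, p → ∞ a max-edge variant, and using the image values instead of derivatives recovers gray-world/max-RGB). The parameters below are illustrative, not tuned values.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def gray_edge_illuminant(img, p=6, sigma=1.0):
    """Gray-edge illuminant estimate: the Minkowski p-norm of the (smoothed)
    colour-channel derivatives is assumed to be achromatic."""
    est = np.zeros(3)
    for c in range(3):
        dy = gaussian_filter(img[..., c], sigma, order=(1, 0))
        dx = gaussian_filter(img[..., c], sigma, order=(0, 1))
        grad = np.hypot(dx, dy)
        est[c] = np.mean(grad ** p) ** (1.0 / p)
    return est / np.linalg.norm(est)

def white_balance(img, illum):
    """Von Kries style correction: divide each channel by its estimated illuminant."""
    return img / (illum * np.sqrt(3) + 1e-12)

rng = np.random.default_rng(3)
scene = rng.random((64, 64, 3))
cast = scene * np.array([1.0, 0.8, 0.6])        # simulated coloured light source
balanced = white_balance(cast, gray_edge_illuminant(cast))
```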


Journal ArticleDOI
TL;DR: A novel approach to image filtering based on the shape-adaptive discrete cosine transform is presented, addressing in particular image denoising and the deblocking and deringing of block-DCT compressed images; a special structural constraint in luminance-chrominance space is also proposed to enable accurate filtering of color images.
Abstract: The shape-adaptive discrete cosine transform (SA-DCT) can be computed on a support of arbitrary shape, but retains a computational complexity comparable to that of the usual separable block-DCT (B-DCT). Despite the near-optimal decorrelation and energy compaction properties, application of the SA-DCT has been rather limited, targeted nearly exclusively to video compression. In this paper, we present a novel approach to image filtering based on the SA-DCT. We use the SA-DCT in conjunction with the Anisotropic Local Polynomial Approximation-Intersection of Confidence Intervals technique, which defines the shape of the transform's support in a pointwise adaptive manner. The thresholded or attenuated SA-DCT coefficients are used to reconstruct a local estimate of the signal within the adaptive-shape support. Since supports corresponding to different points are in general overlapping, the local estimates are averaged together using adaptive weights that depend on the region's statistics. This approach can be used for various image-processing tasks. In this paper, we consider, in particular, image denoising and image deblocking and deringing from block-DCT compression. A special structural constraint in luminance-chrominance space is also proposed to enable an accurate filtering of color images. Simulation experiments show a state-of-the-art quality of the final estimate, both in terms of objective criteria and visual appearance. Thanks to the adaptive support, reconstructed edges are clean, and no unpleasant ringing artifacts are introduced by the fitted transform.

721 citations


Journal ArticleDOI
TL;DR: Two novel methods for facial expression recognition in facial image sequences are presented, both relying on a grid-tracking and deformation system based on deformable models.
Abstract: In this paper, two novel methods for facial expression recognition in facial image sequences are presented. The user has to manually place some of the Candide grid nodes on face landmarks depicted in the first frame of the image sequence under examination. The grid-tracking and deformation system used, based on deformable models, tracks the grid in consecutive video frames over time, as the facial expression evolves, until the frame that corresponds to the greatest facial expression intensity. The geometrical displacement of certain selected Candide nodes, defined as the difference of the node coordinates between the first and the greatest facial expression intensity frame, is used as an input to a novel multiclass Support Vector Machine (SVM) system of classifiers that are used to recognize either the six basic facial expressions or a set of chosen Facial Action Units (FAUs). The results on the Cohn-Kanade database show a recognition accuracy of 99.7% for facial expression recognition using the proposed multiclass SVMs and 95.1% for facial expression recognition based on FAU detection.

676 citations


Journal ArticleDOI
TL;DR: An interscale orthonormal wavelet thresholding algorithm based on this new approach is described, and its near-optimal performance is demonstrated by comparing it with the results of three state-of-the-art nonredundant denoising algorithms on a large set of test images.
Abstract: This paper introduces a new approach to orthonormal wavelet image denoising. Instead of postulating a statistical model for the wavelet coefficients, we directly parametrize the denoising process as a sum of elementary nonlinear processes with unknown weights. We then minimize an estimate of the mean square error between the clean image and the denoised one. The key point is that we have at our disposal a very accurate, statistically unbiased MSE estimate (Stein's unbiased risk estimate) that depends on the noisy image alone, not on the clean one. Like the MSE, this estimate is quadratic in the unknown weights, and its minimization amounts to solving a linear system of equations. The existence of this a priori estimate makes it unnecessary to devise a specific statistical model for the wavelet coefficients. Instead, and contrary to the custom in the literature, these coefficients are not considered random any more. We describe an interscale orthonormal wavelet thresholding algorithm based on this new approach and show its near-optimal performance, both in terms of quality and CPU requirement, by comparing it with the results of three state-of-the-art nonredundant denoising algorithms on a large set of test images. An interesting fallout of this study is the development of a new, group-delay-based, parent-child prediction in a wavelet dyadic tree.

641 citations
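
The role of Stein's unbiased risk estimate can be illustrated with a stripped-down example: choose a single soft threshold in an orthonormal transform by minimizing SURE computed from the noisy coefficients alone. The paper instead optimizes the weights of an interscale linear expansion of thresholds (which reduces to a linear system) in an orthonormal wavelet basis; the orthonormal DCT below is only a stand-in to keep the sketch self-contained.

```python
import numpy as np
from scipy.fft import dctn, idctn

def sure_soft(w, t, sigma):
    """Stein's unbiased risk estimate of the MSE of soft thresholding at t,
    computed from the noisy coefficients w alone (no access to the clean image)."""
    n = w.size
    return (n * sigma**2
            - 2 * sigma**2 * np.count_nonzero(np.abs(w) <= t)
            + np.sum(np.minimum(np.abs(w), t)**2))

def denoise_with_sure(noisy, sigma):
    w = dctn(noisy, norm='ortho')                      # stand-in orthonormal transform
    ts = np.linspace(0.0, 4.0 * sigma, 80)
    t_best = ts[np.argmin([sure_soft(w, t, sigma) for t in ts])]
    w_hat = np.sign(w) * np.maximum(np.abs(w) - t_best, 0.0)
    return idctn(w_hat, norm='ortho')

rng = np.random.default_rng(4)
clean = np.outer(np.sin(np.linspace(0, 3, 64)), np.cos(np.linspace(0, 3, 64)))
sigma = 0.1
denoised = denoise_with_sure(clean + sigma * rng.standard_normal(clean.shape), sigma)
```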


Journal ArticleDOI
TL;DR: The proposed methods are successfully applied to face recognition, and the experiment results on the large-scale FERET and CAS-PEAL databases show that the proposed algorithms significantly outperform other well-known systems in terms of recognition rate.
Abstract: A novel object descriptor, histogram of Gabor phase pattern (HGPP), is proposed for robust face recognition. In HGPP, the quadrant-bit codes are first extracted from faces based on the Gabor transformation. Global Gabor phase pattern (GGPP) and local Gabor phase pattern (LGPP) are then proposed to encode the phase variations. GGPP captures the variations derived from the orientation changing of Gabor wavelet at a given scale (frequency), while LGPP encodes the local neighborhood variations by using a novel local XOR pattern (LXP) operator. They are both divided into the nonoverlapping rectangular regions, from which spatial histograms are extracted and concatenated into an extended histogram feature to represent the original image. Finally, the recognition is performed by using the nearest-neighbor classifier with histogram intersection as the similarity measurement. The features of HGPP lie in two aspects: 1) HGPP can describe the general face images robustly without the training procedure; 2) HGPP encodes the Gabor phase information, while most previous face recognition methods exploit the Gabor magnitude information. In addition, Fisher separation criterion is further used to improve the performance of HGPP by weighing the subregions of the image according to their discriminative powers. The proposed methods are successfully applied to face recognition, and the experimental results on the large-scale FERET and CAS-PEAL databases show that the proposed algorithms significantly outperform other well-known systems in terms of recognition rate.

613 citations


Journal ArticleDOI
TL;DR: New extensions to the previously published multivariate alteration detection (MAD) method for change detection in bi-temporal, multi- and hypervariate data such as remote sensing imagery are described, together with three regularization schemes.
Abstract: This paper describes new extensions to the previously published multivariate alteration detection (MAD) method for change detection in bi-temporal, multi- and hypervariate data such as remote sensing imagery. Much like boosting methods often applied in data mining work, the iteratively reweighted (IR) MAD method in a series of iterations places increasing focus on "difficult" observations, here observations whose change status over time is uncertain. The MAD method is based on the established technique of canonical correlation analysis: for the multivariate data acquired at two points in time and covering the same geographical region, we calculate the canonical variates and subtract them from each other. These orthogonal differences contain maximum information on joint change in all variables (spectral bands). The change detected in this fashion is invariant to separate linear (affine) transformations in the originally measured variables at the two points in time, such as 1) changes in gain and offset in the measuring device used to acquire the data, 2) data normalization or calibration schemes that are linear (affine) in the gray values of the original variables, or 3) orthogonal or other affine transformations, such as principal component (PC) or maximum autocorrelation factor (MAF) transformations. The IR-MAD method first calculates ordinary canonical and original MAD variates. In the following iterations we apply different weights to the observations, large weights being assigned to observations that show little change, i.e., for which the sum of squared, standardized MAD variates is small, and small weights being assigned to observations for which the sum is large. Like the original MAD method, the iterative extension is invariant to linear (affine) transformations of the original variables. To stabilize solutions to the (IR-)MAD problem, some form of regularization may be needed. This is especially useful for work on hyperspectral data. This paper describes ordinary two-set canonical correlation analysis, the MAD transformation, the iterative extension, and three regularization schemes. A simple case with real Landsat Thematic Mapper (TM) data at one point in time and (partly) constructed data at the other point in time that demonstrates the superiority of the iterative scheme over the original MAD method is shown. Also, examples with SPOT High Resolution Visible data from an agricultural region in Kenya, and hyperspectral airborne HyMap data from a small rural area in southeastern Germany are given. The latter case demonstrates the need for regularization.

Journal ArticleDOI
TL;DR: Several new results are proved which strongly support the claim that the SI does not compromise the usefulness of this class of algorithms, and a new algorithm, obtained by using bounds for nonconvex regularizers, is introduced; experiments confirm its superior performance when compared to the one based on quadratic majorization.
Abstract: Standard formulations of image/signal deconvolution under wavelet-based priors/regularizers lead to very high-dimensional optimization problems involving the following difficulties: the non-Gaussian (heavy-tailed) wavelet priors lead to objective functions which are nonquadratic, usually nondifferentiable, and sometimes even nonconvex; the presence of the convolution operator destroys the separability which underlies the simplicity of wavelet-based denoising. This paper presents a unified view of several recently proposed algorithms for handling this class of optimization problems, placing them in a common majorization-minimization (MM) framework. One of the classes of algorithms considered (when using quadratic bounds on nondifferentiable log-priors) shares the infamous "singularity issue" (SI) of "iteratively reweighted least squares" (IRLS) algorithms: the possibility of having to handle infinite weights, which may cause both numerical and convergence issues. In this paper, we prove several new results which strongly support the claim that the SI does not compromise the usefulness of this class of algorithms. Exploiting the unified MM perspective, we introduce a new algorithm, resulting from using bounds for nonconvex regularizers; the experiments confirm the superior performance of this method, when compared to the one based on quadratic majorization. Finally, an experimental comparison of the several algorithms reveals their relative merits for different standard types of scenarios.

Journal ArticleDOI
TL;DR: Hypercomplex numbers, specifically quaternions, are used to define a Fourier transform applicable to color images; the properties of the transform are developed, and it is shown that the transform may be computed using two standard complex fast Fourier transforms.
Abstract: Fourier transforms are a fundamental tool in signal and image processing, yet, until recently, there was no definition of a Fourier transform applicable to color images in a holistic manner. In this paper, hypercomplex numbers, specifically quaternions, are used to define a Fourier transform applicable to color images. The properties of the transform are developed, and it is shown that the transform may be computed using two standard complex fast Fourier transforms. The resulting spectrum is explained in terms of familiar phase and modulus concepts, and a new concept of hypercomplex axis. A method for visualizing the spectrum using color graphics is also presented. Finally, a convolution operational formula in the spectral domain is discussed.
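
The two-FFT computation mentioned above can be sketched for the special case of a right-sided quaternion Fourier transform with transform axis i: split the quaternion image symplectically into two complex images and transform each with an ordinary FFT, one of them with the exponent sign reversed. The paper develops the general case with an arbitrary unit pure-quaternion axis (typically the gray axis); the snippet below only shows the decomposition idea.

```python
import numpy as np

def qfft2(a, b, c, d):
    """Right-sided discrete quaternion Fourier transform of q = a + b*i + c*j + d*k
    with transform axis i, computed with two standard complex FFTs through the
    symplectic split q = (a + b*i) + (c + d*i)*j."""
    C1 = a + 1j * b
    C2 = c + 1j * d
    F1 = np.fft.fft2(C1)                       # transforms the {1, i} part
    F2 = np.conj(np.fft.fft2(np.conj(C2)))     # {j, k} part: the exponent sign flips past j
    return F1.real, F1.imag, F2.real, F2.imag  # components of F = F1 + F2*j

# Example: an RGB image stored in the i, j, k parts (real part zero).
rng = np.random.default_rng(5)
red, green, blue = rng.random((3, 32, 32))
Fw, Fi, Fj, Fk = qfft2(np.zeros_like(red), red, green, blue)
modulus = np.sqrt(Fw**2 + Fi**2 + Fj**2 + Fk**2)   # quaternion modulus spectrum
```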

Journal ArticleDOI
TL;DR: The presented algorithms exploit the fact that the relationship between stimulus and perception is logarithmic, affording a marriage between enhancement quality and computational efficiency, and a human visual system-based measure is used to choose the best parameters and transform for each enhancement.
Abstract: Many applications of histograms for the purposes of image processing are well known. However, applying this process to the transform domain by way of a transform coefficient histogram has not yet been fully explored. This paper proposes three methods of image enhancement: a) logarithmic transform histogram matching, b) logarithmic transform histogram shifting, and c) logarithmic transform histogram shaping using Gaussian distributions. They are based on the properties of the logarithmic transform domain histogram and histogram equalization. The presented algorithms use the fact that the relationship between stimulus and perception is logarithmic and afford a marriage between enhancement qualities and computational efficiency. A human visual system-based quantitative measurement of image contrast improvement is also defined. This helps choose the best parameters and transform for each enhancement. A number of experimental results are presented to illustrate the performance of the proposed algorithms.
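
A toy version of the histogram-shifting idea, under the assumption that the transform is an orthonormal DCT: shift the histogram of the log-magnitudes of the coefficients by a constant (which scales their magnitudes), keep the signs and the DC term, and invert. The paper's matching and Gaussian-shaping variants reshape this log-domain histogram in more selective ways and use the visual-system-based contrast measure to pick the parameters; none of that is reproduced here, and the shift value is arbitrary.

```python
import numpy as np
from scipy.fft import dctn, idctn

def log_dct_histogram_shift(img, shift=0.4):
    """Shift the histogram of log-magnitudes of the DCT coefficients by a constant,
    preserving signs and the mean brightness (DC term), then invert the transform."""
    C = dctn(img, norm='ortho')
    dc = C[0, 0]
    logmag = np.log1p(np.abs(C))              # logarithmic transform of coefficient magnitudes
    C_new = np.sign(C) * np.expm1(logmag + shift)
    C_new[0, 0] = dc                          # keep the mean brightness untouched
    return np.clip(idctn(C_new, norm='ortho'), 0.0, 1.0)

rng = np.random.default_rng(11)
low_contrast = 0.4 + 0.2 * rng.random((64, 64))
enhanced = log_dct_histogram_shift(low_contrast)
```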

Journal ArticleDOI
TL;DR: New filter banks specially designed for undecimated wavelet decompositions are presented; they have useful properties, such as robustness to the ringing artifacts that generally appear in wavelet-based denoising methods.
Abstract: This paper describes the undecimated wavelet transform and its reconstruction. In the first part, we show the relation between two well known undecimated wavelet transforms, the standard undecimated wavelet transform and the isotropic undecimated wavelet transform. Then we present new filter banks specially designed for undecimated wavelet decompositions which have some useful properties such as being robust to ringing artifacts which appear generally in wavelet-based denoising methods. A range of examples illustrates the results.

Journal ArticleDOI
TL;DR: A new energy minimization framework for phase unwrapping is introduced, in which the objective functions are first-order Markov random fields; the two proposed algorithms, named PUMA, solve integer optimization problems by computing a sequence of binary optimizations, each one solved by graph cut techniques.
Abstract: Phase unwrapping is the inference of absolute phase from modulo-2π phase. This paper introduces a new energy minimization framework for phase unwrapping. The considered objective functions are first-order Markov random fields. We provide an exact energy minimization algorithm, whenever the corresponding clique potentials are convex, namely for the phase unwrapping classical Lp norm, with p ≥ 1. Its complexity is KT(n, 3n), where K is the length of the absolute phase domain measured in 2π units and T(n, m) is the complexity of a max-flow computation in a graph with n nodes and m edges. For nonconvex clique potentials, often used owing to their discontinuity preserving ability, we face an NP-hard problem for which we devise an approximate solution. Both algorithms solve integer optimization problems by computing a sequence of binary optimizations, each one solved by graph cut techniques. Accordingly, we name the two algorithms PUMA, for phase unwrapping max-flow/min-cut. A set of experimental results illustrates the effectiveness of the proposed approach and its competitiveness in comparison with state-of-the-art phase unwrapping algorithms.

Journal ArticleDOI
TL;DR: This work compares the performance of eight optimization methods: gradient descent (with two different step size selection algorithms), quasi-Newton, nonlinear conjugate gradient, Kiefer-Wolfowitz, simultaneous perturbation, Robbins-Monro, and evolution strategy, and shows that the Robbins-Monro method is the best choice in most applications.
Abstract: A popular technique for nonrigid registration of medical images is based on the maximization of their mutual information, in combination with a deformation field parameterized by cubic B-splines. The coordinate mapping that relates the two images is found using an iterative optimization procedure. This work compares the performance of eight optimization methods: gradient descent (with two different step size selection algorithms), quasi-Newton, nonlinear conjugate gradient, Kiefer-Wolfowitz, simultaneous perturbation, Robbins-Monro, and evolution strategy. Special attention is paid to computation time reduction by using fewer voxels to calculate the cost function and its derivatives. The optimization methods are tested on manually deformed CT images of the heart, on follow-up CT chest scans, and on MR scans of the prostate acquired using a BFFE, T1, and T2 protocol. Registration accuracy is assessed by computing the overlap of segmented edges. Precision and convergence properties are studied by comparing deformation fields. The results show that the Robbins-Monro method is the best choice in most applications. With this approach, the computation time per iteration can be lowered approximately 500 times without affecting the rate of convergence by using a small subset of the image, randomly selected in every iteration, to compute the derivative of the mutual information. From the other methods the quasi-Newton and the nonlinear conjugate gradient method achieve a slightly higher precision, at the price of larger computation times.
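
The winning ingredient, a Robbins-Monro iteration whose derivative is estimated from a new random subset of samples at every iteration with a decaying gain sequence, can be sketched on a deliberately simple surrogate problem: recovering a 1D translation by minimizing a sum-of-squared-differences cost. The paper optimizes mutual information over B-spline coefficients on 3D scans; the cost, the decay exponent, and the gain values below are illustrative stand-ins, not the study's settings.

```python
import numpy as np

def subsampled_derivative(t, fixed, moving, d_moving, idx):
    """Derivative of an SSD cost evaluated only on a random subset of positions."""
    x = np.arange(fixed.size, dtype=float)
    w = np.interp(x[idx] + t, x, moving)       # moving signal warped by translation t
    dw = np.interp(x[idx] + t, x, d_moving)    # its spatial derivative at the same points
    return np.mean(2.0 * (w - fixed[idx]) * dw)

def robbins_monro_register(fixed, moving, iters=500, n_samples=128,
                           a=500.0, A=20.0, alpha=0.602):
    """Robbins-Monro optimisation: each iteration draws a NEW random subset of
    samples for a noisy derivative estimate and uses a decaying gain a/(k+1+A)^alpha."""
    rng = np.random.default_rng(0)
    d_moving = np.gradient(moving)
    t = 0.0
    for k in range(iters):
        idx = rng.choice(fixed.size, size=n_samples, replace=False)
        g = subsampled_derivative(t, fixed, moving, d_moving, idx)
        t -= a / (k + 1 + A) ** alpha * g
    return t

x = np.arange(256, dtype=float)
moving = np.exp(-(x - 128.0) ** 2 / 200.0)
fixed = np.exp(-(x - 120.0) ** 2 / 200.0)       # the moving signal shifted by +8
print(robbins_monro_register(fixed, moving))    # drifts toward the true shift (about 8)
```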

Journal ArticleDOI
TL;DR: A new class of fractional-order anisotropic diffusion equations for noise removal is introduced; these are the Euler-Lagrange equations of a cost functional that is an increasing function of the absolute value of the fractional derivative of the image intensity function.
Abstract: This paper introduces a new class of fractional-order anisotropic diffusion equations for noise removal. These equations are Euler-Lagrange equations of a cost functional which is an increasing function of the absolute value of the fractional derivative of the image intensity function, so the proposed equations can be seen as generalizations of second-order and fourth-order anisotropic diffusion equations. We use the discrete Fourier transform to implement the numerical algorithm and give an iterative scheme in the frequency domain. One important aspect of this implementation is that it treats the input image as periodic, which can introduce artifacts at the image borders. To overcome this problem, we use a folded algorithm obtained by extending the image symmetrically about its borders. Finally, we list various numerical results on denoising real images. Experiments show that the proposed fractional-order anisotropic diffusion equations yield good visual effects and better signal-to-noise ratio.
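
The spectral part of such a scheme can be sketched as a single linear fractional-Laplacian smoothing step computed with the DFT on a symmetrically extended ("folded") image, which sidesteps the periodicity issue. The paper's actual equations are anisotropic, with the diffusion driven by the fractional derivatives of the evolving image, so the snippet below only illustrates the frequency-domain mechanics and boundary handling.

```python
import numpy as np

def fractional_diffusion_step(u, alpha=0.75, dt=0.2):
    """One linear fractional-Laplacian smoothing step computed in the DFT domain
    on a symmetrically extended image (so the DFT's implicit periodicity does not
    create artefacts at the borders)."""
    H, W = u.shape
    ext = np.block([[u, u[:, ::-1]], [u[::-1, :], u[::-1, ::-1]]])   # symmetric (folded) extension
    wy = 2 * np.pi * np.fft.fftfreq(2 * H)[:, None]
    wx = 2 * np.pi * np.fft.fftfreq(2 * W)[None, :]
    symbol = (wy**2 + wx**2) ** alpha                                # fractional Laplacian (-Δ)^alpha
    U = np.fft.fft2(ext)
    ext_new = np.real(np.fft.ifft2(U * np.exp(-dt * symbol)))        # exponential (stable) update
    return ext_new[:H, :W]                                           # crop back to the original support

rng = np.random.default_rng(6)
yy, xx = np.mgrid[:64, :64]
img = (xx + yy > 64).astype(float) + 0.1 * rng.standard_normal((64, 64))
smoothed = fractional_diffusion_step(img)
```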

Journal ArticleDOI
TL;DR: Examples and comparisons are presented to show the advantages of this innovation, including superior noise robustness, reduced computational cost, and the flexibility of tailoring the force field.
Abstract: Snakes, or active contours, have been widely used in image processing applications. Typical roadblocks to consistent performance include limited capture range, noise sensitivity, and poor convergence to concavities. This paper proposes a new external force for active contours, called vector field convolution (VFC), to address these problems. VFC is calculated by convolving the edge map generated from the image with the user-defined vector field kernel. We propose two structures for the magnitude function of the vector field kernel, and we provide an analytical method to estimate the parameter of the magnitude function. Mixed VFC is introduced to alleviate the possible leakage problem caused by choosing inappropriate parameters. We also demonstrate that the standard external force and the gradient vector flow (GVF) external force are special cases of VFC in certain scenarios. Examples and comparisons with GVF are presented in this paper to show the advantages of this innovation, including superior noise robustness, reduced computational cost, and the flexibility of tailoring the force field.
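
The construction of the VFC external force is essentially two convolutions. Below is a compact sketch with a power-law decaying magnitude, in the spirit of one of the two magnitude functions discussed in the paper; the kernel size and the exponent are illustrative choices.

```python
import numpy as np
from scipy.ndimage import sobel
from scipy.signal import fftconvolve

def vector_field_convolution(img, radius=32, gamma=1.7):
    """VFC external force: convolve the edge map with a vector field kernel whose
    vectors point toward the kernel centre with magnitude decaying as r^(-gamma)."""
    # Edge map: gradient magnitude of the image.
    edge = np.hypot(sobel(img, axis=0), sobel(img, axis=1))
    # Vector field kernel: unit vectors pointing to the origin, scaled by m(r).
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1].astype(float)
    r = np.hypot(x, y) + 1e-8
    m = r ** (-gamma)
    m[radius, radius] = 0.0                      # no self-contribution at the centre
    kx, ky = -x / r * m, -y / r * m
    # External force field = edge map (*) vector field kernel.
    fx = fftconvolve(edge, kx, mode='same')
    fy = fftconvolve(edge, ky, mode='same')
    return fx, fy                                # pulls snake points toward edges

rng = np.random.default_rng(7)
disc = (np.hypot(*np.mgrid[-64:64, -64:64]) < 30).astype(float)
fx, fy = vector_field_convolution(disc + 0.05 * rng.standard_normal(disc.shape))
```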

Journal ArticleDOI
TL;DR: It is shown that a denoising algorithm merely amounts to solving a linear system of equations which is obviously fast and efficient, and the very competitive results obtained by performing a simple threshold on the undecimated Haar wavelet coefficients show that the SURE-LET principle has a huge potential.
Abstract: We propose a new approach to image denoising, based on the image-domain minimization of an estimate of the mean squared error, Stein's unbiased risk estimate (SURE). Unlike most existing denoising algorithms, using the SURE makes it needless to hypothesize a statistical model for the noiseless image. A key point of our approach is that, although the (nonlinear) processing is performed in a transformed domain (typically an undecimated discrete wavelet transform, but we also address nonorthonormal transforms), this minimization is performed in the image domain. Indeed, we demonstrate that, when the transform is a "tight" frame (an undecimated wavelet transform using orthonormal filters), separate subband minimization yields substantially worse results. In order for our approach to be viable, we add another principle, that the denoising process can be expressed as a linear combination of elementary denoising processes: linear expansion of thresholds (LET). Armed with the SURE and LET principles, we show that a denoising algorithm merely amounts to solving a linear system of equations which is obviously fast and efficient. Quite remarkably, the very competitive results obtained by performing a simple threshold (image-domain SURE optimized) on the undecimated Haar wavelet coefficients show that the SURE-LET principle has a huge potential.

Journal ArticleDOI
TL;DR: The use of a model for binary inpainting based on the Cahn-Hilliard equation is outlined, which allows for fast, efficient inpainting of degraded text, as well as super-resolution of high contrast images.
Abstract: Image inpainting is the filling in of missing or damaged regions of images using information from surrounding areas. We outline here the use of a model for binary inpainting based on the Cahn-Hilliard equation, which allows for fast, efficient inpainting of degraded text, as well as super-resolution of high contrast images.
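
A toy discretization conveys the idea: evolve a modified Cahn-Hilliard equation whose fidelity term is switched off inside the damaged region, so the flow alone fills it in. The explicit time stepping below is only for illustration (it needs a small time step and a crude clipping safeguard); the paper's numerical scheme is different and substantially faster, and it also sweeps the interface parameter ε in two stages.

```python
import numpy as np
from scipy.ndimage import laplace

def cahn_hilliard_inpaint(f, damaged, eps=1.0, lam0=20.0, dt=0.01, steps=2000):
    """Toy explicit scheme for binary inpainting with a modified Cahn-Hilliard flow:
        u_t = laplace( W'(u)/eps - eps*laplace(u) ) + lam(x) * (f - u),
    with double-well W(u) = u^2 (1-u)^2 and lam = lam0 outside the damaged region,
    0 inside."""
    lam = np.where(damaged, 0.0, lam0)
    u = f.astype(float).copy()
    for _ in range(steps):
        w_prime = 2.0 * u * (1.0 - u) * (1.0 - 2.0 * u)
        mu = w_prime / eps - eps * laplace(u)
        u = u + dt * (laplace(mu) + lam * (f - u))
        u = np.clip(u, -0.25, 1.25)        # crude safeguard for the explicit scheme
    return u

stripe = np.zeros((64, 64)); stripe[:, 28:36] = 1.0            # a binary "text" stroke
damaged = np.zeros_like(stripe, dtype=bool); damaged[24:40, :] = True
observed = np.where(damaged, 0.5, stripe)
restored = cahn_hilliard_inpaint(observed, damaged)
```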

Journal ArticleDOI
TL;DR: A relation between the local directional variance of the image intensity and the local geometry of the image, which can justify the choice of the gradient and the principal curvature directions as a basis for the diffusion matrix, is shown.
Abstract: Ultrasound imaging systems provide the clinician with noninvasive, low-cost, and real-time images that can help them in diagnosis, planning, and therapy. However, although the human eye is able to derive the meaningful information from these images, automatic processing is very difficult due to noise and artifacts present in the image. The speckle reducing anisotropic diffusion filter was recently proposed to adapt the anisotropic diffusion filter to the characteristics of the speckle noise present in the ultrasound images and to facilitate automatic processing of images. We analyze the properties of the numerical scheme associated with this filter, using a semi-explicit scheme. We then extend the filter to a matrix anisotropic diffusion, allowing different levels of filtering across the image contours and in the principal curvature directions. We also show a relation between the local directional variance of the image intensity and the local geometry of the image, which can justify the choice of the gradient and the principal curvature directions as a basis for the diffusion matrix. Finally, different filtering techniques are compared on a 2-D synthetic image with two different levels of multiplicative noise and on a 3-D synthetic image of a Y-junction, and the new filter is applied on a 3-D real ultrasound image of the liver.

Journal ArticleDOI
TL;DR: A new exemplar-based framework is presented, which treats image completion, texture synthesis, and image inpainting in a unified manner, and manages to resolve what is currently considered as one major limitation of the BP algorithm: its inefficiency in handling MRFs with very large discrete state spaces.
Abstract: In this paper, a new exemplar-based framework is presented, which treats image completion, texture synthesis, and image inpainting in a unified manner. In order to be able to avoid the occurrence of visually inconsistent results, we pose all of the above image-editing tasks in the form of a discrete global optimization problem. The objective function of this problem is always well-defined, and corresponds to the energy of a discrete Markov random field (MRF). For efficiently optimizing this MRF, a novel optimization scheme, called priority belief propagation (BP), is then proposed, which carries two very important extensions over the standard BP algorithm: "priority-based message scheduling" and "dynamic label pruning." These two extensions work in cooperation to deal with the intolerable computational cost of BP, which is caused by the huge number of labels associated with our MRF. Moreover, both of our extensions are generic, since they do not rely on the use of domain-specific prior knowledge. They can, therefore, be applied to any MRF, i.e., to a very wide class of problems in image processing and computer vision, thus managing to resolve what is currently considered as one major limitation of the BP algorithm: its inefficiency in handling MRFs with very large discrete state spaces. Experimental results on a wide variety of input images are presented, which demonstrate the effectiveness of our image-completion framework for tasks such as object removal, texture synthesis, text removal, and image inpainting.

Journal ArticleDOI
TL;DR: This paper presents a novel approach to solve the supervised dimensionality reduction problem by encoding an image object as a general tensor of second or even higher order, and proposes a discriminant tensor criterion, whereby multiple interrelated lower dimensional discriminative subspaces are derived for feature extraction.
Abstract: There is a growing interest in subspace learning techniques for face recognition; however, the excessive dimension of the data space often brings the algorithms into the curse of dimensionality dilemma. In this paper, we present a novel approach to solve the supervised dimensionality reduction problem by encoding an image object as a general tensor of second or even higher order. First, we propose a discriminant tensor criterion, whereby multiple interrelated lower dimensional discriminative subspaces are derived for feature extraction. Then, a novel approach, called k-mode optimization, is presented to iteratively learn these subspaces by unfolding the tensor along different tensor directions. We call this algorithm multilinear discriminant analysis (MDA), which has the following characteristics: 1) multiple interrelated subspaces can collaborate to discriminate different classes, 2) for classification problems involving higher order tensors, the MDA algorithm can avoid the curse of dimensionality dilemma and alleviate the small sample size problem, and 3) the computational cost in the learning stage is reduced to a large extent owing to the reduced data dimensions in k-mode optimization. We provide extensive experiments on ORL, CMU PIE, and FERET databases by encoding face images as second- or third-order tensors to demonstrate that the proposed MDA algorithm based on higher order tensors has the potential to outperform the traditional vector-based subspace learning algorithms, especially in the cases with small sample sizes.

Journal ArticleDOI
TL;DR: This paper addresses the problem of image segmentation by means of active contours, whose evolution is driven by the gradient flow derived from an energy functional that is based on the Bhattacharyya distance, and proposes a method for automatically adjusting the smoothness properties of the empirical distributions.
Abstract: This paper addresses the problem of image segmentation by means of active contours, whose evolution is driven by the gradient flow derived from an energy functional that is based on the Bhattacharyya distance. In particular, given the values of a photometric variable (or of a set thereof), which is to be used for classifying the image pixels, the active contours are designed to converge to the shape that results in maximal discrepancy between the empirical distributions of the photometric variable inside and outside of the contours. The above discrepancy is measured by means of the Bhattacharyya distance that proves to be an extremely useful tool for solving the problem at hand. The proposed methodology can be viewed as a generalization of the segmentation methods, in which active contours maximize the difference between a finite number of empirical moments of the "inside" and "outside" distributions. Furthermore, it is shown that the proposed methodology is very versatile and flexible in the sense that it allows one to easily accommodate a diversity of the image features based on which the segmentation should be performed. As an additional contribution, a method for automatically adjusting the smoothness properties of the empirical distributions is proposed. Such a procedure is crucial in situations when the number of data samples (supporting a certain segmentation class) varies considerably in the course of the evolution of the active contour. In this case, the smoothness properties of the empirical distributions have to be properly adjusted to avoid either over- or underestimation artifacts. Finally, a number of relevant segmentation results are demonstrated and some further research directions are discussed.
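
The quantity that the flow optimizes is easy to compute for a candidate region. The sketch below evaluates the Bhattacharyya distance between kernel-smoothed intensity distributions inside and outside a mask; in the paper this score drives a gradient flow on the contour, and the kernel width is adjusted automatically as the amount of data inside and outside changes (here it is simply fixed).

```python
import numpy as np

def kde_hist(vals, bins, h=0.05):
    """Kernel-smoothed empirical distribution of intensity values on a fixed grid."""
    d = vals[None, :] - bins[:, None]
    p = np.exp(-0.5 * (d / h) ** 2).sum(axis=1)
    return p / (p.sum() + 1e-12)

def bhattacharyya_distance(img, inside_mask, bins=np.linspace(0, 1, 64)):
    """Discrepancy between the distributions inside and outside a region:
    B = sum sqrt(p_in * p_out), D = -log(B); segmentation seeks the region
    whose boundary maximizes D."""
    p_in = kde_hist(img[inside_mask], bins)
    p_out = kde_hist(img[~inside_mask], bins)
    B = np.sum(np.sqrt(p_in * p_out))
    return -np.log(B + 1e-12)

rng = np.random.default_rng(8)
img = 0.3 + 0.05 * rng.standard_normal((64, 64))
img[20:44, 20:44] += 0.4                                   # brighter object
yy, xx = np.mgrid[:64, :64]
good = bhattacharyya_distance(img, (abs(yy - 32) < 12) & (abs(xx - 32) < 12))
bad = bhattacharyya_distance(img, xx < 32)                 # misplaced region
assert good > bad                                          # better separation -> larger distance
```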

Journal ArticleDOI
TL;DR: A complete face recognition system is implemented by integrating the best option of each step and achieves superior performance on every category of the FERET test: near perfect classification accuracy, and significantly better than any other reported performance on pictures taken several days to more than a year apart.
Abstract: In contrast to holistic methods, local matching methods extract facial features from different levels of locality and quantify them precisely. To determine how they can be best used for face recognition, we conducted a comprehensive comparative study at each step of the local matching process. The conclusions from our experiments include: (1) additional evidence that Gabor features are effective local feature representations and are robust to illumination changes; (2) discrimination based only on a small portion of the face area is surprisingly good; (3) the configuration of facial components does contain rich discriminating information and comparing corresponding local regions utilizes shape features more effectively than comparing corresponding facial components; (4) spatial multiresolution analysis leads to better classification performance; (5) combining local regions with Borda count classifier combination method alleviates the curse of dimensionality. We implemented a complete face recognition system by integrating the best option of each step. Without training, illumination compensation and without any parameter tuning, it achieves superior performance on every category of the FERET test: near perfect classification accuracy (99.5%) on pictures taken on the same day regardless of indoor illumination variations, and significantly better than any other reported performance on pictures taken several days to more than a year apart. The most significant experiments were repeated on the AR database, with similar results.

Journal ArticleDOI
TL;DR: Encouraging preliminary results on real-world video sequences are presented, particularly in the realm of transmission losses, where PRISM exhibits the characteristic of rapid recovery, in contrast to contemporary codecs, which renders PRISM as an attractive candidate for wireless video applications.
Abstract: We describe PRISM, a video coding paradigm based on the principles of lossy distributed compression (also called source coding with side information or Wyner-Ziv coding) from multiuser information theory. PRISM represents a major departure from conventional video coding architectures (e.g., the MPEGx, H.26x families) that are based on motion-compensated predictive coding, with the goal of addressing some of their architectural limitations. PRISM allows for two key architectural enhancements: (1) inbuilt robustness to "drift" between encoder and decoder and (2) the feasibility of a flexible distribution of computational complexity between encoder and decoder. Specifically, PRISM enables transfer of the computationally expensive video encoder motion-search module to the video decoder. Based on this capability, we consider an instance of PRISM corresponding to a near reversal in codec complexities with respect to today's codecs (leading to a novel light encoder and heavy decoder paradigm), in this paper. We present encouraging preliminary results on real-world video sequences, particularly in the realm of transmission losses, where PRISM exhibits the characteristic of rapid recovery, in contrast to contemporary codecs. This renders PRISM as an attractive candidate for wireless video applications.

Journal ArticleDOI
TL;DR: This paper shows how the MCA convergence can be drastically improved using the mutual incoherence of the dictionaries associated to the different components.
Abstract: In a recent paper, a method called morphological component analysis (MCA) has been proposed to separate the texture from the natural part in images. MCA relies on an iterative thresholding algorithm, using a threshold which decreases linearly towards zero along the iterations. This paper shows how the MCA convergence can be drastically improved using the mutual incoherence of the dictionaries associated to the different components. This modified MCA algorithm is then compared to basis pursuit, and experiments show that MCA and BP solutions are similar in terms of sparsity, as measured by the ℓ1 norm, but MCA is much faster and gives us the possibility of handling large scale data sets.

Journal ArticleDOI
TL;DR: In this correspondence, a new, simple, yet much faster algorithm exhibiting O(1) runtime complexity is described, analyzed, and benchmarked against previous algorithms.
Abstract: The median filter is one of the basic building blocks in many image processing situations. However, its use has long been hampered by its algorithmic complexity of O(r) in the kernel radius. With the trend toward larger images and proportionally larger filter kernels, the need for a more efficient median filtering algorithm becomes pressing. In this correspondence, a new, simple, yet much faster, algorithm exhibiting O(1) runtime complexity is described and analyzed. It is compared and benchmarked against previous algorithms. Extensions to higher dimensional or higher precision data and an approximation to a circular kernel are presented, as well.
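
The central idea, maintaining a histogram of the sliding window so that the per-pixel cost no longer grows with the kernel radius, is easiest to see in 1D. The sketch below keeps a single 256-bin histogram with one insertion and one deletion per step; the 2D algorithm of the paper obtains its O(1) behaviour by additionally keeping one histogram per image column and adding/subtracting column histograms as the kernel advances along a row, which is not reproduced here.

```python
import numpy as np

def sliding_median_1d(x, radius):
    """1D median filter whose per-sample cost is independent of the kernel radius:
    a 256-bin histogram of the current window is updated incrementally, and the
    median is read by scanning the fixed-size histogram."""
    x = np.asarray(x, dtype=np.uint8)
    n = x.size
    pad = np.concatenate([np.repeat(x[:1], radius), x, np.repeat(x[-1:], radius)])
    hist = np.zeros(256, dtype=np.int64)
    for v in pad[:2 * radius + 1]:               # histogram of the first window
        hist[v] += 1
    rank = radius + 1                            # rank of the median inside the window
    out = np.empty(n, dtype=np.uint8)
    for i in range(n):
        acc = 0
        for b in range(256):                     # constant-size scan, independent of radius
            acc += hist[b]
            if acc >= rank:
                out[i] = b
                break
        if i + 1 < n:                            # slide: drop the leftmost sample, add the next one
            hist[pad[i]] -= 1
            hist[pad[i + 2 * radius + 1]] += 1
    return out

rng = np.random.default_rng(9)
sig = (128 + 40 * np.sin(np.linspace(0, 6, 500))
       + 60 * (rng.random(500) < 0.05)).astype(np.uint8)   # signal with impulsive outliers
filtered = sliding_median_1d(sig, radius=25)
```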

Journal ArticleDOI
TL;DR: A simple but efficient novel locally linear regression (LLR) method is proposed, which generates the virtual frontal view from a given nonfrontal face image and shows a distinct advantage over the Eigen light-field method.
Abstract: The variation of facial appearance due to the viewpoint (or pose) degrades face recognition systems considerably, which is one of the bottlenecks in face recognition. One of the possible solutions is generating virtual frontal view from any given nonfrontal view to obtain a virtual gallery/probe face. Following this idea, this paper proposes a simple, but efficient, novel locally linear regression (LLR) method, which generates the virtual frontal view from a given nonfrontal face image. We first justify the basic assumption of the paper that there exists an approximate linear mapping between a nonfrontal face image and its frontal counterpart. Then, by formulating the estimation of the linear mapping as a prediction problem, we present the regression-based solution, i.e., globally linear regression. To improve the prediction accuracy in the case of coarse alignment, LLR is further proposed. In LLR, we first perform dense sampling in the nonfrontal face image to obtain many overlapped local patches. Then, the linear regression technique is applied to each small patch for the prediction of its virtual frontal patch. Through the combination of all these patches, the virtual frontal view is generated. The experimental results on the CMU PIE database show a distinct advantage of the proposed method over the Eigen light-field method.
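
The patch-wise regression idea can be sketched with synthetic data: for every patch position, fit a least-squares linear map from vectorized nonfrontal patches to the corresponding frontal patches over a training set, then predict the frontal patches of a new image and average the overlapping predictions. Real use requires aligned face images and enough training subjects per patch; the horizontal squeeze below is only a stand-in for a pose change, and the patch size and step are arbitrary.

```python
import numpy as np

def learn_patch_regressors(X_nonfrontal, X_frontal, patch, step):
    """Fit, for each patch position, a linear (affine) map from the vectorized
    nonfrontal patch to the corresponding frontal patch. Inputs have shape
    (n_subjects, H, W); patches are sampled densely with overlap."""
    n, H, W = X_nonfrontal.shape
    maps = {}
    for y in range(0, H - patch + 1, step):
        for x in range(0, W - patch + 1, step):
            A = X_nonfrontal[:, y:y + patch, x:x + patch].reshape(n, -1)
            B = X_frontal[:, y:y + patch, x:x + patch].reshape(n, -1)
            A1 = np.hstack([A, np.ones((n, 1))])            # affine term
            W_map, *_ = np.linalg.lstsq(A1, B, rcond=None)  # least-squares fit
            maps[(y, x)] = W_map
    return maps

def synthesize_frontal(img, maps, patch, step):
    """Predict each frontal patch from the nonfrontal one and average the overlaps."""
    H, W = img.shape
    acc, wgt = np.zeros((H, W)), np.zeros((H, W))
    for (y, x), W_map in maps.items():
        a = np.append(img[y:y + patch, x:x + patch].ravel(), 1.0)
        acc[y:y + patch, x:x + patch] += (a @ W_map).reshape(patch, patch)
        wgt[y:y + patch, x:x + patch] += 1
    return acc / np.maximum(wgt, 1e-12)

# Synthetic sanity check: "nonfrontal" images are horizontally squeezed copies.
rng = np.random.default_rng(10)
frontal = rng.random((40, 32, 32))
nonfrontal = np.stack([np.repeat(f[:, ::2], 2, axis=1) for f in frontal])
maps = learn_patch_regressors(nonfrontal[:30], frontal[:30], patch=8, step=4)
virtual = synthesize_frontal(nonfrontal[31], maps, patch=8, step=4)
```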