Showing papers in "IEEE Transactions on Image Processing in 2010"
TL;DR: This paper presents a new approach to single-image superresolution, based upon sparse signal representation, which generates high-resolution images that are competitive or even superior in quality to images produced by other similar SR methods.
Abstract: This paper presents a new approach to single-image superresolution, based upon sparse signal representation. Research on image statistics suggests that image patches can be well-represented as a sparse linear combination of elements from an appropriately chosen over-complete dictionary. Inspired by this observation, we seek a sparse representation for each patch of the low-resolution input, and then use the coefficients of this representation to generate the high-resolution output. Theoretical results from compressed sensing suggest that under mild conditions, the sparse representation can be correctly recovered from the downsampled signals. By jointly training two dictionaries for the low- and high-resolution image patches, we can enforce the similarity of sparse representations between the low-resolution and high-resolution image patch pair with respect to their own dictionaries. Therefore, the sparse representation of a low-resolution image patch can be applied with the high-resolution image patch dictionary to generate a high-resolution image patch. The learned dictionary pair is a more compact representation of the patch pairs, compared to previous approaches, which simply sample a large amount of image patch pairs , reducing the computational cost substantially. The effectiveness of such a sparsity prior is demonstrated for both general image super-resolution (SR) and the special case of face hallucination. In both cases, our algorithm generates high-resolution images that are competitive or even superior in quality to images produced by other similar SR methods. In addition, the local sparse modeling of our approach is naturally robust to noise, and therefore the proposed algorithm can handle SR with noisy inputs in a more unified framework.
TL;DR: This work presents a simple and efficient preprocessing chain that eliminates most of the effects of changing illumination while still preserving the essential appearance details that are needed for recognition, and improves robustness by adding Kernel principal component analysis (PCA) feature extraction and incorporating rich local appearance cues from two complementary sources.
Abstract: Making recognition more reliable under uncontrolled lighting conditions is one of the most important challenges for practical face recognition systems. We tackle this by combining the strengths of robust illumination normalization, local texture-based face representations, distance transform based matching, kernel-based feature extraction and multiple feature fusion. Specifically, we make three main contributions: 1) we present a simple and efficient preprocessing chain that eliminates most of the effects of changing illumination while still preserving the essential appearance details that are needed for recognition; 2) we introduce local ternary patterns (LTP), a generalization of the local binary pattern (LBP) local texture descriptor that is more discriminant and less sensitive to noise in uniform regions, and we show that replacing comparisons based on local spatial histograms with a distance transform based similarity metric further improves the performance of LBP/LTP based face recognition; and 3) we further improve robustness by adding Kernel principal component analysis (PCA) feature extraction and incorporating rich local appearance cues from two complementary sources-Gabor wavelets and LBP-showing that the combination is considerably more accurate than either feature set alone. The resulting method provides state-of-the-art performance on three data sets that are widely used for testing recognition under difficult illumination conditions: Extended Yale-B, CAS-PEAL-R1, and Face Recognition Grand Challenge version 2 experiment 4 (FRGC-204). For example, on the challenging FRGC-204 data set it halves the error rate relative to previously published methods, achieving a face verification rate of 88.1% at 0.1% false accept rate. Further experiments show that our preprocessing method outperforms several existing preprocessors for a range of feature sets, data sets and lighting conditions.
TL;DR: It is shown that CLBP_S preserves more information of the local structure thanCLBP_M, which explains why the simple LBP operator can extract the texture features reasonably well and can be made for rotation invariant texture classification.
Abstract: In this correspondence, a completed modeling of the local binary pattern (LBP) operator is proposed and an associated completed LBP (CLBP) scheme is developed for texture classification. A local region is represented by its center pixel and a local difference sign-magnitude transform (LDSMT). The center pixels represent the image gray level and they are converted into a binary code, namely CLBP-Center (CLBP_C), by global thresholding. LDSMT decomposes the image local differences into two complementary components: the signs and the magnitudes, and two operators, namely CLBP-Sign (CLBP_S) and CLBP-Magnitude (CLBP_M), are proposed to code them. The traditional LBP is equivalent to the CLBP_S part of CLBP, and we show that CLBP_S preserves more information of the local structure than CLBP_M, which explains why the simple LBP operator can extract the texture features reasonably well. By combining CLBP_S, CLBP_M, and CLBP_C features into joint or hybrid distributions, significant improvement can be made for rotation invariant texture classification.
TL;DR: A new variational level set formulation in which the regularity of the level set function is intrinsically maintained during thelevel set evolution called distance regularized level set evolution (DRLSE), which eliminates the need for reinitialization and thereby avoids its induced numerical errors.
Abstract: Level set methods have been widely used in image processing and computer vision. In conventional level set formulations, the level set function typically develops irregularities during its evolution, which may cause numerical errors and eventually destroy the stability of the evolution. Therefore, a numerical remedy, called reinitialization, is typically applied to periodically replace the degraded level set function with a signed distance function. However, the practice of reinitialization not only raises serious problems as when and how it should be performed, but also affects numerical accuracy in an undesirable way. This paper proposes a new variational level set formulation in which the regularity of the level set function is intrinsically maintained during the level set evolution. The level set evolution is derived as the gradient flow that minimizes an energy functional with a distance regularization term and an external energy that drives the motion of the zero level set toward desired locations. The distance regularization term is defined with a potential function such that the derived level set evolution has a unique forward-and-backward (FAB) diffusion effect, which is able to maintain a desired shape of the level set function, particularly a signed distance profile near the zero level set. This yields a new type of level set evolution called distance regularized level set evolution (DRLSE). The distance regularization effect eliminates the need for reinitialization and thereby avoids its induced numerical errors. In contrast to complicated implementations of conventional level set formulations, a simpler and more efficient finite difference scheme can be used to implement the DRLSE formulation. DRLSE also allows the use of more general and efficient initialization of the level set function. In its numerical implementation, relatively large time steps can be used in the finite difference scheme to reduce the number of iterations, while ensuring sufficient numerical accuracy. To demonstrate the effectiveness of the DRLSE formulation, we apply it to an edge-based active contour model for image segmentation, and provide a simple narrowband implementation to greatly reduce computational cost.
TL;DR: A new fast algorithm for solving one of the standard formulations of image restoration and reconstruction which consists of an unconstrained optimization problem where the objective includes an l2 data-fidelity term and a nonsmooth regularizer is proposed.
Abstract: We propose a new fast algorithm for solving one of the standard formulations of image restoration and reconstruction which consists of an unconstrained optimization problem where the objective includes an l2 data-fidelity term and a nonsmooth regularizer. This formulation allows both wavelet-based (with orthogonal or frame-based representations) regularization or total-variation regularization. Our approach is based on a variable splitting to obtain an equivalent constrained optimization formulation, which is then addressed with an augmented Lagrangian method. The proposed algorithm is an instance of the so-called alternating direction method of multipliers, for which convergence has been proved. Experiments on a set of image restoration and reconstruction benchmark problems show that the proposed algorithm is faster than the current state of the art methods.
TL;DR: A recent large-scale subjective study of video quality on a collection of videos distorted by a variety of application-relevant processes results in a diverse independent public database of distorted videos and subjective scores that is freely available.
Abstract: We present the results of a recent large-scale subjective study of video quality on a collection of videos distorted by a variety of application-relevant processes. Methods to assess the visual quality of digital videos as perceived by human observers are becoming increasingly important, due to the large number of applications that target humans as the end users of video. Owing to the many approaches to video quality assessment (VQA) that are being developed, there is a need for a diverse independent public database of distorted videos and subjective scores that is freely available. The resulting Laboratory for Image and Video Engineering (LIVE) Video Quality Database contains 150 distorted videos (obtained from ten uncompressed reference videos of natural scenes) that were created using four different commonly encountered distortion types. Each video was assessed by 38 human subjects, and the difference mean opinion scores (DMOS) were recorded. We also evaluated the performance of several state-of-the-art, publicly available full-reference VQA algorithms on the new database. A statistical evaluation of the relative performance of these algorithms is also presented. The database has a dedicated web presence that will be maintained as long as it remains relevant and the data is available online.
TL;DR: The nth-order LDP is proposed to encode the (n-1)th -order local derivative direction variations, which can capture more detailed information than the first-order local pattern used in local binary pattern (LBP).
Abstract: This paper proposes a novel high-order local pattern descriptor, local derivative pattern (LDP), for face recognition. LDP is a general framework to encode directional pattern features based on local derivative variations. The nth-order LDP is proposed to encode the (n-1)th -order local derivative direction variations, which can capture more detailed information than the first-order local pattern used in local binary pattern (LBP). Different from LBP encoding the relationship between the central point and its neighbors, the LDP templates extract high-order local information by encoding various distinctive spatial relationships contained in a given local region. Both gray-level images and Gabor feature images are used to evaluate the comparative performances of LDP and LBP. Extensive experimental results on FERET, CAS-PEAL, CMU-PIE, Extended Yale B, and FRGC databases show that the high-order LDP consistently performs much better than LBP for both face identification and face verification under various conditions.
TL;DR: A variation of fuzzy c-means (FCM) algorithm that provides image clustering that incorporates the local spatial information and gray level information in a novel fuzzy way, called fuzzy local information C-Means (FLICM).
Abstract: This paper presents a variation of fuzzy c-means (FCM) algorithm that provides image clustering. The proposed algorithm incorporates the local spatial information and gray level information in a novel fuzzy way. The new algorithm is called fuzzy local information C-Means (FLICM). FLICM can overcome the disadvantages of the known fuzzy c-means algorithms and at the same time enhances the clustering performance. The major characteristic of FLICM is the use of a fuzzy local (both spatial and gray level) similarity measure, aiming to guarantee noise insensitiveness and image detail preservation. Furthermore, the proposed algorithm is fully free of the empirically adjusted parameters (a, ?g, ?s, etc.) incorporated into all other fuzzy c-means algorithms proposed in the literature. Experiments performed on synthetic and real-world images show that FLICM algorithm is effective and efficient, providing robustness to noisy images.
TL;DR: Extensive tests of videos, natural images, and psychological patterns show that the proposed PQFT model is more effective in saliency detection and can predict eye fixations better than other state-of-the-art models in previous literature.
Abstract: Salient areas in natural scenes are generally regarded as areas which the human eye will typically focus on, and finding these areas is the key step in object detection. In computer vision, many models have been proposed to simulate the behavior of eyes such as SaliencyToolBox (STB), Neuromorphic Vision Toolkit (NVT), and others, but they demand high computational cost and computing useful results mostly relies on their choice of parameters. Although some region-based approaches were proposed to reduce the computational complexity of feature maps, these approaches still were not able to work in real time. Recently, a simple and fast approach called spectral residual (SR) was proposed, which uses the SR of the amplitude spectrum to calculate the image's saliency map. However, in our previous work, we pointed out that it is the phase spectrum, not the amplitude spectrum, of an image's Fourier transform that is key to calculating the location of salient areas, and proposed the phase spectrum of Fourier transform (PFT) model. In this paper, we present a quaternion representation of an image which is composed of intensity, color, and motion features. Based on the principle of PFT, a novel multiresolution spatiotemporal saliency detection model called phase spectrum of quaternion Fourier transform (PQFT) is proposed in this paper to calculate the spatiotemporal saliency map of an image by its quaternion representation. Distinct from other models, the added motion dimension allows the phase spectrum to represent spatiotemporal saliency in order to perform attention selection not only for images but also for videos. In addition, the PQFT model can compute the saliency map of an image under various resolutions from coarse to fine. Therefore, the hierarchical selectivity (HS) framework based on the PQFT model is introduced here to construct the tree structure representation of an image. With the help of HS, a model called multiresolution wavelet domain foveation (MWDF) is proposed in this paper to improve coding efficiency in image and video compression. Extensive tests of videos, natural images, and psychological patterns show that the proposed PQFT model is more effective in saliency detection and can predict eye fixations better than other state-of-the-art models in previous literature. Moreover, our model requires low computational cost and, therefore, can work in real time. Additional experiments on image and video compression show that the HS-MWDF model can achieve higher compression rate than the traditional model.
TL;DR: A comprehensive optimization method to arrive at the spatial and spectral layout of the color filter array of a GAP camera is presented and a novel algorithm for reconstructing the under-sampled channels of the image while minimizing aliasing artifacts is developed.
Abstract: We propose the concept of a generalized assorted pixel (GAP) camera, which enables the user to capture a single image of a scene and, after the fact, control the tradeoff between spatial resolution, dynamic range and spectral detail. The GAP camera uses a complex array (or mosaic) of color filters. A major problem with using such an array is that the captured image is severely under-sampled for at least some of the filter types. This leads to reconstructed images with strong aliasing. We make four contributions in this paper: 1) we present a comprehensive optimization method to arrive at the spatial and spectral layout of the color filter array of a GAP camera. 2) We develop a novel algorithm for reconstructing the under-sampled channels of the image while minimizing aliasing artifacts. 3) We demonstrate how the user can capture a single image and then control the tradeoff of spatial resolution to generate a variety of images, including monochrome, high dynamic range (HDR) monochrome, RGB, HDR RGB, and multispectral images. 4) Finally, the performance of our GAP camera has been verified using extensive simulations that use multispectral images of real world scenes. A large database of these multispectral images has been made available at http://wwwl.cs.columbia.edu/ CAVE/projects/gap_camera/ for use by the research community.
TL;DR: A general, spatio-spectrally localized multiscale framework for evaluating dynamic video fidelity that integrates both spatial and temporal aspects of distortion assessment and is found to be quite competitive with, and even outperform, algorithms developed and submitted to the VQEG FRTV Phase 1 study, as well as more recent VQA algorithms tested on this database.
Abstract: There has recently been a great deal of interest in the development of algorithms that objectively measure the integrity of video signals. Since video signals are being delivered to human end users in an increasingly wide array of applications and products, it is important that automatic methods of video quality assessment (VQA) be available that can assist in controlling the quality of video being delivered to this critical audience. Naturally, the quality of motion representation in videos plays an important role in the perception of video quality, yet existing VQA algorithms make little direct use of motion information, thus limiting their effectiveness. We seek to ameliorate this by developing a general, spatio-spectrally localized multiscale framework for evaluating dynamic video fidelity that integrates both spatial and temporal (and spatio-temporal) aspects of distortion assessment. Video quality is evaluated not only in space and time, but also in space-time, by evaluating motion quality along computed motion trajectories. Using this framework, we develop a full reference VQA algorithm for which we coin the term the MOtion-based Video Integrity Evaluation index, or MOVIE index. It is found that the MOVIE index delivers VQA scores that correlate quite closely with human subjective judgment, using the Video Quality Expert Group (VQEG) FRTV Phase 1 database as a test bed. Indeed, the MOVIE index is found to be quite competitive with, and even outperform, algorithms developed and submitted to the VQEG FRTV Phase 1 study, as well as more recent VQA algorithms tested on this database.
TL;DR: This paper model the components of the compressive sensing (CS) problem, i.e., the signal acquisition process, the unknown signal coefficients and the model parameters for the signal and noise using the Bayesian framework and develops a constructive (greedy) algorithm designed for fast reconstruction useful in practical settings.
Abstract: In this paper, we model the components of the compressive sensing (CS) problem, i.e., the signal acquisition process, the unknown signal coefficients and the model parameters for the signal and noise using the Bayesian framework. We utilize a hierarchical form of the Laplace prior to model the sparsity of the unknown signal. We describe the relationship among a number of sparsity priors proposed in the literature, and show the advantages of the proposed model including its high degree of sparsity. Moreover, we show that some of the existing models are special cases of the proposed model. Using our model, we develop a constructive (greedy) algorithm designed for fast reconstruction useful in practical settings. Unlike most existing CS reconstruction methods, the proposed algorithm is fully automated, i.e., the unknown signal coefficients and all necessary parameters are estimated solely from the observation, and, therefore, no user-intervention is needed. Additionally, the proposed algorithm provides estimates of the uncertainty of the reconstructions. We provide experimental results with synthetic 1-D signals and images, and compare with the state-of-the-art CS reconstruction algorithms demonstrating the superior performance of the proposed approach.
TL;DR: Compared with the conventional k -nearest-neighbor graph and ¿-ball graph, the ¿1-graph possesses the advantages: greater robustness to data noise, (2) automatic sparsity, and (3) adaptive neighborhood for individual datum.
Abstract: The graph construction procedure essentially determines the potentials of those graph-oriented learning algorithms for image analysis. In this paper, we propose a process to build the so-called directed ?1-graph, in which the vertices involve all the samples and the ingoing edge weights to each vertex describe its ?1-norm driven reconstruction from the remaining samples and the noise. Then, a series of new algorithms for various machine learning tasks, e.g., data clustering, subspace learning, and semi-supervised learning, are derived upon the ?1-graphs. Compared with the conventional k -nearest-neighbor graph and ?-ball graph, the ?1-graph possesses the advantages: (1) greater robustness to data noise, (2) automatic sparsity, and (3) adaptive neighborhood for individual datum. Extensive experiments on three real-world datasets show the consistent superiority of ?1-graph over those classic graphs in data clustering, subspace learning, and semi-supervised learning tasks.
TL;DR: A novel examplar-based inpainting algorithm through investigating the sparsity of natural image patches that enables better discrimination of structure and texture, and the patch sparse representation forces the newly inpainted regions to be sharp and consistent with the surrounding textures.
Abstract: This paper introduces a novel examplar-based inpainting algorithm through investigating the sparsity of natural image patches. Two novel concepts of sparsity at the patch level are proposed for modeling the patch priority and patch representation, which are two crucial steps for patch propagation in the examplar-based inpainting approach. First, patch structure sparsity is designed to measure the confidence of a patch located at the image structure (e.g., the edge or corner) by the sparseness of its nonzero similarities to the neighboring patches. The patch with larger structure sparsity will be assigned higher priority for further inpainting. Second, it is assumed that the patch to be filled can be represented by the sparse linear combination of candidate patches under the local patch consistency constraint in a framework of sparse representation. Compared with the traditional examplar-based inpainting approach, structure sparsity enables better discrimination of structure and texture, and the patch sparse representation forces the newly inpainted regions to be sharp and consistent with the surrounding textures. Experiments on synthetic and natural images show the advantages of the proposed approach.
TL;DR: This paper proposes an approach to deconvolving Poissonian images, which is based upon an alternating direction optimization method of multipliers (ADMM), which belongs to the family of augmented Lagrangian algorithms.
Abstract: Much research has been devoted to the problem of restoring Poissonian images, namely for medical and astronomical applications. However, the restoration of these images using state-of-the-art regularizers (such as those based upon multiscale representations or total variation) is still an active research area, since the associated optimization problems are quite challenging. In this paper, we propose an approach to deconvolving Poissonian images, which is based upon an alternating direction optimization method. The standard regularization [or maximum a posteriori (MAP)] restoration criterion, which combines the Poisson log-likelihood with a (nonsmooth) convex regularizer (log-prior), leads to hard optimization problems: the log-likelihood is nonquadratic and nonseparable, the regularizer is nonsmooth, and there is a nonnegativity constraint. Using standard convex analysis tools, we present sufficient conditions for existence and uniqueness of solutions of these optimization problems, for several types of regularizers: total-variation, frame-based analysis, and frame-based synthesis. We attack these problems with an instance of the alternating direction method of multipliers (ADMM), which belongs to the family of augmented Lagrangian algorithms. We study sufficient conditions for convergence and show that these are satisfied, either under total-variation or frame-based (analysis and synthesis) regularization. The resulting algorithms are shown to outperform alternative state-of-the-art methods, both in terms of speed and restoration accuracy.
TL;DR: This work estimates a lower bound on the mean squared error of the denoised result and compares the performance of current state-of-the-art denoising methods with this bound, showing that despite the phenomenal recent progress in the quality of denoizing algorithms, some room for improvement still remains for a wide class of general images, and at certain signal-to-noise levels.
Abstract: Image denoising has been a well studied problem in the field of image processing. Yet researchers continue to focus attention on it to better the current state-of-the-art. Recently proposed methods take different approaches to the problem and yet their denoising performances are comparable. A pertinent question then to ask is whether there is a theoretical limit to denoising performance and, more importantly, are we there yet? As camera manufacturers continue to pack increasing numbers of pixels per unit area, an increase in noise sensitivity manifests itself in the form of a noisier image. We study the performance bounds for the image denoising problem. Our work in this paper estimates a lower bound on the mean squared error of the denoised result and compares the performance of current state-of-the-art denoising methods with this bound. We show that despite the phenomenal recent progress in the quality of denoising algorithms, some room for improvement still remains for a wide class of general images, and at certain signal-to-noise levels. Therefore, image denoising is not dead - yet.
TL;DR: A unified manifold learning framework for semi-supervised and unsupervised dimension reduction by employing a simple but effective linear regression function to map the new data points by modeling the mismatch between h(X) and F.
Abstract: We propose a unified manifold learning framework for semi-supervised and unsupervised dimension reduction by employing a simple but effective linear regression function to map the new data points. For semi-supervised dimension reduction, we aim to find the optimal prediction labels F for all the training samples X, the linear regression function h(X) and the regression residue F0 = F - h(X) simultaneously. Our new objective function integrates two terms related to label fitness and manifold smoothness as well as a flexible penalty term defined on the residue F0. Our Semi-Supervised learning framework, referred to as flexible manifold embedding (FME), can effectively utilize label information from labeled data as well as a manifold structure from both labeled and unlabeled data. By modeling the mismatch between h(X) and F, we show that FME relaxes the hard linear constraint F = h(X) in manifold regularization (MR), making it better cope with the data sampled from a nonlinear manifold. In addition, we propose a simplified version (referred to as FME/U) for unsupervised dimension reduction. We also show that our proposed framework provides a unified view to explain and understand many semi-supervised, supervised and unsupervised dimension reduction techniques. Comprehensive experiments on several benchmark databases demonstrate the significant improvement over existing dimension reduction algorithms.
TL;DR: The discrete shearlet transform (DST) is developed which provides efficient multiscale directional representation and the implementation of the transform is built in the discrete framework based on a multiresolution analysis (MRA).
Abstract: It is now widely acknowledged that analyzing the intrinsic geometrical features of the underlying image is essential in many applications including image processing. In order to achieve this, several directional image representation schemes have been proposed. In this paper, we develop the discrete shearlet transform (DST) which provides efficient multiscale directional representation and show that the implementation of the transform is built in the discrete framework based on a multiresolution analysis (MRA). We assess the performance of the DST in image denoising and approximation applications. In image approximations, our approximation scheme using the DST outperforms the discrete wavelet transform (DWT) while the computational cost of our scheme is comparable to the DWT. Also, in image denoising, the DST compares favorably with other existing transforms in the literature.
TL;DR: Experiments show that, for rich-grained digital image, the capability of nonlinearly enhancing complex texture details in smooth area by fractional differential-based approach appears obvious better than by traditional integral-based algorithms.
Abstract: In this paper, we intend to implement a class of fractional differential masks with high-precision. Thanks to two commonly used definitions of fractional differential for what are known as Grumwald-Letnikov and Riemann-Liouville, we propose six fractional differential masks and present the structures and parameters of each mask respectively on the direction of negative x-coordinate, positive x-coordinate, negative y-coordinate, positive y-coordinate, left downward diagonal, left upward diagonal, right downward diagonal, and right upward diagonal. Moreover, by theoretical and experimental analyzing, we demonstrate the second is the best performance fractional differential mask of the proposed six ones. Finally, we discuss further the capability of multiscale fractional differential masks for texture enhancement. Experiments show that, for rich-grained digital image, the capability of nonlinearly enhancing complex texture details in smooth area by fractional differential-based approach appears obvious better than by traditional integral-based algorithms.
TL;DR: This paper decomposes the road detection process into two steps: the estimation of the vanishing point associated with the main (straight) part of the road, followed by the segmentation of the corresponding road area based upon the detected vanishing point.
Abstract: Given a single image of an arbitrary road, that may not be well-paved, or have clearly delineated edges, or some a priori known color or texture distribution, is it possible for a computer to find this road? This paper addresses this question by decomposing the road detection process into two steps: the estimation of the vanishing point associated with the main (straight) part of the road, followed by the segmentation of the corresponding road area based upon the detected vanishing point. The main technical contributions of the proposed approach are a novel adaptive soft voting scheme based upon a local voting region using high-confidence voters, whose texture orientations are computed using Gabor filters, and a new vanishing-point-constrained edge detection technique for detecting road boundaries. The proposed method has been implemented, and experiments with 1003 general road images demonstrate that it is effective at detecting road regions in challenging conditions.
TL;DR: This paper proposes local Gabor XOR patterns (LGXP), which encodes the Gabor phase by using the local XOR pattern (LXP) operator, and introduces block-based Fisher's linear discriminant (BFLD) to reduce the dimensionality of the proposed descriptor and at the same time enhance its discriminative power.
Abstract: Gabor features have been known to be effective for face recognition. However, only a few approaches utilize phase feature and they usually perform worse than those using magnitude feature. To investigate the potential of Gabor phase and its fusion with magnitude for face recognition, in this paper, we first propose local Gabor XOR patterns (LGXP), which encodes the Gabor phase by using the local XOR pattern (LXP) operator. Then, we introduce block-based Fisher's linear discriminant (BFLD) to reduce the dimensionality of the proposed descriptor and at the same time enhance its discriminative power. Finally, by using BFLD, we fuse local patterns of Gabor magnitude and phase for face recognition. We evaluate our approach on FERET and FRGC 2.0 databases. In particular, we perform comparative experimental studies of different local Gabor patterns. We also make a detailed comparison of their combinations with BFLD, as well as the fusion of different descriptors by using BFLD. Extensive experimental results verify the effectiveness of our LGXP descriptor and also show that our fusion approach outperforms most of the state-of-the-art approaches.
TL;DR: A no-reference metric Q is proposed which is based upon singular value decomposition of local image gradient matrix, and provides a quantitative measure of true image content in the presence of noise and other disturbances, and is used to automatically and effectively set the parameters of two leading image denoising algorithms.
Abstract: Across the field of inverse problems in image and video processing, nearly all algorithms have various parameters which need to be set in order to yield good results. In practice, usually the choice of such parameters is made empirically with trial and error if no “ground-truth” reference is available. Some analytical methods such as cross-validation and Stein's unbiased risk estimate (SURE) have been successfully used to set such parameters. However, these methods tend to be strongly reliant on restrictive assumptions on the noise, and also computationally heavy. In this paper, we propose a no-reference metric Q which is based upon singular value decomposition of local image gradient matrix, and provides a quantitative measure of true image content (i.e., sharpness and contrast as manifested in visually salient geometric features such as edges,) in the presence of noise and other disturbances. This measure 1) is easy to compute, 2) reacts reasonably to both blur and random noise, and 3) works well even when the noise is not Gaussian. The proposed measure is used to automatically and effectively set the parameters of two leading image denoising algorithms. Ample simulated and real data experiments support our claims. Furthermore, tests using the TID2008 database show that this measure correlates well with subjective quality evaluations for both blur and noise distortions.
TL;DR: This paper proposes a new image clustering algorithm, referred to as clustering using local discriminant models and global integration (LDMGI), and shows that LDMGI shares a similar objective function with the spectral clustering (SC) algorithms, e.g., normalized cut (NCut).
Abstract: In this paper, we propose a new image clustering algorithm, referred to as clustering using local discriminant models and global integration (LDMGI). To deal with the data points sampled from a nonlinear manifold, for each data point, we construct a local clique comprising this data point and its neighboring data points. Inspired by the Fisher criterion, we use a local discriminant model for each local clique to evaluate the clustering performance of samples within the local clique. To obtain the clustering result, we further propose a unified objective function to globally integrate the local models of all the local cliques. With the unified objective function, spectral relaxation and spectral rotation are used to obtain the binary cluster indicator matrix for all the samples. We show that LDMGI shares a similar objective function with the spectral clustering (SC) algorithms, e.g., normalized cut (NCut). In contrast to NCut in which the Laplacian matrix is directly calculated based upon a Gaussian function, a new Laplacian matrix is learnt in LDMGI by exploiting both manifold structure and local discriminant information. We also prove that K-means and discriminative K-means (DisKmeans) are both special cases of LDMGI. Extensive experiments on several benchmark image datasets demonstrate the effectiveness of LDMGI. We observe in the experiments that LDMGI is more robust to algorithmic parameter, when compared with NCut. Thus, LDMGI is more appealing for the real image clustering applications in which the ground truth is generally not available for tuning algorithmic parameters.
TL;DR: A set of experiments shows that the proposed method, which is named MIDAL (multiplicative image denoising by augmented Lagrangian), yields state-of-the-art results both in terms of speed and Denoising performance.
Abstract: Multiplicative noise (also known as speckle noise) models are central to the study of coherent imaging systems, such as synthetic aperture radar and sonar, and ultrasound and laser imaging. These models introduce two additional layers of difficulties with respect to the standard Gaussian additive noise scenario: (1) the noise is multiplied by (rather than added to) the original image; (2) the noise is not Gaussian, with Rayleigh and Gamma being commonly used densities. These two features of multiplicative noise models preclude the direct application of most state-of-the-art algorithms, which are designed for solving unconstrained optimization problems where the objective has two terms: a quadratic data term (log-likelihood), reflecting the additive and Gaussian nature of the noise, plus a convex (possibly nonsmooth) regularizer (e.g., a total variation or wavelet-based regularizer/prior). In this paper, we address these difficulties by: (1) converting the multiplicative model into an additive one by taking logarithms, as proposed by some other authors; (2) using variable splitting to obtain an equivalent constrained problem; and (3) dealing with this optimization problem using the augmented Lagrangian framework. A set of experiments shows that the proposed method, which we name MIDAL (multiplicative image denoising by augmented Lagrangian), yields state-of-the-art results both in terms of speed and denoising performance.
TL;DR: A class of inverse problem estimators computed by mixing adaptively a family of linear estimators corresponding to different priors corresponding toDifferent priors are introduced, providing state-of-the-art numerical results.
Abstract: We introduce a class of inverse problem estimators computed by mixing adaptively a family of linear estimators corresponding to different priors. Sparse mixing weights are calculated over blocks of coefficients in a frame providing a sparse signal representation. They minimize an l1 norm taking into account the signal regularity in each block. Adaptive directional image interpolations are computed over a wavelet frame with an O(N log N) algorithm, providing state-of-the-art numerical results.
TL;DR: This paper combines copy-and-paste texture synthesis, geometric partial differential equations (PDEs), and coherence among neighboring pixels in a variational model, and provides a working algorithm for image inpainting trying to approximate the minimum of the proposed energy functional.
Abstract: Inpainting is the art of modifying an image in a form that is not detectable by an ordinary observer. There are numerous and very different approaches to tackle the inpainting problem, though as explained in this paper, the most successful algorithms are based upon one or two of the following three basic techniques: copy-and-paste texture synthesis, geometric partial differential equations (PDEs), and coherence among neighboring pixels. We combine these three building blocks in a variational model, and provide a working algorithm for image inpainting trying to approximate the minimum of the proposed energy functional. Our experiments show that the combination of all three terms of the proposed energy works better than taking each term separately, and the results obtained are within the state-of-the-art.
TL;DR: The main goal of this paper is to develop fast minimization algorithms to solve the nonconvex nonsmooth minimization problem and the effectiveness and efficiency of the proposed algorithms are shown.
Abstract: Nonconvex nonsmooth regularization has advantages over convex regularization for restoring images with neat edges. However, its practical interest used to be limited by the difficulty of the computational stage which requires a nonconvex nonsmooth minimization. In this paper, we deal with nonconvex nonsmooth minimization methods for image restoration and reconstruction. Our theoretical results show that the solution of the nonconvex nonsmooth minimization problem is composed of constant regions surrounded by closed contours and neat edges. The main goal of this paper is to develop fast minimization algorithms to solve the nonconvex nonsmooth minimization problem. Our experimental results show that the effectiveness and efficiency of the proposed algorithms.
TL;DR: The method is able to handle unconstrained blurs, but also allows the use of constraints or of prior information on the blurring filter, as well as theUse of filters defined in a parametric manner and shows to be applicable to a much wider range of blurs.
Abstract: A method for blind image deblurring is presented. The method only makes weak assumptions about the blurring filter and is able to undo a wide variety of blurring degradations. To overcome the ill-posedness of the blind image deblurring problem, the method includes a learning technique which initially focuses on the main edges of the image and gradually takes details into account. A new image prior, which includes a new edge detector, is used. The method is able to handle unconstrained blurs, but also allows the use of constraints or of prior information on the blurring filter, as well as the use of filters defined in a parametric manner. Furthermore, it works in both single-frame and multiframe scenarios. The use of constrained blur models appropriate to the problem at hand, and/or of multiframe scenarios, generally improves the deblurring results. Tests performed on monochrome and color images, with various synthetic and real-life degradations, without and with noise, in single-frame and multiframe scenarios, showed good results, both in subjective terms and in terms of the increase of signal to noise ratio (ISNR) measure. In comparisons with other state of the art methods, our method yields better results, and shows to be applicable to a much wider range of blurs.
TL;DR: A resolution progressive compression scheme which compresses an encrypted image progressively in resolution, such that the decoder can observe a low-resolution version of the image, study local statistics based on it, and use the statistics to decode the next resolution level.
Abstract: Lossless compression of encrypted sources can be achieved through Slepian-Wolf coding. For encrypted real-world sources, such as images, the key to improve the compression efficiency is how the source dependency is exploited. Approaches in the literature that make use of Markov properties in the Slepian-Wolf decoder do not work well for grayscale images. In this correspondence, we propose a resolution progressive compression scheme which compresses an encrypted image progressively in resolution, such that the decoder can observe a low-resolution version of the image, study local statistics based on it, and use the statistics to decode the next resolution level. Good performance is observed both theoretically and experimentally.
TL;DR: This paper converts the linear model, which reduces to a low-pass/high-pass filter pair, into a nonlinear filter pair involving the total variation, which retains both the essential features of Meyer's models and the simplicity and rapidity of thelinear model.
Abstract: Can images be decomposed into the sum of a geometric part and a textural part? In a theoretical breakthrough, [Y. Meyer, Oscillating Patterns in Image Processing and Nonlinear Evolution Equations. Providence, RI: American Mathematical Society, 2001] proposed variational models that force the geometric part into the space of functions with bounded variation, and the textural part into a space of oscillatory distributions. Meyer's models are simple minimization problems extending the famous total variation model. However, their numerical solution has proved challenging. It is the object of a literature rich in variants and numerical attempts. This paper starts with the linear model, which reduces to a low-pass/high-pass filter pair. A simple conversion of the linear filter pair into a nonlinear filter pair involving the total variation is introduced. This new-proposed nonlinear filter pair retains both the essential features of Meyer's models and the simplicity and rapidity of the linear model. It depends upon only one transparent parameter: the texture scale, measured in pixel mesh. Comparative experiments show a better and faster separation of cartoon from texture. One application is illustrated: edge detection.