Showing papers in "IEEE Transactions on Image Processing in 2013"
TL;DR: The so-called nonlocally centralized sparse representation (NCSR) model is as simple as the standard sparse representation model, and the extensive experiments validate the generality and state-of-the-art performance of the proposed NCSR algorithm.
Abstract: Sparse representation models code an image patch as a linear combination of a few atoms chosen out from an over-complete dictionary, and they have shown promising results in various image restoration applications. However, due to the degradation of the observed image (e.g., noisy, blurred, and/or down-sampled), the sparse representations by conventional models may not be accurate enough for a faithful reconstruction of the original image. To improve the performance of sparse representation-based image restoration, in this paper the concept of sparse coding noise is introduced, and the goal of image restoration turns to how to suppress the sparse coding noise. To this end, we exploit the image nonlocal self-similarity to obtain good estimates of the sparse coding coefficients of the original image, and then centralize the sparse coding coefficients of the observed image to those estimates. The so-called nonlocally centralized sparse representation (NCSR) model is as simple as the standard sparse representation model, while our extensive experiments on various types of image restoration problems, including denoising, deblurring and super-resolution, validate the generality and state-of-the-art performance of the proposed NCSR algorithm.
TL;DR: Experimental results demonstrate that the proposed method can obtain state-of-the-art performance for fusion of multispectral, multifocus, multimodal, and multiexposure images.
Abstract: A fast and effective image fusion method is proposed for creating a highly informative fused image through merging multiple images. The proposed method is based on a two-scale decomposition of an image into a base layer containing large scale variations in intensity, and a detail layer capturing small scale details. A novel guided filtering-based weighted average technique is proposed to make full use of spatial consistency for fusion of the base and detail layers. Experimental results demonstrate that the proposed method can obtain state-of-the-art performance for fusion of multispectral, multifocus, multimodal, and multiexposure images.
TL;DR: Experimental results demonstrate that the proposed enhancement algorithm can not only enhance the details but also preserve the naturalness for non-uniform illumination images.
Abstract: Image enhancement plays an important role in image processing and analysis. Among various enhancement algorithms, Retinex-based algorithms can efficiently enhance details and have been widely adopted. Since Retinex-based algorithms regard illumination removal as a default preference and fail to limit the range of reflectance, the naturalness of non-uniform illumination images cannot be effectively preserved. However, naturalness is essential for image enhancement to achieve pleasing perceptual quality. In order to preserve naturalness while enhancing details, we propose an enhancement algorithm for non-uniform illumination images. In general, this paper makes the following three major contributions. First, a lightness-order-error measure is proposed to access naturalness preservation objectively. Second, a bright-pass filter is proposed to decompose an image into reflectance and illumination, which, respectively, determine the details and the naturalness of the image. Third, we propose a bi-log transformation, which is utilized to map the illumination to make a balance between details and naturalness. Experimental results demonstrate that the proposed algorithm can not only enhance the details but also preserve the naturalness for non-uniform illumination images.
TL;DR: An automatic transformation technique that improves the brightness of dimmed images via the gamma correction and probability distribution of luminance pixels and uses temporal information regarding the differences between each frame to reduce computational complexity is presented.
Abstract: This paper proposes an efficient method to modify histograms and enhance contrast in digital images. Enhancement plays a significant role in digital image processing, computer vision, and pattern recognition. We present an automatic transformation technique that improves the brightness of dimmed images via the gamma correction and probability distribution of luminance pixels. To enhance video, the proposed image-enhancement method uses temporal information regarding the differences between each frame to reduce computational complexity. Experimental results demonstrate that the proposed method produces enhanced images of comparable or higher quality than those produced using previous state-of-the-art methods.
TL;DR: Experimental results demonstrate the state-of-the-art denoising performance of BM4D, and its effectiveness when exploited as a regularizer in volumetric data reconstruction.
Abstract: We present an extension of the BM3D filter to volumetric data. The proposed algorithm, BM4D, implements the grouping and collaborative filtering paradigm, where mutually similar d -dimensional patches are stacked together in a (d+1) -dimensional array and jointly filtered in transform domain. While in BM3D the basic data patches are blocks of pixels, in BM4D we utilize cubes of voxels, which are stacked into a 4-D “group.” The 4-D transform applied on the group simultaneously exploits the local correlation present among voxels in each cube and the nonlocal correlation between the corresponding voxels of different cubes. Thus, the spectrum of the group is highly sparse, leading to very effective separation of signal and noise through coefficient shrinkage. After inverse transformation, we obtain estimates of each grouped cube, which are then adaptively aggregated at their original locations. We evaluate the algorithm on denoising of volumetric data corrupted by Gaussian and Rician noise, as well as on reconstruction of volumetric phantom data with non-zero phase from noisy and incomplete Fourier-domain (k-space) measurements. Experimental results demonstrate the state-of-the-art denoising performance of BM4D, and its effectiveness when exploited as a regularizer in volumetric data reconstruction.
TL;DR: This paper takes a low-rank approach toward SSC and provides a conceptually simple interpretation from a bilateral variance estimation perspective, namely that singular-value decomposition of similar packed patches can be viewed as pooling both local and nonlocal information for estimating signal variances.
Abstract: Simultaneous sparse coding (SSC) or nonlocal image representation has shown great potential in various low-level vision tasks, leading to several state-of-the-art image restoration techniques, including BM3D and LSSC. However, it still lacks a physically plausible explanation about why SSC is a better model than conventional sparse coding for the class of natural images. Meanwhile, the problem of sparsity optimization, especially when tangled with dictionary learning, is computationally difficult to solve. In this paper, we take a low-rank approach toward SSC and provide a conceptually simple interpretation from a bilateral variance estimation perspective, namely that singular-value decomposition of similar packed patches can be viewed as pooling both local and nonlocal information for estimating signal variances. Such perspective inspires us to develop a new class of image restoration algorithms called spatially adaptive iterative singular-value thresholding (SAIST). For noise data, SAIST generalizes the celebrated BayesShrink from local to nonlocal models; for incomplete data, SAIST extends previous deterministic annealing-based solution to sparsity optimization through incorporating the idea of dictionary learning. In addition to conceptual simplicity and computational efficiency, SAIST has achieved highly competent (often better) objective performance compared to several state-of-the-art methods in image denoising and completion experiments. Our subjective quality results compare favorably with those obtained by existing techniques, especially at high noise levels and with a large amount of missing data.
TL;DR: Extensive synthetic and real data experiments show that the proposed small target detection method not only works more stably for different target sizes and signal-to-clutter ratio values, but also has better detection performance compared with conventional baseline methods.
Abstract: The robust detection of small targets is one of the key techniques in infrared search and tracking applications. A novel small target detection method in a single infrared image is proposed in this paper. Initially, the traditional infrared image model is generalized to a new infrared patch-image model using local patch construction. Then, because of the non-local self-correlation property of the infrared background image, based on the new model small target detection is formulated as an optimization problem of recovering low-rank and sparse matrices, which is effectively solved using stable principle component pursuit. Finally, a simple adaptive segmentation method is used to segment the target image and the segmentation result can be refined by post-processing. Extensive synthetic and real data experiments show that under different clutter backgrounds the proposed method not only works more stably for different target sizes and signal-to-clutter ratio values, but also has better detection performance compared with conventional baseline methods.
TL;DR: This study allows one to assess the state-of-the-art visual saliency modeling, helps to organizing this rapidly growing field, and sets a unified comparison framework for gauging future efforts, similar to the PASCAL VOC challenge in the object recognition and detection domains.
Abstract: Visual attention is a process that enables biological and machine vision systems to select the most relevant regions from a scene. Relevance is determined by two components: 1) top-down factors driven by task and 2) bottom-up factors that highlight image regions that are different from their surroundings. The latter are often referred to as “visual saliency.” Modeling bottom-up visual saliency has been the subject of numerous research efforts during the past 20 years, with many successful applications in computer vision and robotics. Available models have been tested with different datasets (e.g., synthetic psychological search arrays, natural images or videos) using different evaluation scores (e.g., search slopes, comparison to human eye tracking) and parameter settings. This has made direct comparison of models difficult. Here, we perform an exhaustive comparison of 35 state-of-the-art saliency models over 54 challenging synthetic patterns, three natural image datasets, and two video datasets, using three evaluation scores. We find that although model rankings vary, some models consistently perform better. Analysis of datasets reveals that existing datasets are highly center-biased, which influences some of the evaluation scores. Computational complexity analysis shows that some models are very fast, yet yield competitive eye movement prediction accuracy. Different models often have common easy/difficult stimuli. Furthermore, several concerns in visual saliency modeling, eye movement datasets, and evaluation scores are discussed and insights for future work are provided. Our study allows one to assess the state-of-the-art, helps to organizing this rapidly growing field, and sets a unified comparison framework for gauging future efforts, similar to the PASCAL VOC challenge in the object recognition and detection domains.
TL;DR: This paper is the first to demonstrate the utility and effectiveness of a fusion-based technique for dehazing based on a single degraded image that derives from two original hazy image inputs by applying a white balance and a contrast enhancing procedure.
Abstract: Haze is an atmospheric phenomenon that significantly degrades the visibility of outdoor scenes. This is mainly due to the atmosphere particles that absorb and scatter the light. This paper introduces a novel single image approach that enhances the visibility of such degraded images. Our method is a fusion-based strategy that derives from two original hazy image inputs by applying a white balance and a contrast enhancing procedure. To blend effectively the information of the derived inputs to preserve the regions with good visibility, we filter their important features by computing three measures (weight maps): luminance, chromaticity, and saliency. To minimize artifacts introduced by the weight maps, our approach is designed in a multiscale fashion, using a Laplacian pyramid representation. We are the first to demonstrate the utility and effectiveness of a fusion-based technique for dehazing based on a single degraded image. The method performs in a per-pixel fashion, which is straightforward to implement. The experimental results demonstrate that the method yields results comparative to and even better than the more complex state-of-the-art techniques, having the advantage of being appropriate for real-time applications.
TL;DR: An improved fuzzy C-means (FCM) algorithm for image segmentation is presented by introducing a tradeoff weighted fuzzy factor and a kernel metric and results show that the new algorithm is effective and efficient, and is relatively independent of this type of noise.
Abstract: In this paper, we present an improved fuzzy C-means (FCM) algorithm for image segmentation by introducing a tradeoff weighted fuzzy factor and a kernel metric. The tradeoff weighted fuzzy factor depends on the space distance of all neighboring pixels and their gray-level difference simultaneously. By using this factor, the new algorithm can accurately estimate the damping extent of neighboring pixels. In order to further enhance its robustness to noise and outliers, we introduce a kernel distance measure to its objective function. The new algorithm adaptively determines the kernel parameter by using a fast bandwidth selection rule based on the distance variance of all data points in the collection. Furthermore, the tradeoff weighted fuzzy factor and the kernel distance measure are both parameter free. Experimental results on synthetic and real images show that the new algorithm is effective and efficient, and is relatively independent of this type of noise.
TL;DR: An objective quality assessment algorithm for tone-mapped images is proposed by combining: 1) a multiscale signal fidelity measure on the basis of a modified structural similarity index and 2) a naturalness measure onThe basis of intensity statistics of natural images.
Abstract: Tone-mapping operators (TMOs) that convert high dynamic range (HDR) to low dynamic range (LDR) images provide practically useful tools for the visualization of HDR images on standard LDR displays. Different TMOs create different tone-mapped images, and a natural question is which one has the best quality. Without an appropriate quality measure, different TMOs cannot be compared, and further improvement is directionless. Subjective rating may be a reliable evaluation method, but it is expensive and time consuming, and more importantly, is difficult to be embedded into optimization frameworks. Here we propose an objective quality assessment algorithm for tone-mapped images by combining: 1) a multiscale signal fidelity measure on the basis of a modified structural similarity index and 2) a naturalness measure on the basis of intensity statistics of natural images. Validations using independent subject-rated image databases show good correlations between subjective ranking score and the proposed tone-mapped image quality index (TMQI). Furthermore, we demonstrate the extended applications of TMQI using two examples - parameter tuning for TMOs and adaptive fusion of multiple tone-mapped images.
TL;DR: A novel local feature descriptor, local directional number pattern (LDN), for face analysis, i.e., face and expression recognition, that encodes the directional information of the face's textures in a compact way, producing a more discriminative code than current methods.
Abstract: This paper proposes a novel local feature descriptor, local directional number pattern (LDN), for face analysis, i.e., face and expression recognition. LDN encodes the directional information of the face's textures (i.e., the texture's structure) in a compact way, producing a more discriminative code than current methods. We compute the structure of each micro-pattern with the aid of a compass mask that extracts directional information, and we encode such information using the prominent direction indices (directional numbers) and sign-which allows us to distinguish among similar structural patterns that have different intensity transitions. We divide the face into several regions, and extract the distribution of the LDN features from them. Then, we concatenate these features into a feature vector, and we use it as a face descriptor. We perform several experiments in which our descriptor performs consistently under illumination, noise, expression, and time lapse variations. Moreover, we test our descriptor with different masks to analyze its performance in different face analysis tasks.
TL;DR: A novel contrast enhancement algorithm based on the layered difference representation of 2D histograms is proposed, which enhances images efficiently in terms of both objective quality and subjective quality.
Abstract: A novel contrast enhancement algorithm based on the layered difference representation of 2D histograms is proposed in this paper. We attempt to enhance image contrast by amplifying the gray-level differences between adjacent pixels. To this end, we obtain the 2D histogram h(k, k+l) from an input image, which counts the pairs of adjacent pixels with gray-levels k and k+l, and represent the gray-level differences in a tree-like layered structure. Then, we formulate a constrained optimization problem based on the observation that the gray-level differences, occurring more frequently in the input image, should be more emphasized in the output image. We first solve the optimization problem to derive the transformation function at each layer. We then combine the transformation functions at all layers into the unified transformation function, which is used to map input gray-levels to output gray-levels. Experimental results demonstrate that the proposed algorithm enhances images efficiently in terms of both objective quality and subjective quality.
TL;DR: This paper proposes a novel online object tracking algorithm with sparse prototypes, which exploits both classic principal component analysis (PCA) algorithms with recent sparse representation schemes for learning effective appearance models, and introduces l1 regularization into the PCA reconstruction.
Abstract: Online object tracking is a challenging problem as it entails learning an effective model to account for appearance change caused by intrinsic and extrinsic factors. In this paper, we propose a novel online object tracking algorithm with sparse prototypes, which exploits both classic principal component analysis (PCA) algorithms with recent sparse representation schemes for learning effective appearance models. We introduce l1 regularization into the PCA reconstruction, and develop a novel algorithm to represent an object by sparse prototypes that account explicitly for data and noise. For tracking, objects are represented by the sparse prototypes learned online with update. In order to reduce tracking drift, we present a method that takes occlusion and motion blur into account rather than simply includes image observations for model update. Both qualitative and quantitative evaluations on challenging image sequences demonstrate that the proposed tracking algorithm performs favorably against several state-of-the-art methods.
TL;DR: An approach that simultaneously utilizes both visual and textual information to estimate the relevance of user tagged images is proposed, and the relevance estimation is determined with a hypergraph learning approach.
Abstract: Due to the popularity of social media websites, extensive research efforts have been dedicated to tag-based social image search. Both visual information and tags have been investigated in the research field. However, most existing methods use tags and visual characteristics either separately or sequentially in order to estimate the relevance of images. In this paper, we propose an approach that simultaneously utilizes both visual and textual information to estimate the relevance of user tagged images. The relevance estimation is determined with a hypergraph learning approach. In this method, a social image hypergraph is constructed, where vertices represent images and hyperedges represent visual or textual terms. Learning is achieved with use of a set of pseudo-positive images, where the weights of hyperedges are updated throughout the learning process. In this way, the impact of different tags and visual words can be automatically modulated. Comparative results of the experiments conducted on a dataset including 370+images are presented, which demonstrate the effectiveness of the proposed approach.
TL;DR: This paper proposes to consider every two adjacent prediction-errors jointly to generate a sequence consisting of prediction-error pairs, and based on the sequence and the resulting 2D prediction- error histogram, a more efficient embedding strategy, namely, pairwise PEE, can be designed to achieve an improved performance.
Abstract: In prediction-error expansion (PEE) based reversible data hiding, better exploiting image redundancy usually leads to a superior performance. However, the correlations among prediction-errors are not considered and utilized in current PEE based methods. Specifically, in PEE, the prediction-errors are modified individually in data embedding. In this paper, to better exploit these correlations, instead of utilizing prediction-errors individually, we propose to consider every two adjacent prediction-errors jointly to generate a sequence consisting of prediction-error pairs. Then, based on the sequence and the resulting 2D prediction-error histogram, a more efficient embedding strategy, namely, pairwise PEE, can be designed to achieve an improved performance. The superiority of our method is verified through extensive experiments.
TL;DR: This paper introduces a new cluster-based algorithm for co-saliency detection that is mostly bottom-up without heavy learning, and outperforms most the state-of-the-art saliency detection methods.
Abstract: Co-saliency is used to discover the common saliency on the multiple images, which is a relatively underexplored area. In this paper, we introduce a new cluster-based algorithm for co-saliency detection. Global correspondence between the multiple images is implicitly learned during the clustering process. Three visual attention cues: contrast, spatial, and corresponding, are devised to effectively measure the cluster saliency. The final co-saliency maps are generated by fusing the single image saliency and multiimage saliency. The advantage of our method is mostly bottom-up without heavy learning, and has the property of being simple, general, efficient, and effective. Quantitative and qualitative experiments result in a variety of benchmark datasets demonstrating the advantages of the proposed method over the competing co-saliency methods. Our method on single image also outperforms most the state-of-the-art saliency detection methods. Furthermore, we apply the co-saliency method on four vision applications: co-segmentation, robust image distance, weakly supervised learning, and video foreground detection, which demonstrate the potential usages of the co-saliency map.
TL;DR: A patch-based noise level estimation algorithm that selects low-rank patches without high frequency components from a single noisy image and estimates the noise level based on the gradients of the patches and their statistics is proposed.
Abstract: Noise level is an important parameter to many image processing applications. For example, the performance of an image denoising algorithm can be much degraded due to the poor noise level estimation. Most existing denoising algorithms simply assume the noise level is known that largely prevents them from practical use. Moreover, even with the given true noise level, these denoising algorithms still cannot achieve the best performance, especially for scenes with rich texture. In this paper, we propose a patch-based noise level estimation algorithm and suggest that the noise level parameter should be tuned according to the scene complexity. Our approach includes the process of selecting low-rank patches without high frequency components from a single noisy image. The selection is based on the gradients of the patches and their statistics. Then, the noise level is estimated from the selected patches using principal component analysis. Because the true noise level does not always provide the best performance for nonblind denoising algorithms, we further tune the noise level parameter for nonblind denoising. Experiments demonstrate that both the accuracy and stability are superior to the state of the art noise level estimation algorithm for various scenes and noise levels.
TL;DR: This paper proposes a novel model for bottom-up saliency within the Bayesian framework by exploiting low and mid level cues and proposes an algorithm in which a coarse saliency region is first obtained via a convex hull of interest points.
Abstract: Visual saliency detection is a challenging problem in computer vision, but one of great importance and numerous applications. In this paper, we propose a novel model for bottom-up saliency within the Bayesian framework by exploiting low and mid level cues. In contrast to most existing methods that operate directly on low level cues, we propose an algorithm in which a coarse saliency region is first obtained via a convex hull of interest points. We also analyze the saliency information with mid level visual cues via superpixels. We present a Laplacian sparse subspace clustering method to group superpixels with local features, and analyze the results with respect to the coarse saliency region to compute the prior saliency map. We use the low level visual cues based on the convex hull to compute the observation likelihood, thereby facilitating inference of Bayesian saliency at each pixel. Extensive experiments on a large data set show that our Bayesian saliency model performs favorably against the state-of-the-art algorithms.
TL;DR: This paper describes an extension of the high efficiency video coding (HEVC) standard for coding of multi-view video and depth data, and develops and integrated a novel encoder control that guarantees that high quality intermediate views can be generated based on the decoded data.
Abstract: This paper describes an extension of the high efficiency video coding (HEVC) standard for coding of multi-view video and depth data. In addition to the known concept of disparity-compensated prediction, inter-view motion parameter, and inter-view residual prediction for coding of the dependent video views are developed and integrated. Furthermore, for depth coding, new intra coding modes, a modified motion compensation and motion vector coding as well as the concept of motion parameter inheritance are part of the HEVC extension. A novel encoder control uses view synthesis optimization, which guarantees that high quality intermediate views can be generated based on the decoded data. The bitstream format supports the extraction of partial bitstreams, so that conventional 2D video, stereo video, and the full multi-view video plus depth format can be decoded from a single bitstream. Objective and subjective results are presented, demonstrating that the proposed approach provides 50% bit rate savings in comparison with HEVC simulcast and 20% in comparison with a straightforward multi-view extension of HEVC without the newly developed coding tools.
TL;DR: This paper revisits the HS technique and presents a general framework to construct HS-based RDH, and shows that several RDH algorithms reported in the literature are special cases of this general construction.
Abstract: Histogram shifting (HS) is a useful technique of reversible data hiding (RDH). With HS-based RDH, high capacity and low distortion can be achieved efficiently. In this paper, we revisit the HS technique and present a general framework to construct HS-based RDH. By the proposed framework, one can get a RDH algorithm by simply designing the so-called shifting and embedding functions. Moreover, by taking specific shifting and embedding functions, we show that several RDH algorithms reported in the literature are special cases of this general construction. In addition, two novel and efficient RDH algorithms are also introduced to further demonstrate the universality and applicability of our framework. It is expected that more efficient RDH algorithms can be devised according to the proposed framework by carefully designing the shifting and embedding functions.
TL;DR: This paper incorporates the image nonlocal self-similarity into SRM for image interpolation, and shows that the NARM-induced sampling matrix is less coherent with the representation dictionary, and consequently makes SRM more effective forimage interpolation.
Abstract: Sparse representation is proven to be a promising approach to image super-resolution, where the low-resolution (LR) image is usually modeled as the down-sampled version of its high-resolution (HR) counterpart after blurring. When the blurring kernel is the Dirac delta function, i.e., the LR image is directly down-sampled from its HR counterpart without blurring, the super-resolution problem becomes an image interpolation problem. In such cases, however, the conventional sparse representation models (SRM) become less effective, because the data fidelity term fails to constrain the image local structures. In natural images, fortunately, many nonlocal similar patches to a given patch could provide nonlocal constraint to the local structure. In this paper, we incorporate the image nonlocal self-similarity into SRM for image interpolation. More specifically, a nonlocal autoregressive model (NARM) is proposed and taken as the data fidelity term in SRM. We show that the NARM-induced sampling matrix is less coherent with the representation dictionary, and consequently makes SRM more effective for image interpolation. Our extensive experimental results demonstrate that the proposed NARM-based image interpolation method can effectively reconstruct the edge structures and suppress the jaggy/ringing artifacts, achieving the best image interpolation results so far in terms of PSNR as well as perceptual quality metrics such as SSIM and FSIM.
TL;DR: The exact unbiased inverse of the Anscombe transformation is introduced and it is demonstrated that this exact inverse leads to state-of-the-art results without any notable increase in the computational complexity compared to the other inverses.
Abstract: Many digital imaging devices operate by successive photon-to-electron, electron-to-voltage, and voltage-to-digit conversions. These processes are subject to various signal-dependent errors, which are typically modeled as Poisson-Gaussian noise. The removal of such noise can be effected indirectly by applying a variance-stabilizing transformation (VST) to the noisy data, denoising the stabilized data with a Gaussian denoising algorithm, and finally applying an inverse VST to the denoised data. The generalized Anscombe transformation (GAT) is often used for variance stabilization, but its unbiased inverse transformation has not been rigorously studied in the past. We introduce the exact unbiased inverse of the GAT and show that it plays an integral part in ensuring accurate denoising results. We demonstrate that this exact inverse leads to state-of-the-art results without any notable increase in the computational complexity compared to the other inverses. We also show that this inverse is optimal in the sense that it can be interpreted as a maximum likelihood inverse. Moreover, we thoroughly analyze the behavior of the proposed inverse, which also enables us to derive a closed-form approximation for it. This paper generalizes our work on the exact unbiased inverse of the Anscombe transformation, which we have presented earlier for the removal of pure Poisson noise.
TL;DR: In this paper, the authors focus on the numerical implementation of a sparsity-based classification framework in robust face recognition, where sparse representation is sought to recover human identities from high-dimensional facial images that may be corrupted by illumination, facial disguise, and pose variation.
Abstract: l 1-minimization refers to finding the minimum l1-norm solution to an underdetermined linear system \mbib=A\mbix. Under certain conditions as described in compressive sensing theory, the minimum l1-norm solution is also the sparsest solution. In this paper, we study the speed and scalability of its algorithms. In particular, we focus on the numerical implementation of a sparsity-based classification framework in robust face recognition, where sparse representation is sought to recover human identities from high-dimensional facial images that may be corrupted by illumination, facial disguise, and pose variation. Although the underlying numerical problem is a linear program, traditional algorithms are known to suffer poor scalability for large-scale applications. We investigate a new solution based on a classical convex optimization framework, known as augmented Lagrangian methods. We conduct extensive experiments to validate and compare its performance against several popular l1-minimization solvers, including interior-point method, Homotopy, FISTA, SESOP-PCD, approximate message passing, and TFOCS. To aid peer evaluation, the code for all the algorithms has been made publicly available.
TL;DR: This paper shows that the noise variance can be estimated as the smallest eigenvalue of the image block covariance matrix, which is at least 15 times faster than methods with similar accuracy, and at least two times more accurate than other methods.
Abstract: The problem of blind noise level estimation arises in many image processing applications, such as denoising, compression, and segmentation. In this paper, we propose a new noise level estimation method on the basis of principal component analysis of image blocks. We show that the noise variance can be estimated as the smallest eigenvalue of the image block covariance matrix. Compared with 13 existing methods, the proposed approach shows a good compromise between speed and accuracy. It is at least 15 times faster than methods with similar accuracy, and it is at least two times more accurate than other methods. Our method does not assume the existence of homogeneous areas in the input image and, hence, can successfully process images containing only textures.
TL;DR: A novel document image binarization technique that addresses issues ofSegmentation of text from badly degraded document images by using adaptive image contrast, a combination of the local image contrast and theLocal image gradient that is tolerant to text and background variation caused by different types of document degradations.
Abstract: Segmentation of text from badly degraded document images is a very challenging task due to the high inter/intra-variation between the document background and the foreground text of different document images. In this paper, we propose a novel document image binarization technique that addresses these issues by using adaptive image contrast. The adaptive image contrast is a combination of the local image contrast and the local image gradient that is tolerant to text and background variation caused by different types of document degradations. In the proposed technique, an adaptive contrast map is first constructed for an input degraded document image. The contrast map is then binarized and combined with Canny's edge map to identify the text stroke edge pixels. The document text is further segmented by a local threshold that is estimated based on the intensities of detected text stroke edge pixels within a local window. The proposed method is simple, robust, and involves minimum parameter tuning. It has been tested on three public datasets that are used in the recent document image binarization contest (DIBCO) 2009 & 2011 and handwritten-DIBCO 2010 and achieves accuracies of 93.5%, 87.8%, and 92.03%, respectively, that are significantly higher than or close to that of the best-performing methods reported in the three contests. Experiments on the Bickley diary dataset that consists of several challenging bad quality document images also show the superior performance of our proposed method, compared with other techniques.
TL;DR: A flexible framework that allows for LPC computation in arbitrary fractional scales is proposed and a new sharpness assessment algorithm is developed without referencing the original image to demonstrate competitive performance when compared with state-of-the-art algorithms.
Abstract: Sharpness is an important determinant in visual assessment of image quality. The human visual system is able to effortlessly detect blur and evaluate sharpness of visual images, but the underlying mechanism is not fully understood. Existing blur/sharpness evaluation algorithms are mostly based on edge width, local gradient, or energy reduction of global/local high frequency content. Here we understand the subject from a different perspective, where sharpness is identified as strong local phase coherence (LPC) near distinctive image features evaluated in the complex wavelet transform domain. Previous LPC computation is restricted to be applied to complex coefficients spread in three consecutive dyadic scales in the scale-space. Here we propose a flexible framework that allows for LPC computation in arbitrary fractional scales. We then develop a new sharpness assessment algorithm without referencing the original image. We use four subject-rated publicly available image databases to test the proposed algorithm, which demonstrates competitive performance when compared with state-of-the-art algorithms.
TL;DR: Wang et al. as discussed by the authors proposed a new face coding model, namely regularized robust coding (RRC), which could robustly regress a given signal with regularized regression coefficients by assuming that the coding residual and the coding coefficient are respectively independent and identically distributed.
Abstract: Recently the sparse representation based classification (SRC) has been proposed for robust face recognition (FR). In SRC, the testing image is coded as a sparse linear combination of the training samples, and the representation fidelity is measured by the l2-norm or l1 -norm of the coding residual. Such a sparse coding model assumes that the coding residual follows Gaussian or Laplacian distribution, which may not be effective enough to describe the coding residual in practical FR systems. Meanwhile, the sparsity constraint on the coding coefficients makes the computational cost of SRC very high. In this paper, we propose a new face coding model, namely regularized robust coding (RRC), which could robustly regress a given signal with regularized regression coefficients. By assuming that the coding residual and the coding coefficient are respectively independent and identically distributed, the RRC seeks for a maximum a posterior solution of the coding problem. An iteratively reweighted regularized robust coding (IR3C) algorithm is proposed to solve the RRC model efficiently. Extensive experiments on representative face databases demonstrate that the RRC is much more effective and efficient than state-of-the-art sparse representation based methods in dealing with face occlusion, corruption, lighting, and expression changes, etc.
TL;DR: A no-reference binocular image quality assessment model that operates on static stereoscopic images that significantly outperforms the conventional 2D full-reference QA algorithms applied to stereopairs, as well as the 3D full -reference IQA algorithms on asymmetrically distorted stereopair images.
Abstract: We develop a no-reference binocular image quality assessment model that operates on static stereoscopic images. The model deploys 2D and 3D features extracted from stereopairs to assess the perceptual quality they present when viewed stereoscopically. Both symmetric- and asymmetric-distorted stereopairs are handled by accounting for binocular rivalry using a classic linear rivalry model. The NSS features are used to train a support vector machine model to predict the quality of a tested stereopair. The model is tested on the LIVE 3D Image Quality Database, which includes both symmetric- and asymmetric-distorted stereoscopic 3D images. The experimental results show that our proposed model significantly outperforms the conventional 2D full-reference QA algorithms applied to stereopairs, as well as the 3D full-reference IQA algorithms on asymmetrically distorted stereopairs.
TL;DR: In this paper, multiview Hessian regularization (mHR) is proposed to solve the problem of bias in LR-based image annotation, which optimally combines multiple HR, each of which is obtained from a particular view of instances, and steers the classification function that varies linearly along the data manifold.
Abstract: The rapid development of computer hardware and Internet technology makes large scale data dependent models computationally tractable, and opens a bright avenue for annotating images through innovative machine learning algorithms. Semisupervised learning (SSL) therefore received intensive attention in recent years and was successfully deployed in image annotation. One representative work in SSL is Laplacian regularization (LR), which smoothes the conditional distribution for classification along the manifold encoded in the graph Laplacian, however, it is observed that LR biases the classification function toward a constant function that possibly results in poor generalization. In addition, LR is developed to handle uniformly distributed data (or single-view data), although instances or objects, such as images and videos, are usually represented by multiview features, such as color, shape, and texture. In this paper, we present multiview Hessian regularization (mHR) to address the above two problems in LR-based image annotation. In particular, mHR optimally combines multiple HR, each of which is obtained from a particular view of instances, and steers the classification function that varies linearly along the data manifold. We apply mHR to kernel least squares and support vector machines as two examples for image annotation. Extensive experiments on the PASCAL VOC'07 dataset validate the effectiveness of mHR by comparing it with baseline algorithms, including LR and HR.