Showing papers in "IEEE Transactions on Image Processing in 2012"
TL;DR: Despite its simplicity, it is able to show that BRISQUE is statistically better than the full-reference peak signal-to-noise ratio and the structural similarity index, and is highly competitive with respect to all present-day distortion-generic NR IQA algorithms.
Abstract: We propose a natural scene statistic-based distortion-generic blind/no-reference (NR) image quality assessment (IQA) model that operates in the spatial domain. The new model, dubbed blind/referenceless image spatial quality evaluator (BRISQUE) does not compute distortion-specific features, such as ringing, blur, or blocking, but instead uses scene statistics of locally normalized luminance coefficients to quantify possible losses of “naturalness” in the image due to the presence of distortions, thereby leading to a holistic measure of quality. The underlying features used derive from the empirical distribution of locally normalized luminances and products of locally normalized luminances under a spatial natural scene statistic model. No transformation to another coordinate frame (DCT, wavelet, etc.) is required, distinguishing it from prior NR IQA approaches. Despite its simplicity, we are able to show that BRISQUE is statistically better than the full-reference peak signal-to-noise ratio and the structural similarity index, and is highly competitive with respect to all present-day distortion-generic NR IQA algorithms. BRISQUE has very low computational complexity, making it well suited for real time applications. BRISQUE features may be used for distortion-identification as well. To illustrate a new practical application of BRISQUE, we describe how a nonblind image denoising algorithm can be augmented with BRISQUE in order to perform blind image denoising. Results show that BRISQUE augmentation leads to performance improvements over state-of-the-art methods. A software release of BRISQUE is available online: http://live.ece.utexas.edu/research/quality/BRISQUE_release.zip for public use and evaluation.
TL;DR: An efficient general-purpose blind/no-reference image quality assessment (IQA) algorithm using a natural scene statistics model of discrete cosine transform (DCT) coefficients, which requires minimal training and adopts a simple probabilistic model for score prediction.
Abstract: We develop an efficient general-purpose blind/no-reference image quality assessment (IQA) algorithm using a natural scene statistics (NSS) model of discrete cosine transform (DCT) coefficients. The algorithm is computationally appealing, given the availability of platforms optimized for DCT computation. The approach relies on a simple Bayesian inference model to predict image quality scores given certain extracted features. The features are based on an NSS model of the image DCT coefficients. The estimated parameters of the model are utilized to form features that are indicative of perceptual quality. These features are used in a simple Bayesian inference approach to predict quality scores. The resulting algorithm, which we name BLIINDS-II, requires minimal training and adopts a simple probabilistic model for score prediction. Given the extracted features from a test image, the quality score that maximizes the probability of the empirically determined inference model is chosen as the predicted quality score of that image. When tested on the LIVE IQA database, BLIINDS-II is shown to correlate highly with human judgments of quality, at a level that is competitive with the popular SSIM index.
TL;DR: This paper demonstrates that the coupled dictionary learning method can outperform the existing joint dictionary training method both quantitatively and qualitatively and speed up the algorithm approximately 10 times by learning a neural network model for fast sparse inference and selectively processing only those visually salient regions.
Abstract: In this paper, we propose a novel coupled dictionary training method for single-image super-resolution (SR) based on patchwise sparse recovery, where the learned couple dictionaries relate the low- and high-resolution (HR) image patch spaces via sparse representation. The learning process enforces that the sparse representation of a low-resolution (LR) image patch in terms of the LR dictionary can well reconstruct its underlying HR image patch with the dictionary in the high-resolution image patch space. We model the learning problem as a bilevel optimization problem, where the optimization includes an l1-norm minimization problem in its constraints. Implicit differentiation is employed to calculate the desired gradient for stochastic gradient descent. We demonstrate that our coupled dictionary learning method can outperform the existing joint dictionary training method both quantitatively and qualitatively. Furthermore, for real applications, we speed up the algorithm approximately 10 times by learning a neural network model for fast sparse inference and selectively processing only those visually salient regions. Extensive experimental comparisons with state-of-the-art SR algorithms validate the effectiveness of our proposed approach.
TL;DR: A novel systematic approach to enhance underwater images by a dehazing algorithm, to compensate the attenuation discrepancy along the propagation path, and to take the influence of the possible presence of an artifical light source into consideration is proposed.
Abstract: Light scattering and color change are two major sources of distortion for underwater photography. Light scattering is caused by light incident on objects reflected and deflected multiple times by particles present in the water before reaching the camera. This in turn lowers the visibility and contrast of the image captured. Color change corresponds to the varying degrees of attenuation encountered by light traveling in the water with different wavelengths, rendering ambient underwater environments dominated by a bluish tone. No existing underwater processing techniques can handle light scattering and color change distortions suffered by underwater images, and the possible presence of artificial lighting simultaneously. This paper proposes a novel systematic approach to enhance underwater images by a dehazing algorithm, to compensate the attenuation discrepancy along the propagation path, and to take the influence of the possible presence of an artifical light source into consideration. Once the depth map, i.e., distances between the objects and the camera, is estimated, the foreground and background within a scene are segmented. The light intensities of foreground and background are compared to determine whether an artificial light source is employed during the image capturing process. After compensating the effect of artifical light, the haze phenomenon and discrepancy in wavelength attenuation along the underwater propagation path to camera are corrected. Next, the water depth in the image scene is estimated according to the residual energy ratios of different color channels existing in the background light. Based on the amount of attenuation corresponding to each light wavelength, color change compensation is conducted to restore color balance. The performance of the proposed algorithm for wavelength compensation and image dehazing (WCID) is evaluated both objectively and subjectively by utilizing ground-truth color patches and video downloaded from the Youtube website. Both results demonstrate that images with significantly enhanced visibility and superior color fidelity are obtained by the WCID proposed.
TL;DR: The proposed IQA scheme is designed to follow the masking effect and visibility threshold more closely, i.e., the case when both masked and masking signals are small is more effectively tackled by the proposed scheme.
Abstract: In this paper, we propose a new image quality assessment (IQA) scheme, with emphasis on gradient similarity. Gradients convey important visual information and are crucial to scene understanding. Using such information, structural and contrast changes can be effectively captured. Therefore, we use the gradient similarity to measure the change in contrast and structure in images. Apart from the structural/contrast changes, image quality is also affected by luminance changes, which must be also accounted for complete and more robust IQA. Hence, the proposed scheme considers both luminance and contrast-structural changes to effectively assess image quality. Furthermore, the proposed scheme is designed to follow the masking effect and visibility threshold more closely, i.e., the case when both masked and masking signals are small is more effectively tackled by the proposed scheme. Finally, the effects of the changes in luminance and contrast-structure are integrated via an adaptive method to obtain the overall image quality score. Extensive experiments conducted with six publicly available subject-rated databases (comprising of diverse images and distortion types) have confirmed the effectiveness, robustness, and efficiency of the proposed scheme in comparison with the relevant state-of-the-art schemes.
TL;DR: Wang et al. as discussed by the authors proposed a single image-based rain removal framework via properly formulating rain removal as an image decomposition problem based on morphological component analysis, which first decomposes an image into the low and high-frequency (HF) parts using a bilateral filter.
Abstract: Rain removal from a video is a challenging problem and has been recently investigated extensively. Nevertheless, the problem of rain removal from a single image was rarely studied in the literature, where no temporal information among successive images can be exploited, making the problem very challenging. In this paper, we propose a single-image-based rain removal framework via properly formulating rain removal as an image decomposition problem based on morphological component analysis. Instead of directly applying a conventional image decomposition technique, the proposed method first decomposes an image into the low- and high-frequency (HF) parts using a bilateral filter. The HF part is then decomposed into a “rain component” and a “nonrain component” by performing dictionary learning and sparse coding. As a result, the rain component can be successfully removed from the image while preserving most original image details. Experimental results demonstrate the efficacy of the proposed algorithm.
TL;DR: A novel image indexing and retrieval algorithm using local tetra patterns (LTrPs) for content-based image retrieval (CBIR) that encodes the relationship between the referenced pixel and its neighbors, based on the directions that are calculated using the first-order derivatives in vertical and horizontal directions.
Abstract: In this paper, we propose a novel image indexing and retrieval algorithm using local tetra patterns (LTrPs) for content-based image retrieval (CBIR). The standard local binary pattern (LBP) and local ternary pattern (LTP) encode the relationship between the referenced pixel and its surrounding neighbors by computing gray-level difference. The proposed method encodes the relationship between the referenced pixel and its neighbors, based on the directions that are calculated using the first-order derivatives in vertical and horizontal directions. In addition, we propose a generic strategy to compute nth-order LTrP using (n - 1)th-order horizontal and vertical derivatives for efficient CBIR and analyze the effectiveness of our proposed algorithm by combining it with the Gabor transform. The performance of the proposed method is compared with the LBP, the local derivative patterns, and the LTP based on the results obtained using benchmark image databases viz., Corel 1000 database (DB1), Brodatz texture database (DB2), and MIT VisTex database (DB3). Performance analysis shows that the proposed method improves the retrieval result from 70.34%/44.9% to 75.9%/48.7% in terms of average precision/average recall on database DB1, and from 79.97% to 85.30% and 82.23% to 90.02% in terms of average retrieval rate on databases DB2 and DB3, respectively, as compared with the standard LBP.
TL;DR: A hypergraph analysis approach to address the problem of view-based 3-D object retrieval and recognition by avoiding the estimation of the distance between objects by constructing multiple hypergraphs based on their 2-D views.
Abstract: View-based 3-D object retrieval and recognition has become popular in practice, e.g., in computer aided design. It is difficult to precisely estimate the distance between two objects represented by multiple views. Thus, current view-based 3-D object retrieval and recognition methods may not perform well. In this paper, we propose a hypergraph analysis approach to address this problem by avoiding the estimation of the distance between objects. In particular, we construct multiple hypergraphs for a set of 3-D objects based on their 2-D views. In these hypergraphs, each vertex is an object, and each edge is a cluster of views. Therefore, an edge connects multiple vertices. We define the weight of each edge based on the similarities between any two views within the cluster. Retrieval and recognition are performed based on the hypergraphs. Therefore, our method can explore the higher order relationship among objects and does not use the distance between objects. We conduct experiments on the National Taiwan University 3-D model dataset and the ETH 3-D object collection. Experimental results demonstrate the effectiveness of the proposed method by comparing with the state-of-the-art methods.
TL;DR: Simulation experiments show that the decoupled algorithm derived from the GNE formulation demonstrates the best numerical and visual results and shows superiority with respect to the state of the art in the field, confirming a valuable potential of BM3D-frames as an advanced image modeling tool.
Abstract: A family of the block matching 3-D (BM3D) algorithms for various imaging problems has been recently proposed within the framework of nonlocal patchwise image modeling , . In this paper, we construct analysis and synthesis frames, formalizing BM3D image modeling, and use these frames to develop novel iterative deblurring algorithms. We consider two different formulations of the deblurring problem, i.e., one given by the minimization of the single-objective function and another based on the generalized Nash equilibrium (GNE) balance of two objective functions. The latter results in the algorithm where deblurring and denoising operations are decoupled. The convergence of the developed algorithms is proved. Simulation experiments show that the decoupled algorithm derived from the GNE formulation demonstrates the best numerical and visual results and shows superiority with respect to the state of the art in the field, confirming a valuable potential of BM3D-frames as an advanced image modeling tool.
TL;DR: A new approach to improve the performance of finger-vein identification systems presented in the literature is presented and two new score-level combinations are developed and investigated, i.e., holistic and nonlinear fusion.
Abstract: This paper presents a new approach to improve the performance of finger-vein identification systems presented in the literature. The proposed system simultaneously acquires the finger-vein and low-resolution fingerprint images and combines these two evidences using a novel score-level combination strategy. We examine the previously proposed finger-vein identification approaches and develop a new approach that illustrates it superiority over prior published efforts. The utility of low-resolution fingerprint images acquired from a webcam is examined to ascertain the matching performance from such images. We develop and investigate two new score-level combinations, i.e., holistic and nonlinear fusion, and comparatively evaluate them with more popular score-level fusion approaches to ascertain their effectiveness in the proposed system. The rigorous experimental results presented on the database of 6264 images from 156 subjects illustrate significant improvement in the performance, i.e., both from the authentication and recognition experiments.
TL;DR: The formulation of Kronecker product matrices enables the derivation of analytical bounds for the sparse approximation of multidimensional signals and CS recovery performance, as well as a means of evaluating novel distributed measurement schemes.
Abstract: Compressive sensing (CS) is an emerging approach for the acquisition of signals having a sparse or compressible representation in some basis. While the CS literature has mostly focused on problems involving 1-D signals and 2-D images, many important applications involve multidimensional signals; the construction of sparsifying bases and measurement systems for such signals is complicated by their higher dimensionality. In this paper, we propose the use of Kronecker product matrices in CS for two purposes. First, such matrices can act as sparsifying bases that jointly model the structure present in all of the signal dimensions. Second, such matrices can represent the measurement protocols used in distributed settings. Our formulation enables the derivation of analytical bounds for the sparse approximation of multidimensional signals and CS recovery performance, as well as a means of evaluating novel distributed measurement schemes.
TL;DR: Thorough experimental results suggest that the proposed SR method can reconstruct higher quality results both quantitatively and perceptually and propose a maximum a posteriori probability framework for SR recovery.
Abstract: Image super-resolution (SR) reconstruction is essentially an ill-posed problem, so it is important to design an effective prior. For this purpose, we propose a novel image SR method by learning both non-local and local regularization priors from a given low-resolution image. The non-local prior takes advantage of the redundancy of similar patches in natural images, while the local prior assumes that a target pixel can be estimated by a weighted average of its neighbors. Based on the above considerations, we utilize the non-local means filter to learn a non-local prior and the steering kernel regression to learn a local prior. By assembling the two complementary regularization terms, we propose a maximum a posteriori probability framework for SR recovery. Thorough experimental results suggest that the proposed SR method can reconstruct higher quality results both quantitatively and perceptually.
TL;DR: An unsupervised distribution-free change detection approach for synthetic aperture radar (SAR) images based on an image fusion strategy and a novel fuzzy clustering algorithm that exhibited lower error than its preexistences.
Abstract: This paper presents an unsupervised distribution-free change detection approach for synthetic aperture radar (SAR) images based on an image fusion strategy and a novel fuzzy clustering algorithm. The image fusion technique is introduced to generate a difference image by using complementary information from a mean-ratio image and a log-ratio image. In order to restrain the background information and enhance the information of changed regions in the fused difference image, wavelet fusion rules based on an average operator and minimum local area energy are chosen to fuse the wavelet coefficients for a low-frequency band and a high-frequency band, respectively. A reformulated fuzzy local-information C-means clustering algorithm is proposed for classifying changed and unchanged regions in the fused difference image. It incorporates the information about spatial context in a novel fuzzy way for the purpose of enhancing the changed information and of reducing the effect of speckle noise. Experiments on real SAR images show that the image fusion strategy integrates the advantages of the log-ratio operator and the mean-ratio operator and gains a better performance. The change detection results obtained by the improved fuzzy clustering algorithm exhibited lower error than its preexistences.
TL;DR: A dual mathematical interpretation of the proposed framework with a structured sparse estimation is described, which shows that the resulting piecewise linear estimate stabilizes the estimation when compared with traditional sparse inverse problem techniques.
Abstract: A general framework for solving image inverse problems with piecewise linear estimations is introduced in this paper. The approach is based on Gaussian mixture models, which are estimated via a maximum a posteriori expectation-maximization algorithm. A dual mathematical interpretation of the proposed framework with a structured sparse estimation is described, which shows that the resulting piecewise linear estimate stabilizes the estimation when compared with traditional sparse inverse problem techniques. We demonstrate that, in a number of image inverse problems, including interpolation, zooming, and deblurring of narrow kernels, the same simple and computationally efficient algorithm yields results in the same ballpark as that of the state of the art.
TL;DR: Experimental results show that the proposed method outperforms the existing methods, in terms of image quality and recognition accuracy, as well as face super-resolution methods.
Abstract: This paper addresses the very low resolution (VLR) problem in face recognition in which the resolution of the face image to be recognized is lower than 16 × 16. With the increasing demand of surveillance camera-based applications, the VLR problem happens in many face application systems. Existing face recognition algorithms are not able to give satisfactory performance on the VLR face image. While face super-resolution (SR) methods can be employed to enhance the resolution of the images, the existing learning-based face SR methods do not perform well on such a VLR face image. To overcome this problem, this paper proposes a novel approach to learn the relationship between the high-resolution image space and the VLR image space for face SR. Based on this new approach, two constraints, namely, new data and discriminative constraints, are designed for good visuality and face recognition applications under the VLR problem, respectively. Experimental results show that the proposed SR algorithm based on relationship learning outperforms the existing algorithms in public face databases.
TL;DR: Nonparametric Bayesian methods are considered for recovery of imagery based upon compressive, incomplete, and/or noisy measurements and significant improvements in image recovery are manifested using learned dictionaries, relative to using standard orthonormal image expansions.
Abstract: Nonparametric Bayesian methods are considered for recovery of imagery based upon compressive, incomplete, and/or noisy measurements. A truncated beta-Bernoulli process is employed to infer an appropriate dictionary for the data under test and also for image recovery. In the context of compressive sensing, significant improvements in image recovery are manifested using learned dictionaries, relative to using standard orthonormal image expansions. The compressive-measurement projections are also optimized for the learned dictionary. Additionally, we consider simpler (incomplete) measurements, defined by measuring a subset of image pixels, uniformly selected at random. Spatial interrelationships within imagery are exploited through use of the Dirichlet and probit stick-breaking processes. Several example results are presented, with comparisons to other methods in the literature.
TL;DR: This paper proposes an adaptive hypergraph learning method for transductive image classification that simultaneously learns the labels of unlabeled images and the weights of hyperedges and can automatically modulate the effects of different hyperedge effects.
Abstract: Recent years have witnessed a surge of interest in graph-based transductive image classification. Existing simple graph-based transductive learning methods only model the pairwise relationship of images, however, and they are sensitive to the radius parameter used in similarity calculation. Hypergraph learning has been investigated to solve both difficulties. It models the high-order relationship of samples by using a hyperedge to link multiple samples. Nevertheless, the existing hypergraph learning methods face two problems, i.e., how to generate hyperedges and how to handle a large set of hyperedges. This paper proposes an adaptive hypergraph learning method for transductive image classification. In our method, we generate hyperedges by linking images and their nearest neighbors. By varying the size of the neighborhood, we are able to generate a set of hyperedges for each image and its visual neighbors. Our method simultaneously learns the labels of unlabeled images and the weights of hyperedges. In this way, we can automatically modulate the effects of different hyperedges. Thorough empirical studies show the effectiveness of our approach when compared with representative baselines.
TL;DR: An approach to the problem of estimating the size of inhomogeneous crowds, which are composed of pedestrians that travel in different directions, without using explicit object segmentation or tracking is proposed, using the mixture of dynamic-texture motion model and Bayesian regression.
Abstract: An approach to the problem of estimating the size of inhomogeneous crowds, which are composed of pedestrians that travel in different directions, without using explicit object segmentation or tracking is proposed. Instead, the crowd is segmented into components of homogeneous motion, using the mixture of dynamic-texture motion model. A set of holistic low-level features is extracted from each segmented region, and a function that maps features into estimates of the number of people per segment is learned with Bayesian regression. Two Bayesian regression models are examined. The first is a combination of Gaussian process regression with a compound kernel, which accounts for both the global and local trends of the count mapping but is limited by the real-valued outputs that do not match the discrete counts. We address this limitation with a second model, which is based on a Bayesian treatment of Poisson regression that introduces a prior distribution on the linear weights of the model. Since exact inference is analytically intractable, a closed-form approximation is derived that is computationally efficient and kernelizable, enabling the representation of nonlinear functions. An approximate marginal likelihood is also derived for kernel hyperparameter learning. The two regression-based crowd counting methods are evaluated on a large pedestrian data set, containing very distinct camera views, pedestrian traffic, and outliers, such as bikes or skateboarders. Experimental results show that regression-based counts are accurate regardless of the crowd size, outperforming the count estimates produced by state-of-the-art pedestrian detectors. Results on 2 h of video demonstrate the efficiency and robustness of the regression-based crowd size estimation over long periods of time.
TL;DR: S3 can generate sharpness maps, which are highly correlated with the human-subject maps, and is demonstrated to have utility for within-image and across-image sharpness prediction, no-reference image quality assessment of blurred images, and monotonic estimation of the standard deviation of the impulse response in Gaussian blurring.
Abstract: This paper presents an algorithm designed to measure the local perceived sharpness in an image. Our method utilizes both spectral and spatial properties of the image: For each block, we measure the slope of the magnitude spectrum and the total spatial variation. These measures are then adjusted to account for visual perception, and then, the adjusted measures are combined via a weighted geometric mean. The resulting measure, i.e., S3 (spectral and spatial sharpness), yields a perceived sharpness map in which greater values denote perceptually sharper regions. This map can be collapsed into a single index, which quantifies the overall perceived sharpness of the whole image. We demonstrate the utility of the S3 measure for within-image and across-image sharpness prediction, no-reference image quality assessment of blurred images, and monotonic estimation of the standard deviation of the impulse response used in Gaussian blurring. We further evaluate the accuracy of S3 in local sharpness estimation by comparing S3 maps to sharpness maps generated by human subjects. We show that S3 can generate sharpness maps, which are highly correlated with the human-subject maps.
TL;DR: The proposed video features can effectively deal with rotation variations of dynamic textures (DTs) and are robust with respect to changes in viewpoint, outperforming recent methods proposed for view-invariant recognition of DTs.
Abstract: In this paper, we propose a novel approach to compute rotation-invariant features from histograms of local noninvariant patterns. We apply this approach to both static and dynamic local binary pattern (LBP) descriptors. For static-texture description, we present LBP histogram Fourier (LBP-HF) features, and for dynamic-texture recognition, we present two rotation-invariant descriptors computed from the LBPs from three orthogonal planes (LBP-TOP) features in the spatiotemporal domain. LBP-HF is a novel rotation-invariant image descriptor computed from discrete Fourier transforms of LBP histograms. The approach can be also generalized to embed any uniform features into this framework, and combining the supplementary information, e.g., sign and magnitude components of the LBP, together can improve the description ability. Moreover, two variants of rotation-invariant descriptors are proposed to the LBP-TOP, which is an effective descriptor for dynamic-texture recognition, as shown by its recent success in different application problems, but it is not rotation invariant. In the experiments, it is shown that the LBP-HF and its extensions outperform noninvariant and earlier versions of the rotation-invariant LBP in the rotation-invariant texture classification. In experiments on two dynamic-texture databases with rotations or view variations, the proposed video features can effectively deal with rotation variations of dynamic textures (DTs). They also are robust with respect to changes in viewpoint, outperforming recent methods proposed for view-invariant recognition of DTs.
TL;DR: A series of normalized and generalized metrics based on the important ingredients of SSIM are constructed and it is shown that such modified measures are valid distance metrics and have many useful properties, among which the most significant ones include quasi-convexity, a region of convexity around the minimizer, and distance preservation under orthogonal or unitary transformations.
Abstract: Since its introduction in 2004, the structural similarity (SSIM) index has gained widespread popularity as a tool to assess the quality of images and to evaluate the performance of image processing algorithms and systems. There has been also a growing interest of using SSIM as an objective function in optimization problems in a variety of image processing applications. One major issue that could strongly impede the progress of such efforts is the lack of understanding of the mathematical properties of the SSIM measure. For example, some highly desirable properties such as convexity and triangular inequality that are possessed by the mean squared error may not hold. In this paper, we first construct a series of normalized and generalized (vector-valued) metrics based on the important ingredients of SSIM. We then show that such modified measures are valid distance metrics and have many useful properties, among which the most significant ones include quasi-convexity, a region of convexity around the minimizer, and distance preservation under orthogonal or unitary transformations. The groundwork laid here extends the potentials of SSIM in both theoretical development and practical applications.
TL;DR: Experimental results prove the effectiveness of the proposed video filtering algorithm in terms of both subjective and objective visual quality, and show that it outperforms the state of the art in video denoising.
Abstract: We propose a powerful video filtering algorithm that exploits temporal and spatial redundancy characterizing natural video sequences. The algorithm implements the paradigm of nonlocal grouping and collaborative filtering, where a higher dimensional transform-domain representation of the observations is leveraged to enforce sparsity, and thus regularize the data: 3-D spatiotemporal volumes are constructed by tracking blocks along trajectories defined by the motion vectors. Mutually similar volumes are then grouped together by stacking them along an additional fourth dimension, thus producing a 4-D structure, termed group, where different types of data correlation exist along the different dimensions: local correlation along the two dimensions of the blocks, temporal correlation along the motion trajectories, and nonlocal spatial correlation (i.e., self-similarity) along the fourth dimension of the group. Collaborative filtering is then realized by transforming each group through a decorrelating 4-D separable transform and then by shrinkage and inverse transformation. In this way, the collaborative filtering provides estimates for each volume stacked in the group, which are then returned and adaptively aggregated to their original positions in the video. The proposed filtering procedure addresses several video processing applications, such as denoising, deblocking, and enhancement of both grayscale and color data. Experimental results prove the effectiveness of our method in terms of both subjective and objective visual quality, and show that it outperforms the state of the art in video denoising.
TL;DR: Experimental results demonstrate that the proposed reranking approach is more robust than using each individual modality, and it also performs better than many existing methods.
Abstract: This paper introduces a web image search reranking approach that explores multiple modalities in a graph-based learning scheme. Different from the conventional methods that usually adopt a single modality or integrate multiple modalities into a long feature vector, our approach can effectively integrate the learning of relevance scores, weights of modalities, and the distance metric and its scaling for each modality into a unified scheme. In this way, the effects of different modalities can be adaptively modulated and better reranking performance can be achieved. We conduct experiments on a large dataset that contains more than 1000 queries and 1 million images to evaluate our approach. Experimental results demonstrate that the proposed reranking approach is more robust than using each individual modality, and it also performs better than many existing methods.
TL;DR: A hybrid scheme is proposed to combine head pose and eye location information to obtain enhanced gaze estimation, which improves the accuracy of eye estimations and considerably extends its operating range by more than 15° by overcoming the problems introduced by extreme head poses.
Abstract: Head pose and eye location for gaze estimation have been separately studied in numerous works in the literature. Previous research shows that satisfactory accuracy in head pose and eye location estimation can be achieved in constrained settings. However, in the presence of nonfrontal faces, eye locators are not adequate to accurately locate the center of the eyes. On the other hand, head pose estimation techniques are able to deal with these conditions; hence, they may be suited to enhance the accuracy of eye localization. Therefore, in this paper, a hybrid scheme is proposed to combine head pose and eye location information to obtain enhanced gaze estimation. To this end, the transformation matrix obtained from the head pose is used to normalize the eye regions, and in turn, the transformation matrix generated by the found eye location is used to correct the pose estimation procedure. The scheme is designed to enhance the accuracy of eye location estimations, particularly in low-resolution videos, to extend the operative range of the eye locators, and to improve the accuracy of the head pose tracker. These enhanced estimations are then combined to obtain a novel visual gaze estimation system, which uses both eye location and head information to refine the gaze estimates. From the experimental results, it can be derived that the proposed unified scheme improves the accuracy of eye estimations by 16% to 23%. Furthermore, it considerably extends its operating range by more than 15° by overcoming the problems introduced by extreme head poses. Moreover, the accuracy of the head pose tracker is improved by 12% to 24%. Finally, the experimentation on the proposed combined gaze estimation system shows that it is accurate (with a mean error between 2° and 5°) and that it can be used in cases where classic approaches would fail without imposing restraints on the position of the head.
TL;DR: The problem of automatic “reduced-reference” image quality assessment (QA) algorithms from the point of view of image information change is studied and algorithms that require just a single number from the reference for QA are shown to correlate very well with subjective quality scores.
Abstract: We study the problem of automatic “reduced-reference” image quality assessment (QA) algorithms from the point of view of image information change. Such changes are measured between the reference- and natural-image approximations of the distorted image. Algorithms that measure differences between the entropies of wavelet coefficients of reference and distorted images, as perceived by humans, are designed. The algorithms differ in the data on which the entropy difference is calculated and on the amount of information from the reference that is required for quality computation, ranging from almost full information to almost no information from the reference. A special case of these is algorithms that require just a single number from the reference for QA. The algorithms are shown to correlate very well with subjective quality scores, as demonstrated on the Laboratory for Image and Video Engineering Image Quality Assessment Database and the Tampere Image Database. Performance degradation, as the amount of information is reduced, is also studied.
TL;DR: Two applications of the proposed multitask joint sparse representation model to combine the strength of multiple features and/or instances for recognition are investigated: fusing multiple kernel features for object categorization and robust face recognition in video with an ensemble of query images.
Abstract: We address the problem of visual classification with multiple features and/or multiple instances. Motivated by the recent success of multitask joint covariate selection, we formulate this problem as a multitask joint sparse representation model to combine the strength of multiple features and/or instances for recognition. A joint sparsity-inducing norm is utilized to enforce class-level joint sparsity patterns among the multiple representation vectors. The proposed model can be efficiently optimized by a proximal gradient method. Furthermore, we extend our method to the setup where features are described in kernel matrices. We then investigate into two applications of our method to visual classification: 1) fusing multiple kernel features for object categorization and 2) robust face recognition in video with an ensemble of query images. Extensive experiments on challenging real-world data sets demonstrate that the proposed method is competitive to the state-of-the-art methods in respective applications.
TL;DR: This paper proposes a patch-based Wiener filter that exploits patch redundancy for image denoising that is on par or exceeding the current state of the art, both visually and quantitatively.
Abstract: In this paper, we propose a denoising method motivated by our previous analysis of the performance bounds for image denoising. Insights from that study are used here to derive a high-performance practical denoising algorithm. We propose a patch-based Wiener filter that exploits patch redundancy for image denoising. Our framework uses both geometrically and photometrically similar patches to estimate the different filter parameters. We describe how these parameters can be accurately estimated directly from the input noisy image. Our denoising approach, designed for near-optimal performance (in the mean-squared error sense), has a sound statistical foundation that is analyzed in detail. The performance of our approach is experimentally verified on a variety of images and noise levels. The results presented here demonstrate that our proposed method is on par or exceeding the current state of the art, both visually and quantitatively.
TL;DR: The proposed image retargeting algorithm effectively preserves the visually important regions for images, efficiently removes the less crucial regions, and therefore significantly outperforms the relevant state-of-the-art algorithms, as demonstrated with the in-depth analysis in the extensive experiments.
Abstract: Saliency detection plays important roles in many image processing applications, such as regions of interest extraction and image resizing. Existing saliency detection models are built in the uncompressed domain. Since most images over Internet are typically stored in the compressed domain such as joint photographic experts group (JPEG), we propose a novel saliency detection model in the compressed domain in this paper. The intensity, color, and texture features of the image are extracted from discrete cosine transform (DCT) coefficients in the JPEG bit-stream. Saliency value of each DCT block is obtained based on the Hausdorff distance calculation and feature map fusion. Based on the proposed saliency detection model, we further design an adaptive image retargeting algorithm in the compressed domain. The proposed image retargeting algorithm utilizes multioperator operation comprised of the block-based seam carving and the image scaling to resize images. A new definition of texture homogeneity is given to determine the amount of removal block-based seams. Thanks to the directly derived accurate saliency information from the compressed domain, the proposed image retargeting algorithm effectively preserves the visually important regions for images, efficiently removes the less crucial regions, and therefore significantly outperforms the relevant state-of-the-art algorithms, as demonstrated with the in-depth analysis in the extensive experiments.
TL;DR: A sparse neighbor selection scheme for SR reconstruction is proposed that can achieve competitive SR quality compared with other state-of-the-art baselines and develop an extended Robust-SL0 algorithm to simultaneously find the neighbors and to solve the reconstruction weights.
Abstract: Until now, neighbor-embedding-based (NE) algorithms for super-resolution (SR) have carried out two independent processes to synthesize high-resolution (HR) image patches. In the first process, neighbor search is performed using the Euclidean distance metric, and in the second process, the optimal weights are determined by solving a constrained least squares problem. However, the separate processes are not optimal. In this paper, we propose a sparse neighbor selection scheme for SR reconstruction. We first predetermine a larger number of neighbors as potential candidates and develop an extended Robust-SL0 algorithm to simultaneously find the neighbors and to solve the reconstruction weights. Recognizing that the k-nearest neighbor (k-NN) for reconstruction should have similar local geometric structures based on clustering, we employ a local statistical feature, namely histograms of oriented gradients (HoG) of low-resolution (LR) image patches, to perform such clustering. By conveying local structural information of HoG in the synthesis stage, the k-NN of each LR input patch is adaptively chosen from their associated subset, which significantly improves the speed of synthesizing the HR image while preserving the quality of reconstruction. Experimental results suggest that the proposed method can achieve competitive SR quality compared with other state-of-the-art baselines.
TL;DR: It is concluded that better dehazed performance with fewer artifacts and better coding efficiency is achieved when the dehazing is applied before compression.
Abstract: This paper makes an investigation of the dehazing effects on image and video coding for surveillance systems. The goal is to achieve good dehazed images and videos at the receiver while sustaining low bitrates (using compression) in the transmission pipeline. At first, this paper proposes a novel method for single-image dehazing, which is used for the investigation. It operates at a faster speed than current methods and can avoid halo effects by using the median operation. We then consider the dehazing effects in compression by investigating the coding artifacts and motion estimation in cases of applying any dehazing method before or after compression. We conclude that better dehazing performance with fewer artifacts and better coding efficiency is achieved when the dehazing is applied before compression. Simulations for Joint Photographers Expert Group images in addition to subjective and objective tests with H.264 compressed sequences validate our conclusion.