
Showing papers in "IEEE Transactions on Image Processing in 2015"


Journal ArticleDOI
TL;DR: A simple but powerful color attenuation prior for haze removal from a single input hazy image is proposed and outperforms state-of-the-art haze removal algorithms in terms of both efficiency and the dehazing effect.
Abstract: Single image haze removal has been a challenging problem due to its ill-posed nature. In this paper, we propose a simple but powerful color attenuation prior for haze removal from a single input hazy image. By creating a linear model for modeling the scene depth of the hazy image under this novel prior and learning the parameters of the model with a supervised learning method, the depth information can be well recovered. With the depth map of the hazy image, we can easily estimate the transmission and restore the scene radiance via the atmospheric scattering model, and thus effectively remove the haze from a single image. Experimental results show that the proposed approach outperforms state-of-the-art haze removal algorithms in terms of both efficiency and the dehazing effect.
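The recovery pipeline described above can be summarized in a few lines. Below is a minimal, illustrative Python sketch of the color-attenuation idea: a linear depth model on brightness and saturation, a transmission map from the atmospheric scattering model, and radiance recovery. The coefficients, the scattering parameter, and the atmospheric-light heuristic are placeholders, not the values learned in the paper.

```python
import numpy as np

def dehaze_color_attenuation(img, theta=(0.12, 0.96, -0.78), beta=1.0, t_min=0.1):
    """Illustrative dehazing via a color-attenuation-style depth prior.

    img   : H x W x 3 RGB image in [0, 1].
    theta : placeholder linear-model coefficients (the paper learns these
            from data; the values here are purely illustrative).
    """
    # HSV value (brightness) and a saturation proxy, computed directly from RGB.
    v = img.max(axis=2)
    s = v - img.min(axis=2)
    s = np.where(v > 0, s / np.maximum(v, 1e-6), 0.0)

    # Linear depth model d(x) = theta0 + theta1 * v(x) + theta2 * s(x).
    depth = theta[0] + theta[1] * v + theta[2] * s

    # Transmission from the atmospheric scattering model, t(x) = exp(-beta * d(x)).
    t = np.clip(np.exp(-beta * depth), t_min, 1.0)

    # Atmospheric light: mean color of the most distant (largest-depth) pixels.
    idx = np.argsort(depth.ravel())[-int(0.001 * depth.size) - 1:]
    A = img.reshape(-1, 3)[idx].mean(axis=0)

    # Recover scene radiance J(x) = (I(x) - A) / t(x) + A.
    J = (img - A) / t[..., None] + A
    return np.clip(J, 0.0, 1.0)
```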

1,495 citations


Journal ArticleDOI
TL;DR: It is found that the models designed specifically for salient object detection generally work better than models in closely related areas, which provides a precise definition and suggests an appropriate treatment of this problem that distinguishes it from other problems.
Abstract: We extensively compare, qualitatively and quantitatively, 41 state-of-the-art models (29 salient object detection, 10 fixation prediction, 1 objectness, and 1 baseline) over seven challenging data sets for the purpose of benchmarking salient object detection and segmentation methods. From the results obtained so far, our evaluation shows a consistent rapid progress over the last few years in terms of both accuracy and running time. The top contenders in this benchmark significantly outperform the models identified as the best in the previous benchmark conducted three years ago. We find that the models designed specifically for salient object detection generally work better than models in closely related areas, which in turn provides a precise definition and suggests an appropriate treatment of this problem that distinguishes it from other problems. In particular, we analyze the influences of center bias and scene complexity in model performance, which, along with the hard cases for the state-of-the-art models, provide useful hints toward constructing more challenging large-scale data sets and better saliency models. Finally, we propose probable solutions for tackling several open problems, such as evaluation scores and data set bias, which also suggest future research directions in the rapidly growing field of salient object detection.

1,372 citations


Journal ArticleDOI
TL;DR: Surprisingly, for all tasks, such a seemingly naive PCANet model is on par with the state-of-the-art features either prefixed, highly hand-crafted, or carefully learned [by deep neural networks (DNNs)].
Abstract: In this paper, we propose a very simple deep learning network for image classification that is based on very basic data processing components: 1) cascaded principal component analysis (PCA); 2) binary hashing; and 3) blockwise histograms. In the proposed architecture, the PCA is employed to learn multistage filter banks. This is followed by simple binary hashing and block histograms for indexing and pooling. This architecture is thus called the PCA network (PCANet) and can be extremely easily and efficiently designed and learned. For comparison and to provide a better understanding, we also introduce and study two simple variations of PCANet: 1) RandNet and 2) LDANet. They share the same topology as PCANet, but their cascaded filters are either randomly selected or learned from linear discriminant analysis. We have extensively tested these basic networks on many benchmark visual data sets for different tasks, including Labeled Faces in the Wild (LFW) for face verification; the MultiPIE, Extended Yale B, AR, Facial Recognition Technology (FERET) data sets for face recognition; and MNIST for hand-written digit recognition. Surprisingly, for all tasks, such a seemingly naive PCANet model is on par with the state-of-the-art features either prefixed, highly hand-crafted, or carefully learned [by deep neural networks (DNNs)]. Even more surprisingly, the model sets new records for many classification tasks on the Extended Yale B, AR, and FERET data sets and on MNIST variations. Additional experiments on other public data sets also demonstrate the potential of PCANet to serve as a simple but highly competitive baseline for texture classification and object recognition.
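As a rough illustration of the first PCANet stage, the sketch below (not the authors' code) learns PCA filters as the leading eigenvectors of mean-removed image patches; the patch size and filter count are arbitrary choices here.

```python
import numpy as np

def learn_pca_filters(images, patch=7, n_filters=8):
    """Learn one stage of PCANet-style filters: the top principal components
    of mean-removed image patches (a sketch of the idea, not the full network).

    images : list of 2-D grayscale arrays.
    """
    cols = []
    for im in images:
        H, W = im.shape
        # Collect all overlapping patch x patch patches as column vectors.
        for i in range(H - patch + 1):
            for j in range(W - patch + 1):
                p = im[i:i + patch, j:j + patch].astype(float).ravel()
                cols.append(p - p.mean())          # remove the patch mean
    X = np.stack(cols, axis=1)                     # (patch^2, n_patches)

    # Filters are the leading eigenvectors of X X^T (i.e. PCA on patches).
    _, _, Vt = np.linalg.svd(X @ X.T)
    return Vt[:n_filters].reshape(n_filters, patch, patch)

# Convolving each image with these filters, binarizing the responses, and
# pooling blockwise histograms would complete the pipeline described above.
```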

1,034 citations


Journal ArticleDOI
TL;DR: The proposed opinion-unaware BIQA method does not need any distorted sample images or subjective quality scores for training, yet extensive experiments demonstrate that its quality-prediction performance is superior to that of state-of-the-art opinion-aware BIQA methods.
Abstract: Existing blind image quality assessment (BIQA) methods are mostly opinion-aware. They learn regression models from training images with associated human subjective scores to predict the perceptual quality of test images. Such opinion-aware methods, however, require a large amount of training samples with associated human subjective scores and of a variety of distortion types. The BIQA models learned by opinion-aware methods often have weak generalization capability, thereby limiting their usability in practice. By comparison, opinion-unaware methods do not need human subjective scores for training, and thus have greater potential for good generalization capability. Unfortunately, thus far no opinion-unaware BIQA method has shown consistently better quality prediction accuracy than the opinion-aware methods. Here, we aim to develop an opinion-unaware BIQA method that can compete with, and perhaps outperform, the existing opinion-aware methods. By integrating the features of natural image statistics derived from multiple cues, we learn a multivariate Gaussian model of image patches from a collection of pristine natural images. Using the learned multivariate Gaussian model, a Bhattacharyya-like distance is used to measure the quality of each image patch, and then an overall quality score is obtained by average pooling. The proposed BIQA method does not need any distorted sample images or subjective quality scores for training, yet extensive experiments demonstrate its superior quality-prediction performance compared with the state-of-the-art opinion-aware BIQA methods. The MATLAB source code of our algorithm is publicly available at www.comp.polyu.edu.hk/~cslzhang/IQA/ILNIQE/ILNIQE.htm.
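The scoring step the abstract describes can be sketched as follows. The code below is an assumption-laden simplification, not the released IL-NIQE implementation: it computes a Bhattacharyya distance between Gaussians and average-pools per-patch distances against a pristine multivariate Gaussian model, while the NSS feature extraction itself is omitted.

```python
import numpy as np

def bhattacharyya_gaussian(mu1, cov1, mu2, cov2):
    """Bhattacharyya distance between two multivariate Gaussians."""
    cov = 0.5 * (cov1 + cov2)
    diff = mu1 - mu2
    term1 = 0.125 * diff @ np.linalg.solve(cov, diff)
    _, logdet = np.linalg.slogdet(cov)
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    term2 = 0.5 * (logdet - 0.5 * (logdet1 + logdet2))
    return term1 + term2

def opinion_unaware_score(patch_features, mu_pristine, cov_pristine):
    """Average-pool per-patch distances to a pristine MVG model.

    patch_features : (n_patches, d) array of NSS-style features, one row per
                     patch (feature extraction is not shown here).
    """
    scores = []
    for f in patch_features:
        # Model each patch as a Gaussian centred at its feature vector that
        # shares the pristine covariance -- a simplification of IL-NIQE,
        # which estimates richer per-patch statistics.
        scores.append(bhattacharyya_gaussian(f, cov_pristine,
                                             mu_pristine, cov_pristine))
    return float(np.mean(scores))
```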

783 citations


Journal ArticleDOI
TL;DR: This paper comprehensively encode 10 chromatic models into 16 carefully selected state-of-the-art visual trackers and performs detailed analysis on several issues, including the behavior of various combinations between color model and visual tracker, the degree of difficulty of each sequence for tracking, and how different challenge factors affect the tracking performance.
Abstract: While color information is known to provide rich discriminative clues for visual inference, most modern visual trackers limit themselves to the grayscale realm. Despite recent efforts to integrate color in tracking, there is a lack of comprehensive understanding of the role color information can play. In this paper, we attack this problem by conducting a systematic study from both the algorithm and benchmark perspectives. On the algorithm side, we comprehensively encode 10 chromatic models into 16 carefully selected state-of-the-art visual trackers. On the benchmark side, we compile a large set of 128 color sequences with ground truth and challenge factor annotations (e.g., occlusion). A thorough evaluation is conducted by running all the color-encoded trackers, together with two recently proposed color trackers. A further validation is conducted on an RGBD tracking benchmark. The results clearly show the benefit of encoding color information for tracking. We also perform detailed analysis on several issues, including the behavior of various combinations between color model and visual tracker, the degree of difficulty of each sequence for tracking, and how different challenge factors affect the tracking performance. We expect the study to provide the guidance, motivation, and benchmark for future work on encoding color in visual tracking.

684 citations


Journal ArticleDOI
TL;DR: In this article, a new underwater color image quality evaluation (UCIQE) metric is proposed to quantify the non-uniform color cast, blurring, and low contrast that characterize underwater engineering and monitoring images.
Abstract: Quality evaluation of underwater images is a key goal of underwater video image retrieval and intelligent processing. To date, no metric has been proposed for underwater color image quality evaluation (UCIQE). The special absorption and scattering characteristics of the water medium do not allow direct application of natural color image quality metrics, especially to different underwater environments. In this paper, subjective testing for underwater image quality has been organized. The statistical distribution of underwater image pixels in the CIELab color space, related to the subjective evaluation, indicates that sharpness and colorfulness factors correlate well with subjective image quality perception. Based on these findings, a new UCIQE metric, which is a linear combination of chroma, saturation, and contrast, is proposed to quantify the non-uniform color cast, blurring, and low contrast that characterize underwater engineering and monitoring images. Experiments are conducted to illustrate the performance of the proposed UCIQE metric and its capability to measure underwater image enhancement results. They show that the proposed metric has comparable performance to the leading natural color image quality metrics and the underwater grayscale image quality metrics available in the literature, and can predict with higher accuracy the relative amount of degradation for similar image content in underwater environments. Importantly, UCIQE is a simple and fast solution for real-time underwater video processing. The effectiveness of the presented measure is also demonstrated by subjective evaluation. The results show better correlation between the UCIQE and the subjective mean opinion score.
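One plausible reading of the linear combination described above is sketched below; it is not the authors' implementation, and the weights are illustrative stand-ins for the coefficients the paper fits to subjective scores.

```python
import numpy as np

def uciqe_like(lab, w=(0.47, 0.27, 0.26)):
    """A UCIQE-style score: weighted sum of chroma spread, luminance contrast,
    and mean saturation, computed in CIELab.

    lab : H x W x 3 array already converted to CIELab (L, a, b channels).
    w   : illustrative weights; the paper fits its coefficients to
          subjective scores.
    """
    L, a, b = lab[..., 0], lab[..., 1], lab[..., 2]
    chroma = np.sqrt(a ** 2 + b ** 2)

    sigma_c = chroma.std()                      # spread (variation) of chroma
    # Luminance contrast: spread between the top and bottom 1% of L values.
    lo, hi = np.percentile(L, [1, 99])
    con_l = hi - lo
    # Saturation relative to lightness (guard against division by zero).
    mu_s = np.mean(chroma / np.maximum(np.sqrt(chroma ** 2 + L ** 2), 1e-6))

    return w[0] * sigma_c + w[1] * con_l + w[2] * mu_s
```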

638 citations


Journal ArticleDOI
TL;DR: This paper presents a universal pixel-level segmentation method that relies on spatiotemporal binary features as well as color information to detect changes, which allows camouflaged foreground objects to be detected more easily while most illumination variations are ignored.
Abstract: Foreground/background segmentation via change detection in video sequences is often used as a stepping stone in high-level analytics and applications. Despite the wide variety of methods that have been proposed for this problem, none has been able to fully address the complex nature of dynamic scenes in real surveillance tasks. In this paper, we present a universal pixel-level segmentation method that relies on spatiotemporal binary features as well as color information to detect changes. This allows camouflaged foreground objects to be detected more easily while most illumination variations are ignored. Besides, instead of using manually set, frame-wide constants to dictate model sensitivity and adaptation speed, we use pixel-level feedback loops to dynamically adjust our method’s internal parameters without user intervention. These adjustments are based on the continuous monitoring of model fidelity and local segmentation noise levels. This new approach enables us to outperform all 32 previously tested state-of-the-art methods on the 2012 and 2014 versions of the ChangeDetection.net dataset in terms of overall F-Measure. The use of local binary image descriptors for pixel-level modeling also facilitates high-speed parallel implementations: our own version, which used no low-level or architecture-specific instruction, reached real-time processing speed on a midlevel desktop CPU. A complete C++ implementation based on OpenCV is available online.

603 citations


Journal ArticleDOI
TL;DR: This paper proposes a novel objective image quality assessment (IQA) algorithm for MEF images based on the principle of the structural similarity approach and a novel measure of patch structural consistency, and shows that the proposed model correlates well with subjective judgments and significantly outperforms existing IQA models for general image fusion.
Abstract: Multi-exposure image fusion (MEF) is considered an effective quality enhancement technique widely adopted in consumer electronics, but little work has been dedicated to the perceptual quality assessment of multi-exposure fused images. In this paper, we first build an MEF database and carry out a subjective user study to evaluate the quality of images generated by different MEF algorithms. There are several useful findings. First, considerable agreement has been observed among human subjects on the quality of MEF images. Second, no single state-of-the-art MEF algorithm produces the best quality for all test images. Third, the existing objective quality models for general image fusion are very limited in predicting the perceived quality of MEF images. Motivated by the lack of appropriate objective models, we propose a novel objective image quality assessment (IQA) algorithm for MEF images based on the principle of the structural similarity approach and a novel measure of patch structural consistency. Our experimental results on the subjective database show that the proposed model correlates well with subjective judgments and significantly outperforms the existing IQA models for general image fusion. Finally, we demonstrate the potential application of the proposed model by automatically tuning the parameters of MEF algorithms. The subjective database and the MATLAB code of the proposed model will be made available online. Preliminary results of Section III were presented at the 6th International Workshop on Quality of Multimedia Experience, Singapore, 2014.

530 citations


Journal ArticleDOI
TL;DR: A novel pose recovery method is proposed that uses non-linear mapping with a multi-layered deep neural network and back-propagation deep learning, obtaining a unified feature description by standard eigen-decomposition of the hypergraph Laplacian matrix.
Abstract: Video-based human pose recovery is usually conducted by retrieving relevant poses using image features. In the retrieving process, the mapping between 2D images and 3D poses is assumed to be linear in most of the traditional methods. However, their relationships are inherently non-linear, which limits recovery performance of these methods. In this paper, we propose a novel pose recovery method using non-linear mapping with multi-layered deep neural network. It is based on feature extraction with multimodal fusion and back-propagation deep learning. In multimodal fusion, we construct hypergraph Laplacian with low-rank representation. In this way, we obtain a unified feature description by standard eigen-decomposition of the hypergraph Laplacian matrix. In back-propagation deep learning, we learn a non-linear mapping from 2D images to 3D poses with parameter fine-tuning. The experimental results on three data sets show that the recovery error has been reduced by 20%–25%, which demonstrates the effectiveness of the proposed method.

515 citations


Journal ArticleDOI
TL;DR: The proposed model, called Fog Aware Density Evaluator (FADE), predicts the visibility of a foggy scene from a single image without reference to a corresponding fog-free image, without dependence on salient objects in a scene, without side geographical camera information, and without estimating a depth-dependent transmission map.
Abstract: We propose a referenceless perceptual fog density prediction model based on natural scene statistics (NSS) and fog aware statistical features. The proposed model, called Fog Aware Density Evaluator (FADE), predicts the visibility of a foggy scene from a single image without reference to a corresponding fog-free image, without dependence on salient objects in a scene, without side geographical camera information, without estimating a depth-dependent transmission map, and without training on human-rated judgments. FADE only makes use of measurable deviations from statistical regularities observed in natural foggy and fog-free images. Fog aware statistical features that define the perceptual fog density index derive from a space domain NSS model and the observed characteristics of foggy images. FADE not only predicts perceptual fog density for the entire image, but also provides a local fog density index for each patch. The predicted fog density using FADE correlates well with human judgments of fog density taken in a subjective study on a large foggy image database. As applications, FADE not only accurately assesses the performance of defogging algorithms designed to enhance the visibility of foggy images, but also is well suited for image defogging. A new FADE-based referenceless perceptual image defogging, dubbed DEnsity of Fog Assessment-based DEfogger (DEFADE) achieves better results for darker, denser foggy images as well as on standard foggy images than the state of the art defogging methods. A software release of FADE and DEFADE is available online for public use: http://live.ece.utexas.edu/research/fog/index.html .

510 citations


Journal ArticleDOI
TL;DR: A supervised learning framework is proposed to generate compact and bit-scalable hashing codes directly from raw images, posing hashing learning as a problem of regularized similarity learning.
Abstract: Extracting informative image features and learning effective approximate hashing functions are two crucial steps in image retrieval. Conventional methods often study these two steps separately, e.g., learning hash functions from a predefined hand-crafted feature space. Meanwhile, the bit lengths of output hashing codes are preset in most previous methods, neglecting the significance level of different bits and restricting their practical flexibility. To address these issues, we propose a supervised learning framework to generate compact and bit-scalable hashing codes directly from raw images. We pose hashing learning as a problem of regularized similarity learning. In particular, we organize the training images into a batch of triplet samples, each sample containing two images with the same label and one with a different label. With these triplet samples, we maximize the margin between the matched pairs and the mismatched pairs in the Hamming space. In addition, a regularization term is introduced to enforce adjacency consistency, i.e., images of similar appearances should have similar codes. A deep convolutional neural network is utilized to train the model in an end-to-end fashion, where discriminative image features and hash functions are simultaneously optimized. Furthermore, each bit of our hashing codes is unequally weighted, so that we can manipulate the code lengths by truncating the insignificant bits. Our framework outperforms state-of-the-art methods on public benchmarks of similar image search and also achieves promising results in the application of person re-identification in surveillance. It is also shown that the generated bit-scalable hashing codes preserve the discriminative power well with shorter code lengths.
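A minimal sketch of the triplet-ranking objective described above is given below, using relaxed real-valued codes so the loss stays differentiable; the margin, the regularization weight, and the squared-distance surrogate for Hamming distance are illustrative choices, not the paper's exact formulation.

```python
import numpy as np

def triplet_hash_loss(h_anchor, h_pos, h_neg, margin=8.0, reg=0.01):
    """Margin-based triplet loss on relaxed hash codes.

    h_* : (batch, n_bits) real-valued network outputs in [-1, 1]; taking the
          sign at test time would yield the binary codes.
    """
    # Squared Euclidean distance on relaxed codes approximates the Hamming
    # distance between the binarized codes (up to a constant factor).
    d_pos = np.sum((h_anchor - h_pos) ** 2, axis=1)
    d_neg = np.sum((h_anchor - h_neg) ** 2, axis=1)

    # Push matched pairs at least `margin` closer than mismatched pairs.
    ranking = np.maximum(0.0, margin + d_pos - d_neg)

    # Adjacency-consistency-style regularizer: keep matched codes similar.
    adjacency = reg * d_pos

    return float(np.mean(ranking + adjacency))
```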

Journal ArticleDOI
TL;DR: Experimental results show that the resultant algorithms produce images with better visual quality while halo artifacts are reduced or avoided in the final images, with a negligible increase in running time.
Abstract: It is known that local filtering-based edge preserving smoothing techniques suffer from halo artifacts. In this paper, a weighted guided image filter (WGIF) is introduced by incorporating an edge-aware weighting into an existing guided image filter (GIF) to address the problem. The WGIF inherits advantages of both global and local smoothing filters in the sense that: 1) the complexity of the WGIF is O(N) for an image with N pixels, which is the same as that of the GIF; and 2) the WGIF can avoid halo artifacts like the existing global smoothing filters. The WGIF is applied to single image detail enhancement, single image haze removal, and fusion of differently exposed images. Experimental results show that the resultant algorithms produce images with better visual quality and, at the same time, halo artifacts can be reduced or prevented from appearing in the final images, with a negligible increase in running time.
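The following single-channel sketch shows how an edge-aware weight can modulate the guided filter's regularization, in the spirit of the WGIF; the specific weight definition here is a simplification of the paper's formulation.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def weighted_guided_filter(guide, src, radius=8, eps=1e-2):
    """Guided-filter sketch with an edge-aware weight on the regularizer
    (single-channel, simplified relative to the WGIF described above).

    guide, src : 2-D float arrays; radius sets the box-window size.
    """
    size = 2 * radius + 1
    mean = lambda x: uniform_filter(x, size)

    mean_g, mean_s = mean(guide), mean(src)
    var_g = mean(guide * guide) - mean_g ** 2
    cov_gs = mean(guide * src) - mean_g * mean_s

    # Edge-aware weight: large local variance (an edge) -> larger weight ->
    # smaller effective regularization there, so edges are better preserved.
    gamma = (var_g + 1e-9) / (np.mean(var_g) + 1e-9)

    a = cov_gs / (var_g + eps / gamma)       # per-pixel linear coefficients
    b = mean_s - a * mean_g
    return mean(a) * guide + mean(b)
```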

Journal ArticleDOI
TL;DR: This paper proposes a fast multi-band image fusion algorithm, which combines a high-spatial low-spectral resolution image and a low-spatial high-spectral resolution image, and exploits the properties of the circulant and downsampling matrices associated with the fusion problem.
Abstract: This paper proposes a fast multi-band image fusion algorithm, which combines a high-spatial low-spectral resolution image and a low-spatial high-spectral resolution image. The well-established forward model is exploited to form the likelihoods of the observations. Maximizing the likelihoods leads to solving a Sylvester equation. By exploiting the properties of the circulant and downsampling matrices associated with the fusion problem, a closed-form solution for the corresponding Sylvester equation is obtained explicitly, avoiding any iterative update step. Coupled with the alternating direction method of multipliers and the block coordinate descent method, the proposed algorithm can be easily generalized to incorporate prior information for the fusion problem, allowing a Bayesian estimator. Simulation results show that the proposed algorithm achieves the same performance as the existing algorithms while significantly decreasing their computational complexity.
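For intuition, the toy snippet below solves a generic Sylvester equation AX + XB = C with SciPy's dense solver; the paper's contribution is precisely to avoid such a generic solve by exploiting the circulant and downsampling structure, which this sketch does not reproduce.

```python
import numpy as np
from scipy.linalg import solve_sylvester

# Toy illustration: solving AX + XB = C, the algebraic core the fusion
# problem is reduced to.  A, B, C here are random stand-ins; the paper
# builds them from the spectral and spatial degradation operators and
# replaces this generic dense solve with a closed-form solution.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
B = rng.standard_normal((4, 4))
C = rng.standard_normal((5, 4))

X = solve_sylvester(A, B, C)
print(np.allclose(A @ X + X @ B, C))   # True: X satisfies the equation
```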

Journal ArticleDOI
TL;DR: The proposed spatiotemporal saliency detection method is robust enough to estimate the object and background in complex scenes with various motion patterns and appearances and introduces local as well as global contrast saliency measures using the foreground and background information estimated from the gradient flow field.
Abstract: We present a novel spatiotemporal saliency detection method to estimate salient regions in videos based on the gradient flow field and energy optimization. The proposed gradient flow field incorporates two distinctive features: 1) intra-frame boundary information and 2) inter-frame motion information together for indicating the salient regions. Based on the effective utilization of both intra-frame and inter-frame information in the gradient flow field, our algorithm is robust enough to estimate the object and background in complex scenes with various motion patterns and appearances. Then, we introduce local as well as global contrast saliency measures using the foreground and background information estimated from the gradient flow field. These enhanced contrast saliency cues uniformly highlight an entire object. We further propose a new energy function to encourage the spatiotemporal consistency of the output saliency maps, which is seldom explored in previous video saliency methods. The experimental results show that the proposed algorithm outperforms state-of-the-art video saliency detection methods.

Journal ArticleDOI
TL;DR: A new no-reference (NR)/blind sharpness metric in the autoregressive (AR) parameter space is established via the analysis of AR model parameters, first calculating the energy- and contrast-differences in the locally estimated AR coefficients in a pointwise way, and then quantifying the image sharpness with percentile pooling to predict the overall score.
Abstract: In this paper, we propose a new no-reference (NR)/blind sharpness metric in the autoregressive (AR) parameter space. Our model is established via the analysis of AR model parameters, first calculating the energy- and contrast-differences in the locally estimated AR coefficients in a pointwise way, and then quantifying the image sharpness with percentile pooling to predict the overall score. In addition to the luminance domain, we further consider the inevitable effect of color information on the visual perception of sharpness and thereby extend the above model to the widely used YIQ color space. Validation of our technique is conducted on the subsets with blurring artifacts from four large-scale image databases (LIVE, TID2008, CSIQ, and TID2013). Experimental results confirm the superiority and efficiency of our method over existing NR algorithms, the state-of-the-art blind sharpness/blurriness estimators, and classical full-reference quality evaluators. Furthermore, the proposed metric can also be extended to stereoscopic images based on binocular rivalry, and attains remarkably high performance on the LIVE3D-I and LIVE3D-II databases.

Journal ArticleDOI
TL;DR: This paper studies subspace clustering for multi-view data while keeping individual views well encapsulated, and presents a novel objective function coupled with an angular-based regularizer that refines the angular-based data correlation.
Abstract: More often than not, multimedia data described by multiple features, such as color and shape, can be naturally decomposed into multiple views. Since these views provide complementary information to each other, great effort has been devoted to leveraging multiple views instead of a single view to achieve better clustering performance. To effectively exploit data correlation consensus among multiple views, in this paper, we study subspace clustering for multi-view data while keeping individual views well encapsulated. For characterizing data correlations, we generate a similarity matrix in a way that high affinity values are assigned to data objects within the same subspace across views, while the correlations among data objects from distinct subspaces are minimized. Before generating this matrix, however, we should consider that multi-view data in practice might be corrupted by noise. The corrupted data will significantly degrade clustering results. We first present a novel objective function coupled with an angular-based regularizer. By minimizing this function, multiple sparse vectors are obtained for each data object as its multiple representations. In fact, these sparse vectors result from reaching data correlation consensus on all views. For tackling noise corruption, we present a sparsity-based approach that refines the angular-based data correlation. Using this approach, a more ideal data similarity matrix is generated for multi-view data. Spectral clustering is then applied to the similarity matrix to obtain the final subspace clustering. Extensive experiments have been conducted to validate the effectiveness of our proposed approach.
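The final stage the abstract mentions, spectral clustering of the learned similarity matrix, can be sketched with a standard normalized-Laplacian implementation as below; the angular regularizer and consensus step that produce W are not shown.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def spectral_clustering(W, n_clusters):
    """Standard spectral clustering of a symmetric, nonnegative similarity
    matrix W -- the last stage of the pipeline described above.
    """
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    # Symmetric normalized Laplacian L = I - D^{-1/2} W D^{-1/2}.
    L = np.eye(len(W)) - (d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :])

    # Embed each point with the eigenvectors of the n_clusters smallest
    # eigenvalues, then cluster the (row-normalized) embedding with k-means.
    _, vecs = np.linalg.eigh(L)
    embedding = vecs[:, :n_clusters]
    embedding /= np.maximum(np.linalg.norm(embedding, axis=1, keepdims=True), 1e-12)
    _, labels = kmeans2(embedding, n_clusters, minit='++')
    return labels
```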

Journal ArticleDOI
TL;DR: It is shown that the linear-domain model can represent prior information for the estimation of reflectance and illumination better than the logarithmic domain.
Abstract: In this paper, a new probabilistic method for image enhancement is presented based on a simultaneous estimation of illumination and reflectance in the linear domain. We show that the linear-domain model can represent prior information for the estimation of reflectance and illumination better than the logarithmic domain. A maximum a posteriori (MAP) formulation is employed with priors of both illumination and reflectance. To estimate illumination and reflectance effectively, an alternating direction method of multipliers is adopted to solve the MAP problem. The experimental results show the satisfactory performance of the proposed method in obtaining reflectance and illumination with visually pleasing enhanced results and a promising convergence rate. Compared with other testing methods, the proposed method yields comparable or better results on both subjective and objective assessments.
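A crude linear-domain decomposition is sketched below purely to illustrate the I = R x L model the abstract works with: illumination as a heavily smoothed version of the input and reflectance obtained by division. The paper's MAP formulation and ADMM solver are not implemented here.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def decompose_linear(image, sigma=15.0):
    """Crude linear-domain decomposition I = R * L.

    image : 2-D luminance array in (0, 1].  Returns (reflectance, illumination).
    The paper estimates both via a MAP objective solved with ADMM; here the
    illumination is simply a heavily smoothed version of the input.
    """
    illumination = np.maximum(gaussian_filter(image, sigma), 1e-3)
    reflectance = np.clip(image / illumination, 0.0, 1.0)
    return reflectance, illumination

def enhance(image, gamma=0.5):
    """Enhance by compressing the illumination and recombining in the
    linear domain (the gamma value is illustrative)."""
    R, L = decompose_linear(image)
    return np.clip(R * (L ** gamma), 0.0, 1.0)
```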

Journal ArticleDOI
TL;DR: A novel face identification framework capable of handling the full range of pose variations within ±90° of yaw is proposed and consistently outperforms single-task-based baselines as well as state-of-the-art methods for the pose problem.
Abstract: Face images captured in unconstrained environments usually contain significant pose variation, which dramatically degrades the performance of algorithms designed to recognize frontal faces. This paper proposes a novel face identification framework capable of handling the full range of pose variations within ±90° of yaw. The proposed framework first transforms the original pose-invariant face recognition problem into a partial frontal face recognition problem. A robust patch-based face representation scheme is then developed to represent the synthesized partial frontal faces. For each patch, a transformation dictionary is learnt under the proposed multi-task learning scheme. The transformation dictionary transforms the features of different poses into a discriminative subspace. Finally, face matching is performed at patch level rather than at the holistic level. Extensive and systematic experimentation on FERET, CMU-PIE, and Multi-PIE databases shows that the proposed method consistently outperforms single-task-based baselines as well as state-of-the-art methods for the pose problem. We further extend the proposed algorithm for the unconstrained face verification problem and achieve top-level performance on the challenging LFW data set.

Journal ArticleDOI
TL;DR: Experimental results show that the proposed multiresolution-GFT scheme outperforms H.264 intra by 6.8 dB on average in peak signal-to-noise ratio at the same bit rate.
Abstract: Piecewise smooth (PWS) images (e.g., depth maps or animation images) contain unique signal characteristics such as sharp object boundaries and slowly varying interior surfaces. Leveraging on recent advances in graph signal processing, in this paper, we propose to compress the PWS images using suitable graph Fourier transforms (GFTs) to minimize the total signal representation cost of each pixel block, considering both the sparsity of the signal’s transform coefficients and the compactness of transform description. Unlike fixed transforms, such as the discrete cosine transform, we can adapt GFT to a particular class of pixel blocks. In particular, we select one among a defined search space of GFTs to minimize total representation cost via our proposed algorithms, leveraging on graph optimization techniques, such as spectral clustering and minimum graph cuts. Furthermore, for practical implementation of GFT, we introduce two techniques to reduce computation complexity. First, at the encoder, we low-pass filter and downsample a high-resolution (HR) pixel block to obtain a low-resolution (LR) one, so that a LR-GFT can be employed. At the decoder, upsampling and interpolation are performed adaptively along HR boundaries coded using arithmetic edge coding, so that sharp object boundaries can be well preserved. Second, instead of computing GFT from a graph in real-time via eigen-decomposition, the most popular LR-GFTs are pre-computed and stored in a table for lookup during encoding and decoding. Using depth maps and computer-graphics images as examples of the PWS images, experimental results show that our proposed multiresolution-GFT scheme outperforms H.264 intra by 6.8 dB on average in peak signal-to-noise ratio at the same bit rate.
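A bare-bones graph Fourier transform of a single pixel block is sketched below: a 4-connected grid graph whose edge weights drop across large intensity jumps, followed by eigendecomposition of the Laplacian. The weighting rule is illustrative; the GFT search, multiresolution coding, and table lookup described above are not shown.

```python
import numpy as np

def block_gft(block, edge_thresh=30.0):
    """Graph Fourier transform of an N x N pixel block on a 4-connected grid.

    Edges whose endpoint intensities differ by more than edge_thresh get
    near-zero weight, so the transform adapts to sharp boundaries.
    """
    block = np.asarray(block, dtype=float)
    N = block.shape[0]
    n = N * N
    W = np.zeros((n, n))
    idx = lambda i, j: i * N + j

    for i in range(N):
        for j in range(N):
            for di, dj in ((0, 1), (1, 0)):          # right and down neighbours
                ii, jj = i + di, j + dj
                if ii < N and jj < N:
                    w = 1.0 if abs(block[i, j] - block[ii, jj]) <= edge_thresh else 1e-3
                    W[idx(i, j), idx(ii, jj)] = W[idx(ii, jj), idx(i, j)] = w

    L = np.diag(W.sum(axis=1)) - W                   # combinatorial Laplacian
    _, U = np.linalg.eigh(L)                         # GFT basis = eigenvectors
    coeffs = U.T @ block.ravel()                     # forward transform
    return coeffs, U                                 # invert with U @ coeffs
```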

Journal ArticleDOI
TL;DR: Both theoretical analysis and experimental results prove that the proposed gradient domain GIF can produce better resultant images, especially near the edges, where halos appear in the original GIF.
Abstract: Guided image filter (GIF) is a well-known local filter for its edge-preserving property and low computational complexity. Unfortunately, the GIF may suffer from halo artifacts, because the local linear model used in the GIF cannot represent the image well near some edges. In this paper, a gradient domain GIF is proposed by incorporating an explicit first-order edge-aware constraint, which allows edges to be preserved better. To illustrate the efficiency of the proposed filter, the gradient domain GIF is applied to single-image detail enhancement, tone mapping of high dynamic range images, and image saliency detection. Both theoretical analysis and experimental results prove that the proposed gradient domain GIF produces better resultant images, especially near edges, where halos appear in the original GIF.

Journal ArticleDOI
TL;DR: Experimental results demonstrate that the proposed algorithm detects and removes rain or snow streaks efficiently, outperforming conventional algorithms.
Abstract: A novel algorithm to remove rain or snow streaks from a video sequence using temporal correlation and low-rank matrix completion is proposed in this paper. Based on the observation that rain streaks are too small and move too fast to affect the optical flow estimation between consecutive frames, we obtain an initial rain map by subtracting temporally warped frames from a current frame. Then, we decompose the initial rain map into basis vectors based on the sparse representation, and classify those basis vectors into rain streak ones and outliers with a support vector machine. We then refine the rain map by excluding the outliers. Finally, we remove the detected rain streaks by employing a low-rank matrix completion technique. Furthermore, we extend the proposed algorithm to stereo video deraining. Experimental results demonstrate that the proposed algorithm detects and removes rain or snow streaks efficiently, outperforming conventional algorithms.

Journal ArticleDOI
TL;DR: A discriminative shared Gaussian process latent variable model (DS-GPLVM) for multiview and view-invariant classification of facial expressions from multiple views is proposed and validated.
Abstract: Images of facial expressions are often captured from various views as a result of either head movements or variable camera position. Existing methods for multiview and/or view-invariant facial expression recognition typically perform classification of the observed expression using either classifiers learned separately for each view or a single classifier learned for all views. However, these approaches ignore the fact that different views of a facial expression are just different manifestations of the same facial expression. By accounting for this redundancy, we can design more effective classifiers for the target task. To this end, we propose a discriminative shared Gaussian process latent variable model (DS-GPLVM) for multiview and view-invariant classification of facial expressions from multiple views. In this model, we first learn a discriminative manifold shared by multiple views of a facial expression. Subsequently, we perform facial expression classification in the expression manifold. Finally, classification of an observed facial expression is carried out either in the view-invariant manner (using only a single view of the expression) or in the multiview manner (using multiple views of the expression). The proposed model can also be used to perform fusion of different facial features in a principled manner. We validate the proposed DS-GPLVM on both posed and spontaneously displayed facial expressions from three publicly available datasets (MultiPIE, labeled face parts in the wild, and static facial expressions in the wild). We show that this model outperforms the state-of-the-art methods for multiview and view-invariant facial expression classification, and several state-of-the-art methods for multiview learning and feature fusion.

Journal ArticleDOI
TL;DR: The proposed joint sparse representation model dynamically removes unreliable features to be fused for tracking by using the advantages of sparse representation and is extended into a general kernelized framework, which is able to perform feature fusion on various kernel spaces.
Abstract: Visual tracking using multiple features has proven to be a robust approach because features can complement each other. Since different types of variations such as illumination, occlusion, and pose may occur in a video sequence, especially in long video sequences, how to properly select and fuse appropriate features has become one of the key problems in this approach. To address this issue, this paper proposes a new joint sparse representation model for robust feature-level fusion. The proposed method dynamically removes unreliable features to be fused for tracking by using the advantages of sparse representation. In order to capture the non-linear similarity of features, we extend the proposed method into a general kernelized framework, which is able to perform feature fusion on various kernel spaces. As a result, robust tracking performance is obtained. Both the qualitative and quantitative experimental results on publicly available videos show that the proposed method outperforms both sparse representation-based and fusion-based trackers.

Journal ArticleDOI
TL;DR: This work introduces a class of structured sparsity-inducing norms to model moving objects in videos and proposes a saliency measurement to dynamically estimate the support of the foreground.
Abstract: Low rank and sparse representation based methods, which make few specific assumptions about the background, have recently attracted wide attention in background modeling. With these methods, moving objects in the scene are modeled as pixel-wise sparse outliers. However, in many practical scenarios, the distributions of these moving parts are not truly pixel-wise sparse but structurally sparse. Meanwhile, a robust analysis mechanism is required to handle background regions or foreground movements with varying scales. Based on these two observations, we first introduce a class of structured sparsity-inducing norms to model moving objects in videos. In our approach, we regard the observed sequence as consisting of two terms, a low-rank matrix (background) and a structured sparse outlier matrix (foreground). Next, by virtue of adaptive parameters for dynamic videos, we propose a saliency measurement to dynamically estimate the support of the foreground. Experiments on challenging, well-known data sets demonstrate that the proposed approach outperforms the state-of-the-art methods and works effectively on a wide range of complex videos.
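As a point of reference for the decomposition the abstract starts from, the sketch below implements a plain pixel-wise-sparse low-rank-plus-sparse baseline via alternating singular-value and soft thresholding; the structured sparsity norms and saliency-driven support estimation of the proposed method are not reproduced.

```python
import numpy as np

def lowrank_plus_sparse(D, lam=None, mu=None, n_iter=100):
    """Decompose D into low-rank background B plus sparse foreground S by
    alternating singular-value thresholding and soft thresholding.

    D : (n_pixels, n_frames) matrix, one vectorized frame per column.
    """
    m, n = D.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))
    mu = mu if mu is not None else 0.25 * np.abs(D).mean()
    S = np.zeros_like(D)

    for _ in range(n_iter):
        # Low-rank update: shrink the singular values of the residual.
        U, sig, Vt = np.linalg.svd(D - S, full_matrices=False)
        B = (U * np.maximum(sig - mu, 0.0)) @ Vt
        # Sparse update: soft-threshold the remaining residual element-wise.
        R = D - B
        S = np.sign(R) * np.maximum(np.abs(R) - lam * mu, 0.0)
    return B, S
```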

Journal ArticleDOI
TL;DR: A co-transduction algorithm is devised to fuse both boundary and objectness labels based on an inter propagation scheme, which achieves superior performance compared with the most recent state-of-the-art methods in terms of different evaluation metrics.
Abstract: In this paper, we propose a novel label propagation-based method for saliency detection. A key observation is that saliency in an image can be estimated by propagating the labels extracted from the most certain background and object regions. For most natural images, some boundary superpixels serve as the background labels and the saliency of other superpixels is determined by ranking their similarities to the boundary labels based on an inner propagation scheme. For images of complex scenes, we further deploy a three-cue-center-biased objectness measure to pick out and propagate foreground labels. A co-transduction algorithm is devised to fuse both boundary and objectness labels based on an inter propagation scheme. The compactness criterion decides whether the incorporation of objectness labels is necessary, thus greatly enhancing computational efficiency. Results on five benchmark data sets with pixelwise accurate annotations show that the proposed method achieves superior performance compared with the most recent state-of-the-art methods in terms of different evaluation metrics.
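The propagation step described above can be written as manifold ranking on a superpixel graph; the sketch below assumes the affinity matrix W is already built from superpixel features and uses an illustrative value of alpha.

```python
import numpy as np

def propagate_labels(W, seed_indices, alpha=0.99):
    """Manifold-ranking-style label propagation on a superpixel graph.

    W            : (n, n) symmetric affinity matrix between superpixels.
    seed_indices : indices of superpixels used as seeds (e.g. boundary
                   superpixels serving as background labels).
    Returns a relevance score per superpixel; higher = more similar to seeds.
    """
    n = W.shape[0]
    y = np.zeros(n)
    y[list(seed_indices)] = 1.0                 # indicator of labelled nodes

    D = np.diag(W.sum(axis=1))
    # Closed-form ranking scores f* = (D - alpha * W)^{-1} y.
    return np.linalg.solve(D - alpha * W, y)

# Ranking against boundary seeds gives a background score; the saliency of
# the remaining superpixels can then be derived from its complement.
```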

Journal ArticleDOI
TL;DR: An improved algorithm for the segmentation of cytoplasm and nuclei from clumps of overlapping cervical cells is presented and it is demonstrated that the method of cell nuclei segmentation is competitive when compared with the current state of the art.
Abstract: In this paper, we present an improved algorithm for the segmentation of cytoplasm and nuclei from clumps of overlapping cervical cells. This problem is notoriously difficult because of the degree of overlap among cells, the poor contrast of cell cytoplasm, and the presence of mucus, blood, and inflammatory cells. Our methodology addresses these issues by utilizing a joint optimization of multiple level set functions, where each function represents a cell within a clump, that have both unary (intracell) and pairwise (intercell) constraints. The unary constraints are based on contour length, edge strength, and cell shape, while the pairwise constraint is computed based on the area of the overlapping regions. In this way, our methodology enables the analysis of nuclei and cytoplasm from both free-lying and overlapping cells. We provide a systematic evaluation of our methodology using a database of over 900 images generated by synthetically overlapping images of free-lying cervical cells, where the number of cells within a clump is varied from 2 to 10 and the overlap coefficient between pairs of cells from 0.1 to 0.5. This quantitative assessment demonstrates that our methodology can successfully segment clumps of up to 10 cells, provided the overlap between pairs of cells is ~0.5. We also evaluate our approach quantitatively and qualitatively on a set of 16 extended depth of field images, where we are able to segment a total of 645 cells, of which only ~10% are free-lying. Finally, we demonstrate that our method of cell nuclei segmentation is competitive when compared with the current state of the art.

Journal ArticleDOI
TL;DR: A new image database, CID2013, is presented for evaluating no-reference image quality assessment algorithms on images with multiple concurrent distortions captured by real digital cameras, together with a hybrid absolute category rating-pair comparison subjective evaluation method.
Abstract: This paper presents a new database, CID2013, to address the issue of using no-reference (NR) image quality assessment algorithms on images with multiple distortions. Current NR algorithms struggle to handle images with many concurrent distortion types, such as real photographic images captured by different digital cameras. The database consists of six image sets; on average, 30 subjects have evaluated 12–14 devices depicting eight different scenes for a total of 79 different cameras, 480 images, and 188 subjects (67% female). The subjective evaluation method was a hybrid absolute category rating-pair comparison developed for the study and presented in this paper. This method utilizes a slideshow of all images within a scene to allow the test images to work as references to each other. In addition to mean opinion score value, the images are also rated using sharpness, graininess, lightness, and color saturation scales. The CID2013 database contains images used in the experiments with the full subjective data plus extensive background information from the subjects. The database is made freely available for the research community.

Journal ArticleDOI
TL;DR: This paper proposes a novel computationally efficient single image SR method that learns multiple linear mappings (MLM) to directly transform LR feature subspaces into HR subspaces and indicates that this approach is both quantitatively and qualitatively superior to other application-oriented SR methods, while maintaining relatively low time and space complexity.
Abstract: Example learning-based superresolution (SR) algorithms show promise for restoring a high-resolution (HR) image from a single low-resolution (LR) input. The most popular approaches, however, are either time- or space-intensive, which limits their practical applications in many resource-limited settings. In this paper, we propose a novel computationally efficient single image SR method that learns multiple linear mappings (MLM) to directly transform LR feature subspaces into HR subspaces. In particular, we first partition the large nonlinear feature space of LR images into a cluster of linear subspaces. Multiple LR subdictionaries are then learned, followed by inferring the corresponding HR subdictionaries based on the assumption that the LR–HR features share the same representation coefficients. We establish MLM from the input LR features to the desired HR outputs in order to achieve fast yet stable SR recovery. Furthermore, in order to suppress displeasing artifacts generated by the MLM-based method, we apply a fast nonlocal means algorithm to construct a simple yet effective similarity-based regularization term for SR enhancement. Experimental results indicate that our approach is both quantitatively and qualitatively superior to other application-oriented SR methods, while maintaining relatively low time and space complexity.
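The core of the MLM idea can be sketched as follows: cluster the LR feature vectors, fit one ridge-regression mapping to HR features per cluster, and map each test feature with its nearest cluster's matrix. Patch extraction and the nonlocal-means regularization are omitted, and all names and parameters are illustrative.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def train_multiple_linear_mappings(lr_feats, hr_feats, n_clusters=32, ridge=1e-2):
    """Learn per-cluster linear mappings from LR to HR feature vectors.

    lr_feats : (n_samples, d_lr), hr_feats : (n_samples, d_hr), row-aligned.
    Returns cluster centroids and one (d_lr x d_hr) mapping per cluster.
    """
    centroids, labels = kmeans2(lr_feats, n_clusters, minit='++')
    mappings = []
    for k in range(n_clusters):
        X = lr_feats[labels == k]
        Y = hr_feats[labels == k]
        if len(X) == 0:                       # empty cluster: zero fallback
            mappings.append(np.zeros((lr_feats.shape[1], hr_feats.shape[1])))
            continue
        # Ridge regression: M_k = (X^T X + ridge * I)^{-1} X^T Y.
        A = X.T @ X + ridge * np.eye(X.shape[1])
        mappings.append(np.linalg.solve(A, X.T @ Y))
    return centroids, mappings

def apply_mapping(lr_feat, centroids, mappings):
    """Map one LR feature to HR space with its nearest cluster's mapping."""
    k = int(np.argmin(((centroids - lr_feat) ** 2).sum(axis=1)))
    return lr_feat @ mappings[k]
```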

Journal ArticleDOI
TL;DR: The experimental results demonstrate that the proposed SCI perceptual quality assessment scheme, consisting of the objective metric and the weighting strategy, can achieve better performance than 11 state-of-the-art IQA methods.
Abstract: Research on screen content images (SCIs) has become important as they are increasingly used in multi-device communication applications. In this paper, we present a study on the perceptual quality assessment of distorted SCIs, both subjectively and objectively. We construct a large-scale screen image quality assessment database (SIQAD) consisting of 20 source and 980 distorted SCIs. In order to obtain the subjective quality scores and to investigate which part (text or picture) contributes more to the overall visual quality, the single-stimulus methodology with an 11-point numerical scale is employed to obtain three kinds of subjective scores corresponding to the entire, textual, and pictorial regions, respectively. According to the analysis of the subjective data, we propose a weighting strategy to account for the correlation among these three kinds of subjective scores. Furthermore, we design an objective metric to measure the visual quality of distorted SCIs by considering the visual difference between textual and pictorial regions. The experimental results demonstrate that the proposed SCI perceptual quality assessment scheme, consisting of the objective metric and the weighting strategy, achieves better performance than 11 state-of-the-art IQA methods. To the best of our knowledge, the SIQAD is the first large-scale database published for quality evaluation of SCIs, and this research is the first attempt to explore the perceptual quality assessment of distorted SCIs.

Journal ArticleDOI
TL;DR: This paper tentatively categorizes the stripes in remote sensing images in a more comprehensive manner and proposes to treat the multispectral images as a spectral-spatial volume, posing an anisotropic spectral-spatial total variation regularization to enhance the smoothness of the solution along both the spectral and spatial dimensions.
Abstract: Multispectral remote sensing images often suffer from the common problem of stripe noise, which greatly degrades the imaging quality and limits the precision of subsequent processing. Conventional destriping approaches usually remove stripe noise band by band and show their limitations on different types of stripe noise. In this paper, we tentatively categorize the stripes in remote sensing images in a more comprehensive manner. We propose to treat the multispectral images as a spectral-spatial volume and pose an anisotropic spectral-spatial total variation regularization to enhance the smoothness of the solution along both the spectral and spatial dimensions. As a result, a more comprehensive range of stripes, as well as random noise, is effectively removed, while edges and detail information are well preserved. In addition, the split Bregman iteration method is employed to solve the resulting minimization problem, which greatly reduces the computational load. We extensively validate our method under various stripe categories and compare it with other approaches with respect to result quality, running time, and quantitative assessments.