
Showing papers on "Real image published in 2016"


Posted Content
TL;DR: This work develops a method for S+U learning that uses an adversarial network similar to Generative Adversarial Networks (GANs), but with synthetic images as inputs instead of random vectors, and makes several key modifications to the standard GAN algorithm to preserve annotations, avoid artifacts, and stabilize training.
Abstract: With recent progress in graphics, it has become more tractable to train models on synthetic images, potentially avoiding the need for expensive annotations. However, learning from synthetic images may not achieve the desired performance due to a gap between synthetic and real image distributions. To reduce this gap, we propose Simulated+Unsupervised (S+U) learning, where the task is to learn a model to improve the realism of a simulator's output using unlabeled real data, while preserving the annotation information from the simulator. We develop a method for S+U learning that uses an adversarial network similar to Generative Adversarial Networks (GANs), but with synthetic images as inputs instead of random vectors. We make several key modifications to the standard GAN algorithm to preserve annotations, avoid artifacts, and stabilize training: (i) a 'self-regularization' term, (ii) a local adversarial loss, and (iii) updating the discriminator using a history of refined images. We show that this enables generation of highly realistic images, which we demonstrate both qualitatively and with a user study. We quantitatively evaluate the generated images by training models for gaze estimation and hand pose estimation. We show a significant improvement over using synthetic images, and achieve state-of-the-art results on the MPIIGaze dataset without any labeled real data.

1,059 citations
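The three training modifications listed above translate almost directly into code. Below is a minimal PyTorch sketch of the refiner objective (a local adversarial term plus an L1 'self-regularization' term) and of a history buffer of refined images for the discriminator; the loss weight, buffer capacity, and patch-discriminator interface are illustrative assumptions, not the paper's settings.

```python
import random
import torch
import torch.nn.functional as F

def refiner_loss(refined, synthetic, patch_logits, lam=0.1):
    # Local adversarial loss: the discriminator scores local patches, and
    # every patch of the refined image should be classified as real.
    adv = F.binary_cross_entropy_with_logits(
        patch_logits, torch.ones_like(patch_logits))
    # Self-regularization: keep the refined image close to the synthetic
    # input so its annotations (e.g. gaze direction) stay valid.
    reg = torch.abs(refined - synthetic).mean()
    return adv + lam * reg  # lam is an assumed weight, not the paper's

class RefinedImageHistory:
    """Mixing previously refined images into discriminator updates is the
    abstract's third stabilization trick; capacity is an assumption."""
    def __init__(self, capacity=512):
        self.images, self.capacity = [], capacity

    def mix_half(self, batch):
        # Replace half of the current batch with historical refined images.
        half = len(batch) // 2
        old = random.sample(self.images, min(half, len(self.images)))
        self.images.extend(batch[:half])
        self.images = self.images[-self.capacity:]
        return batch[half:] + old
```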


Posted Content
TL;DR: This work evaluates encoders that invert the mapping of a cGAN, i.e., map a real image into a latent space and a conditional representation, which makes it possible to reconstruct and modify real images of faces conditioned on arbitrary attributes.
Abstract: Generative Adversarial Networks (GANs) have recently been shown to successfully approximate complex data distributions. A relevant extension of this model is conditional GANs (cGANs), where the introduction of external information makes it possible to determine specific representations of the generated images. In this work, we evaluate encoders that invert the mapping of a cGAN, i.e., map a real image into a latent space and a conditional representation. This allows, for example, reconstructing and modifying real images of faces conditioned on arbitrary attributes. Additionally, we evaluate the design of cGANs. The combination of an encoder with a cGAN, which we call Invertible cGAN (IcGAN), enables re-generating real images with deterministic, complex modifications.

627 citations
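As a sketch of the editing procedure this abstract implies, the two learned encoders invert a real image into a latent code z and an attribute vector y, selected attributes are overridden, and the conditional generator re-synthesizes the face. The module interfaces and the index-based attribute dictionary below are assumptions for illustration.

```python
import torch

@torch.no_grad()
def edit_attributes(image, encoder_z, encoder_y, generator, new_attrs):
    z = encoder_z(image)            # latent representation of the input face
    y = encoder_y(image)            # predicted attribute vector
    for idx, val in new_attrs.items():
        y[:, idx] = val             # e.g. {blond_idx: 1.0} to add blond hair
    return generator(z, y)          # reconstruction with modified attributes
```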


Book ChapterDOI
08 Oct 2016
TL;DR: In this paper, a convolutional network is trained on renderings of synthetic 3D models of cars and chairs to predict an RGB image and a depth map of the object as seen from an arbitrary view.
Abstract: We present a convolutional network capable of inferring a 3D representation of a previously unseen object given a single image of this object. Concretely, the network can predict an RGB image and a depth map of the object as seen from an arbitrary view. Several of these depth maps fused together give a full point cloud of the object. The point cloud can in turn be transformed into a surface mesh. The network is trained on renderings of synthetic 3D models of cars and chairs. It successfully deals with objects on cluttered backgrounds and generates reasonable predictions for real images of cars.

430 citations
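The fusion step ("several of these depth maps fused together give a full point cloud") amounts to back-projecting each predicted depth map and transforming it into a common frame. A minimal NumPy sketch, assuming a pinhole camera with known field of view and a 4x4 camera-to-world pose per view (the paper's exact camera model may differ):

```python
import numpy as np

def depth_to_world_points(depth, fov_deg, pose):
    # Back-project a predicted depth map through an assumed pinhole camera.
    h, w = depth.shape
    f = 0.5 * w / np.tan(0.5 * np.radians(fov_deg))   # focal length in pixels
    u, v = np.meshgrid(np.arange(w) - w / 2, np.arange(h) - h / 2)
    cam = np.stack([u * depth / f, v * depth / f, depth], -1).reshape(-1, 3)
    # Move the camera-frame points into a shared world frame (pose is 4x4).
    cam_h = np.concatenate([cam, np.ones((len(cam), 1))], axis=1)
    return (cam_h @ pose.T)[:, :3]

# Fusing several views is then just concatenation of their point sets:
# cloud = np.vstack([depth_to_world_points(d, 50.0, T) for d, T in views])
```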


Posted Content
TL;DR: In this paper, a collision avoidance policy is represented by a deep convolutional neural network that directly processes raw monocular images and outputs velocity commands, with a Monte Carlo policy evaluation algorithm that directly optimizes the network's ability to produce collision-free flight.
Abstract: Deep reinforcement learning has emerged as a promising and powerful technique for automatically acquiring control policies that can process raw sensory inputs, such as images, and perform complex behaviors. However, extending deep RL to real-world robotic tasks has proven challenging, particularly in safety-critical domains such as autonomous flight, where a trial-and-error learning process is often impractical. In this paper, we explore the following question: can we train vision-based navigation policies entirely in simulation, and then transfer them into the real world to achieve real-world flight without a single real training image? We propose a learning method that we call CAD$^2$RL, which can be used to perform collision-free indoor flight in the real world while being trained entirely on 3D CAD models. Our method uses single RGB images from a monocular camera, without needing to explicitly reconstruct the 3D geometry of the environment or perform explicit motion planning. Our learned collision avoidance policy is represented by a deep convolutional neural network that directly processes raw monocular images and outputs velocity commands. This policy is trained entirely on simulated images, with a Monte Carlo policy evaluation algorithm that directly optimizes the network's ability to produce collision-free flight. By highly randomizing the rendering settings for our simulated training set, we show that we can train a policy that generalizes to the real world, without requiring the simulator to be particularly realistic or high-fidelity. We evaluate our method by flying a real quadrotor through indoor environments, and further evaluate the design choices in our simulator through a series of ablation studies on depth prediction. For supplementary video see: this https URL

405 citations
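The "highly randomized rendering settings" are the transferable part of the recipe: by resampling appearance parameters every episode, the policy cannot latch onto any single simulated look. A sketch with invented parameter names and ranges, purely to illustrate the pattern:

```python
import random

def sample_render_settings():
    # Per-episode randomization of the simulator's appearance; all names
    # and ranges here are illustrative assumptions, not the paper's values.
    return {
        "wall_texture": random.choice(["brick", "plaster", "wood", "noise"]),
        "floor_texture": random.choice(["carpet", "tile", "concrete"]),
        "light_intensity": random.uniform(0.3, 1.5),
        "light_direction": [random.uniform(-1.0, 1.0) for _ in range(3)],
        "camera_fov_deg": random.uniform(60.0, 90.0),
        "furniture_count": random.randint(0, 12),
    }
```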


Book ChapterDOI
08 Oct 2016
TL;DR: In this paper, the authors propose a method to understand 3D object structure from a single image by solving an optimization task given 2D keypoint positions, or training on synthetic data with ground truth 3D information.
Abstract: Understanding 3D object structure from a single image is an important but difficult task in computer vision, mostly due to the lack of 3D object annotations in real images. Previous work tackles this problem by either solving an optimization task given 2D keypoint positions, or training on synthetic data with ground truth 3D information.

348 citations


Posted Content
Abstract: Image super-resolution (SR) is an underdetermined inverse problem, where a large number of plausible high-resolution images can explain the same downsampled image. Most current single image SR methods use empirical risk minimisation, often with a pixel-wise mean squared error (MSE) loss. However, the outputs from such methods tend to be blurry, over-smoothed and generally appear implausible. A more desirable approach would employ Maximum a Posteriori (MAP) inference, preferring solutions that always have a high probability under the image prior, and thus appear more plausible. Direct MAP estimation for SR is non-trivial, as it requires us to build a model for the image prior from samples. Furthermore, MAP inference is often performed via optimisation-based iterative algorithms which don't compare well with the efficiency of neural-network-based alternatives. Here we introduce new methods for amortised MAP inference whereby we calculate the MAP estimate directly using a convolutional neural network. We first introduce a novel neural network architecture that performs a projection to the affine subspace of valid SR solutions ensuring that the high resolution output of the network is always consistent with the low resolution input. We show that, using this architecture, the amortised MAP inference problem reduces to minimising the cross-entropy between two distributions, similar to training generative models. We propose three methods to solve this optimisation problem: (1) Generative Adversarial Networks (GAN) (2) denoiser-guided SR which backpropagates gradient-estimates from denoising to train the network, and (3) a baseline method using a maximum-likelihood-trained image prior. Our experiments show that the GAN based approach performs best on real image data. Lastly, we establish a connection between GANs and amortised variational inference as in e.g. variational autoencoders.

348 citations
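The projection layer described above can be made concrete under the assumption that the downsampling operator A is average pooling; nearest-neighbour upsampling is then a right inverse of A, and adding back the upsampled low-resolution residual makes any network output exactly consistent with the input. A PyTorch sketch:

```python
import torch
import torch.nn.functional as F

def affine_projection(x, y, scale):
    # Project a network output x onto the affine subspace of high-res
    # images whose downsampled version equals the low-res input y,
    # assuming the downsampling operator is average pooling.
    residual = y - F.avg_pool2d(x, scale)          # low-res inconsistency
    # Nearest-neighbour upsampling undoes average pooling here, so adding
    # the upsampled residual restores exact consistency with y.
    return x + F.interpolate(residual, scale_factor=scale, mode="nearest")

# Sanity check: downsampling the projected output recovers y.
x = torch.randn(1, 3, 32, 32)
y = torch.randn(1, 3, 8, 8)
proj = affine_projection(x, y, 4)
assert torch.allclose(F.avg_pool2d(proj, 4), y, atol=1e-5)
```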


Journal ArticleDOI
TL;DR: A new algorithm for automatic crack detection from 2D pavement images that provides very robust and precise results in a wide range of situations, in a fully unsupervised manner, which is beyond the current state of the art.
Abstract: This paper proposes a new algorithm for automatic crack detection from 2D pavement images. It strongly relies on the localization of minimal paths within each image, a path being a series of neighboring pixels and its score being the sum of their intensities. The originality of the approach stems from the proposed way to select a set of minimal paths and the two postprocessing steps introduced to improve the quality of the detection. Such an approach is a natural way to take account of both the photometric and geometric characteristics of pavement images. An intensive validation is performed on both synthetic and real images (from five different acquisition systems), with comparisons to five existing methods. The proposed algorithm provides very robust and precise results in a wide range of situations, in a fully unsupervised manner, which is beyond the current state of the art.

292 citations
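The core primitive of the method, a minimal path whose score is the sum of pixel intensities along it, can be computed with Dijkstra's algorithm over the pixel grid (dark cracks yield low-cost paths). A NumPy sketch; the paper's endpoint-selection strategy and its two post-processing steps are not reproduced:

```python
import heapq
import numpy as np

def minimal_path(img, start, end):
    # Dijkstra over the 8-connected pixel grid; the cost of a path is the
    # sum of the intensities of the pixels it visits.
    h, w = img.shape
    dist = np.full((h, w), np.inf)
    prev = {}
    dist[start] = img[start]
    heap = [(img[start], start)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if (r, c) == end:
            break
        if d > dist[r, c]:
            continue                      # stale heap entry
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr or dc) and 0 <= nr < h and 0 <= nc < w:
                    nd = d + img[nr, nc]
                    if nd < dist[nr, nc]:
                        dist[nr, nc] = nd
                        prev[(nr, nc)] = (r, c)
                        heapq.heappush(heap, (nd, (nr, nc)))
    # Walk back from the end point to recover the path.
    path, node = [], end
    while node != start:
        path.append(node)
        node = prev[node]
    path.append(start)
    return path[::-1]
```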


Proceedings ArticleDOI
01 Oct 2016
TL;DR: In this paper, a learning-based approach for reconstructing a 3D face from a single image is proposed, based on a convolutional neural network (CNN) that extracts the face geometry directly from the image.
Abstract: Fast and robust three-dimensional reconstruction of facial geometric structure from a single image is a challenging task with numerous applications. Here, we introduce a learning-based approach for reconstructing a three-dimensional face from a single image. Recent face recovery methods rely on accurate localization of key characteristic points. In contrast, the proposed approach is based on a Convolutional-Neural-Network (CNN) which extracts the face geometry directly from its image. Although such deep architectures outperform other models in complex computer vision problems, training them properly requires a large dataset of annotated examples. In the case of three-dimensional faces, no large-volume data sets currently exist, and acquiring such big data is a tedious task. As an alternative, we propose to generate random, yet nearly photo-realistic, facial images for which the geometric form is known. The suggested model successfully recovers facial shapes from real images, even for faces with extreme expressions and under various lighting conditions.

266 citations


Posted Content
TL;DR: The proposed approach is based on a Convolutional-Neural-Network (CNN) which extracts the face geometry directly from its image and successfully recovers facial shapes from real images, even for faces with extreme expressions and under various lighting conditions.
Abstract: Fast and robust three-dimensional reconstruction of facial geometric structure from a single image is a challenging task with numerous applications. Here, we introduce a learning-based approach for reconstructing a three-dimensional face from a single image. Recent face recovery methods rely on accurate localization of key characteristic points. In contrast, the proposed approach is based on a Convolutional-Neural-Network (CNN) which extracts the face geometry directly from its image. Although such deep architectures outperform other models in complex computer vision problems, training them properly requires a large dataset of annotated examples. In the case of three-dimensional faces, no large-volume data sets currently exist, and acquiring such big data is a tedious task. As an alternative, we propose to generate random, yet nearly photo-realistic, facial images for which the geometric form is known. The suggested model successfully recovers facial shapes from real images, even for faces with extreme expressions and under various lighting conditions.

191 citations


Proceedings Article
05 Dec 2016
TL;DR: This paper introduces an image-based synthesis engine that artificially augments a dataset of real images with 2D human pose annotations using 3D Motion Capture (MoCap) data to generate a large set of photorealistic synthetic images of humans with 3D pose annotations.
Abstract: This paper addresses the problem of 3D human pose estimation in the wild. A significant challenge is the lack of training data, i.e., 2D images of humans annotated with 3D poses. Such data is necessary to train state-of-the-art CNN architectures. Here, we propose a solution to generate a large set of photorealistic synthetic images of humans with 3D pose annotations. We introduce an image-based synthesis engine that artificially augments a dataset of real images with 2D human pose annotations using 3D Motion Capture (MoCap) data. Given a candidate 3D pose, our algorithm selects for each joint an image whose 2D pose locally matches the projected 3D pose. The selected images are then combined to generate a new synthetic image by stitching local image patches in a kinematically constrained manner. The resulting images are used to train an end-to-end CNN for full-body 3D pose estimation. We cluster the training data into a large number of pose classes and tackle pose estimation as a K-way classification problem. Such an approach is viable only with large training sets such as ours. Our method outperforms the state of the art in terms of 3D pose estimation in controlled environments (Human3.6M) and shows promising results for in-the-wild images (LSP). This demonstrates that CNNs trained on artificial images generalize well to real images.

164 citations
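A rough sketch of the joint-wise retrieval step: for each joint of the projected 3D pose, pick the annotated image whose 2D pose agrees best in a local neighbourhood of that joint. The neighbourhood definition (adjacent joint indices) and the matching distance are simplifying assumptions, and the kinematically constrained stitching is not shown:

```python
import numpy as np

def select_images_per_joint(projected_2d, dataset_poses, k_context=3):
    # projected_2d: (J, 2) projection of the candidate 3D pose.
    # dataset_poses: (N, J, 2) 2D pose annotations of the real images.
    n_joints = projected_2d.shape[0]
    chosen = []
    for j in range(n_joints):
        lo, hi = max(0, j - k_context), min(n_joints, j + k_context + 1)
        # Compare the local chain of joints around joint j.
        d = np.linalg.norm(
            dataset_poses[:, lo:hi] - projected_2d[lo:hi], axis=-1).sum(axis=-1)
        chosen.append(int(np.argmin(d)))   # index of best-matching image
    return chosen
```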


Proceedings ArticleDOI
01 Jun 2016
TL;DR: WarpNet as mentioned in this paper aligns an object in one image with a different object in another by using the output of the network as a spatial prior that allows generalization at test time to match real images across variations in appearance, viewpoint and articulation.
Abstract: We present an approach to matching images of objects in fine-grained datasets without using part annotations, with an application to the challenging problem of weakly supervised single-view reconstruction. This is in contrast to prior works that require part annotations, since matching objects across class and pose variations is challenging with appearance features alone. We overcome this challenge through a novel deep learning architecture, WarpNet, that aligns an object in one image with a different object in another. We exploit the structure of the fine-grained dataset to create artificial data for training this network in an unsupervised-discriminative learning approach. The output of the network acts as a spatial prior that allows generalization at test time to match real images across variations in appearance, viewpoint and articulation. On the CUB-200-2011 dataset of bird categories, we improve the AP over an appearance-only network by 13.6%. We further demonstrate that our WarpNet matches, together with the structure of fine-grained datasets, allow single-view reconstructions with quality comparable to using annotated point correspondences.

Posted Content
Wei Shen, Rujie Liu
TL;DR: Instead of manipulating the whole image, this work proposes to learn the corresponding residual image, defined as the difference between images before and after the manipulation, so that the manipulation can be performed efficiently with modest pixel modification.
Abstract: Face attributes are interesting due to their detailed description of human faces. Unlike prior research on attribute prediction, we address an inverse and more challenging problem called face attribute manipulation, which aims at modifying a face image according to a given attribute value. Instead of manipulating the whole image, we propose to learn the corresponding residual image, defined as the difference between images before and after the manipulation. In this way, the manipulation can be performed efficiently with modest pixel modification. The framework of our approach is based on the Generative Adversarial Network. It consists of two image transformation networks and a discriminative network. The transformation networks are responsible for the attribute manipulation and its dual operation, and the discriminative network is used to distinguish the generated images from real images. We also apply dual learning to allow the transformation networks to learn from each other. Experiments show that residual images can be effectively learned and used for attribute manipulations. The generated images retain most of the details in attribute-irrelevant areas.
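The residual idea reduces to predicting a sparse difference image and adding it back to the input, so attribute-irrelevant pixels are barely touched. A PyTorch sketch with a placeholder network (the paper's transformation networks, dual operation, and discriminator are not reproduced):

```python
import torch
import torch.nn as nn

class ResidualManipulator(nn.Module):
    """Predicts only the residual image; the edited face is input + residual.
    The two-layer network below is a stand-in, not the paper's architecture."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Tanh())  # bounded residual

    def forward(self, image):
        residual = self.net(image)      # difference before/after the edit
        return image + residual, residual
```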

Proceedings Article
14 Oct 2016
TL;DR: A novel neural network architecture is introduced that performs a projection to the affine subspace of valid SR solutions ensuring that the high resolution output of the network is always consistent with the low resolution input, and it is shown that the GAN based approach performs best on real image data.
Abstract: Image super-resolution (SR) is an underdetermined inverse problem, where a large number of plausible high-resolution images can explain the same downsampled image. Most current single image SR methods use empirical risk minimisation, often with a pixel-wise mean squared error (MSE) loss. However, the outputs from such methods tend to be blurry, over-smoothed and generally appear implausible. A more desirable approach would employ Maximum a Posteriori (MAP) inference, preferring solutions that always have a high probability under the image prior, and thus appear more plausible. Direct MAP estimation for SR is non-trivial, as it requires us to build a model for the image prior from samples. Furthermore, MAP inference is often performed via optimisation-based iterative algorithms which don't compare well with the efficiency of neural-network-based alternatives. Here we introduce new methods for amortised MAP inference whereby we calculate the MAP estimate directly using a convolutional neural network. We first introduce a novel neural network architecture that performs a projection to the affine subspace of valid SR solutions ensuring that the high resolution output of the network is always consistent with the low resolution input. We show that, using this architecture, the amortised MAP inference problem reduces to minimising the cross-entropy between two distributions, similar to training generative models. We propose three methods to solve this optimisation problem: (1) Generative Adversarial Networks (GAN) (2) denoiser-guided SR which backpropagates gradient-estimates from denoising to train the network, and (3) a baseline method using a maximum-likelihood-trained image prior. Our experiments show that the GAN based approach performs best on real image data. Lastly, we establish a connection between GANs and amortised variational inference as in e.g. variational autoencoders.

Proceedings ArticleDOI
01 Jun 2016
TL;DR: This study presents a weakly supervised approach that discovers the discriminative structures of sketch images, given pairs of sketch images and web images, using a deep convolutional neural network, named SketchNet.
Abstract: In this study, we present a weakly supervised approach that discovers the discriminative structures of sketch images, given pairs of sketch images and web images. In contrast to traditional approaches that use global appearance features or rely on keypoint features, our aim is to automatically learn the shared latent structures that exist between sketch images and real images, even when there are significant appearance differences across the relevant real images. To accomplish this, we propose a deep convolutional neural network, named SketchNet. We first form a triplet composed of a sketch and a positive and a negative real image as the input to our neural network. To discover the coherent visual structures between the sketch and its positive pairs, we introduce the softmax as the loss function. A ranking mechanism is then introduced to make positive pairs obtain higher scores than negative ones, yielding a robust representation. Finally, we formalize the above constraints into a unified objective function and create an ensemble feature representation to describe the sketch images. Experiments on the TU-Berlin sketch benchmark demonstrate the effectiveness of our model and show that deep feature representation brings substantial improvements over other state-of-the-art methods on sketch classification.
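A hedged sketch of the triplet idea: embed the sketch, a relevant ("positive") real image and an irrelevant ("negative") one, and push the positive pair's score above the negative's. A margin ranking loss is used here as a stand-in for the paper's exact softmax-plus-ranking formulation:

```python
import torch
import torch.nn.functional as F

def triplet_ranking_loss(f_sketch, f_pos, f_neg, margin=0.2):
    # f_sketch, f_pos, f_neg: (B, D) embeddings from a shared network.
    pos_score = F.cosine_similarity(f_sketch, f_pos)
    neg_score = F.cosine_similarity(f_sketch, f_neg)
    # Positive pairs must out-score negatives by at least `margin`
    # (an assumed value).
    return F.relu(margin + neg_score - pos_score).mean()
```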

Proceedings ArticleDOI
01 Jan 2016
TL;DR: In this article, a convolutional neural architecture is proposed to estimate reflectance maps of specular materials in natural lighting conditions, which can directly predict a reflectance map from the image itself.
Abstract: Undoing the image formation process and therefore decomposing appearance into its intrinsic properties is a challenging task due to the under-constrained nature of this inverse problem. While significant progress has been made on inferring shape, materials and illumination from images only, progress in an unconstrained setting is still limited. We propose a convolutional neural architecture to estimate reflectance maps of specular materials in natural lighting conditions. We achieve this in an end-to-end learning formulation that directly predicts a reflectance map from the image itself. We show how to improve estimates by facilitating additional supervision in an indirect scheme that first predicts surface orientation and afterwards predicts the reflectance map by a learning-based sparse data interpolation. In order to analyze performance on this difficult task, we propose a new challenge of Specular MAterials on SHapes with complex IllumiNation (SMASHINg) using both synthetic and real images. Furthermore, we show the application of our method to a range of image editing tasks on real images.

Journal ArticleDOI
01 Dec 2016
TL;DR: In this article, a spectral-spatial hyperspectral image classification method based on K nearest neighbor (KNN) is proposed, which consists of the following steps: first, the support vector machine is adopted to obtain the initial classification probability maps which reflect the probability that each pixel belongs to different classes, then, the obtained pixel-wise probability maps are refined with the proposed KNN filtering algorithm that is based on matching and averaging nonlocal neighborhoods.
Abstract: Fusion of spectral and spatial information is an effective way in improving the accuracy of hyperspectral image classification. In this paper, a novel spectral–spatial hyperspectral image classification method based on K nearest neighbor (KNN) is proposed, which consists of the following steps. First, the support vector machine is adopted to obtain the initial classification probability maps which reflect the probability that each hyperspectral pixel belongs to different classes. Then, the obtained pixel-wise probability maps are refined with the proposed KNN filtering algorithm that is based on matching and averaging nonlocal neighborhoods. The proposed method does not need sophisticated segmentation and optimization strategies while still being able to make full use of the nonlocal principle of real images by using KNN, and thus, providing competitive classification with fast computation. Experiments performed on two real hyperspectral data sets show that the classification results obtained by the proposed method are comparable to several recently proposed hyperspectral image classification methods.
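A rough sketch of the KNN filtering step: for each pixel, find the K pixels whose surrounding patches in a guide image are most similar (nonlocal matching) and average their class probabilities. The brute-force search, guide image, patch size, and K below are simplifying assumptions; a practical implementation would restrict the search window:

```python
import numpy as np

def knn_filter(prob_maps, guide, k=20, patch=3):
    # prob_maps: (H, W, C) per-class probabilities from the SVM.
    # guide: (H, W) float image guiding the nonlocal patch matching.
    h, w, c = prob_maps.shape
    r = patch // 2
    padded = np.pad(guide, r, mode="reflect")
    # Stack every pixel's patch as a feature vector.
    feats = np.stack([
        padded[i:i + h, j:j + w] for i in range(patch) for j in range(patch)
    ], axis=-1).reshape(h * w, -1)
    probs = prob_maps.reshape(h * w, c)
    out = np.empty_like(probs)
    for p in range(h * w):                     # brute force, for clarity only
        d = np.linalg.norm(feats - feats[p], axis=1)
        nn = np.argpartition(d, k)[:k]         # indices of K nearest patches
        out[p] = probs[nn].mean(axis=0)        # average their probabilities
    return out.reshape(h, w, c)
```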

Journal ArticleDOI
TL;DR: The authors comment on the scalability of the SpaceWarps system to the wide-field survey era, based on the projection that searches of 10^5 images could be performed by a crowd of 10^5 volunteers in 6 days.
Abstract: We describe SpaceWarps, a novel gravitational lens discovery service that yields samples of high purity and completeness through crowd-sourced visual inspection. Carefully produced colour composite images are displayed to volunteers via a web-based classification interface, which records their estimates of the positions of candidate lensed features. Images of simulated lenses, as well as real images which lack lenses, are inserted into the image stream at random intervals; this training set is used to give the volunteers instantaneous feedback on their performance, as well as to calibrate a model of the system that provides dynamical updates to the probability that a classified image contains a lens. Low-probability systems are retired from the site periodically, concentrating the sample towards a set of lens candidates. Having divided 160 square degrees of Canada-France-Hawaii Telescope Legacy Survey (CFHTLS) imaging into some 430,000 overlapping 82 by 82 arcsecond tiles and displaying them on the site, we were joined by around 37,000 volunteers who contributed 11 million image classifications over the course of 8 months. This Stage 1 search reduced the sample to 3381 images containing candidates; these were then refined in Stage 2 to yield a sample that we expect to be over 90% complete and 30% pure, based on our analysis of the volunteers' performance on training images. We comment on the scalability of the SpaceWarps system to the wide-field survey era, based on our projection that searches of 10^5 images could be performed by a crowd of 10^5 volunteers in 6 days.
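The dynamical probability update described above is a Bayes' rule computation: each volunteer is characterised by a hit rate on training images containing simulated lenses and a false-positive rate on known lens-free images, and each classification updates the image's lens probability accordingly. A sketch (the variable names are ours):

```python
def update_lens_probability(p, said_lens, p_true_pos, p_false_pos):
    # p: current probability that the image contains a lens.
    # p_true_pos: volunteer's rate of marking a lens when one is present.
    # p_false_pos: volunteer's rate of marking a lens when none is present.
    if said_lens:
        num = p_true_pos * p
        den = p_true_pos * p + p_false_pos * (1 - p)
    else:
        num = (1 - p_true_pos) * p
        den = (1 - p_true_pos) * p + (1 - p_false_pos) * (1 - p)
    return num / den

# Example: a skilled volunteer (90% hit rate, 10% false positives) marks a
# lens in an image with prior 0.01; the posterior rises to about 0.083.
print(update_lens_probability(0.01, True, 0.9, 0.1))
```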

Proceedings ArticleDOI
01 Jun 2016
TL;DR: A novel blind image denoising algorithm is proposed which can cope with real-world noisy images even when the noise model is not provided, realized by modeling image noise with a mixture of Gaussians (MoG), which can approximate a wide variety of continuous distributions.
Abstract: Traditional image denoising algorithms always assume the noise to be homogeneous white Gaussian distributed. However, the noise in real images can be much more complex empirically. This paper addresses this problem and proposes a novel blind image denoising algorithm which can cope with real-world noisy images even when the noise model is not provided. It is realized by modeling image noise with a mixture of Gaussians (MoG), which can approximate a wide variety of continuous distributions. As the number of MoG components is unknown in practice, this work adopts a Bayesian nonparametric technique and proposes a novel low-rank MoG filter (LR-MoG) to recover clean signals (patches) from noisy ones contaminated by MoG noise. Based on LR-MoG, a novel blind image denoising approach is developed. To test the proposed method, this study conducts extensive experiments on synthetic and real images. Our method consistently achieves state-of-the-art performance.
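One way to mirror the Bayesian-nonparametric noise model in a few lines is to fit residuals with a mixture whose effective number of components is inferred rather than fixed, e.g. with scikit-learn's BayesianGaussianMixture. Note this sketch only captures the noise-modelling flavour of the paper, not its low-rank MoG filter:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

def fit_noise_model(noisy_patches, clean_estimate, max_components=8):
    # Treat the residual between noisy patches and the current clean
    # estimate as samples from a MoG; unused components are pruned
    # automatically by the Dirichlet weight prior.
    residual = (noisy_patches - clean_estimate).reshape(-1, 1)
    mog = BayesianGaussianMixture(
        n_components=max_components, weight_concentration_prior=0.1)
    mog.fit(residual)
    return mog   # component means/variances describe the real noise
```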

Posted Content
TL;DR: In this paper, an image-based synthesis engine was proposed to generate a large set of photorealistic synthetic images of humans with 3D pose annotations using 3D Motion Capture (MoCap) data.
Abstract: This paper addresses the problem of 3D human pose estimation in the wild. A significant challenge is the lack of training data, i.e., 2D images of humans annotated with 3D poses. Such data is necessary to train state-of-the-art CNN architectures. Here, we propose a solution to generate a large set of photorealistic synthetic images of humans with 3D pose annotations. We introduce an image-based synthesis engine that artificially augments a dataset of real images with 2D human pose annotations using 3D Motion Capture (MoCap) data. Given a candidate 3D pose, our algorithm selects for each joint an image whose 2D pose locally matches the projected 3D pose. The selected images are then combined to generate a new synthetic image by stitching local image patches in a kinematically constrained manner. The resulting images are used to train an end-to-end CNN for full-body 3D pose estimation. We cluster the training data into a large number of pose classes and tackle pose estimation as a K-way classification problem. Such an approach is viable only with large training sets such as ours. Our method outperforms the state of the art in terms of 3D pose estimation in controlled environments (Human3.6M) and shows promising results for in-the-wild images (LSP). This demonstrates that CNNs trained on artificial images generalize well to real images.

Proceedings ArticleDOI
25 Sep 2016
TL;DR: In this article, the authors propose a cost function that is optimized during training, based on the classical optical flow constraint, which is differentiable with respect to the motion field and allows backpropagation of the error to previous layers.
Abstract: Traditional methods for motion estimation estimate the motion field F between a pair of images as the one that minimizes a predesigned cost function. In this paper, we propose a direct method and train a Convolutional Neural Network (CNN) that, given a pair of images as input at test time, produces a dense motion field F at its output layer. In the absence of large datasets with ground truth motion that would allow classical supervised training, we propose to train the network in an unsupervised manner. The cost function that is optimized during training is based on the classical optical flow constraint, which is differentiable with respect to the motion field and therefore allows backpropagation of the error to previous layers of the network. Our method is tested on both synthetic and real image sequences and performs similarly to state-of-the-art methods.
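The training loss follows from the linearised optical flow constraint I_x*u + I_y*v + I_t = 0, which is differentiable in the predicted flow. A PyTorch sketch for grayscale frame pairs (the paper may include smoothness or robustness terms not shown here):

```python
import torch
import torch.nn.functional as F

def flow_constraint_loss(I1, I2, flow):
    # I1, I2: (B, 1, H, W) grayscale frames; flow: (B, 2, H, W) as (u, v).
    kx = torch.tensor([[[[-0.5, 0.0, 0.5]]]])   # central difference in x
    ky = kx.transpose(2, 3)                     # central difference in y
    Ix = F.conv2d(I1, kx, padding=(0, 1))
    Iy = F.conv2d(I1, ky, padding=(1, 0))
    It = I2 - I1                                # temporal derivative
    u, v = flow[:, :1], flow[:, 1:]
    # Penalise violations of I_x*u + I_y*v + I_t = 0; differentiable in
    # `flow`, so the error backpropagates through the CNN.
    return ((Ix * u + Iy * v + It) ** 2).mean()
```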

Journal ArticleDOI
TL;DR: This study combines local statistics with the NLM filter to reduce speckle in ultrasound images and demonstrates that the proposed method outperforms the original NLM, as well as many previously developed methods.

Proceedings ArticleDOI
01 Jun 2016
TL;DR: A gradient activation method is introduced to automatically select a subset of gradients of the latent image in a cutting-plane-based optimization scheme for kernel estimation, which greatly improves the accuracy and flexibility and affords great convenience for handling noise and outliers.
Abstract: Blind image deconvolution is an ill-posed inverse problem which is often addressed through the application of an appropriate prior. Although some priors are informative in general, many images do not strictly conform to them, leading to degraded performance in kernel estimation. More critically, real images may be contaminated by nonuniform noise such as saturation and outliers. Methods for removing specific image areas based on some priors have been proposed, but they operate either manually or by defining fixed criteria. We show here that a subset of the image gradients is adequate to estimate the blur kernel robustly, whether or not the gradient image is sparse. We thus introduce a gradient activation method to automatically select a subset of gradients of the latent image in a cutting-plane-based optimization scheme for kernel estimation. No extra assumption is used in our model, which greatly improves accuracy and flexibility. More importantly, the proposed method affords great convenience for handling noise and outliers. Experiments on both synthetic data and real-world images demonstrate the effectiveness and robustness of the proposed method in comparison with state-of-the-art methods.
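A crude stand-in for gradient activation, replacing the paper's cutting-plane selection with a simple top-magnitude rule: keep only the strongest few gradients for kernel estimation and zero the rest. The keep ratio is an assumption:

```python
import numpy as np

def activate_gradients(grad_x, grad_y, keep_ratio=0.05):
    # Keep only the strongest gradients, on the premise that a small
    # subset suffices to constrain the blur kernel.
    mag = np.hypot(grad_x, grad_y)
    thresh = np.quantile(mag, 1.0 - keep_ratio)   # magnitude cutoff
    mask = mag >= thresh
    return grad_x * mask, grad_y * mask
```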

Proceedings Article
01 Jan 2016
TL;DR: An auxiliary image regularization technique, optimized by the stochastic Alternating Direction Method of Multipliers (ADMM) algorithm, is proposed; it automatically exploits the mutual context information among training images and encourages the model to select reliable images to robustify the learning process.
Abstract: Precisely-labeled data sets with a sufficient number of samples are notably important for training deep convolutional neural networks (CNNs). However, many of the available real-world data sets contain erroneously labeled samples, and such label errors make it a daunting task to learn a well-performing deep CNN model. In this work, we consider the problem of training a deep CNN model for image classification with mislabeled training samples - an issue that is common in real image data sets with tags supplied by amateur users. To solve this problem, we propose an auxiliary image regularization technique, which automatically exploits the mutual context information among training images and encourages the model to select reliable images to robustify the learning process. Comprehensive experiments on benchmark data sets clearly demonstrate that our proposed regularized CNN model is resistant to label noise in training data.

Proceedings ArticleDOI
01 Jun 2016
TL;DR: This paper simultaneously explores spectral and spatial correlation via low-rank regularizations and formulates the restoration problem as a variational optimization model, which can be solved via an iterative numerical algorithm.
Abstract: Conventional scanning and multiplexing techniques for hyperspectral imaging suffer from limited temporal and/or spatial resolution. To resolve this issue, coding techniques are becoming increasingly popular in developing snapshot systems for high-resolution hyperspectral imaging. For such systems, it is a critical task to accurately restore the 3D hyperspectral image from its corresponding coded 2D image. In this paper, we propose an effective method for coded hyperspectral image restoration, which exploits extensive structure sparsity in the hyperspectral image. Specifically, we simultaneously explore spectral and spatial correlation via low-rank regularizations, and formulate the restoration problem into a variational optimization model, which can be solved via an iterative numerical algorithm. Experimental results using both synthetic data and real images show that the proposed method can significantly outperform the state-of-the-art methods on several popular coding-based hyperspectral imaging systems.
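One plausible instantiation of the spectral low-rank regularization is singular-value soft-thresholding of the pixels-by-bands matrix, i.e. the proximal step of a nuclear-norm penalty inside an iterative solver; the paper's full variational model also couples spatial correlation, which is not shown:

```python
import numpy as np

def soft_threshold_rank(X, tau):
    # X: (n_pixels, n_bands) matrix unfolded from the hyperspectral cube.
    # Shrinking its singular values enforces the low-rank (spectrally
    # correlated) structure; tau is the regularization strength.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt
```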

Journal ArticleDOI
TL;DR: A robust 2D Otsu's thresholding method that significantly improves robustness to salt-and-pepper and Gaussian noise and introduces a region post-processing step to deal with noise and edge pixels.

Book ChapterDOI
20 Nov 2016
TL;DR: In this paper, the authors propose a non-convex objective function for fitting a 3D morphable model to single face images using only sparse geometric features (edges and landmark points), which can be viewed as forming soft correspondences between model and image edges.
Abstract: In this paper we explore the problem of fitting a 3D morphable model to single face images using only sparse geometric features (edges and landmark points). Previous approaches to this problem are based on nonlinear optimisation of an edge-derived cost that can be viewed as forming soft correspondences between model and image edges. We propose a novel approach, that explicitly computes hard correspondences. The resulting objective function is non-convex but we show that a good initialisation can be obtained efficiently using alternating linear least squares in a manner similar to the iterated closest point algorithm. We present experimental results on both synthetic and real images and show that our approach outperforms methods that use soft correspondence and other recent methods that rely solely on geometric features.
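A sketch of the hard-correspondence alternation, in the spirit of iterated closest point: assign each projected model point to its nearest image edge pixel, re-solve by linear least squares, and repeat. `project` and `solve_ls` are assumed callables standing in for the paper's projection model and alternating linear least-squares steps:

```python
import numpy as np

def fit_by_hard_correspondence(model_pts, edge_pixels, project, solve_ls,
                               pose, iters=20):
    # model_pts: (N, 3) morphable-model edge points; edge_pixels: (M, 2).
    for _ in range(iters):
        proj = project(model_pts, pose)                    # (N, 2)
        # Hard correspondence: nearest image edge pixel per model point.
        d = np.linalg.norm(proj[:, None] - edge_pixels[None, :], axis=-1)
        targets = edge_pixels[np.argmin(d, axis=1)]
        pose = solve_ls(model_pts, targets)                # linear LS refit
    return pose
```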

Posted Content
TL;DR: The authors propose to overcome the sparsity of supervision problem via synthetically generated images, and then train attribute ranking models to predict the relative strength of an attribute in novel pairs of real images.
Abstract: Distinguishing subtle differences in attributes is valuable, yet learning to make visual comparisons remains non-trivial. Not only is the number of possible comparisons quadratic in the number of training images, but also access to images adequately spanning the space of fine-grained visual differences is limited. We propose to overcome the sparsity of supervision problem via synthetically generated images. Building on a state-of-the-art image generation engine, we sample pairs of training images exhibiting slight modifications of individual attributes. Augmenting real training image pairs with these examples, we then train attribute ranking models to predict the relative strength of an attribute in novel pairs of real images. Our results on datasets of faces and fashion images show the great promise of bootstrapping imperfect image generators to counteract sample sparsity for learning to rank.
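Training on the augmented pairs then reduces to a standard pairwise ranking objective; a RankNet-style logistic loss is shown below as a stand-in for the paper's ranker. Synthetic pairs differ only in the target attribute, so their ordering labels come for free:

```python
import torch
import torch.nn.functional as F

def pairwise_rank_loss(score_a, score_b, a_stronger):
    # score_a, score_b: (B,) attribute scores for the two images in a pair.
    # a_stronger: (B,) with 1 where image A shows the attribute more
    # strongly, else 0 (known by construction for synthetic pairs).
    return F.binary_cross_entropy_with_logits(
        score_a - score_b, a_stronger.float())
```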

Journal ArticleDOI
Lu Jia, Ming Li, Peng Zhang, Yan Wu, Huahui Zhu
TL;DR: Experimental results on real images demonstrate the effectiveness, especially the strong noise immunity, of the LIMKKM method and illustrate that it is suitable for SAR image change detection.
Abstract: Performance of the k-means clustering algorithm for synthetic aperture radar (SAR) image change detection is usually worsened by the inherent existence of speckle noise. Therefore, in this letter, an unsupervised multiple kernel k-means clustering algorithm with local-neighborhood information (LIMKKM algorithm) is proposed for SAR image change detection. The LIMKKM algorithm contributes in two aspects. First, it fuses various features through a weighted summation kernel by automatically and optimally computing the kernel weights. Here, the intensity and texture features of the ratio image are fused. Second, it incorporates the local-neighborhood information into its clustering objective function for providing strong noise immunity. The LIMKKM change detection algorithm is carried out in a train–test way to lighten the computational burden. Experimental results on real images demonstrate the effectiveness, especially the strong noise immunity, of the LIMKKM method and illustrate that it is suitable for SAR image change detection.

Proceedings ArticleDOI
25 Feb 2016
TL;DR: This work explores the previously proposed approach of direct blind deconvolution and denoising with convolutional neural networks (CNN) in a situation where the blur kernels are partially constrained and evaluates the behavior and limits of the CNNs with respect to blur direction range and length.
Abstract: In this work we explore the previously proposed approach of direct blind deconvolution and denoising with convolutional neural networks (CNN) in a situation where the blur kernels are partially constrained. We focus on blurred images from a real-life traffic surveillance system, on which we, for the first time, demonstrate that neural networks trained on artificial data provide superior reconstruction quality on real images compared to traditional blind deconvolution methods. The training data is easy to obtain by blurring sharp photos from a target system with a very rough approximation of the expected blur kernels, thereby allowing custom CNNs to be trained for a specific application (image content and blur range). Additionally, we evaluate the behavior and limits of the CNNs with respect to blur direction range and length.

Journal ArticleDOI
TL;DR: This paper presents a blind deblurring pipeline that restores images from real camera systems by slightly extending their DOF and recovering sharpness in regions slightly out of focus, relying first on an estimate of the spatially varying defocus blur.
Abstract: Real camera systems have a limited depth of field (DOF) which may cause an image to be degraded due to visible misfocus or too shallow DOF. In this paper, we present a blind deblurring pipeline able to restore such images by slightly extending their DOF and recovering sharpness in regions slightly out of focus. To address this severely ill-posed problem, our algorithm relies first on the estimation of the spatially varying defocus blur. Drawing on local frequency image features, a machine learning approach based on the recently introduced regression tree fields is used to train a model able to regress a coherent defocus blur map of the image, labeling each pixel by the scale of a defocus point spread function. A non-blind spatially varying deblurring algorithm is then used to properly extend the DOF of the image. The good performance of our algorithm is assessed both quantitatively, using realistic ground truth data obtained with a novel approach based on a plenoptic camera, and qualitatively with real images.