
Showing papers on "Real image published in 2016"


Posted Content
TL;DR: This work develops a method for S+U learning that uses an adversarial network similar to Generative Adversarial Networks (GANs), but with synthetic images as inputs instead of random vectors, and makes several key modifications to the standard GAN algorithm to preserve annotations, avoid artifacts, and stabilize training.
Abstract: With recent progress in graphics, it has become more tractable to train models on synthetic images, potentially avoiding the need for expensive annotations. However, learning from synthetic images may not achieve the desired performance due to a gap between synthetic and real image distributions. To reduce this gap, we propose Simulated+Unsupervised (S+U) learning, where the task is to learn a model to improve the realism of a simulator's output using unlabeled real data, while preserving the annotation information from the simulator. We develop a method for S+U learning that uses an adversarial network similar to Generative Adversarial Networks (GANs), but with synthetic images as inputs instead of random vectors. We make several key modifications to the standard GAN algorithm to preserve annotations, avoid artifacts, and stabilize training: (i) a 'self-regularization' term, (ii) a local adversarial loss, and (iii) updating the discriminator using a history of refined images. We show that this enables generation of highly realistic images, which we demonstrate both qualitatively and with a user study. We quantitatively evaluate the generated images by training models for gaze estimation and hand pose estimation. We show a significant improvement over using synthetic images, and achieve state-of-the-art results on the MPIIGaze dataset without any labeled real data.

1,059 citations
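The three training modifications listed above translate almost directly into code. Below is a minimal PyTorch sketch of the refiner objective (a local adversarial term plus an L1 'self-regularization' term) and of a history buffer of refined images for the discriminator; the loss weight, buffer capacity, and patch-discriminator interface are illustrative assumptions, not the paper's settings.

```python
import random
import torch
import torch.nn.functional as F

def refiner_loss(refined, synthetic, patch_logits, lam=0.1):
    # Local adversarial loss: the discriminator scores local patches, and
    # every patch of the refined image should be classified as real.
    adv = F.binary_cross_entropy_with_logits(
        patch_logits, torch.ones_like(patch_logits))
    # Self-regularization: keep the refined image close to the synthetic
    # input so its annotations (e.g. gaze direction) stay valid.
    reg = torch.abs(refined - synthetic).mean()
    return adv + lam * reg  # lam is an assumed weight, not the paper's

class RefinedImageHistory:
    """Mixing previously refined images into discriminator updates is the
    abstract's third stabilization trick; capacity is an assumption."""
    def __init__(self, capacity=512):
        self.images, self.capacity = [], capacity

    def mix_half(self, batch):
        # Replace half of the current batch with historical refined images.
        half = len(batch) // 2
        old = random.sample(self.images, min(half, len(self.images)))
        self.images.extend(batch[:half])
        self.images = self.images[-self.capacity:]
        return batch[half:] + old
```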


Posted Content
TL;DR: This work evaluates encoders that invert the mapping of a cGAN, i.e., map a real image into a latent space and a conditional representation, which makes it possible to reconstruct and modify real images of faces conditioned on arbitrary attributes.
Abstract: Generative Adversarial Networks (GANs) have recently been shown to successfully approximate complex data distributions. A relevant extension of this model is conditional GANs (cGANs), where the introduction of external information makes it possible to determine specific representations of the generated images. In this work, we evaluate encoders that invert the mapping of a cGAN, i.e., map a real image into a latent space and a conditional representation. This allows, for example, reconstructing and modifying real images of faces conditioned on arbitrary attributes. Additionally, we evaluate the design of cGANs. The combination of an encoder with a cGAN, which we call Invertible cGAN (IcGAN), enables re-generating real images with deterministic, complex modifications.

627 citations
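As a sketch of the editing procedure this abstract implies, the two learned encoders invert a real image into a latent code z and an attribute vector y, selected attributes are overridden, and the conditional generator re-synthesizes the face. The module interfaces and the index-based attribute dictionary below are assumptions for illustration.

```python
import torch

@torch.no_grad()
def edit_attributes(image, encoder_z, encoder_y, generator, new_attrs):
    z = encoder_z(image)            # latent representation of the input face
    y = encoder_y(image)            # predicted attribute vector
    for idx, val in new_attrs.items():
        y[:, idx] = val             # e.g. {blond_idx: 1.0} to add blond hair
    return generator(z, y)          # reconstruction with modified attributes
```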


Book ChapterDOI
08 Oct 2016
TL;DR: In this paper, a convolutional network is trained on renderings of synthetic 3D models of cars and chairs to predict an RGB image and a depth map of the object as seen from an arbitrary view.
Abstract: We present a convolutional network capable of inferring a 3D representation of a previously unseen object given a single image of this object. Concretely, the network can predict an RGB image and a depth map of the object as seen from an arbitrary view. Several of these depth maps fused together give a full point cloud of the object. The point cloud can in turn be transformed into a surface mesh. The network is trained on renderings of synthetic 3D models of cars and chairs. It successfully deals with objects on cluttered backgrounds and generates reasonable predictions for real images of cars.

430 citations
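The fusion step ("several of these depth maps fused together give a full point cloud") amounts to back-projecting each predicted depth map and transforming it into a common frame. A minimal NumPy sketch, assuming a pinhole camera with known field of view and a 4x4 camera-to-world pose per view (the paper's exact camera model may differ):

```python
import numpy as np

def depth_to_world_points(depth, fov_deg, pose):
    # Back-project a predicted depth map through an assumed pinhole camera.
    h, w = depth.shape
    f = 0.5 * w / np.tan(0.5 * np.radians(fov_deg))   # focal length in pixels
    u, v = np.meshgrid(np.arange(w) - w / 2, np.arange(h) - h / 2)
    cam = np.stack([u * depth / f, v * depth / f, depth], -1).reshape(-1, 3)
    # Move the camera-frame points into a shared world frame (pose is 4x4).
    cam_h = np.concatenate([cam, np.ones((len(cam), 1))], axis=1)
    return (cam_h @ pose.T)[:, :3]

# Fusing several views is then just concatenation of their point sets:
# cloud = np.vstack([depth_to_world_points(d, 50.0, T) for d, T in views])
```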


Posted Content
TL;DR: In this paper, a collision avoidance policy is represented by a deep convolutional neural network that directly processes raw monocular images and outputs velocity commands, with a Monte Carlo policy evaluation algorithm that directly optimizes the network's ability to produce collision-free flight.
Abstract: Deep reinforcement learning has emerged as a promising and powerful technique for automatically acquiring control policies that can process raw sensory inputs, such as images, and perform complex behaviors. However, extending deep RL to real-world robotic tasks has proven challenging, particularly in safety-critical domains such as autonomous flight, where a trial-and-error learning process is often impractical. In this paper, we explore the following question: can we train vision-based navigation policies entirely in simulation, and then transfer them into the real world to achieve real-world flight without a single real training image? We propose a learning method that we call CAD$^2$RL, which can be used to perform collision-free indoor flight in the real world while being trained entirely on 3D CAD models. Our method uses single RGB images from a monocular camera, without needing to explicitly reconstruct the 3D geometry of the environment or perform explicit motion planning. Our learned collision avoidance policy is represented by a deep convolutional neural network that directly processes raw monocular images and outputs velocity commands. This policy is trained entirely on simulated images, with a Monte Carlo policy evaluation algorithm that directly optimizes the network's ability to produce collision-free flight. By highly randomizing the rendering settings for our simulated training set, we show that we can train a policy that generalizes to the real world, without requiring the simulator to be particularly realistic or high-fidelity. We evaluate our method by flying a real quadrotor through indoor environments, and further evaluate the design choices in our simulator through a series of ablation studies on depth prediction. For supplementary video see: this https URL

405 citations
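The "highly randomized rendering settings" are the transferable part of the recipe: by resampling appearance parameters every episode, the policy cannot latch onto any single simulated look. A sketch with invented parameter names and ranges, purely to illustrate the pattern:

```python
import random

def sample_render_settings():
    # Per-episode randomization of the simulator's appearance; all names
    # and ranges here are illustrative assumptions, not the paper's values.
    return {
        "wall_texture": random.choice(["brick", "plaster", "wood", "noise"]),
        "floor_texture": random.choice(["carpet", "tile", "concrete"]),
        "light_intensity": random.uniform(0.3, 1.5),
        "light_direction": [random.uniform(-1.0, 1.0) for _ in range(3)],
        "camera_fov_deg": random.uniform(60.0, 90.0),
        "furniture_count": random.randint(0, 12),
    }
```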


Book ChapterDOI
08 Oct 2016
TL;DR: In this paper, the authors propose a method to understand 3D object structure from a single image by solving an optimization task given 2D keypoint positions, or training on synthetic data with ground truth 3D information.
Abstract: Understanding 3D object structure from a single image is an important but difficult task in computer vision, mostly due to the lack of 3D object annotations in real images. Previous work tackles this problem by either solving an optimization task given 2D keypoint positions, or training on synthetic data with ground truth 3D information.

348 citations


Posted Content
Abstract: Image super-resolution (SR) is an underdetermined inverse problem, where a large number of plausible high-resolution images can explain the same downsampled image. Most current single image SR methods use empirical risk minimisation, often with a pixel-wise mean squared error (MSE) loss. However, the outputs from such methods tend to be blurry, over-smoothed and generally appear implausible. A more desirable approach would employ Maximum a Posteriori (MAP) inference, preferring solutions that always have a high probability under the image prior, and thus appear more plausible. Direct MAP estimation for SR is non-trivial, as it requires us to build a model for the image prior from samples. Furthermore, MAP inference is often performed via optimisation-based iterative algorithms which don't compare well with the efficiency of neural-network-based alternatives. Here we introduce new methods for amortised MAP inference whereby we calculate the MAP estimate directly using a convolutional neural network. We first introduce a novel neural network architecture that performs a projection to the affine subspace of valid SR solutions ensuring that the high resolution output of the network is always consistent with the low resolution input. We show that, using this architecture, the amortised MAP inference problem reduces to minimising the cross-entropy between two distributions, similar to training generative models. We propose three methods to solve this optimisation problem: (1) Generative Adversarial Networks (GAN) (2) denoiser-guided SR which backpropagates gradient-estimates from denoising to train the network, and (3) a baseline method using a maximum-likelihood-trained image prior. Our experiments show that the GAN based approach performs best on real image data. Lastly, we establish a connection between GANs and amortised variational inference as in e.g. variational autoencoders.

348 citations
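The projection layer described above can be made concrete under the assumption that the downsampling operator A is average pooling; nearest-neighbour upsampling is then a right inverse of A, and adding back the upsampled low-resolution residual makes any network output exactly consistent with the input. A PyTorch sketch:

```python
import torch
import torch.nn.functional as F

def affine_projection(x, y, scale):
    # Project a network output x onto the affine subspace of high-res
    # images whose downsampled version equals the low-res input y,
    # assuming the downsampling operator is average pooling.
    residual = y - F.avg_pool2d(x, scale)          # low-res inconsistency
    # Nearest-neighbour upsampling undoes average pooling here, so adding
    # the upsampled residual restores exact consistency with y.
    return x + F.interpolate(residual, scale_factor=scale, mode="nearest")

# Sanity check: downsampling the projected output recovers y.
x = torch.randn(1, 3, 32, 32)
y = torch.randn(1, 3, 8, 8)
proj = affine_projection(x, y, 4)
assert torch.allclose(F.avg_pool2d(proj, 4), y, atol=1e-5)
```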


Journal ArticleDOI
TL;DR: A new algorithm for automatic crack detection from 2D pavement images that provides very robust and precise results in a wide range of situations, in a fully unsupervised manner, which is beyond the current state of the art.
Abstract: This paper proposes a new algorithm for automatic crack detection from 2D pavement images. It strongly relies on the localization of minimal paths within each image, a path being a series of neighboring pixels and its score being the sum of their intensities. The originality of the approach stems from the proposed way to select a set of minimal paths and the two postprocessing steps introduced to improve the quality of the detection. Such an approach is a natural way to take account of both the photometric and geometric characteristics of pavement images. An intensive validation is performed on both synthetic and real images (from five different acquisition systems), with comparisons to five existing methods. The proposed algorithm provides very robust and precise results in a wide range of situations, in a fully unsupervised manner, which is beyond the current state of the art.

292 citations
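The core primitive of the method, a minimal path whose score is the sum of pixel intensities along it, can be computed with Dijkstra's algorithm over the pixel grid (dark cracks yield low-cost paths). A NumPy sketch; the paper's endpoint-selection strategy and its two post-processing steps are not reproduced:

```python
import heapq
import numpy as np

def minimal_path(img, start, end):
    # Dijkstra over the 8-connected pixel grid; the cost of a path is the
    # sum of the intensities of the pixels it visits.
    h, w = img.shape
    dist = np.full((h, w), np.inf)
    prev = {}
    dist[start] = img[start]
    heap = [(img[start], start)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if (r, c) == end:
            break
        if d > dist[r, c]:
            continue                      # stale heap entry
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr or dc) and 0 <= nr < h and 0 <= nc < w:
                    nd = d + img[nr, nc]
                    if nd < dist[nr, nc]:
                        dist[nr, nc] = nd
                        prev[(nr, nc)] = (r, c)
                        heapq.heappush(heap, (nd, (nr, nc)))
    # Walk back from the end point to recover the path.
    path, node = [], end
    while node != start:
        path.append(node)
        node = prev[node]
    path.append(start)
    return path[::-1]
```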


Proceedings ArticleDOI
01 Oct 2016
TL;DR: In this paper, a learning-based approach for reconstructing a 3D face from a single image is proposed, based on a convolutional neural network (CNN) that extracts the face geometry directly from the image.
Abstract: Fast and robust three-dimensional reconstruction of facial geometric structure from a single image is a challenging task with numerous applications. Here, we introduce a learning-based approach for reconstructing a three-dimensional face from a single image. Recent face recovery methods rely on accurate localization of key characteristic points. In contrast, the proposed approach is based on a Convolutional-Neural-Network (CNN) which extracts the face geometry directly from its image. Although such deep architectures outperform other models in complex computer vision problems, training them properly requires a large dataset of annotated examples. In the case of three-dimensional faces, no large-volume data sets currently exist, and acquiring such big data is a tedious task. As an alternative, we propose to generate random, yet nearly photo-realistic, facial images for which the geometric form is known. The suggested model successfully recovers facial shapes from real images, even for faces with extreme expressions and under various lighting conditions.

266 citations


Posted Content
TL;DR: The proposed approach is based on a Convolutional-Neural-Network (CNN) which extracts the face geometry directly from its image and successfully recovers facial shapes from real images, even for faces with extreme expressions and under various lighting conditions.
Abstract: Fast and robust three-dimensional reconstruction of facial geometric structure from a single image is a challenging task with numerous applications. Here, we introduce a learning-based approach for reconstructing a three-dimensional face from a single image. Recent face recovery methods rely on accurate localization of key characteristic points. In contrast, the proposed approach is based on a Convolutional-Neural-Network (CNN) which extracts the face geometry directly from its image. Although such deep architectures outperform other models in complex computer vision problems, training them properly requires a large dataset of annotated examples. In the case of three-dimensional faces, no large-volume data sets currently exist, and acquiring such big data is a tedious task. As an alternative, we propose to generate random, yet nearly photo-realistic, facial images for which the geometric form is known. The suggested model successfully recovers facial shapes from real images, even for faces with extreme expressions and under various lighting conditions.

191 citations


Proceedings Article
05 Dec 2016
TL;DR: This paper introduces an image-based synthesis engine that artificially augments a dataset of real images with 2D human pose annotations using 3D Motion Capture (MoCap) data to generate a large set of photorealistic synthetic images of humans with 3D pose annotations.
Abstract: This paper addresses the problem of 3D human pose estimation in the wild. A significant challenge is the lack of training data, i.e., 2D images of humans annotated with 3D poses. Such data is necessary to train state-of-the-art CNN architectures. Here, we propose a solution to generate a large set of photorealistic synthetic images of humans with 3D pose annotations. We introduce an image-based synthesis engine that artificially augments a dataset of real images with 2D human pose annotations using 3D Motion Capture (MoCap) data. Given a candidate 3D pose, our algorithm selects for each joint an image whose 2D pose locally matches the projected 3D pose. The selected images are then combined to generate a new synthetic image by stitching local image patches in a kinematically constrained manner. The resulting images are used to train an end-to-end CNN for full-body 3D pose estimation. We cluster the training data into a large number of pose classes and tackle pose estimation as a K-way classification problem. Such an approach is viable only with large training sets such as ours. Our method outperforms the state of the art in terms of 3D pose estimation in controlled environments (Human3.6M) and shows promising results for in-the-wild images (LSP). This demonstrates that CNNs trained on artificial images generalize well to real images.

164 citations
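A rough sketch of the joint-wise retrieval step: for each joint of the projected 3D pose, pick the annotated image whose 2D pose agrees best in a local neighbourhood of that joint. The neighbourhood definition (adjacent joint indices) and the matching distance are simplifying assumptions, and the kinematically constrained stitching is not shown:

```python
import numpy as np

def select_images_per_joint(projected_2d, dataset_poses, k_context=3):
    # projected_2d: (J, 2) projection of the candidate 3D pose.
    # dataset_poses: (N, J, 2) 2D pose annotations of the real images.
    n_joints = projected_2d.shape[0]
    chosen = []
    for j in range(n_joints):
        lo, hi = max(0, j - k_context), min(n_joints, j + k_context + 1)
        # Compare the local chain of joints around joint j.
        d = np.linalg.norm(
            dataset_poses[:, lo:hi] - projected_2d[lo:hi], axis=-1).sum(axis=-1)
        chosen.append(int(np.argmin(d)))   # index of best-matching image
    return chosen
```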


Proceedings ArticleDOI
01 Jun 2016
TL;DR: WarpNet as mentioned in this paper aligns an object in one image with a different object in another by using the output of the network as a spatial prior that allows generalization at test time to match real images across variations in appearance, viewpoint and articulation.
Abstract: We present an approach to matching images of objects in fine-grained datasets without using part annotations, with an application to the challenging problem of weakly supervised single-view reconstruction. This is in contrast to prior works that require part annotations, since matching objects across class and pose variations is challenging with appearance features alone. We overcome this challenge through a novel deep learning architecture, WarpNet, that aligns an object in one image with a different object in another. We exploit the structure of the fine-grained dataset to create artificial data for training this network in an unsupervised-discriminative learning approach. The output of the network acts as a spatial prior that allows generalization at test time to match real images across variations in appearance, viewpoint and articulation. On the CUB-200-2011 dataset of bird categories, we improve the AP over an appearance-only network by 13.6%. We further demonstrate that our WarpNet matches, together with the structure of fine-grained datasets, allow single-view reconstructions with quality comparable to using annotated point correspondences.

Posted Content
Wei Shen, Rujie Liu
TL;DR: Instead of manipulating the whole image, this work proposes to learn the corresponding residual image, defined as the difference between images before and after the manipulation, so that the manipulation can be performed efficiently with modest pixel modification.
Abstract: Face attributes are interesting due to their detailed description of human faces. Unlike prior research on attribute prediction, we address an inverse and more challenging problem called face attribute manipulation, which aims at modifying a face image according to a given attribute value. Instead of manipulating the whole image, we propose to learn the corresponding residual image, defined as the difference between images before and after the manipulation. In this way, the manipulation can be performed efficiently with modest pixel modification. The framework of our approach is based on the Generative Adversarial Network. It consists of two image transformation networks and a discriminative network. The transformation networks are responsible for the attribute manipulation and its dual operation, and the discriminative network is used to distinguish the generated images from real images. We also apply dual learning to allow the transformation networks to learn from each other. Experiments show that residual images can be effectively learned and used for attribute manipulations. The generated images retain most of the details in attribute-irrelevant areas.
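The residual idea reduces to predicting a sparse difference image and adding it back to the input, so attribute-irrelevant pixels are barely touched. A PyTorch sketch with a placeholder network (the paper's transformation networks, dual operation, and discriminator are not reproduced):

```python
import torch
import torch.nn as nn

class ResidualManipulator(nn.Module):
    """Predicts only the residual image; the edited face is input + residual.
    The two-layer network below is a stand-in, not the paper's architecture."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Tanh())  # bounded residual

    def forward(self, image):
        residual = self.net(image)      # difference before/after the edit
        return image + residual, residual
```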

Proceedings Article
14 Oct 2016
TL;DR: A novel neural network architecture is introduced that performs a projection to the affine subspace of valid SR solutions ensuring that the high resolution output of the network is always consistent with the low resolution input, and it is shown that the GAN based approach performs best on real image data.
Abstract: Image super-resolution (SR) is an underdetermined inverse problem, where a large number of plausible high-resolution images can explain the same downsampled image. Most current single image SR methods use empirical risk minimisation, often with a pixel-wise mean squared error (MSE) loss. However, the outputs from such methods tend to be blurry, over-smoothed and generally appear implausible. A more desirable approach would employ Maximum a Posteriori (MAP) inference, preferring solutions that always have a high probability under the image prior, and thus appear more plausible. Direct MAP estimation for SR is non-trivial, as it requires us to build a model for the image prior from samples. Furthermore, MAP inference is often performed via optimisation-based iterative algorithms which don't compare well with the efficiency of neural-network-based alternatives. Here we introduce new methods for amortised MAP inference whereby we calculate the MAP estimate directly using a convolutional neural network. We first introduce a novel neural network architecture that performs a projection to the affine subspace of valid SR solutions ensuring that the high resolution output of the network is always consistent with the low resolution input. We show that, using this architecture, the amortised MAP inference problem reduces to minimising the cross-entropy between two distributions, similar to training generative models. We propose three methods to solve this optimisation problem: (1) Generative Adversarial Networks (GAN) (2) denoiser-guided SR which backpropagates gradient-estimates from denoising to train the network, and (3) a baseline method using a maximum-likelihood-trained image prior. Our experiments show that the GAN based approach performs best on real image data. Lastly, we establish a connection between GANs and amortised variational inference as in e.g. variational autoencoders.

Proceedings ArticleDOI
01 Jun 2016
TL;DR: This study presents a weakly supervised approach that discovers the discriminative structures of sketch images, given pairs of sketch images and web images, using a deep convolutional neural network, named SketchNet.
Abstract: In this study, we present a weakly supervised approach that discovers the discriminative structures of sketch images, given pairs of sketch images and web images. In contrast to traditional approaches that use global appearance features or rely on keypoint features, our aim is to automatically learn the shared latent structures that exist between sketch images and real images, even when there are significant appearance differences across the relevant real images. To accomplish this, we propose a deep convolutional neural network, named SketchNet. We first form a triplet composed of a sketch and a positive and a negative real image as the input to our neural network. To discover the coherent visual structures between the sketch and its positive pairs, we introduce the softmax as the loss function. A ranking mechanism is then introduced to make positive pairs obtain higher scores than negative ones, yielding a robust representation. Finally, we formalize the above constraints into a unified objective function and create an ensemble feature representation to describe the sketch images. Experiments on the TU-Berlin sketch benchmark demonstrate the effectiveness of our model and show that deep feature representation brings substantial improvements over other state-of-the-art methods on sketch classification.
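A hedged sketch of the triplet idea: embed the sketch, a relevant ("positive") real image and an irrelevant ("negative") one, and push the positive pair's score above the negative's. A margin ranking loss is used here as a stand-in for the paper's exact softmax-plus-ranking formulation:

```python
import torch
import torch.nn.functional as F

def triplet_ranking_loss(f_sketch, f_pos, f_neg, margin=0.2):
    # f_sketch, f_pos, f_neg: (B, D) embeddings from a shared network.
    pos_score = F.cosine_similarity(f_sketch, f_pos)
    neg_score = F.cosine_similarity(f_sketch, f_neg)
    # Positive pairs must out-score negatives by at least `margin`
    # (an assumed value).
    return F.relu(margin + neg_score - pos_score).mean()
```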

Proceedings ArticleDOI
01 Jan 2016
TL;DR: In this article, a convolutional neural architecture is proposed to estimate reflectance maps of specular materials in natural lighting conditions, which can directly predict a reflectance map from the image itself.
Abstract: Undoing the image formation process and therefore decomposing appearance into its intrinsic properties is a challenging task due to the under-constrained nature of this inverse problem. While significant progress has been made on inferring shape, materials and illumination from images only, progress in an unconstrained setting is still limited. We propose a convolutional neural architecture to estimate reflectance maps of specular materials in natural lighting conditions. We achieve this in an end-to-end learning formulation that directly predicts a reflectance map from the image itself. We show how to improve estimates by facilitating additional supervision in an indirect scheme that first predicts surface orientation and afterwards predicts the reflectance map by a learning-based sparse data interpolation. In order to analyze performance on this difficult task, we propose a new challenge of Specular MAterials on SHapes with complex IllumiNation (SMASHINg) using both synthetic and real images. Furthermore, we show the application of our method to a range of image editing tasks on real images.

Journal ArticleDOI
01 Dec 2016
TL;DR: In this article, a spectral-spatial hyperspectral image classification method based on K nearest neighbor (KNN) is proposed, which consists of the following steps: first, the support vector machine is adopted to obtain the initial classification probability maps which reflect the probability that each pixel belongs to different classes, then, the obtained pixel-wise probability maps are refined with the proposed KNN filtering algorithm that is based on matching and averaging nonlocal neighborhoods.
Abstract: Fusion of spectral and spatial information is an effective way in improving the accuracy of hyperspectral image classification. In this paper, a novel spectral–spatial hyperspectral image classification method based on K nearest neighbor (KNN) is proposed, which consists of the following steps. First, the support vector machine is adopted to obtain the initial classification probability maps which reflect the probability that each hyperspectral pixel belongs to different classes. Then, the obtained pixel-wise probability maps are refined with the proposed KNN filtering algorithm that is based on matching and averaging nonlocal neighborhoods. The proposed method does not need sophisticated segmentation and optimization strategies while still being able to make full use of the nonlocal principle of real images by using KNN, and thus, providing competitive classification with fast computation. Experiments performed on two real hyperspectral data sets show that the classification results obtained by the proposed method are comparable to several recently proposed hyperspectral image classification methods.
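A rough sketch of the KNN filtering step: for each pixel, find the K pixels whose surrounding patches in a guide image are most similar (nonlocal matching) and average their class probabilities. The brute-force search, guide image, patch size, and K below are simplifying assumptions; a practical implementation would restrict the search window:

```python
import numpy as np

def knn_filter(prob_maps, guide, k=20, patch=3):
    # prob_maps: (H, W, C) per-class probabilities from the SVM.
    # guide: (H, W) float image guiding the nonlocal patch matching.
    h, w, c = prob_maps.shape
    r = patch // 2
    padded = np.pad(guide, r, mode="reflect")
    # Stack every pixel's patch as a feature vector.
    feats = np.stack([
        padded[i:i + h, j:j + w] for i in range(patch) for j in range(patch)
    ], axis=-1).reshape(h * w, -1)
    probs = prob_maps.reshape(h * w, c)
    out = np.empty_like(probs)
    for p in range(h * w):                     # brute force, for clarity only
        d = np.linalg.norm(feats - feats[p], axis=1)
        nn = np.argpartition(d, k)[:k]         # indices of K nearest patches
        out[p] = probs[nn].mean(axis=0)        # average their probabilities
    return out.reshape(h, w, c)
```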

Journal ArticleDOI
TL;DR: The authors comment on the scalability of the SpaceWarps system to the wide-field survey era, based on the projection that searches of 10^5 images could be performed by a crowd of 10^5 volunteers in 6 days.
Abstract: We describe SpaceWarps, a novel gravitational lens discovery service that yields samples of high purity and completeness through crowd-sourced visual inspection. Carefully produced colour composite images are displayed to volunteers via a web-based classification interface, which records their estimates of the positions of candidate lensed features. Images of simulated lenses, as well as real images which lack lenses, are inserted into the image stream at random intervals; this training set is used to give the volunteers instantaneous feedback on their performance, as well as to calibrate a model of the system that provides dynamical updates to the probability that a classified image contains a lens. Low-probability systems are retired from the site periodically, concentrating the sample towards a set of lens candidates. Having divided 160 square degrees of Canada-France-Hawaii Telescope Legacy Survey (CFHTLS) imaging into some 430,000 overlapping 82 by 82 arcsecond tiles and displaying them on the site, we were joined by around 37,000 volunteers who contributed 11 million image classifications over the course of 8 months. This Stage 1 search reduced the sample to 3381 images containing candidates; these were then refined in Stage 2 to yield a sample that we expect to be over 90% complete and 30% pure, based on our analysis of the volunteers' performance on training images. We comment on the scalability of the SpaceWarps system to the wide-field survey era, based on our projection that searches of 10^5 images could be performed by a crowd of 10^5 volunteers in 6 days.
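The dynamical probability update described above is a Bayes' rule computation: each volunteer is characterised by a hit rate on training images containing simulated lenses and a false-positive rate on known lens-free images, and each classification updates the image's lens probability accordingly. A sketch (the variable names are ours):

```python
def update_lens_probability(p, said_lens, p_true_pos, p_false_pos):
    # p: current probability that the image contains a lens.
    # p_true_pos: volunteer's rate of marking a lens when one is present.
    # p_false_pos: volunteer's rate of marking a lens when none is present.
    if said_lens:
        num = p_true_pos * p
        den = p_true_pos * p + p_false_pos * (1 - p)
    else:
        num = (1 - p_true_pos) * p
        den = (1 - p_true_pos) * p + (1 - p_false_pos) * (1 - p)
    return num / den

# Example: a skilled volunteer (90% hit rate, 10% false positives) marks a
# lens in an image with prior 0.01; the posterior rises to about 0.083.
print(update_lens_probability(0.01, True, 0.9, 0.1))
```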

Proceedings ArticleDOI
01 Jun 2016
TL;DR: A novel blind image denoising algorithm is proposed which can cope with real-world noisy images even when the noise model is not provided, realized by modeling image noise with a mixture of Gaussians (MoG), which can approximate a wide variety of continuous distributions.
Abstract: Traditional image denoising algorithms always assume the noise to be homogeneous white Gaussian distributed. However, the noise in real images can be much more complex empirically. This paper addresses this problem and proposes a novel blind image denoising algorithm which can cope with real-world noisy images even when the noise model is not provided. It is realized by modeling image noise with a mixture of Gaussians (MoG), which can approximate a wide variety of continuous distributions. As the number of MoG components is unknown in practice, this work adopts a Bayesian nonparametric technique and proposes a novel low-rank MoG filter (LR-MoG) to recover clean signals (patches) from noisy ones contaminated by MoG noise. Based on LR-MoG, a novel blind image denoising approach is developed. To test the proposed method, this study conducts extensive experiments on synthetic and real images. Our method consistently achieves state-of-the-art performance.
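One way to mirror the Bayesian-nonparametric noise model in a few lines is to fit residuals with a mixture whose effective number of components is inferred rather than fixed, e.g. with scikit-learn's BayesianGaussianMixture. Note this sketch only captures the noise-modelling flavour of the paper, not its low-rank MoG filter:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

def fit_noise_model(noisy_patches, clean_estimate, max_components=8):
    # Treat the residual between noisy patches and the current clean
    # estimate as samples from a MoG; unused components are pruned
    # automatically by the Dirichlet weight prior.
    residual = (noisy_patches - clean_estimate).reshape(-1, 1)
    mog = BayesianGaussianMixture(
        n_components=max_components, weight_concentration_prior=0.1)
    mog.fit(residual)
    return mog   # component means/variances describe the real noise
```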

Posted Content
TL;DR: In this paper, an image-based synthesis engine was proposed to generate a large set of photorealistic synthetic images of humans with 3D pose annotations using 3D Motion Capture (MoCap) data.
Abstract: This paper addresses the problem of 3D human pose estimation in the wild. A significant challenge is the lack of training data, i.e., 2D images of humans annotated with 3D poses. Such data is necessary to train state-of-the-art CNN architectures. Here, we propose a solution to generate a large set of photorealistic synthetic images of humans with 3D pose annotations. We introduce an image-based synthesis engine that artificially augments a dataset of real images with 2D human pose annotations using 3D Motion Capture (MoCap) data. Given a candidate 3D pose, our algorithm selects for each joint an image whose 2D pose locally matches the projected 3D pose. The selected images are then combined to generate a new synthetic image by stitching local image patches in a kinematically constrained manner. The resulting images are used to train an end-to-end CNN for full-body 3D pose estimation. We cluster the training data into a large number of pose classes and tackle pose estimation as a K-way classification problem. Such an approach is viable only with large training sets such as ours. Our method outperforms the state of the art in terms of 3D pose estimation in controlled environments (Human3.6M) and shows promising results for in-the-wild images (LSP). This demonstrates that CNNs trained on artificial images generalize well to real images.

Proceedings ArticleDOI
25 Sep 2016
TL;DR: In this article, the authors propose a cost function that is optimized during training, based on the classical optical flow constraint, which is differentiable with respect to the motion field and allows backpropagation of the error to previous layers.
Abstract: Traditional methods for motion estimation estimate the motion field F between a pair of images as the one that minimizes a predesigned cost function. In this paper, we propose a direct method and train a Convolutional Neural Network (CNN) that, given a pair of images as input at test time, produces a dense motion field F at its output layer. In the absence of large datasets with ground truth motion that would allow classical supervised training, we propose to train the network in an unsupervised manner. The cost function that is optimized during training is based on the classical optical flow constraint, which is differentiable with respect to the motion field and therefore allows backpropagation of the error to previous layers of the network. Our method is tested on both synthetic and real image sequences and performs similarly to state-of-the-art methods.
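The training loss follows from the linearised optical flow constraint I_x*u + I_y*v + I_t = 0, which is differentiable in the predicted flow. A PyTorch sketch for grayscale frame pairs (the paper may include smoothness or robustness terms not shown here):

```python
import torch
import torch.nn.functional as F

def flow_constraint_loss(I1, I2, flow):
    # I1, I2: (B, 1, H, W) grayscale frames; flow: (B, 2, H, W) as (u, v).
    kx = torch.tensor([[[[-0.5, 0.0, 0.5]]]])   # central difference in x
    ky = kx.transpose(2, 3)                     # central difference in y
    Ix = F.conv2d(I1, kx, padding=(0, 1))
    Iy = F.conv2d(I1, ky, padding=(1, 0))
    It = I2 - I1                                # temporal derivative
    u, v = flow[:, :1], flow[:, 1:]
    # Penalise violations of I_x*u + I_y*v + I_t = 0; differentiable in
    # `flow`, so the error backpropagates through the CNN.
    return ((Ix * u + Iy * v + It) ** 2).mean()
```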

Journal ArticleDOI
TL;DR: This study combines local statistics with the NLM filter to reduce speckle in ultrasound images and demonstrates that the proposed method outperforms the original NLM, as well as many previously developed methods.

Proceedings ArticleDOI
01 Jun 2016
TL;DR: A gradient activation method is introduced to automatically select a subset of gradients of the latent image in a cutting-plane-based optimization scheme for kernel estimation, which greatly improves the accuracy and flexibility and affords great convenience for handling noise and outliers.
Abstract: Blind image deconvolution is an ill-posed inverse problem which is often addressed through the application of an appropriate prior. Although some priors are informative in general, many images do not strictly conform to them, leading to degraded performance in kernel estimation. More critically, real images may be contaminated by nonuniform noise such as saturation and outliers. Methods for removing specific image areas based on some priors have been proposed, but they operate either manually or by defining fixed criteria. We show here that a subset of the image gradients is adequate to estimate the blur kernel robustly, whether or not the gradient image is sparse. We thus introduce a gradient activation method to automatically select a subset of gradients of the latent image in a cutting-plane-based optimization scheme for kernel estimation. No extra assumption is used in our model, which greatly improves accuracy and flexibility. More importantly, the proposed method affords great convenience for handling noise and outliers. Experiments on both synthetic data and real-world images demonstrate the effectiveness and robustness of the proposed method in comparison with state-of-the-art methods.
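A crude stand-in for gradient activation, replacing the paper's cutting-plane selection with a simple top-magnitude rule: keep only the strongest few gradients for kernel estimation and zero the rest. The keep ratio is an assumption:

```python
import numpy as np

def activate_gradients(grad_x, grad_y, keep_ratio=0.05):
    # Keep only the strongest gradients, on the premise that a small
    # subset suffices to constrain the blur kernel.
    mag = np.hypot(grad_x, grad_y)
    thresh = np.quantile(mag, 1.0 - keep_ratio)   # magnitude cutoff
    mask = mag >= thresh
    return grad_x * mask, grad_y * mask
```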

Proceedings Article
01 Jan 2016
TL;DR: An auxiliary image regularization technique, optimized by the stochastic Alternating Direction Method of Multipliers (ADMM) algorithm, is proposed; it automatically exploits the mutual context information among training images and encourages the model to select reliable images to robustify the learning process.
Abstract: Precisely-labeled data sets with a sufficient number of samples are notably important for training deep convolutional neural networks (CNNs). However, many of the available real-world data sets contain erroneously labeled samples, and such label errors make it a daunting task to learn a well-performing deep CNN model. In this work, we consider the problem of training a deep CNN model for image classification with mislabeled training samples - an issue that is common in real image data sets with tags supplied by amateur users. To solve this problem, we propose an auxiliary image regularization technique, which automatically exploits the mutual context information among training images and encourages the model to select reliable images to robustify the learning process. Comprehensive experiments on benchmark data sets clearly demonstrate that our proposed regularized CNN model is resistant to label noise in training data.

Proceedings ArticleDOI
01 Jun 2016
TL;DR: This paper simultaneously explores spectral and spatial correlation via low-rank regularizations and formulates the restoration problem as a variational optimization model, which can be solved via an iterative numerical algorithm.
Abstract: Conventional scanning and multiplexing techniques for hyperspectral imaging suffer from limited temporal and/or spatial resolution. To resolve this issue, coding techniques are becoming increasingly popular in developing snapshot systems for high-resolution hyperspectral imaging. For such systems, it is a critical task to accurately restore the 3D hyperspectral image from its corresponding coded 2D image. In this paper, we propose an effective method for coded hyperspectral image restoration, which exploits extensive structure sparsity in the hyperspectral image. Specifically, we simultaneously explore spectral and spatial correlation via low-rank regularizations, and formulate the restoration problem into a variational optimization model, which can be solved via an iterative numerical algorithm. Experimental results using both synthetic data and real images show that the proposed method can significantly outperform the state-of-the-art methods on several popular coding-based hyperspectral imaging systems.
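One plausible instantiation of the spectral low-rank regularization is singular-value soft-thresholding of the pixels-by-bands matrix, i.e. the proximal step of a nuclear-norm penalty inside an iterative solver; the paper's full variational model also couples spatial correlation, which is not shown:

```python
import numpy as np

def soft_threshold_rank(X, tau):
    # X: (n_pixels, n_bands) matrix unfolded from the hyperspectral cube.
    # Shrinking its singular values enforces the low-rank (spectrally
    # correlated) structure; tau is the regularization strength.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt
```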

Journal ArticleDOI
TL;DR: A robust 2D Otsu's thresholding method that significantly improves robustness to salt-and-pepper and Gaussian noise and introduces a region post-processing step to deal with noise and edge pixels.

Book ChapterDOI
20 Nov 2016
TL;DR: In this paper, the authors propose a non-convex objective function for fitting a 3D morphable model to single face images using only sparse geometric features (edges and landmark points), which can be viewed as forming soft correspondences between model and image edges.
Abstract: In this paper we explore the problem of fitting a 3D morphable model to single face images using only sparse geometric features (edges and landmark points). Previous approaches to this problem are based on nonlinear optimisation of an edge-derived cost that can be viewed as forming soft correspondences between model and image edges. We propose a novel approach, that explicitly computes hard correspondences. The resulting objective function is non-convex but we show that a good initialisation can be obtained efficiently using alternating linear least squares in a manner similar to the iterated closest point algorithm. We present experimental results on both synthetic and real images and show that our approach outperforms methods that use soft correspondence and other recent methods that rely solely on geometric features.
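A sketch of the hard-correspondence alternation, in the spirit of iterated closest point: assign each projected model point to its nearest image edge pixel, re-solve by linear least squares, and repeat. `project` and `solve_ls` are assumed callables standing in for the paper's projection model and alternating linear least-squares steps:

```python
import numpy as np

def fit_by_hard_correspondence(model_pts, edge_pixels, project, solve_ls,
                               pose, iters=20):
    # model_pts: (N, 3) morphable-model edge points; edge_pixels: (M, 2).
    for _ in range(iters):
        proj = project(model_pts, pose)                    # (N, 2)
        # Hard correspondence: nearest image edge pixel per model point.
        d = np.linalg.norm(proj[:, None] - edge_pixels[None, :], axis=-1)
        targets = edge_pixels[np.argmin(d, axis=1)]
        pose = solve_ls(model_pts, targets)                # linear LS refit
    return pose
```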

Posted Content
TL;DR: The authors propose to overcome the sparsity of supervision problem via synthetically generated images, and then train attribute ranking models to predict the relative strength of an attribute in novel pairs of real images.
Abstract: Distinguishing subtle differences in attributes is valuable, yet learning to make visual comparisons remains non-trivial. Not only is the number of possible comparisons quadratic in the number of training images, but also access to images adequately spanning the space of fine-grained visual differences is limited. We propose to overcome the sparsity of supervision problem via synthetically generated images. Building on a state-of-the-art image generation engine, we sample pairs of training images exhibiting slight modifications of individual attributes. Augmenting real training image pairs with these examples, we then train attribute ranking models to predict the relative strength of an attribute in novel pairs of real images. Our results on datasets of faces and fashion images show the great promise of bootstrapping imperfect image generators to counteract sample sparsity for learning to rank.
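Training on the augmented pairs then reduces to a standard pairwise ranking objective; a RankNet-style logistic loss is shown below as a stand-in for the paper's ranker. Synthetic pairs differ only in the target attribute, so their ordering labels come for free:

```python
import torch
import torch.nn.functional as F

def pairwise_rank_loss(score_a, score_b, a_stronger):
    # score_a, score_b: (B,) attribute scores for the two images in a pair.
    # a_stronger: (B,) with 1 where image A shows the attribute more
    # strongly, else 0 (known by construction for synthetic pairs).
    return F.binary_cross_entropy_with_logits(
        score_a - score_b, a_stronger.float())
```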

Journal ArticleDOI
Lu Jia, Ming Li, Peng Zhang, Yan Wu, Huahui Zhu
TL;DR: Experimental results on real images demonstrate the effectiveness, especially the strong noise immunity, of the LIMKKM method and illustrate that it is suitable for SAR image change detection.
Abstract: Performance of the k-means clustering algorithm for synthetic aperture radar (SAR) image change detection is usually worsened by the inherent existence of speckle noise. Therefore, in this letter, an unsupervised multiple kernel k-means clustering algorithm with local-neighborhood information (LIMKKM algorithm) is proposed for SAR image change detection. The LIMKKM algorithm contributes in two aspects. First, it fuses various features through a weighted summation kernel by automatically and optimally computing the kernel weights. Here, the intensity and texture features of the ratio image are fused. Second, it incorporates the local-neighborhood information into its clustering objective function for providing strong noise immunity. The LIMKKM change detection algorithm is carried out in a train–test way to lighten the computational burden. Experimental results on real images demonstrate the effectiveness, especially the strong noise immunity, of the LIMKKM method and illustrate that it is suitable for SAR image change detection.

Proceedings ArticleDOI
25 Feb 2016
TL;DR: This work explores the previously proposed approach of direct blind deconvolution and denoising with convolutional neural networks (CNN) in a situation where the blur kernels are partially constrained and evaluates the behavior and limits of the CNNs with respect to blur direction range and length.
Abstract: In this work we explore the previously proposed approach of direct blind deconvolution and denoising with convolutional neural networks (CNN) in a situation where the blur kernels are partially constrained. We focus on blurred images from a real-life traffic surveillance system, on which we, for the first time, demonstrate that neural networks trained on artificial data provide superior reconstruction quality on real images compared to traditional blind deconvolution methods. The training data is easy to obtain by blurring sharp photos from a target system with a very rough approximation of the expected blur kernels, thereby allowing custom CNNs to be trained for a specific application (image content and blur range). Additionally, we evaluate the behavior and limits of the CNNs with respect to blur direction range and length.

Journal ArticleDOI
TL;DR: This paper presents a blind deblurring pipeline that restores images from real camera systems by slightly extending their DOF and recovering sharpness in regions slightly out of focus, relying first on an estimate of the spatially varying defocus blur.
Abstract: Real camera systems have a limited depth of field (DOF) which may cause an image to be degraded due to visible misfocus or too shallow DOF. In this paper, we present a blind deblurring pipeline able to restore such images by slightly extending their DOF and recovering sharpness in regions slightly out of focus. To address this severely ill-posed problem, our algorithm relies first on the estimation of the spatially varying defocus blur. Drawing on local frequency image features, a machine learning approach based on the recently introduced regression tree fields is used to train a model able to regress a coherent defocus blur map of the image, labeling each pixel by the scale of a defocus point spread function. A non-blind spatially varying deblurring algorithm is then used to properly extend the DOF of the image. The good performance of our algorithm is assessed both quantitatively, using realistic ground truth data obtained with a novel approach based on a plenoptic camera, and qualitatively with real images.