
Showing papers on "Real image published in 2017"


Proceedings ArticleDOI
20 Mar 2017
TL;DR: This paper explores domain randomization, a simple technique for training models on simulated images that transfer to real images by randomizing rendering in the simulator, and achieves the first successful transfer of a deep neural network trained only on simulated RGB images to the real world for the purpose of robotic control.
Abstract: Bridging the ‘reality gap’ that separates simulated robotics from experiments on hardware could accelerate robotic research through improved data availability. This paper explores domain randomization, a simple technique for training models on simulated images that transfer to real images by randomizing rendering in the simulator. With enough variability in the simulator, the real world may appear to the model as just another variation. We focus on the task of object localization, which is a stepping stone to general robotic manipulation skills. We find that it is possible to train a real-world object detector that is accurate to 1.5 cm and robust to distractors and partial occlusions using only data from a simulator with non-realistic random textures. To demonstrate the capabilities of our detectors, we show they can be used to perform grasping in a cluttered environment. To our knowledge, this is the first successful transfer of a deep neural network trained only on simulated RGB images (without pre-training on real images) to the real world for the purpose of robotic control.
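
A minimal sketch of the kind of per-scene randomization the abstract describes: each simulated training image is rendered with freshly sampled textures, lighting, camera pose and distractor count. The parameter names and ranges below are illustrative assumptions, not values from the paper.

```python
import random

def sample_scene_params(num_distractors_max=10):
    """Draw one randomized scene configuration (illustrative ranges only)."""
    return {
        # flat random RGB textures instead of photo-realistic ones
        "object_texture": [random.random() for _ in range(3)],
        "table_texture": [random.random() for _ in range(3)],
        # jitter the camera around a nominal pose (metres)
        "camera_position": [random.uniform(-0.1, 0.1),
                            random.uniform(-0.1, 0.1),
                            random.uniform(0.9, 1.1)],
        "light_intensity": random.uniform(0.3, 1.5),
        "num_distractors": random.randint(0, num_distractors_max),
    }

# Every training image gets its own configuration, so at test time the real
# world looks to the detector like "just another variation" of the simulator.
params = sample_scene_params()
```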

2,079 citations


Journal ArticleDOI
20 Jul 2017
TL;DR: This work presents a novel approach for image completion that results in images that are both locally and globally consistent, with a fully-convolutional neural network that can complete images of arbitrary resolutions by filling-in missing regions of any shape.
Abstract: We present a novel approach for image completion that results in images that are both locally and globally consistent. With a fully-convolutional neural network, we can complete images of arbitrary resolutions by filling in missing regions of any shape. To train this image completion network to be consistent, we use global and local context discriminators that are trained to distinguish real images from completed ones. The global discriminator looks at the entire image to assess if it is coherent as a whole, while the local discriminator looks only at a small area centered at the completed region to ensure the local consistency of the generated patches. The image completion network is then trained to fool both context discriminator networks, which requires it to generate images that are indistinguishable from real ones with regard to both overall consistency and details. We show that our approach can be used to complete a wide variety of scenes. Furthermore, in contrast with patch-based approaches such as PatchMatch, our approach can generate fragments that do not appear elsewhere in the image, which allows us to naturally complete images of objects with familiar and highly specific structures, such as faces.
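
As a rough illustration of the two-discriminator training signal described above, the sketch below computes an adversarial loss from a hypothetical global discriminator on the full image and a local discriminator on a patch around the completed region. Both discriminators, the patch size, and the way their logits are combined (summed here, rather than fused through a final layer as in the paper) are simplifying assumptions.

```python
import torch
import torch.nn.functional as F

def completion_gan_losses(d_global, d_local, real, completed, center, patch=64):
    """Sketch of the global + local adversarial losses; `d_global` and
    `d_local` are assumed to be CNNs returning one logit per image."""
    def crop(img, c):
        y, x = c  # patch centered on the filled-in region (NCHW tensors)
        return img[:, :, y - patch // 2:y + patch // 2, x - patch // 2:x + patch // 2]

    real_logit = d_global(real) + d_local(crop(real, center))
    fake_logit = d_global(completed) + d_local(crop(completed, center))

    # Discriminators learn to tell real images from completed ones
    # (detach `completed` when stepping the discriminators in practice).
    d_loss = F.binary_cross_entropy_with_logits(real_logit, torch.ones_like(real_logit)) \
           + F.binary_cross_entropy_with_logits(fake_logit, torch.zeros_like(fake_logit))
    # The completion network is trained to fool both of them.
    g_loss = F.binary_cross_entropy_with_logits(fake_logit, torch.ones_like(fake_logit))
    return d_loss, g_loss
```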

1,961 citations


Proceedings ArticleDOI
21 Jul 2017
TL;DR: SimGAN as mentioned in this paper uses an adversarial network similar to Generative Adversarial Networks (GANs), but with synthetic images as inputs instead of random vectors, and achieves state-of-the-art results on the MPIIGaze dataset without any labeled real data.
Abstract: With recent progress in graphics, it has become more tractable to train models on synthetic images, potentially avoiding the need for expensive annotations. However, learning from synthetic images may not achieve the desired performance due to a gap between synthetic and real image distributions. To reduce this gap, we propose Simulated+Unsupervised (S+U) learning, where the task is to learn a model to improve the realism of a simulator's output using unlabeled real data, while preserving the annotation information from the simulator. We develop a method for S+U learning that uses an adversarial network similar to Generative Adversarial Networks (GANs), but with synthetic images as inputs instead of random vectors. We make several key modifications to the standard GAN algorithm to preserve annotations, avoid artifacts, and stabilize training: (i) a self-regularization term, (ii) a local adversarial loss, and (iii) updating the discriminator using a history of refined images. We show that this enables generation of highly realistic images, which we demonstrate both qualitatively and with a user study. We quantitatively evaluate the generated images by training models for gaze estimation and hand pose estimation. We show a significant improvement over using synthetic images, and achieve state-of-the-art results on the MPIIGaze dataset without any labeled real data.
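
Modification (iii), updating the discriminator with a history of refined images, is easy to picture in isolation. The sketch below keeps a fixed-size buffer of past refiner outputs and mixes half of each discriminator mini-batch from it; the capacity value is an assumption in the spirit of the paper, not its exact setting.

```python
import random

class RefinedImageHistory:
    """Buffer of previously refined images used when updating the discriminator."""

    def __init__(self, capacity=512):
        self.capacity = capacity
        self.images = []

    def mix_batch(self, refined_batch):
        """Return a discriminator batch: half current refiner outputs, half history."""
        half = len(refined_batch) // 2
        if len(self.images) >= half > 0:
            batch = list(refined_batch[:half]) + random.sample(self.images, half)
        else:
            batch = list(refined_batch)        # not enough history yet
        # store the newest refined images, overwriting old ones at random
        for img in refined_batch:
            if len(self.images) < self.capacity:
                self.images.append(img)
            else:
                self.images[random.randrange(self.capacity)] = img
        return batch
```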

1,724 citations


Proceedings ArticleDOI
Konstantinos Bousmalis, Nathan Silberman, David Dohan, Dumitru Erhan, Dilip Krishnan
01 Jul 2017
TL;DR: In this paper, a generative adversarial network (GAN)-based method adapts source-domain images to appear as if drawn from the target domain by learning in an unsupervised manner a transformation in the pixel space from one domain to another.
Abstract: Collecting well-annotated image datasets to train modern machine learning algorithms is prohibitively expensive for many tasks. One appealing alternative is rendering synthetic data where ground-truth annotations are generated automatically. Unfortunately, models trained purely on rendered images fail to generalize to real images. To address this shortcoming, prior work introduced unsupervised domain adaptation algorithms that have tried to either map representations between the two domains, or learn to extract features that are domain-invariant. In this work, we approach the problem in a new light by learning in an unsupervised manner a transformation in the pixel space from one domain to the other. Our generative adversarial network (GAN)-based method adapts source-domain images to appear as if drawn from the target domain. Our approach not only produces plausible samples, but also outperforms the state-of-the-art on a number of unsupervised domain adaptation scenarios by large margins. Finally, we demonstrate that the adaptation process generalizes to object classes unseen during training.

1,549 citations


Posted Content
TL;DR: In this article, the authors use domain randomization to train a real-world object detector that is accurate to 1.5 cm and robust to distractors and partial occlusions using only data from a simulator with non-realistic random textures.
Abstract: Bridging the 'reality gap' that separates simulated robotics from experiments on hardware could accelerate robotic research through improved data availability. This paper explores domain randomization, a simple technique for training models on simulated images that transfer to real images by randomizing rendering in the simulator. With enough variability in the simulator, the real world may appear to the model as just another variation. We focus on the task of object localization, which is a stepping stone to general robotic manipulation skills. We find that it is possible to train a real-world object detector that is accurate to 1.5 cm and robust to distractors and partial occlusions using only data from a simulator with non-realistic random textures. To demonstrate the capabilities of our detectors, we show they can be used to perform grasping in a cluttered environment. To our knowledge, this is the first successful transfer of a deep neural network trained only on simulated RGB images (without pre-training on real images) to the real world for the purpose of robotic control.

966 citations


Book ChapterDOI
19 Sep 2017
TL;DR: A novel approach combining multi-scale and irregular isothetic representations of the input contour, as an extension of a previous work, improves the representation of the contour by 1-D intervals, and afterwards achieves the decomposition of the contour into maximal arcs or segments.
Abstract: The reconstruction of noisy digital shapes is a complex question, and many contributions have been proposed to address this problem, including blurred segment decomposition and adaptive tangential covering, for instance. In this article, we propose a novel approach combining multi-scale and irregular isothetic representations of the input contour, as an extension of a previous work [Vacavant et al., A Combined Multi-Scale/Irregular Algorithm for the Vectorization of Noisy Digital Contours, CVIU 2013]. Our new algorithm improves the representation of the contour by 1-D intervals, and afterwards achieves the decomposition of the contour into maximal arcs or segments. Our experiments with synthetic and real images show that our contribution can be employed as a relevant option for noisy shape reconstruction.

646 citations


Posted Content
TL;DR: The 2017 Visual Domain Adaptation (VisDA) dataset and challenge, a large-scale testbed for unsupervised domain adaptation across visual domains, is presented and a baseline performance analysis using various domain adaptation models that are currently popular in the field is provided.
Abstract: We present the 2017 Visual Domain Adaptation (VisDA) dataset and challenge, a large-scale testbed for unsupervised domain adaptation across visual domains. Unsupervised domain adaptation aims to solve the real-world problem of domain shift, where machine learning models trained on one domain must be transferred and adapted to a novel visual domain without additional supervision. The VisDA2017 challenge is focused on the simulation-to-reality shift and has two associated tasks: image classification and image segmentation. The goal in both tracks is to first train a model on simulated, synthetic data in the source domain and then adapt it to perform well on real image data in the unlabeled test domain. Our dataset is the largest one to date for cross-domain object classification, with over 280K images across 12 categories in the combined training, validation and testing domains. The image segmentation dataset is also large-scale with over 30K images across 18 categories in the three domains. We compare VisDA to existing cross-domain adaptation datasets and provide a baseline performance analysis using various domain adaptation models that are currently popular in the field.

504 citations


Proceedings ArticleDOI
12 Jul 2017
TL;DR: In this article, a collision avoidance policy is represented by a deep convolutional neural network that directly processes raw monocular images and outputs velocity commands, with a Monte Carlo policy evaluation algorithm that directly optimizes the network's ability to produce collision free flight.
Abstract: Deep reinforcement learning has emerged as a promising and powerful technique for automatically acquiring control policies that can process raw sensory inputs, such as images, and perform complex behaviors. However, extending deep RL to real-world robotic tasks has proven challenging, particularly in safety-critical domains such as autonomous flight, where a trial-and-error learning process is often impractical. In this paper, we explore the following question: can we train vision-based navigation policies entirely in simulation, and then transfer them into the real world to achieve real-world flight without a single real training image? We propose a learning method that we call CAD$^2$RL, which can be used to perform collision-free indoor flight in the real world while being trained entirely on 3D CAD models. Our method uses single RGB images from a monocular camera, without needing to explicitly reconstruct the 3D geometry of the environment or perform explicit motion planning. Our learned collision avoidance policy is represented by a deep convolutional neural network that directly processes raw monocular images and outputs velocity commands. This policy is trained entirely on simulated images, with a Monte Carlo policy evaluation algorithm that directly optimizes the network's ability to produce collision-free flight. By highly randomizing the rendering settings for our simulated training set, we show that we can train a policy that generalizes to the real world, without requiring the simulator to be particularly realistic or high-fidelity. We evaluate our method by flying a real quadrotor through indoor environments, and further evaluate the design choices in our simulator through a series of ablation studies on depth prediction. For supplementary video see: this https URL
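
The control interface implied by the abstract, a CNN that maps a monocular image to a velocity command, can be pictured as scoring a discrete set of candidate flight directions and flying toward the best one. The grid of directions, the scoring layout, and the greedy decision rule below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

# candidate flight directions, e.g. a 5x5 grid of image-plane offsets plus "forward"
DIRECTIONS = np.array([[dx, dy, 1.0]
                       for dy in np.linspace(-0.5, 0.5, 5)
                       for dx in np.linspace(-0.5, 0.5, 5)])

def select_velocity(collision_free_scores, speed=1.0):
    """Fly in the direction the network scores as most likely to stay collision-free."""
    best = int(np.argmax(collision_free_scores))
    direction = DIRECTIONS[best]
    return speed * direction / np.linalg.norm(direction)

scores = np.random.rand(len(DIRECTIONS))   # stand-in for the CNN's per-direction output
velocity_command = select_velocity(scores)
```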

481 citations


Proceedings ArticleDOI
21 Jul 2017
TL;DR: Xu et al. as mentioned in this paper proposed a deep convolutional encoder-decoder network that takes an image and the corresponding trimap as inputs and predicts the alpha matte of the image.
Abstract: Image matting is a fundamental computer vision problem and has many applications. Previous algorithms have poor performance when an image has similar foreground and background colors or complicated textures. The main reasons are prior methods 1) only use low-level features and 2) lack high-level context. In this paper, we propose a novel deep learning based algorithm that can tackle both these problems. Our deep model has two parts. The first part is a deep convolutional encoder-decoder network that takes an image and the corresponding trimap as inputs and predicts the alpha matte of the image. The second part is a small convolutional network that refines the alpha matte predictions of the first network to have more accurate alpha values and sharper edges. In addition, we also create a large-scale image matting dataset including 49300 training images and 1000 testing images. We evaluate our algorithm on the image matting benchmark, our testing set, and a wide variety of real images. Experimental results clearly demonstrate the superiority of our algorithm over previous methods.
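
The two-stage layout in the abstract, an encoder-decoder fed with the image and trimap followed by a small refinement network, might look like the following skeleton. The layer sizes are placeholders; only the input/output structure follows the description.

```python
import torch
import torch.nn as nn

class MattingNet(nn.Module):
    """Stage 1 predicts a coarse alpha matte from image + trimap;
    stage 2 refines it for sharper edges (placeholder layer sizes)."""

    def __init__(self):
        super().__init__()
        self.encoder_decoder = nn.Sequential(
            nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1), nn.Sigmoid(),
        )
        self.refiner = nn.Sequential(
            nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, image, trimap):
        x = torch.cat([image, trimap], dim=1)          # 3 + 1 input channels
        coarse = self.encoder_decoder(x)
        refined = self.refiner(torch.cat([image, coarse], dim=1))
        return coarse, refined
```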

353 citations


Proceedings ArticleDOI
29 Mar 2017
TL;DR: In this article, a variational generative adversarial network (GAN) is proposed to generate images in a specific category with randomly drawn values on a latent attribute vector, which can be applied to other tasks, such as image inpainting, super-resolution, and data augmentation for training better face recognition models.
Abstract: We present variational generative adversarial networks, a general learning framework that combines a variational auto-encoder with a generative adversarial network, for synthesizing images in fine-grained categories, such as faces of a specific person or objects in a category. Our approach models an image as a composition of label and latent attributes in a probabilistic model. By varying the fine-grained category label fed into the resulting generative model, we can generate images in a specific category with randomly drawn values on a latent attribute vector. Our approach has two novel aspects. First, we adopt a cross entropy loss for the discriminative and classifier network, but a mean discrepancy objective for the generative network. This kind of asymmetric loss function makes the GAN training more stable. Second, we adopt an encoder network to learn the relationship between the latent space and the real image space, and use pairwise feature matching to keep the structure of generated images. We experiment with natural images of faces, flowers, and birds, and demonstrate that the proposed models are capable of generating realistic and diverse samples with fine-grained category labels. We further show that our models can be applied to other tasks, such as image inpainting, super-resolution, and data augmentation for training better face recognition models.

345 citations


Posted Content
TL;DR: Zhang et al. as mentioned in this paper proposed a novel single image de-raining method called Image De-raining Conditional Generative Adversarial Network (ID-CGAN), which incorporates quantitative, visual and discriminative performance into the objective function.
Abstract: Severe weather conditions such as rain and snow adversely affect the visual quality of images captured under such conditions, thus rendering them useless for further usage and sharing. In addition, such degraded images drastically affect the performance of vision systems. Hence, it is important to solve the problem of single image de-raining/de-snowing. However, this is a difficult problem to solve due to its inherent ill-posed nature. Existing approaches attempt to introduce prior information to convert it into a well-posed problem. In this paper, we investigate a new point of view in addressing the single image de-raining problem. Instead of focusing only on deciding what is a good prior or a good framework to achieve good quantitative and qualitative performance, we also ensure that the de-rained image itself does not degrade the performance of a given computer vision algorithm such as detection and classification. In other words, the de-rained result should be indistinguishable from its corresponding clear image to a given discriminator. This criterion can be directly incorporated into the optimization framework by using the recently introduced conditional generative adversarial networks (GANs). To minimize artifacts introduced by GANs and ensure better visual quality, a new refined loss function is introduced. Based on this, we propose a novel single image de-raining method called Image De-raining Conditional Generative Adversarial Network (ID-CGAN), which incorporates quantitative, visual and discriminative performance into the objective function. Experiments evaluated on synthetic images and real images show that the proposed method outperforms many recent state-of-the-art single image de-raining methods in terms of quantitative and visual performance.

Proceedings ArticleDOI
TL;DR: It is proved mathematically that distortion and perceptual quality are at odds with each other, and generative-adversarial-nets (GANs) provide a principled way to approach the perception-distortion bound.
Abstract: Image restoration algorithms are typically evaluated by some distortion measure (e.g. PSNR, SSIM, IFC, VIF) or by human opinion scores that quantify perceived perceptual quality. In this paper, we prove mathematically that distortion and perceptual quality are at odds with each other. Specifically, we study the optimal probability for correctly discriminating the outputs of an image restoration algorithm from real images. We show that as the mean distortion decreases, this probability must increase (indicating worse perceptual quality). As opposed to the common belief, this result holds true for any distortion measure, and is not only a problem of the PSNR or SSIM criteria. We also show that generative-adversarial-nets (GANs) provide a principled way to approach the perception-distortion bound. This constitutes theoretical support to their observed success in low-level vision tasks. Based on our analysis, we propose a new methodology for evaluating image restoration methods, and use it to perform an extensive comparison between recent super-resolution algorithms.
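
One standard way to formalize the claim (our notation, not copied from the paper) is through a perception-distortion function that asks for the best attainable distributional fit under a distortion budget:

```latex
% P(D): smallest divergence between the distribution of real images p_X and that of
% restored images p_{\hat X}, over all estimators meeting a mean-distortion budget D.
\begin{align*}
P(D) &= \min_{p_{\hat{X} \mid Y}} \, d\!\left(p_X, p_{\hat{X}}\right)
        \quad \text{s.t.} \quad \mathbb{E}\!\left[\Delta(X, \hat{X})\right] \le D, \\
% An ideal discriminator between real and restored images succeeds with probability
p_{\text{success}} &= \tfrac{1}{2}\left(1 + d_{\mathrm{TV}}\!\left(p_X, p_{\hat{X}}\right)\right).
\end{align*}
```

In this notation, the paper's result amounts to P(D) being non-increasing in D (and convex when the divergence is convex in its second argument), so tightening the distortion budget can only keep or worsen the best achievable perceptual quality, which is why the optimal discriminator moves further from chance as mean distortion decreases.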

Proceedings ArticleDOI
Wei Shen, Rujie Liu
01 Jul 2017
TL;DR: In this article, the authors proposed to learn the corresponding residual image defined as the difference between images before and after the manipulation, which can be operated efficiently with modest pixel modification, and applied dual learning to allow transformation networks to learn from each other.
Abstract: Face attributes are interesting due to their detailed description of human faces. Unlike prior research on attribute prediction, we address an inverse and more challenging problem called face attribute manipulation which aims at modifying a face image according to a given attribute value. Instead of manipulating the whole image, we propose to learn the corresponding residual image defined as the difference between images before and after the manipulation. In this way, the manipulation can be operated efficiently with modest pixel modification. The framework of our approach is based on the Generative Adversarial Network. It consists of two image transformation networks and a discriminative network. The transformation networks are responsible for the attribute manipulation and its dual operation and the discriminative network is used to distinguish the generated images from real images. We also apply dual learning to allow transformation networks to learn from each other. Experiments show that residual images can be effectively learned and used for attribute manipulations. The generated images retain most of the details in attribute-irrelevant areas.
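
The core idea, predicting a residual image and adding it to the input so that attribute-irrelevant pixels are barely touched, fits in a few lines. The transformation network itself is assumed to exist; only the residual composition is shown.

```python
import torch

def apply_attribute(transform_net, image):
    """Apply a face-attribute manipulation by adding the predicted residual
    to the input image (images assumed to be in [0, 1])."""
    residual = transform_net(image)               # same shape as `image`
    manipulated = torch.clamp(image + residual, 0.0, 1.0)
    return manipulated, residual
```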

Proceedings Article
01 Jan 2017
TL;DR: This work proposes MarrNet, an end-to-end trainable model that sequentially estimates 2.5D sketches and 3D object shape and derives differentiable projective functions from 3D shape to 2.5D sketches.
Abstract: 3D object reconstruction from a single image is a highly under-determined problem, requiring strong prior knowledge of plausible 3D shapes. This introduces challenges for learning-based approaches, as 3D object annotations in real images are scarce. Previous work chose to train on synthetic data with ground truth 3D information, but suffered from the domain adaptation issue when tested on real data. In this work, we propose an end-to-end trainable framework, sequentially estimating 2.5D sketches and 3D object shapes. Our disentangled, two-step formulation has three advantages. First, compared to full 3D shape, 2.5D sketches are much easier to recover from a 2D image, and to transfer from synthetic to real data. Second, for 3D reconstruction from the 2.5D sketches, we can easily transfer the learned model on synthetic data to real images, as rendered 2.5D sketches are invariant to object appearance variations in real images, including lighting, texture, etc. This further relieves the domain adaptation problem. Third, we derive differentiable projective functions from 3D shape to 2.5D sketches, making the framework end-to-end trainable on real images, requiring no real-image annotations. Our framework achieves state-of-the-art performance on 3D shape reconstruction.

07 Jul 2017
TL;DR: In this paper, a CNN is trained to map observed images to velocities, using domain randomisation to enable generalisation to real world images without having seen a single real image.
Abstract: End-to-end control for robot manipulation and grasping is emerging as an attractive alternative to traditional pipelined approaches. However, end-to-end methods tend to either be slow to train, exhibit little or no generalisability, or lack the ability to accomplish long-horizon or multi-stage tasks. In this paper, we show how two simple techniques can lead to end-to-end (image to velocity) execution of a multi-stage task, which is analogous to a simple tidying routine, without having seen a single real image. This involves locating, reaching for, and grasping a cube, then locating a basket and dropping the cube inside. To achieve this, robot trajectories are computed in a simulator, to collect a series of control velocities which accomplish the task. Then, a CNN is trained to map observed images to velocities, using domain randomisation to enable generalisation to real world images. Results show that we are able to successfully accomplish the task in the real world with the ability to generalise to novel environments, including those with dynamic lighting conditions, distractor objects, and moving objects, including the basket itself. We believe our approach to be simple, highly scalable, and capable of learning long-horizon tasks that have until now not been shown with the state-of-the-art in end-to-end robot control.

Proceedings ArticleDOI
01 Jul 2017
TL;DR: In this paper, the authors focus on the non-Lambertian object-level intrinsic problem of recovering diffuse albedo, shading, and specular highlights from a single image of an object.
Abstract: We focus on the non-Lambertian object-level intrinsic problem of recovering diffuse albedo, shading, and specular highlights from a single image of an object. Based on existing 3D models in the ShapeNet database, a large-scale object intrinsics database is rendered with HDR environment maps. Millions of synthetic images of objects and their corresponding albedo, shading, and specular ground-truth images are used to train an encoder-decoder CNN, which can decompose an image into the product of albedo and shading components along with an additive specular component. Our CNN delivers accurate and sharp results in this classical inverse problem of computer vision. Evaluated on our realistically synthetic dataset, our method consistently outperforms the state-of-the-art by a large margin. We train and test our CNN across different object categories. Perhaps surprisingly, especially from the CNN classification perspective, our intrinsics CNN generalizes very well across categories. Our analysis shows that feature learning at the encoder stage is more crucial for developing a universal representation across categories. We apply our model to real images and videos from the Internet, and observe robust and realistic intrinsics results. Quality non-Lambertian intrinsics could open up many interesting applications such as realistic product search based on material properties and image-based albedo/specular editing.
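
The decomposition described above follows a simple non-Lambertian image-formation model: the image is the product of albedo and shading plus an additive specular term. A small sketch of that composition and of a reconstruction check (array shapes and value ranges are assumptions):

```python
import numpy as np

def compose_image(albedo, shading, specular):
    """Recompose an image from its predicted intrinsic components:
    albedo * shading + specular, clipped to the displayable range."""
    return np.clip(albedo * shading + specular, 0.0, 1.0)

def reconstruction_error(image, albedo, shading, specular):
    # mean absolute error between the input and the recomposed image
    return float(np.abs(image - compose_image(albedo, shading, specular)).mean())
```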

Posted Content
TL;DR: This paper is the first to examine texture control in deep image synthesis guided by sketch, color, and texture and develops a local texture loss in addition to adversarial and content loss to train the generative network.
Abstract: In this paper, we investigate deep image synthesis guided by sketch, color, and texture. Previous image synthesis methods can be controlled by sketch and color strokes but we are the first to examine texture control. We allow a user to place a texture patch on a sketch at arbitrary locations and scales to control the desired output texture. Our generative network learns to synthesize objects consistent with these texture suggestions. To achieve this, we develop a local texture loss in addition to adversarial and content loss to train the generative network. We conduct experiments using sketches generated from real images and textures sampled from a separate texture database and results show that our proposed algorithm is able to generate plausible images that are faithful to user controls. Ablation studies show that our proposed pipeline can generate more realistic images than adapting existing methods directly.

Posted Content
TL;DR: This paper shows how two simple techniques can lead to end-to-end (image to velocity) execution of a multi-stage task, which is analogous to a simple tidying routine, without having seen a single real image.
Abstract: End-to-end control for robot manipulation and grasping is emerging as an attractive alternative to traditional pipelined approaches. However, end-to-end methods tend to either be slow to train, exhibit little or no generalisability, or lack the ability to accomplish long-horizon or multi-stage tasks. In this paper, we show how two simple techniques can lead to end-to-end (image to velocity) execution of a multi-stage task, which is analogous to a simple tidying routine, without having seen a single real image. This involves locating, reaching for, and grasping a cube, then locating a basket and dropping the cube inside. To achieve this, robot trajectories are computed in a simulator, to collect a series of control velocities which accomplish the task. Then, a CNN is trained to map observed images to velocities, using domain randomisation to enable generalisation to real world images. Results show that we are able to successfully accomplish the task in the real world with the ability to generalise to novel environments, including those with dynamic lighting conditions, distractor objects, and moving objects, including the basket itself. We believe our approach to be simple, highly scalable, and capable of learning long-horizon tasks that have until now not been shown with the state-of-the-art in end-to-end robot control.

Journal ArticleDOI
TL;DR: An MRI image synthesis algorithm is described that is capable of synthesizing full-head T2w and FLAIR images, learning the nonlinear intensity mappings for synthesis using innovative features and a multi-resolution design.

Proceedings ArticleDOI
01 Jan 2017
TL;DR: A novel encoder-decoder architecture for 3D body shape and pose prediction using SMPL (a statistical body shape model) parameters as an input and corresponding training procedure as well as quantitative and qualitative analysis of the proposed method on artificial and real image datasets.
Abstract: In this paper we present a novel method for 3D human body shape and pose prediction. Our work is motivated by the need to reduce our reliance on costly-to-obtain ground truth labels. To achieve this, we propose training an encoder-decoder network using a two-step procedure as follows. During the first step, a decoder is trained to predict a body silhouette using SMPL [2] (a statistical body shape model) parameters as an input. During the second step, the whole network is trained on real image and corresponding silhouette pairs while the decoder is kept fixed. Such a procedure allows for an indirect learning of body shape and pose parameters from real images without requiring any ground truth parameter data. Our key contributions include: (a) a novel encoder-decoder architecture for 3D body shape and pose prediction, (b) a corresponding training procedure, and (c) a quantitative and qualitative analysis of the proposed method on artificial and real image datasets.

Proceedings ArticleDOI
24 Mar 2017
TL;DR: A novel Convolutional Coding-based Rain Removal (CCRR) algorithm for automatically removing rain streaks from a single rainy image is proposed, which develops a new method for learning a set of convolutional low-rank filters for efficiently representing background clear image and rain streaks.
Abstract: We propose a novel Convolutional Coding-based Rain Removal (CCRR) algorithm for automatically removing rain streaks from a single rainy image. Our method first learns a set of generic sparsity-based and low-rank representation-based convolutional filters for efficiently representing the background clear image and rain streaks, respectively. To this end, we first develop a new method for learning a set of convolutional low-rank filters. Then, using these learned filters, we propose an optimization problem to decompose a rainy image into a clear background image and a rain streak image. By working directly on the whole image, the proposed rain streak removal algorithm does not need to divide the image into overlapping patches for learning local dictionaries. Extensive experiments on synthetic and real images show that the proposed method performs favorably compared to state-of-the-art rain streak removal algorithms.

Proceedings ArticleDOI
12 May 2017
TL;DR: A new method for calculating the air-light color, the color of an area of the image with no objects in line-of-sight, based on the haze-lines prior that was recently introduced, which performs on-par with current state-of-the-art techniques and is more computationally efficient.
Abstract: Outdoor images taken in bad weather conditions, such as haze and fog, look faded and have reduced contrast. Recently there has been great success in single image dehazing, i.e., improving the visibility and restoring the colors from a single image. A crucial step in these methods is the calculation of the air-light color, the color of an area of the image with no objects in line-of-sight. We propose a new method for calculating the air-light. The method relies on the haze-lines prior that was recently introduced. This prior is based on the observation that the pixel values of a hazy image can be modeled as lines in RGB space that intersect at the air-light. We use Hough transform in RGB space to vote for the location of the air-light. We evaluate the proposed method on an existing dataset of real world images, as well as some synthetic and other real images. Our method performs on-par with current state-of-the-art techniques and is more computationally efficient.

Proceedings ArticleDOI
01 Oct 2017
TL;DR: The authors propose to overcome the sparsity-of-supervision problem via synthetically generated images: by augmenting real training image pairs with these examples, they train attribute ranking models to predict the relative strength of an attribute in novel pairs of real images.
Abstract: Distinguishing subtle differences in attributes is valuable, yet learning to make visual comparisons remains nontrivial. Not only is the number of possible comparisons quadratic in the number of training images, but also access to images adequately spanning the space of fine-grained visual differences is limited. We propose to overcome the sparsity of supervision problem via synthetically generated images. Building on a state-of-the-art image generation engine, we sample pairs of training images exhibiting slight modifications of individual attributes. Augmenting real training image pairs with these examples, we then train attribute ranking models to predict the relative strength of an attribute in novel pairs of real images. Our results on datasets of faces and fashion images show the great promise of bootstrapping imperfect image generators to counteract sample sparsity for learning to rank.

Posted Content
TL;DR: GC-RANSAC is shown experimentally, both on synthesized tests and real image pairs, to be more geometrically accurate than state-of-the-art methods on a range of problems, e.g. line fitting, homography, affine transformation, fundamental and essential matrix estimation.
Abstract: A novel method for robust estimation, called Graph-Cut RANSAC, GC-RANSAC in short, is introduced. To separate inliers and outliers, it runs the graph-cut algorithm in the local optimization (LO) step which is applied when a so-far-the-best model is found. The proposed LO step is conceptually simple, easy to implement, globally optimal and efficient. GC-RANSAC is shown experimentally, both on synthesized tests and real image pairs, to be more geometrically accurate than state-of-the-art methods on a range of problems, e.g. line fitting, homography, affine transformation, fundamental and essential matrix estimation. It runs in real-time for many problems at a speed approximately equal to that of the less accurate alternatives (in milliseconds on standard CPU).

Posted Content
TL;DR: A novel deep learning based algorithm that can tackle image matting problems when an image has similar foreground and background colors or complicated textures and evaluation results demonstrate the superiority of this algorithm over previous methods.
Abstract: Image matting is a fundamental computer vision problem and has many applications. Previous algorithms have poor performance when an image has similar foreground and background colors or complicated textures. The main reasons are prior methods 1) only use low-level features and 2) lack high-level context. In this paper, we propose a novel deep learning based algorithm that can tackle both these problems. Our deep model has two parts. The first part is a deep convolutional encoder-decoder network that takes an image and the corresponding trimap as inputs and predicts the alpha matte of the image. The second part is a small convolutional network that refines the alpha matte predictions of the first network to have more accurate alpha values and sharper edges. In addition, we also create a large-scale image matting dataset including 49300 training images and 1000 testing images. We evaluate our algorithm on the image matting benchmark, our testing set, and a wide variety of real images. Experimental results clearly demonstrate the superiority of our algorithm over previous methods.

Posted Content
TL;DR: SfSNet as mentioned in this paper is an end-to-end learning framework for producing an accurate decomposition of an unconstrained human face image into shape, reflectance and illuminance.
Abstract: We present SfSNet, an end-to-end learning framework for producing an accurate decomposition of an unconstrained human face image into shape, reflectance and illuminance. SfSNet is designed to reflect a physical Lambertian rendering model. SfSNet learns from a mixture of labeled synthetic and unlabeled real world images. This allows the network to capture low frequency variations from synthetic and high frequency details from real images through the photometric reconstruction loss. SfSNet consists of a new decomposition architecture with residual blocks that learns a complete separation of albedo and normal. This is used along with the original image to predict lighting. SfSNet produces significantly better quantitative and qualitative results than state-of-the-art methods for inverse rendering and independent normal and illumination estimation.
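
The photometric reconstruction loss mentioned above relies on a Lambertian rendering model: the image is approximated by albedo times a shading term computed from surface normals and low-order spherical-harmonic lighting. The sketch below uses only the first-order SH basis for brevity (SfSNet itself uses a higher-order SH lighting model), and the array shapes are assumptions.

```python
import numpy as np

def lambertian_shading(normals, sh_coeffs):
    """Shading from unit normals (H x W x 3) and 4 first-order SH lighting coefficients."""
    x, y, z = normals[..., 0], normals[..., 1], normals[..., 2]
    basis = np.stack([0.2821 * np.ones_like(x),    # constant SH term
                      0.4886 * y, 0.4886 * z, 0.4886 * x], axis=-1)
    return basis @ sh_coeffs                       # H x W shading map

def reconstruct(albedo, normals, sh_coeffs):
    # image ~ albedo * shading under the Lambertian assumption; the photometric
    # loss compares this reconstruction to the input photograph
    return albedo * lambertian_shading(normals, sh_coeffs)[..., None]
```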

Journal ArticleDOI
TL;DR: An active contour model which combines region-scalable fitting energy and optimized Laplacian of Gaussian (LoG) energy is proposed for image segmentation and has a higher segmentation accuracy and efficiency than other major region-based models.

Patent
Wenzhe Shi, Christian Ledig, Zehan Wang, Lucas Theis, Ferenc Huszar
15 Sep 2017
TL;DR: In this article, a neural network is trained to process received visual data to estimate a high-resolution version of the visual data using a training dataset and reference dataset, and a discriminator convolutional neural network parameterized by second weights and biases is trained by comparing characteristics of generated super-resolved image data to characteristics of the reference dataset.
Abstract: A neural network is trained to process received visual data to estimate a high-resolution version of the visual data using a training dataset and reference dataset. A set of training data is generated and a generator convolutional neural network parameterized by first weights and biases is trained by comparing characteristics of the training data to characteristics of the reference dataset. The first network is trained to generate super-resolved image data from low-resolution image data and the training includes modifying first weights and biases to optimize processed visual data based on the comparison between the characteristics of the training data and the characteristics of the reference dataset. A discriminator convolutional neural network parameterized by second weights and biases is trained by comparing characteristics of the generated super-resolved image data to characteristics of the reference dataset, and the second network is trained to discriminate super-resolved image data from real image data.

Book ChapterDOI
13 Sep 2017
TL;DR: A deep learning approach to remove motion blur from a single image captured in the wild, i.e., in an uncontrolled setting, is proposed and both a novel convolutional neural network architecture and a dataset for blurry images with ground truth are designed.
Abstract: We propose a deep learning approach to remove motion blur from a single image captured in the wild, i.e., in an uncontrolled setting. Thus, we consider motion blur degradations that are due to both camera and object motion, as well as to occlusion and objects coming into view. In this scenario, a model-based approach would require a very large set of parameters, whose fitting is a challenge on its own. Hence, we take a data-driven approach and design both a novel convolutional neural network architecture and a dataset for blurry images with ground truth. The network directly produces the sharp image as output and is built into three pyramid stages, which allow blur to be removed gradually from a small amount, at the lowest scale, to the full amount, at the scale of the input image. To obtain corresponding blurry and sharp image pairs, we use videos from a high frame-rate video camera. For each small video clip we select the central frame as the sharp image and use the frame average as the corresponding blurred image. Finally, to ensure that the averaging process is a sufficient approximation to real blurry images, we estimate optical flow and select frames with pixel displacements smaller than a pixel. We demonstrate state-of-the-art performance on datasets with both synthetic and real images.
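
The data-generation recipe in the abstract, averaging frames of a high-frame-rate clip to synthesize the blurry input and taking the central frame as the sharp target, is straightforward to sketch (the optical-flow filtering step is omitted here):

```python
import numpy as np

def make_blur_pair(frames):
    """Build one (blurry, sharp) training pair from an N x H x W x C stack of
    consecutive frames: the average is the blurred image, the central frame
    is the sharp ground truth."""
    frames = np.asarray(frames, dtype=np.float32)
    blurred = frames.mean(axis=0)
    sharp = frames[len(frames) // 2]
    return blurred, sharp
```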

Journal ArticleDOI
TL;DR: A novel region-based active contour model based on two different local fitted images is proposed by constructing a novel local hybrid image fitting energy, which is minimized in a variational level set framework to guide the evolution of contour curves toward the desired boundaries.