
Showing papers on "Real image published in 2019"


Proceedings ArticleDOI
02 May 2019
TL;DR: SinGAN, an unconditional generative model that can be learned from a single natural image, is introduced, trained to capture the internal distribution of patches within the image, and is then able to generate high quality, diverse samples that carry the same visual content as the image.
Abstract: We introduce SinGAN, an unconditional generative model that can be learned from a single natural image. Our model is trained to capture the internal distribution of patches within the image, and is then able to generate high quality, diverse samples that carry the same visual content as the image. SinGAN contains a pyramid of fully convolutional GANs, each responsible for learning the patch distribution at a different scale of the image. This allows generating new samples of arbitrary size and aspect ratio, that have significant variability, yet maintain both the global structure and the fine textures of the training image. In contrast to previous single image GAN schemes, our approach is not limited to texture images, and is not conditional (i.e. it generates samples from noise). User studies confirm that the generated samples are commonly confused with real images. We illustrate the utility of SinGAN in a wide range of image manipulation tasks.

660 citations
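
To make the multi-scale scheme concrete, here is a minimal sketch of SinGAN-style coarse-to-fine sampling in PyTorch; the trained per-scale generators `gens`, the noise amplitudes `noise_amps`, the generator call signature, and the scale factor are assumptions, not the paper's exact configuration.

```python
# A minimal sketch of SinGAN-style sampling, climbing the generator pyramid
# from the coarsest scale to the finest. All names are illustrative.
import torch
import torch.nn.functional as F

def sample(gens, noise_amps, coarse_shape, scale_factor=4 / 3):
    """Generate one sample; gens and noise_amps are ordered coarsest first."""
    b, c, h, w = coarse_shape
    x = torch.zeros(coarse_shape)
    for i, (g, amp) in enumerate(zip(gens, noise_amps)):
        if i > 0:  # move to the next (finer) scale
            h, w = round(h * scale_factor), round(w * scale_factor)
            x = F.interpolate(x, size=(h, w), mode="bilinear", align_corners=False)
        z = amp * torch.randn(b, c, h, w)  # fresh noise at every scale
        x = g(x + z) + x  # each generator adds a residual refinement
    return x
```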


Posted Content
TL;DR: InterFaceGAN explores the disentanglement between various semantics and manages to decouple some entangled semantics with subspace projection, leading to more precise control of facial attributes, including gender, age, expression, and the presence of eyeglasses.
Abstract: Despite the recent advances of Generative Adversarial Networks (GANs) in high-fidelity image synthesis, there is still limited understanding of how GANs are able to map a latent code sampled from a random distribution to a photo-realistic image. Previous work assumes the latent space learned by GANs follows a distributed representation but observes the vector arithmetic phenomenon. In this work, we propose a novel framework, called InterFaceGAN, for semantic face editing by interpreting the latent semantics learned by GANs. In this framework, we conduct a detailed study on how different semantics are encoded in the latent space of GANs for face synthesis. We find that the latent code of well-trained generative models actually learns a disentangled representation after linear transformations. We explore the disentanglement between various semantics and manage to decouple some entangled semantics with subspace projection, leading to more precise control of facial attributes. Besides manipulating gender, age, expression, and the presence of eyeglasses, we can even vary the face pose as well as fix the artifacts accidentally generated by GAN models. The proposed method is further applied to achieve real image manipulation when combined with GAN inversion methods or some encoder-involved models. Extensive results suggest that learning to synthesize faces spontaneously brings a disentangled and controllable facial attribute representation.

426 citations
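
The editing operation itself is simple once a semantic direction is found. The sketch below illustrates the idea with assumed names (`latents`, `labels`, and the SVM-based direction fit are illustrative choices, not the paper's exact recipe): fit a linear boundary for a binary attribute, move latent codes along its unit normal, and project one direction onto the orthogonal complement of another to decouple entangled attributes.

```python
# A minimal sketch of InterFaceGAN-style latent editing, assuming latent codes
# labeled with a binary attribute.
import numpy as np
from sklearn import svm

def attribute_direction(latents, labels):
    """Fit a linear boundary; its unit normal is the attribute direction."""
    clf = svm.LinearSVC(C=1.0).fit(latents, labels)
    n = clf.coef_[0]
    return n / np.linalg.norm(n)

def edit(z, n, alpha):
    """Move a latent code along the direction (alpha > 0 strengthens the attribute)."""
    return z + alpha * n

def decouple(n1, n2):
    """Subspace projection: make direction n1 orthogonal to an entangled n2."""
    n = n1 - (n1 @ n2) * n2
    return n / np.linalg.norm(n)
```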


Journal ArticleDOI
TL;DR: The authors' method can accomplish mismatch removal from thousands of putative correspondences in only a few milliseconds, achieving better or favorably competitive accuracy while cutting time cost by more than two orders of magnitude.
Abstract: Seeking reliable correspondences between two feature sets is a fundamental and important task in computer vision. This paper attempts to remove mismatches from given putative image feature correspondences. To this end, an efficient approach, termed locality preserving matching (LPM), is designed; its principle is to maintain the local neighborhood structures of potential true matches. We formulate the problem as a mathematical model and derive a closed-form solution with linearithmic time and linear space complexity. Our method can accomplish mismatch removal from thousands of putative correspondences in only a few milliseconds. To demonstrate the generality of our strategy for handling image matching problems, extensive experiments on various real image pairs are conducted for general feature matching, as well as for point set registration, visual homing, and near-duplicate image retrieval. Compared with other state-of-the-art alternatives, our LPM achieves better or favorably competitive accuracy while cutting time cost by more than two orders of magnitude.

416 citations
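
The locality-preserving principle can be illustrated in a few lines: a putative match is kept when its k nearest neighbors in the first image largely reappear among its neighbors in the second. The sketch below conveys the idea only; it is not the paper's closed-form solver, and `k` and `tau` are assumed parameters.

```python
# An illustrative neighborhood-consistency filter inspired by LPM.
import numpy as np
from scipy.spatial import cKDTree

def lpm_filter(pts1, pts2, k=6, tau=0.5):
    """pts1, pts2: (N, 2) matched keypoint coordinates; returns a boolean mask."""
    nn1 = cKDTree(pts1).query(pts1, k=k + 1)[1][:, 1:]  # drop the self-match
    nn2 = cKDTree(pts2).query(pts2, k=k + 1)[1][:, 1:]
    keep = np.zeros(len(pts1), dtype=bool)
    for i in range(len(pts1)):
        overlap = len(set(nn1[i]) & set(nn2[i]))
        keep[i] = overlap / k >= tau  # neighborhood structure is preserved
    return keep
```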


Proceedings ArticleDOI
15 Oct 2019
TL;DR: Zheng et al. propose a lightweight information multi-distillation network (IMDN) built from cascaded information multi-distillation blocks (IMDB), each containing distillation and selective fusion parts.
Abstract: In recent years, single image super-resolution (SISR) methods using deep convolutional neural networks (CNNs) have achieved impressive results. Thanks to the powerful representation capabilities of deep networks, numerous previous methods can learn the complex non-linear mapping between low-resolution (LR) image patches and their high-resolution (HR) versions. However, excessive convolutions limit the application of super-resolution technology on devices with low computing power. Moreover, super-resolution at arbitrary scale factors is a critical issue in practical applications that has not been well solved by previous approaches. To address these issues, we propose a lightweight information multi-distillation network (IMDN) by constructing cascaded information multi-distillation blocks (IMDB), each containing distillation and selective fusion parts. Specifically, the distillation module extracts hierarchical features step-by-step, and the fusion module aggregates them according to the importance of candidate features, which is evaluated by the proposed contrast-aware channel attention mechanism. To process real images of any size, we develop an adaptive cropping strategy (ACS) to super-resolve block-wise image patches using the same well-trained model. Extensive experiments suggest that the proposed method performs favorably against state-of-the-art SR algorithms in terms of visual quality, memory footprint, and inference time. Code is available at https://github.com/Zheng222/IMDN.

386 citations
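
A minimal PyTorch sketch of one information multi-distillation block may help: each step splits the features, retains a distilled slice, and forwards the remainder, and a 1x1 convolution fuses the retained slices. The contrast-aware attention is omitted here, and the channel counts are assumptions.

```python
# A simplified IMDB-style block with channel splitting and 1x1 fusion.
import torch
import torch.nn as nn

class IMDBlock(nn.Module):
    def __init__(self, ch=64, distill_ratio=0.25):
        super().__init__()
        self.d = int(ch * distill_ratio)  # distilled channels per step
        self.r = ch - self.d              # remaining channels
        self.c1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.c2 = nn.Conv2d(self.r, ch, 3, padding=1)
        self.c3 = nn.Conv2d(self.r, ch, 3, padding=1)
        self.c4 = nn.Conv2d(self.r, self.d, 3, padding=1)
        self.fuse = nn.Conv2d(self.d * 4, ch, 1)  # selective fusion
        self.act = nn.LeakyReLU(0.05)

    def forward(self, x):
        d1, r1 = torch.split(self.act(self.c1(x)), [self.d, self.r], dim=1)
        d2, r2 = torch.split(self.act(self.c2(r1)), [self.d, self.r], dim=1)
        d3, r3 = torch.split(self.act(self.c3(r2)), [self.d, self.r], dim=1)
        d4 = self.act(self.c4(r3))
        out = self.fuse(torch.cat([d1, d2, d3, d4], dim=1))
        return out + x  # local residual connection
```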


Proceedings ArticleDOI
15 Jun 2019
TL;DR: In this paper, the authors propose a technique to "unprocess" images by inverting each step of an image processing pipeline, thereby allowing them to synthesize realistic raw sensor measurements from commonly available Internet photos.
Abstract: Machine learning techniques work best when the data used for training resembles the data used for evaluation. This holds true for learned single-image denoising algorithms, which are applied to real raw camera sensor readings but, due to practical constraints, are often trained on synthetic image data. Though it is understood that generalizing from synthetic to real images requires careful consideration of the noise properties of camera sensors, the other aspects of an image processing pipeline (such as gain, color correction, and tone mapping) are often overlooked, despite their significant effect on how raw measurements are transformed into finished images. To address this, we present a technique to “unprocess” images by inverting each step of an image processing pipeline, thereby allowing us to synthesize realistic raw sensor measurements from commonly available Internet photos. We additionally model the relevant components of an image processing pipeline when evaluating our loss function, which allows training to be aware of all relevant photometric processing that will occur after denoising. By unprocessing and processing training data and model outputs in this way, we are able to train a simple convolutional neural network that has 14%-38% lower error rates and is 9×-18× faster than the previous state of the art on the Darmstadt Noise Dataset, and generalizes to sensors outside of that dataset as well.

369 citations
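
The unprocessing idea can be sketched compactly: invert the pipeline stages in reverse order. The tone-curve and gamma inverses below follow the standard smoothstep and power-law approximations; `ccm` and `gains` are illustrative placeholders rather than the paper's calibrated values.

```python
# A compact sketch of "unprocessing" an sRGB image back toward raw.
import numpy as np

def unprocess(srgb, ccm, gains):
    """srgb: float image in [0, 1]; ccm: 3x3 color matrix; gains: per-channel."""
    x = np.clip(srgb, 0.0, 1.0)
    # invert a smoothstep tone curve, y = 3x^2 - 2x^3
    x = 0.5 - np.sin(np.arcsin(1.0 - 2.0 * x) / 3.0)
    # invert gamma (approximated as x^(1/2.2))
    x = x ** 2.2
    # invert color correction with the matrix inverse
    x = x @ np.linalg.inv(ccm).T
    # invert white balance / digital gain
    return x / gains
```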


Proceedings ArticleDOI
01 Oct 2019
TL;DR: In this paper, a single-stage blind real image denoising network (RIDNet) was proposed by employing a modular architecture, which uses residual on the residual structure to ease the flow of low-frequency information and apply feature attention to exploit the channel dependencies.
Abstract: Deep convolutional neural networks perform better on images containing spatially invariant noise (synthetic noise); however, their performance is limited on real-noisy photographs and requires multiple-stage network modeling. To advance the practicability of denoising algorithms, this paper proposes a novel single-stage blind real image denoising network (RIDNet) by employing a modular architecture. We use a residual on the residual structure to ease the flow of low-frequency information and apply feature attention to exploit the channel dependencies. Furthermore, the evaluation in terms of quantitative metrics and visual quality on three synthetic and four real noisy datasets against 19 state-of-the-art algorithms demonstrates the superiority of our RIDNet.

285 citations
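
The feature-attention module at the heart of RIDNet is essentially squeeze-and-excitation-style channel gating; a minimal PyTorch sketch follows, with layer sizes assumed.

```python
# A minimal feature-attention (channel gating) module.
import torch.nn as nn

class FeatureAttention(nn.Module):
    def __init__(self, ch=64, reduction=16):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),               # squeeze: B x C x 1 x 1
            nn.Conv2d(ch, ch // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch // reduction, ch, 1),
            nn.Sigmoid(),                          # per-channel weights
        )

    def forward(self, x):
        return x * self.gate(x)  # reweight channels by learned importance
```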


Proceedings Article
06 Sep 2019
TL;DR: DISN predicts the projected location for each 3D point on the 2D image and extracts local features from the image feature maps, which significantly improves the accuracy of the signed distance field prediction, especially in detail-rich areas.
Abstract: Reconstructing 3D shapes from single-view images has been a long-standing research problem. In this paper, we present DISN, a Deep Implicit Surface Network which can generate a high-quality detail-rich 3D mesh from a 2D image by predicting the underlying signed distance fields. In addition to utilizing global image features, DISN predicts the projected location for each 3D point on the 2D image and extracts local features from the image feature maps. Combining global and local features significantly improves the accuracy of the signed distance field prediction, especially for the detail-rich areas. To the best of our knowledge, DISN is the first method that consistently captures details such as holes and thin structures present in 3D shapes from single-view images. DISN achieves state-of-the-art single-view reconstruction performance on a variety of shape categories reconstructed from both synthetic and real images. Code is available at https://github.com/laughtervv/DISN. The supplementary can be found at https://xharlie.github.io/images/neurips_2019_supp.pdf

250 citations
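
The local-feature lookup that distinguishes DISN can be sketched as follows: project each 3D query point into the image with the camera parameters and bilinearly sample the feature maps at that location. The camera handling below is simplified and the names are illustrative.

```python
# A simplified sketch of projecting 3D query points and sampling local features.
import torch
import torch.nn.functional as F

def sample_local_features(feat, points, K, RT):
    """feat: (1, C, H, W); points: (N, 3) world coords; K: (3, 3); RT: (3, 4)."""
    n = points.shape[0]
    cam = (RT[:, :3] @ points.T + RT[:, 3:]).T  # world -> camera coords
    uv = (K @ cam.T).T                          # camera -> pixel coords
    uv = uv[:, :2] / uv[:, 2:3]
    H, W = feat.shape[-2:]
    grid = torch.stack([2 * uv[:, 0] / (W - 1) - 1,   # normalize to [-1, 1]
                        2 * uv[:, 1] / (H - 1) - 1], dim=-1)
    local = F.grid_sample(feat, grid.view(1, n, 1, 2), align_corners=True)
    return local.squeeze(-1).squeeze(0).T  # (N, C) local feature per point
```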


Proceedings ArticleDOI
01 Oct 2019
TL;DR: A new approach of domain randomization and pyramid consistency to learn a model with high generalizability for semantic segmentation of real-world self-driving scenes in a domain generalization fashion is proposed.
Abstract: We propose to harness the potential of simulation for semantic segmentation of real-world self-driving scenes in a domain generalization fashion. The segmentation network is trained without any information about target domains and tested on the unseen target domains. To this end, we propose a new approach of domain randomization and pyramid consistency to learn a model with high generalizability. First, we propose to randomize the synthetic images with styles of real images in terms of visual appearances using auxiliary datasets, in order to effectively learn domain-invariant representations. Second, we further enforce pyramid consistency across different "stylized" images and within an image, in order to learn domain-invariant and scale-invariant features, respectively. Extensive experiments are conducted on generalization from GTA and SYNTHIA to Cityscapes, BDDS, and Mapillary; and our method achieves superior results over the state-of-the-art techniques. Remarkably, our generalization results are on par with or even better than those obtained by state-of-the-art simulation-to-real domain adaptation methods, which access the target domain data at training time.

228 citations
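
One way to realize the pyramid-consistency idea is sketched below: class-probability maps predicted for differently stylized copies of the same synthetic image are average-pooled at several scales and pulled toward their mean. The pool sizes and the L1 penalty are assumptions, not the paper's exact formulation.

```python
# An illustrative pyramid-consistency penalty across stylized copies.
import torch
import torch.nn.functional as F

def pyramid_consistency(prob_maps, pool_sizes=(1, 2, 4, 8)):
    """prob_maps: list of (B, C, H, W) softmax outputs, one per style."""
    loss = 0.0
    for s in pool_sizes:
        pooled = [F.avg_pool2d(p, kernel_size=s) for p in prob_maps]
        mean = torch.stack(pooled).mean(dim=0)  # consensus prediction
        loss += sum(F.l1_loss(p, mean) for p in pooled) / len(pooled)
    return loss / len(pool_sizes)
```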


Journal ArticleDOI
TL;DR: A superpixel-based fast FCM clustering algorithm, implemented with a histogram parameter on the superpixel image, is proposed that is significantly faster and more robust than state-of-the-art clustering algorithms for color image segmentation.
Abstract: A great number of improved fuzzy c-means (FCM) clustering algorithms have been widely used for grayscale and color image segmentation. However, most of them are time-consuming and unable to provide desired segmentation results for color images due to two reasons. The first one is that the incorporation of local spatial information often causes a high computational complexity due to the repeated distance computation between clustering centers and pixels within a local neighboring window. The other one is that a regular neighboring window usually breaks up the real local spatial structure of images and thus leads to a poor segmentation. In this work, we propose a superpixel-based fast FCM clustering algorithm that is significantly faster and more robust than state-of-the-art clustering algorithms for color image segmentation. To obtain better local spatial neighborhoods, we first define a multiscale morphological gradient reconstruction operation to obtain a superpixel image with accurate contours. In contrast to a traditional neighboring window of fixed size and shape, the superpixel image provides better adaptive and irregular local spatial neighborhoods that are helpful for improving color image segmentation. Second, based on the obtained superpixel image, the original color image is simplified efficiently and its histogram is computed easily by counting the number of pixels in each region of the superpixel image. Finally, we implement FCM with a histogram parameter on the superpixel image to obtain the final segmentation result. Experiments performed on synthetic images and real images demonstrate that the proposed algorithm provides better segmentation results and takes less time than state-of-the-art clustering algorithms for color image segmentation.

188 citations
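
The histogram-weighted FCM step can be sketched compactly: cluster the mean colors of the superpixel regions, weighting each region by its pixel count, rather than clustering every pixel. Variable names and the convergence test are illustrative.

```python
# A compact sketch of FCM weighted by a superpixel histogram.
import numpy as np

def weighted_fcm(colors, counts, n_clusters=3, m=2.0, iters=100, eps=1e-6):
    """colors: (S, 3) mean color per superpixel; counts: (S,) pixels per region."""
    u = np.random.dirichlet(np.ones(n_clusters), size=len(colors))  # memberships
    for _ in range(iters):
        um = (u ** m) * counts[:, None]  # histogram (pixel-count) weighting
        centers = um.T @ colors / um.sum(axis=0)[:, None]
        dist = np.linalg.norm(colors[:, None] - centers[None], axis=2) + eps
        u_new = 1.0 / (dist ** (2.0 / (m - 1.0)))
        u_new /= u_new.sum(axis=1, keepdims=True)
        delta = np.abs(u_new - u).max()
        u = u_new
        if delta < eps:
            break
    return u.argmax(axis=1), centers  # hard label per superpixel, cluster centers
```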


Proceedings ArticleDOI
20 May 2019
TL;DR: A method for automated dataset generation is presented; a variant of Mask R-CNN is trained with domain randomization on the generated dataset to perform category-agnostic instance segmentation without any hand-labeled data, and the model is deployed in an instance-specific grasping pipeline to demonstrate its usefulness in a robotics application.
Abstract: The ability to segment unknown objects in depth images has the potential to enhance robot skills in grasping and object tracking. Recent computer vision research has demonstrated that Mask R-CNN can be trained to segment specific categories of objects in RGB images when massive hand-labeled datasets are available. As generating these datasets is time-consuming, we instead train with synthetic depth images. Many robots now use depth sensors, and recent results suggest training on synthetic depth data can transfer successfully to the real world. We present a method for automated dataset generation and rapidly generate a synthetic training dataset of 50,000 depth images and 320,000 object masks using simulated heaps of 3D CAD models. We train a variant of Mask R-CNN with domain randomization on the generated dataset to perform category-agnostic instance segmentation without any hand-labeled data, and we evaluate the trained network, which we refer to as Synthetic Depth (SD) Mask R-CNN, on a set of real, high-resolution depth images of challenging, densely-cluttered bins containing objects with highly-varied geometry. SD Mask R-CNN outperforms point cloud clustering baselines by an absolute 15% in Average Precision and 20% in Average Recall on COCO benchmarks, and achieves performance levels similar to a Mask R-CNN trained on a massive, hand-labeled RGB dataset and fine-tuned on real images from the experimental setup. We deploy the model in an instance-specific grasping pipeline to demonstrate its usefulness in a robotics application. Code, the synthetic training dataset, and supplementary material are available at https://bit.ly/2letCuE.

Posted Content
TL;DR: This work proposes a new variational inference method for blind image denoising, which integrates both noise estimation and image denoising into a unique Bayesian framework; an approximate posterior, parameterized by deep neural networks, is presented by taking the intrinsic clean image and noise variances as latent variables conditioned on the input noisy image.
Abstract: Blind image denoising is an important yet very challenging problem in computer vision due to the complicated acquisition process of real images. In this work we propose a new variational inference method, which integrates both noise estimation and image denoising into a unique Bayesian framework, for blind image denoising. Specifically, an approximate posterior, parameterized by deep neural networks, is presented by taking the intrinsic clean image and noise variances as latent variables conditioned on the input noisy image. This posterior provides explicit parametric forms for all its involved hyper-parameters, and thus can be easily implemented for blind image denoising with automatic noise estimation for the test noisy image. On the one hand, like other data-driven deep learning methods, our method, namely the variational denoising network (VDN), can perform denoising efficiently due to its explicit form of posterior expression. On the other hand, VDN inherits the advantages of traditional model-driven approaches, especially the good generalization capability of generative models. VDN has good interpretability and can be flexibly utilized to estimate and remove complicated non-i.i.d. noise collected in real scenarios. Comprehensive experiments are performed to substantiate the superiority of our method in blind image denoising.

Posted Content
TL;DR: A novel approach, called mGANprior, is proposed to incorporate well-trained GANs as an effective prior for a variety of image processing tasks, by employing multiple latent codes to generate multiple feature maps at some intermediate layer of the generator and composing them with adaptive channel importance to recover the input image.
Abstract: Despite the success of Generative Adversarial Networks (GANs) in image synthesis, applying trained GAN models to real image processing remains challenging. Previous methods typically invert a target image back to the latent space either by back-propagation or by learning an additional encoder. However, the reconstructions from both methods are far from ideal. In this work, we propose a novel approach, called mGANprior, to incorporate well-trained GANs as an effective prior for a variety of image processing tasks. In particular, we employ multiple latent codes to generate multiple feature maps at some intermediate layer of the generator, then compose them with adaptive channel importance to recover the input image. Such an over-parameterization of the latent space significantly improves the image reconstruction quality, outperforming existing competitors. The resulting high-fidelity image reconstruction enables trained GAN models to serve as a prior for many real-world applications, such as image colorization, super-resolution, image inpainting, and semantic manipulation. We further analyze the properties of the layer-wise representation learned by GAN models and shed light on what knowledge each layer is capable of representing.
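
A minimal sketch of the inversion loop may clarify the composition step: several latent codes and per-code channel weights are optimized jointly, their intermediate feature maps are blended, and the generator's remaining layers produce the reconstruction. `G_head`/`G_tail` denote an assumed split of the generator at an intermediate layer; all sizes and the softmax blending are illustrative.

```python
# An illustrative mGANprior-style inversion with multiple latent codes.
import torch
import torch.nn.functional as F

def invert(target, G_head, G_tail, n_codes=10, z_dim=512, ch=512, steps=1000):
    z = torch.randn(n_codes, z_dim, requires_grad=True)
    alpha = torch.ones(n_codes, ch, 1, 1, requires_grad=True)  # channel importance
    opt = torch.optim.Adam([z, alpha], lr=1e-2)
    for _ in range(steps):
        feats = G_head(z)  # (n_codes, ch, h, w) intermediate feature maps
        composed = (alpha.softmax(dim=0) * feats).sum(dim=0, keepdim=True)
        loss = F.mse_loss(G_tail(composed), target)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return z.detach(), alpha.detach()
```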

Journal ArticleDOI
TL;DR: A two-step GAN-based DA is proposed that generates and refines brain Magnetic Resonance (MR) images with/without tumors separately: (i) Progressive Growing of GANs (PGGANs), a multi-stage noise-to-image GAN for high-resolution MR image generation, first generates realistic/diverse 256×256 images; (ii) Multimodal UNsupervised Image-to-image Translation (MUNIT), which combines GANs/Variational AutoEncoders, or SimGAN, which uses a DA-focused GAN loss, further refines the texture/shape of the PGGAN-generated images.
Abstract: Convolutional Neural Networks (CNNs) achieve excellent computer-assisted diagnosis with sufficient annotated training data. However, most medical imaging datasets are small and fragmented. In this context, Generative Adversarial Networks (GANs) can synthesize realistic/diverse additional training images to fill the data gap in the real image distribution; researchers have improved classification by augmenting data with noise-to-image (e.g., random noise samples to diverse pathological images) or image-to-image GANs (e.g., a benign image to a malignant one). Yet, no research has reported results combining noise-to-image and image-to-image GANs for a further performance boost. Therefore, to maximize the DA effect with the GAN combinations, we propose a two-step GAN-based DA that generates and refines brain Magnetic Resonance (MR) images with/without tumors separately: (i) Progressive Growing of GANs (PGGANs), a multi-stage noise-to-image GAN for high-resolution MR image generation, first generates realistic/diverse 256×256 images; (ii) Multimodal UNsupervised Image-to-image Translation (MUNIT), which combines GANs/Variational AutoEncoders, or SimGAN, which uses a DA-focused GAN loss, further refines the texture/shape of the PGGAN-generated images similarly to the real ones. We thoroughly investigate CNN-based tumor classification results, also considering the influence of pre-training on ImageNet and discarding weird-looking GAN-generated images. The results show that, when combined with classic DA, our two-step GAN-based DA can significantly outperform the classic DA alone, in tumor detection (i.e., boosting sensitivity from 93.67% to 97.48%) and also in other medical imaging tasks.

Proceedings ArticleDOI
16 Jun 2019
TL;DR: A deep iterative down-up convolutional neural network (DIDN) for image denoising, which repeatedly decreases and increases the resolution of the feature maps.
Abstract: Networks using down-scaling and up-scaling of feature maps have been studied extensively in low-level vision research owing to efficient GPU memory usage and their capacity to yield large receptive fields. In this paper, we propose a deep iterative down-up convolutional neural network (DIDN) for image denoising, which repeatedly decreases and increases the resolution of the feature maps. The basic structure of the network is inspired by U-Net, which was originally developed for semantic segmentation. We modify the down-scaling and up-scaling layers for the image denoising task. Conventional denoising networks are trained to work with a single noise level, or alternatively use noise information as inputs to address multi-level noise with a single model. Conversely, because the efficient memory usage of our network enables it to handle multiple parameters, it is capable of processing a wide range of noise levels with a single model without requiring noise-information inputs as a work-around. Consequently, our DIDN exhibits state-of-the-art performance using the benchmark dataset and also demonstrates its superiority in the NTIRE 2019 real image denoising challenge.

Posted Content
TL;DR: HO-3D, the first markerless dataset of color images with 3D annotations for both the hand and object, is created, and a single RGB image-based method is developed to predict the hand pose when interacting with objects under severe occlusions.
Abstract: We propose a method for annotating images of a hand manipulating an object with the 3D poses of both the hand and the object, together with a dataset created using this method. Our motivation is the current lack of annotated real images for this problem, as estimating the 3D poses is challenging, mostly because of the mutual occlusions between the hand and the object. To tackle this challenge, we capture sequences with one or several RGB-D cameras and jointly optimize the 3D hand and object poses over all the frames simultaneously. This method allows us to automatically annotate each frame with accurate estimates of the poses, despite large mutual occlusions. With this method, we created HO-3D, the first markerless dataset of color images with 3D annotations for both the hand and object. This dataset is currently made of 77,558 frames, 68 sequences, 10 persons, and 10 objects. Using our dataset, we develop a single RGB image-based method to predict the hand pose when interacting with objects under severe occlusions and show it generalizes to objects not seen in the dataset.

Proceedings ArticleDOI
16 Jun 2019
TL;DR: The 3rd NTIRE challenge on single-image super-resolution (restoration of rich details in a low-resolution image) is reviewed with a focus on proposed solutions and results, gauging the state-of-the-art in real-world single image super-resolution.
Abstract: This paper reviews the 3rd NTIRE challenge on single-image super-resolution (restoration of rich details in a low-resolution image) with a focus on the proposed solutions and results. The challenge had one track, aimed at the real-world single image super-resolution problem with an unknown scaling factor. Participants mapped low-resolution images captured by a DSLR camera with a shorter focal length to their high-resolution counterparts captured at a longer focal length. With this challenge, we introduced a novel real-world super-resolution dataset (RealSR). The track had 403 registered participants, and 36 teams competed in the final testing phase. The submitted methods gauge the state-of-the-art in real-world single image super-resolution.

Proceedings ArticleDOI
15 Jun 2019
TL;DR: This paper presents an unsupervised method for domain-specific, single-image deblurring based on disentangled representations, which enforces a KL divergence loss to regularize the distribution range of extracted blur attributes such that little content information is contained.
Abstract: Image deblurring aims to restore the latent sharp images from the corresponding blurred ones. In this paper, we present an unsupervised method for domain-specific, single-image deblurring based on disentangled representations. The disentanglement is achieved by splitting the content and blur features in a blurred image using content encoders and blur encoders. We enforce a KL divergence loss to regularize the distribution range of extracted blur attributes such that little content information is contained. Meanwhile, to handle the unpaired training data, a blurring branch and the cycle-consistency loss are added to guarantee that the content structures of the deblurred results match the original images. We also add an adversarial loss on deblurred results to generate visually realistic images and a perceptual loss to further mitigate the artifacts. We perform extensive experiments on the tasks of face and text deblurring using both synthetic datasets and real images, and achieve improved results compared to recent state-of-the-art deblurring methods.

Proceedings ArticleDOI
01 Jun 2019
TL;DR: A grouped residual dense network (GRDN) is proposed, which is an extended and generalized architecture of the state-of-the-art residual dense network (RDN), and a new generative adversarial network-based real-world noise modeling method is developed.
Abstract: Recent research on image denoising has progressed with the development of deep learning architectures, especially convolutional neural networks. However, real-world image denoising is still very challenging because it is not possible to obtain ideal pairs of ground-truth images and real-world noisy images. Owing to the recent release of benchmark datasets, the interest of the image denoising community is now moving toward the real-world denoising problem. In this paper, we propose a grouped residual dense network (GRDN), which is an extended and generalized architecture of the state-of-the-art residual dense network (RDN). The core part of RDN is defined as grouped residual dense block (GRDB) and used as a building module of GRDN. We experimentally show that the image denoising performance can be significantly improved by cascading GRDBs. In addition to the network architecture design, we also develop a new generative adversarial network-based real-world noise modeling method. We demonstrate the superiority of the proposed methods by achieving the highest score in terms of both the peak signal-to-noise ratio and the structural similarity in the NTIRE2019 Real Image Denoising Challenge - Track 2: sRGB.

Journal ArticleDOI
Zhaocheng Wang, Lan Du, Jiashun Mao, Bin Liu, Dongwen Yang
TL;DR: The single shot multibox detector (SSD), which is a real-time object detection method based on convolutional neural network, is applied to realize target detection for synthetic aperture radar (SAR) images and can obtain better detection performance than other detection methods.
Abstract: In this letter, the single shot multibox detector (SSD), which is a real-time object detection method based on a convolutional neural network, is applied to realize target detection for synthetic aperture radar (SAR) images. Since there are not sufficient labeled images for training in SAR target detection, we apply two strategies: data augmentation and transfer learning. For data augmentation, the first approach is to use some image processing methods, i.e., manually extracting subimages, adding noise, filtering, and flipping, on the original training images to generate new training images; the second approach is to employ the existing SAR target recognition data set, the MSTAR data set, to assist in accomplishing the target detection task. For transfer learning, we first apply the subaperture decomposition technique on original SAR images to acquire three-channel subaperture SAR images, and then transfer the three-channel VGGNet model pretrained on the ImageNet data set to the three-channel subaperture SAR images, in order to initialize corresponding parameters of the convolutional layers in the base network of our SSD. The feature extraction network, consisting of the base network and the auxiliary structure, is used to learn multiscale feature maps, and then convolutional predictors are used to acquire the final detection results. The experimental results on the miniSAR real image data set demonstrate that the proposed method can obtain better detection performance than other detection methods.

Proceedings ArticleDOI
15 Jun 2019
TL;DR: In this article, a simple and effective approach, the Event-based Double Integral (EDI) model, was proposed to reconstruct a high frame-rate, sharp video from a single blurry frame and its event data.
Abstract: Event-based cameras can measure intensity changes (called ‘events’) with microsecond accuracy under high-speed motion and challenging lighting conditions. With the active pixel sensor (APS), the event camera allows simultaneous output of the intensity frames. However, the output images are captured at a relatively low frame-rate and often suffer from motion blur. A blurry image can be regarded as the integral of a sequence of latent images, while the events indicate the changes between the latent images. Therefore, we are able to model the blur-generation process by associating event data to a latent image. In this paper, we propose a simple and effective approach, the Event-based Double Integral (EDI) model, to reconstruct a high frame-rate, sharp video from a single blurry frame and its event data. The video generation is based on solving a simple non-convex optimization problem in a single scalar variable. Experimental results on both synthetic and real images demonstrate the superiority of our EDI model and optimization method in comparison to the state-of-the-art.
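
A discretized sketch of the double-integral relation may help: the blurry frame equals the temporal average of latent frames, each tied to a reference frame by the exponentiated running sum of events, so recovering the sharp frame reduces to a single division once the scalar contrast threshold c is known. The binning and names are illustrative.

```python
# An illustrative discretization of the event-based double integral (EDI) idea.
import numpy as np

def edi_reconstruct(blurry, event_frames, c):
    """blurry: (H, W) linear intensity; event_frames: (T, H, W) signed counts."""
    E = np.cumsum(event_frames, axis=0)        # inner integral of events over time
    ratio = np.exp(c * E).mean(axis=0)         # average relative brightness change
    return blurry / np.maximum(ratio, 1e-6)    # latent frame at the reference time
```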

Proceedings ArticleDOI
16 Jun 2019
TL;DR: The proposed methods by the 15 teams represent the current state-of-the-art performance in image denoising targeting real noisy images.
Abstract: This paper reviews the NTIRE 2019 challenge on real image denoising with focus on the proposed methods and their results. The challenge has two tracks for quantitatively evaluating image denoising performance in (1) the Bayer-pattern raw-RGB and (2) the standard RGB (sRGB) color spaces. The tracks had 216 and 220 registered participants, respectively. A total of 15 teams, proposing 17 methods, competed in the final phase of the challenge. The proposed methods by the 15 teams represent the current state-of-the-art performance in image denoising targeting real noisy images.

Posted Content
TL;DR: It is demonstrated that, with careful pre- and post-processing and data augmentation, a standard image classifier trained on only one specific CNN generator (ProGAN) is able to generalize surprisingly well to unseen architectures, datasets, and training methods.
Abstract: In this work we ask whether it is possible to create a "universal" detector for telling apart real images from those generated by a CNN, regardless of architecture or dataset used. To test this, we collect a dataset consisting of fake images generated by 11 different CNN-based image generator models, chosen to span the space of commonly used architectures today (ProGAN, StyleGAN, BigGAN, CycleGAN, StarGAN, GauGAN, DeepFakes, cascaded refinement networks, implicit maximum likelihood estimation, second-order attention super-resolution, seeing-in-the-dark). We demonstrate that, with careful pre- and post-processing and data augmentation, a standard image classifier trained on only one specific CNN generator (ProGAN) is able to generalize surprisingly well to unseen architectures, datasets, and training methods (including the just released StyleGAN2). Our findings suggest the intriguing possibility that today's CNN-generated images share some common systematic flaws, preventing them from achieving realistic image synthesis. Code and pre-trained networks are available at this https URL.
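
The augmentation recipe the paper credits for generalization can be sketched briefly: randomly blur and JPEG-recompress training images so the classifier cannot rely on fragile high-frequency generator artifacts. The probabilities and ranges below are assumptions.

```python
# An illustrative blur + JPEG augmentation for training a fake-image detector.
import io
import random
from PIL import Image, ImageFilter

def augment(img: Image.Image) -> Image.Image:
    if random.random() < 0.5:  # Gaussian blur with a random radius
        img = img.filter(ImageFilter.GaussianBlur(radius=random.uniform(0, 3)))
    if random.random() < 0.5:  # JPEG re-compression at a random quality
        buf = io.BytesIO()
        img.convert("RGB").save(buf, format="JPEG", quality=random.randint(30, 95))
        buf.seek(0)
        img = Image.open(buf).convert("RGB")
    return img
```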

Proceedings ArticleDOI
16 Jun 2019
TL;DR: This study introduces a densely connected hierarchical image denoising network (DHDN), which exceeds the performance of state-of-the-art image denoising solutions; additional experiments establish that the proposed network also outperforms conventional methods.
Abstract: Recently, deep convolutional neural networks have been applied in numerous image processing studies and have exhibited drastically improved performance. In this study, we introduce a densely connected hierarchical image denoising network (DHDN), which exceeds the performance of state-of-the-art image denoising solutions. Our proposed network improves image denoising performance by applying the hierarchical architecture of a modified U-Net; this makes our network use a larger number of parameters than other methods. In addition, we induce feature reuse and address the vanishing-gradient problem by applying dense connectivity and residual learning to our convolution blocks and network. Finally, we successfully apply model ensemble and self-ensemble methods, which enables us to improve the performance of the proposed network further. The performance of the proposed network is validated by winning second place in the NTIRE 2019 real image denoising challenge sRGB track and third place in the raw-RGB track. Additional experimental results on additive white Gaussian noise removal also establish that the proposed network outperforms conventional methods, notwithstanding the fact that it handles a wide range of noise levels with a single set of trained parameters.

Proceedings ArticleDOI
15 Jun 2019
TL;DR: This work presents a method for fine-grained face manipulation that can synthesize another arbitrary expression by the same person by first fitting a 3D face model and then disentangling the face into a texture and a shape.
Abstract: We present a method for fine-grained face manipulation. Given a face image with an arbitrary expression, our method can synthesize another arbitrary expression by the same person. This is achieved by first fitting a 3D face model and then disentangling the face into a texture and a shape. We then learn different networks in these two spaces. In the texture space, we use a conditional generative network to change the appearance, and carefully design input formats and loss functions to achieve the best results. In the shape space, we use a fully connected network to predict the accurate shapes and use the available depth data for supervision. Both networks are conditioned on expression coefficients rather than discrete labels, allowing us to generate an unlimited amount of expressions. We show the superiority of this disentangling approach through both quantitative and qualitative studies. In a user study, our method is preferred in 85% of cases when compared to the most recent work. When compared to the ground truth, annotators cannot reliably distinguish between our synthesized images and real images, preferring our method in 53% of the cases.

Proceedings ArticleDOI
19 Sep 2019
TL;DR: In this article, the authors synthesize highly photorealistic images of 3D object models, which they use to train a convolutional neural network for detecting the objects in real images.
Abstract: We present an approach to synthesize highly photorealistic images of 3D object models, which we use to train a convolutional neural network for detecting the objects in real images. The proposed approach has three key ingredients: (1) 3D object models are rendered in 3D models of complete scenes with realistic materials and lighting, (2) plausible geometric configuration of objects and cameras in a scene is generated using physics simulation, and (3) high photorealism of the synthesized images is achieved by physically based rendering. When trained on images synthesized by the proposed approach, the Faster R-CNN object detector [1] achieves a 24% absolute improvement of mAP@.75IoU on Rutgers APC [2] and 11% on LineMod-Occluded [3] datasets, compared to a baseline where the training images are synthesized by rendering object models on top of random photographs. This work is a step towards being able to effectively train object detectors without capturing or annotating any real images. A dataset of 400K synthetic images with ground truth annotations for various computer vision tasks will be released on the project website: thodan.github.io/objectsynth.

Proceedings ArticleDOI
15 Jun 2019
TL;DR: This paper considers transfer learning for semantic segmentation that aims to mitigate the gap between abundant synthetic data (source domain) and limited real data (target domain), and jointly learns hierarchical weighting networks and the segmentation network in an end-to-end manner.
Abstract: The success of deep neural networks for semantic segmentation heavily relies on large-scale and well-labeled datasets, which are hard to collect in practice. Synthetic data offers an alternative for obtaining ground-truth labels for free. However, models directly trained on synthetic data often struggle to generalize to real images. In this paper, we consider transfer learning for semantic segmentation that aims to mitigate the gap between abundant synthetic data (source domain) and limited real data (target domain). Unlike previous approaches that either learn mappings to the target domain or finetune on target images, our proposed method jointly learns from real images and selectively from realistic pixels in synthetic images to adapt to the target domain. Our key idea is to have weighting networks score how similar the synthetic pixels are to real ones, and to learn such weighting at the pixel, region, and image levels. We jointly learn these hierarchical weighting networks and the segmentation network in an end-to-end manner. Extensive experiments demonstrate that our proposed approach significantly outperforms existing baselines and is applicable to scenarios with extremely limited real images.
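
The pixel-level weighting can be sketched as a weighted segmentation loss: a weighting network scores how realistic each synthetic pixel looks, and those scores scale the per-pixel cross-entropy. The function below is illustrative; the paper's region- and image-level terms are omitted.

```python
# An illustrative pixel-weighted cross-entropy for learning from synthetic data.
import torch
import torch.nn.functional as F

def weighted_seg_loss(logits, labels, pixel_weights):
    """logits: (B, C, H, W); labels: (B, H, W) long; pixel_weights: (B, H, W) in [0, 1]."""
    per_pixel = F.cross_entropy(logits, labels, reduction="none")  # (B, H, W)
    return (pixel_weights * per_pixel).sum() / pixel_weights.sum().clamp(min=1e-6)
```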