Journal ArticleDOI

Saliency detection via conditional adversarial image-to-image network

17 Nov 2018-Neurocomputing (Elsevier)-Vol. 316, pp 357-368
TL;DR: This work proposes to perform saliency detection with a conditional adversarial network under the cGAN framework, in which saliency map prediction is recast as a saliency segmentation task using pair-wise image-to-ground-truth saliency maps.
About: This article was published in Neurocomputing on 2018-11-17 and has received 34 citations to date. It focuses on the topic: Salience (neuroscience).
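For illustration only, the sketch below shows how a pix2pix-style conditional GAN could be wired up for paired image-to-saliency-map prediction. The class names, the weight of 100 on the L1 term, and the toy shapes are assumptions made for this sketch, not the authors' implementation.

```python
# Minimal sketch (assumed architecture, not the paper's code): a conditional GAN
# whose generator maps an RGB image to a saliency map and whose discriminator
# judges (image, saliency) pairs, trained on paired image/ground-truth data.
import torch
import torch.nn as nn

class TinyGenerator(nn.Module):
    """Toy encoder-decoder: 3-channel image -> 1-channel saliency map."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid())

    def forward(self, x):
        return self.net(x)

class TinyDiscriminator(nn.Module):
    """Scores concatenated (image, saliency) pairs, PatchGAN-style logits."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 16, 3, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(16, 1, 3, padding=1))

    def forward(self, img, sal):
        return self.net(torch.cat([img, sal], dim=1))

G, D = TinyGenerator(), TinyDiscriminator()
bce, l1 = nn.BCEWithLogitsLoss(), nn.L1Loss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

img = torch.rand(2, 3, 64, 64)   # dummy input images
gt  = torch.rand(2, 1, 64, 64)   # dummy ground-truth saliency maps

# Discriminator step: real (image, ground truth) pairs -> 1, generated pairs -> 0.
fake = G(img).detach()
pred_real, pred_fake = D(img, gt), D(img, fake)
d_loss = bce(pred_real, torch.ones_like(pred_real)) + \
         bce(pred_fake, torch.zeros_like(pred_fake))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: fool the discriminator and stay close to the ground truth (L1).
fake = G(img)
pred = D(img, fake)
g_loss = bce(pred, torch.ones_like(pred)) + 100.0 * l1(fake, gt)
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```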
Citations
Journal ArticleDOI
TL;DR: A novel model called ternary adversarial networks with self-supervision (TANSS), inspired by zero-shot learning, is proposed to overcome the limitations of existing methods on the challenging task of zero-shot cross-modal retrieval.
Abstract: Given a query instance from one modality (e.g., image), cross-modal retrieval aims to find semantically similar instances from another modality (e.g., text). To perform cross-modal retrieval, existing approaches typically learn a common semantic space from a labeled source set and directly produce common representations in the learned space for the instances in a target set. These methods commonly require that the instances of both sets share the same classes. Intuitively, they may not generalize well to the more practical scenario of zero-shot cross-modal retrieval, in which the instances of the target set contain unseen classes whose semantics are inconsistent with the seen classes in the source set. Inspired by zero-shot learning, we propose a novel model called ternary adversarial networks with self-supervision (TANSS) in this paper to overcome the limitations of existing methods on this challenging task. Our TANSS approach consists of three parallel subnetworks: 1) two semantic feature learning subnetworks that capture the intrinsic data structures of different modalities and preserve the modality relationships via semantic features in the common semantic space; 2) a self-supervised semantic subnetwork that leverages the word vectors of both seen and unseen labels as guidance to supervise the semantic feature learning and enhance knowledge transfer to unseen labels; and 3) an adversarial learning scheme that maximizes the consistency and correlation of the semantic features between the different modalities. The three subnetworks are integrated in TANSS into an end-to-end network architecture that enables efficient iterative parameter optimization. Comprehensive experiments on three cross-modal datasets show the effectiveness of our TANSS approach compared with state-of-the-art methods for zero-shot cross-modal retrieval.
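To make the adversarial modality-alignment idea concrete, here is a heavily simplified sketch under assumed feature dimensions and layer sizes (none of which come from the TANSS paper): a modality discriminator tries to tell image features from text features, while the two feature subnetworks are trained with flipped labels so the shared semantic space becomes modality-invariant.

```python
# Illustrative sketch only; dimensions and layers are assumptions, not TANSS itself.
import torch
import torch.nn as nn

img_net = nn.Sequential(nn.Linear(4096, 512), nn.ReLU(), nn.Linear(512, 300))
txt_net = nn.Sequential(nn.Linear(1000, 512), nn.ReLU(), nn.Linear(512, 300))
modality_disc = nn.Sequential(nn.Linear(300, 64), nn.ReLU(), nn.Linear(64, 1))
bce = nn.BCEWithLogitsLoss()

img_feat = img_net(torch.rand(8, 4096))   # dummy image descriptors
txt_feat = txt_net(torch.rand(8, 1000))   # dummy text descriptors

# Discriminator objective: label image-derived features 1, text-derived features 0.
d_loss = bce(modality_disc(img_feat.detach()), torch.ones(8, 1)) + \
         bce(modality_disc(txt_feat.detach()), torch.zeros(8, 1))

# Adversarial term for the feature subnetworks: flip the labels so that features
# from both modalities become indistinguishable in the common semantic space.
g_loss = bce(modality_disc(img_feat), torch.zeros(8, 1)) + \
         bce(modality_disc(txt_feat), torch.ones(8, 1))
```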

185 citations


Cites methods from "Saliency detection via conditional ..."

  • ...GAN has been extended and applied to various unimodal application areas, for example, image translation [48], image segmentation [53], saliency detection [54], etc....


Journal ArticleDOI
TL;DR: This paper reviews the major deep learning concepts pertinent to biomedical applications, provides concise overviews of the Omics and the BBMI, and concludes with a critical discussion, interpretation, and relevant open challenges.
Abstract: Deep neural networks represent, nowadays, the most effective machine learning technology in the biomedical domain. In this domain, the different areas of interest concern the Omics (the study of the genome and its products: genomics, transcriptomics, proteomics, and metabolomics), bioimaging (the study of biological cells and tissues), medical imaging (the study of the human organs by creating visual representations), the BBMI (the study of the brain and body machine interface), and public and medical health management (PmHM). This paper reviews the major deep learning concepts pertinent to such biomedical applications. Concise overviews are provided for the Omics and the BBMI. We end our analysis with a critical discussion, interpretation and relevant open challenges.

124 citations

Journal ArticleDOI
TL;DR: The proposed CSGAN uses a new objective function, the Cyclic-Synthesized (CS) loss, computed between the synthesized image of one domain and the cycled image of another domain, and it exhibits promising and comparable performance on the Facades dataset in terms of both qualitative and quantitative measures.
Abstract: The primary motivation of image-to-image transformation is to convert an image from one domain to another. The Generative Adversarial Network (GAN) is the recent trend for image-to-image transformation. The existing GAN models suffer from the lack of proper synthesis objectives. In this paper, we propose a new Cyclic-Synthesized Generative Adversarial Network (CSGAN) for the development of expert and intelligent systems for image-to-image transformation. The proposed CSGAN uses a new objective function based on the proposed cyclic-synthesized loss between the synthesized image of one domain and the cycled image of another domain. The proposed CSGAN enforces the mapping from one domain to another more accurately by limiting the scope of redundant transformation with the help of the cyclic-synthesized loss. The performance of the proposed CSGAN is evaluated on four benchmark image-to-image transformation datasets, including the CUHK Face, WHU-IIP Thermal-Visible Face, CMP Facades, and NYU-Depth datasets. The results are computed using widely used evaluation metrics such as MSE, SSIM, PSNR, and LPIPS. The experimental results of the proposed CSGAN approach are compared with the latest state-of-the-art approaches, such as GAN, Pix2Pix, DualGAN, CycleGAN, and PS2GAN. The proposed CSGAN technique outperforms all the compared methods on the CUHK, WHU-IIP, and NYU-Depth datasets, and exhibits promising and comparable performance on the Facades dataset in terms of both qualitative and quantitative measures. The code is available at https://github.com/KishanKancharagunta/CSGAN.
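As a rough reading of the cyclic-synthesized loss described above (a sketch under assumed generator interfaces, not the released CSGAN code), an L1 term can be taken between each domain's synthesized image and its cycled counterpart:

```python
# Hedged sketch of a cyclic-synthesized style L1 term; generator names and the
# pairing below are one plausible reading of the abstract, not the official code.
import torch
import torch.nn as nn

l1 = nn.L1Loss()

def cyclic_synthesized_loss(G_ab, G_ba, real_a, real_b):
    """G_ab maps domain A -> B, G_ba maps domain B -> A."""
    syn_b = G_ab(real_a)      # image synthesized into domain B from a real A image
    syn_a = G_ba(real_b)      # image synthesized into domain A from a real B image
    cyc_a = G_ba(syn_b)       # A image cycled through domain B
    cyc_b = G_ab(syn_a)       # B image cycled through domain A
    return l1(syn_a, cyc_a) + l1(syn_b, cyc_b)

# Toy call with identity "generators", just to show the interface.
loss = cyclic_synthesized_loss(nn.Identity(), nn.Identity(),
                               torch.rand(1, 3, 32, 32), torch.rand(1, 3, 32, 32))
```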

40 citations

Journal ArticleDOI
TL;DR: A combination of an encoder-decoder generator for semantic image inpainting and a multi-layer convolutional network for seamless image fusion, which is capable of restoring images effectively and seamlessly.
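Since only the TL;DR is available here, the following is a very loose sketch of the described two-stage idea under assumed shapes and layer choices (nothing below is taken from the cited paper): an encoder-decoder fills the masked region, and a small convolutional net blends the coarse result with the unmasked pixels.

```python
# Very rough sketch; all shapes, masks, and layer choices are assumptions.
import torch
import torch.nn as nn

inpaint_net = nn.Sequential(                      # toy encoder-decoder generator
    nn.Conv2d(4, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid())
fusion_net = nn.Sequential(                       # toy multi-layer fusion network
    nn.Conv2d(6, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 3, 3, padding=1), nn.Sigmoid())

img  = torch.rand(1, 3, 64, 64)
mask = (torch.rand(1, 1, 64, 64) > 0.5).float()   # 1 marks a missing pixel

coarse  = inpaint_net(torch.cat([img * (1 - mask), mask], dim=1))
pasted  = img * (1 - mask) + coarse * mask        # naive cut-and-paste composite
blended = fusion_net(torch.cat([pasted, img * (1 - mask)], dim=1))
```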

32 citations

References
Journal ArticleDOI
08 Dec 2014
TL;DR: A new framework for estimating generative models via an adversarial process, in which two models are simultaneously trained: a generative model G that captures the data distribution and a discriminative model D that estimates the probability that a sample came from the training data rather than G.
Abstract: We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making a mistake. This framework corresponds to a minimax two-player game. In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to ½ everywhere. In the case where G and D are defined by multilayer perceptrons, the entire system can be trained with backpropagation. There is no need for any Markov chains or unrolled approximate inference networks during either training or generation of samples. Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of the generated samples.
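For reference, the two-player minimax game summarized above is commonly written as the value function from the original GAN paper:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

At the unique equilibrium described in the abstract, G reproduces the training data distribution and D(x) = 1/2 everywhere.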

38,211 citations

Proceedings ArticleDOI
07 Jun 2015
TL;DR: The key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning.
Abstract: Convolutional networks are powerful visual models that yield hierarchies of features. We show that convolutional networks by themselves, trained end-to-end, pixels-to-pixels, exceed the state-of-the-art in semantic segmentation. Our key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning. We define and detail the space of fully convolutional networks, explain their application to spatially dense prediction tasks, and draw connections to prior models. We adapt contemporary classification networks (AlexNet [20], the VGG net [31], and GoogLeNet [32]) into fully convolutional networks and transfer their learned representations by fine-tuning [3] to the segmentation task. We then define a skip architecture that combines semantic information from a deep, coarse layer with appearance information from a shallow, fine layer to produce accurate and detailed segmentations. Our fully convolutional network achieves state-of-the-art segmentation of PASCAL VOC (20% relative improvement to 62.2% mean IU on 2012), NYUDv2, and SIFT Flow, while inference takes less than one fifth of a second for a typical image.
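The skip architecture mentioned above can be illustrated with a small sketch; the channel counts, class count, and bilinear upsampling here are assumptions (the original FCN learns its upsampling filters). A coarse prediction from a deep layer is upsampled and summed with a prediction from a shallower, finer layer before the final upsampling to image resolution.

```python
# Illustrative FCN-style skip fusion; shapes and interpolation mode are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F

n_classes = 21
score_deep    = nn.Conv2d(512, n_classes, 1)   # 1x1 scoring conv on deep features
score_shallow = nn.Conv2d(256, n_classes, 1)   # 1x1 scoring conv on shallower features

deep_feat    = torch.rand(1, 512, 8, 8)        # coarse but semantically strong
shallow_feat = torch.rand(1, 256, 16, 16)      # finer spatial detail

coarse = F.interpolate(score_deep(deep_feat), scale_factor=2,
                       mode='bilinear', align_corners=False)
fused  = coarse + score_shallow(shallow_feat)  # skip connection fuses the two
logits = F.interpolate(fused, scale_factor=16,
                       mode='bilinear', align_corners=False)  # back to image size
```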

28,225 citations

Proceedings ArticleDOI
21 Jul 2017
TL;DR: Conditional adversarial networks are investigated as a general-purpose solution to image-to-image translation problems and it is demonstrated that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks.
Abstract: We investigate conditional adversarial networks as a general-purpose solution to image-to-image translation problems. These networks not only learn the mapping from input image to output image, but also learn a loss function to train this mapping. This makes it possible to apply the same generic approach to problems that traditionally would require very different loss formulations. We demonstrate that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks. Moreover, since the release of the pix2pix software associated with this paper, hundreds of Twitter users have posted their own artistic experiments using our system. As a community, we no longer hand-engineer our mapping functions, and this work suggests we can achieve reasonable results without hand-engineering our loss functions either.
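The learned objective the abstract alludes to combines a conditional adversarial term with an L1 reconstruction term; in the notation of the pix2pix paper (x input image, y target, z noise):

```latex
\mathcal{L}_{cGAN}(G, D) =
  \mathbb{E}_{x, y}\big[\log D(x, y)\big]
  + \mathbb{E}_{x, z}\big[\log\big(1 - D(x, G(x, z))\big)\big],
\qquad
\mathcal{L}_{L1}(G) = \mathbb{E}_{x, y, z}\big[\lVert y - G(x, z) \rVert_1\big],
\qquad
G^{*} = \arg\min_G \max_D \; \mathcal{L}_{cGAN}(G, D) + \lambda\, \mathcal{L}_{L1}(G)
```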

11,958 citations

Journal ArticleDOI
TL;DR: This work addresses the task of semantic image segmentation with deep learning, proposes atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales, and improves the localization of object boundaries by combining methods from DCNNs and probabilistic graphical models.
Abstract: In this work we address the task of semantic image segmentation with Deep Learning and make three main contributions that are experimentally shown to have substantial practical merit. First, we highlight convolution with upsampled filters, or ‘atrous convolution’, as a powerful tool in dense prediction tasks. Atrous convolution allows us to explicitly control the resolution at which feature responses are computed within Deep Convolutional Neural Networks. It also allows us to effectively enlarge the field of view of filters to incorporate larger context without increasing the number of parameters or the amount of computation. Second, we propose atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales. ASPP probes an incoming convolutional feature layer with filters at multiple sampling rates and effective fields of view, thus capturing objects as well as image context at multiple scales. Third, we improve the localization of object boundaries by combining methods from DCNNs and probabilistic graphical models. The commonly deployed combination of max-pooling and downsampling in DCNNs achieves invariance but takes a toll on localization accuracy. We overcome this by combining the responses at the final DCNN layer with a fully connected Conditional Random Field (CRF), which is shown both qualitatively and quantitatively to improve localization performance. Our proposed “DeepLab” system sets the new state of the art on the PASCAL VOC-2012 semantic image segmentation task, reaching 79.7 percent mIOU on the test set, and advances the results on three other datasets: PASCAL-Context, PASCAL-Person-Part, and Cityscapes. All of our code is made publicly available online.
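As a concrete but assumed illustration of ASPP, the sketch below applies parallel 3x3 convolutions with different dilation rates to the same feature map and merges the responses; the rates, channel counts, and summation are choices for this sketch, not the DeepLab reference implementation.

```python
# Rough ASPP-style module; rates, channels, and the sum-merge are assumptions.
import torch
import torch.nn as nn

class TinyASPP(nn.Module):
    def __init__(self, in_ch=256, out_ch=256, rates=(6, 12, 18, 24)):
        super().__init__()
        # One dilated 3x3 convolution per sampling rate; padding=r keeps the size.
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r) for r in rates])

    def forward(self, x):
        # Merge the multi-rate responses (here by summation).
        return torch.stack([branch(x) for branch in self.branches], dim=0).sum(dim=0)

feat = torch.rand(1, 256, 32, 32)
out = TinyASPP()(feat)   # spatial size is preserved: (1, 256, 32, 32)
```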

11,856 citations

Proceedings ArticleDOI
01 Oct 2017
TL;DR: CycleGAN as discussed by the authors learns a mapping G : X → Y such that the distribution of images from G(X) is indistinguishable from the distribution of Y, using an adversarial loss.
Abstract: Image-to-image translation is a class of vision and graphics problems where the goal is to learn the mapping between an input image and an output image using a training set of aligned image pairs. However, for many tasks, paired training data will not be available. We present an approach for learning to translate an image from a source domain X to a target domain Y in the absence of paired examples. Our goal is to learn a mapping G : X → Y such that the distribution of images from G(X) is indistinguishable from the distribution Y using an adversarial loss. Because this mapping is highly under-constrained, we couple it with an inverse mapping F : Y → X and introduce a cycle consistency loss to push F(G(X)) ≈ X (and vice versa). Qualitative results are presented on several tasks where paired training data does not exist, including collection style transfer, object transfiguration, season transfer, photo enhancement, etc. Quantitative comparisons against several prior methods demonstrate the superiority of our approach.
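The cycle-consistency constraint described above amounts to an L1 penalty on both reconstruction directions; the sketch below shows that term in isolation (adversarial losses omitted, names assumed, not the CycleGAN code).

```python
# Minimal cycle-consistency term; G: X -> Y and F: Y -> X are assumed callables.
import torch
import torch.nn as nn

l1 = nn.L1Loss()

def cycle_consistency_loss(G, F, real_x, real_y):
    """Encourage F(G(x)) ~= x and G(F(y)) ~= y; adversarial terms are omitted."""
    return l1(F(G(real_x)), real_x) + l1(G(F(real_y)), real_y)

# Toy call with identity mappings, just to show the interface.
loss = cycle_consistency_loss(nn.Identity(), nn.Identity(),
                              torch.rand(1, 3, 32, 32), torch.rand(1, 3, 32, 32))
```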

11,682 citations