Journal Article

Multi-focus image fusion with a deep convolutional neural network

01 Jul 2017 - Information Fusion (Elsevier Science Publishers B. V., Amsterdam, The Netherlands) - Vol. 36, pp 191-207
TL;DR: A new multi-focus image fusion method is proposed that learns a direct mapping between the source images and the focus map, using a deep convolutional neural network trained on high-quality image patches and their blurred versions to encode the mapping.
About: This article was published in Information Fusion on 2017-07-01. It has received 826 citations to date. The article focuses on the topics: Image fusion & Convolutional neural network.
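
The summary above mentions training on sharp patches paired with their blurred versions. The following is a minimal sketch of how such pairs might be built, not the authors' pipeline. The 3x3 mean filter stands in for whatever blur the paper applies, and the function names, the 16x16 patch size, and the labeling convention (1 when the first patch of the pair is the sharp one) are all assumptions made for illustration.

```python
import random


def box_blur(patch, passes=1):
    """Blur a 2D list of floats with a 3x3 mean filter (borders replicated).

    Assumption: a mean filter is only a stand-in for the blur used in the paper.
    """
    h, w = len(patch), len(patch[0])
    out = [row[:] for row in patch]
    for _ in range(passes):
        src = out
        out = [[0.0] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                total = 0.0
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        yy = min(max(y + dy, 0), h - 1)  # clamp to the border
                        xx = min(max(x + dx, 0), w - 1)
                        total += src[yy][xx]
                out[y][x] = total / 9.0
    return out


def make_training_pairs(sharp_patch, passes=2):
    """Return two labeled pairs: (sharp, blurred) -> 1 and (blurred, sharp) -> 0.

    The labeling convention is hypothetical.
    """
    blurred = box_blur(sharp_patch, passes)
    return [((sharp_patch, blurred), 1), ((blurred, sharp_patch), 0)]


def variance(patch):
    """Intensity variance, used here only as a crude sharpness check."""
    values = [v for row in patch for v in row]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


random.seed(0)
sharp = [[float(random.randint(0, 255)) for _ in range(16)] for _ in range(16)]
pairs = make_training_pairs(sharp)
print(variance(sharp), variance(pairs[0][0][1]))
```

Each sharp patch yields one example per ordering, so a classifier trained on such pairs sees both "first input in focus" and "second input in focus" cases equally often.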
Citations
Journal Article
TL;DR: This paper proposes a novel method to fuse two types of information using a generative adversarial network, termed as FusionGAN, which establishes an adversarial game between a generator and a discriminator, where the generator aims to generate a fused image with major infrared intensities together with additional visible gradients.

853 citations


Cites methods from "Multi-focus image fusion with a dee..."

  • ...[26] trained a deep convolutional neural network (CNN) to jointly generate activity level measurement and fusion rule, and they also applied their model to fuse infrared and visible images [27]....

    [...]

Journal Article
Jiayi Ma1, Yong Ma1, Chang Li1
TL;DR: This survey comprehensively reviews the existing methods and applications for the fusion of infrared and visible images, and can serve as a reference for researchers in infrared and visible image fusion and related fields.

849 citations

Journal Article
Hui Li1, Xiaojun Wu1
TL;DR: A novel deep learning architecture for infrared and visible images fusion problems is presented, where the encoding network is combined with convolutional layers, a fusion layer, and dense block in which the output of each layer is connected to every other layer.
Abstract: In this paper, we present a novel deep learning architecture for infrared and visible images fusion problems. In contrast to conventional convolutional networks, our encoding network is combined with convolutional layers, a fusion layer, and dense block in which the output of each layer is connected to every other layer. We attempt to use this architecture to get more useful features from source images in the encoding process, and two fusion layers (fusion strategies) are designed to fuse these features. Finally, the fused image is reconstructed by a decoder. Compared with existing fusion methods, the proposed fusion method achieves the state-of-the-art performance in objective and subjective assessment.

703 citations


Cites methods from "Multi-focus image fusion with a dee..."

  • ...network(CNN) is used to obtain the image features and reconstruct the fused image [12], [13]....

    [...]

  • ...model(JSR) [9], gradient transfer and total variation minimization(GTF) [23], the JSR model with saliency detection fusion method(JSRSD) [24], deep convolutional neural network-based method(CNN) [13] and the DeepFuse method(DeepFuse) [15]....

    [...]

  • ...[13] also presented a CNN-based fusion method for multi-focus image fusion task....

    [...]

Journal Article
TL;DR: The experimental results show that the proposed model demonstrates better generalization ability than the existing image fusion models for fusing various types of images, such as multi-focus, infrared-visual, multi-modal medical and multi-exposure images.

524 citations

Journal Article
TL;DR: This survey paper presents a systematic review of the DL-based pixel-level image fusion literature, summarizes the main difficulties that exist in conventional image fusion research, and discusses the advantages that DL can offer in addressing each of these problems.

493 citations

References
Proceedings Article
03 Dec 2012
TL;DR: A large, deep convolutional neural network, consisting of five convolutional layers (some followed by max-pooling layers) and three fully-connected layers with a final 1000-way softmax, classified ImageNet images with error rates considerably better than the previous state of the art.
Abstract: We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.

73,978 citations
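
The abstract above quotes roughly 60 million parameters for a network with five convolutional and three fully-connected layers. As a rough sanity check, the sketch below tallies weights and biases per layer. The abstract does not give layer shapes, so the filter counts, kernel sizes, and fan-in values in `LAYERS` are assumptions taken from the commonly cited description of this architecture (including the split of some convolutional layers across two GPUs), and `count_parameters` is a hypothetical helper.

```python
# (layer name, number of filters or units, weights feeding each filter or unit)
# Assumed shapes; they are NOT stated in the abstract above.
LAYERS = [
    ("conv1", 96, 11 * 11 * 3),
    ("conv2", 256, 5 * 5 * 48),    # assumes each filter sees half the channels
    ("conv3", 384, 3 * 3 * 256),
    ("conv4", 384, 3 * 3 * 192),   # assumes each filter sees half the channels
    ("conv5", 256, 3 * 3 * 192),   # assumes each filter sees half the channels
    ("fc6", 4096, 6 * 6 * 256),
    ("fc7", 4096, 4096),
    ("fc8", 1000, 4096),           # final 1000-way output layer
]


def count_parameters(layers):
    """Return {layer name: weights + biases} for each layer."""
    return {name: units * fan_in + units for name, units, fan_in in layers}


per_layer = count_parameters(LAYERS)
total = sum(per_layer.values())
fc_share = sum(v for k, v in per_layer.items() if k.startswith("fc")) / total

for name, n in per_layer.items():
    print(f"{name}: {n:,}")
print(f"total: {total:,}  fully-connected share: {fc_share:.1%}")
```

Under these assumed shapes, the fully-connected layers hold the large majority of the parameters, which fits the abstract's remark that dropout was applied there to reduce overfitting.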

Journal Article
01 Jan 1998
TL;DR: This paper reviews methods for handwritten character recognition, shows that convolutional neural networks outperform the other techniques on a standard handwritten digit recognition task, and introduces graph transformer networks (GTN), which allow multimodule document recognition systems to be trained globally with gradient-based methods.
Abstract: Multilayer neural networks trained with the back-propagation algorithm constitute the best example of a successful gradient based learning technique. Given an appropriate network architecture, gradient-based learning algorithms can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters, with minimal preprocessing. This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task. Convolutional neural networks, which are specifically designed to deal with the variability of 2D shapes, are shown to outperform all other techniques. Real-life document recognition systems are composed of multiple modules including field extraction, segmentation recognition, and language modeling. A new learning paradigm, called graph transformer networks (GTN), allows such multimodule systems to be trained globally using gradient-based methods so as to minimize an overall performance measure. Two systems for online handwriting recognition are described. Experiments demonstrate the advantage of global training, and the flexibility of graph transformer networks. A graph transformer network for reading a bank cheque is also described. It uses convolutional neural network character recognizers combined with global training techniques to provide record accuracy on business and personal cheques. It is deployed commercially and reads several million cheques per day.

42,067 citations

Journal Article
TL;DR: A structural similarity index is proposed for image quality assessment based on the degradation of structural information, and is compared with both subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000.
Abstract: Objective methods for assessing perceptual image quality traditionally attempted to quantify the visibility of errors (differences) between a distorted image and a reference image using a variety of known properties of the human visual system. Under the assumption that human visual perception is highly adapted for extracting structural information from a scene, we introduce an alternative complementary framework for quality assessment based on the degradation of structural information. As a specific example of this concept, we develop a structural similarity index and demonstrate its promise through a set of intuitive examples, as well as comparison to both subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000. A MATLAB implementation of the proposed algorithm is available online at http://www.cns.nyu.edu/~lcv/ssim/.

40,609 citations
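
To make the structural similarity index described above concrete, here is a simplified sketch that evaluates the index once over two whole signals. The published method is usually applied over local windows and then averaged, so this single-window version is a deliberate simplification. The function name `ssim_global` and the defaults `k1 = 0.01`, `k2 = 0.03`, and an 8-bit dynamic range of 255 are assumptions based on commonly used settings, not values stated in the abstract.

```python
def ssim_global(x, y, dynamic_range=255.0, k1=0.01, k2=0.03):
    """Single-window structural similarity between two equal-length sequences.

    Simplification: one global window instead of averaged local windows.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have the same length (at least 2)")

    mu_x = sum(x) / n
    mu_y = sum(y) / n
    # Unbiased sample variances and covariance.
    var_x = sum((a - mu_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mu_y) ** 2 for b in y) / (n - 1)
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / (n - 1)

    # Small constants that stabilize the ratio when the denominators are near zero.
    c1 = (k1 * dynamic_range) ** 2
    c2 = (k2 * dynamic_range) ** 2

    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


reference = [10.0, 50.0, 90.0, 130.0, 170.0, 210.0, 250.0, 30.0]
brighter = [min(v + 20.0, 255.0) for v in reference]
inverted = [255.0 - v for v in reference]

print(ssim_global(reference, reference))
print(ssim_global(reference, brighter))
print(ssim_global(reference, inverted))
```

An identical pair scores 1, while inverting the intensities flips the sign of the covariance term and drives the score below zero, which illustrates how the index responds to structure rather than to raw pixel differences alone.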

Proceedings Article
07 Jun 2015
TL;DR: The key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning.
Abstract: Convolutional networks are powerful visual models that yield hierarchies of features. We show that convolutional networks by themselves, trained end-to-end, pixels-to-pixels, exceed the state-of-the-art in semantic segmentation. Our key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning. We define and detail the space of fully convolutional networks, explain their application to spatially dense prediction tasks, and draw connections to prior models. We adapt contemporary classification networks (AlexNet [20], the VGG net [31], and GoogLeNet [32]) into fully convolutional networks and transfer their learned representations by fine-tuning [3] to the segmentation task. We then define a skip architecture that combines semantic information from a deep, coarse layer with appearance information from a shallow, fine layer to produce accurate and detailed segmentations. Our fully convolutional network achieves state-of-the-art segmentation of PASCAL VOC (20% relative improvement to 62.2% mean IU on 2012), NYUDv2, and SIFT Flow, while inference takes less than one fifth of a second for a typical image.

28,225 citations

Proceedings Article
21 Jun 2010
TL;DR: The binary stochastic hidden units of restricted Boltzmann machines are generalized to "Stepped Sigmoid Units", which can be approximated efficiently by noisy rectified linear units; these units learn features that are better for object recognition on the NORB dataset and face verification on the Labeled Faces in the Wild dataset.
Abstract: Restricted Boltzmann machines were developed using binary stochastic hidden units. These can be generalized by replacing each binary unit by an infinite number of copies that all have the same weights but have progressively more negative biases. The learning and inference rules for these "Stepped Sigmoid Units" are unchanged. They can be approximated efficiently by noisy, rectified linear units. Compared with binary units, these units learn features that are better for object recognition on the NORB dataset and face verification on the Labeled Faces in the Wild dataset. Unlike binary units, rectified linear units preserve information about relative intensities as information travels through multiple layers of feature detectors.

14,799 citations
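
The abstract above describes replacing one binary unit with many copies that share weights but have progressively more negative biases, and approximating the result with a noisy rectified linear unit. The sketch below illustrates that idea numerically. The half-unit bias offsets (-0.5, -1.5, ...), the comparison against the softplus function log(1 + e^x), and the choice of sigmoid(x) as the noise variance are assumptions drawn from the usual formulation of this construction; the abstract itself does not state them.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def stepped_sigmoid(x, copies=100):
    """Summed activation of `copies` sigmoid units with biases -0.5, -1.5, ...

    Assumption: half-unit offsets; intended for moderate |x| (no overflow guard).
    """
    return sum(sigmoid(x - i + 0.5) for i in range(1, copies + 1))


def softplus(x):
    """Smooth closed-form curve that the stepped sum is compared against."""
    return math.log1p(math.exp(x))


def noisy_relu(x, rng):
    """Rectified linear unit with Gaussian noise of (assumed) variance sigmoid(x)."""
    return max(0.0, x + rng.gauss(0.0, math.sqrt(sigmoid(x))))


rng = random.Random(0)
for x in (-3.0, 0.0, 2.0, 5.0):
    print(x, round(stepped_sigmoid(x), 4), round(softplus(x), 4), noisy_relu(x, rng))
```

For the inputs shown, the stepped sum and the softplus curve stay close to each other, while the noisy rectified unit needs only a single sample and a `max`, which is the efficiency argument the abstract makes.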