Journal ArticleDOI

Image fusion meets deep learning: A survey and perspective

01 Dec 2021-Information Fusion (Elsevier)-Vol. 76, pp 323-336
TL;DR: In this paper, a comprehensive review and analysis of the latest deep learning methods in different image fusion scenarios is provided, and evaluations of some representative methods in specific fusion tasks are performed qualitatively and quantitatively.
About: This article is published in Information Fusion. The article was published on 2021-12-01. It has received 153 citations to date. The article focuses on the topics: Image fusion & Feature extraction.
Citations
Journal ArticleDOI
TL;DR: Tang et al. as discussed by the authors proposed a semantic-aware real-time image fusion network (SeAFusion), which cascades an image fusion module with a semantic segmentation module and leverages a semantic loss to guide high-level semantic information back into the fusion module.
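As an illustration of the cascade described above, here is a hedged training-step sketch in PyTorch; fusion_net, seg_net, the simplified content loss, and the weight lam are illustrative assumptions, not the authors' actual architecture or loss.

```python
import torch
import torch.nn.functional as F

def train_step(fusion_net, seg_net, ir, vis, seg_labels, lam=0.1):
    """One joint training step: the semantic loss computed on the fused image
    feeds gradients back into the fusion module (sketch, not the paper's exact terms)."""
    fused = fusion_net(ir, vis)                       # image fusion module
    # simplified content loss: keep the fused image close to the brighter source
    content_loss = F.l1_loss(fused, torch.max(ir, vis))
    # semantic loss: segmentation of the fused image vs. ground-truth labels
    seg_logits = seg_net(fused)                       # cascaded segmentation module
    semantic_loss = F.cross_entropy(seg_logits, seg_labels)
    total = content_loss + lam * semantic_loss        # semantic information flows back
    total.backward()
    return total
```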

137 citations

Journal ArticleDOI
TL;DR: Wang et al. as mentioned in this paper proposed a decoupling network-based IVIF method (DNFusion), which utilizes decoupled maps to design additional constraints that force the network to effectively retain the saliency information of the source images.
Abstract: In general, the goal of existing infrared and visible image fusion (IVIF) methods is to make the fused image contain both the high-contrast regions of the infrared image and the texture details of the visible image. However, this definition leads the fused image to lose information from the visible image in high-contrast areas. To address this problem, this paper proposes a decoupling network-based IVIF method (DNFusion), which utilizes decoupled maps to design additional constraints that force the network to effectively retain the saliency information of the source images. The current definition of image fusion is satisfied while the salient objects of the source images are effectively maintained. Specifically, the internal feature interaction module facilitates information exchange within the encoder and improves the utilization of complementary information. In addition, a hybrid loss function constructed from weighted fidelity loss, gradient loss, and decoupling loss ensures that the generated fused image effectively preserves the texture details and luminance information of the source images. Qualitative and quantitative comparisons in extensive experiments demonstrate that our model can generate fused images containing the salient objects and clear details of the source images, and that the proposed method outperforms other state-of-the-art methods.
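A minimal sketch of how such a hybrid loss could be composed, assuming fused/ir/vis tensors of shape (B, 1, H, W); the gradient operator here is a simple finite-difference approximation, and the decoupling term is a hypothetical stand-in that weights fidelity by decoupled saliency maps, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def gradient(img):
    """Finite-difference gradient magnitude (rough approximation of a gradient operator)."""
    dx = img[..., :, 1:] - img[..., :, :-1]
    dy = img[..., 1:, :] - img[..., :-1, :]
    return F.pad(dx.abs(), (0, 1, 0, 0)) + F.pad(dy.abs(), (0, 0, 0, 1))

def hybrid_loss(fused, ir, vis, sal_ir, sal_vis, w=(1.0, 1.0, 1.0)):
    # weighted fidelity: pull the fused image toward both sources
    fidelity = F.l1_loss(fused, ir) + F.l1_loss(fused, vis)
    # gradient loss: preserve the stronger texture of the two sources
    grad_loss = F.l1_loss(gradient(fused), torch.max(gradient(ir), gradient(vis)))
    # decoupling loss (hypothetical stand-in): constrain salient regions,
    # selected by the decoupled maps, toward the corresponding source
    decoupling = (F.l1_loss(sal_ir * fused, sal_ir * ir)
                  + F.l1_loss(sal_vis * fused, sal_vis * vis))
    return w[0] * fidelity + w[1] * grad_loss + w[2] * decoupling
```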

121 citations

Journal ArticleDOI
TL;DR: Tang et al. as mentioned in this paper proposed SwinFusion, a general image fusion framework based on cross-domain long-range learning and the Swin Transformer, which achieves sufficient integration of complementary information and global interaction.
Abstract: This study proposes a novel general image fusion framework based on cross-domain long-range learning and the Swin Transformer, termed SwinFusion. On the one hand, an attention-guided cross-domain module is devised to achieve sufficient integration of complementary information and global interaction. More specifically, the proposed method involves an intra-domain fusion unit based on self-attention and an inter-domain fusion unit based on cross-attention, which mine and integrate long-range dependencies within the same domain and across domains. Through long-range dependency modeling, the network is able to fully implement domain-specific information extraction and cross-domain complementary information integration, as well as maintaining the appropriate apparent intensity from a global perspective. In particular, we introduce the shifted-window mechanism into the self-attention and cross-attention, which allows our model to receive images of arbitrary size. On the other hand, the multi-scene image fusion problems are generalized to a unified framework with structure maintenance, detail preservation, and proper intensity control. Moreover, an elaborate loss function, consisting of SSIM loss, texture loss, and intensity loss, drives the network to preserve abundant texture details and structural information, as well as presenting optimal apparent intensity. Extensive experiments on both multi-modal image fusion and digital photography image fusion demonstrate the superiority of our SwinFusion compared to the state-of-the-art unified image fusion algorithms and task-specific alternatives. Implementation code and pre-trained weights can be accessed at https://github.com/Linfeng-Tang/SwinFusion.
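As a rough illustration of the inter-domain fusion idea, the sketch below uses PyTorch's standard multi-head attention in place of the paper's shifted-window cross-attention; the module name, dimensions, and residual/normalization choices are assumptions, not the released implementation.

```python
import torch
import torch.nn as nn

class CrossDomainFusionUnit(nn.Module):
    """Inter-domain fusion sketch: tokens from one modality attend to the other."""
    def __init__(self, dim=96, heads=4):
        super().__init__()
        self.attn_a2b = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.attn_b2a = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, feat_a, feat_b):
        # feat_a, feat_b: (batch, tokens, dim) features from the two domains
        a_enriched, _ = self.attn_a2b(feat_a, feat_b, feat_b)  # A queries B
        b_enriched, _ = self.attn_b2a(feat_b, feat_a, feat_a)  # B queries A
        # residual connection per domain, then merge
        return self.norm(feat_a + a_enriched) + self.norm(feat_b + b_enriched)

# usage: fuse 64-token feature maps from infrared and visible branches
unit = CrossDomainFusionUnit(dim=96, heads=4)
fused = unit(torch.randn(2, 64, 96), torch.randn(2, 64, 96))
```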

111 citations

Journal ArticleDOI
TL;DR: Tang et al. as mentioned in this paper proposed a progressive IR/VIS fusion method based on illumination awareness, which adaptively maintains the intensity distribution of salient targets and preserves texture information in the background.

78 citations

Journal ArticleDOI
TL;DR: In this article, a compendious review of different medical imaging modalities and an evaluation of related multimodal databases, together with statistical results, are provided; quality assessment fusion metrics are also summarized.

56 citations

References
Book ChapterDOI
06 Sep 2014
TL;DR: A new dataset is presented with the goal of advancing the state of the art in object recognition by placing object recognition in the context of the broader question of scene understanding, achieved by gathering images of complex everyday scenes containing common objects in their natural context.
Abstract: We present a new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding. This is achieved by gathering images of complex everyday scenes containing common objects in their natural context. Objects are labeled using per-instance segmentations to aid in precise object localization. Our dataset contains photos of 91 object types that would be easily recognizable by a 4-year-old. With a total of 2.5 million labeled instances in 328k images, the creation of our dataset drew upon extensive crowd worker involvement via novel user interfaces for category detection, instance spotting and instance segmentation. We present a detailed statistical analysis of the dataset in comparison to PASCAL, ImageNet, and SUN. Finally, we provide baseline performance analysis for bounding box and segmentation detection results using a Deformable Parts Model.

30,462 citations

Journal ArticleDOI
TL;DR: The guided filter is a novel explicit image filter derived from a local linear model that can be used as an edge-preserving smoothing operator like the popular bilateral filter, but it has better behaviors near edges.
Abstract: In this paper, we propose a novel explicit image filter called guided filter. Derived from a local linear model, the guided filter computes the filtering output by considering the content of a guidance image, which can be the input image itself or another different image. The guided filter can be used as an edge-preserving smoothing operator like the popular bilateral filter [1], but it has better behaviors near edges. The guided filter is also a more generic concept beyond smoothing: It can transfer the structures of the guidance image to the filtering output, enabling new filtering applications like dehazing and guided feathering. Moreover, the guided filter naturally has a fast and nonapproximate linear time algorithm, regardless of the kernel size and the intensity range. Currently, it is one of the fastest edge-preserving filters. Experiments show that the guided filter is both effective and efficient in a great variety of computer vision and computer graphics applications, including edge-aware smoothing, detail enhancement, HDR compression, image matting/feathering, dehazing, joint upsampling, etc.
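The local linear model behind the guided filter maps directly onto a handful of box-filter (local mean) operations; below is a minimal single-channel NumPy sketch using scipy's uniform_filter as the box mean, intended as an illustration rather than the authors' optimized O(1) implementation.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def guided_filter(I, p, radius=8, eps=1e-2):
    """Edge-preserving filter: fit a local linear model q = a*I + b in each
    window, then output q = mean(a)*I + mean(b)."""
    size = 2 * radius + 1
    mean = lambda x: uniform_filter(x, size)      # box filter (local mean)
    mean_I, mean_p = mean(I), mean(p)
    cov_Ip = mean(I * p) - mean_I * mean_p        # local covariance of guide and input
    var_I = mean(I * I) - mean_I * mean_I         # local variance of the guide
    a = cov_Ip / (var_I + eps)                    # eps controls edge preservation
    b = mean_p - a * mean_I
    return mean(a) * I + mean(b)                  # averaged coefficients give the output

# usage: self-guided smoothing of a noisy grayscale image
img = np.random.rand(128, 128)
smoothed = guided_filter(img, img, radius=8, eps=0.04)
```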

4,730 citations

Journal ArticleDOI
TL;DR: This article develops methods for determining visually appealing motion transitions using linear blending, and assesses the importance of these techniques by determining the minimum sensitivity of viewers to transition durations, the just noticeable difference, for both center-aligned and start-end specifications.
Abstract: This article develops methods for determining visually appealing motion transitions using linear blending. Motion transitions are segues between two sequences of animation, and are important components for generating compelling animation streams in virtual environments and computer games. Methods involving linear blending are studied because of their efficiency, computational speed, and widespread use. Two methods of transition specification are detailed, center-aligned and start-end transitions. First, we compute a set of optimal weights for an underlying cost metric used to determine the transition points. We then evaluate the optimally weighted cost metric for generalizability, appeal, and robustness through a cross-validation and user study. Next, we develop methods for computing visually appealing blend lengths for two broad categories of motion. We empirically evaluate these results through user studies. Finally, we assess the importance of these techniques by determining the minimum sensitivity of viewers to transition durations, the just noticeable difference, for both center-aligned and start-end specifications.

1,626 citations

Journal ArticleDOI
TL;DR: Positron emission tomography is a highly sensitive non-invasive technology that is ideally suited for pre-clinical and clinical imaging of cancer biology, in contrast to anatomical approaches.
Abstract: The imaging of specific molecular targets that are associated with cancer should allow earlier diagnosis and better management of oncology patients. Positron emission tomography (PET) is a highly sensitive non-invasive technology that is ideally suited for pre-clinical and clinical imaging of cancer biology, in contrast to anatomical approaches. By using radiolabelled tracers, which are injected in non-pharmacological doses, three-dimensional images can be reconstructed by a computer to show the concentration and location(s) of the tracer of interest. PET should become increasingly important in cancer imaging in the next decade.

1,570 citations

Journal ArticleDOI
TL;DR: A technique which simultaneously reduces the data dimensionality, suppresses undesired or interfering spectral signatures, and detects the presence of a spectral signature of interest is described.
Abstract: Most applications of hyperspectral imagery require processing techniques which achieve two fundamental goals: 1) detect and classify the constituent materials for each pixel in the scene; 2) reduce the data volume/dimensionality, without loss of critical information, so that it can be processed efficiently and assimilated by a human analyst. The authors describe a technique which simultaneously reduces the data dimensionality, suppresses undesired or interfering spectral signatures, and detects the presence of a spectral signature of interest. The basic concept is to project each pixel vector onto a subspace which is orthogonal to the undesired signatures. This operation is an optimal interference suppression process in the least squares sense. Once the interfering signatures have been nulled, projecting the residual onto the signature of interest maximizes the signal-to-noise ratio and results in a single component image that represents a classification for the signature of interest. The orthogonal subspace projection (OSP) operator can be extended to k-signatures of interest, thus reducing the dimensionality of k and classifying the hyperspectral image simultaneously. The approach is applicable to both spectrally pure as well as mixed pixels.
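The orthogonal subspace projection itself reduces to a short piece of linear algebra; the sketch below assumes a hyperspectral cube X of shape (rows, cols, bands), a target signature d, and undesired signatures stacked as columns of U (all illustrative names, not code from the paper).

```python
import numpy as np

def osp_detector(X, d, U):
    """Orthogonal subspace projection: null the undesired signatures in U,
    then match the residual against the signature of interest d."""
    # P projects each pixel onto the subspace orthogonal to the columns of U
    P = np.eye(U.shape[0]) - U @ np.linalg.pinv(U)      # I - U U^+
    rows, cols, bands = X.shape
    pixels = X.reshape(-1, bands).T                     # (bands, n_pixels)
    scores = d @ P @ pixels                             # matched filter after nulling
    return scores.reshape(rows, cols)                   # single-component detection image

# usage: 100x100 scene with 50 bands, one target and two background signatures
X = np.random.rand(100, 100, 50)
d = np.random.rand(50)
U = np.random.rand(50, 2)
detection_map = osp_detector(X, d, U)
```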

1,570 citations