Journal ArticleDOI

SSL-WAEIE: Self-Supervised Learning With Weighted Auto-Encoding and Information Exchange for Infrared and Visible Image Fusion

TL;DR: In this paper, infrared and visible image fusion (IVIF) technologies are used to extract complementary information from source images and generate a single fused result, which is widely applied in various high-level visual tasks such as segmentation and object detection.
Abstract: Dear editor, Infrared and visible image fusion (IVIF) technologies aim to extract complementary information from source images and generate a single fused result [1], which is widely applied in various high-level visual tasks such as segmentation and object detection [2].
Citations
Proceedings ArticleDOI
01 Dec 2022
TL;DR: In this article, two parallel encoders take the two source images as input, the decoder segments the fused features output by the encoders, and a residual attention module is used in each branch to mine and enhance the spatial features of multilevel channels to extract image information.
Abstract: At present, there are many semantic segmentation algorithms with excellent performance for intelligent driving vehicles, but most of them only work well in scenes with good illumination. To solve the problem of scene segmentation under low illumination, this paper proposes a novel semantic segmentation algorithm that combines visible and infrared images. In this algorithm, two parallel encoders take the two source images as input, and the decoder segments the fused features output by the encoders. The model is based on ResNet, and a residual attention module is used in each branch to mine and enhance the spatial features of multilevel channels to extract image information. Experiments are carried out on publicly available thermal infrared and visible datasets. The results show that the proposed algorithm is superior to algorithms using only visible images for semantic segmentation of traffic environments.
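The abstract above only sketches the architecture, so here is a minimal, hypothetical PyTorch illustration of the idea: two ResNet-style encoder branches (one per modality) with a simple residual attention block each, feature concatenation, and a decoder that predicts per-pixel class scores. The layer sizes, the squeeze-and-excitation-style attention, and the class count are illustrative assumptions, not the authors' exact network.

```python
import torch
import torch.nn as nn

class ResidualAttentionBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1))
        # simple channel attention (squeeze-and-excitation style) as a stand-in
        self.att = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // 4, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, channels, 1), nn.Sigmoid())

    def forward(self, x):
        y = self.body(x)
        return x + y * self.att(y)   # residual connection with attention re-weighting

class DualEncoderSegNet(nn.Module):
    def __init__(self, num_classes=13, width=64):
        super().__init__()
        def encoder(in_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, width, 3, stride=2, padding=1), nn.ReLU(inplace=True),
                ResidualAttentionBlock(width),
                nn.Conv2d(width, width * 2, 3, stride=2, padding=1), nn.ReLU(inplace=True),
                ResidualAttentionBlock(width * 2))
        self.enc_rgb = encoder(3)      # visible branch
        self.enc_ir = encoder(1)       # thermal infrared branch
        self.decoder = nn.Sequential(  # fuse, then upsample back to input resolution
            nn.Conv2d(width * 4, width * 2, 3, padding=1), nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=4, mode='bilinear', align_corners=False),
            nn.Conv2d(width * 2, num_classes, 1))

    def forward(self, rgb, ir):
        fused = torch.cat([self.enc_rgb(rgb), self.enc_ir(ir)], dim=1)
        return self.decoder(fused)     # per-pixel class logits

logits = DualEncoderSegNet()(torch.rand(1, 3, 128, 128), torch.rand(1, 1, 128, 128))
print(logits.shape)  # torch.Size([1, 13, 128, 128])
```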

1 citation

Journal ArticleDOI
Meiqi Gong, Hao Zhang, Han Xu, Xin Tian, Jiayi Ma 
TL;DR: Gong et al. as mentioned in this paper proposed a novel multipatch and multistage pansharpening method with knowledge distillation, termed PSDNet, which employs small patches in the early part to learn accurate local information, as small patches contain fewer object types.
Abstract: In this article, we propose a novel multipatch and multistage pansharpening method with knowledge distillation, termed PSDNet. Different from the existing pansharpening methods that typically input single-size patches to the network and implement pansharpening in an overall stage, we design multipatch inputs and a multistage network for more accurate and finer learning. First, multipatch inputs allow the network to learn more accurate spatial and spectral information by reducing the number of object types. We employ small patches in the early part to learn accurate local information, as small patches contain fewer object types. Then, the later part exploits large patches to fine-tune it for the overall information. Second, the multistage network is designed to reduce the difficulty of the previous single-step pansharpening and progressively generate elaborate results. In addition, instead of the traditional perceptual loss, which hardly relates to the specific task or the designed network, we introduce distillation loss to reinforce the guidance of the ground truth. Extensive experiments are conducted to demonstrate the superior performance of our proposed PSDNet to the existing state-of-the-art methods. Our code is available at https://github.com/Meiqi-Gong/PSDNet.
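Since the paper replaces the perceptual loss with a distillation loss that reinforces the guidance of the ground truth, the following is a hedged sketch of one way such a loss can be formed: the ground-truth image and the pansharpened output are both passed through a frozen teacher feature extractor, and the output is penalized in pixel space and in the teacher's feature space. The teacher architecture, band count, and weighting are illustrative assumptions; see the released code at the GitHub link above for the actual formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(               # stand-in feature extractor; in practice this
    nn.Conv2d(4, 32, 3, padding=1),    # would be pretrained and kept frozen
    nn.ReLU(inplace=True),
    nn.Conv2d(32, 32, 3, padding=1)).eval()
for p in teacher.parameters():
    p.requires_grad_(False)

def distillation_loss(fused, ground_truth, feat_weight=0.1):
    pixel = F.l1_loss(fused, ground_truth)                    # direct ground-truth guidance
    feat = F.l1_loss(teacher(fused), teacher(ground_truth))   # match teacher features of the GT
    return pixel + feat_weight * feat

loss = distillation_loss(torch.rand(2, 4, 64, 64, requires_grad=True),
                         torch.rand(2, 4, 64, 64))
loss.backward()
```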

1 citation

Journal ArticleDOI
TL;DR: Wang et al. as discussed by the authors proposed a multistage Dense-Parallel Attention fusion network (DPAFNet), which can focus on the intrinsic features of MS images and PAN images while exploring the correlation between the source images.
Abstract: Pansharpening is the technique of fusing a low-spatial-resolution MS image with its associated high-spatial-resolution PAN image. However, existing methods suffer from insufficient feature expression and do not explore both the intrinsic features of the images and the correlation between them, which may lead to limited integration of valuable information in the pansharpening results. To this end, we propose a novel multistage Dense-Parallel Attention fusion network (DPAFNet). The proposed parallel attention residual dense block (PARDB) module can focus on the intrinsic features of MS images and PAN images while exploring the correlation between the source images. To fuse as much complementary information as possible, the features extracted from each PARDB are fused at multiple stages, which allows the network to better focus on and exploit different information. Additionally, we propose a new loss that computes the L2-norm between the pansharpening results and the PAN images to constrain the spatial structures. Experiments were conducted on simulated and real datasets, and the evaluation results verify the superiority of DPAFNet.
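The new loss is described as an L2-norm between the pansharpening results and the PAN images. A minimal sketch follows, assuming the multi-band fused result is compared with the single-band PAN image via its band-averaged intensity; the exact formulation in DPAFNet may differ.

```python
import torch
import torch.nn.functional as F

def spatial_l2_loss(fused, pan):
    # fused: (N, C, H, W) pansharpened MS image; pan: (N, 1, H, W) PAN image
    intensity = fused.mean(dim=1, keepdim=True)   # assumed band-averaged intensity
    return F.mse_loss(intensity, pan)             # L2 constraint on spatial structures

loss = spatial_l2_loss(torch.rand(2, 4, 128, 128), torch.rand(2, 1, 128, 128))
```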
Proceedings ArticleDOI
25 May 2023
TL;DR: Zhang et al. as discussed by the authors proposed a self-supervised structure consisting of three primary operations, including three layers of random transformation, a main neural network layer, and a prediction layer, to classify image datasets.
Abstract: Image classification is an essential method for addressing practical problems, including medical image classification, object detection, and downstream processing tasks. However, current research typically uses neural networks or machine learning models to classify images based on image characteristics, which relies heavily on the obtained image data, and the trained model is a black box. Inspired by supervised learning algorithms, we propose a novel self-supervised structure to classify image datasets. The model consists of three primary components: three layers of random transformation, a main neural network layer, and a prediction layer. In this paper, we describe each component and test our model on a handwritten digit dataset. Extensive experimental results show that the proposed mechanism can identify the correct labels with acceptable accuracy and reasonable computational cost.
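The description of the structure is high-level, so the following is only a hypothetical sketch of one common way such a pipeline can be realized in PyTorch: random transformations generate pseudo-labels (here, which 90-degree rotation was applied, as in rotation-prediction pretext tasks), a small CNN acts as the main neural network layer, and a linear head is the prediction layer. The pretext task and the 28x28 input size are assumptions, not necessarily the authors' scheme.

```python
import torch
import torch.nn as nn

class PretextClassifier(nn.Module):
    def __init__(self, num_transforms=4):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool2d(2),
            nn.Flatten())
        self.head = nn.Linear(32 * 7 * 7, num_transforms)  # assumes 28x28 inputs (e.g. MNIST)

    def forward(self, x):
        return self.head(self.backbone(x))

# Self-supervised batch: rotate each image by k*90 degrees and predict k.
images = torch.rand(8, 1, 28, 28)
ks = torch.randint(0, 4, (8,))
rotated = torch.stack([torch.rot90(img, int(k), dims=(1, 2)) for img, k in zip(images, ks)])
loss = nn.CrossEntropyLoss()(PretextClassifier()(rotated), ks)
```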
References
Journal ArticleDOI
TL;DR: In this article, a structural similarity index is proposed for image quality assessment based on the degradation of structural information, and it is compared to both subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000.
Abstract: Objective methods for assessing perceptual image quality traditionally attempted to quantify the visibility of errors (differences) between a distorted image and a reference image using a variety of known properties of the human visual system. Under the assumption that human visual perception is highly adapted for extracting structural information from a scene, we introduce an alternative complementary framework for quality assessment based on the degradation of structural information. As a specific example of this concept, we develop a structural similarity index and demonstrate its promise through a set of intuitive examples, as well as comparison to both subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000. A MATLAB implementation of the proposed algorithm is available online at http://www.cns.nyu.edu/~lcv/ssim/.
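As a quick way to try the SSIM index described above, the widely used scikit-image implementation can be used in place of the original MATLAB code:

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

reference = np.random.rand(256, 256)                     # "perfect" reference image
distorted = reference + 0.05 * np.random.randn(256, 256) # mildly distorted copy

score = ssim(reference, distorted, data_range=distorted.max() - distorted.min())
print(f"SSIM = {score:.3f}")                             # 1.0 means identical images
```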

40,609 citations

Journal ArticleDOI
TL;DR: Experimental results clearly indicate that this metric reflects the quality of visual information obtained from the fusion of input images and can be used to compare the performance of different image fusion algorithms.
Abstract: A measure for objectively assessing the pixel level fusion performance is defined. The proposed metric reflects the quality of visual information obtained from the fusion of input images and can be used to compare the performance of different image fusion algorithms. Experimental results clearly indicate that this metric is perceptually meaningful.
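For intuition, the following is a simplified, hedged sketch of a gradient-preservation fusion metric in the spirit of the measure described above: it checks how much of each source image's edge strength survives in the fused image, weighted by the source edge strength. The published metric additionally models edge orientation and uses sigmoid mappings, which are omitted here for brevity.

```python
import numpy as np
from scipy import ndimage

def edge_strength(img):
    gx = ndimage.sobel(img, axis=1)
    gy = ndimage.sobel(img, axis=0)
    return np.hypot(gx, gy)

def fusion_quality(src_a, src_b, fused, eps=1e-12):
    ga, gb, gf = edge_strength(src_a), edge_strength(src_b), edge_strength(fused)
    # how well each source's edge strength is preserved in the fused result, in [0, 1]
    qa = np.minimum(gf, ga) / (np.maximum(gf, ga) + eps)
    qb = np.minimum(gf, gb) / (np.maximum(gf, gb) + eps)
    # edge-strength-weighted average over the image
    return float((qa * ga + qb * gb).sum() / (ga + gb + eps).sum())

a, b = np.random.rand(128, 128), np.random.rand(128, 128)
print(fusion_quality(a, b, 0.5 * (a + b)))
```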

1,446 citations

Journal ArticleDOI
TL;DR: This paper proposes a novel information fidelity criterion that is based on natural scene statistics and derives a novel QA algorithm that provides clear advantages over the traditional approaches and outperforms current methods in testing.
Abstract: Measurement of visual quality is of fundamental importance to numerous image and video processing applications. The goal of quality assessment (QA) research is to design algorithms that can automatically assess the quality of images or videos in a perceptually consistent manner. Traditionally, image QA algorithms interpret image quality as fidelity or similarity with a "reference" or "perfect" image in some perceptual space. Such "full-reference" QA methods attempt to achieve consistency in quality prediction by modeling salient physiological and psychovisual features of the human visual system (HVS), or by arbitrary signal fidelity criteria. In this paper, we approach the problem of image QA by proposing a novel information fidelity criterion that is based on natural scene statistics. QA systems are invariably involved with judging the visual quality of images and videos that are meant for "human consumption". Researchers have developed sophisticated models to capture the statistics of natural signals, that is, pictures and videos of the visual environment. Using these statistical models in an information-theoretic setting, we derive a novel QA algorithm that provides clear advantages over the traditional approaches. In particular, it is parameterless and outperforms current methods in our testing. We validate the performance of our algorithm with an extensive subjective study involving 779 images. We also show that, although our approach distinctly departs from traditional HVS-based methods, it is functionally similar to them under certain conditions, yet it outperforms them due to improved modeling. The code and the data from the subjective study are available at [1].

1,334 citations

Journal ArticleDOI
TL;DR: Experimental results demonstrate that the proposed method can obtain state-of-the-art performance for fusion of multispectral, multifocus, multimodal, and multiexposure images.
Abstract: A fast and effective image fusion method is proposed for creating a highly informative fused image through merging multiple images. The proposed method is based on a two-scale decomposition of an image into a base layer containing large scale variations in intensity, and a detail layer capturing small scale details. A novel guided filtering-based weighted average technique is proposed to make full use of spatial consistency for fusion of the base and detail layers. Experimental results demonstrate that the proposed method can obtain state-of-the-art performance for fusion of multispectral, multifocus, multimodal, and multiexposure images.
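A simplified sketch of the two-scale guided-filtering fusion idea is given below: each image is split into a base layer (large-scale intensity variations) and a detail layer, a saliency-based weight map is refined with a guided filter, and the layers are blended with those weights. The filter sizes and regularization values are illustrative, and the weight-map refinement is simplified relative to the paper (which refines a separate weight map per source image).

```python
import cv2
import numpy as np

def guided_filter(guide, src, radius, eps):
    # standard box-filter implementation of the gray-guide guided filter
    k = (radius, radius)
    mean_i = cv2.boxFilter(guide, -1, k)
    mean_p = cv2.boxFilter(src, -1, k)
    cov_ip = cv2.boxFilter(guide * src, -1, k) - mean_i * mean_p
    var_i = cv2.boxFilter(guide * guide, -1, k) - mean_i * mean_i
    a = cov_ip / (var_i + eps)
    b = mean_p - a * mean_i
    return cv2.boxFilter(a, -1, k) * guide + cv2.boxFilter(b, -1, k)

def two_scale_fusion(img_a, img_b):
    # 1) two-scale decomposition into base and detail layers
    base_a, base_b = (cv2.blur(im, (31, 31)) for im in (img_a, img_b))
    det_a, det_b = img_a - base_a, img_b - base_b
    # 2) saliency maps and a hard weight map favoring the more salient source
    sal = lambda im: cv2.GaussianBlur(np.abs(cv2.Laplacian(im, cv2.CV_32F)), (11, 11), 5)
    w = (sal(img_a) >= sal(img_b)).astype(np.float32)
    # 3) refine the weight map with guided filters (coarse for base, fine for detail)
    wb = guided_filter(img_a, w, 45, 0.3)
    wd = guided_filter(img_a, w, 7, 1e-6)
    # 4) weighted recombination of base and detail layers
    return wb * base_a + (1 - wb) * base_b + wd * det_a + (1 - wd) * det_b

ir = np.random.rand(256, 256).astype(np.float32)
vis = np.random.rand(256, 256).astype(np.float32)
fused = two_scale_fusion(ir, vis)
```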

1,300 citations

Journal ArticleDOI
TL;DR: This paper proposes a novel method to fuse two types of information using a generative adversarial network, termed as FusionGAN, which establishes an adversarial game between a generator and a discriminator, where the generator aims to generate a fused image with major infrared intensities together with additional visible gradients.
Abstract: Infrared images can distinguish targets from their backgrounds on the basis of difference in thermal radiation, which works well at all day/night time and under all weather conditions. By contrast, visible images can provide texture details with high spatial resolution and definition in a manner consistent with the human visual system. This paper proposes a novel method to fuse these two types of information using a generative adversarial network, termed FusionGAN. Our method establishes an adversarial game between a generator and a discriminator, where the generator aims to generate a fused image with major infrared intensities together with additional visible gradients, and the discriminator aims to force the fused image to have more details existing in visible images. This ensures that the final fused image simultaneously keeps the thermal radiation in an infrared image and the textures in a visible image. In addition, our FusionGAN is an end-to-end model, avoiding manually designing complicated activity level measurements and fusion rules as in traditional methods. Experiments on public datasets demonstrate the superiority of our strategy over state-of-the-art methods, where our results look like sharpened infrared images with clear highlighted targets and abundant details. Moreover, we also generalize our FusionGAN to fuse images with different resolutions, say a low-resolution infrared image and a high-resolution visible image. Extensive results demonstrate that our strategy can generate clear and clean fused images which do not suffer from noise caused by upsampling of infrared information.
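To make the adversarial setup concrete, here is a hedged PyTorch sketch of the generator-side objective described above: the fused image is pushed toward the infrared image in intensity and toward the visible image in gradients, plus an adversarial term that rewards fooling the discriminator. The finite-difference gradient operator, the loss weights, and the BCE adversarial form are illustrative assumptions; the paper's exact losses may differ.

```python
import torch
import torch.nn.functional as F

def gradients(img):
    # simple finite-difference gradients in x and y
    dx = img[..., :, 1:] - img[..., :, :-1]
    dy = img[..., 1:, :] - img[..., :-1, :]
    return dx, dy

def generator_content_loss(fused, ir, vis, xi=5.0):
    intensity = F.mse_loss(fused, ir)                      # keep thermal radiation
    fdx, fdy = gradients(fused)
    vdx, vdy = gradients(vis)
    texture = F.mse_loss(fdx, vdx) + F.mse_loss(fdy, vdy)  # keep visible textures
    return intensity + xi * texture

def generator_adversarial_loss(d_scores_on_fused):
    # generator wants the discriminator to label fused images as "visible-like"
    return F.binary_cross_entropy_with_logits(
        d_scores_on_fused, torch.ones_like(d_scores_on_fused))

fused = torch.rand(2, 1, 64, 64, requires_grad=True)
loss = generator_content_loss(fused, torch.rand(2, 1, 64, 64), torch.rand(2, 1, 64, 64))
loss.backward()
```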

853 citations