Journal ArticleDOI

A fractal dimension based framework for night vision fusion

TL;DR: A novel fusion framework is proposed for night-vision applications such as pedestrian recognition, vehicle navigation and surveillance; it is consistently superior to conventional image fusion methods in both visual and quantitative evaluations.
Abstract: In this paper, a novel fusion framework is proposed for night-vision applications such as pedestrian recognition, vehicle navigation and surveillance. The underlying concept is to combine low-light visible and infrared imagery into a single output to enhance visual perception. The proposed framework is computationally simple since it is realized entirely in the spatial domain. The core idea is to obtain an initial fused image by averaging all the source images. The initial fused image is then enhanced by selecting the most salient features, guided by the root mean square error (RMSE) and fractal dimension of the visible and infrared images, to obtain the final fused image. Extensive experiments on different scene imagery demonstrate that the proposed framework is consistently superior to conventional image fusion methods in terms of visual and quantitative evaluations.
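The abstract outlines the whole pipeline: average the registered sources to get an initial fusion, then replace each region with the source judged more salient by its RMSE against that average and its fractal dimension. A minimal NumPy sketch of that reading follows; the block size, the box-counting estimator and the additive selection score are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def box_counting_dimension(block, threshold=None):
    """Estimate the fractal dimension of a 2-D block by box counting.
    A simple binarised box-counting estimator (an assumption; the paper
    may use a different estimator)."""
    if threshold is None:
        threshold = block.mean()
    binary = block > threshold
    sizes, counts = [], []
    s = min(binary.shape) // 2
    while s >= 1:
        count = 0
        for i in range(0, binary.shape[0], s):
            for j in range(0, binary.shape[1], s):
                if binary[i:i + s, j:j + s].any():
                    count += 1
        sizes.append(s)
        counts.append(max(count, 1))
        s //= 2
    if len(sizes) < 2:          # degenerate (tiny) blocks at image borders
        return 0.0
    slope = np.polyfit(np.log(1.0 / np.array(sizes)), np.log(counts), 1)[0]
    return slope

def fuse(visible, infrared, block=16):
    """Block-wise fusion: start from the average image, then overwrite each
    block with the source that is more salient there (higher RMSE w.r.t.
    the average plus higher fractal dimension), a hypothetical rule."""
    vis = visible.astype(np.float64)
    ir = infrared.astype(np.float64)
    fused = 0.5 * (vis + ir)                       # initial fused image
    for i in range(0, vis.shape[0], block):
        for j in range(0, vis.shape[1], block):
            sl = (slice(i, i + block), slice(j, j + block))
            score_v = np.sqrt(np.mean((vis[sl] - fused[sl]) ** 2)) + \
                      box_counting_dimension(vis[sl])
            score_i = np.sqrt(np.mean((ir[sl] - fused[sl]) ** 2)) + \
                      box_counting_dimension(ir[sl])
            fused[sl] = vis[sl] if score_v >= score_i else ir[sl]
    return np.clip(fused, 0, 255).astype(np.uint8)
```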
Citations
Journal ArticleDOI
TL;DR: An attention-guided cross-domain module is devised to achieve sufficient integration of complementary information and global interaction, and an elaborate loss function, consisting of SSIM loss, texture loss, and intensity loss, drives the network to preserve abundant texture details and structural information, as well as presenting optimal apparent intensity.
Abstract: This study proposes a novel general image fusion framework based on cross-domain long-range learning and Swin Transformer, termed SwinFusion. On the one hand, an attention-guided cross-domain module is devised to achieve sufficient integration of complementary information and global interaction. More specifically, the proposed method involves an intra-domain fusion unit based on self-attention and an inter-domain fusion unit based on cross-attention, which mine and integrate long-range dependencies within the same domain and across domains. Through long-range dependency modeling, the network is able to fully implement domain-specific information extraction and cross-domain complementary information integration, as well as maintain the appropriate apparent intensity from a global perspective. In particular, we introduce the shifted windows mechanism into the self-attention and cross-attention, which allows our model to receive images of arbitrary sizes. On the other hand, the multi-scene image fusion problems are generalized to a unified framework with structure maintenance, detail preservation, and proper intensity control. Moreover, an elaborate loss function, consisting of SSIM loss, texture loss, and intensity loss, drives the network to preserve abundant texture details and structural information, as well as present optimal apparent intensity. Extensive experiments on both multi-modal image fusion and digital photography image fusion demonstrate the superiority of our SwinFusion compared to the state-of-the-art unified image fusion algorithms and task-specific alternatives. Implementation code and pre-trained weights can be accessed at https://github.com/Linfeng-Tang/SwinFusion.

112 citations
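The loss the abstract describes (SSIM, texture and intensity terms) can be sketched independently of the Swin Transformer backbone. In the hedged PyTorch illustration below, the texture term compares Sobel gradient magnitudes, the intensity term pulls the fused image towards the element-wise maximum of the sources, and a simplified single-window SSIM stands in for the usual sliding-window version; the exact terms and weights in SwinFusion may differ.

```python
import torch
import torch.nn.functional as F

# Sobel kernels for the texture (gradient) term.
_KX = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
_KY = _KX.transpose(2, 3)

def gradient_magnitude(img):
    """Gradient magnitude via Sobel filtering; img is (N, 1, H, W) in [0, 1]."""
    gx = F.conv2d(img, _KX, padding=1)
    gy = F.conv2d(img, _KY, padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-12)

def global_ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Single-window (global) SSIM, a simplification of the usual
    sliding-window SSIM used in fusion losses."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def fusion_loss(fused, vis, ir, w_ssim=1.0, w_text=10.0, w_int=10.0):
    """SSIM + texture + intensity loss in the spirit of the abstract above
    (the weights and exact terms are assumptions)."""
    loss_ssim = 2.0 - global_ssim(fused, vis) - global_ssim(fused, ir)
    loss_text = F.l1_loss(gradient_magnitude(fused),
                          torch.maximum(gradient_magnitude(vis),
                                        gradient_magnitude(ir)))
    loss_int = F.l1_loss(fused, torch.maximum(vis, ir))
    return w_ssim * loss_ssim + w_text * loss_text + w_int * loss_int
```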

Journal ArticleDOI
TL;DR: In this paper, a momentum-incorporated parallel stochastic gradient descent (MPSGD) algorithm is proposed to accelerate the convergence rate by integrating momentum effects into its training process.
Abstract: A recommender system (RS) relying on latent factor analysis usually adopts stochastic gradient descent (SGD) as its learning algorithm. However, owing to its serial mechanism, an SGD algorithm suffers from low efficiency and scalability when handling large-scale industrial problems. Aiming at addressing this issue, this study proposes a momentum-incorporated parallel stochastic gradient descent (MPSGD) algorithm, whose main idea is two-fold: a) implementing parallelization via a novel data-splitting strategy, and b) accelerating convergence rate by integrating momentum effects into its training process. With it, an MPSGD-based latent factor (MLF) model is achieved, which is capable of performing efficient and high-quality recommendations. Experimental results on four high-dimensional and sparse matrices generated by industrial RS indicate that owing to an MPSGD algorithm, an MLF model outperforms the existing state-of-the-art ones in both computational efficiency and scalability.

108 citations
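The two ingredients of MPSGD named in the abstract, a data-splitting strategy for parallelism and momentum-accelerated SGD updates of the latent factors, can be illustrated compactly. The sketch below processes user-disjoint shards sequentially where parallel workers would run; the splitting rule, hyperparameters and update form are illustrative, not the paper's exact scheme.

```python
import numpy as np

def mpsgd_factorize(ratings, n_users, n_items, k=16, lr=0.01, beta=0.9,
                    reg=0.02, epochs=20, n_shards=4, seed=0):
    """Latent factor model trained with momentum SGD on user-disjoint shards.
    `ratings` is a list of (user, item, value) triples. The shards are
    processed sequentially here; in MPSGD they would run in parallel."""
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((n_users, k))   # user factors
    Q = 0.1 * rng.standard_normal((n_items, k))   # item factors
    vP = np.zeros_like(P)                         # momentum buffers
    vQ = np.zeros_like(Q)

    shards = [[] for _ in range(n_shards)]
    for u, i, r in ratings:                       # user-disjoint data split
        shards[u % n_shards].append((u, i, r))

    for _ in range(epochs):
        for shard in shards:                      # candidates for parallel workers
            for u, i, r in shard:
                err = r - P[u] @ Q[i]
                gP = -err * Q[i] + reg * P[u]
                gQ = -err * P[u] + reg * Q[i]
                vP[u] = beta * vP[u] - lr * gP    # momentum-incorporated update
                vQ[i] = beta * vQ[i] - lr * gQ
                P[u] += vP[u]
                Q[i] += vQ[i]
    return P, Q

# Tiny usage example with synthetic ratings.
data = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (2, 1, 1.0)]
P, Q = mpsgd_factorize(data, n_users=3, n_items=2, k=4)
print(P @ Q.T)
```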

Journal ArticleDOI
Yu Biao Liu, Yu Shi, Fuhao Mu, Quan Cheng, Xun Chen 
TL;DR: A glioma segmentation-oriented multi-modal magnetic resonance (MR) image fusion method using an adversarial learning framework is proposed, which adopts a segmentation network as the discriminator to achieve more meaningful fusion results from the perspective of the segmentation task.
Abstract: Dear Editor, In recent years, multi-modal medical image fusion has received widespread attention in the image processing community. However, existing works on medical image fusion methods are mostly devoted to pursuing high performance on visual perception and objective fusion metrics, while ignoring the specific purpose in clinical applications. In this letter, we propose a glioma segmentation-oriented multi-modal magnetic resonance (MR) image fusion method using an adversarial learning framework, which adopts a segmentation network as the discriminator to achieve more meaningful fusion results from the perspective of the segmentation task. Experimental results demonstrate the advantage of the proposed method over some state-of-the-art medical image fusion methods.

12 citations
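The central idea of the letter, a segmentation network playing the discriminator so that fusion is optimized for the downstream glioma segmentation task, can be pictured as a co-training loop. The toy PyTorch sketch below is one plausible reading under stated assumptions: both networks are placeholders, and the content term and alternating updates are illustrative rather than the letter's actual formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins: a fusion network and a segmentation network acting as the
# "discriminator". Neither is the architecture used in the letter.
fuser = nn.Sequential(nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid())
segmenter = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                          nn.Conv2d(16, 2, 3, padding=1))  # glioma / background

opt_f = torch.optim.Adam(fuser.parameters(), lr=1e-4)
opt_s = torch.optim.Adam(segmenter.parameters(), lr=1e-4)

def train_step(t1, t2, mask):
    """One co-training step: the segmenter learns to segment the fused image,
    and the fuser is pushed towards images the segmenter handles well, plus a
    simple content term (a hypothetical reading of the letter)."""
    fused = fuser(torch.cat([t1, t2], dim=1))
    # Update the segmentation "discriminator" on the (detached) fused image.
    seg_loss = F.cross_entropy(segmenter(fused.detach()), mask)
    opt_s.zero_grad()
    seg_loss.backward()
    opt_s.step()
    # Update the fusion network: content fidelity + segmentation feedback.
    gen_loss = F.l1_loss(fused, torch.maximum(t1, t2)) + \
               F.cross_entropy(segmenter(fused), mask)
    opt_f.zero_grad()
    gen_loss.backward()
    opt_f.step()
    return fused

# Usage with random tensors standing in for registered multi-modal MR slices.
t1, t2 = torch.rand(1, 1, 64, 64), torch.rand(1, 1, 64, 64)
mask = torch.randint(0, 2, (1, 64, 64))
train_step(t1, t2, mask)
```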

Journal ArticleDOI
TL;DR: Extensive experiments conducted on the commonly used pedestrian attribute data sets have demonstrated that the proposed CSVFL approach outperforms multiple recently reported pedestrian gender recognition methods.
Abstract: Pedestrian gender recognition plays an important role in smart cities. To effectively improve pedestrian gender recognition performance, a new method, called cascading scene and viewpoint feature learning (CSVFL), is proposed in this article. The novelty of the proposed CSVFL lies in the joint consideration of two crucial challenges in pedestrian gender recognition, namely, scene and viewpoint variation. For that, the proposed CSVFL starts with the scene transfer (ST) scheme, followed by the viewpoint adaptation (VA) scheme in a cascading manner. Specifically, the ST scheme exploits the key pedestrian segmentation network to extract the key pedestrian masks for the subsequent key pedestrian transfer generative adversarial network, with the goal of encouraging the input pedestrian image to have a similar style to the target scene while preserving the image details of the key pedestrian as much as possible. Afterward, the obtained scene-transferred pedestrian images are fed to train the deep feature learning network with the VA scheme, in which each neuron will be enabled/disabled for different viewpoints depending on whether it contributes to the corresponding viewpoint. Extensive experiments conducted on the commonly used pedestrian attribute data sets have demonstrated that the proposed CSVFL approach outperforms multiple recently reported pedestrian gender recognition methods.

9 citations
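The viewpoint adaptation (VA) scheme, in which each neuron is enabled or disabled depending on whether it contributes to the current viewpoint, can be pictured as a per-viewpoint gating mask over a feature layer. The module below is a speculative stand-in: the learnable gates and the hard threshold at inference are assumptions, not the CSVFL architecture.

```python
import torch
import torch.nn as nn

class ViewpointGatedLayer(nn.Module):
    """Feature layer whose units are switched on/off per viewpoint.
    A hypothetical stand-in for the VA scheme described in the abstract."""
    def __init__(self, in_dim=512, out_dim=256, n_viewpoints=3):
        super().__init__()
        self.fc = nn.Linear(in_dim, out_dim)
        # One learnable gate logit per (viewpoint, unit); sigmoid > 0.5
        # decides whether the unit contributes for that viewpoint.
        self.gate_logits = nn.Parameter(torch.zeros(n_viewpoints, out_dim))

    def forward(self, x, viewpoint):
        feats = torch.relu(self.fc(x))
        gates = torch.sigmoid(self.gate_logits[viewpoint])  # (B, out_dim)
        if not self.training:
            gates = (gates > 0.5).float()                    # hard enable/disable
        return feats * gates

# Usage: a batch of pooled CNN features with a viewpoint label per sample.
layer = ViewpointGatedLayer()
x = torch.randn(8, 512)
viewpoint = torch.randint(0, 3, (8,))
out = layer(x, viewpoint)
print(out.shape)  # torch.Size([8, 256])
```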

References
Journal ArticleDOI
TL;DR: A hierarchical image merging scheme based on a multiresolution contrast decomposition (the ratio of a low-pass pyramid) is introduced; tests on registered thermal and visual images show that the fused images present a more detailed representation of the depicted scene.
Abstract: Integration of images from different sensing modalities can produce information that cannot be obtained by viewing the sensor outputs separately and consecutively. This paper introduces a hierarchical image merging scheme based on a multiresolution contrast decomposition (the ratio of a low-pass pyramid). The composite images produced by this scheme preserve those details from the input images that are most relevant to visual perception. The method is tested by merging parallel registered thermal and visual images. The results show that the fused images present a more detailed representation of the depicted scene. Detection, recognition, and search tasks may therefore benefit from this new image representation.

503 citations
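The ratio-of-low-pass (contrast) pyramid can be sketched with OpenCV's pyramid primitives: each level is the ratio of a Gaussian level to the expanded next-coarser level, fusion keeps, per pixel, the ratio farthest from unity, and reconstruction multiplies back down the pyramid. The level count and selection rule below follow the common description of the method rather than the paper's exact settings.

```python
import cv2
import numpy as np

def contrast_pyramid(img, levels=4, eps=1e-6):
    """Ratio-of-low-pass pyramid: R_k = G_k / expand(G_{k+1})."""
    gauss = [img.astype(np.float32)]
    for _ in range(levels):
        gauss.append(cv2.pyrDown(gauss[-1]))
    ratio = []
    for k in range(levels):
        up = cv2.pyrUp(gauss[k + 1],
                       dstsize=(gauss[k].shape[1], gauss[k].shape[0]))
        ratio.append(gauss[k] / (up + eps))
    return ratio, gauss[levels]          # ratio levels + coarsest Gaussian level

def fuse_contrast_pyramid(vis, ir, levels=4, eps=1e-6):
    rv, base_v = contrast_pyramid(vis, levels, eps)
    ri, base_i = contrast_pyramid(ir, levels, eps)
    base = 0.5 * (base_v + base_i)       # average the coarsest approximation
    fused_ratio = [np.where(np.abs(a - 1.0) >= np.abs(b - 1.0), a, b)
                   for a, b in zip(rv, ri)]   # keep the higher local contrast
    out = base
    for k in range(levels - 1, -1, -1):  # multiply back down the pyramid
        up = cv2.pyrUp(out, dstsize=(fused_ratio[k].shape[1],
                                     fused_ratio[k].shape[0]))
        out = fused_ratio[k] * (up + eps)
    return np.clip(out, 0, 255).astype(np.uint8)
```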

Journal ArticleDOI
TL;DR: Analytical proof that classic mutual information cannot be considered a measure for image fusion performance is provided.
Abstract: The unsuitability of using the classic mutual information measure as a performance measure for image fusion is discussed. Analytical proof that classic mutual information cannot be considered a measure for image fusion performance is provided.

372 citations
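The score under discussion, the classic fusion measure MI(F;A) + MI(F;B) computed from joint histograms, is straightforward to reproduce, which is what makes the criticism concrete: the sum is maximized by a fused image that simply copies one of its inputs. The snippet below is a plain histogram-based estimate, not code from the paper.

```python
import numpy as np

def mutual_information(x, y, bins=64):
    """Histogram estimate of MI(X; Y) in bits for two grayscale images."""
    joint, _, _ = np.histogram2d(x.ravel(), y.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))

def fusion_mi(fused, src_a, src_b):
    """Classic fusion performance score: MI(F;A) + MI(F;B)."""
    return mutual_information(fused, src_a) + mutual_information(fused, src_b)
```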


"A fractal dimension based framework..." refers background in this paper

  • ...Based on the above definition, the quality of fused image can be expressed as [20], [21]...

    [...]

Journal ArticleDOI
TL;DR: The proposed hybrid-MSD transform enables better capture of important multi-scale IR spectral features and separation of fine-scale texture details from large-scale edge features; experiments prove the superiority of the proposed method over conventional MSD-based fusion methods.
Abstract: Multi-scale texture details and edge features are obtained by the hybrid-MSD. IR spectral features are injected into the visible image by an asymmetrical scheme. Different combination algorithms are used adaptively according to different scales. Important perceptual cues from the visible image are well preserved or enhanced. In order to achieve perceptually better fusion of infrared (IR) and visible images than conventional pixel-level fusion algorithms based on multi-scale decomposition (MSD), we present a novel multi-scale fusion method based on a hybrid multi-scale decomposition (hybrid-MSD). The proposed hybrid-MSD transform decomposes the source images into multi-scale texture details and edge features by jointly using multi-scale Gaussian and bilateral filters. This transform enables better capture of important multi-scale IR spectral features and separation of fine-scale texture details from large-scale edge features. As a result, we can use it to achieve better fusion results for human visual perception than those obtained from conventional multi-scale fusion methods, by injecting the multi-scale IR spectral features into the visible image, while preserving (or properly enhancing) important perceptual cues of the background scenery and details from the visible image. In the decomposed information fusion process, three different combination algorithms are adaptively used in accordance with different scale levels (i.e., the small-scale levels, the large-scale levels and the base level). A regularization parameter is introduced to control the relative amount of IR spectral information injected into the visible image in a soft manner, which can be adjusted further depending on user preferences. Moreover, by testing different settings of the parameter, we demonstrate that injecting a moderate amount of IR spectral information with this parameter can actually make the fused images visually better for some infrared and visible source images. Experimental results of both objective assessment and subjective evaluation by human observers also prove the superiority of the proposed method compared with conventional MSD-based fusion methods.

275 citations
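The key move of the hybrid-MSD is to run a Gaussian filter and a bilateral filter side by side at each scale, so that what the bilateral filter removes is fine texture while what only the Gaussian removes is edge structure. The OpenCV sketch below captures that separation; the filter parameters, level count and the fusion rules applied to the resulting layers are assumptions and are not reproduced from the paper.

```python
import cv2
import numpy as np

def hybrid_msd(img, levels=3, sigma=2.0, sigma_color=25.0):
    """Hybrid multi-scale decomposition: per level, a bilateral filter strips
    fine texture (texture_details) while a Gaussian additionally smooths edges
    (edge_details); the Gaussian output becomes the next base layer."""
    base = img.astype(np.float32)
    texture_details, edge_details = [], []
    for j in range(levels):
        s = sigma * (2 ** j)                       # coarser at every level
        blurred = cv2.GaussianBlur(base, (0, 0), s)
        edge_preserved = cv2.bilateralFilter(base, d=-1,
                                             sigmaColor=sigma_color,
                                             sigmaSpace=s)
        texture_details.append(base - edge_preserved)   # fine-scale texture
        edge_details.append(edge_preserved - blurred)   # large-scale edges
        base = blurred
    return texture_details, edge_details, base

# The source image is recovered by summing all layers:
# img ≈ base + sum(texture_details) + sum(edge_details)
```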


"A fractal dimension based framework..." refers methods in this paper

  • ...Index Existing methods Proposed method Max PCA LP [8] CP [8] GP [9] DWT [10] MWT [11] GBF [14] KLT [15]...

    [...]

  • ...For all these images, results of the proposed framework are obtained and are compared with the traditional maximum rule, principal component analysis (PCA) maximum rule, Laplacian pyramid (LP) [8], contrast pyramid (CP) [8], gradient pyramid (GP) [9], discrete wavelet transform (DWT) [10], morphological wavelet transform (MWT) [11], Gaussian-Bilateral filter (GBF) [14] and Karhunen-Loeve transform (KLT) [15] based methods....

    [...]

  • ...Image pair Image Existing methods Proposed method Max PCA LP [8] CP [8] GP [9] DWT [10] MWT [11] GBF [14] KLT [15]...

    [...]

  • ...Image pair Existing methods Proposed method Max PCA LP [8] CP [8] GP [9] DWT [10] MWT [11] GBF [14] KLT [15]...

    [...]

Journal ArticleDOI
TL;DR: This paper proposes a new edge-preserving image fusion method for infrared and visible sensor images; compared with traditional and recent image fusion algorithms, it outperforms the existing methods.
Abstract: Image fusion is a process of generating a more informative image from a set of source images. Major applications of image fusion are in navigation and the military. Here, infrared and visible sensors are used to capture complementary images of the targeted scene. The complementary information of these source images has to be integrated into a single image using some fusion algorithm. The aim of any fusion method is to transfer maximum information from the source images to the fused image with minimum information loss. It has to minimize the artifacts in the fused image. In this paper, we propose a new edge-preserving image fusion method for infrared and visible sensor images. Anisotropic diffusion is used to decompose the source images into approximation and detail layers. Final detail and approximation layers are calculated with the help of the Karhunen-Loeve transform and weighted linear superposition, respectively. A fused image is generated from the linear combination of the final detail and approximation layers. Performance of the proposed algorithm is assessed with the help of Petrovic metrics. The results of the proposed algorithm are compared with traditional and recent image fusion algorithms. Results reveal that the proposed method outperforms the existing methods.

208 citations
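The pipeline in the abstract, anisotropic diffusion for the approximation layers, source-minus-approximation for the detail layers, Karhunen-Loeve (eigenvector) weights for the detail combination and averaging for the approximations, is compact enough to sketch end to end. The Perona-Malik parameters and the 50/50 base weights below are illustrative assumptions.

```python
import numpy as np

def anisotropic_diffusion(img, n_iter=10, kappa=30.0, gamma=0.15):
    """Perona-Malik diffusion: smooths homogeneous regions, keeps edges."""
    u = img.astype(np.float64)
    for _ in range(n_iter):
        dn = np.roll(u, -1, axis=0) - u   # differences to the 4 neighbours
        ds = np.roll(u, 1, axis=0) - u
        de = np.roll(u, -1, axis=1) - u
        dw = np.roll(u, 1, axis=1) - u
        u = u + gamma * sum(np.exp(-(d / kappa) ** 2) * d
                            for d in (dn, ds, de, dw))
    return u

def fuse_ad_klt(vis, ir):
    """Approximation layers by anisotropic diffusion, detail layers weighted
    by the dominant eigenvector of their covariance (KLT), bases averaged."""
    base_v, base_i = anisotropic_diffusion(vis), anisotropic_diffusion(ir)
    det_v, det_i = vis - base_v, ir - base_i
    cov = np.cov(np.vstack([det_v.ravel(), det_i.ravel()]))
    vals, vecs = np.linalg.eigh(cov)
    w = np.abs(vecs[:, np.argmax(vals)])
    w = w / w.sum()                                  # KLT-derived detail weights
    fused_detail = w[0] * det_v + w[1] * det_i
    fused_base = 0.5 * (base_v + base_i)
    return np.clip(fused_base + fused_detail, 0, 255).astype(np.uint8)
```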


"A fractal dimension based framework..." refers background or methods in this paper

  • ...Image pair Existing methods Proposed method Max PCA LP [8] CP [8] GP [9] DWT [10] MWT [11] GBF [14] KLT [15]...

    [...]

  • ...Furthermore, the results of KLT based fusion are free from artifacts, but are not able to give complete information about the scene due to the reduced contrast....

    [...]

  • ...vision system and an image fusion technique results in the images which have a natural appearance and a high degree of similarity with the related natural scene, in addition to the improving a human’s observation ability [11]−[15]....

    [...]

  • ...For all these images, results of the proposed framework are obtained and are compared with the traditional maximum rule, principal component analysis (PCA) maximum rule, Laplacian pyramid (LP) [8], contrast pyramid (CP) [8], gradient pyramid (GP) [9], discrete wavelet transform (DWT) [10], morphological wavelet transform (MWT) [11], Gaussian-Bilateral filter (GBF) [14] and Karhunen-Loeve transform (KLT) [15] based methods....

    [...]

  • ...Image pair Image Existing methods Proposed method Max PCA LP [8] CP [8] GP [9] DWT [10] MWT [11] GBF [14] KLT [15]...

    [...]

Journal ArticleDOI
TL;DR: The method requires a false colour RGB image that is produced by mapping three individual bands of a multiband nightvision system to the respective channels of an RGB image, and the inverse transformation to RGB space yields a nightvision image with a day-time colour appearance.
Abstract: We present a method to give (fused) multiband night-time imagery a natural day-time colour appearance. For input, the method requires a false colour RGB image that is produced by mapping three individual bands (or the first three principal components) of a multiband nightvision system to the respective channels of an RGB image. The false colour RGB nightvision image is transformed into a perceptually decorrelated colour space. In this colour space the first order statistics of a natural colour image (target scene) are transferred to the multiband nightvision image (source scene). To obtain a natural colour representation of the multiband night-time imagery, the compositions of the source and target scenes should resemble each other to some degree. The inverse transformation to RGB space yields a nightvision image with a day-time colour appearance. The luminance contrast of the resulting colour image can be enhanced by replacing its luminance component by a grayscale fused representation of the three input bands.

159 citations
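The colour-transfer step described above, imposing the first-order statistics of a natural daytime target on the false-colour night image in a perceptually decorrelated colour space, is Reinhard-style statistics matching. The sketch below substitutes OpenCV's Lab conversion for the lαβ space used in the paper; that substitution and the per-channel mean/std matching are simplifications.

```python
import cv2
import numpy as np

def natural_colour_mapping(false_colour_night, daytime_target):
    """Give a false-colour (band-mapped RGB) night image a daytime palette
    by matching per-channel mean and std in a decorrelated colour space."""
    src = cv2.cvtColor(false_colour_night, cv2.COLOR_BGR2LAB).astype(np.float32)
    tgt = cv2.cvtColor(daytime_target, cv2.COLOR_BGR2LAB).astype(np.float32)
    for c in range(3):
        s_mean, s_std = src[..., c].mean(), src[..., c].std() + 1e-6
        t_mean, t_std = tgt[..., c].mean(), tgt[..., c].std()
        # Transfer the target scene's first-order statistics to the source.
        src[..., c] = (src[..., c] - s_mean) * (t_std / s_std) + t_mean
    src = np.clip(src, 0, 255).astype(np.uint8)
    return cv2.cvtColor(src, cv2.COLOR_LAB2BGR)
```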


"A fractal dimension based framework..." refers background or methods in this paper

  • ...Image pair Existing methods Proposed method Max PCA LP [8] CP [8] GP [9] DWT [10] MWT [11] GBF [14] KLT [15]...

    [...]

  • ...vision system and an image fusion technique results in the images which have a natural appearance and a high degree of similarity with the related natural scene, in addition to the improving a human’s observation ability [11]−[15]....

    [...]

  • ...For all these images, results of the proposed framework are obtained and are compared with the traditional maximum rule, principal component analysis (PCA) maximum rule, Laplacian pyramid (LP) [8], contrast pyramid (CP) [8], gradient pyramid (GP) [9], discrete wavelet transform (DWT) [10], morphological wavelet transform (MWT) [11], Gaussian-Bilateral filter (GBF) [14] and Karhunen-Loeve transform (KLT) [15] based methods....

    [...]

  • ...Image pair Image Existing methods Proposed method Max PCA LP [8] CP [8] GP [9] DWT [10] MWT [11] GBF [14] KLT [15]...

    [...]

  • ...Index Existing methods Proposed method Max PCA LP [8] CP [8] GP [9] DWT [10] MWT [11] GBF [14] KLT [15]...

    [...]