Journal ArticleDOI

A fractal dimension based framework for night vision fusion

TL;DR: A novel fusion framework is proposed for night-vision applications such as pedestrian recognition, vehicle navigation and surveillance; visual and quantitative evaluations show it to be consistently superior to conventional image fusion methods.
Abstract: In this paper, a novel fusion framework is proposed for night-vision applications such as pedestrian recognition, vehicle navigation and surveillance. The underlying concept is to combine low-light visible and infrared imagery into a single output to enhance visual perception. The proposed framework is computationally simple since it is realized entirely in the spatial domain. The core idea is to obtain an initial fused image by averaging all the source images. The initial fused image is then enhanced by selecting the most salient features, guided by the root mean square error (RMSE) and fractal dimension of the visible and infrared images, to obtain the final fused image. Extensive experiments on different scene imagery demonstrate that the framework is consistently superior to conventional image fusion methods in terms of visual and quantitative evaluations.
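
The averaging-then-enhancement pipeline described above is simple enough to sketch in a few lines. The sketch below is illustrative only: it uses block-wise RMSE against the initial average as the saliency cue and omits the fractal-dimension guidance, and the block size and winner-takes-all rule are assumptions rather than the paper's exact procedure.

```python
import numpy as np

def fuse_night_vision(visible, infrared, block=16):
    """Average-then-enhance fusion (illustrative sketch).

    The initial fused image is the per-pixel mean of the sources; each
    block is then replaced by the source block that deviates most
    (block-wise RMSE) from that average -- a stand-in for the paper's
    RMSE/fractal-dimension saliency guidance.
    """
    vis = visible.astype(np.float64)
    ir = infrared.astype(np.float64)
    fused = (vis + ir) / 2.0                       # initial fused image
    out = fused.copy()
    h, w = fused.shape
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            sl = (slice(y, y + block), slice(x, x + block))
            rmse_v = np.sqrt(np.mean((vis[sl] - fused[sl]) ** 2))
            rmse_i = np.sqrt(np.mean((ir[sl] - fused[sl]) ** 2))
            out[sl] = vis[sl] if rmse_v >= rmse_i else ir[sl]  # salient block wins
    return out
```
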
Citations
Journal ArticleDOI
TL;DR: An attention-guided cross-domain module is devised to achieve sufficient integration of complementary information and global interaction, and an elaborate loss function, consisting of SSIM loss, texture loss, and intensity loss, drives the network to preserve abundant texture details and structural information, as well as presenting optimal apparent intensity.
Abstract: This study proposes a novel general image fusion framework based on cross-domain long-range learning and Swin Transformer, termed as SwinFusion. On the one hand, an attention-guided cross-domain module is devised to achieve sufficient integration of complementary information and global interaction. More specifically, the proposed method involves an intra-domain fusion unit based on self-attention and an inter-domain fusion unit based on cross-attention, which mine and integrate long dependencies within the same domain and across domains. Through long-range dependency modeling, the network is able to fully implement domain-specific information extraction and cross-domain complementary information integration as well as maintaining the appropriate apparent intensity from a global perspective. In particular, we introduce the shifted windows mechanism into the self-attention and cross-attention, which allows our model to receive images with arbitrary sizes. On the other hand, the multi-scene image fusion problems are generalized to a unified framework with structure maintenance, detail preservation, and proper intensity control. Moreover, an elaborate loss function, consisting of SSIM loss, texture loss, and intensity loss, drives the network to preserve abundant texture details and structural information, as well as presenting optimal apparent intensity. Extensive experiments on both multi-modal image fusion and digital photography image fusion demonstrate the superiority of our SwinFusion compared to the state-of-the-art unified image fusion algorithms and task-specific alternatives. Implementation code and pre-trained weights can be accessed at https://github.com/Linfeng-Tang/SwinFusion.

112 citations
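
The three-term loss is the most transferable piece of the design, so a hedged sketch is given below. The Sobel-based texture term, the element-wise-maximum targets, the unit weights and the use of the third-party pytorch_msssim package for the SSIM term are all illustrative assumptions; the authors' exact formulation is in the linked repository.

```python
import torch
import torch.nn.functional as F
from pytorch_msssim import ssim  # third-party SSIM implementation (assumed dependency)

def sobel_grad(img):
    """Per-channel Sobel gradient magnitude, a common texture proxy."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                      device=img.device).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    c = img.shape[1]
    gx = F.conv2d(img, kx.repeat(c, 1, 1, 1), padding=1, groups=c)
    gy = F.conv2d(img, ky.repeat(c, 1, 1, 1), padding=1, groups=c)
    return gx.abs() + gy.abs()

def fusion_loss(fused, src_a, src_b, w_ssim=1.0, w_text=1.0, w_int=1.0):
    """SSIM + texture + intensity loss over (B, C, H, W) tensors in [0, 1]."""
    l_ssim = (1 - ssim(fused, src_a, data_range=1.0)) \
           + (1 - ssim(fused, src_b, data_range=1.0))
    # texture: match the element-wise maximum of the source gradients
    l_text = F.l1_loss(sobel_grad(fused),
                       torch.maximum(sobel_grad(src_a), sobel_grad(src_b)))
    # intensity: match the element-wise maximum of the source intensities
    l_int = F.l1_loss(fused, torch.maximum(src_a, src_b))
    return w_ssim * l_ssim + w_text * l_text + w_int * l_int
```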


Journal ArticleDOI
TL;DR: In this paper, a momentum-incorporated parallel stochastic gradient descent (MPSGD) algorithm is proposed to accelerate the convergence rate by integrating momentum effects into its training process.
Abstract: A recommender system (RS) relying on latent factor analysis usually adopts stochastic gradient descent (SGD) as its learning algorithm. However, owing to its serial mechanism, an SGD algorithm suffers from low efficiency and scalability when handling large-scale industrial problems. Aiming at addressing this issue, this study proposes a momentum-incorporated parallel stochastic gradient descent (MPSGD) algorithm, whose main idea is two-fold: a) implementing parallelization via a novel data-splitting strategy, and b) accelerating convergence rate by integrating momentum effects into its training process. With it, an MPSGD-based latent factor (MLF) model is achieved, which is capable of performing efficient and high-quality recommendations. Experimental results on four high-dimensional and sparse matrices generated by industrial RS indicate that owing to an MPSGD algorithm, an MLF model outperforms the existing state-of-the-art ones in both computational efficiency and scalability.

108 citations
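
To make the momentum idea concrete, here is a minimal serial sketch of a momentum-accelerated latent factor update. All hyperparameters are illustrative assumptions, and the data-splitting that makes MPSGD parallel is only indicated in the docstring.

```python
import numpy as np

def mpsgd_lf(ratings, n_users, n_items, k=16, lr=0.01, beta=0.9,
             reg=0.05, epochs=20, seed=0):
    """Momentum SGD for a latent factor model (illustrative sketch).

    `ratings` is a list of (user, item, value) triples.  In MPSGD the
    triples are additionally split into disjoint blocks so workers can
    update P and Q in parallel without conflicts; this sketch shows
    only the momentum-accelerated update itself.
    """
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((n_users, k))   # user latent factors
    Q = 0.1 * rng.standard_normal((n_items, k))   # item latent factors
    vP, vQ = np.zeros_like(P), np.zeros_like(Q)   # momentum buffers
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - P[u] @ Q[i]
            gP = -err * Q[i] + reg * P[u]         # regularized gradient
            gQ = -err * P[u] + reg * Q[i]
            vP[u] = beta * vP[u] - lr * gP        # momentum accumulation
            vQ[i] = beta * vQ[i] - lr * gQ
            P[u] += vP[u]
            Q[i] += vQ[i]
    return P, Q
```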

Journal ArticleDOI
Yu Biao Liu, Yu Shi, Fuhao Mu, Quan Cheng, Xun Chen 
TL;DR: A glioma segmentation-oriented multi-modal magnetic resonance (MR) image fusion method using an adversarial learning framework, which adopts a segmentation network as the discriminator to achieve more meaningful fusion results from the perspective of the segmentation task.
Abstract: Dear Editor, In recent years, multi-modal medical image fusion has received widespread attention in the image processing community. However, existing works on medical image fusion methods are mostly devoted to pursuing high performance on visual perception and objective fusion metrics, while ignoring the specific purpose in clinical applications. In this letter, we propose a glioma segmentation-oriented multi-modal magnetic resonance (MR) image fusion method using an adversarial learning framework, which adopts a segmentation network as the discriminator to achieve more meaningful fusion results from the perspective of the segmentation task. Experimental results demonstrate the advantage of the proposed method over some state-of-the-art medical image fusion methods.

12 citations
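
The letter's central idea, using a segmentation network as the discriminator, can be sketched as an alternating training step. Everything below is assumption-level illustration: fusion_net and seg_net are placeholder modules, and the L1 content term is a generic stand-in for whatever fidelity loss the authors actually use.

```python
import torch
import torch.nn.functional as F

def train_step(fusion_net, seg_net, opt_fuse, opt_seg,
               mr_a, mr_b, glioma_mask, adv_weight=0.1):
    """One alternating step (sketch): the segmentation network acts as
    the discriminator, pushing the fused image to be segmentable."""
    # 1) update the segmentation "discriminator" on the current fusion
    fused = fusion_net(mr_a, mr_b).detach()
    seg_loss = F.cross_entropy(seg_net(fused), glioma_mask)
    opt_seg.zero_grad()
    seg_loss.backward()
    opt_seg.step()

    # 2) update the fusion network: content fidelity + segmentability
    fused = fusion_net(mr_a, mr_b)
    content = F.l1_loss(fused, torch.maximum(mr_a, mr_b))  # assumed fidelity term
    adv = F.cross_entropy(seg_net(fused), glioma_mask)
    fuse_loss = content + adv_weight * adv
    opt_fuse.zero_grad()
    fuse_loss.backward()
    opt_fuse.step()
    return fuse_loss.item(), seg_loss.item()
```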

Journal ArticleDOI
TL;DR: Extensive experiments conducted on the commonly used pedestrian attribute data sets have demonstrated that the proposed CSVFL approach outperforms multiple recently reported pedestrian gender recognition methods.
Abstract: Pedestrian gender recognition plays an important role in smart cities. To effectively improve pedestrian gender recognition performance, a new method, called cascading scene and viewpoint feature learning (CSVFL), is proposed in this article. The novelty of the proposed CSVFL lies in the joint consideration of two crucial challenges in pedestrian gender recognition, namely, scene and viewpoint variation. For that, the proposed CSVFL starts with the scene transfer (ST) scheme, followed by the viewpoint adaptation (VA) scheme in a cascading manner. Specifically, the ST scheme exploits the key pedestrian segmentation network to extract the key pedestrian masks for the subsequent key pedestrian transfer generative adversarial network, with the goal of encouraging the input pedestrian image to have a similar style to the target scene while preserving the image details of the key pedestrian as much as possible. Afterward, the obtained scene-transferred pedestrian images are fed to train the deep feature learning network with the VA scheme, in which each neuron is enabled/disabled for different viewpoints depending on whether it contributes to the corresponding viewpoint. Extensive experiments conducted on the commonly used pedestrian attribute data sets demonstrate that the proposed CSVFL approach outperforms multiple recently reported pedestrian gender recognition methods.

9 citations
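
The viewpoint adaptation (VA) idea, enabling or disabling each neuron per viewpoint, can be mimicked with a gated layer. The sketch below is hypothetical: the gates are fixed random masks here, whereas the paper derives each neuron's on/off state from its contribution to the corresponding viewpoint.

```python
import torch
import torch.nn as nn

class ViewpointGatedLayer(nn.Module):
    """Hidden layer whose units are switched on/off per viewpoint, so
    only viewpoint-relevant neurons contribute (illustrative sketch)."""

    def __init__(self, in_dim, out_dim, n_viewpoints):
        super().__init__()
        self.fc = nn.Linear(in_dim, out_dim)
        # one 0/1 gate per (viewpoint, neuron); random masks stand in
        # for the contribution-based selection used in the paper
        self.register_buffer(
            "gates", (torch.rand(n_viewpoints, out_dim) > 0.5).float())

    def forward(self, x, viewpoint):
        # viewpoint: integer id (or batch of ids) selecting the gate row
        return torch.relu(self.fc(x)) * self.gates[viewpoint]
```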

References
Journal ArticleDOI
TL;DR: This paper presents a simple yet efficient algorithm for multifocus image fusion, using a multiresolution signal decomposition scheme based on a nonlinear wavelet constructed with morphological operations.
Abstract: This paper presents a simple yet efficient algorithm for multifocus image fusion, using a multiresolution signal decomposition scheme. The decomposition scheme is based on a nonlinear wavelet constructed with morphological operations. The analysis operators are constructed by morphological dilation combined with quadratic downsampling and the synthesis operators are constructed by morphological erosion combined with quadratic upsampling. A performance measure based on image gradients is used to evaluate the results. The proposed scheme has some interesting computational advantages as well.

144 citations


"A fractal dimension based framework..." refers background or methods in this paper

  • ...(table header) Image pair | Existing methods: Max, PCA, LP [8], CP [8], GP [9], DWT [10], MWT [11], GBF [14], KLT [15] | Proposed method...


  • ...Recently, with substantial advances in image fusion technology, it is very easy to combine relevant information from a sequence of images into a single image such that the resulting image will be more informative than any of the input images [6]-[10]....


  • ...For all these images, results of the proposed framework are obtained and are compared with the traditional maximum rule, principal component analysis (PCA) maximum rule, Laplacian pyramid (LP) [8], contrast pyramid (CP) [8], gradient pyramid (GP) [9], discrete wavelet transform (DWT) [10], morphological wavelet transform (MWT) [11], Gaussian-Bilateral filter (GBF) [14] and Karhunen-Loeve transform (KLT) [15] based methods....


  • ...(table header) Image pair | Image | Existing methods: Max, PCA, LP [8], CP [8], GP [9], DWT [10], MWT [11], GBF [14], KLT [15] | Proposed method...


  • ...(table header) Index | Existing methods: Max, PCA, LP [8], CP [8], GP [9], DWT [10], MWT [11], GBF [14], KLT [15] | Proposed method...

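A minimal sketch of the morphological decomposition described above: dilation plus dyadic downsampling as the analysis operator and upsampling plus erosion as the synthesis operator, with a max-selection fusion rule. The 2x2 structuring element and the fusion rule are assumptions for illustration, not the paper's exact operators.

```python
import numpy as np
from scipy.ndimage import grey_dilation, grey_erosion

def morph_synthesis(approx, shape):
    """Synthesis (sketch): dyadic upsampling followed by morphological erosion."""
    up = np.repeat(np.repeat(approx, 2, axis=0), 2, axis=1)[:shape[0], :shape[1]]
    return grey_erosion(up, size=(2, 2))

def morph_analysis(img):
    """One analysis level (sketch): morphological dilation followed by
    dyadic downsampling; the detail band is the reconstruction residual."""
    approx = grey_dilation(img, size=(2, 2))[::2, ::2]
    detail = img - morph_synthesis(approx, img.shape)
    return approx, detail

def fuse_multifocus(img_a, img_b):
    """Fuse two focus settings by max-selecting approximations and details."""
    a1, d1 = morph_analysis(img_a.astype(float))
    a2, d2 = morph_analysis(img_b.astype(float))
    fused_a = np.maximum(a1, a2)                          # salient approximation
    fused_d = np.where(np.abs(d1) >= np.abs(d2), d1, d2)  # salient detail
    return morph_synthesis(fused_a, img_a.shape) + fused_d
```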

Journal ArticleDOI
01 Oct 2005 - Displays
TL;DR: A combined approach for fusing night-time infrared with visible imagery is presented and the final scene has a natural day-time color appearance due to the application of a color transfer technique.
Abstract: A combined approach for fusing night-time infrared with visible imagery is presented in this paper. Night color vision is thus accomplished and the final scene has a natural day-time color appearance. Fusion is based either on non-negative matrix factorization or on a transformation that takes into consideration perceptual attributes. The final obtained color images possess a natural day-time color appearance due to the application of a color transfer technique. In this way inappropriate color mappings are avoided and the overall discrimination capabilities are enhanced. Two different data sets are employed and the experimental results establish the overall method as being efficient, compact and perceptually meaningful.

84 citations
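
The color transfer step can be approximated by global statistics matching in the spirit of Reinhard et al.; the sketch below works directly in RGB for brevity, whereas such techniques usually operate in a decorrelated color space, so treat it as an illustration rather than the paper's method.

```python
import numpy as np

def color_transfer(source, reference):
    """Global color transfer (sketch): shift and scale each channel of
    the fused night image so its statistics match a daytime reference."""
    src = source.astype(np.float64)
    ref = reference.astype(np.float64)
    out = np.empty_like(src)
    for c in range(3):
        s_mu, s_sd = src[..., c].mean(), src[..., c].std() + 1e-8
        r_mu, r_sd = ref[..., c].mean(), ref[..., c].std()
        out[..., c] = (src[..., c] - s_mu) * (r_sd / s_sd) + r_mu
    return np.clip(out, 0, 255).astype(np.uint8)
```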

Journal ArticleDOI
01 Dec 1995 - Fractals
TL;DR: Results on the multifractal analysis of sequences of Choquet capacities are presented, along with the possibility of constructing such capacities with prescribed spectrum; related results concerning the pointwise irregularity of a continuous function at each point are given in the framework of iterated function systems.
Abstract: Some recent advances in the application of fractal tools for studying complex signals are presented. The first part of the paper is devoted to a brief description of the theoretical methods used. These essentially consist of generalizations of previous techniques that allow us to efficiently handle real signals. We present some results dealing with the multifractal analysis of sequences of Choquet capacities, and the possibility of constructing such capacities with prescribed spectrum. Related results concerning the pointwise irregularity of a continuous function at each point are given in the framework of iterated function systems. Finally, some results on a particular stochastic process are sketched: the multifractional Brownian motion, which is a generalization of the classical fractional Brownian motion, where the parameter H is replaced by a function. The second part consists of the description of selected applications of current interest, in the fields of image analysis, speech synthesis and road traffic modeling. In each case we try to show how a fractal approach provides new means to solve specific problems in signal processing, sometimes with greater success than classical methods.

77 citations


"A fractal dimension based framework..." refers methods in this paper

  • ...In the first method, named the blanket method [18], the FD is obtained from a geometric viewpoint assuming the image is in a 3D space, whereas in the second method [19], FD is calculated by a function modeled in the 2D space with clear distinction between...

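For context, the blanket method estimates FD by iteratively growing upper and lower "blanket" surfaces around the image intensity surface; the sketch below follows the standard construction (after Peleg et al.), with an illustrative scale range.

```python
import numpy as np
from scipy.ndimage import grey_dilation, grey_erosion

def blanket_fd(img, max_eps=10):
    """Fractal dimension via the blanket method (illustrative sketch).

    Upper/lower blanket surfaces are grown by iterated dilation and
    erosion; the surface area A(eps) scales as eps^(2 - D), so D is
    recovered from the slope of log A(eps) against log eps."""
    u = img.astype(np.float64).copy()   # upper blanket
    b = u.copy()                        # lower blanket
    areas, epsilons = [], []
    vol_prev = 0.0
    for eps in range(1, max_eps + 1):
        u = np.maximum(u + 1, grey_dilation(u, size=(3, 3)))
        b = np.minimum(b - 1, grey_erosion(b, size=(3, 3)))
        vol = np.sum(u - b)                   # volume between the blankets
        areas.append((vol - vol_prev) / 2.0)  # incremental surface area
        epsilons.append(eps)
        vol_prev = vol
    slope, _ = np.polyfit(np.log(epsilons), np.log(areas), 1)
    return 2.0 - slope
```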

Proceedings ArticleDOI
16 Mar 2008
TL;DR: A simple and fast lookup-table based method derives and applies natural daylight colors to multi-band night-time images; preliminary field trials demonstrate the potential of these systems for applications like surveillance, navigation and target detection.
Abstract: We developed a simple and fast lookup-table based method to derive and apply natural daylight colors to multi-band night-time images. The method deploys an optimal color transformation derived from a set of samples taken from a daytime color reference image. The colors in the resulting colorized multiband night-time images closely resemble the colors in the daytime color reference image. Also, object colors remain invariant under panning operations and are independent of the scene content. Here we describe the implementation of this method in two prototype portable dual band realtime night vision systems. One system provides co-aligned visual and near-infrared bands of two image intensifiers, the other provides co-aligned images from a digital image intensifier and an uncooled longwave infrared microbolometer. The co-aligned images from both systems are further processed by a notebook computer. The color mapping is implemented as a realtime lookup table transform. The resulting colorized video streams can be displayed in realtime on head mounted displays and stored on the hard disk of the notebook computer. Preliminary field trials demonstrate the potential of these systems for applications like surveillance, navigation and target detection.
Keywords: image fusion, false color, natural color, real-time fusion, lookup tables

29 citations
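
The lookup-table transform reduces to indexing a small table built from a registered daytime reference. The sketch below assumes two 8-bit input bands and a coarse 32x32 table; the function names and binning scheme are hypothetical, not taken from the paper.

```python
import numpy as np

def build_color_lut(band1, band2, day_ref, bins=32):
    """Build a (bins x bins) lookup table (sketch): each quantized
    (band1, band2) pair gets the mean daytime reference color observed
    for that pair in a registered training sample."""
    i = (band1.astype(int) * bins) // 256
    j = (band2.astype(int) * bins) // 256
    lut = np.zeros((bins, bins, 3))
    counts = np.zeros((bins, bins, 1))
    np.add.at(lut, (i, j), day_ref.astype(np.float64))
    np.add.at(counts, (i, j), 1.0)
    return lut / np.maximum(counts, 1)

def apply_color_lut(band1, band2, lut):
    """Colorize a new multi-band night image in one indexing operation."""
    bins = lut.shape[0]
    i = (band1.astype(int) * bins) // 256
    j = (band2.astype(int) * bins) // 256
    return lut[i, j].astype(np.uint8)
```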

Journal ArticleDOI
TL;DR: A new algorithm for multispectral image fusion based on principal components is described; it may be implemented readily in hardware for use in night-vision devices as an important aid to surveillance and navigation in total darkness.
Abstract: Fusion of registered images of night scenery that are obtained from cameras tuned to different bandwidths will be a significant component of future night-vision devices. A new algorithm for such multispectral image fusion is described. The algorithm performs gray-scale image fusion using a method based on principal components. The monochrome fused image is then colored by means of a suitable pseudocoloring technique to produce the fused color output image. The approach can easily be used for any number of bandwidths. Examples illustrate the algorithm's use to fuse an intensified low-light visible image with another image obtained from a single forward-looking infrared camera. The algorithm may be implemented readily in hardware for use in night-vision devices as an important aid to surveillance and navigation in total darkness. The applicability of the technology to transportation is also discussed.

20 citations
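
The principal-components fusion rule can be stated compactly: the fusion weights are the normalized components of the leading eigenvector of the two sources' covariance matrix. A sketch of the gray-scale stage, with the pseudocoloring step omitted:

```python
import numpy as np

def pca_fuse(img_a, img_b):
    """PCA-weighted grayscale fusion (sketch): weights come from the
    leading eigenvector of the 2x2 covariance of the source images."""
    a = img_a.astype(np.float64).ravel()
    b = img_b.astype(np.float64).ravel()
    cov = np.cov(np.stack([a, b]))          # 2x2 covariance matrix
    vals, vecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    v = np.abs(vecs[:, -1])                 # leading eigenvector
    w = v / v.sum()                         # normalized fusion weights
    return (w[0] * img_a + w[1] * img_b).astype(img_a.dtype)
```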