Showing papers on "Channel (digital image)" published in 2020


Journal ArticleDOI
Xu Qin1, Zhilin Wang2, Yuanchao Bai1, Xiaodong Xie1, Huizhu Jia1 
03 Apr 2020
TL;DR: Qin et al. propose an end-to-end feature fusion attention network (FFA-Net) to directly restore the haze-free image. Its central component, the Feature Attention (FA) module, combines channel attention with pixel attention, on the grounds that different channel-wise features carry very different information and that haze is distributed unevenly across image pixels.
Abstract: In this paper, we propose an end-to-end feature fusion attention network (FFA-Net) to directly restore the haze-free image. The FFA-Net architecture consists of three key components: 1) A novel Feature Attention (FA) module combines channel attention with a pixel attention mechanism, considering that different channel-wise features contain very different information and that the haze distribution is uneven across image pixels. FA treats different features and pixels unequally, which provides additional flexibility in dealing with different types of information and expands the representational ability of CNNs. 2) A basic block structure consists of Local Residual Learning and Feature Attention; Local Residual Learning allows less important information, such as thin-haze regions or low-frequency content, to be bypassed through multiple local residual connections, letting the main network focus on more effective information. 3) An attention-based feature fusion (FFA) structure combines features from different levels, with feature weights adaptively learned from the Feature Attention (FA) module so that important features receive more weight. This structure also retains the information of shallow layers and passes it into deep layers. The experimental results demonstrate that our proposed FFA-Net surpasses previous state-of-the-art single image dehazing methods by a very large margin both quantitatively and qualitatively, boosting the best published PSNR metric from 30.23 dB to 36.39 dB on the SOTS indoor test dataset. Code has been made available at GitHub.
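The FA module described above stacks a global channel-attention step and a per-pixel attention step. Below is a minimal PyTorch sketch of such a combined channel/pixel attention block; the layer widths, reduction ratio, and class names are illustrative assumptions rather than the authors' released FFA-Net code.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Global average pooling -> two 1x1 convs -> sigmoid channel weights."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.body = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.body(x)          # reweight each channel

class PixelAttention(nn.Module):
    """Two 1x1 convs -> sigmoid map of shape (B, 1, H, W)."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, 1, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.body(x)          # reweight each spatial position

class FeatureAttention(nn.Module):
    """Channel attention followed by pixel attention, in the spirit of an FA block."""
    def __init__(self, channels):
        super().__init__()
        self.ca = ChannelAttention(channels)
        self.pa = PixelAttention(channels)

    def forward(self, x):
        return self.pa(self.ca(x))

# quick shape check
feat = torch.randn(2, 64, 32, 32)
print(FeatureAttention(64)(feat).shape)   # torch.Size([2, 64, 32, 32])
```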

382 citations


Book ChapterDOI
23 Aug 2020
TL;DR: MIRNet, as presented in this paper, is built around a multi-scale residual block containing several key elements: (a) parallel multi-resolution convolution streams for extracting multi-scale features, (b) information exchange across the multi-resolution streams, (c) spatial and channel attention mechanisms for capturing contextual information, and (d) attention-based multi-scale feature aggregation.
Abstract: With the goal of recovering high-quality image content from its degraded version, image restoration enjoys numerous applications, such as in surveillance, computational photography and medical imaging. Recently, convolutional neural networks (CNNs) have achieved dramatic improvements over conventional approaches for image restoration tasks. Existing CNN-based methods typically operate either on full-resolution or on progressively low-resolution representations. In the former case, spatially precise but contextually less robust results are achieved, while in the latter case, semantically reliable but spatially less accurate outputs are generated. In this paper, we present an architecture with the collective goals of maintaining spatially-precise high-resolution representations through the entire network and receiving strong contextual information from the low-resolution representations. The core of our approach is a multi-scale residual block containing several key elements: (a) parallel multi-resolution convolution streams for extracting multi-scale features, (b) information exchange across the multi-resolution streams, (c) spatial and channel attention mechanisms for capturing contextual information, and (d) attention-based multi-scale feature aggregation. In a nutshell, our approach learns an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details. Extensive experiments on five real image benchmark datasets demonstrate that our method, named MIRNet, achieves state-of-the-art results for image denoising, super-resolution, and image enhancement. The source code and pre-trained models are available at https://github.com/swz30/MIRNet.
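Element (d), attention-based aggregation of features arriving from parallel resolution streams, can be sketched roughly as follows in PyTorch. This is a simplified, selective-kernel-style fusion written under assumed shapes and layer sizes, not the released MIRNet implementation.

```python
import torch
import torch.nn as nn

class AttentiveFusion(nn.Module):
    """Fuse N same-shape feature streams with softmax channel weights per stream."""
    def __init__(self, channels, n_streams, reduction=8):
        super().__init__()
        hidden = max(channels // reduction, 4)
        self.squeeze = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, hidden, 1),
            nn.ReLU(inplace=True),
        )
        # one attention branch per stream
        self.expand = nn.ModuleList(
            [nn.Conv2d(hidden, channels, 1) for _ in range(n_streams)]
        )

    def forward(self, streams):
        fused = sum(streams)                       # (B, C, H, W)
        descriptor = self.squeeze(fused)           # (B, hidden, 1, 1)
        logits = torch.stack([e(descriptor) for e in self.expand], dim=0)
        weights = torch.softmax(logits, dim=0)     # softmax across streams
        return sum(w * s for w, s in zip(weights, streams))

streams = [torch.randn(2, 64, 32, 32) for _ in range(3)]
print(AttentiveFusion(64, 3)(streams).shape)       # torch.Size([2, 64, 32, 32])
```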

357 citations


Journal ArticleDOI
TL;DR: A novel method is proposed for infrared and visible image fusion in which a nest connection-based network and spatial/channel attention models are developed to describe the importance of each spatial position and of each channel in the deep features.
Abstract: In this article, we propose a novel method for infrared and visible image fusion where we develop a nest connection-based network and spatial/channel attention models. The nest connection-based network can preserve significant amounts of information from input data in a multiscale perspective. The approach comprises three key elements: encoder, fusion strategy, and decoder. In our proposed fusion strategy, spatial attention models and channel attention models are developed that describe the importance of each spatial position and of each channel of the deep features. First, the source images are fed into the encoder to extract multiscale deep features. The novel fusion strategy is then developed to fuse these features for each scale. Finally, the fused image is reconstructed by the nest connection-based decoder. Experiments are performed on publicly available data sets. The results show that our proposed approach has better fusion performance than other state-of-the-art methods. This claim is justified through both subjective and objective evaluations. The code of our fusion method is available at https://github.com/hli1221/imagefusion-nestfuse.
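One simple way to realize a spatial-attention fusion of two deep feature maps (e.g., from the infrared and visible branches) is to weight them by their normalized channel-wise activity, as in the hedged sketch below; the exact attention definition in the paper may differ.

```python
import torch

def spatial_attention_fuse(feat_a, feat_b, eps=1e-8):
    """Fuse two same-shape deep feature maps by normalized spatial activity
    (channel-wise l1 norm) -- one possible reading of a spatial-attention
    fusion strategy, not the paper's exact formulation."""
    act_a = feat_a.abs().sum(dim=1, keepdim=True)      # (B, 1, H, W)
    act_b = feat_b.abs().sum(dim=1, keepdim=True)
    w_a = act_a / (act_a + act_b + eps)                # per-pixel weight for feat_a
    return w_a * feat_a + (1.0 - w_a) * feat_b

ir_feat, vis_feat = torch.randn(1, 64, 40, 40), torch.randn(1, 64, 40, 40)
print(spatial_attention_fuse(ir_feat, vis_feat).shape)   # torch.Size([1, 64, 40, 40])
```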

235 citations


Journal ArticleDOI
TL;DR: A channel-wise and spatial feature modulation (CSFM) network in which a series of feature modulation memory (FMM) modules are cascaded with a densely connected structure to transform shallow features into highly informative features and to maintain long-term information for image super-resolution.
Abstract: The performance of single image super-resolution has achieved significant improvement by utilizing deep convolutional neural networks (CNNs). The features in a deep CNN contain different types of information which make different contributions to image reconstruction. However, most CNN-based models lack discriminative ability for different types of information and deal with them equally, which results in the representational capacity of the models being limited. On the other hand, as the depth of the network grows, the long-term information coming from preceding layers is easily weakened or lost at later layers, which is adverse to image super-resolution. To capture more informative features and maintain long-term information for image super-resolution, we propose a channel-wise and spatial feature modulation (CSFM) network in which a series of feature modulation memory (FMM) modules are cascaded with a densely connected structure to transform shallow features into highly informative features. In each FMM module, we construct a set of channel-wise and spatial attention residual (CSAR) blocks and stack them in a chain structure to dynamically modulate the multi-level features in global and local manners. This feature modulation strategy enables the valuable information to be enhanced and the redundant information to be suppressed. Meanwhile, for long-term information persistence, a gated fusion (GF) node is attached at the end of the FMM module to adaptively fuse hierarchical features and distill more effective information via the dense skip connections and the gating mechanism. The extensive quantitative and qualitative evaluations on benchmark datasets illustrate the superiority of our proposed method over the state-of-the-art methods.
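The gated fusion (GF) node described above can be approximated by concatenating the hierarchical features and predicting per-channel gates before compression. The PyTorch sketch below is a plausible reading under assumed layer sizes, not the authors' CSFM code.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Adaptively fuse hierarchical features gathered by dense skips:
    concatenate, predict per-channel gates, apply them, then compress."""
    def __init__(self, channels, n_inputs):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(channels * n_inputs, channels * n_inputs, 1),
            nn.Sigmoid(),
        )
        self.compress = nn.Conv2d(channels * n_inputs, channels, 1)

    def forward(self, features):
        cat = torch.cat(features, dim=1)            # stack hierarchical features
        return self.compress(cat * self.gate(cat))  # gate, then distill

feats = [torch.randn(1, 64, 48, 48) for _ in range(4)]
print(GatedFusion(64, 4)(feats).shape)   # torch.Size([1, 64, 48, 48])
```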

228 citations


Proceedings ArticleDOI
14 Jun 2020
TL;DR: The authors propose a domain adaptation paradigm consisting of an image translation module and two image dehazing modules, which bridges the gap between the synthetic and real domains by translating images from one domain to the other.
Abstract: Image dehazing using learning-based methods has achieved state-of-the-art performance in recent years. However, most existing methods train a dehazing model on synthetic hazy images, which generalizes poorly to real hazy images due to domain shift. To address this issue, we propose a domain adaptation paradigm, which consists of an image translation module and two image dehazing modules. Specifically, we first apply a bidirectional translation network to bridge the gap between the synthetic and real domains by translating images from one domain to another. Then, we use images before and after translation to train the proposed two image dehazing networks with a consistency constraint. In this phase, we incorporate the real hazy image into the dehazing training by exploiting the properties of the clear image (e.g., dark channel prior and image gradient smoothing) to further improve the domain adaptivity. By training the image translation and dehazing networks in an end-to-end manner, we can obtain better effects of both image translation and dehazing. Experimental results on both synthetic and real-world images demonstrate that our model performs favorably against the state-of-the-art dehazing algorithms.

176 citations


Proceedings ArticleDOI
14 Jun 2020
TL;DR: The authors employ multiple latent codes to generate multiple feature maps at an intermediate layer of the generator, then compose them with adaptive channel importance to recover the input image.
Abstract: Despite the success of Generative Adversarial Networks (GANs) in image synthesis, applying trained GAN models to real image processing remains challenging. Previous methods typically invert a target image back to the latent space either by back-propagation or by learning an additional encoder. However, the reconstructions from both methods are far from ideal. In this work, we propose a novel approach, called mGANprior, to incorporate well-trained GANs as an effective prior for a variety of image processing tasks. In particular, we employ multiple latent codes to generate multiple feature maps at some intermediate layer of the generator, then compose them with adaptive channel importance to recover the input image. Such an over-parameterization of the latent space significantly improves the image reconstruction quality, outperforming existing competitors. The resulting high-fidelity image reconstruction enables trained GAN models to serve as a prior for many real-world applications, such as image colorization, super-resolution, image inpainting, and semantic manipulation. We further analyze the properties of the layer-wise representation learned by GAN models and shed light on what knowledge each layer is capable of representing.
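The core composition step, combining the feature maps produced by several latent codes with adaptive channel importance, reduces to a weighted sum over codes. A minimal PyTorch sketch, with assumed shapes and a hypothetical helper name, is given below.

```python
import torch

def compose_features(feature_maps, channel_importance):
    """Compose N intermediate feature maps (each (B, C, H, W)) produced by N
    latent codes, weighting each code's channels by a learned importance
    vector of shape (N, C), then summing over the codes."""
    stacked = torch.stack(feature_maps, dim=0)                        # (N, B, C, H, W)
    weights = channel_importance.view(-1, 1, channel_importance.shape[1], 1, 1)
    return (stacked * weights).sum(dim=0)                             # (B, C, H, W)

n_codes, channels = 5, 512
maps = [torch.randn(1, channels, 16, 16) for _ in range(n_codes)]
alpha = torch.rand(n_codes, channels, requires_grad=True)  # optimized jointly with the codes
print(compose_features(maps, alpha).shape)                  # torch.Size([1, 512, 16, 16])
```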

162 citations


Book ChapterDOI
Hongyu Liu1, Bin Jiang1, Yibing Song2, Wei Huang1, Chao Yang1 
23 Aug 2020
TL;DR: Liu et al. propose a mutual encoder-decoder CNN for the joint recovery of both structures and textures, addressing the limitation of prior two-encoder designs in which the CNN features of each encoder capture either missing structures or textures without considering them as a whole.
Abstract: Deep encoder-decoder based CNNs have advanced image inpainting methods for hole filling. While existing methods recover structures and textures step-by-step in the hole regions, they typically use two encoder-decoders for separate recovery. The CNN features of each encoder are learned to capture either missing structures or textures without considering them as a whole. The insufficient utilization of these encoder features hampers the performance of recovering both structures and textures. In this paper, we propose a mutual encoder-decoder CNN for joint recovery of both. We use CNN features from the deep and shallow layers of the encoder to represent structures and textures of an input image, respectively. The deep layer features are sent to a structure branch, while the shallow layer features are sent to a texture branch. In each branch, we fill holes in multiple scales of the CNN features. The filled CNN features from both branches are concatenated and then equalized. During feature equalization, we first reweight the channel attention and then propose a bilateral propagation activation function to enable spatial equalization. In this way, the filled CNN features of structure and texture mutually benefit each other to represent image content at all feature levels. We then use the equalized feature to supplement decoder features for output image generation through skip connections. Experiments on benchmark datasets show that the proposed method is effective in recovering structures and textures and performs favorably against state-of-the-art approaches.

153 citations


Posted Content
TL;DR: This work proposes a domain adaptation paradigm, which consists of an image translation module and two image dehazing modules, and applies a bidirectional translation network to bridge the gap between the synthetic and real domains by translating images from one domain to another.
Abstract: Image dehazing using learning-based methods has achieved state-of-the-art performance in recent years. However, most existing methods train a dehazing model on synthetic hazy images, which generalizes poorly to real hazy images due to domain shift. To address this issue, we propose a domain adaptation paradigm, which consists of an image translation module and two image dehazing modules. Specifically, we first apply a bidirectional translation network to bridge the gap between the synthetic and real domains by translating images from one domain to another. Then, we use images before and after translation to train the proposed two image dehazing networks with a consistency constraint. In this phase, we incorporate the real hazy image into the dehazing training by exploiting the properties of the clear image (e.g., dark channel prior and image gradient smoothing) to further improve the domain adaptivity. By training the image translation and dehazing networks in an end-to-end manner, we can obtain better effects of both image translation and dehazing. Experimental results on both synthetic and real-world images demonstrate that our model performs favorably against the state-of-the-art dehazing algorithms.

129 citations


Journal ArticleDOI
Xiaoqin Zhang1, Tao Wang1, Jinxin Wang1, Guiying Tang1, Li Zhao1 
TL;DR: Experimental results demonstrate that the proposed Pyramid Channel-based Feature Attention Network (PCFAN) outperforms existing state-of-the-art algorithms on standard benchmark datasets in terms of accuracy, efficiency, and visual effect.

119 citations


Journal ArticleDOI
TL;DR: In this paper, the authors propose a completely unsupervised method of training via minimization of the well-known Dark Channel Prior (DCP) energy function; the resulting network can be regarded as a fast approximator of the DCP.
Abstract: Single image dehazing is a critical stage in many modern-day autonomous vision applications. Early prior-based methods often involved a time-consuming minimization of a hand-crafted energy function. Recent learning-based approaches utilize the representational power of deep neural networks (DNNs) to learn the underlying transformation between hazy and clear images. Due to inherent limitations in collecting matching clear and hazy images, these methods resort to training on synthetic data, constructed from indoor images and corresponding depth information. This may result in a possible domain shift when treating outdoor scenes. We propose a completely unsupervised method of training via minimization of the well-known, Dark Channel Prior (DCP) energy function. Instead of feeding the network with synthetic data, we solely use real-world outdoor images and tune the network’s parameters by directly minimizing the DCP. Although our “Deep DCP” technique can be regarded as a fast approximator of DCP, it actually improves its results significantly. This suggests an additional regularization obtained via the network and learning process. Experiments show that our method performs on par with large-scale supervised methods.
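For reference, the dark channel operator that such an unsupervised loss is built around is the per-pixel RGB minimum followed by a local minimum filter. The PyTorch sketch below shows that operator and a heavily simplified penalty; the paper's actual DCP energy function is more involved.

```python
import torch
import torch.nn.functional as F

def dark_channel(img, patch_size=15):
    """Standard dark channel: per-pixel minimum over RGB, then a local
    minimum filter over a patch (implemented as -maxpool of the negation).
    img: (B, 3, H, W) in [0, 1]."""
    min_rgb, _ = img.min(dim=1, keepdim=True)                 # (B, 1, H, W)
    pad = patch_size // 2
    return -F.max_pool2d(-min_rgb, patch_size, stride=1, padding=pad)

# A haze-free prediction should have a dark channel close to zero, so a
# simple (heavily simplified) unsupervised penalty could be its mean value.
pred = torch.rand(2, 3, 64, 64)
dcp_loss = dark_channel(pred).mean()
print(float(dcp_loss))
```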

113 citations


Journal ArticleDOI
TL;DR: Extensive evaluations for color image denoising and inpainting tasks verify that LRQA achieves better performance over several state-of-the-art sparse representation and LRMA-based methods in terms of both quantitative metrics and visual quality.
Abstract: Low-rank matrix approximation (LRMA)-based methods have made a great success for grayscale image processing. When handling color images, LRMA either restores each color channel independently using the monochromatic model or processes the concatenation of three color channels using the concatenation model. However, these two schemes may not make full use of the high correlation among RGB channels. To address this issue, we propose a novel low-rank quaternion approximation (LRQA) model. It contains two major components: first, instead of modeling a color image pixel as a scalar in conventional sparse representation and LRMA-based methods, the color image is encoded as a pure quaternion matrix, such that the cross-channel correlation of color channels can be well exploited; second, LRQA imposes the low-rank constraint on the constructed quaternion matrix. To better estimate the singular values of the underlying low-rank quaternion matrix from its noisy observation, a general model for LRQA is proposed based on several nonconvex functions. Extensive evaluations for color image denoising and inpainting tasks verify that LRQA achieves better performance over several state-of-the-art sparse representation and LRMA-based methods in terms of both quantitative metrics and visual quality.
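The quaternion encoding step, representing an RGB pixel as a pure quaternion q = 0 + R·i + G·j + B·k, can be sketched in NumPy as below; the array layout and helper names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def rgb_to_pure_quaternion(img):
    """Encode an RGB image (H, W, 3) in [0, 1] as a pure quaternion matrix
    q(x, y) = 0 + R*i + G*j + B*k, stored as an (H, W, 4) array
    [real, i, j, k] with zero real part."""
    h, w, _ = img.shape
    q = np.zeros((h, w, 4), dtype=np.float64)
    q[..., 1:] = img                      # imaginary parts carry R, G, B
    return q

def quaternion_modulus(q):
    """Pixel-wise modulus |q| = sqrt(a^2 + b^2 + c^2 + d^2)."""
    return np.sqrt((q ** 2).sum(axis=-1))

img = np.random.rand(8, 8, 3)
q = rgb_to_pure_quaternion(img)
print(q.shape, quaternion_modulus(q).shape)   # (8, 8, 4) (8, 8)
```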

Proceedings ArticleDOI
14 Jun 2020
TL;DR: A generally applicable transformation unit for visual recognition with deep convolutional neural networks that explicitly models channel relationships with explainable control variables and can be applied at the operator level without a significant increase in parameters.
Abstract: In this work, we propose a generally applicable transformation unit for visual recognition with deep convolutional neural networks. This transformation explicitly models channel relationships with explainable control variables. These variables determine the neuron behaviors of competition or cooperation, and they are jointly optimized with the convolutional weights towards more accurate recognition. In Squeeze-and-Excitation (SE) Networks, the channel relationships are implicitly learned by fully connected layers, and the SE block is integrated at the block level. We instead introduce a channel normalization layer to reduce the number of parameters and computational complexity. This lightweight layer incorporates a simple l2 normalization, enabling our transformation unit to be applied at the operator level without a significant increase in parameters. Extensive experiments demonstrate the effectiveness of our unit with clear margins on many vision tasks, i.e., image classification on ImageNet, object detection and instance segmentation on COCO, video classification on Kinetics.
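A channel normalization layer of the kind described, a simple l2 normalization across channels followed by a learnable rescaling, could look roughly like the PyTorch sketch below; this is one plausible reading, not the authors' released unit.

```python
import torch
import torch.nn as nn

class ChannelNorm(nn.Module):
    """l2-normalize each spatial position across channels, then rescale with
    learnable per-channel gain and bias (a lightweight, operator-level unit)."""
    def __init__(self, channels, eps=1e-6):
        super().__init__()
        self.gain = nn.Parameter(torch.ones(1, channels, 1, 1))
        self.bias = nn.Parameter(torch.zeros(1, channels, 1, 1))
        self.eps = eps

    def forward(self, x):
        norm = x.pow(2).sum(dim=1, keepdim=True).sqrt()        # (B, 1, H, W)
        return self.gain * (x / (norm + self.eps)) + self.bias

x = torch.randn(2, 32, 16, 16)
print(ChannelNorm(32)(x).shape)   # torch.Size([2, 32, 16, 16])
```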

Proceedings ArticleDOI
Huanjing Yue1, Cao Cong1, Lei Liao1, Ronghe Chu1, Jingyu Yang1 
14 Jun 2020
TL;DR: Yue et al. propose a raw video denoising network (RViDeNet) that explores the temporal, spatial, and channel correlations of video frames to generate clean frames for dynamic scenes.
Abstract: In recent years, the supervised learning strategy for real noisy image denoising has been emerging and has achieved promising results. In contrast, realistic noise removal for raw noisy videos is rarely studied due to the lack of noisy-clean pairs for dynamic scenes. Clean video frames for dynamic scenes cannot be captured with a long-exposure shutter or by averaging multiple shots as was done for static images. In this paper, we solve this problem by creating motions for controllable objects, such as toys, and capturing each static moment multiple times to generate clean video frames. In this way, we construct a dataset with 55 groups of noisy-clean videos with ISO values ranging from 1600 to 25600. To our knowledge, this is the first dynamic video dataset with noisy-clean pairs. Correspondingly, we propose a raw video denoising network (RViDeNet) by exploring the temporal, spatial, and channel correlations of video frames. Since the raw video has Bayer patterns, we pack it into four sub-sequences, i.e., RGBG sequences, which are denoised by the proposed RViDeNet separately and finally fused into a clean video. In addition, our network not only outputs a raw denoising result, but also the sRGB result by going through an image signal processing (ISP) module, which enables users to generate the sRGB result with their favourite ISPs. Experimental results demonstrate that our method outperforms state-of-the-art video and raw image denoising algorithms on both indoor and outdoor videos.
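The Bayer packing step, splitting a raw frame into four half-resolution color planes, is straightforward; the NumPy sketch below assumes an RGGB layout, which may differ from the sensor pattern used in the paper.

```python
import numpy as np

def pack_bayer(raw):
    """Pack a single Bayer raw frame (H, W) into four half-resolution planes,
    assuming an RGGB layout: R at (0,0), G at (0,1) and (1,0), B at (1,1).
    Returns an array of shape (4, H//2, W//2)."""
    return np.stack([
        raw[0::2, 0::2],   # R
        raw[0::2, 1::2],   # G1
        raw[1::2, 0::2],   # G2
        raw[1::2, 1::2],   # B
    ])

raw_frame = np.random.randint(0, 2**12, size=(480, 640), dtype=np.uint16)
print(pack_bayer(raw_frame).shape)   # (4, 240, 320)
```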

Journal ArticleDOI
TL;DR: Overall, this research contributes to developing SCHNet to integrate spatial and channel information in feature extraction, resulting in a more accurate and efficient crack detection process.

Journal ArticleDOI
TL;DR: A deep learning-based method using single-channel electroencephalogram (EEG) is developed that automatically exploits the time–frequency spectrum of the EEG signal, removing the need for manual feature extraction.

Journal ArticleDOI
TL;DR: This paper improves the model with a multi-level information fusion convolution scheme that recovers discarded feature information, thereby improving the image recognition rate.
Abstract: With continuous technological progress, the rise of social media networks has brought explosive growth of image data. Images are widely used as a carrier of everyday communication because of their rich content and intuitive nature. Image recognition based on convolutional neural networks was among the first applications in this field: a series of operations such as feature extraction, convolution, and classification are used to identify and analyze different images. The rapid development of artificial intelligence has made machine learning, which uses algorithms to learn from data and predict outcomes, increasingly important. In machine vision, image recognition is fundamental, and the key problem is how to associate the low-level information in an image with high-level image semantics. Many earlier model algorithms laid a solid foundation for the development of artificial intelligence and image recognition. The multi-level information fusion model proposed here is an improvement on the VGG16 architecture. Unlike a fully connected network, a convolutional neural network does not connect every neuron in each layer but uses local connections; although this reduces computation time, the model can lose useful feature information during propagation and calculation. This paper therefore improves the model into a multi-level information fusion convolution scheme that recovers the discarded feature information and improves the image recognition rate. VGG divides the network into five groups (mimicking the five layers of AlexNet), but it uses 3×3 filters and combines them into convolution sequences, yielding a deeper DCNN with more channels. The recognition rate of the model was verified on the ORL Face Database, the BioID Face Database, and the CASIA Face Image Database.

Journal ArticleDOI
TL;DR: The proposed approach, called Color Channel Compensation (3C), overcomes artifacts resulting from the severely non-uniform color spectrum distribution encountered in images captured under hazy night-time conditions, underwater, or under non-uniform artificial illumination, and is shown to consistently improve the outcome of conventional restoration methods.
Abstract: This article introduces a novel solution to improve image enhancement in terms of color appearance. Our approach, called Color Channel Compensation (3C), overcomes artifacts resulting from the severely non-uniform color spectrum distribution encountered in images captured under hazy night-time conditions, underwater, or under non-uniform artificial illumination. Our solution is founded on the observation that, under such adverse conditions, the information contained in at least one color channel is close to completely lost, making the traditional enhancing techniques subject to noise and color shifting. In those cases, our pre-processing method proposes to reconstruct the lost channel based on the opponent color channel. Our algorithm subtracts a local mean from each opponent color pixel. Thereby, it partly recovers the lost color from the two colors (red-green or blue-yellow) involved in the opponent color channel. The proposed approach, whilst simple, is shown to consistently improve the outcome of conventional restoration methods. To prove the utility of our 3C operator, we provide an extensive qualitative and quantitative evaluation for white balancing, image dehazing, and underwater enhancement applications.
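One hedged reading of opponent-channel compensation, transferring the locally mean-subtracted detail of the opponent channel into the nearly lost channel, is sketched below in NumPy/SciPy; the window size and gain are guesses, and the exact 3C operator in the paper may differ.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def compensate_channel(lost, opponent, alpha=1.0, window=31):
    """Partly reconstruct a nearly-lost channel from its opponent channel by
    transferring the locally mean-subtracted opponent detail, in the spirit of
    the opponent-channel compensation described above (constants are guesses).
    Both inputs are (H, W) arrays in [0, 1]."""
    local_mean = uniform_filter(opponent, size=window)
    return np.clip(lost + alpha * (opponent - local_mean), 0.0, 1.0)

red = np.random.rand(64, 64) * 0.05      # e.g. a severely attenuated red channel
green = np.random.rand(64, 64)
print(compensate_channel(red, green).shape)   # (64, 64)
```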

Journal ArticleDOI
TL;DR: Experimental results demonstrate that the proposed method outperforms some of the state-of-the-art methods in terms of both visual quality and objective assessment.
Abstract: Pulse coupled neural network (PCNN) is widely used in image fusion framework due to its global coupling and pulse synchronization of neurons. However, its manual setting of parameters and inability to process multiple images affect the fusion performance. In this letter, a novel weighted parameter adaptive dual channel PCNN (WPADCPCNN) based medical fusion method is proposed in non-subsampled shearlet transform domain to fuse the magnetic resonance imaging and single-photon emission computed tomography images of AIDS dementia complex and Alzheimer's disease patients. The parameters of the proposed WPADCPCNN model are estimated from its inputs using fractal dimension. The high-pass sub-bands are fused using the WPADCPCNN model whereas the low-pass sub-bands are merged using a new weighted multi-scale morphological gradients based rule. Experimental results demonstrate that the proposed method outperforms some of the state-of-the-art methods in terms of both visual quality and objective assessment.

Journal ArticleDOI
TL;DR: This letter proposes an unsupervised learning approach for single low-light image enhancement using the bright channel prior (BCP), which assumes that the brightest pixel in a small patch is likely to be close to 1.
Abstract: Recent approaches for low-light image enhancement achieve excellent performance through supervised learning based on convolutional neural networks. However, it is still challenging to collect a large amount of low-/normal-light image pairs in real environments for training the networks. In this letter, we propose an unsupervised learning approach for single low-light image enhancement using the bright channel prior (BCP) that the brightest pixel in a small patch is likely to be close to 1. An unsupervised loss function is defined with the pseudo ground-truth generated using the BCP. An enhancement network, consisting of a simple encoder-decoder, is then trained using the unsupervised loss function. To the best of our knowledge, this is the first attempt that enhances a low-light image through unsupervised learning. Furthermore, we introduce saturation loss and self-attention map for preserving image details and naturalness in the enhanced result. The performance of the proposed method is validated on various public datasets. Experimental results demonstrate that the proposed unsupervised approach achieves competitive performance over state-of-the-art methods based on supervised learning.
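The bright channel itself is the per-pixel RGB maximum followed by a local maximum filter. The NumPy/SciPy sketch below computes it and a toy gain map; the paper's actual pseudo ground truth and loss are more elaborate.

```python
import numpy as np
from scipy.ndimage import maximum_filter

def bright_channel(img, patch_size=15):
    """Bright channel prior: per-pixel maximum over RGB followed by a local
    maximum filter; in a well-exposed image this map is assumed close to 1.
    img: (H, W, 3) in [0, 1]."""
    max_rgb = img.max(axis=2)
    return maximum_filter(max_rgb, size=patch_size)

low_light = np.random.rand(64, 64, 3) * 0.3
bc = bright_channel(low_light)
# A simple pseudo-target could scale the image so the bright channel moves toward 1
# (only an illustration; the paper's actual pseudo ground truth is more involved).
pseudo_gain = 1.0 / np.maximum(bc, 1e-3)
print(bc.shape, pseudo_gain.mean() > 1.0)
```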

Journal ArticleDOI
TL;DR: This paper first designs multiscale convolutions to extract contextual features at different scales for HSIs and then proposes to employ an octave 3D CNN, which factorizes the mixed feature maps by their frequency, to replace the normal 3D CNN in order to reduce spatial redundancy and enlarge the receptive field.
Abstract: 3D convolutional neural networks (CNNs) have been demonstrated to be a powerful tool in hyperspectral images (HSIs) classification. However, using the conventional 3D CNNs to extract the spectral–spatial feature for HSIs results in too many parameters as HSIs have plenty of spatial redundancy. To address this issue, in this paper, we first design multiscale convolution to extract the contextual feature of different scales for HSIs and then propose to employ the octave 3D CNN which factorizes the mixed feature maps by their frequency to replace the normal 3D CNN in order to reduce the spatial redundancy and enlarge the receptive field. To further explore the discriminative features, a channel attention module and a spatial attention module are adopted to optimize the feature maps and improve the classification performance. The experiments on four hyperspectral image data sets demonstrate that the proposed method outperforms other state-of-the-art deep learning methods.

Journal ArticleDOI
TL;DR: An efficient gradient channel prior (GCP) is designed that overcomes issues such as texture distortion, transmission map misestimation, color distortion, and edge degradation, and can restore hazy images even when they contain dense haze.

Journal ArticleDOI
TL;DR: This method uses a pixel intensity center regionalization strategy to centralize the image histogram over the whole image, and dual-image multi-scale fusion to integrate the contrast, saliency, and exposure weight maps of the color-corrected and contrast-enhanced images.
Abstract: Underwater images suffer from color cast and low visibility caused by the medium scattering and absorption, which reduces the amount of valuable information that can be extracted from the image. In this paper, we propose a novel method which includes four stages: pixel intensity center regionalization, global histogram equalization, local histogram equalization, and multi-scale fusion. This method uses a pixel intensity center regionalization strategy to centralize the histogram of the overall image. Global histogram equalization is employed to correct the color of the image according to the characteristics of each channel. Local equalization of a dual-interval histogram, based on the average of peak and mean values, is used to improve the contrast of the image according to the characteristics of each channel. Dual-image multi-scale fusion is then applied to integrate the contrast, saliency, and exposure weight maps of the color-corrected and contrast-enhanced images. Experiments on various types of degraded underwater images show that the proposed method produces better output results in both qualitative and quantitative analysis; thus, the proposed method outperforms other state-of-the-art methods.
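The per-channel global histogram equalization stage can be illustrated with a standard CDF mapping applied independently to each color channel, as in the NumPy sketch below; the paper's dual-interval local equalization and fusion stages are not reproduced here.

```python
import numpy as np

def equalize_channel(channel):
    """Global histogram equalization of a single 8-bit channel via its CDF."""
    hist, _ = np.histogram(channel, bins=256, range=(0, 256))
    cdf = hist.cumsum().astype(np.float64)
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min() + 1e-8) * 255.0
    return cdf[channel].astype(np.uint8)

def equalize_rgb(img):
    """Apply global equalization independently to each color channel."""
    return np.stack([equalize_channel(img[..., c]) for c in range(3)], axis=-1)

underwater = (np.random.rand(64, 64, 3) * 255).astype(np.uint8)
out = equalize_rgb(underwater)
print(out.shape, out.dtype)   # (64, 64, 3) uint8
```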

Journal ArticleDOI
TL;DR: A deep learning image semantic segmentation network named Spatial-Channel Attention U-Net (SCAU-Net) is proposed for medical images; it has an encoder-decoder-style symmetrical structure integrated with spatial and channel attention as plug-and-play modules.
Abstract: With the development of medical technology, image semantic segmentation is of great significance for morphological analysis, quantification, and diagnosis of human tissues. However, manual detection and segmentation is a time-consuming task. Especially for biomedical images, only experts are able to identify tissues and mark their contours. In recent years, the development of deep learning has greatly improved the accuracy of automatic computer segmentation. This paper proposes a deep learning image semantic segmentation network named Spatial-Channel Attention U-Net (SCAU-Net), informed by the current state of medical image research. SCAU-Net has an encoder-decoder-style symmetrical structure integrated with spatial and channel attention as plug-and-play modules. The main idea is to enhance locally related features and restrain irrelevant features at the spatial and channel levels. Experiments on the gland datasets GlaS and CRAG show that the proposed SCAU-Net model is superior to the classic U-Net model in the image segmentation task, with a 1% improvement in Dice score and a 1.5% improvement in Jaccard score.

Proceedings ArticleDOI
12 Oct 2020
TL;DR: A novel Dual Attention GAN (DAGAN) is proposed to synthesize photo-realistic and semantically-consistent images with fine details from the input layouts without imposing extra training overhead or modifying the network architectures of existing methods.
Abstract: In this paper, we focus on the semantic image synthesis task that aims at transferring semantic label maps to photo-realistic images. Existing methods lack effective semantic constraints to preserve the semantic information and ignore the structural correlations in both spatial and channel dimensions, leading to unsatisfactory blurry and artifact-prone results. To address these limitations, we propose a novel Dual Attention GAN (DAGAN) to synthesize photo-realistic and semantically-consistent images with fine details from the input layouts without imposing extra training overhead or modifying the network architectures of existing methods. We also propose two novel modules, i.e., position-wise Spatial Attention Module (SAM) and scale-wise Channel Attention Module (CAM), to capture semantic structure attention in spatial and channel dimensions, respectively. Specifically, SAM selectively correlates the pixels at each position by a spatial attention map, leading to pixels with the same semantic label being related to each other regardless of their spatial distances. Meanwhile, CAM selectively emphasizes the scale-wise features at each channel by a channel attention map, which integrates associated features among all channel maps regardless of their scales. We finally sum the outputs of SAM and CAM to further improve feature representation. Extensive experiments on four challenging datasets show that DAGAN achieves remarkably better results than state-of-the-art methods, while using fewer model parameters.

Journal ArticleDOI
01 Jan 2020-Talanta
TL;DR: An image-based colorimetric detection method using a smartphone camera was successfully developed for quantitative analysis of urine glucose; it features high accuracy and low cost and has great potential as a point-of-need platform for diabetic patients with defective color vision.

Journal ArticleDOI
TL;DR: A Conditional Variational Image Deraining (CVID) network is proposed for better deraining performance, leveraging the exclusive generative ability of the Conditional Variational Auto-Encoder in providing diverse predictions for the rainy image.
Abstract: Image deraining is an important yet challenging image processing task. Though deterministic image deraining methods are developed with encouraging performance, they are infeasible to learn flexible representations for probabilistic inference and diverse predictions. Besides, rain intensity varies both in spatial locations and across color channels, making this task more difficult. In this paper, we propose a Conditional Variational Image Deraining (CVID) network for better deraining performance, leveraging the exclusive generative ability of Conditional Variational Auto-Encoder (CVAE) on providing diverse predictions for the rainy image. To perform spatially adaptive deraining, we propose a spatial density estimation (SDE) module to estimate a rain density map for each image. Since rain density varies across different color channels, we also propose a channel-wise (CW) deraining scheme. Experiments on synthesized and real-world datasets show that the proposed CVID network achieves much better performance than previous deterministic methods on image deraining. Extensive ablation studies validate the effectiveness of the proposed SDE module and CW scheme in our CVID network. The code is available at https://github.com/Yingjun-Du/VID .

Journal ArticleDOI
TL;DR: This paper presents an end-to-end multi-scale adaptive feature fusion network (MANet) for semantic segmentation in remote sensing images that significantly outperforms the other existing networks.
Abstract: Semantic segmentation of high-resolution remote sensing images is highly challenging due to the presence of a complicated background, irregular target shapes, and similarities in the appearance of multiple target categories. Most of the existing segmentation methods that rely only on simple fusion of the extracted multi-scale features often fail to provide satisfactory results when there is a large difference in the target sizes. Handling this problem through multi-scale context extraction and efficient fusion of multi-scale features, in this paper we present an end-to-end multi-scale adaptive feature fusion network (MANet) for semantic segmentation in remote sensing images. It is a coding and decoding structure that includes a multi-scale context extraction module (MCM) and an adaptive fusion module (AFM). The MCM employs two layers of atrous convolutions with different dilation rates and global average pooling to extract context information at multiple scales in parallel. MANet embeds the channel attention mechanism to fuse semantic features. The high- and low-level semantic information are concatenated to generate global features via global average pooling. These global features are used as channel weights to acquire adaptive weight information of each channel by the fully connected layer. To accomplish an efficient fusion, these tuned weights are applied to the fused features. Performance of the proposed method has been evaluated by comparing it with six other state-of-the-art networks: fully convolutional networks (FCN), U-net, UZ1, Light-weight RefineNet, DeepLabv3+, and APPD. Experiments performed using the publicly available Potsdam and Vaihingen datasets show that the proposed MANet significantly outperforms the other existing networks, with overall accuracy reaching 89.4% and 88.2%, respectively, and average F1 scores reaching 90.4% and 86.7%, respectively.

Journal ArticleDOI
Guojia Hou1, Jingming Li1, Guodong Wang1, Huan Yang1, Baoxiang Huang1, Zhenkuan Pan1 
TL;DR: An underwater total variation model relying on the underwater dark channel prior (UDCP) is proposed, in which the UDCP is used to estimate the transmission map, and the data term and smoothness term of the unified variational model are designed based on the underwater image formation model.

Journal ArticleDOI
TL;DR: The proposed CSMS-SSRN framework enhances the expressiveness of image features in both the channel and spatial domains, thereby improving classification accuracy, and achieves better classification performance on different HSI datasets.
Abstract: With the rapid development of aerospace and various remote sensing platforms, the amount of data related to remote sensing is increasing rapidly. To meet the application requirements of remote sensing big data, an increasing number of scholars are combining deep learning with remote sensing data. In recent years, based on the rapid development of deep learning methods, research in the field of hyperspectral image (HSI) classification has seen continuous breakthroughs. In order to fully extract the characteristics of HSIs and improve the accuracy of image classification, this article proposes a novel three-dimensional (3-D) channel and spatial attention-based multiscale spatial–spectral residual network (termed CSMS-SSRN). The CSMS-SSRN framework uses a three-layer parallel residual network structure by using different 3-D convolutional kernels to continuously learn spectral and spatial features from their respective residual blocks. Then, the extracted depth multiscale features are stacked and input into the 3-D attention module to enhance the expressiveness of the image features from the two aspects of channel and spatial domains, thereby improving the accuracy of classification. The CSMS-SSRN framework proposed in this article can achieve better classification performance on different HSI datasets.

Proceedings ArticleDOI
14 Jun 2020
TL;DR: Visual results on a set of real-world hazy images captured in different weather conditions demonstrate the effectiveness of the proposed approach for varicolored image de-hazing.
Abstract: The quality of images captured in bad weather is often affected by chromatic casts and low visibility due to the presence of atmospheric particles. Restoration of the color balance is often ignored in most of the existing image de-hazing methods. In this paper, we propose a varicolored end-to-end image de-hazing network which restores the color balance in a given varicolored hazy image and recovers the haze-free image. The proposed network comprises 1) a haze color correction (HCC) module and 2) a visibility improvement (VI) module. The HCC module provides the required attention to each color channel and generates a color-balanced hazy image, while the VI module processes the color-balanced hazy image through a novel inception attention block to recover the haze-free image. We also propose a novel approach to generate a large-scale varicolored synthetic hazy image database. An ablation study has been carried out to demonstrate the effect of different factors on the performance of the proposed network for image de-hazing. Three benchmark synthetic datasets have been used for quantitative analysis of the proposed network. Visual results on a set of real-world hazy images captured in different weather conditions demonstrate the effectiveness of the proposed approach for varicolored image de-hazing.