
Showing papers on "Channel (digital image) published in 2021"


Journal ArticleDOI
TL;DR: This work places multiple color charts in the scenes and calculates their 3D structure using stereo imaging to obtain ground truth, and contributes a dataset of 57 images taken in different locations that enables a rigorous quantitative evaluation of restoration algorithms on natural images for the first time.
Abstract: Underwater images suffer from color distortion and low contrast, because light is attenuated while it propagates through water. Attenuation under water varies with wavelength, unlike terrestrial images where attenuation is assumed to be spectrally uniform. The attenuation depends both on the water body and the 3D structure of the scene, making color restoration difficult. Unlike existing single underwater image enhancement techniques, our method takes into account multiple spectral profiles of different water types. By estimating just two additional global parameters, the attenuation ratios of the blue-red and blue-green color channels, the problem is reduced to single image dehazing, where all color channels have the same attenuation coefficients. Since the water type is unknown, we evaluate different parameters out of an existing library of water types. Each type leads to a different restored image and the best result is automatically chosen based on color distribution. We also contribute a dataset of 57 images taken in different locations. To obtain ground truth, we placed multiple color charts in the scenes and calculated their 3D structure using stereo imaging. This dataset enables a rigorous quantitative evaluation of restoration algorithms on natural images for the first time.
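The reduction described above follows from the underwater image-formation model: once the two attenuation ratios are fixed, every channel's transmission is a power of the blue channel's, and restoration becomes ordinary dehazing. A minimal numpy sketch, assuming a known blue-channel transmission map and veiling light (both of which the actual method still has to estimate; the function name and channel order are assumptions):

```python
import numpy as np

def restore(img, t_blue, veiling, ratio_br, ratio_bg):
    """Invert I = J * t + A * (1 - t) with per-channel transmission
    t_c = t_blue ** (beta_c / beta_blue), so only two global ratios
    (blue-red, blue-green) are needed. img: HxWx3 float RGB in [0, 1]."""
    t = np.stack([t_blue ** ratio_br,   # red channel transmission
                  t_blue ** ratio_bg,   # green channel transmission
                  t_blue],              # blue channel transmission
                 axis=-1)
    J = (img - veiling * (1.0 - t)) / np.maximum(t, 1e-3)
    return np.clip(J, 0.0, 1.0)
```

Running this once per candidate water type from the library and scoring each output's color distribution reproduces the selection loop the paper describes.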

225 citations


Journal ArticleDOI
TL;DR: A novel dual-attention denoising network is proposed that combines two parallel branches to process the spatial and spectral information separately and proves the superiority of the method both visually and quantitatively when compared with state-of-the-art methods.
Abstract: Hyperspectral image (HSI) denoising plays an important role in image quality improvement and related applications. Convolutional neural network (CNN)-based image denoising methods have been predominant due to advances made in the field of deep learning in recent years. Spatial and spectral information are crucial to HSI denoising, along with their correlations. However, existing methods fail to consider the global dependence and correlation between spatial and spectral information. Accordingly, in this article, we propose a novel dual-attention denoising network to overcome these limitations. We design two parallel branches to process the spatial and spectral information separately. The position attention module is applied to the spatial branch to formulate the interdependencies on the feature map, while the channel attention module is applied to the spectral branch to simulate the spectral correlation before the two branches are combined. A multiscale structure is also employed to extract and fuse the multiscale features following the fusion of spatial and spectral information. Experimental results on simulated and real data substantiate the superiority of our method both visually and quantitatively when compared with state-of-the-art methods.
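The channel attention applied to the spectral branch is, in its generic squeeze-and-excitation form, a global pooling followed by a small bottleneck and a sigmoid gate. A minimal numpy sketch of that generic mechanism (the layer sizes and weights are illustrative, not the paper's architecture):

```python
import numpy as np

def channel_attention(feat, w1, w2):
    """Gate a (C, H, W) feature map channel-wise.
    w1: (C//r, C) and w2: (C, C//r) are the bottleneck weights."""
    c = feat.shape[0]
    squeeze = feat.reshape(c, -1).mean(axis=1)      # global average pool -> (C,)
    hidden = np.maximum(w1 @ squeeze, 0.0)          # ReLU bottleneck
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))     # per-channel gate in (0, 1)
    return feat * gate[:, None, None]               # reweight each channel
```

In an HSI denoiser the gate lets the network emphasize spectral bands whose statistics correlate, which is what "simulate the spectral correlation" refers to.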

114 citations


Journal ArticleDOI
TL;DR: A fast defogging image recognition algorithm based on bilateral hybrid filtering is proposed to address the real-time performance, clarity, and reliability issues of image recognition for Internet-of-Things (IoT) monitoring systems in foggy weather.
Abstract: With the rapid advancement of video and image processing technologies in the Internet of Things, it is urgent to address the issues in real-time performance, clarity, and reliability of image recognition technology for a monitoring system in foggy weather conditions. In this work, a fast defogging image recognition algorithm is proposed based on bilateral hybrid filtering. First, the mathematical model based on bilateral hybrid filtering is established. The dark channel is used for filtering and denoising the defogging image. Next, a bilateral hybrid filtering method is proposed by using a combination of guided filtering and median filtering, as it can effectively improve the robustness and transmittance of defogging images. On this basis, the proposed algorithm dramatically decreases the computation complexity of defogging image recognition and reduces the image execution time. Experimental results show that the defogging effect and speed are promising, with the image recognition rate reaching 98.8% after defogging.
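The dark channel used in the filtering step is the standard prior: a per-pixel minimum across RGB followed by a local minimum filter. A small numpy sketch (the window size is illustrative; the paper's choice is not stated here):

```python
import numpy as np

def dark_channel(img, patch=3):
    """Dark channel of an HxWx3 image: min over color channels, then a
    patch x patch local minimum filter (edge-padded)."""
    m = img.min(axis=2)                 # per-pixel minimum over RGB
    h, w = m.shape
    r = patch // 2
    padded = np.pad(m, r, mode='edge')
    out = np.empty_like(m)
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + patch, j:j + patch].min()
    return out
```

For haze-free outdoor patches this map is close to zero, so large values flag fog density, which is what makes it useful for estimating transmittance before the guided/median filtering stage.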

74 citations


Journal ArticleDOI
TL;DR: Comparisons with state-of-the-art methods show that the proposed method outputs high-quality underwater images under both qualitative and quantitative evaluation.
Abstract: Underwater captured images often suffer from color cast and low visibility because light is scattered and absorbed as it travels through water. In this paper, we propose a novel method of color correction and Bi-interval contrast enhancement to improve the quality of underwater images. Firstly, a simple and effective color correction method based on sub-interval linear transformation is employed to address color distortion. Then, a Gaussian low-pass filter is applied to the L channel to decompose the low- and high-frequency components. Finally, the low- and high-frequency components are enhanced by a Bi-interval histogram based on an optimal equalization threshold strategy and an S-shaped function to enhance image contrast and highlight image details. Inspired by multi-scale fusion, we employed a simple linear fusion to integrate the enhanced high- and low-frequency components. Comparisons with state-of-the-art methods show that the proposed method outputs high-quality underwater images under both qualitative and quantitative evaluation.
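The Gaussian low-pass decomposition of the L channel can be sketched with a separable kernel: the blurred result is the low-frequency component and the residual is the high-frequency component, so the two can be enhanced independently and recombined. A numpy sketch (sigma is illustrative, not the paper's setting):

```python
import numpy as np

def split_low_high(L, sigma=2.0):
    """Split a 2-D luminance channel into low- and high-frequency parts
    with a separable Gaussian low-pass filter (reflect padding)."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    k /= k.sum()                                    # normalized 1-D kernel
    pad = np.pad(L.astype(np.float64), radius, mode='reflect')
    # Horizontal then vertical 1-D convolution (separable Gaussian).
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, 'valid'), 1, pad)
    low = np.apply_along_axis(lambda v: np.convolve(v, k, 'valid'), 0, rows)
    high = L - low
    return low, high
```

By construction `low + high` reconstructs the original channel exactly, which is what makes the later linear fusion of the separately enhanced bands well defined.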

73 citations


Journal ArticleDOI
TL;DR: Wang et al. as discussed by the authors proposed a double image encryption algorithm based on a convolutional neural network (CNN) and dynamic adaptive diffusion, which not only ensures the security of double images but also improves the encryption efficiency and reduces the possibility of being attacked.
Abstract: To realize the secure transmission of double images, this paper proposes a double image encryption algorithm based on convolutional neural network (CNN) and dynamic adaptive diffusion. This scheme is different from the existing double image encryption technology. According to the characteristics of digital images, we design a dual-channel (digital channel / optical channel) encryption method, which not only ensures the security of double images, but also improves the encryption efficiency and reduces the possibility of being attacked. First, a chaotic map is used to control the initial values of the 5D conservative chaotic system to enhance the security of the key. Second, in order to effectively resist known-plaintext and chosen-plaintext attacks, we employ a chaotic sequence as the convolution kernel of a convolutional neural network to generate a plaintext-related chaotic pointer to control the scrambling operation of the two images. On this basis, a novel image fusion method is designed, which splits and fuses the two images into two different parts according to the amount of information contained. In addition, a dual-channel image encryption scheme, comprising an optical encryption channel and a digital encryption channel, is designed for the two parts after fusion. The former has better parallelism and higher encryption efficiency, while the latter has higher computational complexity and better encryption reliability. Especially in the digital encryption channel, a new dynamic adaptive diffusion method is designed, which is more flexible and secure than existing encryption algorithms. Finally, numerical simulation and experimental analysis verify the feasibility and effectiveness of the scheme.
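The scrambling step rests on a standard building block: iterate a chaotic map, sort the resulting sequence, and use the sort order as a pixel permutation. A minimal numpy sketch with a logistic map (the map, its parameters, and the key handling here are illustrative stand-ins, not the paper's 5D conservative system or plaintext-related pointer):

```python
import numpy as np

def logistic_permutation(n, x0=0.3567, mu=3.99):
    """Derive a permutation of n indices from a logistic chaotic map:
    the sort order of the chaotic sequence is the permutation."""
    x = x0
    seq = np.empty(n)
    for i in range(n):
        x = mu * x * (1.0 - x)          # logistic map iteration
        seq[i] = x
    return np.argsort(seq)

def scramble(img, perm):
    """Apply a flat pixel permutation to a 2-D or HxWxC image."""
    flat = img.reshape(-1, img.shape[-1]) if img.ndim == 3 else img.reshape(-1)
    return flat[perm].reshape(img.shape)
```

Decryption applies the inverse permutation (`np.argsort(perm)`); because the sequence depends sensitively on `x0` and `mu`, those values act as part of the key.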

70 citations


Journal ArticleDOI
Zheng Liang1, Yafei Wang1, Xueyan Ding1, Zetian Mi1, Xianping Fu1 
TL;DR: Experiments on a variety of degraded underwater images have proven that the proposed systematic underwater image enhancement method can produce accurate results with vivid color and fine details, even better than other state-of-the-art underwater image dehazing methods.

67 citations


Journal ArticleDOI
TL;DR: A crossflow and cross-scale adaptive fusion network (CCAFNet) to detect salient objects in RGB-D images; the results indicate that the performance of the proposed CCAFNet is comparable to those of state-of-the-art RGB-D SOD models.
Abstract: Owing to the widespread adoption of depth sensors, salient object detection (SOD) supported by depth maps for reliable complementary information is being increasingly investigated. Existing SOD models mainly exploit the relation between an RGB image and its corresponding depth information across three fusion domains: input RGB-D images, extracted feature maps, and output salient object. However, these models do not leverage the crossflows between high- and low-level information well. Moreover, the decoder in these models uses conventional convolution that involves several calculations. To further improve RGB-D SOD, we propose a crossflow and cross-scale adaptive fusion network (CCAFNet) to detect salient objects in RGB-D images. First, a channel fusion module effectively fuses depth and high-level RGB features. This module extracts accurate semantic information features from high-level RGB features. Meanwhile, a spatial fusion module combines low-level RGB and depth features with accurate boundaries and subsequently extracts detailed spatial information from low-level depth features. Finally, a purification loss is proposed to precisely learn the boundaries of salient objects and obtain additional details of the objects. The results of comprehensive experiments on seven common RGB-D SOD datasets indicate that the performance of the proposed CCAFNet is comparable to those of state-of-the-art RGB-D SOD models.

63 citations


Proceedings ArticleDOI
22 Mar 2021
TL;DR: Zhang et al. as discussed by the authors proposed an Adaptive-weighted Bi-directional Modality Difference Reduction Network (ABMDRNet) to reduce the modality differences between RGB and thermal features.
Abstract: Semantic segmentation models gain robustness against poor lighting conditions by virtue of complementary information from visible (RGB) and thermal images. Despite its importance, most existing RGB-T semantic segmentation models perform primitive fusion strategies, such as concatenation, element-wise summation and weighted summation, to fuse features from different modalities. These strategies, unfortunately, overlook the modality differences due to different imaging mechanisms, so that they suffer from the reduced discriminability of the fused features. To address such an issue, we propose, for the first time, the strategy of bridging-then-fusing, where the innovation lies in a novel Adaptive-weighted Bi-directional Modality Difference Reduction Network (ABMDRNet). Concretely, a Modality Difference Reduction and Fusion (MDRF) subnetwork is designed, which first employs a bi-directional image-to-image translation based method to reduce the modality differences between RGB features and thermal features, and then adaptively selects those discriminative multi-modality features for RGB-T semantic segmentation in a channel-wise weighted fusion way. Furthermore, considering the importance of contextual information in semantic segmentation, a Multi-Scale Spatial Context (MSC) module and a Multi-Scale Channel Context (MCC) module are proposed to exploit the interactions among multi-scale contextual information of cross-modality features together with their long-range dependencies along spatial and channel dimensions, respectively. Comprehensive experiments on MFNet dataset demonstrate that our method achieves new state-of-the-art results.
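The channel-wise weighted fusion at the end of the MDRF subnetwork can be sketched as a per-channel softmax over modality weights: for each channel, the network decides how much of the RGB feature and how much of the thermal feature to keep. A tiny numpy illustration (the subnetwork that predicts the weights is omitted, and the function name is an assumption):

```python
import numpy as np

def weighted_fusion(f_rgb, f_th, w_rgb, w_th):
    """Fuse two (C, H, W) modality features with per-channel softmax
    weights w_rgb, w_th of shape (C,)."""
    e = np.exp(np.stack([w_rgb, w_th]))          # (2, C)
    a = e / e.sum(axis=0, keepdims=True)         # per-channel softmax
    return a[0][:, None, None] * f_rgb + a[1][:, None, None] * f_th
```

Unlike plain concatenation or element-wise summation, the learned weights let each channel lean on whichever modality is more discriminative, which is the adaptive behavior the paper argues for.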

62 citations


Journal ArticleDOI
TL;DR: A retinal vessel segmentation algorithm for color fundus images, based on a back-propagation (BP) neural network and designed according to the characteristics of retinal blood vessels, is proposed; it is feasible and effective and can detect more capillaries.
Abstract: To improve the accuracy of retinal vessel segmentation, a retinal vessel segmentation algorithm for color fundus images based on a back-propagation (BP) neural network is proposed according to the characteristics of retinal blood vessels. Four kinds of green channel image enhancement results, from adaptive histogram equalization, morphological processing, Gaussian matched filtering, and Hessian matrix filtering, are used to form feature vectors. These feature vectors are input to the BP neural network to segment blood vessels. Experiments on the color fundus image libraries DRIVE and STARE show that this algorithm can obtain complete retinal blood vessel segmentation as well as connected vessel stems and terminals. When segmenting most small blood vessels, the average accuracy on the DRIVE library reaches 0.9477, and the average accuracy on the STARE library reaches 0.9498, which is a good segmentation effect. Through verification, the algorithm is feasible and effective for blood vessel segmentation of color fundus images and can detect more capillaries.

60 citations


Journal ArticleDOI
TL;DR: This paper attempts to explore a more suitable small deep learning model for ore image classification by considering the model depth, model structure, and dataset size.

60 citations


Proceedings Article
18 May 2021
TL;DR: A novel end-to-end residual learning framework is proposed, which formulates the depth completion as a two-stage learning task, i.e., a sparse-to-coarse stage and a coarse-to-fine stage, and achieves SoTA performance in RMSE on the KITTI benchmark.
Abstract: Depth completion aims to recover a dense depth map from a sparse depth map with the corresponding color image as input. Recent approaches mainly formulate the depth completion as a one-stage end-to-end learning task, which outputs dense depth maps directly. However, the feature extraction and supervision in one-stage frameworks are insufficient, limiting the performance of these approaches. To address this problem, we propose a novel end-to-end residual learning framework, which formulates the depth completion as a two-stage learning task, i.e., a sparse-to-coarse stage and a coarse-to-fine stage. First, a coarse dense depth map is obtained by a simple CNN framework. Then, a refined depth map is further obtained using a residual learning strategy in the coarse-to-fine stage with the coarse depth map and color image as input. Specifically, in the coarse-to-fine stage, a channel shuffle extraction operation is utilized to extract more representative features from the color image and coarse depth map, and an energy based fusion operation is exploited to effectively fuse these features obtained by the channel shuffle operation, thus leading to more accurate and refined depth maps. We achieve SoTA performance in RMSE on the KITTI benchmark. Extensive experiments on other datasets further demonstrate the superiority of our approach over current state-of-the-art depth completion approaches.
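The channel shuffle extraction operation builds on the standard channel-shuffle trick from ShuffleNet: split the channels into groups, transpose, and flatten, so later layers see a mix of channels from every group. A numpy sketch of the basic operation (how the paper wires it into its coarse-to-fine stage is not reproduced here):

```python
import numpy as np

def channel_shuffle(feat, groups):
    """Shuffle the channels of a (C, H, W) feature map across groups."""
    c, h, w = feat.shape
    assert c % groups == 0, "channel count must divide evenly into groups"
    return (feat.reshape(groups, c // groups, h, w)  # split into groups
                .transpose(1, 0, 2, 3)               # interleave the groups
                .reshape(c, h, w))                   # flatten back to C
```

With `groups=2` and channels `[0, 1, 2, 3]`, the output order is `[0, 2, 1, 3]`: each group now carries channels drawn from both original groups, which is what lets grouped convolutions still exchange information.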

Journal ArticleDOI
TL;DR: Wang et al. as mentioned in this paper presented an improved U-Net with residual connections, adding a plug-and-play, highly portable channel attention (CA) block and a hybrid dilated attention convolutional (HDAC) layer to perform medical image segmentation for different tasks accurately and effectively; they call it HDA-ResUNet, fully utilizing the advantages of U-Net, the attention mechanism, and dilated convolution.

Journal ArticleDOI
TL;DR: It is shown that the spread of the hue in a hazy image can differentiate a color cast image from a non-cast one, and a measure is proposed using the same for categorizing hazy images as cast and non- cast ones.
Abstract: Hazy images suffer from low visibility since the light gets scattered as it passes through various atmospheric particles. Moreover, such images are prone to color distortion, particularly in real weather conditions like sandstorms. In this letter, an effective dehazing technique is proposed using weighted least squares filtering on dark channel prior and color correction that involves automatic detection of color cast images. We show that the spread of the hue in a hazy image can differentiate a color cast image from a non-cast one. We propose a measure using the same for categorizing hazy images as cast and non-cast ones. Our novel color correction is performed by color balancing using a non-linear transformation followed by a cast-adaptive airlight refinement. Subjective and quantitative evaluations show that our method outperforms the state-of-the-art. It removes cast satisfactorily and reduces haze substantially while maintaining the naturalness of the image. Moreover, it produces visually pleasing images without halo artifacts.
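The "spread of the hue" test for detecting a color cast can be made concrete with circular statistics: compute per-pixel hue, then the length of the mean resultant vector on the hue circle. A numpy sketch of one plausible statistic (the letter's exact measure may differ):

```python
import numpy as np

def hue_spread(img):
    """Hue spread of an RGB image in [0, 1]: 0 means all pixels share
    one hue (suggesting a color cast), 1 means hues cancel around the
    circle. Uses the standard max/min hue formula, then circular stats."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    mx, mn = img.max(-1), img.min(-1)
    d = np.where(mx - mn == 0, 1.0, mx - mn)        # avoid divide-by-zero
    h = np.where(mx == r, (g - b) / d % 6,
        np.where(mx == g, (b - r) / d + 2,
                          (r - g) / d + 4)) * (np.pi / 3)
    R = np.hypot(np.cos(h).mean(), np.sin(h).mean())  # resultant length
    return 1.0 - R
```

Thresholding such a statistic is one way to route images into the cast and non-cast branches that the proposed color correction then treats differently.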

Journal ArticleDOI
Jiaqian Li1, Juncheng Li1, Faming Fang1, Fang Li1, Guixu Zhang1 
TL;DR: A lightweight and efficient Luminance-aware Pyramid Network (LPNet) to reconstruct normal-light images in a coarse-to-fine strategy that outperforms state-of-the-art methods both qualitatively and quantitatively.
Abstract: Low-light image enhancement based on deep convolutional neural networks (CNNs) has revealed prominent performance in recent years. However, it is still a challenging task since the underexposed regions and details are always imperceptible. Moreover, deep learning models are always accompanied by complex structures and an enormous computational burden, which hinders their deployment on mobile devices. To remedy these issues, in this paper, we present a lightweight and efficient Luminance-aware Pyramid Network (LPNet) to reconstruct normal-light images in a coarse-to-fine strategy. The architecture is comprised of two coarse feature extraction branches and a luminance-aware refinement branch with an auxiliary subnet learning the luminance map of the input and target images. Besides, we propose a multi-scale contrast feature block (MSCFB) that involves channel split, channel shuffle strategies, and a contrast attention mechanism. MSCFB is the essential component of our network, which achieves an excellent balance between image quality and model size. In this way, our method can not only brighten up low-light images with rich details and high contrast but also significantly improve the execution speed. Extensive experiments demonstrate that our LPNet outperforms state-of-the-art methods both qualitatively and quantitatively.

Journal ArticleDOI
01 Dec 2021-Displays
TL;DR: Wang et al. as mentioned in this paper proposed a quadratic polynomial guided fuzzy C-means and dual attention mechanism composite network model architecture to address the medical image's high complexity and noise.

Journal ArticleDOI
TL;DR: A deep retinex dehazing network (RDN) to jointly estimate the residual illumination map and the haze-free image and can avoid the errors associated with the simplified scattering model and provide better generalization ability with no dependence on prior information.
Abstract: In this paper, we propose a retinex-based decomposition model for a hazy image and a novel end-to-end image dehazing network. In the model, the illumination of the hazy image is decomposed into natural illumination for the haze-free image and residual illumination caused by haze. Based on this model, we design a deep retinex dehazing network (RDN) to jointly estimate the residual illumination map and the haze-free image. Our RDN consists of a multiscale residual dense network for estimating the residual illumination map and a U-Net with channel and spatial attention mechanisms for image dehazing. The multiscale residual dense network can simultaneously capture global contextual information from small-scale receptive fields and local detailed information from large-scale receptive fields to precisely estimate the residual illumination map caused by haze. In the dehazing U-Net, we apply the channel and spatial attention mechanisms in the skip connection of the U-Net to achieve a trade-off between overdehazing and underdehazing by automatically adjusting the channel-wise and pixel-wise attention weights. Compared with scattering model-based networks, fully data-driven networks, and prior-based dehazing methods, our RDN can avoid the errors associated with the simplified scattering model and provide better generalization ability with no dependence on prior information. Extensive experiments show the superiority of the RDN to various state-of-the-art methods.

Journal ArticleDOI
TL;DR: A new user-independent emotion classification method is presented that classifies four distinct emotions using electroencephalograph (EEG) signals and the broad learning system (BLS) which successfully upgrades the efficiency of emotion classification based on EEG brain signals.
Abstract: This article presents a new user-independent emotion classification method that classifies four distinct emotions using electroencephalograph (EEG) signals and the broad learning system (BLS). The public DEAP and MAHNOB-HCI databases are used. Just one EEG electrode channel is selected for the feature extraction process. Continuous wavelet transform (CWT) is then utilized to extract the proposed gray-scale image (GSI) feature, which describes the EEG brain activation in both the time and frequency domains. Finally, the new BLS is constructed for the emotion classification process, which successfully upgrades the efficiency of emotion classification based on EEG brain signals. The experiment results show that the proposed work produces a robust system with a high accuracy of approximately 93.1% and a training process time of approximately 0.7 s for the DEAP database, as well as a high average accuracy of approximately 94.4% and a training process time of approximately 0.6 s for the MAHNOB-HCI database.

Journal ArticleDOI
TL;DR: Wang et al. as mentioned in this paper proposed a multi-level feature aggregation network (MFANet), which is improved in two aspects: deep feature extraction and up-sampling feature fusion.
Abstract: Detailed information regarding land utilization/cover is a valuable resource in various fields. In recent years, remote sensing images, especially aerial images, have become higher in resolution and span larger extents in time and space; because objects in the same category may yield different spectra, relying on spectral features alone is often insufficient to accurately segment the target objects. In convolutional neural networks, down-sampling operations are usually used to extract abstract semantic features, which leads to loss of details and fuzzy edges. To solve these problems, the paper proposes a Multi-level Feature Aggregation Network (MFANet), which is improved in two aspects: deep feature extraction and up-sampling feature fusion. Firstly, the proposed Channel Feature Compression module extracts the deep features and filters the redundant channel information from the backbone to optimize the learned context. Secondly, the proposed Multi-level Feature Aggregation Upsample module nestedly uses the idea that high-level features provide guidance information for low-level features, which is of great significance for positioning the restoration of high-resolution remote sensing images. Finally, the proposed Channel Ladder Refinement module is used to refine the restored high-resolution feature maps. Experimental results show that the proposed method achieves state-of-the-art performance of 86.45% mean IOU on the LandCover dataset.

Journal ArticleDOI
12 Mar 2021
TL;DR: Wang et al. as discussed by the authors proposed an extended version of U-Net for the segmentation of skin lesions using the concept of the triple attention mechanism, which first selected regions using attention coefficients computed by the attention gate and contextual information, then a dual attention decoding module consisting of spatial attention and channel attention was used to capture the spatial correlation between features and improve segmentation performance.
Abstract: Segmentation of skin lesions is a challenging task because of the wide range of skin lesion shapes, sizes, colors, and texture types. In the past few years, deep learning networks such as U-Net have been successfully applied to medical image segmentation and exhibited faster and more accurate performance. In this paper, we propose an extended version of U-Net for the segmentation of skin lesions using the concept of the triple attention mechanism. We first selected regions using attention coefficients computed by the attention gate and contextual information. Second, a dual attention decoding module consisting of spatial attention and channel attention was used to capture the spatial correlation between features and improve segmentation performance. The combination of the three attentional mechanisms helped the network to focus on a more relevant field of view of the target. The proposed model was evaluated using three datasets, ISIC-2016, ISIC-2017, and PH2. The experimental results demonstrated the effectiveness of our method with strong robustness to the presence of irregular borders, lesion and skin smooth transitions, noise, and artifacts.

Journal ArticleDOI
TL;DR: A blind and robust scheme using YCbCr color space, IWT (integer wavelet transform) and DCT (discrete cosine transform) for color image watermarking and the ANN framework provides faster embedding with approximately similar parametric results.

Journal ArticleDOI
TL;DR: A Gaussian Markov Random Field model with four-element cross neighborhood is proposed to characterize the interactions among local elements of cover images, and the problem of secure image steganography is formulated as the one of minimization of KL-divergence in terms of a series of low-dimensional clique structures associated with GMRF.
Abstract: Recent advances in adaptive steganography show that the performance of image steganographic communication can be improved by incorporating non-additive models that capture the dependencies among adjacent pixels. In this paper, a Gaussian Markov Random Field model (GMRF) with a four-element cross neighborhood is proposed to characterize the interactions among local elements of cover images, and the problem of secure image steganography is formulated as one of minimization of KL-divergence in terms of a series of low-dimensional clique structures associated with the GMRF by taking advantage of the conditional independence of the GMRF. The adoption of the proposed GMRF tessellates the cover image into two disjoint subimages, and an alternating iterative optimization scheme is developed to effectively embed the given payload while minimizing the total KL-divergence between cover and stego, i.e., the statistical detectability. Experimental results demonstrate that the proposed GMRF outperforms the prior art of model based schemes, e.g., MiPOD, and rivals the state-of-the-art HiLL for practical steganography, where selection channel knowledge is unavailable to steganalyzers.

Journal ArticleDOI
TL;DR: This article aims at achieving precise skin lesion segmentation with minimum resources by presenting a lightweight and efficient generative adversarial network (GAN) model, called SLSNet, which combines 1-D kernel factorized networks, position and channel attention, and multiscale aggregation mechanisms with a GAN model.
Abstract: The determination of precise skin lesion boundaries in dermoscopic images using automated methods faces many challenges, most importantly, the presence of hair, inconspicuous lesion edges and low contrast in dermoscopic images, and variability in the color, texture and shapes of skin lesions. Existing deep learning-based skin lesion segmentation algorithms are expensive in terms of computational time and memory. Consequently, running such segmentation algorithms requires a powerful GPU and high bandwidth memory, which are not available in dermoscopy devices. Thus, this article aims to achieve precise skin lesion segmentation with minimum resources: a lightweight, efficient generative adversarial network (GAN) model called SLSNet, which combines 1-D kernel factorized networks, position and channel attention, and multiscale aggregation mechanisms with a GAN model. The 1-D kernel factorized network reduces the computational cost of 2D filtering. The position and channel attention modules enhance the discriminative ability between the lesion and non-lesion feature representations in spatial and channel dimensions, respectively. A multiscale block is also used to aggregate the coarse-to-fine features of input skin images and reduce the effect of the artifacts. SLSNet is evaluated on two publicly available datasets: ISBI 2017 and the ISIC 2018. Although SLSNet has only 2.35 million parameters, the experimental results demonstrate that it achieves segmentation results on a par with the state-of-the-art skin lesion segmentation methods with an accuracy of 97.61%, and Dice and Jaccard similarity coefficients of 90.63% and 81.98%, respectively. SLSNet can run at more than 110 frames per second (FPS) in a single GTX1080Ti GPU, which is faster than well-known deep learning-based image segmentation models, such as FCN. Therefore, SLSNet can be used for practical dermoscopic applications.
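The saving behind the 1-D kernel factorized network is that a separable (rank-1) k x k kernel can be applied as a k x 1 pass followed by a 1 x k pass, costing on the order of 2k instead of k^2 multiplies per output pixel. A small numpy check of the equivalence (an illustration of the general trick, not SLSNet's layers):

```python
import numpy as np

def corr2d(img, k):
    """Valid-mode 2-D cross-correlation, written out explicitly."""
    kh, kw = k.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (img[i:i + kh, j:j + kw] * k).sum()
    return out

rng = np.random.default_rng(0)
v = rng.standard_normal(3)
img = rng.standard_normal((8, 8))
full = corr2d(img, np.outer(v, v))                      # k*k multiplies/pixel
factored = corr2d(corr2d(img, v[:, None]), v[None, :])  # 2*k multiplies/pixel
```

The two results agree exactly because the outer-product kernel separates; in a network, the factorized layers are trained directly, trading a little expressiveness for the reduced cost that makes the 2.35M-parameter model feasible.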

Journal ArticleDOI
Jianhua Guo1, Jingyu Yang1, Huanjing Yue1, Hai Tan, Chunping Hou1, Kun Li1 
TL;DR: This work proposes a novel haze synthesis method to generate realistic hazy multispectral images for RS image dehazing by modeling the wavelength-dependent and spatially varying characteristics of haze in RS images.
Abstract: Multispectral remote sensing (RS) images are often contaminated by the haze that degrades the quality of RS data and reduces the accuracy of interpretation and classification. Recently, the emerging deep convolutional neural networks (CNNs) provide us new approaches for RS image dehazing. Unfortunately, the power of CNNs is limited by the lack of sufficient hazy-clean pairs of RS imagery, which makes supervised learning impractical. To meet the data hunger of supervised CNNs, we propose a novel haze synthesis method to generate realistic hazy multispectral images by modeling the wavelength-dependent and spatial-varying characteristics of haze in RS images. The proposed haze synthesis method not only alleviates the lack of realistic training pairs in multispectral RS image dehazing but also provides a benchmark data set for quantitative evaluation. Furthermore, we propose an end-to-end RSDehazeNet for haze removal. We utilize both local and global residual learning strategies in RSDehazeNet for fast convergence with superior performance. Channel attention modules are incorporated to exploit strong channel correlation in multispectral RS images. Experimental results show that the proposed network outperforms the state-of-the-art methods for synthetic data and real Landsat-8 OLI multispectral RS images.
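The wavelength-dependent, spatially varying haze synthesis can be sketched with the standard scattering model I = J t + A (1 - t), where each band's transmission decays at its own rate from a shared spatial haze-thickness map. A numpy sketch (the per-band exponents and function names are illustrative assumptions, not the paper's fitted values):

```python
import numpy as np

def synthesize_haze(clean, thickness, band_gamma, airlight=1.0):
    """Add haze to an HxWxB multispectral image.
    thickness: HxW spatially varying haze map (larger = denser haze);
    band_gamma: (B,) per-band attenuation rates (shorter wavelengths
    scatter more, so they get larger values)."""
    t = np.exp(-thickness[..., None] * np.asarray(band_gamma))
    return clean * t + airlight * (1.0 - t)
```

Pairing each clean image with such synthesized hazy versions is what supplies the hazy-clean training pairs that supervised CNN dehazing otherwise lacks.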

Journal ArticleDOI
TL;DR: A semi-supervised generative adversarial network with two sub-networks for more precise segmentation results at the pixel level, which can leverage unlabeled images to enhance the segmentation performance and alleviate the data labeling task.

Journal ArticleDOI
TL;DR: A more comprehensive color measurement in spatial domain and frequency domain is designed by combining the colorfulness, contrast, and sharpness cues, inspired by the different sensibility of humans to high-frequency and low-frequency information.
Abstract: Owing to the complexity of the underwater environment and the limitations of imaging devices, the quality of underwater images varies widely, which may affect practical applications in modern military, scientific research, and other fields. Thus, achieving quality assessment to distinguish different qualities of underwater images has an important guiding role for subsequent tasks. In this paper, considering the underwater image degradation effect and the human visual perception scheme, an effective reference-free underwater image quality assessment metric is designed by combining the colorfulness, contrast, and sharpness cues. Specifically, inspired by the different sensibility of humans to high-frequency and low-frequency information, we design a more comprehensive color measurement in the spatial domain and the frequency domain. In addition, for the low contrast caused by backward scattering, we propose a dark channel prior weighted contrast measure to enhance the discrimination ability of the original contrast measurement. The sharpness measurement is used to evaluate the blur effect caused by the forward scattering of the underwater image. Finally, these three measurements are combined by weighted summation, where the weight coefficients are obtained by multiple linear regression. Moreover, we collect a large dataset for underwater image quality assessment for testing and evaluating different methods. Experiments on this dataset demonstrate superior performance both qualitatively and quantitatively.
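Of the three cues, the colorfulness measure has a well-known spatial-domain baseline: Hasler and Susstrunk's opponent-channel statistic, which combines the spread and magnitude of the rg and yb channels. A numpy sketch of that baseline (the paper's combined spatial/frequency-domain variant is more elaborate):

```python
import numpy as np

def colorfulness(img):
    """Hasler-Susstrunk colorfulness of an RGB image in [0, 1]:
    combines std and mean of the opponent channels
    rg = R - G and yb = (R + G) / 2 - B."""
    rg = img[..., 0] - img[..., 1]
    yb = 0.5 * (img[..., 0] + img[..., 1]) - img[..., 2]
    std = np.hypot(rg.std(), yb.std())
    mean = np.hypot(rg.mean(), yb.mean())
    return std + 0.3 * mean
```

A gray image scores exactly zero, while saturated, varied colors score high; the paper's final metric mixes such a cue with contrast and sharpness via regression-fitted weights.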

Journal ArticleDOI
TL;DR: Zhang et al. as discussed by the authors proposed a Deep High-Resolution Pseudo-Siamese Framework (PS-HRNet) to solve the cross-resolution person re-ID problem.
Abstract: Person re-identification (re-ID) tackles the problem of matching person images with the same identity from different cameras. In practical applications, due to differences in camera performance and in the distance between cameras and persons of interest, captured person images usually have various resolutions. This problem, named Cross-Resolution Person Re-identification, presents a great challenge for accurate person matching. In this paper, we propose a Deep High-Resolution Pseudo-Siamese Framework (PS-HRNet) to solve the above problem. Specifically, we first improve VDSR by introducing an existing channel attention (CA) mechanism and obtain a new module, i.e., VDSR-CA, to restore the resolution of low-resolution images and make full use of the different channel information of feature maps. Then we reform HRNet by designing a novel representation head, HRNet-ReID, to extract discriminating features. In addition, a pseudo-siamese framework is developed to reduce the difference in feature distributions between low-resolution images and high-resolution images. The experimental results on five cross-resolution person datasets verify the effectiveness of our proposed approach. Compared with the state-of-the-art methods, the proposed PS-HRNet improves the Rank-1 accuracy by 3.4%, 6.2%, 2.5%, 1.1% and 4.2% on the MLR-Market-1501, MLR-CUHK03, MLR-VIPeR, MLR-DukeMTMC-reID, and CAVIAR datasets, respectively, which demonstrates the superiority of our method in handling the Cross-Resolution Person Re-ID task. Our code is available at https://github.com/zhguoqing .
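The channel attention (CA) mechanism this abstract introduces into VDSR can be sketched as a squeeze-and-excitation-style gate: global average pooling squeezes each channel to a scalar, a small two-layer bottleneck produces a per-channel scale in (0, 1), and the input is recalibrated channel-wise. This is a generic illustration with plain arrays standing in for learned weights, not the paper's VDSR-CA module.

```python
import numpy as np

def channel_attention(x, w1, w2):
    """Squeeze-and-excitation-style channel attention on a (C, H, W) map.

    w1 has shape (C // r, C) and w2 has shape (C, C // r) for reduction
    ratio r; in practice both are learned, here they are plain arrays.
    """
    c = x.shape[0]
    squeeze = x.reshape(c, -1).mean(axis=1)          # global average pool -> (C,)
    hidden = np.maximum(w1 @ squeeze, 0.0)           # bottleneck + ReLU
    scale = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))     # sigmoid gate per channel
    return x * scale[:, None, None]                  # recalibrate each channel
```

Because the gate lies in (0, 1), the module can only attenuate channels relative to one another; the network learns which channels to emphasize by keeping their gates near 1.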

Journal ArticleDOI
TL;DR: This paper explores how to find the most effective EEG features and channels for emotion recognition so as to collect as little data as possible, and demonstrates that the optimal channel set has extremely high similarity on the self‐collected data set and the public data set.
Abstract: Emotion recognition has become an important component of human–computer interaction systems. Research on emotion recognition based on electroencephalogram (EEG) signals is mostly conducted by analyzing the EEG signals of all channels. Although some progress has been achieved, there are still several challenges, such as high dimensionality, correlation between different features, and feature redundancy in the realistic experimental process. These challenges have hindered the application of emotion recognition to portable human–computer interaction systems (or devices). This paper explores how to find the most effective EEG features and channels for emotion recognition so as to collect as little data as possible. First, discriminative features of EEG signals from different dimensionalities are extracted for emotion classification, including the first difference, multiscale permutation entropy, Higuchi fractal dimension, and discrete wavelet transform. Second, the relief algorithm and a floating generalized sequential backward selection algorithm are integrated as a novel channel selection method. Then, a support vector machine is employed to classify the emotions to verify the performance of the channel selection method and the extracted features. At last, experimental results demonstrate that the optimal channel set, mostly located in the frontal region, has extremely high similarity on the self‐collected data set and the public data set, and an average classification accuracy of up to 91.31% is achieved with the selected 10‐channel EEG signals. The findings are valuable for practical EEG‐based emotion recognition systems.
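Two of the ingredients above are simple enough to sketch: the first-difference feature, and the backward channel-selection loop. The sketch below is illustrative only: it uses a plain greedy sequential backward selection rather than the floating variant the paper integrates with relief, and `score_fn` is a hypothetical callback standing in for cross-validated SVM accuracy.

```python
import numpy as np

def first_difference(sig):
    # mean absolute first difference of a 1-D EEG signal (one channel)
    return float(np.mean(np.abs(np.diff(sig))))

def backward_select(channels, score_fn, keep=10):
    """Greedy sequential backward selection (simplified; the paper uses a
    floating generalized variant). At each step, drop the channel whose
    removal hurts the score the least, until `keep` channels remain.

    score_fn: maps a channel subset to a quality score (e.g. CV accuracy).
    """
    selected = list(channels)
    while len(selected) > keep:
        # channel whose removal leaves the best-scoring subset
        worst = max(selected,
                    key=lambda ch: score_fn([c for c in selected if c != ch]))
        selected.remove(worst)
    return selected
```

With a real scorer this costs one model evaluation per candidate removal per step, which is why the paper pre-filters features with the relief algorithm before the wrapper search.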

Journal ArticleDOI
TL;DR: This article proposes a novel hybrid 2-D–3-D deep residual attentional network (HDRAN) with structure tensor constraints, which can take full advantage of the spatial–spectral context information in the reconstruction progress and achieves state-of-the-art performance in terms of mean relative absolute error (MRAE) and root mean square error (RMSE) on both the “clean” and “real world” tracks in the NTIRE 2018 Spectral Reconstruction Challenge.
Abstract: RGB image spectral super-resolution (SSR) is a challenging task due to its serious ill-posedness, which aims at recovering a hyperspectral image (HSI) from a corresponding RGB image. In this article, we propose a novel hybrid 2-D–3-D deep residual attentional network (HDRAN) with structure tensor constraints, which can take full advantage of the spatial–spectral context information in the reconstruction progress. Previous works improve the SSR performance only by stacking more layers to capture local spatial correlation, neglecting the differences and interdependences among features, especially band features; different from them, our novel method focuses on context information utilization. First, the proposed HDRAN consists of a 2D-RAN followed by a 3D-RAN, where the 2D-RAN mainly focuses on extracting abundant spatial features, whereas the 3D-RAN mainly simulates the interband correlations. Then, we introduce 2-D channel attention and 3-D band attention mechanisms into the 2D-RAN and 3D-RAN, respectively, to adaptively recalibrate channelwise and bandwise feature responses for enhancing context features. Besides, since the structure tensor represents structure and spatial information, we apply a structure tensor constraint to further reconstruct more accurate high-frequency details during the training process. Experimental results demonstrate that our proposed method achieves state-of-the-art performance in terms of mean relative absolute error (MRAE) and root mean square error (RMSE) on both the “clean” and “real world” tracks in the NTIRE 2018 Spectral Reconstruction Challenge. As for the competitive ranking metric MRAE, our method achieves a 16.06% and 2.90% relative reduction on the two tracks, respectively, over the first place. Furthermore, we investigate HDRAN on two other HSI benchmarks, the CAVE and Harvard data sets, also demonstrating better results than state-of-the-art methods.
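The structure tensor the constraint is built on is just the per-pixel outer product of the image gradient. A minimal sketch (smoothing of the tensor entries, which is usually applied, is omitted for brevity; this is not the paper's loss implementation):

```python
import numpy as np

def structure_tensor(gray):
    """2-D structure tensor of a grayscale image: one 2x2 matrix per pixel,
    the outer product of the local gradient [gx, gy]. Its eigenstructure
    encodes edge strength and orientation, i.e. the high-frequency detail
    a structure tensor constraint asks the reconstruction to preserve."""
    gy, gx = np.gradient(gray)          # gradients along rows (y) and cols (x)
    return np.stack(
        [gx * gx, gx * gy,
         gx * gy, gy * gy], axis=-1
    ).reshape(*gray.shape, 2, 2)
```

On a flat region the tensor vanishes; on a horizontal intensity ramp only the gx*gx entry is nonzero, reflecting a purely vertical edge orientation.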

Proceedings ArticleDOI
30 May 2021
TL;DR: Li et al. as discussed by the authors proposed a lightweight adaptive feature fusion network (LAFFNet), which subsumes multiple branches with different kernel sizes to generate multi-scale feature maps adaptively, and channel attention is used to merge these feature maps.
Abstract: Underwater image enhancement is an important low-level computer vision task for autonomous underwater vehicles and remotely operated vehicles to explore and understand underwater environments. Recently, deep convolutional neural networks (CNNs) have been successfully applied to many computer vision problems, including underwater image enhancement. There are many deep-learning-based methods with impressive performance for underwater image enhancement, but their memory and model parameter costs are hindrances in practical application. To address this issue, we propose a lightweight adaptive feature fusion network (LAFFNet). The model is an encoder-decoder network with multiple adaptive feature fusion (AFF) modules. AFF subsumes multiple branches with different kernel sizes to generate multi-scale feature maps. Furthermore, channel attention is used to merge these feature maps adaptively. Our method reduces the number of parameters from 2.5M to 0.15M (around 94% reduction) but outperforms state-of-the-art algorithms, as shown by extensive experiments. Furthermore, we demonstrate that our LAFFNet effectively improves high-level vision tasks such as salient object detection and single image depth estimation.
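The multi-branch fusion idea can be sketched numerically: branches with different receptive fields produce multi-scale maps, and a per-branch channel descriptor decides how to mix them. This is a rough stand-in, not LAFFNet's AFF module: box filters replace learned convolutions, and a softmax over branch descriptors replaces the learned channel attention.

```python
import numpy as np

def box_filter(x, k):
    # cheap stand-in for a conv branch with kernel size k:
    # per-channel k x k box blur on a (C, H, W) array
    pad = k // 2
    padded = np.pad(x, ((0, 0), (pad, pad), (pad, pad)), mode="edge")
    out = np.zeros_like(x)
    for i in range(k):
        for j in range(k):
            out += padded[:, i:i + x.shape[1], j:j + x.shape[2]]
    return out / (k * k)

def adaptive_feature_fusion(x, kernel_sizes=(3, 5, 7)):
    """Multi-branch fusion in the spirit of an AFF module: each branch sees
    a different receptive field; a softmax over per-branch channel
    descriptors plays the role of the learned attention that merges them."""
    branches = [box_filter(x, k) for k in kernel_sizes]
    desc = np.stack([b.reshape(b.shape[0], -1).mean(axis=1) for b in branches])
    attn = np.exp(desc) / np.exp(desc).sum(axis=0, keepdims=True)  # (B, C)
    return sum(a[:, None, None] * b for a, b in zip(attn, branches))
```

Since the branch weights sum to one per channel, a constant input passes through unchanged; on real features the weighting selects the scale most informative for each channel.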

Posted Content
TL;DR: Zhang et al. as mentioned in this paper proposed a channel enhancement feature pyramid network (CE-FPN) with three simple yet effective modules to alleviate the loss of semantic information due to channel reduction.
Abstract: The feature pyramid network (FPN) has been an effective framework for extracting multi-scale features in object detection. However, current FPN-based methods mostly suffer from the intrinsic flaw of channel reduction, which brings about the loss of semantic information. Moreover, the miscellaneous fused feature maps may cause serious aliasing effects. In this paper, we present a novel channel enhancement feature pyramid network (CE-FPN) with three simple yet effective modules to alleviate these problems. Specifically, inspired by sub-pixel convolution, we propose a sub-pixel skip fusion method to perform both channel enhancement and upsampling. Replacing the original 1x1 convolution and linear upsampling, it mitigates the information loss due to channel reduction. Then we propose a sub-pixel context enhancement module for extracting more feature representations, which is superior to other context methods owing to its utilization of rich channel information via sub-pixel convolution. Furthermore, a channel attention guided module is introduced to optimize the final integrated features on each level, which alleviates the aliasing effect with only a small computational burden. Our experiments show that CE-FPN achieves competitive performance compared to state-of-the-art FPN-based detectors on the MS COCO benchmark.
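The sub-pixel operation underlying both modules is the pixel-shuffle rearrangement: it trades channel depth for spatial resolution without discarding information the way a 1x1 reduction followed by interpolation does. A minimal sketch of the rearrangement itself (following the usual PixelShuffle layout; this is not CE-FPN's fusion module):

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) array into (C, H*r, W*r).

    Each output r x r block is filled from r*r input channels, so channel
    information is converted into spatial resolution losslessly -- the
    operation that sub-pixel skip fusion builds on."""
    c2, h, w = x.shape
    c = c2 // (r * r)
    return (x.reshape(c, r, r, h, w)      # split channels into (c, r1, r2)
             .transpose(0, 3, 1, 4, 2)    # interleave: (c, h, r1, w, r2)
             .reshape(c, h * r, w * r))
```

For r = 2, the four input channels at a given (h, w) become the 2x2 output block at (2h, 2w), matching the channel ordering used by common deep-learning pixel-shuffle layers.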