Showing papers in "Signal Processing-image Communication in 2022"


Journal ArticleDOI
TL;DR: In this paper, an image generation scheme is designed based on quantum generative adversarial networks. Two structures of quantum GANs are simulated on the Bars and Stripes dataset, and the results corroborate that the quantum generator with reduced parameters has no visible performance loss.
Abstract: It has been reported that quantum generative adversarial networks have a potential exponential advantage over classical generative adversarial networks. However, it is difficult for quantum machine learning to find real applications in the near future due to the limitations of quantum devices. The structure of the quantum generator is optimized to reduce the required parameters and to make better use of quantum devices. Based on this, an image generation scheme is designed around quantum generative adversarial networks. Two structures of quantum generative adversarial networks are simulated on the Bars and Stripes dataset, and the results corroborate that the quantum generator with reduced parameters has no visible performance loss. The original complex multimodal distribution of an image can be converted into a simple unimodal distribution by the remapping method. The MNIST and Fashion-MNIST images are successfully generated by the optimized quantum generator with the remapping method, which verifies the feasibility of the proposed image generation scheme. • The structure of the quantum generator is optimized to reduce the required parameters and make use of quantum devices. • An image generation scheme is designed based on quantum generative adversarial networks. • The multimodal distribution of an image is converted into a simple unimodal distribution by the remapping method. • The MNIST and Fashion-MNIST images are generated by the optimized quantum generator.

27 citations


Journal ArticleDOI
TL;DR: Zhang et al. as discussed by the authors applied the Weber-Fechner law to the grayscale mapping in logarithmic space and proposed an adaptive and simple color image enhancement method.
Abstract: In environments with poor illumination, such as indoor, night, and overcast conditions, image information can be seriously lost, which affects the visual effect and degrades the performance of machine systems. However, existing methods such as retinex-based, dehazing-model-based, and machine-learning-based methods usually have high computational complexity or are prone to color distortion, noise amplification, and halo artifacts. To balance the enhancement effect and processing speed, this paper applies the Weber–Fechner law to the grayscale mapping in logarithmic space and proposes an adaptive and simple color image enhancement method based on an improved logarithmic transformation. In the framework, the brightness component is extracted from the low-light scene using Gaussian filtering after color space conversion. The image is logarithmically transformed by adaptively adjusting the parameters of the illumination distribution to improve its brightness, and the color saturation is then compensated. The proposed algorithm adaptively reduces the impact of non-uniform illumination on the image, and the enhanced image is clear and natural. Our experimental results demonstrate improved performance over existing image enhancement approaches. • It is a simple and effective strategy to map the gray levels of the image pixels to logarithmic space based on the Weber–Fechner law. • In the framework, the image is logarithmically transformed by adaptively adjusting the parameters of the illumination distribution, with a compensation mechanism for color saturation. • It uses local and global information to adaptively reduce the influence of non-uniform illumination on the image, and the enhanced image is clear and natural. • This method does not need large training datasets and can produce satisfying results with low computational complexity.
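
A minimal Python sketch of the adaptive logarithmic mapping idea described above, assuming OpenCV and NumPy; the illumination estimate and the adaptive gain k are illustrative choices, not the paper's exact formulation:

```python
import cv2
import numpy as np

def log_enhance(bgr):
    """Illustrative Weber-Fechner-style enhancement: boost the brightness
    channel with an adaptive logarithmic mapping, then restore color."""
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV).astype(np.float32)
    v = hsv[..., 2] / 255.0                          # brightness component in [0, 1]
    illum = cv2.GaussianBlur(v, (0, 0), sigmaX=15)   # rough illumination estimate
    k = 1.0 + 9.0 * (1.0 - illum.mean())             # hypothetical adaptive gain
    v_enh = np.log1p(k * v) / np.log1p(k)            # logarithmic mapping back to [0, 1]
    hsv[..., 2] = np.clip(v_enh * 255.0, 0, 255)
    return cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2BGR)

enhanced = log_enhance(cv2.imread("low_light.jpg"))
cv2.imwrite("enhanced.jpg", enhanced)
```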

26 citations


Journal ArticleDOI
TL;DR: This study introduces a classification of digital watermarking schemes by their robustness towards various classes of attacks, and highlights the most promising solutions whose further development could lead to schemes with better robustness than existing ones.
Abstract: Digital watermarking is an important scientific direction located at the intersection of cybersecurity and multimedia processing. Digital watermarking is used for copyright protection of digital objects and for protection against forgery. The importance of these tasks is highlighted by the ongoing COVID-19 pandemic, which has lasted for more than a year and forced multiple industries to move online. The main digital watermarking applications involve active attackers whose goal is to destroy or damage watermarks. Thus, digital watermarking schemes must be robust to attacks. Despite the large number of existing digital watermarking schemes, a comprehensive analysis of their robustness towards various attacks does not exist. Our study aims to fill this gap. In this article, recent findings in digital watermarking are systematized. First, we review the most prominent distinctive features of watermarking schemes, attack types, and performance measures to facilitate navigation in digital image watermarking. Then, we introduce a classification of digital watermarking schemes by their robustness towards various classes of attacks. We believe this classification could be useful for researchers who aim to choose a digital watermarking scheme for an application with a known set of attacks. Finally, we highlight the most promising solutions whose further development could lead to schemes with better robustness than existing ones.

20 citations


Journal ArticleDOI
TL;DR: Recently, remarkable progress has been made in utilizing deep learning (DL) approaches for low-light image (LLI) enhancement, as mentioned in this paper; LLI enhancement is an important image processing task that aims at improving the illumination of images taken under low-light conditions.
Abstract: Low-light image (LLI) enhancement is an important image processing task that aims at improving the illumination of images taken under low-light conditions. Recently, remarkable progress has been made in utilizing deep learning (DL) approaches for LLI enhancement. This paper provides a concise and comprehensive review and comparative study of the most recent DL models used for LLI enhancement. To our knowledge, this is the first comparative study dedicated to DL-based models for LLI enhancement. We address LLI enhancement in two ways: (i) standalone, as a separate task, and (ii) end-to-end, as a pre-processing stage embedded within another high-level computer vision task, namely object detection and classification. The paper consists of six logical parts. First, we provide an overview of the background and literature in LLI enhancement. Second, we describe the test data and experimental setup of the study. Third, we present a quantitative and qualitative comparison of the visual and perceptual quality achieved by 10 of the most recent DL-based LLI enhancement models. Fourth, we present a comparative analysis of the object detection and classification performance achieved by 4 different object detection models applied to LLIs and their enhanced counterparts. Fifth, we perform a feature analysis of DL feature maps extracted from normal, low-light, and enhanced images, and perform an occlusion experiment to better understand the effect of LLI enhancement on the object detection and classification task. Finally, we provide our conclusions and highlight future steps and potential directions. • Evaluates Deep Learning (DL) models for Low-light Image (LLI) enhancement. • Compares 10 LLI enhancement models and 4 object detection and classification models. • Provides a quantitative and qualitative comparison of visual and perceptual quality. • Evaluates the impact of LLI enhancement on object detection and classification quality. • Performs an occlusion experiment to study LLI enhancement’s effect on object detection.

18 citations


Journal ArticleDOI
TL;DR: Zhang et al. as discussed by the authors proposed a two-stage object detection framework called "Focus-and-Detect", whose first stage consists of an object detector network supervised by a Gaussian Mixture Model that generates clusters of objects constituting the focused regions.
Abstract: Despite recent advances, object detection in aerial images is still a challenging task. Specific characteristics of aerial images make the detection problem harder, such as small objects, densely packed objects, and objects of different sizes and orientations. To address the small object detection problem, we propose a two-stage object detection framework called “Focus-and-Detect”. The first stage, which consists of an object detector network supervised by a Gaussian Mixture Model, generates clusters of objects constituting the focused regions. The second stage, which is also an object detector network, predicts objects within the focal regions. An Incomplete Box Suppression (IBS) method is also proposed to overcome the truncation effect of the region search approach. Results indicate that the proposed two-stage framework achieves an AP score of 42.06 on the VisDrone validation dataset, surpassing all other state-of-the-art small object detection methods reported in the literature, to the best of the authors’ knowledge.
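
A small sketch of the first-stage idea, assuming scikit-learn: cluster coarse detection centers with a Gaussian mixture and pad each cluster into a focused region. The function name, padding, and number of regions are hypothetical, not the authors' implementation:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def focus_regions(centers, n_regions=3, pad=32):
    """Illustrative first-stage idea: cluster object centers with a GMM and
    return a padded bounding box per cluster as a 'focused region'."""
    gmm = GaussianMixture(n_components=n_regions, random_state=0).fit(centers)
    labels = gmm.predict(centers)
    boxes = []
    for k in range(n_regions):
        pts = centers[labels == k]
        if len(pts) == 0:
            continue
        x0, y0 = pts.min(axis=0) - pad
        x1, y1 = pts.max(axis=0) + pad
        boxes.append((x0, y0, x1, y1))
    return boxes

# toy example: centers of coarse detections in a 1000x1000 aerial image
centers = np.random.rand(200, 2) * 1000
print(focus_regions(centers))
```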

16 citations


Journal ArticleDOI
TL;DR: In this article, a morphologically dilated convolutional neural network (MDCNN) was proposed to extract more robust spectral-spatial features for mapping land cover through hyperspectral images.
Abstract: The use of hyperspectral images is expanding rapidly with the advancement of remote sensing technologies. The precise classification of features for mapping land cover through hyperspectral images is a major research topic and focus. Several methods have provided good results for hyperspectral image classification. Among them, the convolutional neural network (CNN) is a widely used deep neural network due to its robust feature extraction capabilities, and it can enhance hyperspectral image classification accuracy. Mathematical morphology (MM) is a robust and straightforward spatial feature descriptor that can reduce the computational workload. We propose a novel model, the morphologically dilated convolutional neural network (MDCNN), which can extract more robust spectral–spatial features. MDCNN concatenates morphological feature maps with the original hyperspectral data. The CNN structure uses both traditional and dilated convolution. Replacing traditional convolution layers with dilated convolution expands the receptive field without adding parameters and thus improves network performance without increasing network complexity. The dilation layer itself does not reduce the number of parameters, but it reduces the size of the output feature map, which leads to an overall reduction in the number of parameters. 3D convolution extracts spectral–spatial features and maintains the correlation of spectral data. 2D convolution extracts spatial features and reduces the model’s complexity, which can become excessive if only 3D convolution is used. Experimental findings show that the proposed approach provides better classification results than traditional deep learning models and other state-of-the-art models on the Indian Pines, University of Pavia, and Salinas Scene data.
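
A minimal PyTorch sketch of a hybrid 3D-then-dilated-2D block in the spirit of the description above; the kernel sizes, channel widths, and reshaping are illustrative assumptions rather than the MDCNN architecture itself:

```python
import torch
import torch.nn as nn

class Dilated3D2DBlock(nn.Module):
    """Illustrative hybrid block: 3D convolution over (bands, H, W) to capture
    spectral-spatial structure, then a dilated 2D convolution for a wider
    receptive field without extra parameters. Channel sizes are arbitrary."""
    def __init__(self, bands=30):
        super().__init__()
        self.conv3d = nn.Conv3d(1, 8, kernel_size=(7, 3, 3), padding=(0, 1, 1))
        self.conv2d = nn.Conv2d(8 * (bands - 6), 64, kernel_size=3,
                                padding=2, dilation=2)   # dilated spatial conv
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):                  # x: (B, 1, bands, H, W)
        x = self.relu(self.conv3d(x))      # (B, 8, bands-6, H, W)
        b, c, d, h, w = x.shape
        x = x.reshape(b, c * d, h, w)      # fold spectral dim into channels
        return self.relu(self.conv2d(x))   # (B, 64, H, W)

patch = torch.randn(2, 1, 30, 11, 11)      # batch of 11x11 patches, 30 bands
print(Dilated3D2DBlock(bands=30)(patch).shape)   # torch.Size([2, 64, 11, 11])
```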

13 citations


Journal ArticleDOI
TL;DR: Wang et al. as discussed by the authors proposed a fast and more secure algorithm for protecting batch images by employing chaotic intrinsic properties, reversible steganography, and parallel computing, which can achieve superior results with high efficiency.
Abstract: In real-world applications, the ability to encrypt a large quantity of images is highly desirable. A branch of works resorts to chaotic encryption to protect images but mostly aims at encrypting one image at a time. Moreover, encryption algorithms that can process batches of images are strongly needed for acceleration in massive-data scenarios. To this end, we propose a fast and more secure algorithm for protecting batch images by employing intrinsic chaotic properties, reversible steganography, and parallel computing. First, our algorithm assigns batch images evenly to each thread, in which Cipher Block Chaining (CBC) mode is applied among neighboring images for encryption. Then, the identifier number of each thread and its CBC indexes are encrypted and embedded into the corresponding cipher-images using reversible steganography, which enhances the security of existing chaotic encryption. In addition, keystreams are associated with the plaintext in order to resist chosen-plaintext attacks. The experimental results and security analyses show that our algorithm achieves superior results with high efficiency.
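
A toy NumPy sketch of the CBC-style chaining across a batch described above; the logistic map, the fixed key x0, and the omission of plaintext-dependent keystreams and multi-threading are simplifications, not the authors' scheme:

```python
import numpy as np

def logistic_keystream(length, x0=0.61, r=3.99):
    """Hypothetical keystream from the logistic map (not the paper's exact map)."""
    ks, x = np.empty(length, dtype=np.uint8), x0
    for i in range(length):
        x = r * x * (1.0 - x)
        ks[i] = int(x * 256) & 0xFF
    return ks

def encrypt_batch(images, x0=0.61):
    """CBC-style chaining across a batch: each image is XORed with a chaotic
    keystream and with the previous cipher-image before encryption."""
    prev = np.zeros_like(images[0])
    out = []
    for img in images:
        ks = logistic_keystream(img.size, x0).reshape(img.shape)
        cipher = img ^ prev ^ ks          # chain with previous cipher-image
        out.append(cipher)
        prev = cipher
    return out

batch = [np.random.randint(0, 256, (64, 64), dtype=np.uint8) for _ in range(4)]
ciphers = encrypt_batch(batch)
```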

13 citations


Journal ArticleDOI
TL;DR: Wang et al. as discussed by the authors proposed an underwater image enhancement method via integrated RGB and LAB color models (RLCM), wherein the local contrast of the L channel is enhanced by a histogram with a local enhancement and exposure cut-off strategy, whereas the difference between the A and B channels is traded off by a gain equalization strategy.
Abstract: Images taken underwater suffer from color shift and poor visibility because light is absorbed and scattered when it travels through water. To handle these issues, we propose an underwater image enhancement method via integrated RGB and LAB color models (RLCM). In the RGB color model, we first fully consider the leading causes of underwater color shift, and then the poor color channels are corrected by dedicated fractions, which are designed by calculating the differences between the well-preserved and poor color channels. In the LAB color model, the local contrast of the L channel is enhanced by a histogram with a local enhancement and exposure cut-off strategy, whereas the difference between the A and B channels is traded off by a gain equalization strategy. Besides, a normalized guided filtering strategy is incorporated into the histogram enhancement process to mitigate the effects of noise. Ultimately, the image is converted back from the LAB color model to the RGB color model, and a detail sharpening strategy is applied in each channel to obtain a high-quality underwater image. Experiments on various real-world underwater images demonstrate that our method outputs better results with natural color and high visibility. • A dedicated fractions-based method to tackle the color shifts of underwater images. • A local enhancement and exposure cut-off-based histogram to improve local contrast. • A gain equalization method to trade off the differences between the A and B channels. • A detail sharpening method to improve the details and edges of the output image.
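
A compact sketch of the two-color-space idea, assuming OpenCV: gain-compensate the weaker RGB channels toward the strongest one, then boost local contrast on the L channel in LAB. CLAHE is used here as a stand-in for the paper's histogram strategy:

```python
import cv2
import numpy as np

def enhance_underwater(bgr):
    """Illustrative pipeline: compensate the weaker color channels toward the
    strongest one in RGB, then boost local contrast on the L channel in LAB."""
    img = bgr.astype(np.float32)
    means = img.reshape(-1, 3).mean(axis=0)             # per-channel means (B, G, R)
    ref = means.max()
    for c in range(3):                                   # simple gain compensation
        img[..., c] = np.clip(img[..., c] * (ref / max(means[c], 1e-6)), 0, 255)
    lab = cv2.cvtColor(img.astype(np.uint8), cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    l = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8)).apply(l)
    return cv2.cvtColor(cv2.merge([l, a, b]), cv2.COLOR_LAB2BGR)

out = enhance_underwater(cv2.imread("underwater.jpg"))
cv2.imwrite("underwater_enhanced.jpg", out)
```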

13 citations


Journal ArticleDOI
TL;DR: In this article, the authors proposed a new LLI enhancement model called LLHFNet (Low-light Homomorphic Filtering Network), which performs image-to-frequency filter learning and is designed for seamless integration into classification models.
Abstract: Low-light image (LLI) enhancement techniques have recently demonstrated remarkable progress, especially with the use of deep learning approaches. However, most existing techniques are developed as standalone solutions and do not take into account the impact of LLI enhancement on high-level computer vision tasks like object classification. In this paper, we propose a new LLI enhancement model titled LLHFNet (Low-light Homomorphic Filtering Network), which performs image-to-frequency filter learning and is designed for seamless integration into classification models. Through this integration, the classification model is embedded with an internal enhancement capability and is jointly trained to optimize both image enhancement and classification performance. We have conducted a large battery of experiments using the SICE, Pascal VOC, and ExDark datasets to quantitatively and qualitatively evaluate our approach’s enhancement quality and classification performance. When evaluated as a standalone enhancement model, our solution consistently ranks among the best existing image enhancement techniques. When embedded with a classification model, our solution achieves an average 5.5% improvement in classification accuracy compared with the traditional pipeline of separate enhancement followed by classification. The results show robust classification quality on both LLIs and normal-light images (NLIs) and highlight a clear improvement over the literature.
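
For context, a short NumPy sketch of classical homomorphic filtering, the frequency-domain operation the network's name refers to; this is the textbook operation, not the learned image-to-frequency filter of LLHFNet, and the gain and cutoff values are arbitrary:

```python
import numpy as np

def homomorphic_filter(gray, gamma_low=0.5, gamma_high=1.8, cutoff=30.0):
    """Classical homomorphic filtering: take the log of the image, attenuate
    low frequencies (illumination) and amplify high frequencies (reflectance)
    in the Fourier domain, then exponentiate back."""
    h, w = gray.shape
    log_img = np.log1p(gray.astype(np.float64))
    spec = np.fft.fftshift(np.fft.fft2(log_img))
    yy, xx = np.mgrid[-h // 2:h - h // 2, -w // 2:w - w // 2]
    dist2 = xx ** 2 + yy ** 2
    filt = (gamma_high - gamma_low) * (1 - np.exp(-dist2 / (2 * cutoff ** 2))) + gamma_low
    out = np.real(np.fft.ifft2(np.fft.ifftshift(spec * filt)))
    out = np.expm1(out)
    return np.clip(255 * (out - out.min()) / (np.ptp(out) + 1e-6), 0, 255).astype(np.uint8)

img = (np.random.rand(256, 256) * 255).astype(np.uint8)   # stand-in for a low-light image
enhanced = homomorphic_filter(img)
```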

Journal ArticleDOI
TL;DR: Zhang et al. as discussed by the authors proposed a multi-focus color image fusion algorithm based on low-vision image reconstruction and focus feature extraction, which improves the recognition accuracy of focused and defocused areas.
Abstract: Multi-focus image fusion is the process of generating a fused image by merging multiple images with different degrees of focus in the same scene. In multi-focus image fusion, the accuracy of the detected focus area is critical for improving the quality of the fused image. Combining the structural gradient, we propose a multi-focus color image fusion algorithm based on low-vision image reconstruction and focus feature extraction. First, the source images are input into a deep residual network (ResNet) to reconstruct low-vision images with a super-resolution method. Next, an end-to-end restoration model is used to improve the image details and preserve the edges of the image through a rolling guidance filter. Moreover, a difference image is obtained from the reconstructed image and the source image. Then, the fusion decision map is generated by a focus area detection method based on the structural gradient. Finally, the source images and the fusion decision map are used for weighted fusion to generate the fused image. Experimental results show that our algorithm is quite accurate in detecting the edge of the focus area. Compared with other algorithms, the proposed algorithm improves the recognition accuracy of focused and defocused areas. It retains the detailed texture features and edge structure of the source images well.
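
A minimal OpenCV sketch of decision-map-based multi-focus fusion; the local gradient energy used here is a simple stand-in for the paper's structural-gradient focus measure, and the smoothing of the decision map is an illustrative choice:

```python
import cv2
import numpy as np

def fuse_multifocus(img_a, img_b, ksize=9):
    """Illustrative fusion: pick, per pixel, the source with the higher local
    gradient energy, then fuse with the resulting decision map."""
    def focus_measure(img):
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY).astype(np.float32)
        gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
        gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
        return cv2.boxFilter(gx * gx + gy * gy, -1, (ksize, ksize))

    decision = (focus_measure(img_a) > focus_measure(img_b)).astype(np.float32)
    decision = cv2.GaussianBlur(decision, (0, 0), 2)[..., None]   # soften seams
    fused = decision * img_a.astype(np.float32) + (1 - decision) * img_b.astype(np.float32)
    return np.clip(fused, 0, 255).astype(np.uint8)

a = cv2.imread("focus_near.jpg")
b = cv2.imread("focus_far.jpg")
cv2.imwrite("fused.jpg", fuse_multifocus(a, b))
```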

Journal ArticleDOI
TL;DR: Wang et al. as discussed by the authors used the feature transfer module and feature fusion module to consider both the shallow content features and the deep semantic features of the image, thereby improving the accuracy of identifying computer-generated images.
Abstract: The rapid development of digital image processing technology and the popularization of image editors have made computer-generated images more realistic, making it more challenging to distinguish computer-generated images from natural images with the naked eye. At the same time, the malicious potential of highly realistic computer-generated images has made detecting the authenticity of digital images a significant research area. In this work, we propose a computer-generated image detection algorithm based on transfer learning and the Convolutional Block Attention Module. More specifically, our method uses a feature transfer module and a feature fusion module to consider both the shallow content features and the deep semantic features of the image, thereby improving the accuracy of identifying computer-generated images. We validate our method through extensive experiments on various datasets, and the experimental results show that our model outperforms state-of-the-art methods and achieves an accuracy of 0.963. We also show that the proposed model has strong robustness and high generalization ability.
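
A compact PyTorch sketch of the Convolutional Block Attention Module (CBAM) that the detector builds on, in its standard formulation; the channel width and reduction ratio are placeholders, not the authors' configuration:

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Convolutional Block Attention Module: channel attention from pooled
    descriptors through a shared MLP, then spatial attention from a 7x7 conv
    over channel-wise average and max maps."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels))
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))                 # (B, C)
        mx = self.mlp(x.amax(dim=(2, 3)))                  # (B, C)
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)   # channel attention
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))          # spatial attention

feat = torch.randn(2, 64, 32, 32)
print(CBAM(64)(feat).shape)      # torch.Size([2, 64, 32, 32])
```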

Journal ArticleDOI
TL;DR: Wang et al. as mentioned in this paper proposed a wavelet-integrated, identity-preserving, adversarial (WIPA) approach, which uses wavelet prediction blocks attached to a baseline CNN to predict the missing wavelet details of facial images.
Abstract: Super-resolution of face images, known as Face Hallucination (FH), has been extensively studied in recent years. Modern FH methods use deep Convolutional Neural Networks (CNNs) with a pixel-wise MSE loss function to infer high-resolution facial images. The MSE-oriented approaches generate over-smooth results, particularly when dealing with very low-resolution images. Recently, Generative Adversarial Networks (GANs) have successfully been exploited to synthesize perceptually more pleasant images. However, the GAN-based models do not guarantee identity preservation during face super-resolution. To address these challenges, we propose a novel Wavelet-integrated, Identity Preserving, Adversarial (WIPA) approach. Specifically, we present Wavelet Prediction blocks attached to a baseline CNN to predict the missing wavelet details of facial images. The extracted wavelet coefficients are concatenated with the original feature maps at different scales to recover fine details. Unlike other wavelet-based FH methods, this algorithm exploits the wavelet-enriched feature maps as complementary information to facilitate the hallucination task. We introduce a wavelet prediction loss to push the network to generate wavelet coefficients. In addition to the wavelet-domain cost function, a combination of perceptual, adversarial, and identity loss functions is utilized to achieve low-distortion and perceptually high-quality images while maintaining identity. Extensive experiments prove the superiority of the proposed approach over state-of-the-art methods, achieving a PSNR of 25.16 dB on the CelebA dataset and a verification rate of 86.1% on the LFW dataset, both at an 8× magnification factor.
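
A small sketch of a wavelet-domain detail comparison in the spirit of the wavelet prediction loss, assuming PyWavelets (pywt); the Haar wavelet, the single decomposition level, and the plain MSE over detail sub-bands are illustrative assumptions:

```python
import numpy as np
import pywt

def wavelet_detail_loss(pred, target, wavelet="haar"):
    """Hypothetical wavelet-domain loss: compare the detail sub-bands
    (horizontal, vertical, diagonal) of predicted and ground-truth images."""
    _, pred_details = pywt.dwt2(pred, wavelet)
    _, tgt_details = pywt.dwt2(target, wavelet)
    return float(np.mean([np.mean((p - t) ** 2)
                          for p, t in zip(pred_details, tgt_details)]))

hr = np.random.rand(128, 128)                 # stand-in for a ground-truth face
sr = hr + 0.05 * np.random.randn(128, 128)    # stand-in for a hallucinated face
print(wavelet_detail_loss(sr, hr))
```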

Journal ArticleDOI
TL;DR: In this paper, the color constancy mechanism in photoreceptors and horizontal cells (HCs) was used to correct color distortion and to solve the problems of blur and noise.
Abstract: Underwater images are usually characterized by color distortion, blur, and severe noise, because light is severely scattered and absorbed when traveling in water. In this paper, we propose a novel method motivated by the astonishing capability of biological vision to address the low visibility of real-world underwater images. First, we imitate the color constancy mechanism in photoreceptors and horizontal cells (HCs) to correct the color distortion. In particular, HC modulation provides a global color correction with gain control, in which light wavelength-dependent absorption is taken into account. Then, to solve the problems of blur and noise, we introduce a straightforward and effective two-pathway dehazing method. The core idea is to decompose the color-corrected image into a structure pathway and a texture pathway, corresponding to the Magnocellular (M-) and Parvocellular (P-) pathways in the early visual system. In the structure pathway, we design an innovative biological normalization model to adjust the dynamic range of luminance by integrating the bright and dark regions. With this approach, the proposed method leads to significant improvement in the contrast degradation of underwater images. Additionally, detail preservation and noise suppression are applied to the textural information. Finally, we merge the outputs of the structure and texture pathways to reconstruct the enhanced underwater image. Both qualitative and quantitative evaluations show that the proposed biologically-inspired method achieves better visual quality compared with several related methods.

Journal ArticleDOI
TL;DR: Wang et al. as discussed by the authors proposed a reversible data hiding in encrypted images (RDHEI) method without any additional information transmission between the image owner and the data hider, which improves the performance in terms of embedding and reversibility compared to the baseline methods.
Abstract: Reversible data hiding in encrypted images (RDHEI) is an essential branch of image reversible data hiding. Over the past decade, many significant achievements have been made. However, most RDHEI research requires the cloud service user (CSU) to adopt either an encryption method specified by the cloud service provider (CSP) or prior knowledge about the original image to indicate the preprocessing operation. This characteristic may lead to security issues such as information leakage that limit the use of this type of technology. To solve this problem, we propose a simple but effective RDHEI method that separates the CSU and CSP. The owner CSU exclusively sends the standard stream-cipher image to the CSP, while the CSP can embed message data into the encrypted image through a simple MSB replacement without knowing the encryption method or any prior knowledge about the original image. The embedded message data can be extracted, and the image can be approximately recovered by another authentic CSU. Since the proposed fundamental method is not error-free in most circumstances, we provide an alternative option to achieve reversibility by transmitting a perfect recovery key from the owner CSU to any authentic CSU. The experimental results demonstrate the efficacy of the proposed method. • This paper proposes an RDHEI method without any additional information transmission between the image owner and the data hider. • The security level is significantly improved compared to methods with preprocessing. • The proposed method improves the performance in terms of embedding and reversibility compared to the baseline methods. • The embedding performance is not influenced by the image contents, and the algorithmic complexity is decreased.
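
A tiny NumPy sketch of the MSB-replacement embedding step mentioned above: overwrite the most significant bit plane of an (already encrypted) image with message bits and read it back. The embedding position and capacity handling are simplified assumptions:

```python
import numpy as np

def embed_msb(cipher_img, bits):
    """Illustrative MSB replacement: overwrite the most significant bit of the
    first len(bits) pixels of an encrypted image with message bits."""
    flat = cipher_img.flatten()
    flat[:len(bits)] = (flat[:len(bits)] & 0x7F) | (np.asarray(bits, np.uint8) << 7)
    return flat.reshape(cipher_img.shape)

def extract_msb(marked_img, n_bits):
    """Recover the embedded bits by reading the MSB plane back."""
    return (marked_img.flatten()[:n_bits] >> 7) & 1

cipher = np.random.randint(0, 256, (8, 8), dtype=np.uint8)   # stand-in cipher-image
message = np.array([1, 0, 1, 1, 0, 1, 0, 0], dtype=np.uint8)
marked = embed_msb(cipher, message)
assert np.array_equal(extract_msb(marked, len(message)), message)
```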

Journal ArticleDOI
TL;DR: Zhang et al. as mentioned in this paper proposed a parallel multiscale context-based edge-preserving optical flow estimation method with occlusion detection, named PMC-PWC.
Abstract: Although convolutional neural network (CNN)-based optical flow approaches have exhibited good computational accuracy and efficiency in recent years, the issue of edge blurring caused by motion occlusions remains. In this paper, we propose a parallel multiscale context-based edge-preserving optical flow estimation method with occlusion detection, named PMC-PWC. First, we exploit a parallel multiscale context (PMC) network for occlusion detection, in which the proposed PMC model aggregates multiscale context information to improve the performance of occlusion detection near motion boundaries. Second, we combine the PMC model with a context network to form an occlusion estimation module and incorporate it into a pyramid, warping, and cost volume model to construct an edge-preserving optical flow computation network. Third, we design a novel loss function including an endpoint error (EPE)-based loss, a binary cross-entropy loss, and an edge loss to supervise the proposed PMC-PWC network to produce optical flow and occlusion simultaneously. Finally, we run the proposed PMC-PWC method on the MPI-Sintel and KITTI datasets to conduct a comprehensive comparison with several state-of-the-art approaches. The experimental results indicate that the proposed PMC-PWC method performs well in terms of both accuracy and robustness, with significant benefits in edge preservation and occlusion handling. • We construct a parallel multiscale context network for occlusion detection, which extracts multiscale context information to refine the occlusion boundaries. • We combine the PMC network with a context network to establish an occlusion detection module and incorporate it into a pyramid, warping, and cost volume network to construct an edge-preserving optical flow model. • We exploit a novel loss function by integrating an edge loss with an EPE-based loss and a binary cross-entropy loss. The proposed loss function supervises the network to estimate the flow field and occlusions simultaneously.
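
A short PyTorch sketch of combining the three supervision terms named above (EPE for flow, binary cross-entropy for occlusion, and an edge term); the gradient-difference edge term and the loss weights here are illustrative stand-ins for the paper's formulation:

```python
import torch
import torch.nn.functional as F

def pmc_style_loss(flow_pred, flow_gt, occ_logits, occ_gt, w_occ=0.5, w_edge=0.1):
    """Illustrative combination: endpoint error (EPE) for flow, binary
    cross-entropy for occlusion, and a gradient-difference edge term."""
    epe = torch.norm(flow_pred - flow_gt, p=2, dim=1).mean()
    bce = F.binary_cross_entropy_with_logits(occ_logits, occ_gt)
    def grad(t):   # finite differences along H and W
        return t[..., 1:, :] - t[..., :-1, :], t[..., :, 1:] - t[..., :, :-1]
    (gy_p, gx_p), (gy_g, gx_g) = grad(flow_pred), grad(flow_gt)
    edge = (gy_p - gy_g).abs().mean() + (gx_p - gx_g).abs().mean()
    return epe + w_occ * bce + w_edge * edge

flow_pred = torch.randn(2, 2, 64, 64, requires_grad=True)
flow_gt = torch.randn(2, 2, 64, 64)
occ_logits = torch.randn(2, 1, 64, 64, requires_grad=True)
occ_gt = torch.randint(0, 2, (2, 1, 64, 64)).float()
pmc_style_loss(flow_pred, flow_gt, occ_logits, occ_gt).backward()
```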

Journal ArticleDOI
TL;DR: CARL-D as discussed by the authors is a large-scale dataset and benchmark suite for developing 2D object detection and instance/pixel-level segmentation methods for self-driving cars in traffic scenarios of sub-continent countries like Pakistan, India, Bangladesh, and Sri Lanka.
Abstract: Vision-based object detection and scene understanding are becoming key features of environment perception and autonomous driving. In the past couple of years, numerous large-scale datasets for visual object detection and semantic understanding have been released, which has enormously benefited environment perception in self-driving cars. However, these datasets, such as KITTI and Cityscapes, focus only on well-organized, urban road traffic scenarios of European countries while ignoring the dense and unpatterned traffic conditions of sub-continent countries like Pakistan, India, Bangladesh, and Sri Lanka. Consequently, environment perception systems developed on these datasets cannot efficiently assist self-driving cars in traffic scenarios of sub-continent countries. To this end, we present CARL-D, a large-scale dataset and benchmark suite for developing 2D object detection and instance/pixel-level segmentation methods for self-driving cars. CARL-D comprises large-scale stereo vision-based driving videos captured from more than 100 cities of Pakistan, including motorways and dense, unpatterned traffic scenarios of urban, rural, and hilly areas. As a benchmark selection, 15,000 suitable images are labeled for 2D object detection and recognition, while the semantic segmentation benchmark contains 2,500 images with pixel-level high-quality fine annotations and 5,000 coarse-annotated images, which can help deep neural networks leverage weakly labeled data. Alongside the dataset, we also present transfer learning-based 2D vehicle detection and scene segmentation methods to evaluate the performance of existing state-of-the-art deep neural networks on our dataset. Lastly, an extensive experimental evaluation and comparative study has been carried out, demonstrating the advantages of our dataset in terms of inter-class diversity, scene variability, and annotation richness. The proposed benchmark suite is available at https://carl-dataset.github.io/index/. • A large-scale dataset for 2D vehicle detection and scene understanding in self-driving cars in unpatterned rural, urban, and hilly road traffic scenarios. • Scene segmentation benchmark – 7,500 pixel-level annotated images including 44 classes grouped into nature, infrastructure, moving and static objects, signboards, and misc. • Vehicle detection & recognition benchmark – 15,000 annotated images covering 25 classes with 50,348 labels. • A comparative analysis of existing vehicle detection and scene segmentation datasets with our proposed benchmarks.

Journal ArticleDOI
TL;DR: In this article, a conditional generative adversarial network with a dual-branch progressive generator is proposed to asymptotically enhance underwater images; the generator consists of two independent branches and a progressive enhancement algorithm.
Abstract: Underwater images are of great significance for exploring and utilizing the marine environment. However, raw underwater images usually suffer from distorted color and low contrast due to the attenuation of light. To solve this problem, we present a conditional generative adversarial network with a dual-branch progressive generator, which can asymptotically enhance underwater images. In particular, the generator consists of two independent branches and a progressive enhancement algorithm. The dual-branch structure is designed to generate, respectively, a base image and the parameter maps required by progressive enhancement. The progressive enhancement algorithm is proposed to iteratively improve the quality of underwater images, and an iterative function is constructed to guide image enhancement within it. Finally, a simple discriminator and multiple effective loss functions are adopted to optimize the progressive process of underwater image enhancement. Qualitative and quantitative experiments on synthetic and real-world underwater datasets demonstrate that the proposed method achieves superior performance against several representative underwater image processing methods. Furthermore, a series of ablation studies shows the contribution of each branch in our model.

Journal ArticleDOI
TL;DR: In this article, a Siamese network is designed to jointly train fully supervised and weakly supervised images with the help of an auxiliary cross-field and cross-attention (CFCA) network that maps features from the classification field to the segmentation field with a cross-attention mechanism.
Abstract: Surface defect segmentation from industrial images based on deep learning has developed rapidly in recent years. However, the related methods depend heavily on a large number of costly pixel-level labels. This paper introduces a method that uses abundant weakly supervised images and a few fully supervised images to reduce the labeling expense and obtain an excellent segmentation result. First, we propose a novel weakly supervised semantic segmentation method, the multistage squeeze-and-excitation (SE)-augmented adaptive Lp norm class activation map (multistage SALN-CAM), which adopts both position-level and channel-level attention mechanisms. Based on multistage SALN-CAM, a Siamese network is designed to jointly train fully supervised and weakly supervised images with the help of an auxiliary cross-field and cross-attention (CFCA) network that maps features from the classification field to the segmentation field with a cross-attention mechanism. Finally, we retrain a fully supervised segmentation model using images with pixel-level labels or pseudo-pixel-level labels generated by the Siamese network. In our experiments, based on the Severstal: Steel Defect Detection dataset, our method, in which 25% of the images (2512) have pixel-level labels and 75% (7544) have class-level labels, obtains an mDice of 94.01% on the test set. This result surpasses that of the fully supervised method, which uses 4775 images; thus, our method reduces the labeling expense by 37%. • We propose a novel weakly supervised semantic segmentation method for small defects. • An elaborate Siamese network is proposed for the hybrid supervised dataset. • Our hybrid supervised method obtains excellent results and saves labeling cost.

Journal ArticleDOI
TL;DR: Wang et al. as discussed by the authors introduced learned convolutional regularizers into multifocus image fusion and proposed a Convolutional Analysis Operator Learning (CAOL)-based multifocus image fusion algorithm.
Abstract: Sparse representation (SR), convolutional sparse representation (CSR), and convolutional dictionary learning (CDL) are synthesis-based priors that have proven successful in signal inverse problems such as multifocus image fusion. Unlike “synthesis” formulations, the “analysis” model assigns probabilities to signals through various forward measurements of the signals. Analysis operator learning (AOL) is a classical analysis-based learning method, and convolutional analysis operator learning (CAOL) is its convolutional form. CAOL uses an unsupervised learning method to train an autoencoding convolutional neural network (CNN) to solve the inverse problem more accurately. From the perspective of CAOL, this paper introduces learned convolutional regularizers into multifocus image fusion and proposes a CAOL-based multifocus image fusion algorithm. In the CDL stage, the convergent block proximal extrapolated gradient method with majorizer (BPEG-M) and an adaptive momentum restarting scheme are used. In the sparse fusion stage, an alternating direction method of multipliers (ADMM) approach with convolutional basis pursuit denoising (CBPDN) and an l1-norm maximum strategy are employed for the high-frequency and low-frequency components, respectively. Three types of multifocus images (static gray images, gray images of sports scenes, and color images) are tested to verify the performance of the proposed method. A comparison with representative methods demonstrates the superiority of our method in terms of subjective observation and objective evaluation.

Journal ArticleDOI
TL;DR: In this article, an underwater image restoration algorithm is introduced based on color compensation and the color-line model, which can obtain images with significantly enhanced visibility and higher color fidelity by inverting the simplified imaging model.
Abstract: Images captured underwater suffer from low contrast and color distortion owing to the harsh underwater imaging environment. An underwater image restoration algorithm based on color compensation and the color-line model is introduced in this paper. Firstly, a color compensation method is proposed to compensate the attenuated color information and improve the accuracy of the estimated color lines. Secondly, the relationship between the three-channel transmissions is derived from the association between the global background light and the inherent optical characteristics. Then, an underwater image local transmission optimization model is established in light of the color-line law to acquire the local transmission map. Finally, the restored images are obtained by inverting the simplified imaging model. Experimental results show that our method is superior to other methods in subjective evaluation, objective evaluation, color accuracy tests, and application tests. The method obtains images with significantly enhanced visibility and higher color fidelity.

Journal ArticleDOI
TL;DR: Wang et al. as mentioned in this paper proposed a novel feature aggregation network, FANet, including a feature extraction module and an aggregation module for RGBD saliency detection, where the feature aggregation module, consisting of a region enhanced module (REM) and a series of hierarchical fusion modules (HFMs), is used to gradually integrate high-level semantic information and low-level spatial details.
Abstract: The crucial issue in RGBD saliency detection is how to adequately mine and fuse the geometric information and the appearance information contained in depth maps and RGB images, respectively. In this paper, we propose a novel feature aggregation network, FANet, including a feature extraction module and an aggregation module for RGBD saliency detection. The key characteristic of FANet is the feature aggregation module, which consists of a designed region enhanced module (REM) and a series of hierarchical fusion modules (HFMs). Specifically, on one hand, the REM provides a powerful capability for differentiating salient objects from the background. On the other hand, the HFM is used to gradually integrate high-level semantic information and low-level spatial details, where K-nearest neighbor graph neural networks (KGNNs) and a non-local module (NLM) are embedded into the HFM to mine the geometric information and enhance high-level appearance features, respectively. Extensive experiments on five RGBD datasets show that our model achieves compelling performance against 11 current state-of-the-art RGBD saliency models.

Journal ArticleDOI
TL;DR: Zhang et al. as discussed by the authors exploited the property of DCT for finding significant information in images by selecting multiple channels, and proposed a method that studies texture distribution based on statistical measurement to extract features.
Abstract: Text detection from natural scene images is an active research area in computer vision, signal, and image processing because of several real-time applications, such as driving vehicles automatically and tracking person behaviors during sports or marathon events. In these situations, there is a high probability of missing text information due to the occlusion of different objects/persons while capturing images. Unlike most existing methods, which focus only on text detection and ignore the effect of missing texts, this work detects and predicts missing texts so that OCR performance improves. The proposed method exploits the property of the DCT for finding significant information in images by selecting multiple channels. For the chosen DCT channels, the proposed method studies texture distribution based on statistical measurements to extract features. We adopt a Bayesian classifier for categorizing text pixels using the extracted features. Then a deep learning model is proposed for eliminating false positives to improve text detection performance. Further, the proposed method employs a Natural Language Processing (NLP) model for predicting missing text information by using the detected and recognized texts. Experimental results on our dataset, which contains texts occluded by objects, show that the proposed method is effective in predicting missing text information. To demonstrate the effectiveness and objectiveness of the proposed method, we also test it on the standard natural scene image datasets ICDAR 2017-MLT, Total-Text, and CTW1500.
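
An illustrative Python sketch of the front end described above: block-wise DCT features fed to a Bayesian (Gaussian naive Bayes) classifier, assuming SciPy and scikit-learn. The block size, number of retained coefficients, and the synthetic 'text' versus 'background' training data are placeholders, not the paper's feature design:

```python
import numpy as np
from scipy.fft import dctn
from sklearn.naive_bayes import GaussianNB

def block_dct_features(gray, block=8, keep=10):
    """Illustrative feature extractor: 2-D DCT per block, keeping the first few
    coefficients (in raster order) as a texture descriptor for each block."""
    h, w = gray.shape
    feats = []
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            coeffs = dctn(gray[y:y + block, x:x + block], norm="ortho")
            feats.append(coeffs.ravel()[:keep])
    return np.array(feats)

# toy training: 'text-like' blocks vs 'background' blocks (labels are synthetic)
text_img = (np.random.rand(64, 64) > 0.5).astype(np.float32) * 255
bg_img = np.full((64, 64), 128, dtype=np.float32) + np.random.randn(64, 64)
X = np.vstack([block_dct_features(text_img), block_dct_features(bg_img)])
y = np.array([1] * 64 + [0] * 64)
clf = GaussianNB().fit(X, y)
print(clf.predict(block_dct_features(text_img))[:10])
```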

Journal ArticleDOI
TL;DR: In this article, a novel rank-learning-guided no-reference quality assessment method is proposed to evaluate different underwater image enhancement (UIE) algorithms; the method is specific to underwater images.
Abstract: Objectively and accurately evaluating underwater images generated by different enhancement algorithms is an essential issue, which however is still largely under-explored. In this paper, we present a novel rank learning guided no-reference quality assessment method to evaluate different underwater image enhancement (UIE) algorithms. It is also the first work that utilizes deep learning approaches to address this problem. Our approach, termed Twice Mixing, is motivated by the observation that a mid-quality image can be generated by mixing a high-quality image and its low-quality version. Twice Mixing is trained based on an elaborately formulated self-supervision mechanism. Specifically, before each iteration, we randomly generate two mixing ratios which will be utilized for both generating virtual images and guiding the network training. In the test phase, a single branch of the network is extracted to predict the quality rankings of different UIE outputs. Additionally, to train our network, we construct a new dataset that contains over 2200 raw underwater images and their high/low-quality versions. Twice Mixing is evaluated on both synthetic and real-world datasets. Experimental results show that the proposed approach outperforms the previous methods significantly.
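
A toy PyTorch sketch of the self-supervised ranking idea: mix a high-quality image with its low-quality version using two random ratios and train a scorer so that the mix containing more of the high-quality source is ranked higher. The tiny scorer network, margin value, and data are stand-ins for the paper's setup:

```python
import torch
import torch.nn as nn

def twice_mixing_step(quality_net, high, low):
    """Illustrative self-supervised step: mix a high-quality image with its
    low-quality version using two random ratios; the image with the larger
    share of the high-quality source should be ranked higher."""
    r1, r2 = torch.rand(2)                          # two random mixing ratios
    mix1 = r1 * high + (1 - r1) * low
    mix2 = r2 * high + (1 - r2) * low
    s1, s2 = quality_net(mix1), quality_net(mix2)   # predicted quality scores
    target = torch.sign(r1 - r2).expand_as(s1)      # +1 if mix1 should score higher
    return nn.MarginRankingLoss(margin=0.1)(s1, s2, target)

# toy quality regressor and data (stand-ins for the paper's network and dataset)
net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 1))
high = torch.rand(4, 3, 32, 32)
low = 0.5 * high                                    # darker "low-quality" version
loss = twice_mixing_step(net, high, low)
loss.backward()
```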

Journal ArticleDOI
TL;DR: Zhang et al. as mentioned in this paper proposed a multi-scale fusion generative adversarial network named Fusion Water-GAN (FW-GAN) to enhance underwater image quality; the model has four convolution branches that refine the features of three prior inputs and encode the original input, fuse the prior features using the proposed multi-scale fusion connections, and finally use a channel attention decoder to generate satisfactory enhanced results.
Abstract: Underwater robots have broad applications in many fields such as ocean exploration, ocean pasture, and environmental monitoring. However, due to the interference of light scattering and absorption, selective color attenuation, suspended particles, and other complex factors in the underwater environment, it is difficult for robot vision sensors to obtain high-quality underwater image signals, which is the bottleneck that restricts the visual perception of underwater robots. In this paper, we propose a multi-scale fusion generative adversarial network named Fusion Water-GAN (FW-GAN) to enhance underwater image quality. The proposed model has four convolution branches; these branches refine the features of the three prior inputs and encode the original input, then fuse the prior features using the proposed multi-scale fusion connections, and finally use a channel attention decoder to generate satisfactory enhanced results. We conduct qualitative and quantitative comparison experiments on real-world and synthetic distorted underwater image datasets under various degradation conditions. The results show that, compared with recent state-of-the-art underwater image enhancement methods, our proposed method achieves higher quantitative metric scores and better generalization capability. In addition, an ablation study demonstrates the contribution of each component. • We propose a multi-scale fusion generator architecture. Based on an analysis of the underwater environment, an adaptive fusion strategy is proposed to fuse multi-source and multi-scale features, which can effectively correct the color casts and haze of the image and improve its contrast; it also avoids blind enhancement of the image and improves the generalization capability of the model. • We propose a decoder combined with channel attention to compute the attention of the prior and decoded feature maps in the fusion process and adjust them adaptively. The aim is to learn the potential associations between the fused prior features and the enhanced results. • We conducted qualitative and quantitative evaluations and compared FW-GAN with traditional methods and state-of-the-art models. The results show that FW-GAN has good generalization capability and competitive performance. Finally, we conduct an ablation study to demonstrate the contribution of each core component in our network.

Journal ArticleDOI
TL;DR: Wang et al. as mentioned in this paper proposed a novel encoder-decoder architecture called Multi Inception Residual Attention U-Net (MIRAU-Net), which integrates residual inception modules with attention gates into U-net to further enhance brain tumor segmentation performance.
Abstract: Gliomas are among the most prevalent and destructive brain tumors, and their segmentation from MRI using computerized methods plays a crucial part in diagnosis and treatment. Recently, the U-Net architecture has achieved impressive brain tumor segmentation results, but the task remains challenging due to the differing severity and appearance of gliomas. Therefore, we propose a novel encoder–decoder architecture called Multi Inception Residual Attention U-Net (MIRAU-Net) in this work. It integrates residual inception modules with attention gates into U-Net to further enhance brain tumor segmentation performance. The encoder and decoder are connected in this architecture through inception residual pathways to decrease the distance between their feature maps. We use weighted cross-entropy and generalized Dice loss (GDL) with the focal Tversky loss function to address the class imbalance problem. The performance of MIRAU-Net was evaluated on BraTS 2019, obtaining mean Dice similarities of 0.885 for the whole tumor, 0.879 for the core area, and 0.818 for the enhancing tumor. Experimental results reveal that the proposed MIRAU-Net beats its baselines and provides better efficiency than recent techniques for brain tumor segmentation.
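
A minimal PyTorch sketch of the focal Tversky loss mentioned above for a binary segmentation mask; the alpha, beta, and gamma values are common defaults rather than the paper's exact hyperparameters:

```python
import torch

def focal_tversky_loss(probs, target, alpha=0.7, beta=0.3, gamma=0.75, eps=1e-6):
    """Focal Tversky loss for binary segmentation: the Tversky index weights
    false negatives (alpha) and false positives (beta), and the focal exponent
    gamma emphasizes harder examples."""
    p = probs.reshape(probs.size(0), -1)
    t = target.reshape(target.size(0), -1)
    tp = (p * t).sum(dim=1)
    fn = ((1 - p) * t).sum(dim=1)
    fp = (p * (1 - t)).sum(dim=1)
    tversky = (tp + eps) / (tp + alpha * fn + beta * fp + eps)
    return ((1 - tversky) ** gamma).mean()

logits = torch.randn(2, 1, 64, 64, requires_grad=True)
mask = torch.randint(0, 2, (2, 1, 64, 64)).float()
loss = focal_tversky_loss(torch.sigmoid(logits), mask)
loss.backward()
```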

Journal ArticleDOI
TL;DR: Based on the atmospheric scattering model, a novel model is designed in this article to directly generate the haze-free image, where a simple and effective U-connection residual network (UR-Net) is proposed to build the generator and spatial pyramid pooling (SPP) is adopted to design the discriminator.
Abstract: Image-to-image translation based on generative adversarial networks (GANs) has achieved state-of-the-art performance in various image restoration applications. Single image dehazing is a typical example, which aims to obtain the haze-free version of a hazy image. This paper concentrates on the challenging task of single image dehazing. Based on the atmospheric scattering model, a novel model is designed to directly generate the haze-free image. The main challenge of image dehazing is that the atmospheric scattering model has two parameters, i.e., the transmission map and the atmospheric light. When they are estimated separately, the errors accumulate and compromise the dehazing quality. Considering this, and the variety of image sizes, a novel input-size-flexible conditional generative adversarial network (cGAN) is proposed for single image dehazing, which is input-size flexible at both training and test stages for image-to-image translation within the cGAN framework. A simple and effective U-connection residual network (UR-Net) is proposed to build the generator, and spatial pyramid pooling (SPP) is adopted to design the discriminator. Moreover, the model is trained with a multi-loss function, in which the consistency loss is newly designed in this paper. Finally, a multi-scale cGAN fusion model is built to realize state-of-the-art single image dehazing performance. The proposed models receive a haze image as input and directly output a haze-free one. Experimental results demonstrate the effectiveness and efficiency of the proposed models.
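
A short NumPy sketch of the atmospheric scattering model referenced above, I(x) = J(x)t(x) + A(1 - t(x)), and its inversion; the constant transmission and airlight values are toy assumptions used only to illustrate why estimating t(x) and A separately can accumulate error, which the paper's cGAN avoids by generating J(x) directly:

```python
import numpy as np

def add_haze(clean, transmission, airlight=0.9):
    """Atmospheric scattering model: I(x) = J(x) * t(x) + A * (1 - t(x)),
    with images scaled to [0, 1]."""
    return clean * transmission + airlight * (1.0 - transmission)

def dehaze(hazy, transmission, airlight=0.9, t_min=0.1):
    """Inverting the same model recovers J(x) when t(x) and A are known."""
    t = np.maximum(transmission, t_min)
    return np.clip((hazy - airlight * (1.0 - t)) / t, 0.0, 1.0)

clean = np.random.rand(64, 64, 1)              # stand-in for a haze-free image
t = np.full((64, 64, 1), 0.6)                  # constant transmission for the toy case
hazy = add_haze(clean, t)
recovered = dehaze(hazy, t)
print(np.abs(recovered - clean).max())         # ~0 for this synthetic example
```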