Showing papers on "Channel (digital image)" published in 2019


Proceedings ArticleDOI
22 Oct 2019
TL;DR: Yu et al. propose a generative image inpainting system that completes images with free-form masks and guidance, based on gated convolutions learned from millions of images without additional labeling effort.
Abstract: We present a generative image inpainting system to complete images with free-form mask and guidance. The system is based on gated convolutions learned from millions of images without additional labelling efforts. The proposed gated convolution solves the issue of vanilla convolution that treats all input pixels as valid ones, and generalizes partial convolution by providing a learnable dynamic feature selection mechanism for each channel at each spatial location across all layers. Moreover, as free-form masks may appear anywhere in images with any shape, global and local GANs designed for a single rectangular mask are not applicable. Thus, we also present a patch-based GAN loss, named SN-PatchGAN, by applying spectral-normalized discriminator on dense image patches. SN-PatchGAN is simple in formulation, fast and stable in training. Results on automatic image inpainting and user-guided extension demonstrate that our system generates higher-quality and more flexible results than previous methods. Our system helps users quickly remove distracting objects, modify image layouts, clear watermarks and edit faces. Code, demo and models are available at: https://github.com/JiahuiYu/generative_inpainting.

904 citations
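
The gating idea in the inpainting abstract above can be illustrated on a one-dimensional, single-channel signal. This is a minimal sketch, not the authors' implementation: the helper names `conv1d_valid` and `gated_conv1d`, the use of tanh as the feature activation, and the toy kernels are all assumptions made for illustration. Two kernels see the same input; one produces features and the other produces a soft mask in (0, 1) for every output location.

```python
import math


def conv1d_valid(signal, kernel):
    """Plain 'valid' 1-D correlation: no padding, stride 1."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_conv1d(signal, feature_kernel, gate_kernel):
    """Gated convolution sketch: activation(feature) * sigmoid(gate).

    The tanh activation is an assumption; any activation could be used.
    """
    features = conv1d_valid(signal, feature_kernel)
    gates = conv1d_valid(signal, gate_kernel)
    return [math.tanh(f) * sigmoid(g) for f, g in zip(features, gates)]


# Toy input: the zeros stand in for a masked (hole) region.
signal = [1.0, 2.0, 0.0, 0.0, 0.0, 3.0]
out = gated_conv1d(signal, feature_kernel=[0.5, 0.5], gate_kernel=[4.0, 4.0])
```

In a real network both kernels would be learned, so the gate can be driven by the mask or by user guidance rather than by pixel intensity as in this toy case.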


Proceedings ArticleDOI
01 Oct 2019
TL;DR: A sampling fusion network is devised which fuses multi-layer feature with effective anchor sampling, to improve the sensitivity to small objects, and the IoU constant factor is added to the smooth L1 loss to address the boundary problem for the rotating bounding box.
Abstract: Object detection has been a building block in computer vision. Though considerable progress has been made, there still exist challenges for objects with small size, arbitrary direction, and dense distribution. Apart from natural images, such issues are especially pronounced for aerial images of great importance. This paper presents a novel multi-category rotation detector for small, cluttered and rotated objects, namely SCRDet. Specifically, a sampling fusion network is devised which fuses multi-layer feature with effective anchor sampling, to improve the sensitivity to small objects. Meanwhile, the supervised pixel attention network and the channel attention network are jointly explored for small and cluttered object detection by suppressing the noise and highlighting the objects feature. For more accurate rotation estimation, the IoU constant factor is added to the smooth L1 loss to address the boundary problem for the rotating bounding box. Extensive experiments on two remote sensing public datasets DOTA, NWPU VHR-10 as well as natural image datasets COCO, VOC2007 and scene text data ICDAR2015 show the state-of-the-art performance of our detector. The code and models will be available at https://github.com/DetectionTeamUCAS.

552 citations


Proceedings ArticleDOI
15 Oct 2019
TL;DR: Zheng et al. propose a lightweight information multi-distillation network (IMDN) built from cascaded information multi-distillation blocks (IMDB), each containing distillation and selective fusion parts.
Abstract: In recent years, single image super-resolution (SISR) methods using deep convolution neural network (CNN) have achieved impressive results. Thanks to the powerful representation capabilities of the deep networks, numerous previous methods can learn the complex non-linear mapping between low-resolution (LR) image patches and their high-resolution (HR) versions. However, excessive convolutions will limit the application of super-resolution technology in low computing power devices. Besides, super-resolution of any arbitrary scale factor is a critical issue in practical applications, which has not been well solved in the previous approaches. To address these issues, we propose a lightweight information multi-distillation network (IMDN) by constructing the cascaded information multi-distillation blocks (IMDB), which contains distillation and selective fusion parts. Specifically, the distillation module extracts hierarchical features step-by-step, and fusion module aggregates them according to the importance of candidate features, which is evaluated by the proposed contrast-aware channel attention mechanism. To process real images of any size, we develop an adaptive cropping strategy (ACS) to super-resolve block-wise image patches using the same well-trained model. Extensive experiments suggest that the proposed method performs favorably against the state-of-the-art SR algorithms in terms of visual quality, memory footprint, and inference time. Code is available at https://github.com/Zheng222/IMDN.

386 citations


Journal ArticleDOI
TL;DR: This paper incorporates the recently proposed “squeeze and excitation” (SE) modules, originally used for channel recalibration in image classification, into three state-of-the-art F-CNNs and demonstrates a consistent improvement of segmentation accuracy on three challenging benchmark datasets.
Abstract: In a wide range of semantic segmentation tasks, fully convolutional neural networks (F-CNNs) have been successfully leveraged to achieve the state-of-the-art performance. Architectural innovations of F-CNNs have mainly been on improving spatial encoding or network connectivity to aid gradient flow. In this paper, we aim toward an alternate direction of recalibrating the learned feature maps adaptively, boosting meaningful features while suppressing weak ones. The recalibration is achieved by simple computational blocks that can be easily integrated in F-CNNs architectures. We draw our inspiration from the recently proposed “squeeze and excitation” (SE) modules for channel recalibration for image classification. Toward this end, we introduce three variants of SE modules for segmentation: 1) squeezing spatially and exciting channel wise; 2) squeezing channel wise and exciting spatially; and 3) joint spatial and channel SE. We effectively incorporate the proposed SE blocks in three state-of-the-art F-CNNs and demonstrate a consistent improvement of segmentation accuracy on three challenging benchmark datasets. Importantly, SE blocks only lead to a minimal increase in model complexity of about 1.5%, while the Dice score increases by 4%–9% in the case of U-Net. Hence, we believe that SE blocks can be an integral part of future F-CNN architectures.

318 citations
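
The first SE variant named in the abstract above (squeezing spatially and exciting channel-wise) can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the paper's code: the function name `channel_se`, the single-hidden-unit bottleneck, and the hand-picked weights `w1` and `w2` are hypothetical, and in practice the weights would be learned.

```python
import math


def channel_se(feature_maps, w1, w2):
    """Spatial squeeze + channel excitation on a list of 2-D channels.

    feature_maps: list of C channels, each a list of rows.
    w1: (C_reduced x C) weights, w2: (C x C_reduced) weights (hypothetical).
    """
    # Squeeze: global average pooling gives one scalar per channel.
    squeezed = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in feature_maps
    ]
    # Excite: bottleneck layer -> ReLU -> expansion layer -> sigmoid.
    hidden = [max(0.0, sum(w * z for w, z in zip(row, squeezed))) for row in w1]
    scales = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Recalibrate: every pixel in channel c is multiplied by scales[c].
    return [
        [[v * s for v in row] for row in ch]
        for ch, s in zip(feature_maps, scales)
    ]


maps = [
    [[1.0, 1.0], [1.0, 1.0]],   # channel 0, mean 1.0
    [[4.0, 0.0], [0.0, 0.0]],   # channel 1, mean 1.0
]
w1 = [[1.0, 1.0]]               # reduce 2 channels to 1 hidden unit
w2 = [[2.0], [-2.0]]            # boost channel 0, suppress channel 1
recalibrated = channel_se(maps, w1, w2)
```

The spatial variant would swap the roles: collapse the channel axis to one value per pixel and multiply every channel by the resulting spatial map.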


Journal ArticleDOI
TL;DR: In this article, an Illumination-aware Faster R-CNN (IAF-R-CNN) was proposed to fuse color and thermal images for pedestrian detection in multispectral images.

231 citations


Proceedings ArticleDOI
TL;DR: An adaptive cropping strategy (ACS) is developed to super-resolve block-wise image patches using the same well-trained model and performs favorably against the state-of-the-art SR algorithms in terms of visual quality, memory footprint, and inference time.
Abstract: In recent years, single image super-resolution (SISR) methods using deep convolution neural network (CNN) have achieved impressive results. Thanks to the powerful representation capabilities of the deep networks, numerous previous methods can learn the complex non-linear mapping between low-resolution (LR) image patches and their high-resolution (HR) versions. However, excessive convolutions will limit the application of super-resolution technology in low computing power devices. Besides, super-resolution of any arbitrary scale factor is a critical issue in practical applications, which has not been well solved in the previous approaches. To address these issues, we propose a lightweight information multi-distillation network (IMDN) by constructing the cascaded information multi-distillation blocks (IMDB), which contains distillation and selective fusion parts. Specifically, the distillation module extracts hierarchical features step-by-step, and fusion module aggregates them according to the importance of candidate features, which is evaluated by the proposed contrast-aware channel attention mechanism. To process real images of any size, we develop an adaptive cropping strategy (ACS) to super-resolve block-wise image patches using the same well-trained model. Extensive experiments suggest that the proposed method performs favorably against the state-of-the-art SR algorithms in terms of visual quality, memory footprint, and inference time. Code is available at this https URL.

178 citations


Journal ArticleDOI
TL;DR: In this paper, the authors proposed a novel object detection framework, called Optical Remote Sensing Imagery detector (ORSIm detector), integrating diverse channel features extraction, feature learning, fast image pyramid matching, and boosting strategy.
Abstract: With the rapid development of spaceborne imaging techniques, object detection in optical remote sensing imagery has drawn much attention in recent decades. While many advanced works have been developed with powerful learning algorithms, the incomplete feature representation still cannot meet the demand for effectively and efficiently handling image deformations, particularly objective scaling and rotation. To this end, we propose a novel object detection framework, called Optical Remote Sensing Imagery detector (ORSIm detector), integrating diverse channel features extraction, feature learning, fast image pyramid matching, and boosting strategy. An ORSIm detector adopts a novel spatial-frequency channel feature (SFCF) by jointly considering the rotation-invariant channel features constructed in the frequency domain and the original spatial channel features (e.g., color channel and gradient magnitude). Subsequently, we refine SFCF using learning-based strategy in order to obtain the high-level or semantically meaningful features. In the test phase, we achieve a fast and coarsely scaled channel computation by mathematically estimating a scaling factor in the image domain. Extensive experimental results conducted on the two different airborne data sets are performed to demonstrate the superiority and effectiveness in comparison with the previous state-of-the-art methods.

155 citations


Journal ArticleDOI
TL;DR: A block scrambling-based encryption scheme is presented to enhance the security of Encryption-then-Compression (EtC) systems with JPEG compression, which allow us to securely transmit the images through an untrusted channel provider, such as social network service providers.
Abstract: A block scrambling-based encryption scheme is presented to enhance the security of Encryption-then-Compression (EtC) systems with JPEG compression, which allow us to securely transmit the images through an untrusted channel provider, such as social network service providers. The proposed scheme enables the use of a smaller block size and a larger number of blocks than the conventional scheme. Images encrypted using the proposed scheme include less color information due to the use of grayscale images even when the original image has three color channels. These features enhance security against various attacks such as jigsaw puzzle solver and brute-force attacks. In an experiment, the security against jigsaw puzzle solver attacks is evaluated. Encrypted images were uploaded to and then downloaded from Facebook and Twitter, and the results demonstrated that the proposed scheme is effective for EtC systems.

153 citations
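
The block-permutation idea behind the encryption scheme in the abstract above can be sketched as follows. This is a simplified illustration, not the published scheme: it covers only key-driven shuffling of block positions on a grayscale image, the function names are hypothetical, image dimensions are assumed to be multiples of the block size, and Python's `random.Random` is used for readability even though it is not a cryptographically secure generator.

```python
import random


def split_blocks(image, block):
    """Split a 2-D grayscale image (list of rows) into block x block tiles."""
    h, w = len(image), len(image[0])
    return [
        [row[x:x + block] for row in image[y:y + block]]
        for y in range(0, h, block)
        for x in range(0, w, block)
    ]


def join_blocks(blocks, width, block):
    """Reassemble tiles (row-major order) into an image of the given width."""
    per_row = width // block
    image = []
    for start in range(0, len(blocks), per_row):
        row_of_blocks = blocks[start:start + per_row]
        for r in range(block):
            image.append([v for b in row_of_blocks for v in b[r]])
    return image


def scramble(image, block, key):
    """Permute tile positions with a key-seeded shuffle."""
    blocks = split_blocks(image, block)
    order = list(range(len(blocks)))
    random.Random(key).shuffle(order)
    return join_blocks([blocks[i] for i in order], len(image[0]), block)


def unscramble(image, block, key):
    """Invert scramble() by regenerating the same permutation from the key."""
    blocks = split_blocks(image, block)
    order = list(range(len(blocks)))
    random.Random(key).shuffle(order)
    restored = [None] * len(blocks)
    for position, source in enumerate(order):
        restored[source] = blocks[position]
    return join_blocks(restored, len(image[0]), block)


image = [[y * 4 + x for x in range(4)] for y in range(4)]
encrypted = scramble(image, block=2, key=1234)
decrypted = unscramble(encrypted, block=2, key=1234)
```

Because pixels inside each block are untouched, block-based compression can still operate on the scrambled image, which is the property an Encryption-then-Compression pipeline relies on.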


Book ChapterDOI
13 Oct 2019
TL;DR: This work proposes a general unifying curvilinear structure segmentation network that works on different medical imaging modalities: optical coherence tomography angiography, color fundus image, and corneal confocal microscopy. Instead of a U-Net based convolutional neural network, it uses a novel network that includes a self-attention mechanism in the encoder and decoder.
Abstract: The detection of curvilinear structures in medical images, e.g., blood vessels or nerve fibers, is important in aiding management of many diseases. In this work, we propose a general unifying curvilinear structure segmentation network that works on different medical imaging modalities: optical coherence tomography angiography (OCT-A), color fundus image, and corneal confocal microscopy (CCM). Instead of the U-Net based convolutional neural network, we propose a novel network (CS-Net) which includes a self-attention mechanism in the encoder and decoder. Two types of attention modules are utilized - spatial attention and channel attention, to further integrate local features with their global dependencies adaptively. The proposed network has been validated on five datasets: two color fundus datasets, two corneal nerve datasets and one OCT-A dataset. Experimental results show that our method outperforms state-of-the-art methods, for example, sensitivities of corneal nerve fiber segmentation were at least 2% higher than the competitors. As a complementary output, we made manual annotations of two corneal nerve datasets which have been released for public access.

152 citations


Posted Content
TL;DR: A novel approach is proposed, called mGANprior, to incorporate the well-trained GANs as effective prior to a variety of image processing tasks, by employing multiple latent codes to generate multiple feature maps at some intermediate layer of the generator and composing them with adaptive channel importance to recover the input image.
Abstract: Despite the success of Generative Adversarial Networks (GANs) in image synthesis, applying trained GAN models to real image processing remains challenging. Previous methods typically invert a target image back to the latent space either by back-propagation or by learning an additional encoder. However, the reconstructions from both of the methods are far from ideal. In this work, we propose a novel approach, called mGANprior, to incorporate the well-trained GANs as effective prior to a variety of image processing tasks. In particular, we employ multiple latent codes to generate multiple feature maps at some intermediate layer of the generator, then compose them with adaptive channel importance to recover the input image. Such an over-parameterization of the latent space significantly improves the image reconstruction quality, outperforming existing competitors. The resulting high-fidelity image reconstruction enables the trained GAN models as prior to many real-world applications, such as image colorization, super-resolution, image inpainting, and semantic manipulation. We further analyze the properties of the layer-wise representation learned by GAN models and shed light on what knowledge each layer is capable of representing.

134 citations


Journal ArticleDOI
TL;DR: The proposed matching framework has been evaluated using many different types of multimodal images, and the results demonstrate its superior matching performance with respect to the state-of-the-art methods.
Abstract: While image matching has been studied in remote sensing community for decades, matching multimodal data [e.g., optical, light detection and ranging (LiDAR), synthetic aperture radar (SAR), and map] remains a challenging problem because of significant nonlinear intensity differences between such data. To address this problem, we present a novel fast and robust template matching framework integrating local descriptors for multimodal images. First, a local descriptor [such as histogram of oriented gradient (HOG) and local self-similarity (LSS) or speeded-up robust feature (SURF)] is extracted at each pixel to form a pixelwise feature representation of an image. Then, we define a fast similarity measure based on the feature representation using the fast Fourier transform (FFT) in the frequency domain. A template matching strategy is employed to detect correspondences between images. In this procedure, we also propose a novel pixelwise feature representation using orientated gradients of images, which is named channel features of orientated gradients (CFOG). This novel feature is an extension of the pixelwise HOG descriptor with superior performance in image matching and computational efficiency. The major advantages of the proposed matching framework include: 1) structural similarity representation using the pixelwise feature description and 2) high computational efficiency due to the use of FFT. The proposed matching framework has been evaluated using many different types of multimodal images, and the results demonstrate its superior matching performance with respect to the state-of-the-art methods.

Journal ArticleDOI
TL;DR: A novel cross-modality interactive attention network that takes full advantage of the interactive properties of multispectral input sources is proposed that achieves state-of-the-art performance with high efficiency.

Journal ArticleDOI
TL;DR: Experimental results show that the proposed 3D auto-context-based locality adaptive multi-modality generative adversarial networks model (LA-GANs) outperforms the traditional multi-modality fusion methods used in deep networks, as well as the state-of-the-art PET estimation approaches.
Abstract: Positron emission tomography (PET) has been substantially used recently. To minimize the potential health risk caused by the tracer radiation inherent to PET scans, it is of great interest to synthesize the high-quality PET image from the low-dose one to reduce the radiation exposure. In this paper, we propose a 3D auto-context-based locality adaptive multi-modality generative adversarial networks model (LA-GANs) to synthesize the high-quality FDG PET image from the low-dose one with the accompanying MRI images that provide anatomical information. Our work has four contributions. First, different from the traditional methods that treat each image modality as an input channel and apply the same kernel to convolve the whole image, we argue that the contributions of different modalities could vary at different image locations, and therefore a unified kernel for a whole image is not optimal. To address this issue, we propose a locality adaptive strategy for multi-modality fusion. Second, we utilize a 1 × 1 × 1 kernel to learn this locality adaptive fusion so that the number of additional parameters incurred by our method is kept to a minimum. Third, the proposed locality adaptive fusion mechanism is learned jointly with the PET image synthesis in a 3D conditional GANs model, which generates high-quality PET images by employing large-sized image patches and hierarchical features. Fourth, we apply the auto-context strategy to our scheme and propose an auto-context LA-GANs model to further refine the quality of synthesized images. Experimental results show that our method outperforms the traditional multi-modality fusion methods used in deep networks, as well as the state-of-the-art PET estimation approaches.

Journal ArticleDOI
TL;DR: The proposed three-channel convolutional neural networks model can automatically learn the representative features from the complex diseased leaf images, and effectively recognize vegetable diseases.

Journal ArticleDOI
TL;DR: This study proposes zero-padding for resizing images to the same size and compares it with the conventional approach of scaling images up (zooming in) using interpolation, showing that zero-padding had no effect on the classification accuracy but considerably reduced the training time.
Abstract: The input to a machine learning model is a one-dimensional feature vector. However, in recent learning models, such as convolutional and recurrent neural networks, two- and three-dimensional feature tensors can also be inputted to the model. During training, the machine adjusts its internal parameters to project each feature tensor close to its target. After training, the machine can be used to predict the target for previously unseen feature tensors. What this study focuses on is the requirement that feature tensors must be of the same size. In other words, the same number of features must be present for each sample. This creates a barrier in processing images and texts, as they usually have different sizes, and thus different numbers of features. In classifying an image using a convolutional neural network (CNN), the input is a three-dimensional tensor, where the value of each pixel in each channel is one feature. The three-dimensional feature tensor must be the same size for all images. However, images are not usually of the same size and so are not their corresponding feature tensors. Resizing images to the same size without deforming patterns contained therein is a major challenge. This study proposes zero-padding for resizing images to the same size and compares it with the conventional approach of scaling images up (zooming in) using interpolation. Our study showed that zero-padding had no effect on the classification accuracy but considerably reduced the training time. The reason is that neighboring zero input units (pixels) will not activate their corresponding convolutional unit in the next layer. Therefore, the synaptic weights on outgoing links from input units do not need to be updated if they contain a zero value. Theoretical justification along with experimental endorsements are provided in this paper.
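
The resizing strategy described in the abstract above amounts to placing each image on a larger canvas of zeros. A minimal sketch follows; the function name `zero_pad` and the choice to center the image on the canvas are assumptions, since the abstract does not say where the padding is placed.

```python
def zero_pad(image, target_h, target_w):
    """Center a 2-D image (list of rows) on a zero canvas of the target size."""
    h, w = len(image), len(image[0])
    if h > target_h or w > target_w:
        raise ValueError("image is larger than the target size")
    top = (target_h - h) // 2
    left = (target_w - w) // 2
    canvas = [[0] * target_w for _ in range(target_h)]
    for y, row in enumerate(image):
        # Copy the original row into the canvas; everything else stays zero.
        canvas[top + y][left:left + w] = row
    return canvas


small = [[1, 2], [3, 4]]
padded = zero_pad(small, 4, 5)
```

Unlike interpolation, this leaves every original pixel value unchanged, so patterns in the image are not deformed. For a multi-channel image the same padding would be applied to each channel.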

Journal ArticleDOI
TL;DR: A novel object detection framework is proposed, called Optical Remote Sensing Imagery detector (ORSIm detector), integrating diverse channel features extraction, feature learning, fast image pyramid matching, and boosting strategy, and achieves a fast and coarsely scaled channel computation.
Abstract: With the rapid development of spaceborne imaging techniques, object detection in optical remote sensing imagery has drawn much attention in recent decades. While many advanced works have been developed with powerful learning algorithms, the incomplete feature representation still cannot meet the demand for effectively and efficiently handling image deformations, particularly objective scaling and rotation. To this end, we propose a novel object detection framework, called optical remote sensing imagery detector (ORSIm detector), integrating diverse channel features extraction, feature learning, fast image pyramid matching, and boosting strategy. ORSIm detector adopts a novel spatial-frequency channel feature (SFCF) by jointly considering the rotation-invariant channel features constructed in frequency domain and the original spatial channel features (e.g., color channel, gradient magnitude). Subsequently, we refine SFCF using learning-based strategy in order to obtain the high-level or semantically meaningful features. In the test phase, we achieve a fast and coarsely-scaled channel computation by mathematically estimating a scaling factor in the image domain. Extensive experimental results conducted on the two different airborne datasets are performed to demonstrate the superiority and effectiveness in comparison with previous state-of-the-art methods.

Journal ArticleDOI
Jingyu Lu, Na Li, Shaoyong Zhang, Zhibin Yu, Haiyong Zheng, Bing Zheng
TL;DR: This paper proposes an underwater image restoration method based on transferring an underwater style image into a recovered style using a Multi-Scale Cycle Generative Adversarial Network (MCycle GAN) system, and includes a Structural Similarity Index Measure loss (SSIM loss), which provides more flexibility to model structural detail and improve the image restoration performance.
Abstract: Underwater image restoration, which is the keystone of underwater vision research, is still a challenging task. The key point of underwater image restoration is how to remove the turbidity and the color distortion caused by the underwater environment. In this paper, we propose an underwater image restoration method based on transferring an underwater style image into a recovered style using a Multi-Scale Cycle Generative Adversarial Network (MCycle GAN) system. We include a Structural Similarity Index Measure loss (SSIM loss), which can provide more flexibility to model structural detail and improve the image restoration performance. We use the dark channel prior (DCP) algorithm to get the transmission map of an image and design an adaptive SSIM loss to improve underwater image quality. We input this information into the network for multi-scale calculation on the images, which achieves the combination of the DCP algorithm and Cycle-Consistent Adversarial Networks (CycleGAN). Compared quantitatively and qualitatively with existing state-of-the-art approaches, our method shows a pleasing performance on the underwater image dataset.

Journal ArticleDOI
TL;DR: An underwater image enhancement model inspired by the morphology and function of the teleost fish retina is proposed, which aims to solve the problems of underwater image degradation raised by the blurring and nonuniform color biasing.
Abstract: We propose an underwater image enhancement model inspired by the morphology and function of the teleost fish retina. We aim to solve the problems of underwater image degradation raised by the blurring and nonuniform color biasing. In particular, the feedback from color-sensitive horizontal cells to cones and a red channel compensation are used to correct the nonuniform color bias. The center-surround opponent mechanism of the bipolar cells and the feedback from amacrine cells to interplexiform cells then to horizontal cells serve to enhance the edges and contrasts of the output image. The ganglion cells with color-opponent mechanism are used for color enhancement and color correction. Finally, we adopt a luminance-based fusion strategy to reconstruct the enhanced image from the outputs of ON and OFF pathways of fish retina. Our model utilizes the global statistics (i.e., image contrast) to automatically guide the design of each low-level filter, which realizes the self-adaption of the main parameters. Extensive qualitative and quantitative evaluations on various underwater scenes validate the competitive performance of our technique. Our model also significantly improves the accuracy of transmission map estimation and local feature point matching using the underwater image. Our method is a single image approach that does not require the specialized prior about the underwater condition or scene structure.

Journal ArticleDOI
31 Jan 2019
TL;DR: A novel grayscale-based block scrambling image encryption scheme is presented not only to enhance security, but also to improve the compression performance for Encryption-then-Compression (EtC) systems with JPEG compression, which are used to securely transmit images through an untrusted channel provider.
Abstract: A novel grayscale-based block scrambling image encryption scheme is presented not only to enhance security, but also to improve the compression performance for Encryption-then-Compression (EtC) systems with JPEG compression, which are used to securely transmit images through an untrusted channel provider. The proposed scheme enables the use of a smaller block size and a larger number of blocks than the color-based image encryption scheme. Images encrypted using the proposed scheme include less color information due to the use of grayscale images even when the original image has three color channels. These features enhance security against various attacks, such as jigsaw puzzle solver and brute-force attacks. Moreover, generating the grayscale-based images from a full-color image in YCbCr color space allows the use of color sub-sampling operation, which can provide the higher compression performance than the conventional grayscale-based encryption scheme, although the encrypted images have no color information. In an experiment, encrypted images were uploaded to and then downloaded from Twitter and Facebook, and the results demonstrated that the proposed scheme is effective for EtC systems and enhances the compression performance, while maintaining the security against brute-force and jigsaw puzzle solver attacks.

Journal ArticleDOI
TL;DR: The spatial channel-wise convolution, a convolutional operation along the direction of the channel of feature maps, is proposed to extract mapping relationship of spatial information between pixels, which facilitates learning the mapping relationship between pixels in the feature maps and distinguishing the tumors from the liver tissue.
Abstract: It is a challenge to automatically and accurately segment the liver and tumors in computed tomography (CT) images, as the problem of over-segmentation or under-segmentation often appears when the Hounsfield unit (HU) value of the liver and tumors is close to the HU value of other tissues or background. In this paper, we propose the spatial channel-wise convolution, a convolutional operation along the direction of the channel of feature maps, to extract the mapping relationship of spatial information between pixels, which facilitates learning the mapping relationship between pixels in the feature maps and distinguishing the tumors from the liver tissue. In addition, we put forward an iterative extending learning strategy, which optimizes the mapping relationship of spatial information between pixels at different scales and enables spatial channel-wise convolution to map the spatial information between pixels in high-level feature maps. Finally, we propose an end-to-end convolutional neural network called Channel-UNet, which takes UNet as the main structure of the network and adds spatial channel-wise convolution in each up-sampling and down-sampling module. The network can converge the optimized mapping relationship of spatial information between pixels extracted by spatial channel-wise convolution and information extracted by feature maps and realizes multi-scale information fusion. The proposed Channel-UNet is validated by the segmentation task on the 3Dircadb dataset. The Dice values of liver and tumor segmentation were 0.984 and 0.940, which is slightly superior to the current best performance. Besides, compared with the current best method, the number of parameters of our method is reduced by 25.7%, and the training time of our method is reduced by 33.3%. The experimental results demonstrate the efficiency and high accuracy of Channel-UNet in liver and tumor segmentation in CT images.

Posted Content
TL;DR: This paper proposes a generally applicable transformation unit that explicitly models channel relationships with explainable control variables; a lightweight channel normalization layer based on l2 normalization keeps the added parameters and computation low and lets the unit be applied at the operator level.
Abstract: In this work, we propose a generally applicable transformation unit for visual recognition with deep convolutional neural networks. This transformation explicitly models channel relationships with explainable control variables. These variables determine the neuron behaviors of competition or cooperation, and they are jointly optimized with the convolutional weight towards more accurate recognition. In Squeeze-and-Excitation (SE) Networks, the channel relationships are implicitly learned by fully connected layers, and the SE block is integrated at the block-level. We instead introduce a channel normalization layer to reduce the number of parameters and computational complexity. This lightweight layer incorporates a simple l2 normalization, making our transformation unit applicable at the operator level without much increase in parameters. Extensive experiments demonstrate the effectiveness of our unit with clear margins on many vision tasks, i.e., image classification on ImageNet, object detection and instance segmentation on COCO, video classification on Kinetics.
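
One way to read the "simple l2 normalization" in the abstract above is sketched below. This is a hypothetical reconstruction from the abstract alone, not the paper's formulation: the function name `l2_channel_gate`, the `1 + tanh(...)` gate, and the use of scalar `gamma` and `beta` shared by all channels (instead of per-channel learned values) are assumptions.

```python
import math


def l2_channel_gate(feature_maps, gamma=1.0, beta=0.0, eps=1e-5):
    """Rescale each channel by a gate computed from l2-normalized channel norms."""
    # Embedding: one l2 norm per channel, taken over all spatial positions.
    norms = [
        math.sqrt(sum(v * v for row in ch for v in row) + eps)
        for ch in feature_maps
    ]
    # Normalization across channels: each norm is divided by the
    # root-mean-square of all channel norms, so channels are compared
    # with one another.
    c = len(norms)
    denom = math.sqrt(sum(n * n for n in norms) / c + eps)
    normalized = [n / denom for n in normalized_input] if False else [n / denom for n in norms]
    # Gating (assumed form): the gate lies in (0, 2); gamma and beta decide
    # whether a strong channel is amplified or damped.
    gates = [1.0 + math.tanh(gamma * n + beta) for n in normalized]
    return [
        [[v * g for v in row] for row in ch]
        for ch, g in zip(feature_maps, gates)
    ]


maps = [[[3.0, 4.0]], [[0.0, 0.0]]]
gated = l2_channel_gate(maps, gamma=1.0)
unchanged = l2_channel_gate(maps, gamma=0.0, beta=0.0)
```

With `gamma = 0` and `beta = 0` the gate is exactly 1, so the layer starts as an identity mapping and only a handful of scalars per channel would need to be learned, in contrast to the fully connected layers of an SE block.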

Journal ArticleDOI
TL;DR: A novel restoration algorithm is proposed using a single image to reduce the environmental pollution effects, and it is based on the dark channel prior and the use of morphological reconstruction for fast computing of transmission maps.
Abstract: Outdoor images are used in a vast number of applications, such as surveillance, remote sensing, and autonomous navigation. The greatest issue with these types of images is the effect of environmental pollution: haze, smog, and fog originating from suspended particles in the air, such as dust, carbon, and water drops, which cause degradation to the image. The elimination of this type of degradation is essential for the input of computer vision systems. Most of the state-of-the-art research in dehazing algorithms is focused on improving the estimation of transmission maps, which are also known as depth maps. The transmission maps are relevant because they have a direct relation to the quality of the image restoration. In this paper, a novel restoration algorithm is proposed using a single image to reduce the environmental pollution effects, and it is based on the dark channel prior and the use of morphological reconstruction for fast computing of transmission maps. The obtained experimental results are evaluated and compared qualitatively and quantitatively with other dehazing algorithms using the metrics of the peak signal-to-noise ratio and structural similarity index; based on these metrics, it is found that the proposed algorithm has improved performance compared with recently introduced approaches.
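For orientation, the classical dark channel prior computation that this line of work builds on can be sketched as follows. This is the conventional patch-minimum formulation, not the morphological-reconstruction variant proposed in the paper; the function names, the patch size of 15, and the `omega` value of 0.95 are assumptions chosen for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def dark_channel(img, patch=15):
    """Dark channel of an RGB image.

    img: float array of shape (H, W, 3).
    patch: odd side length of the square neighbourhood (assumed value).
    """
    # Minimum over the colour channels at every pixel.
    min_rgb = img.min(axis=2)
    # Minimum over a patch x patch neighbourhood, with edge padding so the
    # output keeps the input's height and width.
    r = patch // 2
    padded = np.pad(min_rgb, r, mode="edge")
    windows = sliding_window_view(padded, (patch, patch))
    return windows.min(axis=(2, 3))

def estimate_transmission(img, airlight, omega=0.95, patch=15):
    """Coarse transmission map t = 1 - omega * dark_channel(img / airlight).

    airlight: per-channel atmospheric light, shape (3,), assumed already known.
    """
    return 1.0 - omega * dark_channel(img / airlight, patch)
```

The patch minimum is the expensive step in this formulation, which is the part the abstract says is replaced by morphological reconstruction for faster computation of transmission maps.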

Journal ArticleDOI
TL;DR: The potential of HSI for document image analysis is explored and a comprehensive review of the literature and future prospects are presented.

Journal ArticleDOI
TL;DR: The experiments at reduced and full resolution show that the proposed method outperforms the other state-of-the-art pansharpening methods and can be successfully extended to hyper-spectral image fusion.
Abstract: Pansharpening is a process of acquiring a multi-spectral image with high spatial resolution by fusing a low resolution multi-spectral image with a corresponding high resolution panchromatic image. In this paper, a new pansharpening method based on the Bayesian theory is proposed. The algorithm is mainly based on three assumptions: 1) the geometric information contained in the pan-sharpened image is coincident with that contained in the panchromatic image; 2) the pan-sharpened image and the original multi-spectral image should share the same spectral information; and 3) in each pan-sharpened image channel, the neighboring pixels not around the edges are similar. We build our posterior probability model according to the above-mentioned assumptions and solve it by the alternating direction method of multipliers. The experiments at reduced and full resolution show that the proposed method outperforms the other state-of-the-art pansharpening methods. Besides, we verify that the new algorithm is effective in preserving spectral and spatial information with high reliability. Further experiments also show that the proposed method can be successfully extended to hyper-spectral image fusion.

Journal ArticleDOI
TL;DR: Results derived from images obtained in a controlled laboratory water tank environment with different turbidity conditions and images from tests using the proposed method at sea demonstrate an ability to significantly improve visibility and reduce runtime by a factor of about 50 for a 4K image when compared to conventional DCP methods.
Abstract: Object identification in highly turbid optical media depends mainly on the quality of collected images. Underwater images acquired in a turbid environment are generally of very poor quality. Attenuation and backscattering of light by water, by materials dissolved in the water, and by particulate material are the main causes of the degradation of underwater images. It is therefore essential to improve the quality of such images to facilitate object identification. The focus of this paper is to report the principle and validation of a fast and effective method of improving the quality of underwater images. On the one hand, this method uses a polarimetric imaging optical system to reduce the effect of diffusion on the image acquisition. On the other hand, it is based on an optimized version of the dark channel prior (DCP) method that has received a great deal of attention for image dehazing. Results derived from images obtained in a controlled laboratory water tank environment with different turbidity conditions and images from tests using the proposed method at sea demonstrate an ability to significantly improve visibility and reduce runtime by a factor of about 50 for a 4K image when compared to conventional DCP methods.

Journal ArticleDOI
TL;DR: An automated system for segmentation and recognition of grape leaf diseases is proposed, which achieves an average segmentation accuracy of 90% and a classification accuracy above 92%, superior to existing techniques.

Proceedings ArticleDOI
01 Jun 2019
TL;DR: Experimental results show that, with the proposed PMS-Net, haze removal performance is much better than that of other state-of-the-art algorithms, and the problems caused by the fixed patch size are addressed.
Abstract: In this paper, we propose a novel haze removal algorithm based on a new feature called the patch map. Conventional patch-based haze removal algorithms (e.g. the Dark Channel prior) usually perform dehazing with a fixed patch size. However, this may produce several problems in the recovered results, such as oversaturation and color distortion. Therefore, in this paper, we design an adaptive and automatic patch size selection model called the Patch Map Selection Network (PMS-Net) to select the patch size corresponding to each pixel. This network is based on the convolutional neural network (CNN) and generates the patch map from the input image in an image-to-image manner. Experimental results on both synthesized and real-world hazy images show that, with the proposed PMS-Net, the performance in haze removal is much better than that of other state-of-the-art algorithms and we can address the problems caused by the fixed patch size.

Journal ArticleDOI
20 Mar 2019
TL;DR: A novel instrument for pedestrian detection by combining stereo vision cameras with a thermal camera is presented, and it significantly outperforms the traditional histogram of oriented gradients features.
Abstract: Pedestrian detection is a critical feature of autonomous vehicles and advanced driver assistance systems. This paper presents a novel instrument for pedestrian detection by combining stereo vision cameras with a thermal camera. A new dataset for vehicle applications is built from data recorded by the test vehicle while driving on city roads. Data received from multiple cameras are aligned using the trifocal tensor with pre-calibrated parameters. Candidates are generated from each image frame using sliding windows across multiple scales. A reconfigurable detector framework is proposed, in which feature extraction and classification are two separate stages. The input to the detector can be the color image, disparity map, thermal data, or any of their combinations. When applied to convolutional channel features, feature extraction utilizes the first three convolutional layers of a pre-trained convolutional neural network cascaded with an AdaBoost classifier. The evaluation results show that it significantly outperforms the traditional histogram of oriented gradients features. The proposed pedestrian detector with multi-spectral cameras can achieve a 9% log-average miss rate. The experimental dataset is made available at http://computing.wpi.edu/dataset.html.

Journal ArticleDOI
TL;DR: This paper proposes the absorption light scattering model (ALSM), which can be used to reasonably explain the absorbed light imaging process for low-light images, and identifies that the minimum channel of ALSM obtained above exhibits high local similarity.
Abstract: Low light often leads to poor image visibility, which can easily affect the performance of computer vision algorithms. First, this paper proposes the absorption light scattering model (ALSM), which can be used to reasonably explain the absorbed light imaging process for low-light images. In addition, the absorbing light scattering image obtained via ALSM under a sufficient and uniform illumination can reproduce hidden outlines and details from the low-light image. Then, we identify that the minimum channel of ALSM obtained above exhibits high local similarity. This similarity can be constrained by superpixels, which effectively prevent the use of gradient operations at the edges so that the noise is not amplified quickly during enhancement. Finally, by analyzing the monotonicity between the scene reflection and the atmospheric light or transmittance in ALSM, a new low-light image enhancement method is identified. We replace atmospheric light with inverted atmospheric light to reduce the contribution of atmospheric light in the imaging results. Moreover, a soft jointed mean-standard-deviation (MSD) mechanism is proposed that directly acts on the patches represented by the superpixels. The MSD can obtain a smaller transmittance than that obtained by the minimum strategy, and it can be automatically adjusted according to the information of the image. The experiments on challenging low-light images are conducted to reveal the advantages of our method compared with other powerful techniques.

Journal ArticleDOI
TL;DR: Based on the multi-scale Retinex, an efficient enhancement method for underwater image and video is presented in this paper and the color is selectively preserved by the inverted gray world method depending on imaging conditions and application requirements.
Abstract: Retinex models how the human visual system perceives natural colors, which could improve the contrast and sharpness of the degraded image and also provide color constancy and dynamic range simultaneously. This gives Retinex clear advantages for enhancing underwater images. Based on the multi-scale Retinex, an efficient enhancement method for underwater image and video is presented in this paper. Firstly, the image is pre-corrected to equalize the pixel distribution and reduce the dominating color. Then, the classical multi-scale Retinex with intensity channel is applied to the pre-corrected images to further improve the contrast and the color. In addition, multi-down-sampling and infinite impulse response Gaussian filtering are adopted to increase processing speed. Subsequently, the image is restored from the logarithmic domain and the illumination of the restored image is compensated based on statistical properties. Finally, the color is selectively preserved by the inverted gray world method depending on imaging conditions and application requirements. Five kinds of typical underwater images with green, blue, turbid, dark and colorful backgrounds and two underwater videos are enhanced and evaluated on Jetson TX2, respectively, to verify the effectiveness of the proposed method.
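The classical multi-scale Retinex step mentioned in the abstract can be sketched as the average, over several Gaussian scales, of the log-ratio between the intensity channel and its blurred version. The sketch below uses a plain separable convolution rather than the multi-down-sampling and infinite impulse response filtering the paper adopts for speed; the function names, the default scales, and the `eps` offset are assumptions for illustration.

```python
import numpy as np

def gaussian_blur(channel, sigma):
    """Separable Gaussian blur of a 2-D array with edge padding."""
    radius = max(1, int(3 * sigma))
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-(t ** 2) / (2.0 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(channel, radius, mode="edge")
    # Filter along rows, then along columns; "valid" undoes the padding.
    rows = np.apply_along_axis(
        lambda v: np.convolve(v, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(
        lambda v: np.convolve(v, kernel, mode="valid"), 0, rows)

def multi_scale_retinex(intensity, sigmas=(15, 80, 250), eps=1.0):
    """Equal-weight multi-scale Retinex on a single intensity channel.

    intensity: 2-D array of non-negative intensities.
    sigmas: Gaussian scales (assumed values).
    eps: offset that keeps the logarithm finite at zero intensity.
    """
    intensity = intensity.astype(np.float64) + eps
    out = np.zeros_like(intensity)
    for sigma in sigmas:
        out += np.log(intensity) - np.log(gaussian_blur(intensity, sigma))
    return out / len(sigmas)
```

The output lives in the logarithmic domain, so a restoration and illumination compensation stage such as the one the abstract describes would still be needed before display.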