
Showing papers on "Pixel published in 2021"


Journal ArticleDOI
TL;DR: Chen et al. as discussed by the authors proposed a bitemporal image transformer (BIT) to efficiently and effectively model contexts within the spatial-temporal domain, and incorporated BIT in a deep feature differencing-based CD framework.
Abstract: Modern change detection (CD) has achieved remarkable success by the powerful discriminative ability of deep convolutions. However, high-resolution remote sensing CD remains challenging due to the complexity of objects in the scene. Objects with the same semantic concept may show distinct spectral characteristics at different times and spatial locations. Most recent CD pipelines using pure convolutions are still struggling to relate long-range concepts in space-time. Nonlocal self-attention approaches show promising performance via modeling dense relationships among pixels, yet are computationally inefficient. Here, we propose a bitemporal image transformer (BIT) to efficiently and effectively model contexts within the spatial-temporal domain. Our intuition is that the high-level concepts of the change of interest can be represented by a few visual words, that is, semantic tokens. To achieve this, we express the bitemporal image into a few tokens and use a transformer encoder to model contexts in the compact token-based space-time. The learned context-rich tokens are then fed back to the pixel-space for refining the original features via a transformer decoder. We incorporate BIT in a deep feature differencing-based CD framework. Extensive experiments on three CD datasets demonstrate the effectiveness and efficiency of the proposed method. Notably, our BIT-based model significantly outperforms the purely convolutional baseline using only three times lower computational costs and model parameters. Based on a naive backbone (ResNet18) without sophisticated structures (e.g., feature pyramid network (FPN) and UNet), our model surpasses several state-of-the-art CD methods, including better than four recent attention-based methods in terms of efficiency and accuracy. Our code is available at https://github.com/justchenhao/BIT_CD.
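As a rough illustration of the token-based context modeling described above, here is a minimal PyTorch sketch: features are pooled into a few semantic tokens by spatial attention, a transformer encoder relates the bitemporal tokens, and a cross-attention decoder projects the context back to pixels. All module names and sizes are illustrative assumptions, not the authors' exact BIT architecture.

```python
import torch
import torch.nn as nn

class TokenContext(nn.Module):
    """Sketch of BIT-style context modeling: pool pixels into a few
    semantic tokens, run a transformer encoder over the bitemporal
    tokens, then let pixels attend back to the tokens (decoder)."""
    def __init__(self, channels=32, num_tokens=4, heads=8):
        super().__init__()
        self.token_logits = nn.Conv2d(channels, num_tokens, kernel_size=1)
        enc_layer = nn.TransformerEncoderLayer(d_model=channels, nhead=heads,
                                               batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=1)
        self.decoder = nn.MultiheadAttention(channels, heads, batch_first=True)

    def tokenize(self, feat):                    # feat: (B, C, H, W)
        attn = self.token_logits(feat).flatten(2).softmax(-1)  # (B, L, HW)
        return attn @ feat.flatten(2).transpose(1, 2)          # (B, L, C)

    def forward(self, feat1, feat2):
        tokens = torch.cat([self.tokenize(feat1), self.tokenize(feat2)], dim=1)
        tokens = self.encoder(tokens)            # context in compact token space
        out = []
        for feat in (feat1, feat2):
            b, c, h, w = feat.shape
            q = feat.flatten(2).transpose(1, 2)  # pixels act as queries
            refined, _ = self.decoder(q, tokens, tokens)
            out.append(refined.transpose(1, 2).view(b, c, h, w))
        return out
```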

290 citations


Journal ArticleDOI
TL;DR: Experimental results show that the proposed SNUNet-CD method improves greatly on many evaluation criteria and has a better tradeoff between accuracy and calculation amount than other state-of-the-art (SOTA) change detection methods.
Abstract: Change detection is an important task in remote sensing (RS) image analysis. It is widely used in natural disaster monitoring and assessment, land resource planning, and other fields. As a pixel-to-pixel prediction task, change detection is sensitive to the utilization of the original position information. Recent change detection methods always focus on the extraction of deep change semantic features but ignore the importance of shallow-layer information containing high-resolution and fine-grained features; this often leads to uncertainty in the pixels at the edge of the changed target and missed detection of small targets. In this letter, we propose a densely connected siamese network for change detection, namely SNUNet-CD (the combination of Siamese network and NestedUNet). SNUNet-CD alleviates the loss of localization information in the deep layers of the neural network through compact information transmission between encoder and decoder, and between decoder and decoder. In addition, an Ensemble Channel Attention Module (ECAM) is proposed for deep supervision. Through ECAM, the most representative features of different semantic levels can be refined and used for the final classification. Experimental results show that our method improves greatly on many evaluation criteria and has a better tradeoff between accuracy and calculation amount than other state-of-the-art (SOTA) change detection methods.
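One plausible reading of ECAM, re-weighting the concatenated outputs of several decoder branches channel-wise before the final classifier, can be caricatured with a squeeze-and-excitation-style block. This is a hedged sketch under that reading, not the authors' exact module; all sizes are assumptions.

```python
import torch
import torch.nn as nn

class ECAMSketch(nn.Module):
    """Hedged sketch of an ensemble channel attention: outputs of several
    decoder branches are concatenated and re-weighted channel-wise."""
    def __init__(self, channels=32, branches=4, reduction=4):
        super().__init__()
        total = channels * branches
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.mlp = nn.Sequential(
            nn.Conv2d(total, total // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(total // reduction, total, 1), nn.Sigmoid())

    def forward(self, branch_outputs):           # list of (B, C, H, W) maps
        x = torch.cat(branch_outputs, dim=1)     # stack semantic levels
        return x * self.mlp(self.pool(x))        # channel re-weighting
```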

256 citations


Journal ArticleDOI
TL;DR: The reconstruction-by-inpainting anomaly detection approach (RIAD) randomly removes partial image regions and reconstructs the image from the partial inpaintings, thus addressing the drawbacks of auto-encoding methods.

191 citations


Journal ArticleDOI
TL;DR: A deep translation based change detection network (DTCDN) for optical and SAR images is proposed that utilizes deep context features to separate the unchanged pixels and changed pixels in a supervised CD network.
Abstract: With the development of space-based imaging technology, a larger and larger number of images with different modalities and resolutions are available. The optical images reflect the abundant spectral information and geometric shape of ground objects, whose qualities are degraded easily in poor atmospheric conditions. Although synthetic aperture radar (SAR) images cannot provide the spectral features of the region of interest (ROI), they can capture all-weather and all-time polarization information. In nature, optical and SAR images encapsulate lots of complementary information, which is of great significance for change detection (CD) in poor weather situations. However, due to the difference in imaging mechanisms of optical and SAR images, it is difficult to conduct their CD directly using the traditional difference or ratio algorithms. Most recent CD methods bring image translation to reduce their difference, but the results are obtained by ordinary algebraic methods and threshold segmentation with limited accuracy. Towards this end, this work proposes a deep translation based change detection network (DTCDN) for optical and SAR images. The deep translation firstly maps images from one domain (e.g., optical) to another domain (e.g., SAR) through a cyclic structure into the same feature space. With the similar characteristics after deep translation, they become comparable. Different from most previous researches, the translation results are imported to a supervised CD network that utilizes deep context features to separate the unchanged pixels and changed pixels. In the experiments, the proposed DTCDN was tested on four representative data sets from Gloucester, California, and Shuguang village. Compared with state-of-the-art methods, the effectiveness and robustness of the proposed method were confirmed.

166 citations


Proceedings ArticleDOI
01 Jun 2021
TL;DR: PixLoc as discussed by the authors is a scene-agnostic neural network that estimates an accurate 6-DoF pose by aligning multiscale deep features with a 3D model; it can localize in large environments given coarse pose priors.
Abstract: Camera pose estimation in known scenes is a 3D geometry task recently tackled by multiple learning algorithms. Many regress precise geometric quantities, like poses or 3D points, from an input image. This either fails to generalize to new viewpoints or ties the model parameters to a specific scene. In this paper, we go Back to the Feature: we argue that deep networks should focus on learning robust and invariant visual features, while the geometric estimation should be left to principled algorithms. We introduce PixLoc, a scene-agnostic neural network that estimates an accurate 6-DoF pose from an image and a 3D model. Our approach is based on the direct alignment of multiscale deep features, casting camera localization as metric learning. PixLoc learns strong data priors by end-to-end training from pixels to pose and exhibits exceptional generalization to new scenes by separating model parameters and scene geometry. The system can localize in large environments given coarse pose priors but also improve the accuracy of sparse feature matching by jointly refining keypoints and poses with little overhead. The code will be publicly available at github.com/cvg/pixloc.

157 citations


Journal ArticleDOI
TL;DR: This paper proposes an efficient interlaced sparse self-attention scheme to model the dense relations between any two of all pixels via the combination of two sparse relation matrices and empirically shows the advantages of this approach with competitive performances on five challenging benchmarks.
Abstract: In this paper, we address the semantic segmentation task with a new context aggregation scheme named object context, which focuses on enhancing the role of object information. Motivated by the fact that the category of each pixel is inherited from the object it belongs to, we define the object context for each pixel as the set of pixels that belong to the same category as the given pixel in the image. We use a binary relation matrix to represent the relationship between all pixels, where the value one indicates that the two selected pixels belong to the same category and zero otherwise. We propose to use a dense relation matrix to serve as a surrogate for the binary relation matrix. The dense relation matrix is capable of emphasizing the contribution of object information, as the relation scores tend to be larger on the object pixels than on the other pixels. Considering that dense relation matrix estimation requires quadratic computation overhead and memory consumption w.r.t. the input size, we propose an efficient interlaced sparse self-attention scheme to model the dense relations between any two of all pixels via the combination of two sparse relation matrices. To capture richer context information, we further combine our interlaced sparse self-attention scheme with the conventional multi-scale context schemes, including pyramid pooling (Zhao et al. 2017) and atrous spatial pyramid pooling (Chen et al. 2018). We empirically show the advantages of our approach with competitive performances on five challenging benchmarks: Cityscapes, ADE20K, LIP, PASCAL-Context and COCO-Stuff.
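The interlaced scheme factorizes the dense N × N affinity into a long-range step over interleaved groups followed by a short-range step within contiguous blocks. Below is a simplified 1-D PyTorch sketch of that factorization over flattened pixels; block size and channel counts are illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn

class InterlacedSelfAttention(nn.Module):
    """Sketch: dense N x N relations approximated by long-range attention
    across interleaved groups, then short-range attention within blocks."""
    def __init__(self, channels=64, heads=4, block=64):
        super().__init__()
        self.block = block
        self.long = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.short = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x):              # x: (B, N, C), N divisible by block
        b, n, c = x.shape
        q = self.block
        p = n // q
        # long-range: group pixels sharing the same position inside a block
        xl = x.view(b, p, q, c).transpose(1, 2).reshape(b * q, p, c)
        xl, _ = self.long(xl, xl, xl)
        x = xl.view(b, q, p, c).transpose(1, 2).reshape(b, n, c)
        # short-range: attend within each contiguous block
        xs = x.view(b * p, q, c)
        xs, _ = self.short(xs, xs, xs)
        return xs.view(b, n, c)
```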

138 citations


Proceedings ArticleDOI
05 May 2021
TL;DR: PD-GAN as mentioned in this paper modulates deep features of input random noise from coarse-to-fine by injecting an initially restored image and the hole regions in multiple scales to generate multiple inpainting results with diverse and visually realistic content.
Abstract: We propose PD-GAN, a probabilistic diverse GAN for image inpainting. Given an input image with arbitrary hole regions, PD-GAN produces multiple inpainting results with diverse and visually realistic content. Our PD-GAN is built upon a vanilla GAN which generates images based on random noise. During image generation, we modulate deep features of input random noise from coarse-to-fine by injecting an initially restored image and the hole regions in multiple scales. We argue that during hole filling, the pixels near the hole boundary should be more deterministic (i.e., with higher probability trusting the context and initially restored image to create a natural inpainting boundary), while those lying in the center of the hole should enjoy more degrees of freedom (i.e., more likely to depend on the random noise for enhancing diversity). To this end, we propose spatially probabilistic diversity normalization (SPDNorm) inside the modulation to model the probability of generating a pixel conditioned on the context information. SPDNorm dynamically balances the realism and diversity inside the hole region, making the generated content more diverse towards the hole center and more similar to neighboring image content towards the hole boundary. Meanwhile, we propose a perceptual diversity loss to further empower PD-GAN for diverse content generation. Experiments on benchmark datasets including CelebA-HQ, Places2 and Paris Street View indicate that PD-GAN is effective for diverse and visually realistic image restoration.

138 citations


Book
24 Aug 2021
TL;DR: In this article, the authors overview graph spectral techniques in graph signal processing (GSP) specifically for image/video processing, including image compression, image restoration, image filtering, and image segmentation.
Abstract: Recent advent of graph signal processing (GSP) has spurred intensive studies of signals that live naturally on irregular data kernels described by graphs (e.g., social networks, wireless sensor networks). Though a digital image contains pixels that reside on a regularly sampled 2-D grid, if one can design an appropriate underlying graph connecting pixels with weights that reflect the image structure, then one can interpret the image (or image patch) as a signal on a graph, and apply GSP tools for processing and analysis of the signal in graph spectral domain. In this paper, we overview recent graph spectral techniques in GSP specifically for image/video processing. The topics covered include image compression, image restoration, image filtering, and image segmentation.
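To make the "image as a graph signal" idea concrete, here is a small NumPy sketch: connect 4-neighbour pixels with weights reflecting intensity similarity, build the combinatorial Laplacian L = D − W, and use its eigenvectors as a graph Fourier basis for the patch. The Gaussian weight and sigma are illustrative choices.

```python
import numpy as np

def image_graph_laplacian(patch, sigma=10.0):
    """Build a 4-connected pixel graph whose edge weights reflect
    intensity similarity; return the combinatorial Laplacian L = D - W."""
    h, w = patch.shape
    n = h * w
    W = np.zeros((n, n))
    idx = lambda r, c: r * w + c
    for r in range(h):
        for c in range(w):
            for dr, dc in ((0, 1), (1, 0)):      # right and down neighbours
                rr, cc = r + dr, c + dc
                if rr < h and cc < w:
                    wgt = np.exp(-((patch[r, c] - patch[rr, cc]) ** 2)
                                 / (2 * sigma ** 2))
                    W[idx(r, c), idx(rr, cc)] = wgt
                    W[idx(rr, cc), idx(r, c)] = wgt
    return np.diag(W.sum(axis=1)) - W

# usage: eigendecompose L and project the patch onto the basis (a "GFT")
patch = np.random.rand(8, 8) * 255
evals, evecs = np.linalg.eigh(image_graph_laplacian(patch))
gft_coeffs = evecs.T @ patch.flatten()
```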

126 citations


Journal ArticleDOI
Hao Zhang, Zhuliang Le, Zhenfeng Shao, Han Xu, Jiayi Ma
TL;DR: A new generative adversarial network with adaptive and gradient joint constraints to fuse multi-focus images is presented, and its superiority over the state-of-the-art is demonstrated in terms of both subjective visual effect and quantitative metrics.

125 citations


Proceedings ArticleDOI
01 Jun 2021
TL;DR: Chen et al. as discussed by the authors proposed the Local Implicit Image Function (LIIF), which takes an image coordinate and the 2D deep features around that coordinate as inputs and predicts the RGB value at the coordinate as output.
Abstract: How to represent an image? While the visual world is presented in a continuous manner, machines store and see the images in a discrete way with 2D arrays of pixels. In this paper, we seek to learn a continuous representation for images. Inspired by the recent progress in 3D reconstruction with implicit neural representation, we propose the Local Implicit Image Function (LIIF), which takes an image coordinate and the 2D deep features around the coordinate as inputs and predicts the RGB value at the given coordinate as output. Since the coordinates are continuous, LIIF can be presented in arbitrary resolution. To generate the continuous representation for images, we train an encoder with LIIF representation via a self-supervised task with super-resolution. The learned continuous representation can be presented in arbitrary resolution, even extrapolating to ×30 higher resolution where the training tasks are not provided. We further show that LIIF representation builds a bridge between discrete and continuous representation in 2D; it naturally supports learning tasks with size-varied image ground-truths and significantly outperforms the method of resizing the ground-truths. Our project page with code is at https://yinboc.github.io/liif/.
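The core LIIF query can be sketched in a few lines: fetch the latent code nearest to a continuous coordinate, concatenate the relative offset, and decode RGB with an MLP. The sketch below uses nearest-code lookup only (the paper's local ensemble and cell decoding are omitted), with illustrative layer sizes.

```python
import torch
import torch.nn as nn

class LIIFQuery(nn.Module):
    """Sketch of a local implicit image function: an MLP maps (nearest
    latent code, relative coordinate) to an RGB value, so the image can
    be sampled at any continuous coordinate."""
    def __init__(self, feat_dim=64, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3))

    def forward(self, feat, coord):
        # feat: (B, C, H, W) latent codes; coord: (B, N, 2) in [-1, 1]
        b, c, h, w = feat.shape
        # nearest latent code for each query coordinate
        ix = ((coord[..., 0] + 1) / 2 * (w - 1)).round().long().clamp(0, w - 1)
        iy = ((coord[..., 1] + 1) / 2 * (h - 1)).round().long().clamp(0, h - 1)
        flat = feat.flatten(2).transpose(1, 2)          # (B, HW, C)
        codes = torch.gather(flat, 1,
                             (iy * w + ix).unsqueeze(-1).expand(-1, -1, c))
        # offset of the query from the chosen code's pixel centre
        cx = ix.float() / (w - 1) * 2 - 1
        cy = iy.float() / (h - 1) * 2 - 1
        rel = torch.stack([coord[..., 0] - cx, coord[..., 1] - cy], dim=-1)
        return self.mlp(torch.cat([codes, rel], dim=-1))  # (B, N, 3)
```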

124 citations


Journal ArticleDOI
10 Feb 2021-Nature
TL;DR: This result paves the way for the development and proliferation of low-cost, compact and high-performance 3D imaging cameras that could be used in applications from robotics and autonomous navigation to augmented reality and healthcare.
Abstract: Accurate three-dimensional (3D) imaging is essential for machines to map and interact with the physical world1,2. Although numerous 3D imaging technologies exist, each addressing niche applications with varying degrees of success, none has achieved the breadth of applicability and impact that digital image sensors have in the two-dimensional imaging world3–10. A large-scale two-dimensional array of coherent detector pixels operating as a light detection and ranging system could serve as a universal 3D imaging platform. Such a system would offer high depth accuracy and immunity to interference from sunlight, as well as the ability to measure the velocity of moving objects directly11. Owing to difficulties in providing electrical and photonic connections to every pixel, previous systems have been restricted to fewer than 20 pixels12–15. Here we demonstrate the operation of a large-scale coherent detector array, consisting of 512 pixels, in a 3D imaging system. Leveraging recent advances in the monolithic integration of photonic and electronic circuits, a dense array of optical heterodyne detectors is combined with an integrated electronic readout architecture, enabling straightforward scaling to arbitrarily large arrays. Two-axis solid-state beam steering eliminates any trade-off between field of view and range. Operating at the quantum noise limit16,17, our system achieves an accuracy of 3.1 millimetres at a distance of 75 metres when using only 4 milliwatts of light, an order of magnitude more accurate than existing solid-state systems at such ranges. Future reductions of pixel size using state-of-the-art components could yield resolutions in excess of 20 megapixels for arrays the size of a consumer camera sensor. This result paves the way for the development and proliferation of low-cost, compact and high-performance 3D imaging cameras that could be used in applications from robotics and autonomous navigation to augmented reality and healthcare. A compact, high-performance silicon photonics-based light detection and ranging system for three-dimensional imaging is developed that should be amenable to low-cost mass manufacturing.

Journal ArticleDOI
TL;DR: This paper proposes a new color image encryption algorithm (CIEA) that sufficiently considers the properties of the color image and Latin square and designs a two-dimensional chaotic system called 2D-LSM to address the weaknesses of existing chaotic systems.
Abstract: Recently, many image encryption schemes have been developed using Latin squares. When encrypting a color image, these algorithms treat the color image as three greyscale images and encrypt these greyscale images one by one using the Latin squares. Obviously, these algorithms do not sufficiently consider the inner connections between the color image and the Latin square and thus result in many redundant operations and low efficiency. To address this issue, in this paper, we propose a new color image encryption algorithm (CIEA) that sufficiently considers the properties of the color image and the Latin square. First, we propose a two-dimensional chaotic system called 2D-LSM to address the weaknesses of existing chaotic systems. Then, we design a new CIEA using orthogonal Latin squares and 2D-LSM. The proposed CIEA can make full use of the inherent connections of the orthogonal Latin squares and the color image and executes the encryption process at the pixel level. Simulation and security analysis results show that the proposed CIEA has a high level of security and can outperform some representative image encryption algorithms.
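For intuition on why orthogonal Latin squares suit pixel-level scrambling: if L1 and L2 are orthogonal, the pair (L1(i, j), L2(i, j)) is distinct for every (i, j), so it defines a valid permutation of pixel positions. A small NumPy illustration using the classical cyclic construction for odd order (the paper instead derives its squares from the 2D-LSM chaotic system):

```python
import numpy as np

def orthogonal_latin_squares(n):
    """For odd n, L1(i,j) = (i + j) mod n and L2(i,j) = (i + 2j) mod n
    form a pair of orthogonal Latin squares: every (L1, L2) pair occurs
    exactly once. Illustrative construction only."""
    i, j = np.indices((n, n))
    return (i + j) % n, (i + 2 * j) % n

L1, L2 = orthogonal_latin_squares(5)
# pixel-level scrambling sketch: send the pixel at (i, j) to (L1[i,j], L2[i,j]);
# orthogonality guarantees the target positions are all distinct.
img = np.arange(25).reshape(5, 5)
scrambled = np.empty_like(img)
scrambled[L1, L2] = img
```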

Journal ArticleDOI
Hao Zhang, Jiayi Ma
TL;DR: A squeeze-and-decomposition network (SDNet) is proposed to realize multi-modal and digital photography image fusion in real time; it is much faster than the state of the art and can deal with real-time fusion tasks.
Abstract: In this paper, a squeeze-and-decomposition network (SDNet) is proposed to realize multi-modal and digital photography image fusion in real time. Firstly, we generally transform multiple fusion problems into the extraction and reconstruction of gradient and intensity information, and design a universal form of loss function accordingly, which is composed of an intensity term and a gradient term. For the gradient term, we introduce an adaptive decision block to decide the optimization target of the gradient distribution according to the texture richness at the pixel scale, so as to guide the fused image to contain richer texture details. For the intensity term, we adjust the weight of each intensity loss term to change the proportion of intensity information from different images, so that it can be adapted to multiple image fusion tasks. Secondly, we introduce the idea of squeeze and decomposition into image fusion. Specifically, we consider not only the squeeze process from source images to the fused result, but also the decomposition process from the fused result to source images. Because the quality of decomposed images directly depends on the fused result, this forces the fused result to contain more scene details. Experimental results demonstrate the superiority of our method over the state of the art in terms of subjective visual effect and quantitative metrics in a variety of fusion tasks. Moreover, our method is much faster than the state of the art, and can deal with real-time fusion tasks.
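The universal intensity-plus-gradient loss can be sketched directly. In the hedged version below, the intensity target is a fixed weighted mix of the sources and the per-pixel gradient target is whichever source has the larger gradient magnitude; both choices are simplifying assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

# Sobel kernels for gradient extraction (single-channel images)
KX = torch.tensor([[[[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]]])
KY = KX.transpose(2, 3)

def grad(img):                       # img: (B, 1, H, W)
    gx = F.conv2d(img, KX, padding=1)
    gy = F.conv2d(img, KY, padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)

def fusion_loss(fused, src1, src2, w1=0.5, w2=0.5):
    """Sketch of an intensity + gradient fusion loss: intensity target is a
    weighted mix of the sources; the gradient target is, per pixel, the
    source with the richer texture (larger gradient magnitude)."""
    intensity = F.mse_loss(fused, w1 * src1 + w2 * src2)
    g1, g2 = grad(src1), grad(src2)
    target_grad = torch.where(g1 > g2, g1, g2)   # adaptive per-pixel decision
    gradient = F.mse_loss(grad(fused), target_grad)
    return intensity + gradient
```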

Journal ArticleDOI
TL;DR: Experimental results show that the proposed SSC yields better mapping results than state-of-the-art methods, and the utilized spectral properties are extracted directly by spectral imagery, thus avoiding the spectral unmixing errors.
Abstract: Due to the influences of imaging conditions, spectral imagery can be coarse and contain a large number of mixed pixels. These mixed pixels can lead to inaccuracies in the land-cover class (LC) mapping. Super-resolution mapping (SRM) can be used to analyze such mixed pixels and obtain the LC mapping information at the subpixel level. However, traditional SRM methods mostly rely on spatial correlation based on linear distance, which ignores the influences of nonlinear imaging conditions. In addition, spectral unmixing errors affect the accuracy of utilized spectral properties. In order to overcome the influence of linear and nonlinear imaging conditions and utilize more accurate spectral properties, the SRM based on spatial–spectral correlation (SSC) is proposed in this work. Spatial correlation is obtained using the mixed spatial attraction model (MSAM) based on the linear Euclidean distance. Besides, a spectral correlation that utilizes spectral properties based on the nonlinear Kullback–Leibler distance (KLD) is proposed. Spatial and spectral correlations are combined to reduce the influences of linear and nonlinear imaging conditions, which results in an improved mapping result. The utilized spectral properties are extracted directly by spectral imagery, thus avoiding the spectral unmixing errors. Experimental results on the three spectral images show that the proposed SSC yields better mapping results than state-of-the-art methods.
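The nonlinear spectral term rests on the Kullback–Leibler distance between pixel spectra. A minimal NumPy sketch, assuming each spectrum is normalized so it can be treated as a probability distribution (a common simplification):

```python
import numpy as np

def spectral_kld(p, q, eps=1e-12):
    """Nonlinear spectral distance between two pixel spectra via the
    Kullback-Leibler divergence, after normalising each spectrum."""
    p = p / (p.sum() + eps)
    q = q / (q.sum() + eps)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

# usage: compare a coarse mixed pixel's spectrum with a candidate class spectrum
mixed = np.array([0.30, 0.25, 0.20, 0.25])
klass = np.array([0.40, 0.30, 0.15, 0.15])
print(spectral_kld(mixed, klass))
```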

Journal ArticleDOI
26 Jul 2021
TL;DR: A novel method is presented for automatic extrinsic calibration of high-resolution LiDARs and RGB cameras in targetless environments, achieving pixel-level accuracy by aligning natural edge features in the two sensors.
Abstract: In this letter, we present a novel method for automatic extrinsic calibration of high-resolution LiDARs and RGB cameras in targetless environments. Our approach does not require checkerboards but can achieve pixel-level accuracy by aligning natural edge features in the two sensors. On the theory level, we analyze the constraints imposed by edge features and the sensitivity of calibration accuracy with respect to edge distribution in the scene. On the implementation level, we carefully investigate the physical measuring principles of LiDARs and propose an efficient and accurate LiDAR edge extraction method based on point cloud voxel cutting and plane fitting. Due to the richness of edges in natural scenes, we have carried out experiments in many indoor and outdoor scenes. The results show that this method has high robustness, accuracy, and consistency. It can promote the research and application of LiDAR-camera fusion. We have open-sourced our code on GitHub to benefit the community.

Journal ArticleDOI
TL;DR: An autonomous hyperspectral anomaly detection network (Auto-AD) is proposed, in which the background is reconstructed by the network and the anomalies appear as reconstruction errors, which confirms the effectiveness of the proposed Auto-AD method.
Abstract: Hyperspectral anomaly detection is aimed at detecting observations that differ from their surroundings, and is an active area of research in hyperspectral image processing. Recently, autoencoders (AEs) have been applied in hyperspectral anomaly detection; however, the existing AE-based methods are complicated and involve manual parameter setting and preprocessing and/or postprocessing procedures. In this article, an autonomous hyperspectral anomaly detection network (Auto-AD) is proposed, in which the background is reconstructed by the network and the anomalies appear as reconstruction errors. Specifically, through a fully convolutional AE with skip connections, the background can be reconstructed while the anomalies are difficult to reconstruct, since the anomalies are relatively small compared to the background and have a low probability of occurring in the image. To further suppress the anomaly reconstruction, an adaptive-weighted loss function is designed, where the weights of potential anomalous pixels with large reconstruction errors are reduced during training. As a result, the anomalies have a higher contrast with the background in the map of reconstruction errors. The experimental results obtained on a public airborne data set and two unmanned aerial vehicle-borne hyperspectral data sets confirm the effectiveness of the proposed Auto-AD method.
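The adaptive-weighted loss can be sketched as follows: per-pixel reconstruction errors are computed, and pixels with large errors (potential anomalies) are down-weighted so training keeps fitting the background rather than the anomalies. The specific weighting function below is an assumption, not the authors' formula.

```python
import torch

def adaptive_weighted_loss(recon, target):
    """Sketch of an adaptive-weighted reconstruction loss: pixels with large
    reconstruction errors get smaller weights, so the autoencoder keeps
    fitting the background but not the anomalies."""
    err = (recon - target).pow(2).mean(dim=1, keepdim=True)  # per-pixel error
    w = 1.0 / (1.0 + err.detach())        # down-weight high-error pixels
    return (w * err).mean()
```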

Journal ArticleDOI
Chunfang Deng, Mengmeng Wang, Liang Liu, Yong Liu, Yunliang Jiang
TL;DR: Deng et al. as discussed by the authors proposed an extended feature pyramid network (EFPN) with an extra high-resolution pyramid level specialized for small object detection, which is used to super-resolve features and extract credible regional details simultaneously.
Abstract: Small object detection remains an unsolved challenge because it is hard to extract information of small objects with only a few pixels. While scale-level corresponding detection in feature pyramid network alleviates this problem, we find feature coupling of various scales still impairs the performance of small objects. In this paper, we propose an extended feature pyramid network (EFPN) with an extra high-resolution pyramid level specialized for small object detection. Specifically, we design a novel module, named feature texture transfer (FTT), which is used to super-resolve features and extract credible regional details simultaneously. Moreover, we introduce a cross resolution distillation mechanism to transfer the ability of perceiving details across the scales of the network, where a foreground-background-balanced loss function is designed to alleviate area imbalance of foreground and background. In our experiments, the proposed EFPN is efficient on both computation and memory, and yields state-of-the-art results on small traffic-sign dataset Tsinghua-Tencent 100K and small category of general object detection dataset MS COCO.

Journal ArticleDOI
TL;DR: This method is the first to use the HSV color space for deep-learning-based underwater image enhancement, efficiently and effectively integrating both the RGB and HSV color spaces in one single CNN.
Abstract: Underwater image enhancement has attracted much attention due to the rise of marine resource development in recent years. Benefiting from the powerful representation capabilities of Convolutional Neural Networks (CNNs), multiple underwater image enhancement algorithms based on CNNs have been proposed in the past few years. However, almost all of these algorithms employ an RGB color space setting, which is insensitive to image properties such as luminance and saturation. To address this problem, we propose an Underwater Image Enhancement Convolutional Neural Network using two Color Spaces (UIEC^2-Net) that efficiently and effectively integrates both the RGB and HSV color spaces in one single CNN. To the best of our knowledge, this method is the first to use the HSV color space for underwater image enhancement based on deep learning. UIEC^2-Net is an end-to-end trainable network consisting of three blocks, as follows: an RGB pixel-level block that implements fundamental operations such as denoising and removing color cast, an HSV global-adjust block for globally adjusting underwater image luminance, color and saturation by adopting a novel neural curve layer, and an attention map block for combining the advantages of the RGB and HSV block output images by distributing a weight to each pixel. Experimental results on synthetic and real-world underwater images show that the proposed method has good performance in both subjective comparisons and objective metrics. The code is available at https://github.com/BIGWangYuDong/UWEnhancement .

Journal ArticleDOI
TL;DR: Experimental results demonstrate that the proposed fusion solution, i.e., SEDRFuse, outperforms the state-of-the-art fusion methods in terms of both subjective and objective evaluations.
Abstract: Image fusion is an important task for computer vision as a diverse range of applications are benefiting from the fusion operation. The existing image fusion methods are largely implemented at the pixel level, which may introduce artifacts and/or inconsistencies, while the computational complexity is relatively high. In this article, we propose a symmetric encoder–decoder with residual block (SEDRFuse) network to fuse infrared and visible images for night vision applications. At the training stage, the SEDRFuse network is trained to create a fixed feature extractor. At the fusing stage, the trained extractor is utilized to extract the intermediate and compensation features, which are generated by the residual block and the first two convolutional layers from the input source images, respectively. Two attention maps, which are derived from the intermediate features, are then multiplied by the intermediate features for fusion. The salient compensation features obtained through elementwise selection are passed to the corresponding deconvolutional layers for processing. Finally, the fused intermediate features and the selected compensation features are decoded to reconstruct the fused image. Experimental results demonstrate that the proposed fusion solution, i.e., SEDRFuse, outperforms the state-of-the-art fusion methods in terms of both subjective and objective evaluations.

Proceedings Article
03 Mar 2021
TL;DR: A new simple approach for image compression: instead of storing the RGB values for each pixel of an image, the weights of a neural network overfitted to the image are stored, and this approach outperforms JPEG at low bit-rates, even without entropy coding or learning a distribution over weights.
Abstract: We propose a new simple approach for image compression: instead of storing the RGB values for each pixel of an image, we store the weights of a neural network overfitted to the image. Specifically, to encode an image, we fit it with an MLP which maps pixel locations to RGB values. We then quantize and store the weights of this MLP as a code for the image. To decode the image, we simply evaluate the MLP at every pixel location. We found that this simple approach outperforms JPEG at low bit-rates, even without entropy coding or learning a distribution over weights. While our framework is not yet competitive with state of the art compression methods, we show that it has various attractive properties which could make it a viable alternative to other neural data compression approaches.
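The recipe is simple enough to sketch end to end: overfit a coordinate-to-RGB MLP to one image, then store its weights at reduced precision as the code. Note the paper uses sine activations (a SIREN); the ReLU MLP, layer sizes, and training budget below are simplifications.

```python
import torch
import torch.nn as nn

# Sketch of compression by overfitting: the "code" for an image is the
# weight set of a small MLP fitted to map pixel coordinates to RGB.
mlp = nn.Sequential(nn.Linear(2, 64), nn.ReLU(),
                    nn.Linear(64, 64), nn.ReLU(),
                    nn.Linear(64, 3), nn.Sigmoid())

def pixel_coords(h, w):
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                            torch.linspace(-1, 1, w), indexing="ij")
    return torch.stack([xs, ys], dim=-1).view(-1, 2)

def encode(image, steps=2000):            # image: (H, W, 3) in [0, 1]
    h, w, _ = image.shape
    coords, target = pixel_coords(h, w), image.view(-1, 3)
    opt = torch.optim.Adam(mlp.parameters(), lr=1e-3)
    for _ in range(steps):                # overfit the network to this image
        opt.zero_grad()
        loss = nn.functional.mse_loss(mlp(coords), target)
        loss.backward()
        opt.step()
    # store the weights at reduced precision as the image code
    return {k: v.half() for k, v in mlp.state_dict().items()}

def decode(code, h, w):                   # evaluate the MLP at every pixel
    mlp.load_state_dict({k: v.float() for k, v in code.items()})
    return mlp(pixel_coords(h, w)).view(h, w, 3)
```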

Journal ArticleDOI
TL;DR: Experimental results reveal that the proposed algorithm attains high robustness and improved security to the watermarked image against various kinds of attacks.
Abstract: Nowadays, secure medical image watermarking has become a stringent task in telemedicine. This paper presents a novel medical image watermarking method using fuzzy-based Region of Interest (ROI) selection and a wavelet transformation approach to embed an encrypted watermark. First, the source image undergoes fuzzification to determine the critical points through central and final intensity along the radial line for selecting the ROI. Second, the watermark image is transformed to the time-frequency domain through wavelet decomposition, where the sub-bands are swapped based on the magnitude value obtained through logistic mapping. In each sub-band, all the pixels are swapped, resulting in a fully encrypted image, which guarantees the watermark a secure, reliable and unbreakable form. To provide more robustness to the watermark image, singular values are obtained for the encrypted watermark image, and a key component is calculated to avoid false-positive errors. The singular values of the source and watermark images are modified through the key component. Experimental results reveal that the proposed algorithm attains high robustness and improved security for the watermarked image against various kinds of attacks.
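One step of such schemes, embedding via the singular values of a wavelet sub-band, can be sketched with pywt and NumPy. The fuzzy ROI selection, sub-band swapping, and key-component handling described above are omitted; the additive embedding rule below is a generic DWT-SVD assumption, not the paper's exact method.

```python
import numpy as np
import pywt

def embed_watermark(host, watermark, alpha=0.05):
    """Generic DWT-SVD embedding sketch: modify the singular values of the
    host's LL sub-band with those of the (already encrypted) watermark.
    The watermark is assumed resized to the LL sub-band's shape."""
    LL, (LH, HL, HH) = pywt.dwt2(host.astype(float), "haar")
    U, S, Vt = np.linalg.svd(LL, full_matrices=False)
    Sw = np.linalg.svd(watermark.astype(float), full_matrices=False,
                       compute_uv=False)
    S_mod = S + alpha * Sw[:len(S)]       # a key component would be derived here
    LL_mod = U @ np.diag(S_mod) @ Vt
    return pywt.idwt2((LL_mod, (LH, HL, HH)), "haar")
```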


Journal ArticleDOI
TL;DR: This paper proposes a CNN architecture and its efficient implementation, called the deformable kernel network (DKN), that outputs sets of neighbors and the corresponding weights adaptively for each pixel, and shows that the weighted averaging process with sparsely sampled 3 × 3 kernels outperforms the state of the art by a significant margin in all cases.
Abstract: Joint image filters are used to transfer structural details from a guidance picture used as a prior to a target image, in tasks such as enhancing spatial resolution and suppressing noise. Previous methods based on convolutional neural networks (CNNs) combine nonlinear activations of spatially-invariant kernels to estimate structural details and regress the filtering result. In this paper, we instead learn explicitly sparse and spatially-variant kernels. We propose a CNN architecture and its efficient implementation, called the deformable kernel network (DKN), that outputs sets of neighbors and the corresponding weights adaptively for each pixel. The filtering result is then computed as a weighted average. We also propose a fast version of DKN that runs about seventeen times faster for an image of size 640 × 480. We demonstrate the effectiveness and flexibility of our models on the tasks of depth map upsampling, saliency map upsampling, cross-modality image restoration, texture removal, and semantic segmentation. In particular, we show that the weighted averaging process with sparsely sampled 3 × 3 kernels outperforms the state of the art by a significant margin in all cases.
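The spatially-variant weighted averaging at the heart of this idea can be sketched given the network's per-pixel outputs: K sampling offsets and K weights per pixel, combined through bilinear sampling. The CNN that predicts them is omitted, and all shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def deformable_kernel_filter(image, offsets, weights):
    """Sketch of spatially-variant weighted averaging: for each pixel, K
    learned offsets select sparse neighbours (bilinearly sampled) and K
    learned weights combine them. offsets: (B, K, 2, H, W) in normalized
    grid units; weights: (B, K, H, W), assumed to sum to 1 over K."""
    b, c, h, w = image.shape
    k = weights.shape[1]
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                            torch.linspace(-1, 1, w), indexing="ij")
    base = torch.stack([xs, ys], dim=-1).expand(b, h, w, 2)
    out = torch.zeros_like(image)
    for i in range(k):
        grid = base + offsets[:, i].permute(0, 2, 3, 1)   # (B, H, W, 2)
        sampled = F.grid_sample(image, grid, align_corners=True)
        out = out + weights[:, i:i + 1] * sampled
    return out
```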

Journal ArticleDOI
TL;DR: This work proposes a new efficient and high-speed image encryption scheme based on the Bülban chaotic map that is extremely secure and highly fast for real-time image processing at 80 fps (frames per second).
Abstract: In the last decades, a big number of image encryption schemes have been proposed. Most of these schemes reach a high security level; however, their slow speed, due to their complex processes, makes them unusable in real-time applications. Motivated by this, we propose a new efficient and high-speed image encryption scheme based on the Bülban chaotic map. Unlike most existing schemes, we make wise use of this simple chaotic map to generate only a small number of random rows and columns. Moreover, to further increase the speed, we raise the processing unit from the pixel level to the row/column level. Security of the new scheme is achieved through a substitution-permutation network, where we apply a circular shift of rows and columns to break the strong correlation of adjacent pixels. Then, we combine the XOR operation with the modulo function to mask the pixel values and prevent any leak of information. High-security tests and simulation analysis have been carried out to demonstrate that the scheme is extremely secure and highly fast for real-time image processing at 80 fps (frames per second).
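A hedged sketch of the row/column-level design: circular shifts of rows and columns break adjacent-pixel correlation, then a modulo addition and an XOR mask the values. Since the abstract does not give the Bülban map's equation, a logistic map stands in as the keystream generator (clearly an assumption).

```python
import numpy as np

def keystream(seed, length):
    """Stand-in chaotic generator (logistic map); the paper uses the
    Bülban map, whose equation is not given in this abstract."""
    x, out = seed, []
    for _ in range(length):
        x = 3.99 * x * (1 - x)
        out.append(int(x * 256) % 256)
    return np.array(out, dtype=np.uint8)

def encrypt(img, seed=0.3141):
    """Sketch of the row/column-level idea for a uint8 grayscale image:
    circular shifts of rows and columns, then modulo-add one keystream
    and XOR another to mask the pixel values."""
    h, w = img.shape
    rs = keystream(seed, h) % w               # per-row shift amounts
    cs = keystream(seed / 2, w) % h           # per-column shift amounts
    out = img.copy()
    for r in range(h):
        out[r] = np.roll(out[r], int(rs[r]))
    for c in range(w):
        out[:, c] = np.roll(out[:, c], int(cs[c]))
    mask1 = keystream(seed / 3, h * w).reshape(h, w)
    mask2 = keystream(seed / 5, h * w).reshape(h, w)
    out = ((out.astype(np.uint16) + mask1) % 256).astype(np.uint8)
    return out ^ mask2
```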

Journal ArticleDOI
13 Mar 2021-Entropy
TL;DR: This paper researches the chaos sequence and wavelet transform value to find gaps in existing chaos-based image encryption and proposes a novel technique for digital image encryption that improves on previous algorithms; simulation and theoretical analysis indicate the proposed scheme's effectiveness and show that it is a suitable choice for actual image encryption.
Abstract: In recent decades, image encryption, as one of the significant information security fields, has attracted many researchers and scientists. Several studies have been performed with different methods, and novel and useful algorithms have been suggested to improve secure image encryption schemes. Nowadays, chaotic methods are found in diverse fields, such as the design of cryptosystems and image encryption. Chaos-based digital image encryption is a novel image encryption approach: it uses random chaos sequences for encrypting images, and it is a highly secure and fast method. Limited accuracy, however, is one of its disadvantages. This paper researches the chaos sequence and wavelet transform value to find gaps. Thus, a novel technique is proposed for digital image encryption that improves on previous algorithms. The technique is run in MATLAB, and a comparison is made in terms of various performance metrics such as the Number of Pixels Change Rate (NPCR), Peak Signal to Noise Ratio (PSNR), correlation coefficient, and Unified Average Changing Intensity (UACI). The simulation and theoretical analysis indicate the proposed scheme's effectiveness and show that this technique is a suitable choice for actual image encryption.
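Two of the metrics quoted above, NPCR and UACI, have standard definitions that are easy to compute. A small NumPy sketch:

```python
import numpy as np

def npcr_uaci(c1, c2):
    """NPCR and UACI between two cipher images (uint8, equal shape).
    NPCR is the percentage of differing pixels; UACI is the mean absolute
    intensity change relative to the full 255 range."""
    c1 = c1.astype(np.int16)
    c2 = c2.astype(np.int16)
    npcr = 100.0 * np.mean(c1 != c2)
    uaci = 100.0 * np.mean(np.abs(c1 - c2) / 255.0)
    return npcr, uaci

# usage: for a good 8-bit cipher, ideal values are roughly
# NPCR ~ 99.6% and UACI ~ 33.4%
```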

Journal ArticleDOI
TL;DR: A bidirectional long short-term memory (Bi-LSTM)-based network is designed for HSI classification and a spatial–spectral attention mechanism is designed and implemented in the proposed Bi-L STM network to emphasize the effective information and reduce the redundant information among spatial-spectral context of pixels, by which the performance of classification can be greatly improved.
Abstract: Deep neural networks have been widely applied to hyperspectral image (HSI) classification areas, in which recurrent neural network (RNN) is one of the most typical networks. Most of the existing RNN-based classifiers treat the spectral signature of pixels as an ordered sequence, in which only unidirectional correlation along the wavelength direction of adjacent bands is considered. However, each band image is related to not only its preceding band images but also its successive band images. In order to fully explore such bidirectional spectral correlation within an HSI, in this article, a bidirectional long short-term memory (Bi-LSTM)-based network is designed for HSI classification. Moreover, a spatial-spectral attention mechanism is designed and implemented in the proposed Bi-LSTM network to emphasize the effective information and reduce the redundant information among spatial-spectral context of pixels, by which the performance of classification can be greatly improved. Experimental results over three benchmark HSIs, i.e., Salinas Valley, Pavia Centre, and Pavia University, demonstrate that our proposed Bi-LSTM obviously outperforms several state-of-the-art unidirectional RNN-based classification algorithms. Moreover, the proposed spatial-spectral attention mechanism can further improve the classification accuracy of our proposed Bi-LSTM algorithm by effectively weighting spatial and spectral context of pixels. The source code of the proposed Bi-LSTM algorithm is available at https://github.com/MeiShaohui/Attention-based-Bidirectional-LSTM-Network.
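The core idea, scanning a pixel's spectrum in both wavelength directions, can be sketched with a bidirectional LSTM in PyTorch; the spatial-spectral attention mechanism is omitted, and the sizes and class count are illustrative.

```python
import torch
import torch.nn as nn

class SpectralBiLSTM(nn.Module):
    """Sketch: treat a pixel's spectrum as a sequence of bands and scan it
    in both directions with a bidirectional LSTM, then classify from the
    concatenated forward/backward final states."""
    def __init__(self, hidden=64, num_classes=9):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden,
                            bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * hidden, num_classes)

    def forward(self, spectra):            # spectra: (B, num_bands)
        x = spectra.unsqueeze(-1)          # one reflectance value per step
        _, (h, _) = self.lstm(x)           # h: (2, B, hidden), fwd and bwd
        return self.fc(torch.cat([h[0], h[1]], dim=-1))
```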

Journal ArticleDOI
TL;DR: In this article, a bilayer MoS2 phototransistor was used to synthesize an active pixel image sensor array for image sensing applications, which is composed of two-dimensional transition metal dichalcogenides (MoS2).
Abstract: Various large-area growth methods for two-dimensional transition metal dichalcogenides have been developed recently for future electronic and photonic applications. However, they have not yet been employed for synthesizing active pixel image sensors. Here, we report on an active pixel image sensor array with a bilayer MoS2 film prepared via a two-step large-area growth method. The active pixel of image sensor is composed of 2D MoS2 switching transistors and 2D MoS2 phototransistors. The maximum photoresponsivity (Rph) of the bilayer MoS2 phototransistors in an 8 × 8 active pixel image sensor array is statistically measured as high as 119.16 A W−1. With the aid of computational modeling, we find that the main mechanism for the high Rph of the bilayer MoS2 phototransistor is a photo-gating effect by the holes trapped at subgap states. The image-sensing characteristics of the bilayer MoS2 active pixel image sensor array are successfully investigated using light stencil projection. Here, the authors report the realization of an active pixel image sensor array composed by 64 pairs of switching transistors and phototransistors, based on wafer-scale bilayer MoS2. The device exhibits sensitive photoresponse under RGB light illumination, showing the potential of 2D MoS2 for image sensing applications.

Journal ArticleDOI
TL;DR: To solve the problems of mismatching and structure disconnection in exemplar-based image inpainting, an image completion algorithm based on an improved total variation minimization method, referred to as ETVM, is proposed in this paper.
Abstract: To solve the problems of mismatching and structure disconnection in exemplar-based image inpainting, an image completion algorithm based on an improved total variation minimization method, referred to as ETVM, is proposed in this paper. The structure of the image is extracted using the improved total variation minimization method, so the known information of the image is used more sufficiently than by existing methods. A robust filling mechanism is achieved according to the direction of the image structure, with less noise than the original image. The priority term is redefined to eliminate the product effect and ensure that the data term is always effective. The priority of the repairing patch and the best matching patch are determined by the similarity of the known information and the consistency of the unknown information in the repairing patch. Comparisons with cognitive computing image algorithms show that the proposed method ensures better selection of candidate image pixels to fill with, and achieves better global coherence of image completion than other methods. The inpainting results on noisy images show that the proposed method is robust and can also obtain good inpainting results for noisy images.

Journal ArticleDOI
TL;DR: The proposed unsupervised pansharpening method in a deep-learning framework is able to reconstruct sharper MSI of different types, with more details and less spectral distortion compared with the state-of-the-art.
Abstract: Pansharpening is to fuse a multispectral image (MSI) of low-spatial-resolution (LR) but rich spectral characteristics with a panchromatic image (PAN) of high spatial resolution (HR) but poor spectral characteristics. Traditional methods usually inject the extracted high-frequency details from PAN into the upsampled MSI. Recent deep learning endeavors are mostly supervised assuming that the HR MSI is available, which is unrealistic especially for satellite images. Nonetheless, these methods could not fully exploit the rich spectral characteristics in the MSI. Due to the wide existence of mixed pixels in satellite images where each pixel tends to cover more than one constituent material, pansharpening at the subpixel level becomes essential. In this article, we propose an unsupervised pansharpening (UP) method in a deep-learning framework to address the abovementioned challenges based on the self-attention mechanism (SAM), referred to as UP-SAM. The contribution of this article is threefold. First, the SAM is proposed where the spatial varying detail extraction and injection functions are estimated according to the attention representations indicating spectral characteristics of the MSI with subpixel accuracy. Second, such attention representations are derived from mixed pixels with the proposed stacked attention network powered with a stick-breaking structure to meet the physical constraints of mixed pixel formulations. Third, the detail extraction and injection functions are spatial varying based on the attention representations, which largely improves the reconstruction accuracy. Extensive experimental results demonstrate that the proposed approach is able to reconstruct sharper MSI of different types, with more details and less spectral distortion compared with the state-of-the-art.

Journal ArticleDOI
TL;DR: The proposed SSWK-MEDA provides a novel approach for the combination of transfer learning and remote sensing image characteristics and utilizes the geometric structure of features in manifold space to solve the problem of feature distortions of remote sensing data in transfer learning scenarios.
Abstract: Feature distortions of data are a typical problem in remote sensing image classification, especially in the area of transfer learning. In addition, many transfer learning-based methods only focus on spectral information and fail to utilize spatial information of remote sensing images. To tackle these problems, we propose spectral–spatial weighted kernel manifold embedded distribution alignment (SSWK-MEDA) for remote sensing image classification. The proposed method applies a novel spatial information filter to effectively use similarity between nearby sample pixels and avoid the influence of nonsample pixels. Then, a complex kernel combining spatial kernel and spectral kernel with different weights is constructed to adaptively balance the relative importance of spectral and spatial information of the remote sensing image. Finally, we utilize the geometric structure of features in manifold space to solve the problem of feature distortions of remote sensing data in transfer learning scenarios. SSWK-MEDA provides a novel approach for the combination of transfer learning and remote sensing image characteristics. Extensive experiments have demonstrated that the proposed method is more effective than several state-of-the-art methods.