
Showing papers on "Pixel published in 2021"


Journal ArticleDOI
TL;DR: Chen et al. as discussed by the authors proposed a bitemporal image transformer (BIT) to efficiently and effectively model contexts within the spatial-temporal domain, and incorporated BIT in a deep feature differencing-based CD framework.
Abstract: Modern change detection (CD) has achieved remarkable success by the powerful discriminative ability of deep convolutions. However, high-resolution remote sensing CD remains challenging due to the complexity of objects in the scene. Objects with the same semantic concept may show distinct spectral characteristics at different times and spatial locations. Most recent CD pipelines using pure convolutions are still struggling to relate long-range concepts in space-time. Nonlocal self-attention approaches show promising performance via modeling dense relationships among pixels, yet are computationally inefficient. Here, we propose a bitemporal image transformer (BIT) to efficiently and effectively model contexts within the spatial-temporal domain. Our intuition is that the high-level concepts of the change of interest can be represented by a few visual words, that is, semantic tokens. To achieve this, we express the bitemporal image into a few tokens and use a transformer encoder to model contexts in the compact token-based space-time. The learned context-rich tokens are then fed back to the pixel-space for refining the original features via a transformer decoder. We incorporate BIT in a deep feature differencing-based CD framework. Extensive experiments on three CD datasets demonstrate the effectiveness and efficiency of the proposed method. Notably, our BIT-based model significantly outperforms the purely convolutional baseline using only three times lower computational costs and model parameters. Based on a naive backbone (ResNet18) without sophisticated structures (e.g., feature pyramid network (FPN) and UNet), our model surpasses several state-of-the-art CD methods, including better than four recent attention-based methods in terms of efficiency and accuracy. Our code is available at https://github.com/justchenhao/BIT_CD.
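As a rough illustration of the token-based context modeling described above, here is a minimal PyTorch sketch: features are pooled into a few semantic tokens by spatial attention, a transformer encoder relates the bitemporal tokens, and a cross-attention decoder projects the context back to pixels. All module names and sizes are illustrative assumptions, not the authors' exact BIT architecture.

```python
import torch
import torch.nn as nn

class TokenContext(nn.Module):
    """Sketch of BIT-style context modeling: pool pixels into a few
    semantic tokens, run a transformer encoder over the bitemporal
    tokens, then let pixels attend back to the tokens (decoder)."""
    def __init__(self, channels=32, num_tokens=4, heads=8):
        super().__init__()
        self.token_logits = nn.Conv2d(channels, num_tokens, kernel_size=1)
        enc_layer = nn.TransformerEncoderLayer(d_model=channels, nhead=heads,
                                               batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=1)
        self.decoder = nn.MultiheadAttention(channels, heads, batch_first=True)

    def tokenize(self, feat):                    # feat: (B, C, H, W)
        attn = self.token_logits(feat).flatten(2).softmax(-1)  # (B, L, HW)
        return attn @ feat.flatten(2).transpose(1, 2)          # (B, L, C)

    def forward(self, feat1, feat2):
        tokens = torch.cat([self.tokenize(feat1), self.tokenize(feat2)], dim=1)
        tokens = self.encoder(tokens)            # context in compact token space
        out = []
        for feat in (feat1, feat2):
            b, c, h, w = feat.shape
            q = feat.flatten(2).transpose(1, 2)  # pixels act as queries
            refined, _ = self.decoder(q, tokens, tokens)
            out.append(refined.transpose(1, 2).view(b, c, h, w))
        return out
```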

290 citations


Journal ArticleDOI
TL;DR: Experimental results show that the proposed SNUNet-CD method improves greatly on many evaluation criteria and has a better tradeoff between accuracy and calculation amount than other state-of-the-art (SOTA) change detection methods.
Abstract: Change detection is an important task in remote sensing (RS) image analysis. It is widely used in natural disaster monitoring and assessment, land resource planning, and other fields. As a pixel-to-pixel prediction task, change detection is sensitive to the utilization of the original position information. Recent change detection methods always focus on the extraction of deep change semantic features but ignore the importance of shallow-layer information containing high-resolution and fine-grained features; this often leads to uncertainty in the pixels at the edge of the changed target and missed detection of small targets. In this letter, we propose a densely connected siamese network for change detection, namely SNUNet-CD (the combination of Siamese network and NestedUNet). SNUNet-CD alleviates the loss of localization information in the deep layers of the neural network through compact information transmission between encoder and decoder, and between decoder and decoder. In addition, an Ensemble Channel Attention Module (ECAM) is proposed for deep supervision. Through ECAM, the most representative features of different semantic levels can be refined and used for the final classification. Experimental results show that our method improves greatly on many evaluation criteria and has a better tradeoff between accuracy and calculation amount than other state-of-the-art (SOTA) change detection methods.
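One plausible reading of ECAM, re-weighting the concatenated outputs of several decoder branches channel-wise before the final classifier, can be caricatured with a squeeze-and-excitation-style block. This is a hedged sketch under that reading, not the authors' exact module; all sizes are assumptions.

```python
import torch
import torch.nn as nn

class ECAMSketch(nn.Module):
    """Hedged sketch of an ensemble channel attention: outputs of several
    decoder branches are concatenated and re-weighted channel-wise."""
    def __init__(self, channels=32, branches=4, reduction=4):
        super().__init__()
        total = channels * branches
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.mlp = nn.Sequential(
            nn.Conv2d(total, total // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(total // reduction, total, 1), nn.Sigmoid())

    def forward(self, branch_outputs):           # list of (B, C, H, W) maps
        x = torch.cat(branch_outputs, dim=1)     # stack semantic levels
        return x * self.mlp(self.pool(x))        # channel re-weighting
```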

256 citations


Journal ArticleDOI
TL;DR: The reconstruction-by-inpainting anomaly detection approach (RIAD) randomly removes partial image regions and reconstructs the image from the partial inpaintings, thus addressing the drawbacks of auto-encoding methods.

191 citations


Journal ArticleDOI
TL;DR: A deep translation based change detection network (DTCDN) for optical and SAR images is proposed that utilizes deep context features to separate the unchanged pixels and changed pixels in a supervised CD network.
Abstract: With the development of space-based imaging technology, a larger and larger number of images with different modalities and resolutions are available. The optical images reflect the abundant spectral information and geometric shape of ground objects, whose qualities are degraded easily in poor atmospheric conditions. Although synthetic aperture radar (SAR) images cannot provide the spectral features of the region of interest (ROI), they can capture all-weather and all-time polarization information. In nature, optical and SAR images encapsulate lots of complementary information, which is of great significance for change detection (CD) in poor weather situations. However, due to the difference in imaging mechanisms of optical and SAR images, it is difficult to conduct their CD directly using the traditional difference or ratio algorithms. Most recent CD methods bring image translation to reduce their difference, but the results are obtained by ordinary algebraic methods and threshold segmentation with limited accuracy. Towards this end, this work proposes a deep translation based change detection network (DTCDN) for optical and SAR images. The deep translation firstly maps images from one domain (e.g., optical) to another domain (e.g., SAR) through a cyclic structure into the same feature space. With the similar characteristics after deep translation, they become comparable. Different from most previous researches, the translation results are imported to a supervised CD network that utilizes deep context features to separate the unchanged pixels and changed pixels. In the experiments, the proposed DTCDN was tested on four representative data sets from Gloucester, California, and Shuguang village. Compared with state-of-the-art methods, the effectiveness and robustness of the proposed method were confirmed.

166 citations


Proceedings ArticleDOI
01 Jun 2021
TL;DR: PixLoc as discussed by the authors is a scene-agnostic neural network that estimates an accurate 6-DoF pose by aligning multiscale deep features with a 3D model; it can localize in large environments given coarse pose priors.
Abstract: Camera pose estimation in known scenes is a 3D geometry task recently tackled by multiple learning algorithms. Many regress precise geometric quantities, like poses or 3D points, from an input image. This either fails to generalize to new viewpoints or ties the model parameters to a specific scene. In this paper, we go Back to the Feature: we argue that deep networks should focus on learning robust and invariant visual features, while the geometric estimation should be left to principled algorithms. We introduce PixLoc, a scene-agnostic neural network that estimates an accurate 6-DoF pose from an image and a 3D model. Our approach is based on the direct alignment of multiscale deep features, casting camera localization as metric learning. PixLoc learns strong data priors by end-to-end training from pixels to pose and exhibits exceptional generalization to new scenes by separating model parameters and scene geometry. The system can localize in large environments given coarse pose priors but also improve the accuracy of sparse feature matching by jointly refining keypoints and poses with little overhead. The code will be publicly available at github.com/cvg/pixloc.

157 citations


Journal ArticleDOI
TL;DR: This paper proposes an efficient interlaced sparse self-attention scheme to model the dense relations between any two of all pixels via the combination of two sparse relation matrices and empirically shows the advantages of this approach with competitive performances on five challenging benchmarks.
Abstract: In this paper, we address the semantic segmentation task with a new context aggregation scheme named object context, which focuses on enhancing the role of object information. Motivated by the fact that the category of each pixel is inherited from the object it belongs to, we define the object context for each pixel as the set of pixels that belong to the same category as the given pixel in the image. We use a binary relation matrix to represent the relationship between all pixels, where the value one indicates that the two selected pixels belong to the same category and zero otherwise. We propose to use a dense relation matrix to serve as a surrogate for the binary relation matrix. The dense relation matrix is capable of emphasizing the contribution of object information, as the relation scores tend to be larger on the object pixels than on the other pixels. Considering that dense relation matrix estimation requires quadratic computation overhead and memory consumption w.r.t. the input size, we propose an efficient interlaced sparse self-attention scheme to model the dense relations between any two of all pixels via the combination of two sparse relation matrices. To capture richer context information, we further combine our interlaced sparse self-attention scheme with the conventional multi-scale context schemes, including pyramid pooling (Zhao et al. 2017) and atrous spatial pyramid pooling (Chen et al. 2018). We empirically show the advantages of our approach with competitive performances on five challenging benchmarks: Cityscapes, ADE20K, LIP, PASCAL-Context and COCO-Stuff.
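The interlaced scheme factorizes the dense N × N affinity into a long-range step over interleaved groups followed by a short-range step within contiguous blocks. Below is a simplified 1-D PyTorch sketch of that factorization over flattened pixels; block size and channel counts are illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn

class InterlacedSelfAttention(nn.Module):
    """Sketch: dense N x N relations approximated by long-range attention
    across interleaved groups, then short-range attention within blocks."""
    def __init__(self, channels=64, heads=4, block=64):
        super().__init__()
        self.block = block
        self.long = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.short = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x):              # x: (B, N, C), N divisible by block
        b, n, c = x.shape
        q = self.block
        p = n // q
        # long-range: group pixels sharing the same position inside a block
        xl = x.view(b, p, q, c).transpose(1, 2).reshape(b * q, p, c)
        xl, _ = self.long(xl, xl, xl)
        x = xl.view(b, q, p, c).transpose(1, 2).reshape(b, n, c)
        # short-range: attend within each contiguous block
        xs = x.view(b * p, q, c)
        xs, _ = self.short(xs, xs, xs)
        return xs.view(b, n, c)
```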

138 citations


Proceedings ArticleDOI
05 May 2021
TL;DR: PD-GAN as mentioned in this paper modulates deep features of input random noise from coarse-to-fine by injecting an initially restored image and the hole regions in multiple scales to generate multiple inpainting results with diverse and visually realistic content.
Abstract: We propose PD-GAN, a probabilistic diverse GAN for image inpainting. Given an input image with arbitrary hole regions, PD-GAN produces multiple inpainting results with diverse and visually realistic content. Our PD-GAN is built upon a vanilla GAN which generates images based on random noise. During image generation, we modulate deep features of input random noise from coarse-to-fine by injecting an initially restored image and the hole regions in multiple scales. We argue that during hole filling, the pixels near the hole boundary should be more deterministic (i.e., with higher probability trusting the context and initially restored image to create a natural inpainting boundary), while those lying in the center of the hole should enjoy more degrees of freedom (i.e., more likely to depend on the random noise for enhancing diversity). To this end, we propose spatially probabilistic diversity normalization (SPDNorm) inside the modulation to model the probability of generating a pixel conditioned on the context information. SPDNorm dynamically balances the realism and diversity inside the hole region, making the generated content more diverse towards the hole center and more similar to neighboring image content towards the hole boundary. Meanwhile, we propose a perceptual diversity loss to further empower PD-GAN for diverse content generation. Experiments on benchmark datasets including CelebA-HQ, Places2 and Paris Street View indicate that PD-GAN is effective for diverse and visually realistic image restoration.

138 citations


Book
24 Aug 2021
TL;DR: In this article, the authors overview graph spectral techniques in graph signal processing (GSP) specifically for image/video processing, including image compression, image restoration, image filtering, and image segmentation.
Abstract: Recent advent of graph signal processing (GSP) has spurred intensive studies of signals that live naturally on irregular data kernels described by graphs (e.g., social networks, wireless sensor networks). Though a digital image contains pixels that reside on a regularly sampled 2-D grid, if one can design an appropriate underlying graph connecting pixels with weights that reflect the image structure, then one can interpret the image (or image patch) as a signal on a graph, and apply GSP tools for processing and analysis of the signal in graph spectral domain. In this paper, we overview recent graph spectral techniques in GSP specifically for image/video processing. The topics covered include image compression, image restoration, image filtering, and image segmentation.
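To make the "image as a graph signal" idea concrete, here is a small NumPy sketch: connect 4-neighbour pixels with weights reflecting intensity similarity, build the combinatorial Laplacian L = D − W, and use its eigenvectors as a graph Fourier basis for the patch. The Gaussian weight and sigma are illustrative choices.

```python
import numpy as np

def image_graph_laplacian(patch, sigma=10.0):
    """Build a 4-connected pixel graph whose edge weights reflect
    intensity similarity; return the combinatorial Laplacian L = D - W."""
    h, w = patch.shape
    n = h * w
    W = np.zeros((n, n))
    idx = lambda r, c: r * w + c
    for r in range(h):
        for c in range(w):
            for dr, dc in ((0, 1), (1, 0)):      # right and down neighbours
                rr, cc = r + dr, c + dc
                if rr < h and cc < w:
                    wgt = np.exp(-((patch[r, c] - patch[rr, cc]) ** 2)
                                 / (2 * sigma ** 2))
                    W[idx(r, c), idx(rr, cc)] = wgt
                    W[idx(rr, cc), idx(r, c)] = wgt
    return np.diag(W.sum(axis=1)) - W

# usage: eigendecompose L and project the patch onto the basis (a "GFT")
patch = np.random.rand(8, 8) * 255
evals, evecs = np.linalg.eigh(image_graph_laplacian(patch))
gft_coeffs = evecs.T @ patch.flatten()
```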

126 citations


Journal ArticleDOI
Hao Zhang, Zhuliang Le, Zhenfeng Shao, Han Xu, Jiayi Ma
TL;DR: A new generative adversarial network with adaptive and gradient joint constraints to fuse multi-focus images is presented, and its superiority over the state-of-the-art is demonstrated in terms of both subjective visual effect and quantitative metrics.

125 citations


Proceedings ArticleDOI
01 Jun 2021
TL;DR: Chen et al. as discussed by the authors proposed the Local Implicit Image Function (LIIF), which takes an image coordinate and the 2D deep features around that coordinate as inputs and predicts the RGB value at the coordinate as output.
Abstract: How to represent an image? While the visual world is presented in a continuous manner, machines store and see the images in a discrete way with 2D arrays of pixels. In this paper, we seek to learn a continuous representation for images. Inspired by the recent progress in 3D reconstruction with implicit neural representation, we propose the Local Implicit Image Function (LIIF), which takes an image coordinate and the 2D deep features around the coordinate as inputs and predicts the RGB value at the given coordinate as output. Since the coordinates are continuous, LIIF can be presented in arbitrary resolution. To generate the continuous representation for images, we train an encoder with LIIF representation via a self-supervised task with super-resolution. The learned continuous representation can be presented in arbitrary resolution, even extrapolating to ×30 higher resolution where the training tasks are not provided. We further show that LIIF representation builds a bridge between discrete and continuous representation in 2D; it naturally supports learning tasks with size-varied image ground-truths and significantly outperforms the method of resizing the ground-truths. Our project page with code is at https://yinboc.github.io/liif/.
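The core LIIF query can be sketched in a few lines: fetch the latent code nearest to a continuous coordinate, concatenate the relative offset, and decode RGB with an MLP. The sketch below uses nearest-code lookup only (the paper's local ensemble and cell decoding are omitted), with illustrative layer sizes.

```python
import torch
import torch.nn as nn

class LIIFQuery(nn.Module):
    """Sketch of a local implicit image function: an MLP maps (nearest
    latent code, relative coordinate) to an RGB value, so the image can
    be sampled at any continuous coordinate."""
    def __init__(self, feat_dim=64, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3))

    def forward(self, feat, coord):
        # feat: (B, C, H, W) latent codes; coord: (B, N, 2) in [-1, 1]
        b, c, h, w = feat.shape
        # nearest latent code for each query coordinate
        ix = ((coord[..., 0] + 1) / 2 * (w - 1)).round().long().clamp(0, w - 1)
        iy = ((coord[..., 1] + 1) / 2 * (h - 1)).round().long().clamp(0, h - 1)
        flat = feat.flatten(2).transpose(1, 2)          # (B, HW, C)
        codes = torch.gather(flat, 1,
                             (iy * w + ix).unsqueeze(-1).expand(-1, -1, c))
        # offset of the query from the chosen code's pixel centre
        cx = ix.float() / (w - 1) * 2 - 1
        cy = iy.float() / (h - 1) * 2 - 1
        rel = torch.stack([coord[..., 0] - cx, coord[..., 1] - cy], dim=-1)
        return self.mlp(torch.cat([codes, rel], dim=-1))  # (B, N, 3)
```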

124 citations


Journal ArticleDOI
10 Feb 2021-Nature
TL;DR: This result paves the way for the development and proliferation of low-cost, compact and high-performance 3D imaging cameras that could be used in applications from robotics and autonomous navigation to augmented reality and healthcare.
Abstract: Accurate three-dimensional (3D) imaging is essential for machines to map and interact with the physical world1,2. Although numerous 3D imaging technologies exist, each addressing niche applications with varying degrees of success, none has achieved the breadth of applicability and impact that digital image sensors have in the two-dimensional imaging world3–10. A large-scale two-dimensional array of coherent detector pixels operating as a light detection and ranging system could serve as a universal 3D imaging platform. Such a system would offer high depth accuracy and immunity to interference from sunlight, as well as the ability to measure the velocity of moving objects directly11. Owing to difficulties in providing electrical and photonic connections to every pixel, previous systems have been restricted to fewer than 20 pixels12–15. Here we demonstrate the operation of a large-scale coherent detector array, consisting of 512 pixels, in a 3D imaging system. Leveraging recent advances in the monolithic integration of photonic and electronic circuits, a dense array of optical heterodyne detectors is combined with an integrated electronic readout architecture, enabling straightforward scaling to arbitrarily large arrays. Two-axis solid-state beam steering eliminates any trade-off between field of view and range. Operating at the quantum noise limit16,17, our system achieves an accuracy of 3.1 millimetres at a distance of 75 metres when using only 4 milliwatts of light, an order of magnitude more accurate than existing solid-state systems at such ranges. Future reductions of pixel size using state-of-the-art components could yield resolutions in excess of 20 megapixels for arrays the size of a consumer camera sensor. This result paves the way for the development and proliferation of low-cost, compact and high-performance 3D imaging cameras that could be used in applications from robotics and autonomous navigation to augmented reality and healthcare. A compact, high-performance silicon photonics-based light detection and ranging system for three-dimensional imaging is developed that should be amenable to low-cost mass manufacturing.

Journal ArticleDOI
TL;DR: This paper proposes a new color image encryption algorithm (CIEA) that sufficiently considers the properties of the color image and Latin square and designs a two-dimensional chaotic system called 2D-LSM to address the weaknesses of existing chaotic systems.
Abstract: Recently, many image encryption schemes have been developed using Latin squares. When encrypting a color image, these algorithms treat the color image as three greyscale images and encrypt these greyscale images one by one using the Latin squares. Obviously, these algorithms do not sufficiently consider the inner connections between the color image and the Latin square and thus result in many redundant operations and low efficiency. To address this issue, in this paper, we propose a new color image encryption algorithm (CIEA) that sufficiently considers the properties of the color image and the Latin square. First, we propose a two-dimensional chaotic system called 2D-LSM to address the weaknesses of existing chaotic systems. Then, we design a new CIEA using orthogonal Latin squares and 2D-LSM. The proposed CIEA can make full use of the inherent connections of the orthogonal Latin squares and the color image and executes the encryption process at the pixel level. Simulation and security analysis results show that the proposed CIEA has a high level of security and can outperform some representative image encryption algorithms.
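For intuition on why orthogonal Latin squares suit pixel-level scrambling: if L1 and L2 are orthogonal, the pair (L1(i, j), L2(i, j)) is distinct for every (i, j), so it defines a valid permutation of pixel positions. A small NumPy illustration using the classical cyclic construction for odd order (the paper instead derives its squares from the 2D-LSM chaotic system):

```python
import numpy as np

def orthogonal_latin_squares(n):
    """For odd n, L1(i,j) = (i + j) mod n and L2(i,j) = (i + 2j) mod n
    form a pair of orthogonal Latin squares: every (L1, L2) pair occurs
    exactly once. Illustrative construction only."""
    i, j = np.indices((n, n))
    return (i + j) % n, (i + 2 * j) % n

L1, L2 = orthogonal_latin_squares(5)
# pixel-level scrambling sketch: send the pixel at (i, j) to (L1[i,j], L2[i,j]);
# orthogonality guarantees the target positions are all distinct.
img = np.arange(25).reshape(5, 5)
scrambled = np.empty_like(img)
scrambled[L1, L2] = img
```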

Journal ArticleDOI
Hao Zhang, Jiayi Ma
TL;DR: A squeeze-and-decomposition network (SDNet) is proposed to realize multi-modal and digital photography image fusion in real time; it is much faster than the state of the art and can deal with real-time fusion tasks.
Abstract: In this paper, a squeeze-and-decomposition network (SDNet) is proposed to realize multi-modal and digital photography image fusion in real time. Firstly, we generally transform multiple fusion problems into the extraction and reconstruction of gradient and intensity information, and design a universal form of loss function accordingly, which is composed of an intensity term and a gradient term. For the gradient term, we introduce an adaptive decision block to decide the optimization target of the gradient distribution according to the texture richness at the pixel scale, so as to guide the fused image to contain richer texture details. For the intensity term, we adjust the weight of each intensity loss term to change the proportion of intensity information from different images, so that it can be adapted to multiple image fusion tasks. Secondly, we introduce the idea of squeeze and decomposition into image fusion. Specifically, we consider not only the squeeze process from source images to the fused result, but also the decomposition process from the fused result to source images. Because the quality of decomposed images directly depends on the fused result, this forces the fused result to contain more scene details. Experimental results demonstrate the superiority of our method over the state of the art in terms of subjective visual effect and quantitative metrics in a variety of fusion tasks. Moreover, our method is much faster than the state of the art, and can deal with real-time fusion tasks.
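The universal intensity-plus-gradient loss can be sketched directly. In the hedged version below, the intensity target is a fixed weighted mix of the sources and the per-pixel gradient target is whichever source has the larger gradient magnitude; both choices are simplifying assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

# Sobel kernels for gradient extraction (single-channel images)
KX = torch.tensor([[[[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]]])
KY = KX.transpose(2, 3)

def grad(img):                       # img: (B, 1, H, W)
    gx = F.conv2d(img, KX, padding=1)
    gy = F.conv2d(img, KY, padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)

def fusion_loss(fused, src1, src2, w1=0.5, w2=0.5):
    """Sketch of an intensity + gradient fusion loss: intensity target is a
    weighted mix of the sources; the gradient target is, per pixel, the
    source with the richer texture (larger gradient magnitude)."""
    intensity = F.mse_loss(fused, w1 * src1 + w2 * src2)
    g1, g2 = grad(src1), grad(src2)
    target_grad = torch.where(g1 > g2, g1, g2)   # adaptive per-pixel decision
    gradient = F.mse_loss(grad(fused), target_grad)
    return intensity + gradient
```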

Journal ArticleDOI
TL;DR: Experimental results show that the proposed SSC yields better mapping results than state-of-the-art methods, and the utilized spectral properties are extracted directly by spectral imagery, thus avoiding the spectral unmixing errors.
Abstract: Due to the influences of imaging conditions, spectral imagery can be coarse and contain a large number of mixed pixels. These mixed pixels can lead to inaccuracies in the land-cover class (LC) mapping. Super-resolution mapping (SRM) can be used to analyze such mixed pixels and obtain the LC mapping information at the subpixel level. However, traditional SRM methods mostly rely on spatial correlation based on linear distance, which ignores the influences of nonlinear imaging conditions. In addition, spectral unmixing errors affect the accuracy of utilized spectral properties. In order to overcome the influence of linear and nonlinear imaging conditions and utilize more accurate spectral properties, the SRM based on spatial–spectral correlation (SSC) is proposed in this work. Spatial correlation is obtained using the mixed spatial attraction model (MSAM) based on the linear Euclidean distance. Besides, a spectral correlation that utilizes spectral properties based on the nonlinear Kullback–Leibler distance (KLD) is proposed. Spatial and spectral correlations are combined to reduce the influences of linear and nonlinear imaging conditions, which results in an improved mapping result. The utilized spectral properties are extracted directly by spectral imagery, thus avoiding the spectral unmixing errors. Experimental results on the three spectral images show that the proposed SSC yields better mapping results than state-of-the-art methods.
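The nonlinear spectral term rests on the Kullback–Leibler distance between pixel spectra. A minimal NumPy sketch, assuming each spectrum is normalized so it can be treated as a probability distribution (a common simplification):

```python
import numpy as np

def spectral_kld(p, q, eps=1e-12):
    """Nonlinear spectral distance between two pixel spectra via the
    Kullback-Leibler divergence, after normalising each spectrum."""
    p = p / (p.sum() + eps)
    q = q / (q.sum() + eps)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

# usage: compare a coarse mixed pixel's spectrum with a candidate class spectrum
mixed = np.array([0.30, 0.25, 0.20, 0.25])
klass = np.array([0.40, 0.30, 0.15, 0.15])
print(spectral_kld(mixed, klass))
```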

Journal ArticleDOI
26 Jul 2021
TL;DR: A novel method is presented for automatic extrinsic calibration of high-resolution LiDARs and RGB cameras in targetless environments, achieving pixel-level accuracy by aligning natural edge features in the two sensors.
Abstract: In this letter, we present a novel method for automatic extrinsic calibration of high-resolution LiDARs and RGB cameras in targetless environments. Our approach does not require checkerboards but can achieve pixel-level accuracy by aligning natural edge features in the two sensors. On the theory level, we analyze the constraints imposed by edge features and the sensitivity of calibration accuracy with respect to edge distribution in the scene. On the implementation level, we carefully investigate the physical measuring principles of LiDARs and propose an efficient and accurate LiDAR edge extraction method based on point cloud voxel cutting and plane fitting. Due to the richness of edges in natural scenes, we have carried out experiments in many indoor and outdoor scenes. The results show that this method has high robustness, accuracy, and consistency. It can promote the research and application of LiDAR-camera fusion. We have open-sourced our code on GitHub to benefit the community.

Journal ArticleDOI
TL;DR: An autonomous hyperspectral anomaly detection network (Auto-AD) is proposed, in which the background is reconstructed by the network and the anomalies appear as reconstruction errors, which confirms the effectiveness of the proposed Auto-AD method.
Abstract: Hyperspectral anomaly detection is aimed at detecting observations that differ from their surroundings, and is an active area of research in hyperspectral image processing. Recently, autoencoders (AEs) have been applied in hyperspectral anomaly detection; however, the existing AE-based methods are complicated and involve manual parameter setting and preprocessing and/or postprocessing procedures. In this article, an autonomous hyperspectral anomaly detection network (Auto-AD) is proposed, in which the background is reconstructed by the network and the anomalies appear as reconstruction errors. Specifically, through a fully convolutional AE with skip connections, the background can be reconstructed while the anomalies are difficult to reconstruct, since the anomalies are relatively small compared to the background and have a low probability of occurring in the image. To further suppress the anomaly reconstruction, an adaptive-weighted loss function is designed, where the weights of potential anomalous pixels with large reconstruction errors are reduced during training. As a result, the anomalies have a higher contrast with the background in the map of reconstruction errors. The experimental results obtained on a public airborne data set and two unmanned aerial vehicle-borne hyperspectral data sets confirm the effectiveness of the proposed Auto-AD method.
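The adaptive-weighted loss can be sketched as follows: per-pixel reconstruction errors are computed, and pixels with large errors (potential anomalies) are down-weighted so training keeps fitting the background rather than the anomalies. The specific weighting function below is an assumption, not the authors' formula.

```python
import torch

def adaptive_weighted_loss(recon, target):
    """Sketch of an adaptive-weighted reconstruction loss: pixels with large
    reconstruction errors get smaller weights, so the autoencoder keeps
    fitting the background but not the anomalies."""
    err = (recon - target).pow(2).mean(dim=1, keepdim=True)  # per-pixel error
    w = 1.0 / (1.0 + err.detach())        # down-weight high-error pixels
    return (w * err).mean()
```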

Journal ArticleDOI
Chunfang Deng, Mengmeng Wang, Liang Liu, Yong Liu, Yunliang Jiang
TL;DR: Deng et al. as discussed by the authors proposed an extended feature pyramid network (EFPN) with an extra high-resolution pyramid level specialized for small object detection, which is used to super-resolve features and extract credible regional details simultaneously.
Abstract: Small object detection remains an unsolved challenge because it is hard to extract information of small objects with only a few pixels. While scale-level corresponding detection in feature pyramid network alleviates this problem, we find feature coupling of various scales still impairs the performance of small objects. In this paper, we propose an extended feature pyramid network (EFPN) with an extra high-resolution pyramid level specialized for small object detection. Specifically, we design a novel module, named feature texture transfer (FTT), which is used to super-resolve features and extract credible regional details simultaneously. Moreover, we introduce a cross resolution distillation mechanism to transfer the ability of perceiving details across the scales of the network, where a foreground-background-balanced loss function is designed to alleviate area imbalance of foreground and background. In our experiments, the proposed EFPN is efficient on both computation and memory, and yields state-of-the-art results on small traffic-sign dataset Tsinghua-Tencent 100K and small category of general object detection dataset MS COCO.

Journal ArticleDOI
TL;DR: This method is the first to use the HSV color space for deep-learning-based underwater image enhancement, efficiently and effectively integrating both the RGB and HSV color spaces in one single CNN.
Abstract: Underwater image enhancement has attracted much attention due to the rise of marine resource development in recent years. Benefiting from the powerful representation capabilities of Convolutional Neural Networks (CNNs), multiple underwater image enhancement algorithms based on CNNs have been proposed in the past few years. However, almost all of these algorithms employ an RGB color space setting, which is insensitive to image properties such as luminance and saturation. To address this problem, we propose an Underwater Image Enhancement Convolutional Neural Network using two Color Spaces (UIEC^2-Net) that efficiently and effectively integrates both the RGB and HSV color spaces in one single CNN. To the best of our knowledge, this method is the first to use the HSV color space for underwater image enhancement based on deep learning. UIEC^2-Net is an end-to-end trainable network consisting of three blocks, as follows: an RGB pixel-level block that implements fundamental operations such as denoising and removing color cast, an HSV global-adjust block for globally adjusting underwater image luminance, color and saturation by adopting a novel neural curve layer, and an attention map block for combining the advantages of the RGB and HSV block output images by distributing a weight to each pixel. Experimental results on synthetic and real-world underwater images show that the proposed method has good performance in both subjective comparisons and objective metrics. The code is available at https://github.com/BIGWangYuDong/UWEnhancement .

Journal ArticleDOI
TL;DR: Experimental results demonstrate that the proposed fusion solution, i.e., SEDRFuse, outperforms the state-of-the-art fusion methods in terms of both subjective and objective evaluations.
Abstract: Image fusion is an important task for computer vision as a diverse range of applications are benefiting from the fusion operation. The existing image fusion methods are largely implemented at the pixel level, which may introduce artifacts and/or inconsistencies, while the computational complexity is relatively high. In this article, we propose a symmetric encoder–decoder with residual block (SEDRFuse) network to fuse infrared and visible images for night vision applications. At the training stage, the SEDRFuse network is trained to create a fixed feature extractor. At the fusing stage, the trained extractor is utilized to extract the intermediate and compensation features, which are generated by the residual block and the first two convolutional layers from the input source images, respectively. Two attention maps, which are derived from the intermediate features, are then multiplied by the intermediate features for fusion. The salient compensation features obtained through elementwise selection are passed to the corresponding deconvolutional layers for processing. Finally, the fused intermediate features and the selected compensation features are decoded to reconstruct the fused image. Experimental results demonstrate that the proposed fusion solution, i.e., SEDRFuse, outperforms the state-of-the-art fusion methods in terms of both subjective and objective evaluations.

Proceedings Article
03 Mar 2021
TL;DR: A new simple approach for image compression: instead of storing the RGB values for each pixel of an image, the weights of a neural network overfitted to the image are stored, and this approach outperforms JPEG at low bit-rates, even without entropy coding or learning a distribution over weights.
Abstract: We propose a new simple approach for image compression: instead of storing the RGB values for each pixel of an image, we store the weights of a neural network overfitted to the image. Specifically, to encode an image, we fit it with an MLP which maps pixel locations to RGB values. We then quantize and store the weights of this MLP as a code for the image. To decode the image, we simply evaluate the MLP at every pixel location. We found that this simple approach outperforms JPEG at low bit-rates, even without entropy coding or learning a distribution over weights. While our framework is not yet competitive with state of the art compression methods, we show that it has various attractive properties which could make it a viable alternative to other neural data compression approaches.
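The recipe is simple enough to sketch end to end: overfit a coordinate-to-RGB MLP to one image, then store its weights at reduced precision as the code. Note the paper uses sine activations (a SIREN); the ReLU MLP, layer sizes, and training budget below are simplifications.

```python
import torch
import torch.nn as nn

# Sketch of compression by overfitting: the "code" for an image is the
# weight set of a small MLP fitted to map pixel coordinates to RGB.
mlp = nn.Sequential(nn.Linear(2, 64), nn.ReLU(),
                    nn.Linear(64, 64), nn.ReLU(),
                    nn.Linear(64, 3), nn.Sigmoid())

def pixel_coords(h, w):
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                            torch.linspace(-1, 1, w), indexing="ij")
    return torch.stack([xs, ys], dim=-1).view(-1, 2)

def encode(image, steps=2000):            # image: (H, W, 3) in [0, 1]
    h, w, _ = image.shape
    coords, target = pixel_coords(h, w), image.view(-1, 3)
    opt = torch.optim.Adam(mlp.parameters(), lr=1e-3)
    for _ in range(steps):                # overfit the network to this image
        opt.zero_grad()
        loss = nn.functional.mse_loss(mlp(coords), target)
        loss.backward()
        opt.step()
    # store the weights at reduced precision as the image code
    return {k: v.half() for k, v in mlp.state_dict().items()}

def decode(code, h, w):                   # evaluate the MLP at every pixel
    mlp.load_state_dict({k: v.float() for k, v in code.items()})
    return mlp(pixel_coords(h, w)).view(h, w, 3)
```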

Journal ArticleDOI
TL;DR: Experimental results reveal that the proposed algorithm attains high robustness and improved security to the watermarked image against various kinds of attacks.
Abstract: Nowadays, secure medical image watermarking has become a stringent task in telemedicine. This paper presents a novel medical image watermarking method using fuzzy-based Region of Interest (ROI) selection and a wavelet transformation approach to embed an encrypted watermark. First, the source image undergoes fuzzification to determine the critical points through central and final intensity along the radial line for selecting the ROI. Second, the watermark image is transformed to the time-frequency domain through wavelet decomposition, where the sub-bands are swapped based on the magnitude value obtained through logistic mapping. In each sub-band, all the pixels are swapped, resulting in a fully encrypted image, which guarantees the watermark a secure, reliable and unbreakable form. To provide more robustness to the watermark image, singular values are obtained for the encrypted watermark image, and a key component is calculated to avoid false-positive errors. The singular values of the source and watermark images are modified through the key component. Experimental results reveal that the proposed algorithm attains high robustness and improved security for the watermarked image against various kinds of attacks.
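One step of such schemes, embedding via the singular values of a wavelet sub-band, can be sketched with pywt and NumPy. The fuzzy ROI selection, sub-band swapping, and key-component handling described above are omitted; the additive embedding rule below is a generic DWT-SVD assumption, not the paper's exact method.

```python
import numpy as np
import pywt

def embed_watermark(host, watermark, alpha=0.05):
    """Generic DWT-SVD embedding sketch: modify the singular values of the
    host's LL sub-band with those of the (already encrypted) watermark.
    The watermark is assumed resized to the LL sub-band's shape."""
    LL, (LH, HL, HH) = pywt.dwt2(host.astype(float), "haar")
    U, S, Vt = np.linalg.svd(LL, full_matrices=False)
    Sw = np.linalg.svd(watermark.astype(float), full_matrices=False,
                       compute_uv=False)
    S_mod = S + alpha * Sw[:len(S)]       # a key component would be derived here
    LL_mod = U @ np.diag(S_mod) @ Vt
    return pywt.idwt2((LL_mod, (LH, HL, HH)), "haar")
```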


Journal ArticleDOI
TL;DR: This paper proposes a CNN architecture and its efficient implementation, called the deformable kernel network (DKN), that outputs sets of neighbors and the corresponding weights adaptively for each pixel, and shows that the weighted averaging process with sparsely sampled 3 × 3 kernels outperforms the state of the art by a significant margin in all cases.
Abstract: Joint image filters are used to transfer structural details from a guidance picture used as a prior to a target image, in tasks such as enhancing spatial resolution and suppressing noise. Previous methods based on convolutional neural networks (CNNs) combine nonlinear activations of spatially-invariant kernels to estimate structural details and regress the filtering result. In this paper, we instead learn explicitly sparse and spatially-variant kernels. We propose a CNN architecture and its efficient implementation, called the deformable kernel network (DKN), that outputs sets of neighbors and the corresponding weights adaptively for each pixel. The filtering result is then computed as a weighted average. We also propose a fast version of DKN that runs about seventeen times faster for an image of size 640 × 480. We demonstrate the effectiveness and flexibility of our models on the tasks of depth map upsampling, saliency map upsampling, cross-modality image restoration, texture removal, and semantic segmentation. In particular, we show that the weighted averaging process with sparsely sampled 3 × 3 kernels outperforms the state of the art by a significant margin in all cases.
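The spatially-variant weighted averaging at the heart of this idea can be sketched given the network's per-pixel outputs: K sampling offsets and K weights per pixel, combined through bilinear sampling. The CNN that predicts them is omitted, and all shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def deformable_kernel_filter(image, offsets, weights):
    """Sketch of spatially-variant weighted averaging: for each pixel, K
    learned offsets select sparse neighbours (bilinearly sampled) and K
    learned weights combine them. offsets: (B, K, 2, H, W) in normalized
    grid units; weights: (B, K, H, W), assumed to sum to 1 over K."""
    b, c, h, w = image.shape
    k = weights.shape[1]
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                            torch.linspace(-1, 1, w), indexing="ij")
    base = torch.stack([xs, ys], dim=-1).expand(b, h, w, 2)
    out = torch.zeros_like(image)
    for i in range(k):
        grid = base + offsets[:, i].permute(0, 2, 3, 1)   # (B, H, W, 2)
        sampled = F.grid_sample(image, grid, align_corners=True)
        out = out + weights[:, i:i + 1] * sampled
    return out
```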

Journal ArticleDOI
TL;DR: This work proposes a new efficient and high-speed image encryption scheme based on the Bülban chaotic map that is extremely secure and highly fast for real-time image processing at 80 fps (frames per second).
Abstract: In the last decades, a big number of image encryption schemes have been proposed. Most of these schemes reach a high security level; however, their slow speed, due to their complex processes, makes them unusable in real-time applications. Motivated by this, we propose a new efficient and high-speed image encryption scheme based on the Bülban chaotic map. Unlike most existing schemes, we make wise use of this simple chaotic map to generate only a small number of random rows and columns. Moreover, to further increase the speed, we raise the processing unit from the pixel level to the row/column level. Security of the new scheme is achieved through a substitution-permutation network, where we apply a circular shift of rows and columns to break the strong correlation of adjacent pixels. Then, we combine the XOR operation with the modulo function to mask the pixel values and prevent any leak of information. High-security tests and simulation analysis have been carried out to demonstrate that the scheme is extremely secure and highly fast for real-time image processing at 80 fps (frames per second).
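A hedged sketch of the row/column-level design: circular shifts of rows and columns break adjacent-pixel correlation, then a modulo addition and an XOR mask the values. Since the abstract does not give the Bülban map's equation, a logistic map stands in as the keystream generator (clearly an assumption).

```python
import numpy as np

def keystream(seed, length):
    """Stand-in chaotic generator (logistic map); the paper uses the
    Bülban map, whose equation is not given in this abstract."""
    x, out = seed, []
    for _ in range(length):
        x = 3.99 * x * (1 - x)
        out.append(int(x * 256) % 256)
    return np.array(out, dtype=np.uint8)

def encrypt(img, seed=0.3141):
    """Sketch of the row/column-level idea for a uint8 grayscale image:
    circular shifts of rows and columns, then modulo-add one keystream
    and XOR another to mask the pixel values."""
    h, w = img.shape
    rs = keystream(seed, h) % w               # per-row shift amounts
    cs = keystream(seed / 2, w) % h           # per-column shift amounts
    out = img.copy()
    for r in range(h):
        out[r] = np.roll(out[r], int(rs[r]))
    for c in range(w):
        out[:, c] = np.roll(out[:, c], int(cs[c]))
    mask1 = keystream(seed / 3, h * w).reshape(h, w)
    mask2 = keystream(seed / 5, h * w).reshape(h, w)
    out = ((out.astype(np.uint16) + mask1) % 256).astype(np.uint8)
    return out ^ mask2
```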

Journal ArticleDOI
13 Mar 2021-Entropy
TL;DR: This paper researches the chaos sequence and wavelet transform value to find gaps in existing chaos-based image encryption and proposes a novel technique for digital image encryption that improves on previous algorithms; simulation and theoretical analysis indicate the proposed scheme's effectiveness and show that it is a suitable choice for actual image encryption.
Abstract: In recent decades, image encryption, as one of the significant information security fields, has attracted many researchers and scientists. Several studies have been performed with different methods, and novel and useful algorithms have been suggested to improve secure image encryption schemes. Nowadays, chaotic methods are found in diverse fields, such as the design of cryptosystems and image encryption. Chaos-based digital image encryption is a novel image encryption approach: it uses random chaos sequences for encrypting images, and it is a highly secure and fast method. Limited accuracy, however, is one of its disadvantages. This paper researches the chaos sequence and wavelet transform value to find gaps. Thus, a novel technique is proposed for digital image encryption that improves on previous algorithms. The technique is run in MATLAB, and a comparison is made in terms of various performance metrics such as the Number of Pixels Change Rate (NPCR), Peak Signal to Noise Ratio (PSNR), correlation coefficient, and Unified Average Changing Intensity (UACI). The simulation and theoretical analysis indicate the proposed scheme's effectiveness and show that this technique is a suitable choice for actual image encryption.
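Two of the metrics quoted above, NPCR and UACI, have standard definitions that are easy to compute. A small NumPy sketch:

```python
import numpy as np

def npcr_uaci(c1, c2):
    """NPCR and UACI between two cipher images (uint8, equal shape).
    NPCR is the percentage of differing pixels; UACI is the mean absolute
    intensity change relative to the full 255 range."""
    c1 = c1.astype(np.int16)
    c2 = c2.astype(np.int16)
    npcr = 100.0 * np.mean(c1 != c2)
    uaci = 100.0 * np.mean(np.abs(c1 - c2) / 255.0)
    return npcr, uaci

# usage: for a good 8-bit cipher, ideal values are roughly
# NPCR ~ 99.6% and UACI ~ 33.4%
```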

Journal ArticleDOI
TL;DR: A bidirectional long short-term memory (Bi-LSTM)-based network is designed for HSI classification and a spatial–spectral attention mechanism is designed and implemented in the proposed Bi-L STM network to emphasize the effective information and reduce the redundant information among spatial-spectral context of pixels, by which the performance of classification can be greatly improved.
Abstract: Deep neural networks have been widely applied to hyperspectral image (HSI) classification areas, in which recurrent neural network (RNN) is one of the most typical networks. Most of the existing RNN-based classifiers treat the spectral signature of pixels as an ordered sequence, in which only unidirectional correlation along the wavelength direction of adjacent bands is considered. However, each band image is related to not only its preceding band images but also its successive band images. In order to fully explore such bidirectional spectral correlation within an HSI, in this article, a bidirectional long short-term memory (Bi-LSTM)-based network is designed for HSI classification. Moreover, a spatial-spectral attention mechanism is designed and implemented in the proposed Bi-LSTM network to emphasize the effective information and reduce the redundant information among spatial-spectral context of pixels, by which the performance of classification can be greatly improved. Experimental results over three benchmark HSIs, i.e., Salinas Valley, Pavia Centre, and Pavia University, demonstrate that our proposed Bi-LSTM obviously outperforms several state-of-the-art unidirectional RNN-based classification algorithms. Moreover, the proposed spatial-spectral attention mechanism can further improve the classification accuracy of our proposed Bi-LSTM algorithm by effectively weighting spatial and spectral context of pixels. The source code of the proposed Bi-LSTM algorithm is available at https://github.com/MeiShaohui/Attention-based-Bidirectional-LSTM-Network.
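The core idea, scanning a pixel's spectrum in both wavelength directions, can be sketched with a bidirectional LSTM in PyTorch; the spatial-spectral attention mechanism is omitted, and the sizes and class count are illustrative.

```python
import torch
import torch.nn as nn

class SpectralBiLSTM(nn.Module):
    """Sketch: treat a pixel's spectrum as a sequence of bands and scan it
    in both directions with a bidirectional LSTM, then classify from the
    concatenated forward/backward final states."""
    def __init__(self, hidden=64, num_classes=9):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden,
                            bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * hidden, num_classes)

    def forward(self, spectra):            # spectra: (B, num_bands)
        x = spectra.unsqueeze(-1)          # one reflectance value per step
        _, (h, _) = self.lstm(x)           # h: (2, B, hidden), fwd and bwd
        return self.fc(torch.cat([h[0], h[1]], dim=-1))
```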

Journal ArticleDOI
TL;DR: In this article, a bilayer MoS2 phototransistor was used to synthesize an active pixel image sensor array for image sensing applications, which is composed of two-dimensional transition metal dichalcogenides (MoS2).
Abstract: Various large-area growth methods for two-dimensional transition metal dichalcogenides have been developed recently for future electronic and photonic applications. However, they have not yet been employed for synthesizing active pixel image sensors. Here, we report on an active pixel image sensor array with a bilayer MoS2 film prepared via a two-step large-area growth method. The active pixel of image sensor is composed of 2D MoS2 switching transistors and 2D MoS2 phototransistors. The maximum photoresponsivity (Rph) of the bilayer MoS2 phototransistors in an 8 × 8 active pixel image sensor array is statistically measured as high as 119.16 A W−1. With the aid of computational modeling, we find that the main mechanism for the high Rph of the bilayer MoS2 phototransistor is a photo-gating effect by the holes trapped at subgap states. The image-sensing characteristics of the bilayer MoS2 active pixel image sensor array are successfully investigated using light stencil projection. Here, the authors report the realization of an active pixel image sensor array composed by 64 pairs of switching transistors and phototransistors, based on wafer-scale bilayer MoS2. The device exhibits sensitive photoresponse under RGB light illumination, showing the potential of 2D MoS2 for image sensing applications.

Journal ArticleDOI
TL;DR: To solve the problems of mismatching and structure disconnection in exemplar-based image inpainting, an image completion algorithm based on an improved total variation minimization method, referred to as ETVM, is proposed in this paper.
Abstract: To solve the problems of mismatching and structure disconnection in exemplar-based image inpainting, an image completion algorithm based on an improved total variation minimization method, referred to as ETVM, is proposed in this paper. The structure of the image is extracted using the improved total variation minimization method, so the known information of the image is used more sufficiently than by existing methods. A robust filling mechanism is achieved according to the direction of the image structure, with less noise than the original image. The priority term is redefined to eliminate the product effect and ensure that the data term is always effective. The priority of the repairing patch and the best matching patch are determined by the similarity of the known information and the consistency of the unknown information in the repairing patch. Comparisons with cognitive computing image algorithms show that the proposed method ensures better selection of candidate image pixels to fill with, and achieves better global coherence of image completion than other methods. The inpainting results on noisy images show that the proposed method is robust and can also obtain good inpainting results for noisy images.

Journal ArticleDOI
TL;DR: The proposed unsupervised pansharpening method in a deep-learning framework is able to reconstruct sharper MSI of different types, with more details and less spectral distortion compared with the state-of-the-art.
Abstract: Pansharpening is to fuse a multispectral image (MSI) of low-spatial-resolution (LR) but rich spectral characteristics with a panchromatic image (PAN) of high spatial resolution (HR) but poor spectral characteristics. Traditional methods usually inject the extracted high-frequency details from PAN into the upsampled MSI. Recent deep learning endeavors are mostly supervised assuming that the HR MSI is available, which is unrealistic especially for satellite images. Nonetheless, these methods could not fully exploit the rich spectral characteristics in the MSI. Due to the wide existence of mixed pixels in satellite images where each pixel tends to cover more than one constituent material, pansharpening at the subpixel level becomes essential. In this article, we propose an unsupervised pansharpening (UP) method in a deep-learning framework to address the abovementioned challenges based on the self-attention mechanism (SAM), referred to as UP-SAM. The contribution of this article is threefold. First, the SAM is proposed where the spatial varying detail extraction and injection functions are estimated according to the attention representations indicating spectral characteristics of the MSI with subpixel accuracy. Second, such attention representations are derived from mixed pixels with the proposed stacked attention network powered with a stick-breaking structure to meet the physical constraints of mixed pixel formulations. Third, the detail extraction and injection functions are spatial varying based on the attention representations, which largely improves the reconstruction accuracy. Extensive experimental results demonstrate that the proposed approach is able to reconstruct sharper MSI of different types, with more details and less spectral distortion compared with the state-of-the-art.

Journal ArticleDOI
TL;DR: The proposed SSWK-MEDA provides a novel approach for the combination of transfer learning and remote sensing image characteristics and utilizes the geometric structure of features in manifold space to solve the problem of feature distortions of remote sensing data in transfer learning scenarios.
Abstract: Feature distortions of data are a typical problem in remote sensing image classification, especially in the area of transfer learning. In addition, many transfer learning-based methods only focus on spectral information and fail to utilize spatial information of remote sensing images. To tackle these problems, we propose spectral–spatial weighted kernel manifold embedded distribution alignment (SSWK-MEDA) for remote sensing image classification. The proposed method applies a novel spatial information filter to effectively use similarity between nearby sample pixels and avoid the influence of nonsample pixels. Then, a complex kernel combining spatial kernel and spectral kernel with different weights is constructed to adaptively balance the relative importance of spectral and spatial information of the remote sensing image. Finally, we utilize the geometric structure of features in manifold space to solve the problem of feature distortions of remote sensing data in transfer learning scenarios. SSWK-MEDA provides a novel approach for the combination of transfer learning and remote sensing image characteristics. Extensive experiments have demonstrated that the proposed method is more effective than several state-of-the-art methods.