
Showing papers on "Image resolution published in 2021"


Journal ArticleDOI
TL;DR: This work proposes a novel recurrent network to reconstruct videos from a stream of events, trains it on a large amount of simulated event data, and shows that off-the-shelf computer vision algorithms applied to the reconstructions consistently outperform algorithms that were specifically designed for event data.
Abstract: Event cameras are novel sensors that report brightness changes in the form of a stream of asynchronous “events” instead of intensity frames. They offer significant advantages with respect to conventional cameras: high temporal resolution, high dynamic range, and no motion blur. While the stream of events encodes in principle the complete visual signal, the reconstruction of an intensity image from a stream of events is an ill-posed problem in practice. Existing reconstruction approaches are based on hand-crafted priors and strong assumptions about the imaging process as well as the statistics of natural images. In this work we propose to learn to reconstruct intensity images from event streams directly from data instead of relying on any hand-crafted priors. We propose a novel recurrent network to reconstruct videos from a stream of events, and train it on a large amount of simulated event data. During training we propose to use a perceptual loss to encourage reconstructions to follow natural image statistics. We further extend our approach to synthesize color images from color event streams. Our quantitative experiments show that our network surpasses state-of-the-art reconstruction methods by a large margin in terms of image quality (>20%), while comfortably running in real-time. We show that the network is able to synthesize high framerate videos (>5,000 frames per second) of high-speed phenomena (e.g., a bullet hitting an object) and is able to provide high dynamic range reconstructions in challenging lighting conditions. As an additional contribution, we demonstrate the effectiveness of our reconstructions as an intermediate representation for event data. We show that off-the-shelf computer vision algorithms can be applied to our reconstructions for tasks such as object classification and visual-inertial odometry and that this strategy consistently outperforms algorithms that were specifically designed for event data. We release the reconstruction code, a pre-trained model and the datasets to enable further research.

164 citations
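To make the perceptual-loss idea in the abstract above concrete, here is a minimal PyTorch sketch of a VGG-feature perceptual loss; it is only an illustrative stand-in for the paper's loss, and the layer choice and use of torchvision's VGG16 weights are assumptions.

```python
# Minimal sketch of a VGG-feature perceptual loss, a stand-in for the
# perceptual loss used to make reconstructions follow natural image statistics.
# Layer index and weighting are illustrative assumptions, not the paper's setup.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models

class PerceptualLoss(nn.Module):
    def __init__(self, layer_idx=16):            # up to relu3_3 in VGG16 (assumption)
        super().__init__()
        vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features
        self.extractor = nn.Sequential(*list(vgg.children())[:layer_idx]).eval()
        for p in self.extractor.parameters():
            p.requires_grad = False               # frozen feature extractor

    def forward(self, reconstruction, target):
        # both inputs: (B, 3, H, W), values in [0, 1]
        return F.l1_loss(self.extractor(reconstruction), self.extractor(target))

# usage: loss = PerceptualLoss()(predicted_frames, ground_truth_frames)
```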


Journal ArticleDOI
TL;DR: Whether image classification performance drops with each kind of degradation, whether this drop can be avoided by including degraded images into training, and whether existing computer vision algorithms that attempt to remove such degradations can help improve the image classification performance are studied.
Abstract: Just like many other topics in computer vision, image classification has achieved significant progress recently by using deep learning neural networks, especially the Convolutional Neural Networks (CNNs). Most of the existing works focused on classifying very clear natural images, evidenced by the widely used image databases, such as Caltech-256, PASCAL VOCs, and ImageNet. However, in many real applications, the acquired images may contain certain degradations that lead to various kinds of blurring, noise, and distortions. One important and interesting problem is the effect of such degradations on the performance of CNN-based image classification and whether degradation removal helps CNN-based image classification. More specifically, we wonder whether image classification performance drops with each kind of degradation, whether this drop can be avoided by including degraded images into training, and whether existing computer vision algorithms that attempt to remove such degradations can help improve the image classification performance. In this article, we empirically study those problems for nine kinds of degraded images—hazy images, motion-blurred images, fish-eye images, underwater images, low resolution images, salt-and-peppered images, images with white Gaussian noise, Gaussian-blurred images, and out-of-focus images. We expect this article to draw more interest from the community to study the classification of degraded images.

123 citations
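As an illustration of how such degraded test or training sets can be generated, the following NumPy sketch applies three of the nine studied degradations (white Gaussian noise, salt-and-pepper noise, and low resolution); the severity parameters are arbitrary choices, not those used in the article.

```python
# Minimal sketch of three of the degradations studied: white Gaussian noise,
# salt-and-pepper noise, and low resolution. Severity parameters are illustrative.
import numpy as np

def add_gaussian_noise(img, sigma=0.05):
    # img: float array in [0, 1]
    noisy = img + np.random.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0.0, 1.0)

def add_salt_and_pepper(img, amount=0.02):
    out = img.copy()
    mask = np.random.rand(*img.shape[:2])
    out[mask < amount / 2] = 0.0          # pepper
    out[mask > 1 - amount / 2] = 1.0      # salt
    return out

def make_low_resolution(img, factor=4):
    # naive down/upsampling by pixel skipping and repetition (nearest neighbour);
    # assumes the spatial dimensions are divisible by `factor`
    small = img[::factor, ::factor]
    return np.repeat(np.repeat(small, factor, axis=0), factor, axis=1)
```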


Journal ArticleDOI
TL;DR: In this article, a dynamic curriculum learning strategy is proposed to progressively learn the object detectors by feeding training images with increasing difficulty that matches current detection ability, and an effective instance-aware focal loss function for detector learning is developed to alleviate the influence of positive instances of bad quality and meanwhile enhance the discriminative information of class-specific hard negative instances.
Abstract: In this article, we focus on tackling the problem of weakly supervised object detection from high spatial resolution remote sensing images, which aims to learn detectors with only image-level annotations, i.e., without object location information during the training stage. Although promising results have been achieved, most approaches often fail to provide high-quality initial samples and thus struggle to obtain optimal object detectors. To address this challenge, a dynamic curriculum learning strategy is proposed to progressively learn the object detectors by feeding training images with increasing difficulty that matches current detection ability. To this end, an entropy-based criterion is first designed to evaluate the difficulty of localizing objects in images. Then, an initial curriculum that ranks training images in ascending order of difficulty is generated, in which easy images are selected to provide reliable instances for learning object detectors. With the gained stronger detection ability, the subsequent order in the curriculum for retraining detectors is accordingly adjusted by promoting difficult images as easy ones. In this way, the detectors can be well prepared by training on easy images for learning from more difficult ones and thus gradually improve their detection ability more effectively. Moreover, an effective instance-aware focal loss function for detector learning is developed to alleviate the influence of positive instances of bad quality and meanwhile enhance the discriminative information of class-specific hard negative instances. Comprehensive experiments and comparisons with state-of-the-art methods on two publicly available data sets demonstrate the superiority of our proposed method.

108 citations
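A rough sketch of the easy-to-hard ordering idea: rank images by an entropy-style difficulty score and feed progressively larger, harder slices to the detector. The image-level score entropy used here is an illustrative stand-in for the paper's entropy-based criterion.

```python
# Simplified curriculum ordering: rank images by an entropy-based difficulty
# score (here the entropy of per-class image scores, a stand-in for the paper's
# criterion) and train from easy to hard.
import numpy as np

def score_entropy(class_scores, eps=1e-12):
    # class_scores: 1-D array of non-negative per-class scores for one image
    p = class_scores / (class_scores.sum() + eps)
    return float(-(p * np.log(p + eps)).sum())

def build_curriculum(image_ids, score_fn):
    # lower entropy = more confident / easier image = earlier in the curriculum
    difficulties = {i: score_entropy(score_fn(i)) for i in image_ids}
    return sorted(image_ids, key=difficulties.get)

def curriculum_slices(ordered_ids, n_rounds=3):
    # start with the easiest slice and grow it at each retraining round
    for r in range(1, n_rounds + 1):
        yield ordered_ids[: len(ordered_ids) * r // n_rounds]
```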


Proceedings ArticleDOI
01 Jun 2021
TL;DR: In this article, a simple depth merging network is proposed to take advantage of the duality between a consistent scene structure and high-frequency details, and a patch selection method is used to add local details to the final result.
Abstract: Neural networks have shown great abilities in estimating depth from a single image. However, the inferred depth maps are well below one-megapixel resolution and often lack fine-grained details, which limits their practicality. Our method builds on our analysis of how the input resolution and the scene structure affect depth estimation performance. We demonstrate that there is a trade-off between a consistent scene structure and the high-frequency details, and merge low- and high-resolution estimations to take advantage of this duality using a simple depth merging network. We present a double estimation method that improves the whole-image depth estimation and a patch selection method that adds local details to the final result. We demonstrate that by merging estimations at different resolutions with changing context, we can generate multi-megapixel depth maps with a high level of detail using a pre-trained model.

95 citations
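The double-estimation idea can be sketched as follows: run the same monocular depth model at a low and a high resolution and merge the two outputs. The resolutions and the 1x1-convolution merger are placeholders, not the trained merging network from the paper.

```python
# Double-estimation sketch: run one depth model at two resolutions and merge.
# `depth_model` is any monocular depth network returning (B, 1, h, w) maps;
# the 1x1-conv merger is a placeholder for the paper's trained merging network.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NaiveMerger(nn.Module):
    def __init__(self):
        super().__init__()
        self.fuse = nn.Conv2d(2, 1, kernel_size=1)   # placeholder merge

    def forward(self, d_low, d_high):
        return self.fuse(torch.cat([d_low, d_high], dim=1))

def double_estimate(depth_model, image, low_size=384, high_size=768):
    # image: (1, 3, H, W); sizes are illustrative
    lo = F.interpolate(image, size=(low_size, low_size), mode='bilinear', align_corners=False)
    hi = F.interpolate(image, size=(high_size, high_size), mode='bilinear', align_corners=False)
    d_lo = depth_model(lo)                           # consistent scene structure
    d_hi = depth_model(hi)                           # high-frequency details
    d_lo = F.interpolate(d_lo, size=d_hi.shape[-2:], mode='bilinear', align_corners=False)
    return NaiveMerger()(d_lo, d_hi)
```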


Journal ArticleDOI
TL;DR: In this article, a joint sparse and low-rank learning (J-SLoL) method was proposed to spectrally enhance multispectral (MS) images by jointly learning low-rank HS–MS dictionary pairs from overlapped regions.
Abstract: Extensive attention has been paid to enhancing the spatial resolution of hyperspectral (HS) images with the aid of multispectral (MS) images in remote sensing. However, the ability in the fusion of HS and MS images remains to be improved, particularly in large-scale scenes, due to the limited acquisition of HS images. Alternatively, we super-resolve MS images in the spectral domain by means of partially overlapped HS images, yielding a novel and promising topic: spectral superresolution (SSR) of MS imagery. This is a challenging and less-investigated task due to its high ill-posedness in inverse imaging. To this end, we develop a simple but effective method, called joint sparse and low-rank learning (J-SLoL), to spectrally enhance MS images by jointly learning low-rank HS–MS dictionary pairs from overlapped regions. J-SLoL infers and recovers the unknown HS signals over a larger coverage by sparse coding on the learned dictionary pair. Furthermore, we validate the SSR performance on three HS–MS data sets (two for classification and one for unmixing) in terms of reconstruction, classification, and unmixing by comparing with several existing state-of-the-art baselines, showing the effectiveness and superiority of the proposed J-SLoL algorithm. Moreover, the codes and data sets will be available at https://github.com/danfenghong/IEEE_TGRS_J-SLoL , contributing to the remote sensing (RS) community.

94 citations
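A heavily simplified sketch of the coupled-dictionary idea behind spectral super-resolution, using scikit-learn: learn an MS dictionary on the overlapped HS-MS region, pair it with an HS dictionary by least squares, then sparse-code new MS pixels and reconstruct their HS spectra. The joint low-rank constraint of J-SLoL is omitted, and the dimensions and number of atoms are illustrative.

```python
# Simplified coupled-dictionary spectral super-resolution (low-rank term omitted).
import numpy as np
from sklearn.decomposition import DictionaryLearning, sparse_encode

def fit_coupled_dictionaries(ms_overlap, hs_overlap, n_atoms=64):
    # ms_overlap: (n_pixels, n_ms_bands), hs_overlap: (n_pixels, n_hs_bands)
    dl = DictionaryLearning(n_components=n_atoms,
                            transform_algorithm='lasso_lars', max_iter=200)
    codes = dl.fit_transform(ms_overlap)          # sparse codes on the MS dictionary
    d_ms = dl.components_                         # (n_atoms, n_ms_bands)
    # HS dictionary paired with the same codes: hs_overlap ≈ codes @ d_hs
    d_hs, *_ = np.linalg.lstsq(codes, hs_overlap, rcond=None)
    return d_ms, d_hs

def spectral_super_resolve(ms_pixels, d_ms, d_hs):
    # sparse-code new MS pixels, then reconstruct their HS spectra
    codes = sparse_encode(ms_pixels, d_ms, algorithm='lasso_lars')
    return codes @ d_hs                           # (n_pixels, n_hs_bands)
```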


Journal ArticleDOI
TL;DR: Wang et al. as discussed by the authors proposed a simple and efficient architecture of deep convolutional neural networks to fuse a low-resolution HSI (LR-HSI) and a high-resolution multispectral image (HR-MSI), yielding a high resolution HSI.
Abstract: Hyperspectral images (HSIs) are of crucial importance in order to better understand features from a large number of spectral channels. Restricted by the inner imaging mechanism, HSIs often have limited spatial resolution. To alleviate this issue, in this work, we propose a simple and efficient architecture of deep convolutional neural networks to fuse a low-resolution HSI (LR-HSI) and a high-resolution multispectral image (HR-MSI), yielding a high-resolution HSI (HR-HSI). The network is designed to preserve both spatial and spectral information thanks to a new architecture based on: 1) the use of the LR-HSI at the HR-MSI's scale to get an output with satisfactory spectral preservation and 2) the application of the attention and pixelShuffle modules to extract information, aiming to output high-quality spatial details. Finally, a plain mean squared error loss function is used to measure the performance during the training. Extensive experiments demonstrate that the proposed network architecture achieves the best performance (both qualitatively and quantitatively) compared with recent state-of-the-art HSI super-resolution approaches. Moreover, other significant advantages can be pointed out by the use of the proposed approach, such as a better network generalization ability, a limited computational burden, and robustness with respect to the number of training samples. Please find the source code and pretrained models at https://liangjiandeng.github.io/Projects_Res/HSRnet_2021tnnls.html.

83 citations
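To illustrate the pixelShuffle-based upsampling and fusion step mentioned in the abstract, here is a tiny PyTorch sketch; the channel counts, scale factor, and layer layout are illustrative assumptions rather than the HSRnet architecture.

```python
# Tiny sketch of a PixelShuffle-based upsampling + fusion step for LR-HSI / HR-MSI.
# Channel counts, scale factor and depth are illustrative, not the HSRnet design.
import torch
import torch.nn as nn

class UpsampleFuse(nn.Module):
    def __init__(self, hs_bands=31, ms_bands=3, scale=4, feat=64):
        super().__init__()
        self.expand = nn.Conv2d(hs_bands, feat * scale * scale, 3, padding=1)
        self.shuffle = nn.PixelShuffle(scale)      # (feat*s*s, H, W) -> (feat, s*H, s*W)
        self.fuse = nn.Sequential(
            nn.Conv2d(feat + ms_bands, feat, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat, hs_bands, 3, padding=1))

    def forward(self, lr_hsi, hr_msi):
        up = self.shuffle(self.expand(lr_hsi))     # LR-HSI features at the MSI scale
        return self.fuse(torch.cat([up, hr_msi], dim=1))

# usage: hr_hsi = UpsampleFuse()(torch.rand(1, 31, 16, 16), torch.rand(1, 3, 64, 64))
```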


Proceedings ArticleDOI
01 Jun 2021
TL;DR: In this article, two optimization problems regularized by the deep prior are formulated, and they are separately responsible for the generative models for panchromatic images and low-resolution multispectral images.
Abstract: Pan-sharpening is an important technique for remote sensing imaging systems to obtain high resolution multi-spectral images. Recently, deep learning has become the most popular tool for pan-sharpening. This paper develops a model-based deep pan-sharpening approach. Specifically, two optimization problems regularized by the deep prior are formulated, and they are separately responsible for the generative models for panchromatic images and low resolution multispectral images. Then, the two problems are solved by a gradient projection algorithm, and the iterative steps are generalized into two network blocks. By alternately stacking the two blocks, a novel network, called the gradient projection based pan-sharpening neural network, is constructed. The experimental results on different kinds of satellite datasets demonstrate that the new network outperforms state-of-the-art methods both visually and quantitatively. The codes are available at https://github.com/xsxjtu/GPPNN.

82 citations


Journal ArticleDOI
TL;DR: In this paper, the authors demonstrate an instrumental blurring of under 20 picometers by solving the multiple scattering problem and overcoming the aberrations of the electron probe using electron ptychography to recover a linear phase response in thick samples.
Abstract: Transmission electron microscopes use electrons with wavelengths of a few picometers, potentially capable of imaging individual atoms in solids at a resolution ultimately set by the intrinsic size of an atom. Unfortunately, due to imperfections in the imaging lenses and multiple scattering of electrons in the sample, the image resolution reached is 3 to 10 times worse. Here, by inversely solving the multiple scattering problem and overcoming the aberrations of the electron probe using electron ptychography to recover a linear phase response in thick samples, we demonstrate an instrumental blurring of under 20 picometers. The widths of atomic columns in the measured electrostatic potential are now no longer limited by the imaging system, but instead by the thermal fluctuations of the atoms. We also demonstrate that electron ptychography can potentially reach a sub-nanometer depth resolution and locate embedded atomic dopants in all three dimensions with only a single projection measurement.

82 citations


Journal ArticleDOI
TL;DR: An innovative mixed high-order attention network (MHAN) for remote sensing SR that not only obtains better accuracy than the state-of-the-art methods but also shows the superiority in terms of running time and GPU cost.
Abstract: Recently, remote sensing images have become increasingly popular in a number of tasks, such as environmental monitoring. However, the observed images from satellite sensors often suffer from low resolution (LR), making it difficult to meet the requirements for further analysis. Super-resolution (SR) aims to increase the image resolution while providing finer spatial details, which perfectly remedies the weakness of satellite images. Therefore, in this article, we propose an innovative mixed high-order attention network (MHAN) for remote sensing SR. It comprises two components: a feature extraction network for feature extraction, and a feature refinement network with a high-order attention (HOA) mechanism for detail restoration. In the feature extraction network, we replace the elementwise addition with weighted channelwise concatenation in all skip connections, which greatly facilitates the information flow. In the feature refinement network, rather than exploring the first-order statistics (spatial or channel attention), we introduce the HOA module to restore the missing details. Finally, to fully exploit hierarchical features, we introduce the frequency-aware connection to bridge the feature extraction and feature refinement networks. Experiments on two widely used remote sensing image data sets demonstrate that our MHAN not only obtains better accuracy than the state-of-the-art methods but also shows superiority in terms of running time and GPU cost. Code is available at https://github.com/ZhangDY827/MHAN .

79 citations
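The replacement of elementwise-addition skips with weighted channelwise concatenation can be sketched as below; the per-branch learnable scalars followed by a 1x1 convolution are an assumed weighting scheme, not necessarily the one used in MHAN.

```python
# Minimal sketch of a weighted channelwise-concatenation skip connection in
# place of elementwise addition. The learnable per-branch scalars and the 1x1
# reduction convolution are an illustrative guess at the weighting scheme.
import torch
import torch.nn as nn

class WeightedConcatSkip(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.w_skip = nn.Parameter(torch.ones(1))
        self.w_main = nn.Parameter(torch.ones(1))
        self.reduce = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, skip_feat, main_feat):
        cat = torch.cat([self.w_skip * skip_feat, self.w_main * main_feat], dim=1)
        return self.reduce(cat)      # back to `channels` feature maps
```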


Journal ArticleDOI
TL;DR: In this paper, the authors describe how the quality of synthetic pictures created by DCGAN, LSGAN, and WGAN is determined and combine synthetic images with original images to enhance datasets and verify the effectiveness of synthetic datasets.
Abstract: Convolutional Neural Networks (CNNs) achieve excellent performance in traffic sign identification when given enough annotated training data. The dataset determines the quality of the complete CNN-based visual system. Unfortunately, databases for traffic signs from the majority of the world’s nations are few. In this scenario, Generative Adversarial Networks (GANs) may be employed to produce more realistic and varied training pictures to supplement the actual set of images. The purpose of this research is to describe how the quality of synthetic pictures created by DCGAN, LSGAN, and WGAN is determined. Our work combines synthetic images with original images to enhance datasets and verify the effectiveness of synthetic datasets. We use different numbers and sizes of images for training. Likewise, the Structural Similarity Index (SSIM) and Mean Square Error (MSE) were employed to assess picture quality. Our study quantifies the SSIM difference between the synthetic and actual images. When additional images are used for training, the synthetic image exhibits a high degree of resemblance to the genuine image. The highest SSIM value was achieved when using 200 total images as input and a 32×32 image size. Further, we augment the original picture dataset with synthetic pictures and compare the original image model to the synthetic image model. For this experiment, we use recent iterations of Yolo: Yolo V3 and Yolo V4. After mixing the real image with the synthesized image produced by LSGAN, the recognition performance improved, achieving an accuracy of 84.9% on Yolo V3 and an accuracy of 89.33% on Yolo V4.

78 citations
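A small sketch of the SSIM/MSE quality check between synthetic and real traffic-sign images, assuming scikit-image and that real-synthetic pairs are already matched:

```python
# Sketch of the SSIM / MSE picture-quality comparison between synthetic and
# real images using scikit-image; image pairing is assumed given.
import numpy as np
from skimage.metrics import structural_similarity, mean_squared_error

def compare_pair(real, synthetic):
    # real, synthetic: float arrays in [0, 1] with identical shape, e.g. (32, 32, 3)
    ssim = structural_similarity(real, synthetic, channel_axis=-1, data_range=1.0)
    mse = mean_squared_error(real, synthetic)
    return ssim, mse

def average_quality(real_batch, synthetic_batch):
    scores = [compare_pair(r, s) for r, s in zip(real_batch, synthetic_batch)]
    ssims, mses = zip(*scores)
    return float(np.mean(ssims)), float(np.mean(mses))
```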


Journal ArticleDOI
TL;DR: A remote sensing image spatiotemporal fusion method using a GAN (STFGAN), which adopts a two-stage framework with an end-to-end image fusion GAN for each stage, and significantly improves the accuracy of phenological change and land-cover-type change prediction.
Abstract: Due to technological limitations and budget constraints, spatiotemporal fusion is considered a promising way to deal with the tradeoff between the temporal and spatial resolutions of remote sensing images. Furthermore, the generative adversarial network (GAN) has shown its capability in a variety of applications. This article presents a remote sensing image spatiotemporal fusion method using a GAN (STFGAN), which adopts a two-stage framework with an end-to-end image fusion GAN (IFGAN) for each stage. The IFGAN contains a generator and a discriminator in competition with each other under the guidance of the optimization function. Considering the huge spatial resolution gap between the high-spatial, low-temporal (HSLT) resolution Landsat imagery and the corresponding low-spatial, high-temporal (LSHT) resolution MODIS imagery, a feature-level fusion strategy is adopted. Specifically, for the generator, we first super-resolve the MODIS images while also extracting the high-frequency features of the Landsat images. Finally, we integrate the features from the MODIS and Landsat images. STFGAN is able to learn an end-to-end mapping between the Landsat–MODIS image pairs and predicts the Landsat-like image for a prediction date by considering all the bands. STFGAN significantly improves the accuracy of phenological change and land-cover-type change prediction with the help of residual blocks and two prior Landsat–MODIS image pairs. To examine the performance of the proposed STFGAN method, experiments were conducted on three representative Landsat–MODIS data sets. The results clearly illustrate the effectiveness of the proposed method.

Journal ArticleDOI
TL;DR: The results and numerical analysis have collectively demonstrated the robust performance of the model to reconstruct PAM images with as few as 2% of the original pixels, which can effectively shorten the imaging time without substantially sacrificing the image quality.
Abstract: One primary technical challenge in photoacoustic microscopy (PAM) is the necessary compromise between spatial resolution and imaging speed. In this study, we propose a novel application of deep learning principles to reconstruct undersampled PAM images and transcend the trade-off between spatial resolution and imaging speed. We compared various convolutional neural network (CNN) architectures, and selected a Fully Dense U-net (FD U-net) model that produced the best results. To mimic various undersampling conditions in practice, we artificially downsampled fully-sampled PAM images of mouse brain vasculature at different ratios. This allowed us to not only definitively establish the ground truth, but also train and test our deep learning model at various imaging conditions. Our results and numerical analysis have collectively demonstrated the robust performance of our model to reconstruct PAM images with as few as 2% of the original pixels, which can effectively shorten the imaging time without substantially sacrificing the image quality.
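To mimic the undersampling described above, a fully-sampled PAM image can be artificially downsampled so that only a chosen fraction of pixels remains; the regular-grid sampling pattern in this sketch is an illustrative choice.

```python
# Sketch of artificially undersampling a fully-sampled PAM image so that only
# a given fraction of pixels (e.g. 2%) is retained; the regular-grid pattern
# is an illustrative choice, not necessarily the one used in the study.
import numpy as np

def undersample(image, keep_fraction=0.02):
    # image: 2-D array; keep roughly `keep_fraction` of pixels on a regular grid
    step = max(1, int(round(1.0 / np.sqrt(keep_fraction))))
    mask = np.zeros_like(image, dtype=bool)
    mask[::step, ::step] = True
    sparse = np.where(mask, image, 0.0)
    return sparse, mask      # network input and the sampling mask
```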

Journal ArticleDOI
TL;DR: SPARNet as mentioned in this paper introduces a spatial attention mechanism to the vanilla residual blocks to adaptively bootstrap features related to the key face structures and pay less attention to those less feature-rich regions.
Abstract: General image super-resolution techniques have difficulties in recovering detailed face structures when applied to low-resolution face images. Recent deep learning based methods tailored for face images have achieved improved performance by jointly training with additional tasks such as face parsing and landmark prediction. However, multi-task learning requires extra manually labeled data. Besides, most of the existing works can only generate relatively low resolution face images (e.g., 128×128), and their applications are therefore limited. In this paper, we introduce a novel SPatial Attention Residual Network (SPARNet) built on our newly proposed Face Attention Units (FAUs) for face super-resolution. Specifically, we introduce a spatial attention mechanism to the vanilla residual blocks. This enables the convolutional layers to adaptively bootstrap features related to the key face structures and pay less attention to those less feature-rich regions. This makes the training more effective and efficient as the key face structures only account for a very small portion of the face image. Visualization of the attention maps shows that our spatial attention network can capture the key face structures well even for very low resolution faces (e.g., 16×16). Quantitative comparisons on various kinds of metrics (including PSNR, SSIM, identity similarity, and landmark detection) demonstrate the superiority of our method over current state-of-the-art methods. We further extend SPARNet with multi-scale discriminators, named SPARNetHD, to produce high resolution results (i.e., 512×512). We show that SPARNetHD trained with synthetic data can not only produce high quality and high resolution outputs for synthetically degraded face images, but also show good generalization ability to real world low quality face images. Codes are available at https://github.com/chaofengc/Face-SPARNet .
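A simplified spatial-attention residual block in the spirit of the Face Attention Unit is sketched below; the two-convolution attention branch is a simplification of the paper's design and the channel count is arbitrary.

```python
# Simplified spatial-attention residual block: a residual branch modulated by a
# sigmoid spatial attention map. The attention branch here is a simplification
# of the FAU design; channel count is illustrative.
import torch
import torch.nn as nn

class SpatialAttentionResBlock(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1))
        self.attention = nn.Sequential(
            nn.Conv2d(channels, channels // 4, 3, padding=1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(channels // 4, 1, 3, padding=1), nn.Sigmoid())

    def forward(self, x):
        feat = self.body(x)
        attn = self.attention(feat)      # (B, 1, H, W) spatial attention map
        return x + feat * attn           # features at key face structures are boosted
```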

Journal ArticleDOI
TL;DR: Wang et al. as mentioned in this paper investigated the utility of the C-band, one-polarization radar quantitative precipitation estimation (QPE) and the Global Satellite Mapping of Precipitation (GSMaP) satellite QPE for the integrated prediction of floods and landslides in two hilly basins of southern Shaanxi Province of China.

Journal ArticleDOI
TL;DR: Sparse structured illumination microscopy (Sparse-SIM) as mentioned in this paper uses a priori knowledge about the sparsity and continuity of biological structures to develop a deconvolution algorithm that increases the resolution of live-cell super-resolution microscopes nearly twofold.
Abstract: A main determinant of the spatial resolution of live-cell super-resolution (SR) microscopes is the maximum photon flux that can be collected. To further increase the effective resolution for a given photon flux, we take advantage of a priori knowledge about the sparsity and continuity of biological structures to develop a deconvolution algorithm that increases the resolution of SR microscopes nearly twofold. Our method, sparse structured illumination microscopy (Sparse-SIM), achieves ~60-nm resolution at a frame rate of up to 564 Hz, allowing it to resolve intricate structures, including small vesicular fusion pores, ring-shaped nuclear pores formed by nucleoporins and relative movements of inner and outer mitochondrial membranes in live cells. Sparse deconvolution can also be used to increase the three-dimensional resolution of spinning-disc confocal-based SIM, even at low signal-to-noise ratios, which allows four-color, three-dimensional live-cell SR imaging at ~90-nm resolution. Overall, sparse deconvolution will be useful to increase the spatiotemporal resolution of live-cell fluorescence microscopy. The resolution of fluorescence microscopy is increased by incorporating prior information into deconvolution algorithms.

Proceedings ArticleDOI
07 Apr 2021
TL;DR: In this paper, a differentiable Top-K operator is proposed to select the most relevant parts of the input to efficiently process high-resolution images for traffic sign recognition, inter-patch relationship reasoning, and fine-grained recognition without using object/part bounding box annotations during training.
Abstract: Neural Networks require large amounts of memory and compute to process high resolution images, even when only a small part of the image is actually informative for the task at hand. We propose a method based on a differentiable Top-K operator to select the most relevant parts of the input to efficiently process high resolution images. Our method may be interfaced with any downstream neural network, is able to aggregate information from different patches in a flexible way, and allows the whole model to be trained end-to-end using backpropagation. We show results for traffic sign recognition, inter-patch relationship reasoning, and fine-grained recognition without using object/part bounding box annotations during training.
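The patch-selection data flow can be sketched with a hard torch.topk over patch scores; note that the paper's operator is a differentiable relaxation of Top-K, so this hard version only illustrates the pipeline, and the patch size and scoring head are assumptions.

```python
# Simplified patch selection: score patches with a small network and keep the
# K highest-scoring ones via a hard torch.topk. The paper uses a differentiable
# relaxation of Top-K; this hard version only shows the data flow.
import torch
import torch.nn as nn
import torch.nn.functional as F

def select_patches(image, scorer, patch=64, k=8):
    # image: (B, C, H, W) with H, W divisible by `patch`
    b, c, h, w = image.shape
    patches = F.unfold(image, kernel_size=patch, stride=patch)      # (B, C*p*p, N)
    patches = patches.transpose(1, 2).reshape(b, -1, c, patch, patch)
    scores = scorer(patches.flatten(0, 1)).view(b, -1)              # one score per patch
    top_idx = torch.topk(scores, k, dim=1).indices                  # hard Top-K
    batch_idx = torch.arange(b).unsqueeze(1)
    return patches[batch_idx, top_idx]                              # (B, K, C, p, p)

# example scorer (assumed for illustration): tiny conv + pooled linear head
scorer = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                       nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 1))
```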

Journal ArticleDOI
TL;DR: This work proposes a new perception consistency ultrasound image SR method, which only requires the LR ultrasound data and can ensure the re-degenerated image of the generated SR one to be consistent with the original LR image, and vice versa.
Abstract: Due to the limitations of sensors, the transmission medium, and the intrinsic properties of ultrasound, the quality of ultrasound imaging is never ideal, especially its low spatial resolution. To remedy this situation, deep learning networks have recently been developed for ultrasound image super-resolution (SR) because of their powerful approximation capability. However, most current supervised SR methods are not suitable for ultrasound medical images because the medical image samples are always rare, and usually, there are no low-resolution (LR) and high-resolution (HR) training pairs in reality. In this work, based on self-supervision and a cycle generative adversarial network, we propose a new perception consistency ultrasound image SR method, which only requires the LR ultrasound data and can ensure the re-degenerated image of the generated SR one to be consistent with the original LR image, and vice versa. We first generate the HR fathers and the LR sons of the test ultrasound LR image through image enhancement, and then make full use of the cycle loss of LR–SR–LR and HR–LR–SR and the adversarial characteristics of the discriminator to promote the generator to produce better perceptually consistent SR results. The evaluation of PSNR/IFC/SSIM, inference efficiency, and visual effects under the benchmark CCA-US and CCA-US datasets illustrates that our proposed approach is effective and superior to other state-of-the-art methods.
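The LR-SR-LR part of the cycle consistency can be sketched as below; the generator, the bicubic re-degradation operator, and the scale factor are assumptions, and the HR-LR-SR cycle and adversarial terms of the full method are omitted.

```python
# Minimal sketch of the LR -> SR -> LR cycle-consistency term: the generated SR
# image, re-degraded by a simple downscaling operator, should match the original
# LR input. Generator, downscaling choice and scale are assumptions; the
# HR -> LR -> SR cycle and adversarial loss of the full method are omitted.
import torch
import torch.nn.functional as F

def cycle_consistency_loss(generator, lr_image, scale=2):
    sr = generator(lr_image)                                    # LR -> SR
    re_degraded = F.interpolate(sr, scale_factor=1.0 / scale,   # SR -> LR again
                                mode='bicubic', align_corners=False)
    return F.l1_loss(re_degraded, lr_image)
```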

Proceedings ArticleDOI
01 Jun 2021
TL;DR: In this article, the LandCover.ai dataset is used for semantic segmentation of land cover and land use in rural areas with a resolution of tens centimeters per pixel, manual fine labels, and highly publicly important environmental instances like buildings, woodlands, water, or roads.
Abstract: Monitoring of land cover and land use is crucial in natural resources management. Automatic visual mapping can carry enormous economic value for agriculture, forestry, or public administration. Satellite or aerial images combined with computer vision and deep learning enable precise assessment and can significantly speed up change detection. Aerial imagery usually provides images with much higher pixel resolution than satellite data, allowing more detailed mapping. However, there is still a lack of aerial datasets made for segmentation that cover rural areas at a resolution of tens of centimeters per pixel, with manual fine labels and publicly important environmental classes such as buildings, woodlands, water, and roads. Here we introduce the LandCover.ai (Land Cover from Aerial Imagery) dataset for semantic segmentation. We collected images of 216.27 km2 of rural areas across Poland, a country in Central Europe: 39.51 km2 at a resolution of 50 cm per pixel and 176.76 km2 at a resolution of 25 cm per pixel, and manually fine-annotated the four following classes of objects: buildings, woodlands, water, and roads. Additionally, we report simple benchmark results, achieving 85.56% mean intersection over union on the test set. This proves that the automatic mapping of land cover is possible with a relatively small, cost-efficient, RGB-only dataset. The dataset is publicly available at https://landcover.ai/
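For reference, the reported benchmark metric can be computed with a few lines of NumPy; how unlabeled background pixels are handled is an assumption of this sketch.

```python
# Small sketch of the mean intersection-over-union metric used for the
# benchmark; background handling (classes with empty union are skipped)
# is an assumption of this sketch.
import numpy as np

def mean_iou(pred, target, num_classes):
    # pred, target: integer label maps of identical shape
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))
```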

Journal ArticleDOI
TL;DR: In this paper, a three-dimensional residual channel attention network (RCAN) is proposed to restore noisy four-dimensional super-resolution data, enabling image capture of over tens of thousands of images (thousands of volumes) without apparent photobleaching.
Abstract: We demonstrate residual channel attention networks (RCAN) for the restoration and enhancement of volumetric time-lapse (four-dimensional) fluorescence microscopy data. First we modify RCAN to handle image volumes, showing that our network enables denoising competitive with three other state-of-the-art neural networks. We use RCAN to restore noisy four-dimensional super-resolution data, enabling image capture of over tens of thousands of images (thousands of volumes) without apparent photobleaching. Second, using simulations we show that RCAN enables resolution enhancement equivalent to, or better than, other networks. Third, we exploit RCAN for denoising and resolution improvement in confocal microscopy, enabling ~2.5-fold lateral resolution enhancement using stimulated emission depletion microscopy ground truth. Fourth, we develop methods to improve spatial resolution in structured illumination microscopy using expansion microscopy data as ground truth, achieving improvements of ~1.9-fold laterally and ~3.6-fold axially. Finally, we characterize the limits of denoising and resolution enhancement, suggesting practical benchmarks for evaluation and further enhancement of network performance. Three-dimensional residual channel attention networks (RCAN) enable improved image denoising and resolution enhancement on volumetric time-lapse fluorescence microscopy data, allowing longitudinal super-resolution imaging of living samples.

Journal ArticleDOI
TL;DR: A novel unsupervised deep-learning-based CD method that can effectively model contextual information and handle the large number of bands in multispectral HR images is presented.
Abstract: To overcome the limited capability of most state-of-the-art change detection (CD) methods in modeling spatial context of multispectral high spatial resolution (HR) images and exploiting all spectral bands jointly, this letter presents a novel unsupervised deep-learning-based CD method that can effectively model contextual information and handle the large number of bands in multispectral HR images. This is achieved by exploiting all spectral bands after grouping them into spectral-dedicated band groups. To eliminate the necessity of multitemporal training data, the proposed method exploits a data set targeted for image classification to train spectral-dedicated Auxiliary Classifier Generative Adversarial Networks (ACGANs). They are used to obtain pixelwise deep change hypervector from multitemporal images. Each feature in deep change hypervector is analyzed based on the magnitude to identify changed pixels. An ensemble decision fusion strategy is used to combine change information from different features. Experimental results on the urban, Alpine, and agricultural Sentinel-2 data sets confirm the effectiveness of the proposed method.

Journal ArticleDOI
TL;DR: This article introduces the conditional generative adversarial network (CGAN) and the switchable normalization technique into the spatiotemporal fusion problem and proposes a flexible deep network named the GAN-based SpatioTemporal Fusion Model (GAN-STFM) to reduce the number of model inputs and break the time restriction on reference image selection.
Abstract: Due to the tradeoff between spatial and temporal resolutions of remote sensing images, spatiotemporal fusion models were proposed to synthesize high spatiotemporal image series. Currently, spatiotemporal fusion models usually employ one coarse-resolution image acquired on a prediction date and at least another pair of coarse-fine resolution images close to the prediction time as references to derive the fine-resolution image on the prediction date. After years of development, the model accuracy has improved to a certain extent, but nearly all the models require at least three image inputs, and rigid time constraints must be applied to the references to guarantee the fusion accuracy. However, it is not always easy to collect adequate data pairs for fine-resolution image series simulation in practice because of bad weather conditions or the time inconsistency between the coarse-fine resolution data sources, which causes some difficulties in actual applications. This article introduces the conditional generative adversarial network (CGAN) and the switchable normalization technique into the spatiotemporal fusion problem and proposes a flexible deep network named the GAN-based SpatioTemporal Fusion Model (GAN-STFM) to reduce the number of model inputs and break the time restriction on reference image selection. The GAN-STFM needs only a coarse-resolution image on the prediction date and another fine-resolution reference image at an arbitrary time in the same area as model inputs. As far as we know, this is the first spatiotemporal fusion model that requires only two images as model inputs and puts no restriction on the acquisition time of references. Even so, the GAN-STFM performs on par with or better than other classical fusion models in the experiments. With this improvement, data preparation for spatiotemporal fusion tends to be much easier than before, showing a promising perspective for practical applications.

Journal ArticleDOI
TL;DR: A cross-resolution difference learning method is proposed to detect changes from multitemporal images at their originally different resolutions without resizing operations, and experiments demonstrate its effectiveness for detecting changes from different-resolution images.
Abstract: Change detection (CD) aims to identify the differences between multitemporal images acquired over the same geographical area at different times. Because it requires no cumbersome labeled change information, unsupervised CD has attracted extensive attention from researchers. Multitemporal images tend to have different resolutions as they are usually captured at different times with different sensor properties. It is difficult to directly obtain one pixelwise change map for two images with different resolutions, so current methods usually resize multitemporal images to a unified size. However, resizing operations change the original information of pixels, which limits the final CD performance. This article aims to detect changes from multitemporal images at their originally different resolutions without resizing operations. To achieve this, a cross-resolution difference learning method is proposed. Specifically, two cross-resolution pixelwise difference maps are generated for the two different-resolution images and fused to produce the final change map. First, the two input images are segmented into individual homogeneous regions separately due to their different resolutions. Second, each pixelwise difference map is produced according to two measure distances, the mutual information distance and the deep feature distance, between the image regions in which the pixel lies. Third, the final binary change map is generated by fusing and binarizing the two cross-resolution difference maps. Extensive experiments on four datasets demonstrate the effectiveness of the proposed method for detecting changes from different-resolution images.

Journal ArticleDOI
TL;DR: In this article, the authors proposed a high-resolution land surface temperature (LST) with spatial continuity and high spatiotemporal resolution (hereafter referred to as high resolution ) for studying the thermal environment.
Abstract: Remotely sensed land surface temperature (LST) with spatial continuity and high spatiotemporal resolution (hereafter referred to as high resolution) is a crucial parameter for studying the thermal environment and has important applications in many fields. However, adverse atmospheric conditions, sensor malfunctioning, and scanning gaps between orbits frequently introduce spatial discontinuities into satellite-retrieved LST products. For a single sensor, a tradeoff occurs between temporal and spatial resolutions; therefore, it is almost impossible to obtain images in high resolution.

Journal ArticleDOI
Yi Xiao, Xin Su, Qiangqiang Yuan, Denghong Liu, Huanfeng Shen, Liangpei Zhang
TL;DR: A novel fusion strategy of temporal grouping projection and an accurate alignment module are proposed for satellite VSR, which can reduce the complexity of projection and make the spatial features of reference frames play a continuous guiding role in spatial-temporal information fusion.
Abstract: As a new earth observation tool, satellite video has been widely used in the remote-sensing field for dynamic analysis. The video super-resolution (VSR) technique has thus attracted increasing attention due to its ability to improve the spatial resolution of satellite video. However, the difficulty of remote-sensing image alignment and the low efficiency of spatial-temporal information fusion lead to poor generalization of conventional VSR methods when applied to satellite videos. In this article, a novel fusion strategy of temporal grouping projection and an accurate alignment module are proposed for satellite VSR. First, we propose a deformable convolution alignment module with a multiscale residual block to alleviate the alignment difficulties caused by scarce motion and various scales of moving objects in remote-sensing images. Second, a temporal grouping projection fusion strategy is proposed, which can reduce the complexity of projection and make the spatial features of reference frames play a continuous guiding role in spatial-temporal information fusion. Finally, a temporal attention module is designed to adaptively learn the different contributions of temporal information extracted from each group. Extensive experiments on Jilin-1 satellite video demonstrate that our method is superior to current state-of-the-art VSR methods.

Journal ArticleDOI
TL;DR: Zhang et al. as mentioned in this paper proposed an unsupervised deep image stitching framework consisting of two stages, unsupervised coarse image alignment and unsupervised image reconstruction, in which an ablation-based loss constrains the homography network and a transformer layer warps the input images in the stitching-domain space.
Abstract: Traditional feature-based image stitching technologies rely heavily on feature detection quality, often failing to stitch images with few features or low resolution. Learning-based image stitching solutions are rarely studied due to the lack of labeled data, making the supervised methods unreliable. To address the above limitations, we propose an unsupervised deep image stitching framework consisting of two stages: unsupervised coarse image alignment and unsupervised image reconstruction. In the first stage, we design an ablation-based loss to constrain an unsupervised homography network, which is more suitable for large-baseline scenes. Moreover, a transformer layer is introduced to warp the input images in the stitching-domain space. In the second stage, motivated by the insight that pixel-level misalignments can be eliminated to a certain extent at the feature level, we design an unsupervised image reconstruction network to eliminate the artifacts from features to pixels. Specifically, the reconstruction network can be implemented by a low-resolution deformation branch and a high-resolution refined branch, learning the deformation rules of image stitching and enhancing the resolution simultaneously. To establish an evaluation benchmark and train the learning framework, a comprehensive real-world image dataset for unsupervised deep image stitching is presented and released. Extensive experiments well demonstrate the superiority of our method over other state-of-the-art solutions. Even compared with the supervised solutions, our image stitching quality is still preferred by users.

Journal ArticleDOI
TL;DR: This study proposes a variational pansharpening method that exploits cartoon-texture similarities; it incorporates a data fidelity term for preserving the spectral information, on the basis that the down-sampled fused MS image is consistent with the MS image, formulates pansharpening as an optimization problem, and solves it efficiently using the alternating direction method of multipliers.
Abstract: Pansharpening aims to fuse a multispectral (MS) image with low spatial resolution and a panchromatic (PAN) image with high spatial resolution to produce an image with both high spectral and high spatial resolution. In this study, we propose a variational pansharpening method by exploiting cartoon-texture similarities. After decomposition of the PAN image, the cartoon component always contains the global structure information, while the texture component includes the locally patterned information. This enables the fused high-spatial-resolution MS image to preserve the global and local spatial details (e.g., high-order information) well after leveraging the similarities of the cartoon and texture components from the PAN and MS images. To explore such cartoon-texture similarities, we describe cartoon similarity as gradient sparsity, formulated as a reweighted total variation term. Meanwhile, we use a group low-rank constraint for texture similarity, which is presented as repetitive texture patterns. By incorporating a data fidelity term for preserving the spectral information on the basis that the down-sampled fused MS image is consistent with the MS image, we further formulate pansharpening as an optimization problem and solve it efficiently using the alternating direction method of multipliers. Extensive experiments have been conducted on a series of satellite data sets, and we also carry out a simulated vegetation coverage change experiment to verify the efficiency of the proposed method in remote sensing. The qualitative and quantitative results demonstrate that our method outperforms the state-of-the-art pansharpening methods in terms of both visual effect and objective metrics.

Journal ArticleDOI
TL;DR: In this paper, a 2-stage hierarchical MIMO-SAR processing workflow is proposed to reduce the computation load while preserving image resolution, and a radar odometry algorithm that estimates the trajectory of ego-radar is integrated to enable coherent processing over the synthetic aperture.
Abstract: Millimeter-wave radars are being increasingly integrated into commercial vehicles to support advanced driver-assistance system features. A key shortcoming for present-day vehicular radar imaging is poor azimuth resolution (for side-looking operation) due to the form factor limits on antenna size and placement. In this paper, we propose a solution via a new multiple-input and multiple-output synthetic aperture radar (MIMO-SAR) imaging technique, that applies coherent SAR principles to vehicular MIMO radar to improve the side-view (angular) resolution. The proposed 2-stage hierarchical MIMO-SAR processing workflow drastically reduces the computation load while preserving image resolution. To enable coherent processing over the synthetic aperture, we integrate a radar odometry algorithm that estimates the trajectory of ego-radar. The MIMO-SAR algorithm is validated by both simulations and real experiment data collected by a vehicle-mounted radar platform.

Journal ArticleDOI
TL;DR: Wang et al. as discussed by the authors designed a bidirectional 3D quasi-recurrent neural network for hyperspectral image (HSI) spatial super-resolution with an arbitrary number of bands.
Abstract: Due to physical hardware limitations, hyperspectral imaging cannot yet acquire images with high resolution in both the spatial and spectral dimensions. It can only produce low spatial resolution images in most cases, and thus hyperspectral image (HSI) spatial super-resolution is important. Recently, deep learning-based methods for HSI spatial super-resolution have been actively exploited. However, existing methods do not focus on structural spatial-spectral correlation and global correlation along spectra, and thus cannot fully exploit useful information for super-resolution. Also, some of the methods are straightforward extensions of RGB super-resolution methods, which have a fixed number of spectral channels and cannot be generally applied to hyperspectral images whose number of channels varies. Furthermore, unlike RGB images, existing HSI datasets are small and limit the performance of learning-based methods. In this article, we design a bidirectional 3D quasi-recurrent neural network for HSI super-resolution with an arbitrary number of bands. Specifically, we introduce a core unit that contains a 3D convolutional module and a bidirectional quasi-recurrent pooling module to effectively extract structural spatial-spectral correlation and global correlation along spectra, respectively. By combining domain knowledge of HSI with a novel pretraining strategy, our method can be well generalized to remote sensing HSI datasets with a limited number of training samples. Extensive evaluations and comparisons on HSI super-resolution demonstrate improvements over state-of-the-art methods, in terms of both restoration accuracy and visual quality.

Journal ArticleDOI
TL;DR: This work proposes a novel method to directly train convolutional neural networks using any input image size end-to-end, and shows a proof of concept using images of up to 66-megapixels (8192×8192), saving approximately 50GB of memory per image.
Abstract: Due to memory constraints on current hardware, most convolutional neural networks (CNNs) are trained on sub-megapixel images. For example, most popular datasets in computer vision contain images far smaller than a megapixel in size (0.09MP for ImageNet and 0.001MP for CIFAR-10). In some domains such as medical imaging, multi-megapixel images are needed to identify the presence of disease accurately. We propose a novel method to directly train CNNs using any input image size end-to-end. This method exploits the locality of most operations in modern CNNs by performing the forward and backward pass on smaller tiles of the image. In this work, we show a proof of concept using images of up to 66 megapixels (8192x8192), saving approximately 50GB of memory per image. Using two public challenge datasets, we demonstrate that CNNs can learn to extract relevant information from these large images and benefit from increasing resolution. We improved the area under the receiver-operating characteristic curve from 0.580 (4MP) to 0.706 (66MP) for metastasis detection in breast cancer (CAMELYON17). We also obtained a Spearman correlation metric approaching state-of-the-art performance on the TUPAC16 dataset, from 0.485 (1MP) to 0.570 (16MP). The code to reproduce a subset of the experiments is available at https://github.com/DIAGNijmegen/StreamingCNN.
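The tile-based processing idea can be conveyed with a toy sketch that runs the convolutional part of a network tile by tile and reassembles the feature map; it ignores the tile overlap (receptive-field halo) and the streamed backward pass that the actual method requires, so it only gives the memory intuition.

```python
# Toy sketch of tile-wise processing of a very large image through the
# convolutional part of a network and reassembly of the feature map. It ignores
# the receptive-field halo between tiles and the streamed backward pass that the
# real method needs, so it only conveys the memory intuition.
import torch

def tiled_features(conv_net, image, tile=2048):
    # image: (1, C, H, W) with H, W divisible by `tile`; conv_net keeps spatial size
    _, _, h, w = image.shape
    rows = []
    with torch.no_grad():                      # inference-only sketch
        for y in range(0, h, tile):
            cols = [conv_net(image[:, :, y:y + tile, x:x + tile])
                    for x in range(0, w, tile)]
            rows.append(torch.cat(cols, dim=3))
    return torch.cat(rows, dim=2)              # full-resolution feature map
```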

Journal ArticleDOI
TL;DR: Experimental results show that the proposed band divide-and-conquer framework (BDCF) delivers both high spectral fidelity and sharp spatial details, and achieves competitive fusion performance compared with other state-of-the-art methods.
Abstract: The nonoverlapped spectrum range between low spatial resolution (LR) hyperspectral (HS) and high spatial resolution (HR) multispectral (MS) images has been a fundamental but challenging problem for MS/HS fusion. The spectrum of HS data is generally 400-2500 nm, while that of MS data is generally 400-900 nm; the question is how to obtain a high-fidelity HR HS fused image over the whole 400-2500 nm spectrum. In this article, we propose a band divide-and-conquer framework (BDCF) to solve the problem by comprehensively considering spectral fidelity, spatial enhancement, and computational efficiency. First, the spectral bands of HS were divided into overlapped and nonoverlapped bands according to the spectral response between HS and MS. Then, a novel improved component substitution (CS)-based method combined with a neural network was proposed to fuse the overlapped bands of the LR HS. Next, a mapping-based method with a neural network was presented to model the complicated nonlinear relationship between the overlapped and nonoverlapped bands of the original LR HS data. The trained network was applied to the fused overlapped HR HS bands to estimate the nonoverlapped HR HS bands. Experimental results on two simulated data sets and two realistic data sets of Gaofen (GF)-5 LR HS, GF-1 MS, and Sentinel-2A MS show that the proposed BDCF delivers both high spectral fidelity and sharp spatial details, and achieves competitive fusion performance compared with other state-of-the-art methods. Moreover, BDCF has relatively higher computational efficiency than optimal-solution-based methods and deep learning-based fusion methods.