
Showing papers on "Upsampling published in 2022"


Journal ArticleDOI
19 Jan 2022-Symmetry
TL;DR: A joint model of a fast guided filter and a matched filter is proposed for enhancing abnormal retinal images with low vessel contrast; matched filtering is an established technique for vessel extraction in diabetic retinopathy.
Abstract: Fundus images have been established as an important factor in analyzing and recognizing many cardiovascular and ophthalmological diseases. Consequently, precise segmentation of blood vessels using computer vision is vital in the recognition of ailments. Although clinicians have adopted computer-aided diagnostics (CAD) in day-to-day diagnosis, it is still quite difficult to conduct fully automated analysis based exclusively on information contained in fundus images. In fundus image applications, one of the methods for conducting an automatic analysis is to ascertain symmetry/asymmetry details from corresponding areas of the retina and investigate their association with positive clinical findings. In the field of diabetic retinopathy, matched filters have been shown to be an established technique for vessel extraction. However, matched filters are less effective on noisy images. In this work, a joint model of a fast guided filter and a matched filter is suggested for enhancing abnormal retinal images containing low vessel contrasts. Extracting all information from an image correctly is one of the important factors in the process of image enhancement. A guided filter has excellent edge-preserving properties but still tends to suffer from halo artifacts near edges. Fast guided filtering is a technique that subsamples the filtering input image and the guidance image and calculates the local linear coefficients for upsampling. In short, the proposed technique applies a fast guided filter and a matched filter for attaining improved performance measures for vessel extraction. The recommended technique was assessed on the DRIVE and CHASE_DB1 datasets and achieved accuracies of 0.9613 and 0.960, respectively, both of which are higher than the accuracy of the original matched filter and other suggested vessel segmentation algorithms.
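The fast guided filter step described in the abstract (subsample both images, fit local linear coefficients, upsample the coefficients) can be sketched in a few lines. The following is a minimal NumPy/OpenCV illustration of that general idea, not the authors' implementation; the window radius `r`, regularizer `eps`, and subsampling ratio `s` are assumed placeholders.

```python
import cv2
import numpy as np

def fast_guided_filter(guide, src, r=8, eps=1e-3, s=4):
    """Minimal fast guided filter sketch: fit the local linear model at low
    resolution, then upsample the coefficients (a, b) to full resolution."""
    # Subsample the guidance image and the filtering input image.
    g = cv2.resize(guide, None, fx=1.0 / s, fy=1.0 / s, interpolation=cv2.INTER_NEAREST)
    p = cv2.resize(src, None, fx=1.0 / s, fy=1.0 / s, interpolation=cv2.INTER_NEAREST)
    r_low = max(r // s, 1)
    box = lambda x: cv2.boxFilter(x, -1, (2 * r_low + 1, 2 * r_low + 1))

    mean_g, mean_p = box(g), box(p)
    cov_gp = box(g * p) - mean_g * mean_p
    var_g = box(g * g) - mean_g * mean_g

    # Local linear model q = a * guide + b, fitted at low resolution.
    a = cov_gp / (var_g + eps)
    b = mean_p - a * mean_g
    mean_a, mean_b = box(a), box(b)

    # Upsample the coefficients and apply them to the full-resolution guide.
    h, w = guide.shape[:2]
    mean_a = cv2.resize(mean_a, (w, h), interpolation=cv2.INTER_LINEAR)
    mean_b = cv2.resize(mean_b, (w, h), interpolation=cv2.INTER_LINEAR)
    return mean_a * guide + mean_b

# Example: smooth a grayscale float32 image, using the image itself as the guide.
img = np.random.rand(512, 512).astype(np.float32)  # stand-in for a fundus image
smoothed = fast_guided_filter(img, img)
```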

42 citations


Journal ArticleDOI
TL;DR: SwinBTS, a new 3D medical image segmentation approach that combines a transformer, a convolutional neural network, and an encoder–decoder structure, is proposed, casting 3D brain tumor semantic segmentation as a sequence-to-sequence prediction task.
Abstract: Brain tumor semantic segmentation is a critical medical image processing work, which aids clinicians in diagnosing patients and determining the extent of lesions. Convolutional neural networks (CNNs) have demonstrated exceptional performance in computer vision tasks in recent years. For 3D medical image tasks, deep convolutional neural networks based on an encoder–decoder structure and skip-connection have been frequently used. However, CNNs have the drawback of being unable to learn global and remote semantic information well. On the other hand, the transformer has recently found success in natural language processing and computer vision as a result of its usage of a self-attention mechanism for global information modeling. For demanding prediction tasks, such as 3D medical picture segmentation, local and global characteristics are critical. We propose SwinBTS, a new 3D medical picture segmentation approach, which combines a transformer, convolutional neural network, and encoder–decoder structure to define the 3D brain tumor semantic segmentation job as a sequence-to-sequence prediction challenge in this research. To extract contextual data, the 3D Swin Transformer is utilized as the network’s encoder and decoder, and convolutional operations are employed for upsampling and downsampling. Finally, we achieve segmentation results using an improved Transformer module that we built for increasing detail feature extraction. Extensive experimental results on the BraTS 2019, BraTS 2020, and BraTS 2021 datasets reveal that SwinBTS outperforms state-of-the-art 3D algorithms for brain tumor segmentation on 3D MRI scanned images.
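The abstract states that convolutional operations handle the upsampling and downsampling around the Swin Transformer blocks. Below is a minimal PyTorch sketch of that pattern for 3D volumes, under assumptions of my own (channel widths, an identity placeholder where the transformer blocks would sit); it is not the SwinBTS code.

```python
import torch
import torch.nn as nn

# Strided 3D convolution for downsampling and transposed 3D convolution for upsampling,
# wrapped around a placeholder "encoder" stage. Widths are illustrative assumptions.
class DownUp3D(nn.Module):
    def __init__(self, in_ch=4, mid_ch=32):
        super().__init__()
        self.down = nn.Conv3d(in_ch, mid_ch, kernel_size=2, stride=2)        # halve D, H, W
        self.encoder = nn.Identity()                                          # Swin blocks would go here
        self.up = nn.ConvTranspose3d(mid_ch, in_ch, kernel_size=2, stride=2)  # restore D, H, W

    def forward(self, x):
        return self.up(self.encoder(self.down(x)))

x = torch.randn(1, 4, 64, 64, 64)   # e.g. four MRI modalities of a BraTS volume
print(DownUp3D()(x).shape)          # torch.Size([1, 4, 64, 64, 64])
```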

36 citations


Journal ArticleDOI
TL;DR: A multimodal fusion module is proposed to explore the similarities and differences between features from the two modalities for adequate fusion, and hierarchical feature interactions are introduced to mitigate the adverse effects of downsampling.
Abstract: Semantic segmentation of remote sensing images has received increasing attention in recent years; however, using a single imaging modality limits the segmentation performance. Thus, digital surface models have been integrated into semantic segmentation to improve performance. Nevertheless, existing methods based on neural networks simply combine data from the two modalities, mostly neglecting the similarities and differences between multimodal features. Consequently, the complementarity between multimodal features cannot be exploited, and excess noise is introduced during feature processing. To solve these problems, we propose a multimodal fusion module to explore the similarities and differences between features from the two information modalities for adequate fusion. In addition, although downsampling operations such as pooling and striding can improve the feature representativeness, they discard spatial details and often lead to segmentation errors. Thus, we introduce hierarchical feature interactions to mitigate the adverse effects of downsampling and introduce a two-way interactive pyramid pooling module to extract multiscale context features for guiding feature fusion. Extensive experiments performed on two benchmark datasets show that the proposed network integrating our novel modules substantially outperforms state-of-the-art semantic segmentation methods. The code and results can be found at https://github.com/NIT-JJH/CIMFNet.

35 citations


Journal ArticleDOI
TL;DR: Wang et al. as discussed by the authors proposed a multichannel feature fusion lozenge network (MLNet), which is a three-sided network composed of three branches: one branch uses different levels of feature indexes to sample to maintain the integrity of high-frequency information; one branch focuses on contextual information and strengthens the compatibility of information within and between classes; and the last branch uses feature integration to filter redundant information based on multiresolution segmentation to extract key features.
Abstract: The use of remote sensing images for land cover analysis has broad prospects. At present, the resolution of aerial remote sensing images keeps increasing, and their temporal and spatial spans keep growing, so segmenting target objects encounters great difficulties. Convolutional neural networks are widely used in many image semantic segmentation tasks, but existing models often use a simple accumulation of various convolutional layers or direct stacking with interfeature reuse of up- and downsampling, which makes the network very heavy. To improve the accuracy of land cover segmentation, we propose a multichannel feature fusion lozenge network. The multichannel feature fusion lozenge network (MLNet) is a three-sided network composed of three branches: one branch uses different levels of feature indexes to sample to maintain the integrity of high-frequency information; one branch focuses on contextual information and strengthens the compatibility of information within and between classes; and the last branch uses feature integration to filter redundant information based on multiresolution segmentation to extract key features. Compared with FCN, UNet, PSP, and other serial single-path models, the MLNet, which performs feature fusion after a three-way parallel structure, can significantly improve the accuracy with only a small increase in complexity. Experimental results show that an average accuracy of 85.30% is obtained on the land cover dataset, much higher than the 82.98% of FCN, 81.87% of UNet, 77.52% of SegNet, and 83.09% of ESPNet, which proves the effectiveness of the model.

33 citations


Journal ArticleDOI
Qi Feng
TL;DR: In this paper , a channel enhancement feature pyramid network (CE-FPN) is proposed to solve the channel reduction problem in FPN-based methods, which is inspired by sub-pixel convolution.
Abstract: Feature pyramid network (FPN) has been an efficient framework to extract multi-scale features in object detection. However, current FPN-based methods mostly suffer from the intrinsic flaw of channel reduction, which brings about the loss of semantic information. In addition, the miscellaneous feature maps may cause serious aliasing effects. In this paper, we present a novel channel enhancement feature pyramid network (CE-FPN) to alleviate these problems. Specifically, inspired by sub-pixel convolution, we propose sub-pixel skip fusion (SSF) to perform both channel enhancement and upsampling. Instead of the original 1 × 1 convolution and linear upsampling, it mitigates the information loss due to channel reduction. Then we propose sub-pixel context enhancement (SCE) for extracting stronger feature representations, which is superior to other context methods due to the utilization of rich channel information by sub-pixel convolution. Furthermore, we introduce a channel attention guided module (CAG) to optimize the final integrated features on each level. It alleviates the aliasing effect with only a small computational burden. We evaluate our approaches on the Pascal VOC and MS COCO benchmarks. Extensive experiments show that CE-FPN achieves competitive performance and is more lightweight compared to state-of-the-art FPN-based detectors.
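The sub-pixel convolution idea behind SSF can be illustrated with PyTorch's pixel shuffle: channels are rearranged into space, so upsampling does not require first discarding channels with a 1 × 1 convolution. This is a generic sketch of that contrast, with channel counts assumed for illustration rather than taken from the paper.

```python
import torch
import torch.nn as nn

# Sub-pixel (pixel-shuffle) upsampling: (C*r^2, H, W) -> (C, r*H, r*W),
# versus the baseline it replaces (1x1 channel-reduction conv + bilinear upsampling).
c5 = torch.randn(1, 1024, 16, 16)                 # a high-level feature map (assumed shape)

subpixel_up = nn.PixelShuffle(upscale_factor=2)
p5_up = subpixel_up(c5)                           # -> (1, 256, 32, 32), no channel-reduction conv

baseline = nn.Sequential(
    nn.Conv2d(1024, 256, kernel_size=1),          # channel reduction loses information
    nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False),
)
print(p5_up.shape, baseline(c5).shape)            # both torch.Size([1, 256, 32, 32])
```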

32 citations


Journal ArticleDOI
TL;DR: In this article, an attention UW-Net is proposed to improve accuracy and produce a probabilistic map for automatic annotation from a small dataset, reducing the need for tedious and error-prone manual annotation of chest X-rays.

30 citations


Journal ArticleDOI
TL;DR: Zhang et al. as discussed by the authors proposed a contour-aware attention decoder to extract boundary and shape cues from CT contours and leverage these features in refining the infected areas.

29 citations


Journal ArticleDOI
TL;DR: Comparative experimental results prove that the proposed method considerably improves the accuracy of small object detection on multiple benchmark datasets and achieves a high real-time performance.
Abstract: As one type of object detection, small object detection has been widely used in daily-life-related applications with many real-time requirements, such as autopilot and navigation. Although deep-learning-based object detection methods have achieved great success in recent years, they are not effective in small object detection and most of them cannot achieve real-time processing. Therefore, this paper proposes a single-stage small object detection network (SODNet) that integrates the specialized feature extraction and information fusion techniques. An adaptively spatial parallel convolution module (ASPConv) is proposed to alleviate the lack of spatial information for target objects and adaptively obtain the corresponding spatial information through multi-scale receptive fields, thereby improving the feature extraction ability. Additionally, a split-fusion sub-module (SF) is proposed to effectively reduce the time complexity of ASPConv. A fast multi-scale fusion module (FMF) is proposed to alleviate the insufficient fusion of both semantic and spatial information. FMF uses two fast upsampling operators to first unify the resolution of the multi-scale feature maps extracted by the network and then fuse them, thereby effectively improving the small object detection ability. Comparative experimental results prove that the proposed method considerably improves the accuracy of small object detection on multiple benchmark datasets and achieves a high real-time performance.
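The FMF step of first unifying the resolutions of the multi-scale feature maps by upsampling and then fusing them can be sketched as follows. The interpolation mode, channel widths, and concatenation-based fusion are assumptions for illustration, not necessarily the operators used in SODNet.

```python
import torch
import torch.nn.functional as F

# Upsample coarse feature maps to the finest resolution, then fuse them.
f1 = torch.randn(1, 64, 80, 80)   # stride-8 features (assumed shapes)
f2 = torch.randn(1, 64, 40, 40)   # stride-16 features
f3 = torch.randn(1, 64, 20, 20)   # stride-32 features

target = f1.shape[-2:]
fused = torch.cat([
    f1,
    F.interpolate(f2, size=target, mode='nearest'),
    F.interpolate(f3, size=target, mode='nearest'),
], dim=1)                          # (1, 192, 80, 80), ready for a fusion convolution
print(fused.shape)
```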

27 citations


Journal ArticleDOI
TL;DR: Zhang et al. as mentioned in this paper proposed a novel semantic segmentation framework for remote sensing images called ST-UNet, which embeds the Swin transformer into the classical CNN-based UNet.
Abstract: Global context information is essential for the semantic segmentation of remote sensing (RS) images. However, most existing methods rely on a convolutional neural network (CNN), which struggles to directly capture global context due to the locality of the convolution operation. Inspired by the Swin transformer with powerful global modeling capabilities, we propose a novel semantic segmentation framework for RS images called ST-UNet, which embeds the Swin transformer into the classical CNN-based UNet. ST-UNet constitutes a novel dual encoder structure of the Swin transformer and CNN in parallel. First, we propose a spatial interaction module (SIM), which encodes spatial information in the Swin transformer block by establishing pixel-level correlation to enhance the feature representation ability of occluded objects. Second, we construct a feature compression module (FCM) to reduce the loss of detailed information and condense more small-scale features in patch token downsampling of the Swin transformer, which improves the segmentation accuracy of small-scale ground objects. Finally, as a bridge between dual encoders, a relational aggregation module (RAM) is designed to integrate global dependencies from the Swin transformer into the features from the CNN hierarchically. Our ST-UNet brings significant improvements on both the ISPRS Vaihingen and Potsdam datasets. The code will be available at https://github.com/XinnHe/ST-UNet.

25 citations


Journal ArticleDOI
TL;DR: Wang et al. as mentioned in this paper proposed a radar data-based U-Net model for precipitation nowcasting, which consists of three operations: upsampling, downsampling and skip connection.
Abstract: Convective precipitation nowcasting remains challenging due to the fast change in convective weather. Radar images are the most important data source in the nowcasting research area. This study proposes a radar data-based U-Net model for precipitation nowcasting. The nowcasting problem is first transformed into an image-to-image translation problem in deep learning under the U-Net architecture, which is based on convolutional neural networks (CNNs). The input of the model is five consecutive radar images; the output is the predicted radar reflectivity image. The model consists of three operations: upsampling, downsampling, and skip connection. Three methods, U-Net, TREC, and TrajGRU, are used for comparison in the experiments. The experimental results show that both deep learning methods outperform the TREC method, and the CNN-based U-Net can achieve almost the same performance as TrajGRU, a recurrent neural network (RNN)-based model. Given that U-Net is simple, efficient, and easy to understand and customize, this result shows the great potential of CNN-based models in addressing time-series applications.
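A tiny PyTorch sketch of the setup described above: five consecutive radar images stacked as input channels, one predicted reflectivity image as output, and the three named operations (downsampling, upsampling, skip connection). The network depth and channel widths are placeholder assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class TinyRadarUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(5, 32, 3, padding=1), nn.ReLU())
        self.down = nn.MaxPool2d(2)                                   # downsampling
        self.mid = nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(64, 32, 2, stride=2)             # upsampling
        self.dec = nn.Sequential(nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(32, 1, 1))                 # predicted reflectivity

    def forward(self, x):
        e = self.enc(x)
        d = self.up(self.mid(self.down(e)))
        return self.dec(torch.cat([d, e], dim=1))                     # skip connection

frames = torch.randn(1, 5, 256, 256)   # five consecutive radar images as channels
print(TinyRadarUNet()(frames).shape)   # torch.Size([1, 1, 256, 256])
```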

22 citations


Proceedings ArticleDOI
01 Jun 2022
TL;DR: Li et al. as discussed by the authors proposed Single-stride Sparse Transformer (SST) to maintain the original resolution from the beginning to the end of the network, which addresses the problem of insufficient receptive field in single stride architectures and cooperates well with the sparsity of point clouds.
Abstract: In LiDAR-based 3D object detection for autonomous driving, the ratio of the object size to input scene size is significantly smaller compared to 2D detection cases. Overlooking this difference, many 3D detectors directly follow the common practice of 2D detectors, which downsample the feature maps even after quantizing the point clouds. In this paper, we start by rethinking how such a multi-stride stereotype affects LiDAR-based 3D object detectors. Our experiments point out that the downsampling operations bring few advantages and lead to inevitable information loss. To remedy this issue, we propose Single-stride Sparse Transformer (SST) to maintain the original resolution from the beginning to the end of the network. Armed with transformers, our method addresses the problem of insufficient receptive field in single-stride architectures. It also cooperates well with the sparsity of point clouds and naturally avoids expensive computation. Eventually, our SST achieves state-of-the-art results on the large-scale Waymo Open Dataset. It is worth mentioning that our method can achieve exciting performance (83.8 LEVEL_1 AP on validation split) on small object (pedestrian) detection due to the characteristic of single stride. Our codes will be public soon.

Journal ArticleDOI
TL;DR: Poly-YOLO as discussed by the authors is a new version of YOLO with better performance and extended with instance segmentation, which is trained to detect size-independent polygons defined on a polar grid.
Abstract: We present a new version of YOLO with better performance and extended with instance segmentation called Poly-YOLO. Poly-YOLO builds on the original ideas of YOLOv3 and removes two of its weaknesses: a large number of rewritten labels and an inefficient distribution of anchors. Poly-YOLO reduces the issues by aggregating features from a light SE-Darknet-53 backbone with a hypercolumn technique, using stairstep upsampling, and produces a single-scale output with high resolution. In comparison with YOLOv3, Poly-YOLO has only 60% of its trainable parameters but improves the mean average precision by a relative 40%. We also present Poly-YOLO lite with fewer parameters and a lower output resolution. It has the same precision as YOLOv3, but it is three times smaller and twice as fast, thus suitable for embedded devices. Finally, Poly-YOLO performs instance segmentation by bounding polygons. The network is trained to detect size-independent polygons defined on a polar grid. Vertices of each polygon are predicted with their confidence, and therefore, Poly-YOLO produces polygons with a varying number of vertices. Source code is available at https://gitlab.com/irafm-ai/poly-yolo.
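One reading of "stairstep upsampling" in a hypercolumn is that the coarsest feature map is repeatedly upsampled by a factor of two and added to the next finer level, instead of upsampling every level straight to the output resolution and summing. The sketch below illustrates only that general idea under assumed feature shapes; it is not Poly-YOLO's implementation.

```python
import torch
import torch.nn.functional as F

# Hypercolumn built by stairstep upsampling: upsample-by-2 and add, level by level.
levels = [torch.randn(1, 128, 52, 52),   # finest (assumed shapes)
          torch.randn(1, 128, 26, 26),
          torch.randn(1, 128, 13, 13)]   # coarsest

x = levels[-1]
for finer in reversed(levels[:-1]):
    x = F.interpolate(x, size=finer.shape[-2:], mode='nearest') + finer
print(x.shape)   # single high-resolution output map: torch.Size([1, 128, 52, 52])
```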

Journal ArticleDOI
TL;DR: Wu et al. as mentioned in this paper proposed an Extremely-Downsampled Network (EDN), which employs an extreme downsampling technique to effectively learn a global view of the whole image, leading to accurate salient object localization.
Abstract: Recent progress on salient object detection (SOD) mainly benefits from multi-scale learning, where the high-level and low-level features collaborate in locating salient objects and discovering fine details, respectively. However, most efforts are devoted to low-level feature learning by fusing multi-scale features or enhancing boundary representations. High-level features, although long proven effective for many other tasks, have barely been studied for SOD. In this paper, we tap into this gap and show that enhancing high-level features is essential for SOD as well. To this end, we introduce an Extremely-Downsampled Network (EDN), which employs an extreme downsampling technique to effectively learn a global view of the whole image, leading to accurate salient object localization. To accomplish better multi-level feature fusion, we construct the Scale-Correlated Pyramid Convolution (SCPC) to build an elegant decoder for recovering object details from the above extreme downsampling. Extensive experiments demonstrate that EDN achieves state-of-the-art performance with real-time speed. Our efficient EDN-Lite also achieves competitive performance with a speed of 316 fps. Hence, this work is expected to spark some new thinking in SOD. Code is available at https://github.com/yuhuan-wu/EDN.

Journal ArticleDOI
01 Apr 2022-Sensors
TL;DR: Compared with other models, the proposed model improved both detection accuracy and inference speed, indicating that the MobileNet_CenterNet model had better real-time performance and robustness.
Abstract: For the issue of low accuracy and poor real-time performance of insulator and defect detection by an unmanned aerial vehicle (UAV) in the process of power inspection, an insulator detection model MobileNet_CenterNet was proposed in this study. First, the lightweight network MobileNet V1 was used to replace the feature extraction network Resnet-50 of the original model, aiming to ensure the detection accuracy of the model while speeding up its detection speed. Second, a spatial and channel attention mechanism convolutional block attention module (CBAM) was introduced in CenterNet, aiming to improve the prediction accuracy of small target insulator position information. Then, three transposed convolution modules were added for upsampling, aiming to better restore the semantic information and position information of the image. Finally, the insulator dataset (ID) constructed by ourselves and the public dataset (CPLID) were used for model training and validation, aiming to improve the generalization ability of the model. The experimental results showed that compared with the CenterNet model, MobileNet_CenterNet improved the detection accuracy by 12.2%, the inference speed by 1.1 f/s for FPS-CPU and 4.9 f/s for FPS-GPU, and the model size was reduced by 37 MB. Compared with other models, our proposed model improved both detection accuracy and inference speed, indicating that the MobileNet_CenterNet model had better real-time performance and robustness.
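The "three transposed convolution modules ... for upsampling" can be sketched directly in PyTorch: each module doubles the spatial resolution of the backbone features before the CenterNet prediction heads. The channel widths and input size below are assumptions for illustration, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

# Three transposed-convolution upsampling modules, each doubling H and W.
deconv_layers = nn.Sequential(
    nn.ConvTranspose2d(1024, 256, kernel_size=4, stride=2, padding=1), nn.BatchNorm2d(256), nn.ReLU(),
    nn.ConvTranspose2d(256, 128, kernel_size=4, stride=2, padding=1), nn.BatchNorm2d(128), nn.ReLU(),
    nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
)

feat = torch.randn(1, 1024, 16, 16)   # MobileNet V1 features for a 512x512 input (assumed)
print(deconv_layers(feat).shape)      # torch.Size([1, 64, 128, 128]) -> heatmap/offset heads
```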

Journal ArticleDOI
TL;DR: IndexNet as discussed by the authors proposes a new learnable module, termed IndexNet, which dynamically generates indices conditioned on the feature map to guide downsampling and upsampling stages, without extra training supervision.
Abstract: We show that existing upsampling operators in convolutional networks can be unified using the notion of the index function. This notion is inspired by an observation in the decoding process of deep image matting where indices-guided unpooling can often recover boundary details considerably better than other upsampling operators such as bilinear interpolation. By viewing the indices as a function of the feature map, we introduce the concept of 'learning to index', and present a novel index-guided encoder-decoder framework where indices are learned adaptively from data and are used to guide downsampling and upsampling stages, without extra training supervision. At the core of this framework is a new learnable module, termed Index Network (IndexNet), which dynamically generates indices conditioned on the feature map. IndexNet can be used as a plug-in, applicable to almost all convolutional networks that have coupled downsampling and upsampling stages, enabling the networks to dynamically capture variations of local patterns. In particular, we instantiate and investigate five families of IndexNet. We highlight their superiority in delivering spatial information over other upsampling operators with experiments on synthetic data, and demonstrate their effectiveness on four dense prediction tasks, including image matting, image denoising, semantic segmentation, and monocular depth estimation. Code and models are available at https://git.io/IndexNet.
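The observation that motivates IndexNet, indices recorded during pooling guiding unpooling better than bilinear interpolation, can be shown with standard PyTorch operators. This sketch covers only that fixed (max-index) case, not the learned IndexNet module itself.

```python
import torch
import torch.nn as nn

# Indices-guided unpooling versus bilinear upsampling of the same downsampled map.
pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)

x = torch.randn(1, 3, 8, 8)
down, indices = pool(x)              # downsampling, keeping the argmax indices
up = unpool(down, indices)           # upsampling guided by those indices
bilinear = nn.functional.interpolate(down, scale_factor=2, mode='bilinear',
                                     align_corners=False)
print(up.shape, bilinear.shape)      # both torch.Size([1, 3, 8, 8])
```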

Journal ArticleDOI
TL;DR: An image enhancement-based detection algorithm is proposed to solve the problem that small objects are difficult to detect due to their small size or dimness; it outperforms existing work on various evaluation indicators.
Abstract: Today, target detection has an indispensable application in various fields. Infrared small-target detection, as a branch of target detection, can improve the perception capability of autonomous systems, and it has good application prospects in infrared alarm, automatic driving and other fields. There are many well-established algorithms that perform well in infrared small-target detection. Nevertheless, the current algorithms cannot achieve the expected detection effect in complex environments, such as background clutter, noise inundation or very small targets. We have designed an image enhancement-based detection algorithm to solve these problems through detail enhancement and target expansion. This method first enhances the mutation information, detail and edge information of the image and then improves the contrast between the target edge and the adjacent pixels to make the target more prominent. The enhancement improves the robustness of detection in scenes with background clutter or noise flooding. Moreover, bicubic interpolation is used on the input image, and the target pixels are expanded with upsampling, which enhances the detection effectiveness for tiny targets. From the results of qualitative and quantitative experiments, the algorithm proposed in this paper outperforms the existing work on various evaluation indicators. The spatial filter enhances small targets at a subtle level, making them more distinctive. The upsampling process amplifies the enhanced small targets, making difficult-to-detect point targets relatively easy to detect. The proposed algorithm effectively solves the problem that small objects are difficult to detect due to their small size or dimness. We compare with existing methods on public datasets and conduct extensive ablation studies. The results show that our method outperforms existing methods.
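The target-expansion step, upsampling the input with bicubic interpolation so that a few-pixel target occupies more pixels, is simple to sketch with OpenCV. The scale factor and the stand-in image are assumptions for the example only.

```python
import cv2
import numpy as np

# Bicubic upsampling of an (already enhanced) infrared frame to expand tiny targets.
enhanced = np.random.rand(256, 256).astype(np.float32)   # stand-in for an enhanced IR frame
scale = 2
expanded = cv2.resize(enhanced, None, fx=scale, fy=scale, interpolation=cv2.INTER_CUBIC)
print(expanded.shape)   # (512, 512): a 3x3-pixel target now spans roughly 6x6 pixels
```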

Journal ArticleDOI
15 Jan 2022-Fuel
TL;DR: An upscaling method based on convolutional neural networks (CNNs) and downsampling techniques showed a satisfying match between the dynamic behaviour of the CNN-upscaled model and the high-resolution properties, with a significant reduction in computational cost and time.

Journal ArticleDOI
TL;DR: Li et al. as mentioned in this paper proposed an improved 3D object detection method based on a two-stage detector called the Improved Point-Voxel Region Convolutional Neural Network (IPV-RCNN), which contains online training for data augmentation, upsampling convolution and k-means clustering for the bounding box to achieve 3D detection tasks from raw point clouds.
Abstract: Recently, 3D object detection based on deep learning has achieved impressive performance in complex indoor and outdoor scenes. Among the methods, the two-stage detection method performs the best; however, this method still needs improved accuracy and efficiency, especially for small size objects or autonomous driving scenes. In this paper, we propose an improved 3D object detection method based on a two-stage detector called the Improved Point-Voxel Region Convolutional Neural Network (IPV-RCNN). Our proposed method contains online training for data augmentation, upsampling convolution and k-means clustering for the bounding box to achieve 3D detection tasks from raw point clouds. The evaluation results on the KITTI 3D dataset show that the IPV-RCNN achieved a 96% mAP, which is 3% more accurate than the state-of-the-art detectors.
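One plausible reading of "k-means clustering for the bounding box" is clustering box dimensions to obtain prototype sizes, as is commonly done for anchors. The sketch below illustrates only that generic idea; the box data, number of clusters, and interpretation are assumptions, not details from the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

# Cluster 3D bounding-box dimensions (length, width, height) into prototype sizes.
boxes = np.abs(np.random.randn(200, 3) * [1.0, 0.5, 0.4] + [4.0, 1.7, 1.5])  # fake l/w/h in metres
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(boxes)
print(kmeans.cluster_centers_)   # prototype box sizes, e.g. car-, cyclist-, pedestrian-like
```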

Journal ArticleDOI
TL;DR: ICIF-Net as discussed by the authors proposes an intra-scale cross-interaction and inter-scale feature fusion network, where the local features and global features, respectively, extracted by CNN and Transformer, are interactively communicated at the same spatial resolution using a linearized Conv Attention module, which motivates the counterpart to glimpse the representation of another branch while preserving its own features.
Abstract: Change detection (CD) of remote sensing (RS) images has enjoyed remarkable success by virtue of convolutional neural networks (CNNs) with promising discriminative capabilities. However, CNNs lack the capability of modeling long-range dependencies in bitemporal image pairs, resulting in inferior identifiability against the same semantic targets yet with varying features. The recently thriving Transformer, on the contrary, is warranted, for practice, with global receptive fields. To jointly harvest the local-global features and circumvent the misalignment issues caused by step-by-step downsampling operations in traditional backbone networks, we propose an intra-scale cross-interaction and inter-scale feature fusion network (ICIF-Net), explicitly tapping the potential of integrating CNN and Transformer. In particular, the local features and global features, respectively, extracted by CNN and Transformer, are interactively communicated at the same spatial resolution using a linearized Conv Attention module, which motivates the counterpart to glimpse the representation of another branch while preserving its own features. In addition, with the introduction of two attention-based inter-scale fusion schemes, including mask-based aggregation and spatial alignment (SA), information integration is enforced at different resolutions. Finally, the integrated features are fed into a conventional change prediction head to generate the output. Extensive experiments conducted on four CD datasets of bitemporal (RS) images demonstrate that our ICIF-Net surpasses the other state-of-the-art (SOTA) approaches.

Journal ArticleDOI
TL;DR: Zhang et al. as discussed by the authors proposed a hybrid multi-scale transformer module for HR remote images change detection, which fully models representation attentions at hybrid scales of each image via a fine-grained self-attention mechanism.
Abstract: Existing optical remote sensing image change detection (CD) methods aim to learn an appropriate discriminative decision by analyzing the feature information of bitemporal images obtained at the same place. However, the complex scenes in high-resolution (HR) remote images cause unsatisfactory results, especially for some irregular and occluded objects. Although recent self-attention-driven change detection models with CNN achieve promising effects, the computational and parameter costs become prohibitive for HR images. In this paper, we utilize a transformer structure replacing self-attention to learn stronger feature representations per image. In addition, concurrent vision transformer models only consider tokenizing single-dimensional image tokens, thus failing to build multi-scale long-range interactions among features. Here, we propose a hybrid multi-scale transformer module for HR remote sensing image change detection, which fully models representation attentions at hybrid scales of each image via a fine-grained self-attention mechanism. The key idea of the hybrid transformer structure is to establish heterogeneous semantic tokens containing multiple receptive fields, thus simultaneously preserving large object and fine-grained features. For building relationships between features without embedding with token sequences from the Siamese tokenizer, we also introduced a hybrid difference transformer decoder (HDTD) layer to further strengthen multi-scale global dependencies of high-level features. Compared to capturing single-stream tokens, our HDTD layer directly focuses on representing differential features without an exponential increase in computational cost. Finally, we propose a cascade feature decoder (CFD) for aggregating different-dimensional upsampling features by establishing difference skip-connections. To evaluate the effectiveness of the proposed method, experiments on two HR remote sensing CD datasets are conducted. Compared to state-of-the-art methods, our Hybrid-TransCD achieved superior performance on both datasets (i.e., LEVIR-CD, SYSU-CD) with improvements of 0.75% and 1.98%, respectively.

Journal ArticleDOI
TL;DR: In this paper , a multiscale low-rank deep back projection fusion network (MLR-DBPFN) is proposed to fuse LR hyperspectral (HS) data and HR multispectral data.
Abstract: Fusing low spatial resolution (LR) hyperspectral (HS) data and high spatial resolution (HR) multispectral (MS) data aims to obtain HR HS data. However, due to bad weather and the aging of sensor equipment, HS images usually contain a lot of noise, e.g., Gaussian noise, strip noise, and mixed noise, which would make the fused image have low quality. To solve this problem, we propose the multiscale low-rank deep back projection fusion network (MLR-DBPFN). First, HS and MS are superimposed, and multiscale spectral features of the stacked image are extracted through multiscale low-rank decomposition and convolution operation, which effectively removes noisy spectral features. Second, the upsampling and downsampling network mechanisms are used to extract the multiscale spatial features from each layer of spectral features. Finally, the multiscale spectral features and multiscale spatial features are combined for network training, and the weight of the noisy spectrum features is reduced through the network feedback mechanism, which suppresses the noisy spectrum and improves the noisy HS fusion performance. Experimental results on datasets of different noise demonstrate that MLR-DBPFN has superior spatial and spectral fidelity, comparative fusion quality, and robust antinoise performance compared with state-of-the-art methods.

Journal ArticleDOI
TL;DR: In this paper, a deep convolutional network within the mature Gaussian–Laplacian pyramid framework for pansharpening (LPPNet) is presented. The overall structure of LPPNet is a cascade of Laplacian pyramid dense networks with a similar structure at each pyramid level.
Abstract: Hyperspectral (HS) pansharpening aims to create a pansharpened image that integrates the spatial details of the panchromatic (PAN) image and the spectral content of the HS image. In this article, we present a deep convolutional network within the mature Gaussian–Laplacian pyramid for pansharpening (LPPNet). The overall structure of LPPNet is a cascade of the Laplacian pyramid dense network with a similar structure at each pyramid level. Following the general idea of multiresolution analysis (MRA), the subband residuals of the desired HS images are extracted from the PAN image and injected into the upsampled HS image to reconstruct the high-resolution HS images level by level. Applying the mature Laplace pyramid decomposition technique to the convolution neural network (CNN) can simplify the pansharpening problem into several pyramid-level learning problems so that the pansharpening problem can be solved with a shallow CNN with fewer parameters. Specifically, the Laplacian pyramid technology is used to decompose the image into different levels that can differentiate large- and small-scale details, and each level is handled by a spatial subnetwork in a divide-and-conquer way to make the network more efficient. Experimental results show that the proposed LPPNet method performs favorably against some state-of-the-art pansharpening methods in terms of objective indexes and subjective visual appearance.
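The Gaussian-Laplacian decomposition that LPPNet is built around can be sketched with OpenCV's pyramid operators: each Laplacian level keeps the detail lost between successive Gaussian levels, and the coarsest Gaussian level is kept as the base. This is a generic illustration of the decomposition; the number of levels is an assumed placeholder.

```python
import cv2
import numpy as np

def laplacian_pyramid(img, levels=3):
    """Decompose an image into Laplacian detail levels plus a coarse base."""
    gaussians = [img]
    for _ in range(levels):
        gaussians.append(cv2.pyrDown(gaussians[-1]))
    laplacians = []
    for i in range(levels):
        up = cv2.pyrUp(gaussians[i + 1], dstsize=gaussians[i].shape[1::-1])
        laplacians.append(gaussians[i] - up)          # band-pass detail at level i
    return laplacians, gaussians[-1]                   # details + coarsest approximation

pan = np.random.rand(256, 256).astype(np.float32)      # stand-in for a PAN image
details, base = laplacian_pyramid(pan)
print([d.shape for d in details], base.shape)          # (256,256), (128,128), (64,64) + (32,32)
```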

Proceedings ArticleDOI
23 May 2022
TL;DR: A novel Multi-level Enhancement Layers Network (MELNet) based on BLS framework is proposed for real-time vision tasks in a complex street scene on the unmanned mobile robot and reveals that MELNet could be run adequately on the embedded device and effectively operate in the real-robot system.
Abstract: This article investigates the real-time semantic segmentation in robot engineering applications based on the Broad Learning System (BLS), and a novel Multi-level Enhancement Layers Network (MELNet) based on BLS framework is proposed for real-time vision tasks in a complex street scene on the unmanned mobile robot. This network mainly solves two problems: (1) mitigating the contradiction between accuracy and speed while maintaining low model complexity, and (2) accurately describing objects based on their shape despite their different sizes. Firstly, the BLS architecture is expanded to the deep network with trainable parameters. This trainable network could adjust its weights in a complex environment, and mitigate the adverse impact of the environment on the complex tasks. Secondly, enhancement layers with the extended enhancement layers could extract both detailed information and semantic information. Moreover, an Upsampling Atrous Spatial Pyramid Pooling (UPASPP) is designed to fuse detail and semantic information to describe object features properly. Finally, in the case of the MNIST dataset and Cityscapes dataset, we get high accuracy with 8.01M parameters and quicker inference speed on a single GTX 1070 Ti card. At the same time, the unmanned mobile robot (BIT-NAZA) is employed to evaluate semantic performance in real-world situations. This reveals that MELNet could be run adequately on the embedded device and effectively operate in the real-robot system.

Journal ArticleDOI
TL;DR: In this article, a recursive long short-term memory (LSTM) network was proposed for predicting nonlinear structural seismic responses for arbitrary lengths and sampling rates; it uses the recursive prediction principle and is therefore applicable to structures and earthquakes with different spectral characteristics and amplitudes.


Journal ArticleDOI
TL;DR: Zhang et al. as discussed by the authors proposed a gray-scale fakeness map that preserves more information about fake regions, together with an attention mechanism to guide the training of the model.
Abstract: Full face synthesis and partial face manipulation by virtue of generative adversarial networks (GANs) and their variants have raised wide public concern. In the multi-media forensics area, detecting and ultimately locating the image forgery has become an imperative task. In this work, we investigate the architecture of existing GAN-based face manipulation methods and observe that the imperfection of the upsampling methods therein can serve as an important asset for GAN-synthesized fake image detection and forgery localization. Based on this basic observation, we have proposed a novel approach, termed FakeLocator, to obtain high localization accuracy, at full resolution, on manipulated facial images. To the best of our knowledge, this is the very first attempt to solve the GAN-based fake localization problem with a gray-scale fakeness map that preserves more information about fake regions. To improve the universality of FakeLocator across multifarious facial attributes, we introduce an attention mechanism to guide the training of the model. To improve the universality of FakeLocator across different DeepFake methods, we propose partial data augmentation and single sample clustering on the training images. Experimental results on the popular FaceForensics++ and DFFD datasets and seven different state-of-the-art GAN-based face generation methods have shown the effectiveness of our method. Compared with the baselines, our method performs better on various metrics. Moreover, the proposed method is robust against various real-world facial image degradations such as JPEG compression, low resolution, noise, and blur.

Journal ArticleDOI
TL;DR: Zhang et al. as discussed by the authors proposed a locally enhanced transformer network (LETNet) to detect pavement cracks from charge-coupled devices (CCDs) captured high-resolution images.
Abstract: Precisely identifying pavement cracks from high-resolution images captured by charge-coupled devices (CCDs) faces many challenges. Even though convolutional neural networks (CNNs) have achieved impressive performance in this task, the stacked convolutional layers fail to extract long-range contextual features and impose high computational costs. Therefore, we propose a locally enhanced Transformer network (LETNet) to completely and efficiently detect pavement cracks. In the LETNet, the Transformer is employed to model long-range dependencies. By designing a convolution stem and a local enhancement module, both low-level and high-level local features can be compensated. To take advantage of these rich features, a skip connection strategy and an efficient upsampling module are built to restore detailed information. In addition, a defect rectification module is further developed to reinforce the network for hard sample recognition. The quantitative comparison demonstrates that the proposed LETNet outperformed four advanced deep learning-based models with respect to both efficiency and effectiveness. Specifically, the average precision, recall, ODS, IoU, and frames per second (FPS) of the LETNet on three testing datasets are approximately 93.04%, 92.85%, 92.94%, 94.07%, and 30.80 FPS, respectively. We also built a comprehensive pavement crack dataset containing 156 high-resolution manually annotated CCD images and made it publicly available on Zenodo.

Journal ArticleDOI
TL;DR: CANINE as discussed by the authors is a neural encoder that operates directly on character sequences, without explicit tokenization or vocabulary, and a pre-training strategy that operates either directly on characters or optionally uses subwords as a soft inductive bias.
Abstract: Pipelined NLP systems have largely been superseded by end-to-end neural modeling, yet nearly all commonly-used models still require an explicit tokenization step. While recent tokenization approaches based on data-derived subword lexicons are less brittle than manually engineered tokenizers, these techniques are not equally suited to all languages, and the use of any fixed vocabulary may limit a model's ability to adapt. In this paper, we present CANINE, a neural encoder that operates directly on character sequences, without explicit tokenization or vocabulary, and a pre-training strategy that operates either directly on characters or optionally uses subwords as a soft inductive bias. To use its finer-grained input effectively and efficiently, CANINE combines downsampling, which reduces the input sequence length, with a deep transformer stack, which encodes context. CANINE outperforms a comparable mBERT model by 2.8 F1 on TyDi QA, a challenging multilingual benchmark, despite having 28% fewer model parameters.
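The downsampling idea in a character-level encoder, shortening the sequence of character embeddings before the deep transformer stack, can be sketched with a strided 1-D convolution. The vocabulary size, embedding width, and rate of 4 below are assumptions for illustration, not CANINE's exact configuration.

```python
import torch
import torch.nn as nn

# Strided 1-D convolution over character embeddings: 4x fewer positions for the
# deep transformer stack that follows.
char_embed = nn.Embedding(1024, 768)                       # hashed character embeddings (assumed)
downsample = nn.Conv1d(768, 768, kernel_size=4, stride=4)  # 4x shorter sequence

chars = torch.randint(0, 1024, (1, 2048))                  # 2048 input characters
h = char_embed(chars).transpose(1, 2)                      # (1, 768, 2048)
h = downsample(h).transpose(1, 2)                          # (1, 512, 768) -> transformer input
print(h.shape)
```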

Journal ArticleDOI
TL;DR: In this article , a recursive long short-term memory (LSTM) network was proposed for predicting nonlinear structural seismic responses for arbitrary lengths and sampling rates, and the results showed that the proposed recursive LSTM model can adequately reproduce the global and local characteristics of the time history responses on four different structural response datasets.

Journal ArticleDOI
TL;DR: In this article , a low-rank matrix approximation algorithm is proposed for both point clouds and meshes using a local isotropic structure for each point and finding its similar, non-local structures that are organized into a matrix.
Abstract: We propose a robust normal estimation method for both point clouds and meshes using a low rank matrix approximation algorithm. First, we compute a local isotropic structure for each point and find its similar, non-local structures that we organize into a matrix. We then show that a low rank matrix approximation algorithm can robustly estimate normals for both point clouds and meshes. Furthermore, we provide a new filtering method for point cloud data to smooth the position data to fit the estimated normals. We show the applications of our method to point cloud filtering, point set upsampling, surface reconstruction, mesh denoising, and geometric texture removal. Our experiments show that our method generally achieves better results than existing methods.
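The core operation, organizing similar local structures into a matrix and replacing it with a low-rank approximation, can be sketched with a truncated SVD. This is a generic illustration of low-rank matrix approximation; the rank, matrix size, and random contents are placeholders, not the paper's setup.

```python
import numpy as np

def low_rank_approx(M, rank):
    """Truncated-SVD low-rank approximation of a matrix of stacked local structures."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank, :]

patches = np.random.rand(64, 30)                  # 64 similar local structures, 30 values each (assumed)
patches_denoised = low_rank_approx(patches, rank=3)
print(np.linalg.matrix_rank(patches_denoised))    # 3: noise beyond the chosen rank is suppressed
```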