scispace - formally typeset
Topic

Upsampling

About: Upsampling is the process of increasing the sampling rate or spatial/temporal resolution of a signal, such as an image or video. Over the lifetime, 2426 publications have been published within this topic, receiving 57613 citations.


Papers
Journal ArticleDOI
TL;DR: This paper proposes a three-stream self-attention network (TSNet) for indoor semantic segmentation comprising two asymmetric input streams (asymmetric encoder structure) and a cross-modal distillation stream with a self-attention module.
Abstract: This article proposes a three-stream self-attention network (TSNet) for indoor semantic segmentation comprising two asymmetric input streams (asymmetric encoder structure) and a cross-modal distillation stream with a self-attention module. The two asymmetric input streams are ResNet34 for the red-green-blue (RGB) stream and VGGNet16 for the depth stream. Accompanying the RGB and depth streams, a cross-modal distillation stream with a self-attention module extracts new RGB-plus-depth features at each level in the bottom-up path. In addition, while using bilinear upsampling to recover the spatial resolution of the feature map, we incorporated the feature information of both the RGB flow and the depth flow through the self-attention module. We evaluated TSNet on the NYU Depth V2 dataset and achieved results comparable to those of current state-of-the-art methods.
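The bilinear upsampling used to recover feature-map resolution in decoders like the one above can be written in a few lines. The following is a minimal NumPy sketch (the function name and the align-corners sampling convention are illustrative choices, not taken from the paper):

```python
import numpy as np

def bilinear_upsample(x, scale):
    """Bilinearly upsample a 2-D feature map by an integer factor
    (align_corners=True convention: corner samples map to corners)."""
    h, w = x.shape
    new_h, new_w = h * scale, w * scale
    # Fractional sample coordinates in the source grid.
    rows = np.linspace(0, h - 1, new_h)
    cols = np.linspace(0, w - 1, new_w)
    r0 = np.floor(rows).astype(int)
    c0 = np.floor(cols).astype(int)
    r1 = np.minimum(r0 + 1, h - 1)
    c1 = np.minimum(c0 + 1, w - 1)
    fr = (rows - r0)[:, None]  # vertical interpolation weights
    fc = (cols - c0)[None, :]  # horizontal interpolation weights
    top = x[np.ix_(r0, c0)] * (1 - fc) + x[np.ix_(r0, c1)] * fc
    bot = x[np.ix_(r1, c0)] * (1 - fc) + x[np.ix_(r1, c1)] * fc
    return top * (1 - fr) + bot * fr
```

In practice a framework primitive (e.g. a deep-learning library's built-in interpolation) would be used; the sketch only shows the arithmetic being performed.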

69 citations

Journal ArticleDOI
TL;DR: A video compression framework based on spatio-temporal resolution adaptation (ViSTRA) is proposed, which dynamically resamples the input video spatially and temporally during encoding, based on a quantisation-resolution decision, and reconstructs the full resolution video at the decoder.
Abstract: A video compression framework based on spatio-temporal resolution adaptation (ViSTRA) is proposed, which dynamically resamples the input video spatially and temporally during encoding, based on a quantisation-resolution decision, and reconstructs the full resolution video at the decoder. Temporal upsampling is performed using frame repetition, whereas a convolutional neural network super-resolution model is employed for spatial resolution upsampling. ViSTRA has been integrated into the high efficiency video coding reference software (HM 16.14). Experimental results verified via an international challenge show significant improvements, with BD-rate gains of 15% based on PSNR and an average MOS difference of 0.5 based on subjective visual quality tests.
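Temporal upsampling by frame repetition, as used in ViSTRA, is a zero-order hold along the time axis: each decoded frame is simply held for the skipped time slots. A minimal sketch (function name illustrative; the paper's spatial super-resolution step is a separate CNN and is not shown here):

```python
import numpy as np

def repeat_frames(frames, factor):
    """Temporal upsampling by frame repetition: each input frame is
    held for `factor` consecutive output frames (zero-order hold)."""
    # frames: array of shape (T, H, W) or (T, H, W, C)
    return np.repeat(frames, factor, axis=0)
```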

69 citations

Journal ArticleDOI
TL;DR: This paper develops an iterative image-reconstruction algorithm for low-intensity computed tomography (CT) projection data based on constrained total-variation (TV) minimization.
Abstract: Purpose: We develop an iterative image-reconstruction algorithm for application to low-intensity computed tomography (CT) projection data, based on constrained total-variation (TV) minimization. The algorithm design focuses on recovering structure on length scales comparable to a detector-bin width. Method: Recovering resolution on the scale of a detector bin requires that the pixel size be much smaller than the bin width. The resulting image array contains many more pixels than data samples, and this undersampling is overcome with a combination of Fourier upsampling of each projection and constrained TV minimization, as suggested by compressive sensing. The presented pseudo-code for solving constrained TV minimization is designed to yield an accurate solution to this optimization problem within 100 iterations. Results: The proposed image-reconstruction algorithm is applied to a low-intensity scan of a rabbit with a thin wire, to test resolution, and is compared with filtered back-projection (FBP). Conclusion: The algorithm may have some advantage over FBP in that the resulting noise level is lower at equivalent contrast levels of the wire.
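Fourier upsampling of a projection, mentioned in the method above, amounts to zero-padding the centered DFT spectrum and inverse-transforming, which is band-limited (sinc) interpolation. A simplified 1-D NumPy sketch (function name illustrative; Nyquist-bin handling for even lengths is glossed over):

```python
import numpy as np

def fourier_upsample(signal, factor):
    """Upsample a 1-D projection by zero-padding its centered DFT
    spectrum: equivalent to band-limited (sinc) interpolation."""
    n = len(signal)
    m = n * factor
    spec = np.fft.fftshift(np.fft.fft(signal))
    padded = np.zeros(m, dtype=complex)
    start = (m - n) // 2
    padded[start:start + n] = spec  # original band, centered
    # Multiply by `factor` so sample amplitudes are preserved
    # (ifft divides by the new, longer length m).
    return np.real(np.fft.ifft(np.fft.ifftshift(padded))) * factor
```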

69 citations

Journal ArticleDOI
TL;DR: This paper proposes a CNN architecture and its efficient implementation, called the deformable kernel network (DKN), that outputs sets of neighbors and the corresponding weights adaptively for each pixel, and shows that the weighted averaging process with sparsely sampled 3 × 3 kernels outperforms the state of the art by a significant margin in all cases.
Abstract: Joint image filters are used to transfer structural details from a guidance image used as a prior to a target image, in tasks such as enhancing spatial resolution and suppressing noise. Previous methods based on convolutional neural networks (CNNs) combine nonlinear activations of spatially-invariant kernels to estimate structural details and regress the filtering result. In this paper, we instead learn explicitly sparse and spatially-variant kernels. We propose a CNN architecture and its efficient implementation, called the deformable kernel network (DKN), that outputs sets of neighbors and the corresponding weights adaptively for each pixel. The filtering result is then computed as a weighted average. We also propose a fast version of DKN that runs about seventeen times faster for an image of size 640 × 480. We demonstrate the effectiveness and flexibility of our models on the tasks of depth map upsampling, saliency map upsampling, cross-modality image restoration, texture removal, and semantic segmentation. In particular, we show that the weighted averaging process with sparsely sampled 3 × 3 kernels outperforms the state of the art by a significant margin in all cases.
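The final weighted-average step described above is straightforward once the sample locations and weights are given. The sketch below uses fixed offsets shared by all pixels purely for illustration; in DKN itself both the offsets and the weights are predicted per pixel by the network (function name and replicate-padding choice are assumptions, not from the paper):

```python
import numpy as np

def weighted_average_filter(image, offsets, weights):
    """Per-pixel weighted average over a sparse set of sampled
    neighbours. `offsets` is a list of (dy, dx) sample locations and
    `weights` has shape (k, H, W): the per-pixel weight of each
    sample, assumed to sum to 1 over k."""
    h, w = image.shape
    out = np.zeros_like(image, dtype=float)
    for (dy, dx), wk in zip(offsets, weights):
        # Clamp shifted coordinates to the border (replicate padding).
        ys = np.clip(np.arange(h) + dy, 0, h - 1)
        xs = np.clip(np.arange(w) + dx, 0, w - 1)
        out += wk * image[np.ix_(ys, xs)]
    return out
```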

68 citations

Journal ArticleDOI
TL;DR: A more efficient fully convolutional network is proposed by combining the advantages of both atrous spatial pyramid pooling (ASPP) and encoder-decoder structures: it utilizes a deep residual network followed by ASPP as the encoder, and combines two scales of high-level features with corresponding low-level features as the decoder at the upsampling stage.
Abstract: Dense semantic labeling is significant in high-resolution remote sensing imagery research, and it has been widely used in land-use analysis and environmental protection. With the recent success of fully convolutional networks (FCNs), various types of network architectures have largely improved performance. Among them, atrous spatial pyramid pooling (ASPP) and encoder-decoder are two successful ones. The former structure is able to extract multi-scale contextual information over multiple effective fields-of-view, while the latter structure can recover spatial information to obtain sharper object boundaries. In this study, we propose a more efficient fully convolutional network by combining the advantages of both structures. Our model utilizes a deep residual network (ResNet) followed by ASPP as the encoder and combines two scales of high-level features with corresponding low-level features as the decoder at the upsampling stage. We further develop a multi-scale loss function to enhance the learning procedure. In the postprocessing, a novel superpixel-based dense conditional random field is employed to refine the predictions. We evaluate the proposed method on the Potsdam and Vaihingen datasets, and the experimental results demonstrate that our method performs better than other machine learning or deep learning methods. Compared with the state-of-the-art DeepLab_v3+, our model gains 0.4% and 0.6% improvements in overall accuracy on these two datasets, respectively.
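The atrous ("dilated") convolutions at the heart of ASPP enlarge the receptive field by spacing the kernel taps apart without adding parameters. A minimal single-channel NumPy sketch of one dilated convolution (function name, zero padding, and the loop-based implementation are illustrative simplifications, not the paper's implementation):

```python
import numpy as np

def dilated_conv2d(x, kernel, rate):
    """2-D atrous convolution: the kernel taps are spaced `rate`
    pixels apart, enlarging the effective field-of-view from 3x3
    to (2*rate+1)^2 for a 3x3 kernel, with no extra weights."""
    h, w = x.shape
    kh, kw = kernel.shape
    pad = rate * (kh // 2)
    xp = np.pad(x, pad, mode="constant")  # zero padding keeps shape
    out = np.zeros((h, w))
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * xp[i * rate:i * rate + h,
                                     j * rate:j * rate + w]
    return out
```

ASPP applies several such convolutions with different rates in parallel and concatenates the results, which is how the multi-scale context described above is gathered.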

68 citations


Network Information
Related Topics (5)
Convolutional neural network
74.7K papers, 2M citations
90% related
Image segmentation
79.6K papers, 1.8M citations
90% related
Feature extraction
111.8K papers, 2.1M citations
89% related
Deep learning
79.8K papers, 2.1M citations
88% related
Feature (computer vision)
128.2K papers, 1.7M citations
87% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    469
2022    859
2021    330
2020    322
2019    298
2018    236