scispace - formally typeset
Topic

Upsampling

About: Upsampling is the process of increasing the sampling rate or spatial resolution of a signal or image, typically by interpolation or by learned operators. Over the lifetime of this topic, 2426 publications have been published, receiving 57613 citations.
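The two parameter-free upsampling schemes that recur throughout the papers below are nearest-neighbor repetition and bilinear interpolation. A minimal NumPy sketch of both for single-channel images (align-corners sampling; the function names are illustrative, not from any paper):

```python
import numpy as np

def upsample_nearest(x: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbor upsampling: repeat each pixel `factor` times per axis."""
    return x.repeat(factor, axis=0).repeat(factor, axis=1)

def upsample_bilinear(x: np.ndarray, factor: int) -> np.ndarray:
    """Bilinear upsampling: linear interpolation along each axis in turn."""
    h, w = x.shape
    # Sampling positions in the input grid for each output pixel (align_corners=True).
    rows = np.linspace(0, h - 1, h * factor)
    cols = np.linspace(0, w - 1, w * factor)
    # Interpolate down the columns first, then across the rows.
    tmp = np.stack([np.interp(rows, np.arange(h), x[:, j]) for j in range(w)], axis=1)
    return np.stack([np.interp(cols, np.arange(w), tmp[i]) for i in range(tmp.shape[0])], axis=0)

img = np.array([[0.0, 1.0],
                [2.0, 3.0]])
print(upsample_nearest(img, 2).shape)  # (4, 4)
```

Nearest-neighbor preserves sharp edges but produces blocky results; bilinear is smooth but blurs boundaries, which is the problem several of the papers below address with learned upsampling.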


Papers
Posted Content
TL;DR: The problem of segmenting an object given a natural language expression that describes it is addressed. A novel method is proposed that integrates linguistic and visual information in the channel dimension and exploits the intermediate information generated when downsampling the image, so that detailed segmentations can be obtained.
Abstract: We address the problem of segmenting an object given a natural language expression that describes it. Current techniques tackle this task either by (i) directly or recursively merging linguistic and visual information in the channel dimension and then performing convolutions; or by (ii) mapping the expression to a space in which it can be thought of as a filter, whose response is directly related to the presence of the object at a given spatial coordinate in the image, so that a convolution can be applied to look for the object. We propose a novel method that integrates these two insights in order to fully exploit the recursive nature of language. Additionally, during the upsampling process, we take advantage of the intermediate information generated when downsampling the image, so that detailed segmentations can be obtained. We compare our method against state-of-the-art approaches on four standard datasets, where it surpasses all previous methods on six of the eight splits for this task.

34 citations
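Reusing intermediate downsampling information during upsampling is essentially a skip connection, as in U-Net-style decoders. A toy NumPy sketch of the idea (not the authors' implementation; the fusion here is a plain average, whereas a real model learns the combination):

```python
import numpy as np

def downsample(x: np.ndarray) -> np.ndarray:
    """2x average-pool downsampling; the input is kept as a skip feature."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(x: np.ndarray) -> np.ndarray:
    """2x nearest-neighbor upsampling."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def decode_with_skip(coarse: np.ndarray, skip: np.ndarray) -> np.ndarray:
    """Fuse the upsampled coarse map with the feature saved while downsampling.
    The skip feature restores the high-frequency detail the coarse map lost."""
    return 0.5 * (upsample(coarse) + skip)

x = np.arange(16, dtype=float).reshape(4, 4)
skip = x                    # intermediate feature saved on the way down
coarse = downsample(x)      # low-resolution representation
out = decode_with_skip(coarse, skip)
```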

Posted Content
TL;DR: This work proposes a principled convolutional neural pyramid (CNP) framework for general low-level vision and image processing tasks based on the essential finding that many applications require large receptive fields for structure understanding.
Abstract: We propose a principled convolutional neural pyramid (CNP) framework for general low-level vision and image processing tasks. It is based on the essential finding that many applications require large receptive fields for structure understanding. But corresponding neural networks for regression either stack many layers or apply large kernels to achieve this, which is computationally very costly. Our pyramid structure can greatly enlarge the receptive field without sacrificing computational efficiency. Extra benefits include adaptive network depth and progressive upsampling for quasi-realtime testing on VGA-size input. Our method benefits a broad set of applications, such as depth/RGB image restoration, completion, noise/artifact removal, edge refinement, image filtering, image enhancement, and colorization.

34 citations
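The pyramid trick can be illustrated with a toy: filtering at a coarse level and progressively upsampling back makes a small kernel cover an exponentially larger patch of the original image. A NumPy sketch under simplifying assumptions (a 3x3 box filter stands in for the learned conv blocks, and merging is a plain average; none of this is the paper's actual architecture):

```python
import numpy as np

def avg_pool2(x: np.ndarray) -> np.ndarray:
    """2x average pooling."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def up2(x: np.ndarray) -> np.ndarray:
    """2x nearest-neighbor upsampling."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def smooth3(x: np.ndarray) -> np.ndarray:
    """3x3 box filter (edge-padded), standing in for a small conv block."""
    p = np.pad(x, 1, mode="edge")
    h, w = x.shape
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0

def pyramid_process(x: np.ndarray, levels: int = 3) -> np.ndarray:
    """Build a pyramid, filter each level, then progressively upsample and merge.
    A 3x3 filter applied at level k covers roughly a (3 * 2**k)-pixel patch of the
    original image, so the effective receptive field grows exponentially with
    depth at little extra cost."""
    pyr = [x]
    for _ in range(levels - 1):
        pyr.append(avg_pool2(pyr[-1]))
    out = smooth3(pyr[-1])                    # coarsest level first
    for lvl in reversed(pyr[:-1]):            # progressive upsampling
        out = smooth3(up2(out) + lvl) * 0.5
    return out

out = pyramid_process(np.arange(64.0).reshape(8, 8))
```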

Journal ArticleDOI
TL;DR: Wang et al. proposed a multichannel feature fusion lozenge network (MLNet), a three-sided network composed of three branches: one branch uses different levels of feature indexes to sample and maintain the integrity of high-frequency information; one branch focuses on contextual information and strengthens the compatibility of information within and between classes; and the last branch uses feature integration to filter redundant information based on multiresolution segmentation to extract key features.
Abstract: The use of remote sensing images for land cover analysis has broad prospects. At present, the resolution of aerial remote sensing images is getting higher and higher, and the span of time and space is getting larger and larger, so segmenting target objects encounters great difficulties. Convolutional neural networks are widely used in many image semantic segmentation tasks, but existing models often use a simple accumulation of various convolutional layers or the direct stacking of interfeature reuse of up- and downsampling, which makes the network very heavy. To improve the accuracy of land cover segmentation, we propose a multichannel feature fusion lozenge network (MLNet), a three-sided network composed of three branches: one branch uses different levels of feature indexes to sample and maintain the integrity of high-frequency information; one branch focuses on contextual information and strengthens the compatibility of information within and between classes; and the last branch uses feature integration to filter redundant information based on multiresolution segmentation to extract key features. Compared with FCN, UNet, PSP, and other serial single-path computing models, the MLNet, which performs feature fusion after a three-way parallel structure, can significantly improve accuracy with only a small increase in complexity. Experimental results show that an average accuracy of 85.30% is obtained on the land cover data set, much higher than the 82.98% of FCN, 81.87% of UNet, 77.52% of SegNet, and 83.09% of EspNet, which proves the effectiveness of the model.

33 citations
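The three-branch-then-fuse layout can be caricatured in a few lines: three parallel transforms of the same input (detail-preserving, context-aggregating, key-feature-selecting) are computed independently and fused afterwards, rather than stacked serially. Every function below is an illustrative stand-in, not MLNet's actual blocks:

```python
import numpy as np

def branch_detail(x: np.ndarray) -> np.ndarray:
    """Stand-in for the high-frequency branch (identity: keep detail intact)."""
    return x

def branch_context(x: np.ndarray) -> np.ndarray:
    """Stand-in for the context branch: aggressive smoothing via a global mean."""
    return np.full_like(x, x.mean())

def branch_select(x: np.ndarray) -> np.ndarray:
    """Stand-in for the key-feature branch: keep only strong responses."""
    return np.where(np.abs(x) > np.abs(x).mean(), x, 0.0)

def fuse(x: np.ndarray) -> np.ndarray:
    """Run the three branches in parallel and fuse afterwards (a plain mean
    here; MLNet learns the fusion). Contrast with a serial single-path stack,
    where each stage can only see the previous stage's output."""
    return (branch_detail(x) + branch_context(x) + branch_select(x)) / 3.0

y = fuse(np.arange(9.0).reshape(3, 3))
```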

Posted Content
TL;DR: This work designs a self-guided upsample module to tackle the interpolation blur problem caused by bilinear upsampling between pyramid levels, and proposes a pyramid distillation loss to add supervision for intermediate levels via distilling the finest flow as pseudo labels.
Abstract: We present an unsupervised learning approach for optical flow estimation by improving the upsampling and learning of pyramid networks. We design a self-guided upsample module to tackle the interpolation blur problem caused by bilinear upsampling between pyramid levels. Moreover, we propose a pyramid distillation loss to add supervision for intermediate levels by distilling the finest flow as pseudo labels. By integrating these two components, our method achieves the best performance for unsupervised optical flow learning on multiple leading benchmarks, including MPI-Sintel, KITTI 2012, and KITTI 2015. In particular, we achieve EPE=1.4 on KITTI 2012 and F1=9.38% on KITTI 2015, which outperform the previous state-of-the-art methods by 22.2% and 15.7%, respectively.

33 citations
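The pyramid distillation loss is simple to sketch: downsample the finest predicted flow to each intermediate resolution (halving the displacement magnitudes along with the spatial size) and penalize each intermediate prediction's deviation from that pseudo label. A NumPy sketch under those assumptions (function names are illustrative; in practice no gradient flows into the pseudo label):

```python
import numpy as np

def down2(flow: np.ndarray) -> np.ndarray:
    """Average-pool a flow field of shape (H, W, 2) by 2 and halve its values,
    since pixel displacements shrink with spatial resolution."""
    h, w, c = flow.shape
    return flow.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3)) / 2.0

def pyramid_distillation_loss(finest_flow, intermediate_flows):
    """L1 loss between each intermediate-level flow and the finest flow
    downsampled to that level, used as a pseudo label."""
    loss = 0.0
    label = finest_flow
    for flow in intermediate_flows:      # ordered from fine to coarse
        label = down2(label)
        loss += np.abs(flow - label).mean()
    return loss

finest = np.ones((8, 8, 2))
mids = [np.full((4, 4, 2), 0.5), np.full((2, 2, 2), 0.25)]
loss = pyramid_distillation_loss(finest, mids)
```

When the intermediate flows agree with the scaled-down finest flow, as in the toy inputs above, the loss is zero.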

Journal ArticleDOI
TL;DR: The recognition results and the comparison with the other target detectors demonstrate the effectiveness of the proposed YOLOv4 structure and the method of data preprocessing.
Abstract: The YOLOv4 neural network is employed for underwater target recognition. To improve the accuracy and speed of recognition, the structure of YOLOv4 is modified by replacing the upsampling module with a deconvolution module and by incorporating depthwise separable convolution into the network. Moreover, the training set used in the YOLO network is preprocessed with a modified mosaic augmentation, in which the gray world algorithm is used to derive two images when performing mosaic augmentation. The recognition results and the comparison with other target detectors demonstrate the effectiveness of the proposed YOLOv4 structure and the data preprocessing method. According to both subjective and objective evaluation, the proposed target recognition strategy can effectively improve the accuracy and speed of underwater target recognition while also reducing hardware performance requirements.

33 citations
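A deconvolution (transposed convolution) replaces fixed interpolation with a learned upsampling kernel. A 1-D NumPy sketch shows the mechanics; note that with a fixed triangular kernel it reproduces linear interpolation, exactly the special case a learned kernel generalizes:

```python
import numpy as np

def transposed_conv1d(x: np.ndarray, kernel: np.ndarray, stride: int = 2) -> np.ndarray:
    """1-D transposed convolution: each input value stamps a scaled copy of the
    kernel into the output, spaced `stride` apart, and overlaps are summed.
    With learned kernels this is the 'deconvolution module' that can replace
    fixed bilinear/nearest upsampling."""
    k = len(kernel)
    out = np.zeros(stride * (len(x) - 1) + k)
    for i, v in enumerate(x):
        out[i * stride : i * stride + k] += v * kernel
    return out

x = np.array([1.0, 2.0, 3.0])
kernel = np.array([0.5, 1.0, 0.5])   # triangular kernel -> linear interpolation
y = transposed_conv1d(x, kernel)
# y = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 1.5]: the interior is exactly the
# linearly interpolated signal; a trained kernel need not be triangular.
```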


Network Information
Related Topics (5)
Convolutional neural network: 74.7K papers, 2M citations (90% related)
Image segmentation: 79.6K papers, 1.8M citations (90% related)
Feature extraction: 111.8K papers, 2.1M citations (89% related)
Deep learning: 79.8K papers, 2.1M citations (88% related)
Feature (computer vision): 128.2K papers, 1.7M citations (87% related)
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2023  469
2022  859
2021  330
2020  322
2019  298
2018  236