
Upsampling

About: Upsampling is a research topic. Over its lifetime, 2,426 publications have been published within this topic, receiving 57,613 citations.


Papers
Journal ArticleDOI
TL;DR: A novel weighted feature fusion HRNet is designed to achieve higher detection precision; HRNet is used as the backbone to maintain a high-resolution feature representation throughout the whole process, rather than generating one by upsampling as HourglassNet does.
Abstract: Recently, anchor-free methods have brought new ideas to the field of object detection: they eliminate the need for anchor boxes and provide a simpler detection structure. CenterNet is the representative anchor-free method. However, it still obtains its high-resolution representation from a low-resolution one by upsampling, its predicted heatmap is not accurate enough in space, and it does not make full use of the network's shallow low-level features. We introduce CenterNet-HRA to solve these problems. An attention module is proposed to calibrate the high-level semantic features of the network output using shallow low-level features from different receptive fields; HRNet is used as the backbone to maintain a high-resolution feature representation throughout the whole process, rather than generating one by upsampling as HourglassNet does. Considering that feature representations with different resolutions contribute differently to the network, but HRNet fuses them without distinction, a novel weighted feature fusion HRNet is designed to achieve higher detection precision. Our method achieves an average precision (AP) of 42.3% at 13.5 frames per second (FPS), versus 40.3% AP at 13.3 FPS for CenterNet-HG, on the MS-COCO benchmark.
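As a minimal sketch of the weighted-fusion idea, assuming PyTorch: branches are resized to the highest resolution and combined with learnable, softmax-normalised scalar weights. The module name, the one-weight-per-branch parameterisation, and the assumption that all branches share a channel count are illustrative, not the paper's exact design (which also involves the attention module described above).

import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedFeatureFusion(nn.Module):
    # Fuse HRNet-style multi-resolution branches with learned weights,
    # instead of summing them without distinction.
    def __init__(self, num_branches):
        super().__init__()
        # One learnable scalar per branch, softmax-normalised at fusion time.
        self.branch_weights = nn.Parameter(torch.ones(num_branches))

    def forward(self, features):
        # features: list of (N, C, H_i, W_i) tensors; all branches are assumed
        # to share the channel count C (e.g. after 1x1 convolutions).
        target_size = features[0].shape[-2:]  # highest-resolution branch
        resized = [
            f if f.shape[-2:] == target_size
            else F.interpolate(f, size=target_size, mode="bilinear",
                               align_corners=False)
            for f in features
        ]
        w = torch.softmax(self.branch_weights, dim=0)
        return sum(wi * fi for wi, fi in zip(w, resized))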

9 citations

Journal ArticleDOI
TL;DR: Experimental results show that the proposed framework can achieve higher crowd counting performance in dense scenarios and can accurately predict the location of crowds.
Abstract: In the past ten years, crowd detection and counting have been applied in many fields, such as station crowd statistics, urban safety prevention, and people-flow statistics. However, obtaining accurate positions and improving counting performance in dense scenes remain challenging and merit continued effort. In this paper, a new framework is proposed to resolve the problem. It consists of two parts. The first part is a fully convolutional neural network (CNN) comprising a backend and an upsampling stage: the backend uses a residual network (ResNet) to encode the features of the input image, and the upsampling stage uses deconvolution layers to decode the feature information. The output of the first part is fed to the second part, a peak confidence map (PCM), proposed as an improvement over the density map (DM). Compared with the DM, the PCM not only solves the crowd counting problem but also accurately predicts each person's location. Experimental results on several datasets (Beijing-BRT, Mall, ShanghaiTech, and UCF_CC_50) show that the proposed framework achieves higher crowd counting performance in dense scenarios and accurately predicts the location of crowds.
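The first part of the framework can be sketched as follows, assuming PyTorch and torchvision; layer widths and depths are illustrative, and the PCM head is reduced to a single-channel output for brevity rather than reproducing the paper's exact design.

import torch.nn as nn
from torchvision.models import resnet50

class CrowdCounter(nn.Module):
    def __init__(self):
        super().__init__()
        # Backend: ResNet convolutional stages only (drop avgpool/fc);
        # encodes the input image at 1/32 of its spatial resolution.
        self.backend = nn.Sequential(*list(resnet50(weights=None).children())[:-2])
        # Upsampling: stride-2 deconvolution (transposed convolution) layers
        # decode the feature information back toward input resolution.
        self.upsampling = nn.Sequential(
            nn.ConvTranspose2d(2048, 512, kernel_size=4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(512, 128, kernel_size=4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 1, kernel_size=1),  # single-channel map head
        )

    def forward(self, x):
        return self.upsampling(self.backend(x))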

9 citations

Journal ArticleDOI
TL;DR: In this paper, a sensor-fusion approach is proposed that combines data with low spatial resolution but high temporal precision, gathered with a single-photon avalanche diode (SPAD) array, with data that has high spatial but no temporal resolution, such as that acquired with a standard CMOS camera.
Abstract: Imaging across both the full transverse spatial and temporal dimensions of a scene, with high precision in all three coordinates, is key to applications ranging from LIDAR to fluorescence lifetime imaging (FLIM). However, compromises that sacrifice, for example, spatial resolution for temporal resolution are often required, in particular when the full 3-dimensional data cube is needed in short acquisition times. We introduce a sensor fusion approach that combines data with low spatial resolution but high temporal precision, gathered with a single-photon avalanche diode (SPAD) array, with data that has high spatial but no temporal resolution, such as that acquired with a standard CMOS camera. Our method, based on blurring the image on the SPAD array and computational sensor fusion, reconstructs time-resolved images at significantly higher spatial resolution than the SPAD input, upsampling numerical data by a factor of $12 \times 12$ and demonstrating up to $4 \times 4$ upsampling of experimental data. We demonstrate the technique for both LIDAR applications and FLIM of fluorescent cancer cells. This technique paves the way to high-spatial-resolution SPAD imaging or, equivalently, FLIM imaging with conventional microscopes at frame rates accelerated by more than an order of magnitude.
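As a toy illustration of the fusion flavour only (not the paper's reconstruction algorithm), one can spatially upsample the SPAD time cube and rescale it so that its time-integrated image matches the high-resolution CMOS intensity. The helper name, the array shapes, and the linear-interpolation choice below are all assumptions; NumPy and SciPy are assumed available.

import numpy as np
from scipy.ndimage import zoom

def guided_upsample(spad_cube, cmos_image, factor):
    # spad_cube: (H, W, T) photon-arrival histograms from the SPAD array.
    # cmos_image: (H * factor, W * factor) intensity image from the camera.
    # Spatially upsample every time bin (order=1: linear interpolation).
    up = zoom(spad_cube, (factor, factor, 1), order=1)
    # Rescale each pixel's histogram so the time-integrated image matches
    # the high-resolution CMOS intensity.
    integrated = up.sum(axis=2, keepdims=True)
    scale = cmos_image[..., None] / np.maximum(integrated, 1e-9)
    return up * scale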

9 citations

Journal ArticleDOI
TL;DR: A class of high-precision, multiplier-free realizations for FIR filters is proposed that uses upsampling and downsampling in conjunction with a periodically time-varying system to achieve time-invariant, multiplier-free FIR filter operation.
Abstract: Proposes a class of high-precision, multiplier-free realizations for FIR filters. These realizations use upsampling and downsampling in conjunction with a periodically time-varying system to achieve time-invariant, multiplier-free FIR filter operation. Nonbinary encoding schemes are used to obtain the filter coefficients, which are periodically time-varying (PTV), i.e., they vary in a periodic fashion. Each target filter coefficient is directly mapped into a set of PTV coefficients, so the realizations are easy to obtain. The values of the PTV coefficients are restricted to a ternary set.
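A minimal sketch of the multiplier-free principle: if each FIR coefficient is expanded in signed digits drawn from the ternary set {-1, 0, +1}, every tap reduces to shifts and adds/subtracts. The greedy encoder below is illustrative only and does not reproduce the paper's upsampling/downsampling PTV construction; coefficients are assumed to satisfy |c| <= 1.

def ternary_digits(c, num_digits=10):
    # Greedily expand c as sum(d_k * 2**-k) with each d_k in {-1, 0, +1}.
    digits, r = [], c
    for k in range(num_digits):
        w = 2.0 ** -k
        d = 1 if r > w / 2 else (-1 if r < -w / 2 else 0)
        digits.append(d)
        r -= d * w
    return digits

def fir_multiplier_free(x, coeffs, num_digits=10):
    # y[n] = sum_i h[i] * x[n - i], with every tap realised by shifts
    # (the 2**-k scalings) and adds/subtracts; no general multiplier.
    enc = [ternary_digits(c, num_digits) for c in coeffs]
    y = []
    for n in range(len(x)):
        acc = 0.0
        for i, digits in enumerate(enc):
            if n - i < 0:
                continue
            for k, d in enumerate(digits):
                if d:
                    acc += d * (x[n - i] * 2.0 ** -k)  # shift-and-add in hardware
        y.append(acc)
    return y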

9 citations

Proceedings ArticleDOI
24 Jun 2018
TL;DR: This paper presents a full-reference objective video quality metric (SRQM), which characterises the relationship between variations in spatial resolution and visual quality in the context of adaptive video formats.
Abstract: This paper presents a full-reference objective video quality metric (SRQM), which characterises the relationship between variations in spatial resolution and visual quality in the context of adaptive video formats. SRQM uses wavelet decomposition, subband combination with perceptually inspired weights, and spatial pooling to estimate the relative quality between the frames of a high-resolution reference video and one that has been spatially adapted through a combination of downsampling and upsampling. The BVI-SR video database is used to benchmark SRQM against five commonly used quality metrics. The database contains 24 diverse video sequences that span a range of spatial resolutions up to UHD-1 $(3840\times 2160)$. An in-depth analysis demonstrates that SRQM is statistically superior to the other quality metrics for all tested adaptation filters, while maintaining relatively low computational complexity.
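The pipeline shape (wavelet decomposition, weighted subband combination, spatial pooling) can be sketched as below, assuming NumPy and PyWavelets. The wavelet choice, level weights, and mean pooling are placeholders, not the published SRQM parameters; both frames are assumed to share the reference resolution, as the test video is upsampled back before comparison.

import numpy as np
import pywt

def srqm_like_score(ref_frame, test_frame, levels=4,
                    level_weights=(0.1, 0.2, 0.3, 0.4)):
    # Wavelet-decompose both frames, compare detail subbands with weights,
    # and pool spatially with a plain mean (placeholder pooling).
    ref = pywt.wavedec2(ref_frame, "haar", level=levels)
    test = pywt.wavedec2(test_frame, "haar", level=levels)
    score = 0.0
    # Skip the coarse approximation band; weight each detail level.
    for lvl, (r_det, t_det) in enumerate(zip(ref[1:], test[1:])):
        err = np.mean([np.mean((r - t) ** 2) for r, t in zip(r_det, t_det)])
        score += level_weights[lvl] * err
    return score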

9 citations


Network Information
Related Topics (5)

Convolutional neural network: 74.7K papers, 2M citations (90% related)
Image segmentation: 79.6K papers, 1.8M citations (90% related)
Feature extraction: 111.8K papers, 2.1M citations (89% related)
Deep learning: 79.8K papers, 2.1M citations (88% related)
Feature (computer vision): 128.2K papers, 1.7M citations (87% related)
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2023    469
2022    859
2021    330
2020    322
2019    298
2018    236