Topic
Pooling
About: Pooling is a research topic. Over its lifetime, 5583 publications have been published within this topic, receiving 161394 citations.
Papers
••
07 Jun 2015
TL;DR: An efficient position refinement model is proposed to estimate the joint offset location within a small region of the image. This model is jointly trained with a state-of-the-art ConvNet model to achieve improved accuracy in human joint location estimation.
Abstract: Recent state-of-the-art performance on human-body pose estimation has been achieved with Deep Convolutional Networks (ConvNets). Traditional ConvNet architectures include pooling and sub-sampling layers which reduce computational requirements, introduce invariance and prevent over-training. These benefits of pooling come at the cost of reduced localization accuracy. We introduce a novel architecture which includes an efficient ‘position refinement’ model that is trained to estimate the joint offset location within a small region of the image. This refinement model is jointly trained in cascade with a state-of-the-art ConvNet model [21] to achieve improved accuracy in human joint location estimation. We show that the variance of our detector approaches the variance of human annotations on the FLIC [20] dataset and outperforms all existing approaches on the MPII-human-pose dataset [1].
941 citations
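The localization cost of pooling mentioned in the abstract above can be illustrated with a toy 1D example. This is a minimal sketch and not the paper's model: `max_pool_1d` and `one_hot` are hypothetical helpers, and the window size of 2 is an assumption chosen for illustration.

```python
def max_pool_1d(signal, window=2):
    """Non-overlapping max pooling over a 1D list (stride equals window)."""
    return [
        max(signal[i:i + window])
        for i in range(0, len(signal) - window + 1, window)
    ]


def one_hot(length, index):
    """A 1D 'heatmap' with a single activation at `index`."""
    return [1.0 if i == index else 0.0 for i in range(length)]


# Two joint locations that differ by one pixel but share a pooling window.
a = one_hot(8, 4)
b = one_hot(8, 5)

pooled_a = max_pool_1d(a)
pooled_b = max_pool_1d(b)

# Both inputs pool to the same coarse map, so the within-window offset is lost.
print(pooled_a)
print(pooled_b)
```

Because both inputs map to the same pooled vector, the coarse output alone cannot distinguish position 4 from position 5. Recovering that within-window offset is the kind of job the abstract assigns to the refinement stage.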
•
TL;DR: A novel architecture is introduced that includes an efficient ‘position refinement’ model, trained to estimate the joint offset location within a small region of the image, to achieve improved accuracy in human joint location estimation.
Abstract: Recent state-of-the-art performance on human-body pose estimation has been achieved with Deep Convolutional Networks (ConvNets). Traditional ConvNet architectures include pooling and sub-sampling layers which reduce computational requirements, introduce invariance and prevent over-training. These benefits of pooling come at the cost of reduced localization accuracy. We introduce a novel architecture which includes an efficient ‘position refinement’ model that is trained to estimate the joint offset location within a small region of the image. This refinement model is jointly trained in cascade with a state-of-the-art ConvNet model to achieve improved accuracy in human joint location estimation. We show that the variance of our detector approaches the variance of human annotations on the FLIC dataset and outperforms all existing approaches on the MPII-human-pose dataset.
877 citations
••
27 Jun 2016
TL;DR: The authors propose two compact bilinear representations with the same discriminative power as the full bilinear representation but with only a few thousand dimensions. The representations allow back-propagation of classification errors, enabling end-to-end optimization of the visual recognition system.
Abstract: Bilinear models have been shown to achieve impressive performance on a wide range of visual tasks, such as semantic segmentation, fine-grained recognition and face recognition. However, bilinear features are high dimensional, typically on the order of hundreds of thousands to a few million, which makes them impractical for subsequent analysis. We propose two compact bilinear representations with the same discriminative power as the full bilinear representation but with only a few thousand dimensions. Our compact representations allow back-propagation of classification errors, enabling an end-to-end optimization of the visual recognition system. The compact bilinear representations are derived through a novel kernelized analysis of bilinear pooling which provides insights into the discriminative power of bilinear pooling, and a platform for further research in compact pooling methods. Experiments illustrate the utility of the proposed representations for image classification and few-shot learning across several datasets.
854 citations
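One way to see how a bilinear (outer-product) feature can be compressed to a few dimensions is the Tensor Sketch construction. The abstract above does not name its projections, so treating Tensor Sketch as the compression step is an assumption here; all function names, the input size `n`, and the sketch size `d` are hypothetical. The sketch below checks the identity that makes the method cheap: count-sketching the full outer product gives the same vector as circularly convolving two count sketches of the original feature.

```python
import random


def count_sketch(x, h, s, d):
    """Project x into d buckets using hash indices h and random signs s."""
    out = [0.0] * d
    for i, v in enumerate(x):
        out[h[i]] += s[i] * v
    return out


def circular_convolve(a, b):
    """Direct O(d^2) circular convolution (an FFT is the usual choice in practice)."""
    d = len(a)
    out = [0.0] * d
    for i in range(d):
        for j in range(d):
            out[(i + j) % d] += a[i] * b[j]
    return out


def tensor_sketch(x, h1, s1, h2, s2, d):
    """Compact d-dimensional stand-in for the outer product of x with itself."""
    return circular_convolve(count_sketch(x, h1, s1, d),
                             count_sketch(x, h2, s2, d))


def outer_product_sketch(x, h1, s1, h2, s2, d):
    """Reference: build every pairwise product explicitly, then count-sketch it."""
    out = [0.0] * d
    for i, xi in enumerate(x):
        for j, xj in enumerate(x):
            out[(h1[i] + h2[j]) % d] += s1[i] * s2[j] * xi * xj
    return out


rng = random.Random(0)
n, d = 6, 16  # hypothetical input and sketch sizes
h1 = [rng.randrange(d) for _ in range(n)]
h2 = [rng.randrange(d) for _ in range(n)]
s1 = [rng.choice((-1.0, 1.0)) for _ in range(n)]
s2 = [rng.choice((-1.0, 1.0)) for _ in range(n)]
x = [rng.uniform(-1.0, 1.0) for _ in range(n)]

compact = tensor_sketch(x, h1, s1, h2, s2, d)
reference = outer_product_sketch(x, h1, s1, h2, s2, d)
max_gap = max(abs(c - r) for c, r in zip(compact, reference))
```

The compact path never materializes the n × n outer product, which is what keeps the representation small when n is in the hundreds or thousands.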
•
TL;DR: DeepLabv3+ extends DeepLabv3 by adding a simple decoder module to refine the segmentation results, especially along object boundaries. The authors further explore the Xception model and apply depthwise separable convolution to both the Atrous Spatial Pyramid Pooling and decoder modules, resulting in a faster and stronger encoder-decoder network.
Abstract: Spatial pyramid pooling modules or encoder-decoder structures are used in deep neural networks for semantic segmentation tasks. The former networks are able to encode multi-scale contextual information by probing the incoming features with filters or pooling operations at multiple rates and multiple effective fields-of-view, while the latter networks can capture sharper object boundaries by gradually recovering the spatial information. In this work, we propose to combine the advantages from both methods. Specifically, our proposed model, DeepLabv3+, extends DeepLabv3 by adding a simple yet effective decoder module to refine the segmentation results, especially along object boundaries. We further explore the Xception model and apply the depthwise separable convolution to both Atrous Spatial Pyramid Pooling and decoder modules, resulting in a faster and stronger encoder-decoder network. We demonstrate the effectiveness of the proposed model on the PASCAL VOC 2012 and Cityscapes datasets, achieving test set performance of 89.0% and 82.1% without any post-processing. Our paper is accompanied by a publicly available reference implementation of the proposed models in TensorFlow at \url{this https URL}.
836 citations
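The idea of probing features "at multiple rates and multiple effective fields-of-view" can be sketched in one dimension. This is a toy illustration under stated assumptions, not the DeepLabv3+ implementation: `atrous_conv1d`, `field_of_view`, and `aspp_1d` are hypothetical helpers, and the rates (1, 2, 4) and the all-ones kernel are chosen only to make the effect visible.

```python
def atrous_conv1d(signal, kernel, rate):
    """Same-length 1D atrous (dilated) convolution with zero padding.

    Kernel taps are spaced `rate` samples apart, centred on each output.
    """
    center = len(kernel) // 2
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t + (j - center) * rate
            if 0 <= idx < len(signal):  # taps outside the signal read as zero
                acc += w * signal[idx]
        out.append(acc)
    return out


def field_of_view(kernel_size, rate):
    """Span of input samples covered by one dilated kernel."""
    return (kernel_size - 1) * rate + 1


def aspp_1d(signal, kernel, rates):
    """Apply the same kernel at several rates and keep the branches side by side."""
    return [atrous_conv1d(signal, kernel, r) for r in rates]


# A single impulse in the middle of a length-15 signal.
signal = [0.0] * 15
signal[7] = 1.0
kernel = [1.0, 1.0, 1.0]
rates = (1, 2, 4)

branches = aspp_1d(signal, kernel, rates)
touched = [[i for i, v in enumerate(b) if v != 0.0] for b in branches]
views = [field_of_view(len(kernel), r) for r in rates]
```

Each branch uses the same three weights, yet the outputs that respond to the impulse spread further apart as the rate grows, so larger rates capture wider context at no extra parameter cost.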