Open AccessPosted Content
Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation
TLDR
This paper proposes a new hybrid architecture that consists of a deep Convolu-tional Network and a Markov Random Field and shows how this architecture is successfully applied to the challenging problem of articulated human pose estimation in monocular images.Abstract:
This paper proposes a new hybrid architecture that consists of a deep Convolutional Network and a Markov Random Field. We show how this architecture is successfully applied to the challenging problem of articulated human pose estimation in monocular images. The architecture can exploit structural domain constraints such as geometric relationships between body joint locations. We show that joint training of these two model paradigms improves performance and allows us to significantly outperform existing state-of-the-art techniques.read more
Citations
More filters
Book
Deep Learning
TL;DR: Deep learning as mentioned in this paper is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts, and it is used in many applications such as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames.
Proceedings ArticleDOI
Fully convolutional networks for semantic segmentation
TL;DR: The key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning.
Posted Content
Fully Convolutional Networks for Semantic Segmentation
TL;DR: It is shown that convolutional networks by themselves, trained end- to-end, pixels-to-pixels, improve on the previous best result in semantic segmentation.
Proceedings Article
Spatial transformer networks
TL;DR: This work introduces a new learnable module, the Spatial Transformer, which explicitly allows the spatial manipulation of data within the network, and can be inserted into existing convolutional architectures, giving neural networks the ability to actively spatially transform feature maps.
Book ChapterDOI
Stacked Hourglass Networks for Human Pose Estimation
TL;DR: This work introduces a novel convolutional network architecture for the task of human pose estimation that is described as a “stacked hourglass” network based on the successive steps of pooling and upsampling that are done to produce a final set of predictions.
References
More filters
Proceedings ArticleDOI
Histograms of oriented gradients for human detection
Navneet Dalal,Bill Triggs +1 more
TL;DR: It is shown experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection, and the influence of each stage of the computation on performance is studied.
Proceedings ArticleDOI
Object recognition from local scale-invariant features
TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
Book ChapterDOI
Visualizing and Understanding Convolutional Networks
Matthew D. Zeiler,Rob Fergus +1 more
TL;DR: A novel visualization technique is introduced that gives insight into the function of intermediate feature layers and the operation of the classifier in large Convolutional Network models, used in a diagnostic role to find model architectures that outperform Krizhevsky et al on the ImageNet classification benchmark.
Proceedings Article
On the importance of initialization and momentum in deep learning
TL;DR: It is shown that when stochastic gradient descent with momentum uses a well-designed random initialization and a particular type of slowly increasing schedule for the momentum parameter, it can train both DNNs and RNNs to levels of performance that were previously achievable only with Hessian-Free optimization.
Posted Content
CNN Features off-the-shelf: an Astounding Baseline for Recognition
TL;DR: A series of experiments conducted for different recognition tasks using the publicly available code and model of the OverFeat network which was trained to perform object classification on ILSVRC13 suggest that features obtained from deep learning with convolutional nets should be the primary candidate in most visual recognition tasks.