VNect: Real-time 3D Human Pose Estimation with a Single RGB Camera

doi:10.1145/3072959.3073596

Open AccessJournal ArticleDOI

VNect: Real-time 3D Human Pose Estimation with a Single RGB Camera

Dushyant Mehta, +8 more

- 03 May 2017 -

arXiv: Computer Vision and Pattern Recog...

TLDR

This work presents the first real-time method to capture the full global 3D skeletal pose of a human in a stable, temporally consistent manner using a single RGB camera and shows that the approach is more broadly applicable than RGB-D solutions, i.e., it works for outdoor scenes, community videos, and low quality commodity RGB cameras.

Abstract:

We present the first real-time method to capture the full global 3D skeletal pose of a human in a stable, temporally consistent manner using a single RGB camera. Our method combines a new convolutional neural network (CNN) based pose regressor with kinematic skeleton fitting. Our novel fully-convolutional pose formulation regresses 2D and 3D joint positions jointly in real time and does not require tightly cropped input frames. A real-time kinematic skeleton fitting method uses the CNN output to yield temporally stable 3D global pose reconstructions on the basis of a coherent kinematic skeleton. This makes our approach the first monocular RGB method usable in real-time applications such as 3D character control---thus far, the only monocular methods for such applications employed specialized RGB-D cameras. Our method's accuracy is quantitatively on par with the best offline 3D monocular RGB pose estimation methods. Our results are qualitatively comparable to, and sometimes better than, results from monocular RGB-D approaches, such as the Kinect. However, we show that our approach is more broadly applicable than RGB-D solutions, i.e. it works for outdoor scenes, community videos, and low quality commodity RGB cameras.

Citations

PDF

Open Access

More filters

Posted Content

Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields

Zhe Cao, +3 more

- 24 Nov 2016 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: This work presents an approach to efficiently detect the 2D pose of multiple people in an image using a nonparametric representation, which it refers to as Part Affinity Fields (PAFs), to learn to associate body parts with individuals in the image.

...read moreread less

Posted Content

OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields

Zhe Cao, +4 more

- 18 Dec 2018 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: OpenPose is released, the first open-source realtime system for multi-person 2D pose detection, including body, foot, hand, and facial keypoints, and the first combined body and foot keypoint detector, based on an internal annotated foot dataset.

...read moreread less

Proceedings ArticleDOI

VIBE: Video Inference for Human Body Pose and Shape Estimation

Muhammed Kocabas, +2 more

TL;DR: This work defines a novel temporal network architecture with a self-attention mechanism and shows that adversarial training, at the sequence level, produces kinematically plausible motion sequences without in-the-wild ground-truth 3D labels.

...read moreread less

Book ChapterDOI

Recovering Accurate {3D} Human Pose in the Wild Using {IMUs} and a Moving Camera

Timo von Marcard, +4 more

TL;DR: This work proposes a method that combines a single hand-held camera and a set of Inertial Measurement Units (IMUs) attached at the body limbs to estimate accurate 3D poses in the wild and obtains an accuracy of 26 mm, which makes it accurate enough to serve as a benchmark for image-based 3D pose estimation in theWild.

...read moreread less

Proceedings ArticleDOI

Neural Body Fitting: Unifying Deep Learning and Model Based Human Pose and Shape Estimation

Mohamed Omran, +4 more

TL;DR: Neural Body Fitting (NBF) as discussed by the authors integrates a statistical body model as a layer within a CNN leveraging both reliable bottom-up body part segmentation and robust top-down body model constraints.

...read moreread less

Collapse

References

PDF

Open Access

More filters

Proceedings ArticleDOI

Deep Residual Learning for Image Recognition

Kaiming He, +3 more

TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won the 1st place on the ILSVRC 2015 classification task.

...read moreread less

Proceedings Article

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Sergey Ioffe, +1 more

TL;DR: Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.

...read moreread less

Proceedings ArticleDOI

You Only Look Once: Unified, Real-Time Object Detection

Joseph Redmon, +3 more

TL;DR: Compared to state-of-the-art detection systems, YOLO makes more localization errors but is less likely to predict false positives on background, and outperforms other detection methods, including DPM and R-CNN, when generalizing from natural images to other domains like artwork.

...read moreread less

Posted Content

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Shaoqing Ren, +3 more

- 04 Jun 2015 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: Faster R-CNN as discussed by the authors proposes a Region Proposal Network (RPN) to generate high-quality region proposals, which are used by Fast R-NN for detection.

...read moreread less

Book ChapterDOI

SSD: Single Shot MultiBox Detector

Wei Liu, +6 more

TL;DR: The approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location, which makes SSD easy to train and straightforward to integrate into systems that require a detection component.

...read moreread less