Proceedings Article

An End-To-End Framework For Pose Estimation Of Occluded Pedestrians

TLDR
A novel multi-task framework, trained end to end, for estimating the entire pose of pedestrians under any kind of occlusion. It outperforms the SOTA results for pose estimation, instance segmentation, and pedestrian detection in cases of heavy occlusion.
Abstract
Pose estimation in the wild is a challenging problem, particularly in situations of (i) occlusions of varying degrees and (ii) crowded outdoor scenes. Most existing studies of pose estimation do not report performance in such situations. Moreover, none of the relevant standard datasets provides pose annotations for the occluded parts of human figures, which makes it harder to study pose estimation of the entire figure for occluded humans. Well-known pedestrian detection datasets such as CityPersons contain samples of outdoor scenes but do not include pose annotations. Here we propose a novel multi-task framework for end-to-end training towards estimating the entire pose of pedestrians, including in situations of any kind of occlusion. To tackle this problem, we make use of a pose estimation dataset, MS-COCO, and employ unsupervised adversarial instance-level domain adaptation for estimating the entire pose of occluded pedestrians. The experimental studies show that the proposed framework outperforms the SOTA results for pose estimation, instance segmentation, and pedestrian detection in cases of heavy occlusions (HO) and reasonable + heavy occlusions (R+HO) on the two benchmark datasets.

Citations
Posted Content

Spatio-Contextual Deep Network Based Multimodal Pedestrian Detection For Autonomous Driving.

TL;DR: In this paper, the authors proposed an end-to-end multimodal fusion model for pedestrian detection using RGB and thermal images, which consists of two distinct deformable ResNeXt-50 encoders for feature extraction from the two modalities.
Journal Article

Spatio-Contextual Deep Network-Based Multimodal Pedestrian Detection for Autonomous Driving

TL;DR: In this paper, a multimodal fusion model for pedestrian detection using RGB and thermal images is proposed, which consists of two distinct deformable ResNeXt-50 encoders for feature extraction from the two modalities.
Journal Article

Using closed-circuit television cameras to analyze traffic safety at intersections based on vehicle key points detection.

TL;DR: In this paper, the authors proposed a framework named "Near Miss Event Detection System (NMEDS)" for road safety diagnostics using video data collected from CCTV cameras, which combines Mask-RCNN bounding box detection with the occlusion-Net detection algorithm to reconstruct vehicles' key points in a 3D view.
Journal Article

Deep Multi-Task Networks For Occluded Pedestrian Pose Estimation

TL;DR: This work proposes a multi-task framework to extract pedestrian features through detection and instance segmentation tasks performed separately on these two distributions, and improves state-of-the-art performance in pose estimation, pedestrian detection, and instance segmentation.
References
Proceedings Article

Deep Residual Learning for Image Recognition

TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won 1st place on the ILSVRC 2015 classification task.
Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.
Book Chapter

Microsoft COCO: Common Objects in Context

TL;DR: A new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding by gathering images of complex everyday scenes containing common objects in their natural context.
Posted Content

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

TL;DR: Faster R-CNN introduces a Region Proposal Network (RPN) to generate high-quality region proposals, which are used by Fast R-CNN for detection.