scispace - formally typeset
Author

Umer Rafi

Other affiliations: University of Bonn
Bio: Umer Rafi is an academic researcher from RWTH Aachen University. The author has contributed to research in the topics of Pose and Deep learning, has an h-index of 6, and has co-authored 10 publications receiving 372 citations. Previous affiliations of Umer Rafi include the University of Bonn.

Papers
Book ChapterDOI
24 Jun 2015
TL;DR: How the SPENCER project advances the fields of detection and tracking of individuals and groups, recognition of human social relations and activities, normative human behavior learning, socially-aware task and motion planning, learning socially annotated maps, and conducting empirical experiments to assess socio-psychological effects of normative robot behaviors is described.
Abstract: We present an ample description of a socially compliant mobile robotic platform, which is developed in the EU-funded project SPENCER. The purpose of this robot is to assist, inform and guide passengers in large and busy airports. One particular aim is to bring travellers of connecting flights conveniently and efficiently from their arrival gate to the passport control. The uniqueness of the project stems from the strong demand of service robots for this application with a large potential impact for the aviation industry on one side, and on the other side from the scientific advancements in social robotics, brought forward and achieved in SPENCER. The main contributions of SPENCER are novel methods to perceive, learn, and model human social behavior and to use this knowledge to plan appropriate actions in real-time for mobile platforms. In this paper, we describe how the project advances the fields of detection and tracking of individuals and groups, recognition of human social relations and activities, normative human behavior learning, socially-aware task and motion planning, learning socially annotated maps, and conducting empirical experiments to assess socio-psychological effects of normative robot behaviors.

240 citations

Proceedings ArticleDOI
01 Jan 2016
TL;DR: An efficient deep network architecture is proposed that is trained efficiently with a transparent procedure and exploits the best available ingredients from deep learning with low computational budget and achieves impressive performance on popular benchmarks in human pose estimation.
Abstract: In recent years, human pose estimation has greatly benefited from deep learning, and huge gains in performance have been achieved. However, to push for maximum performance, recent approaches exploit computationally expensive deep network architectures, train on multiple datasets, apply additional post-processing, and provide limited details about the design choices used. This makes it hard not only to compare different methods but also to reproduce existing results. In this work, we propose an efficient deep network architecture that is trained efficiently with a transparent procedure and exploits the best available ingredients from deep learning with a low computational budget. The network is trained only on the target dataset without pre-training and achieves impressive performance on popular benchmarks in human pose estimation.

128 citations

Proceedings ArticleDOI
07 Jun 2015
TL;DR: A semantic occlusion model is introduced that is incorporated into a regression forest approach for human pose estimation from depth data and shows that it increases the joint estimation accuracy and outperforms the commercial Kinect 2 SDK for occluded joints.
Abstract: Human pose estimation from depth data has made significant progress in recent years and commercial sensors estimate human poses in real-time. However, state-of-the-art methods fail in many situations when the humans are partially occluded by objects. In this work, we introduce a semantic occlusion model that is incorporated into a regression forest approach for human pose estimation from depth data. The approach exploits the context information of occluding objects like a table to predict the locations of occluded joints. In our experiments on synthetic and real data, we show that our occlusion model increases the joint estimation accuracy and outperforms the commercial Kinect 2 SDK for occluded joints.

57 citations

Book ChapterDOI
TL;DR: This work proposes an approach that relies on keypoint correspondences for associating persons in videos and achieves state-of-the-art results for multi-frame pose estimation and multi-person pose tracking on the PoseTrack 2017 and PoseTrack 2018 datasets.
Abstract: Video annotation is expensive and time consuming. Consequently, datasets for multi-person pose estimation and tracking are less diverse and have sparser annotations compared to large scale image datasets for human pose estimation. This makes it challenging to learn deep learning based models for associating keypoints across frames that are robust to nuisance factors such as motion blur and occlusions for the task of multi-person pose tracking. To address this issue, we propose an approach that relies on keypoint correspondences for associating persons in videos. Instead of training the network for estimating keypoint correspondences on video data, it is trained on a large scale image dataset for human pose estimation using self-supervision. Combined with a top-down framework for human pose estimation, we use keypoint correspondences to (i) recover missed pose detections and to (ii) associate pose detections across video frames. Our approach achieves state-of-the-art results for multi-frame pose estimation and multi-person pose tracking on the PoseTrack 2017 and PoseTrack 2018 datasets.

16 citations

Book ChapterDOI
23 Aug 2020
TL;DR: In this paper, the authors propose an approach that relies on keypoint correspondences for associating persons in videos, which is trained on a large scale image dataset for human pose estimation using self-supervision.
Abstract: Video annotation is expensive and time consuming. Consequently, datasets for multi-person pose estimation and tracking are less diverse and have more sparse annotations compared to large scale image datasets for human pose estimation. This makes it challenging to learn deep learning based models for associating keypoints across frames that are robust to nuisance factors such as motion blur and occlusions for the task of multi-person pose tracking. To address this issue, we propose an approach that relies on keypoint correspondences for associating persons in videos. Instead of training the network for estimating keypoint correspondences on video data, it is trained on a large scale image dataset for human pose estimation using self-supervision. Combined with a top-down framework for human pose estimation, we use keypoint correspondences to (i) recover missed pose detections and to (ii) associate pose detections across video frames. Our approach achieves state-of-the-art results for multi-frame pose estimation and multi-person pose tracking on the PoseTrack 2017 and 2018 datasets.
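The cross-frame association step described in this abstract can be sketched as matching over a similarity matrix. Note this is an illustrative assumption, not the authors' implementation: the paper scores pairs via self-supervised keypoint correspondences, while the `greedy_associate` helper below simply matches greedily by descending similarity.

```python
import numpy as np

def greedy_associate(sim):
    """Greedily match rows (tracked poses in the previous frame) to columns
    (pose detections in the current frame) by descending similarity score.
    Each row and each column is used at most once."""
    sim = np.asarray(sim, dtype=float)
    matches, used_r, used_c = [], set(), set()
    # visit all (row, col) pairs from highest to lowest similarity
    for idx in np.argsort(-sim, axis=None):
        r, c = divmod(int(idx), sim.shape[1])
        if r not in used_r and c not in used_c:
            matches.append((r, c))
            used_r.add(r)
            used_c.add(c)
    return matches
```

A real tracker would additionally threshold low similarities and spawn new tracks for unmatched detections; those details are omitted here.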

11 citations


Cited by
Posted Content
TL;DR: It is shown that, for models trained from scratch as well as pretrained ones, using a variant of the triplet loss to perform end-to-end deep metric learning outperforms most other published methods by a large margin.
Abstract: In the past few years, the field of computer vision has gone through a revolution fueled mainly by the advent of large datasets and the adoption of deep convolutional neural networks for end-to-end learning. The person re-identification subfield is no exception to this. Unfortunately, a prevailing belief in the community seems to be that the triplet loss is inferior to using surrogate losses (classification, verification) followed by a separate metric learning step. We show that, for models trained from scratch as well as pretrained ones, using a variant of the triplet loss to perform end-to-end deep metric learning outperforms most other published methods by a large margin.
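For illustration, the triplet loss this abstract defends can be written down in a few lines. This is a minimal NumPy sketch of the standard formulation with Euclidean distances; the function name and margin value are assumptions, and the paper's batch-hard mining variant and the embedding network itself are not shown.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss on embedding vectors: pull the anchor toward
    the positive (same identity) and push it away from the negative
    (different identity) by at least `margin`."""
    d_pos = np.linalg.norm(anchor - positive)  # anchor-positive distance
    d_neg = np.linalg.norm(anchor - negative)  # anchor-negative distance
    return max(d_pos - d_neg + margin, 0.0)    # hinge: zero once separated
```

When the negative is already farther from the anchor than the positive by more than the margin, the loss is zero and the triplet contributes no gradient, which is why hard-triplet mining matters in practice.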

2,679 citations

Proceedings ArticleDOI
21 Jul 2017
TL;DR: Stacked hourglass networks are adopted to generate attention maps from features at multiple resolutions with various semantics, and novel Hourglass Residual Units (HRUs) are designed to increase the receptive field of the network.
Abstract: In this paper, we propose to incorporate convolutional neural networks with a multi-context attention mechanism into an end-to-end framework for human pose estimation. We adopt stacked hourglass networks to generate attention maps from features at multiple resolutions with various semantics. The Conditional Random Field (CRF) is utilized to model the correlations among neighboring regions in the attention map. We further combine the holistic attention model, which focuses on the global consistency of the full human body, and the body part attention model, which focuses on detailed descriptions for different body parts. Hence our model has the ability to focus on different granularity from local salient regions to global semantic consistent spaces. Additionally, we design novel Hourglass Residual Units (HRUs) to increase the receptive field of the network. These units are extensions of residual units with a side branch incorporating filters with larger receptive field, hence features with various scales are learned and combined within the HRUs. The effectiveness of the proposed multi-context attention mechanism and the hourglass residual units is evaluated on two widely used human pose estimation benchmarks. Our approach outperforms all existing methods on both benchmarks over all the body parts. Code has been made publicly available.

543 citations

Book ChapterDOI
08 Sep 2018
TL;DR: In this paper, a simple integral operation relates and unifies the heat map representation and joint regression, thus avoiding the non-differentiable post-processing and quantization error of human pose estimation.
Abstract: State-of-the-art human pose estimation methods are based on heat map representation. In spite of the good performance, the representation has a few issues in nature, such as non-differentiable post-processing and quantization error. This work shows that a simple integral operation relates and unifies the heat map representation and joint regression, thus avoiding the above issues. It is differentiable, efficient, and compatible with any heat map based methods. Its effectiveness is convincingly validated via comprehensive ablation experiments under various settings, specifically on 3D pose estimation, for the first time.
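The integral operation this abstract describes replaces the non-differentiable argmax over a heatmap with an expectation over its softmax-normalized values (often called soft-argmax). A minimal NumPy sketch for a single 2D heatmap follows; the function name is an assumption, and the paper's multi-joint, 3D, and end-to-end training details are omitted.

```python
import numpy as np

def soft_argmax(heatmap):
    """Differentiable joint localization: return the expected (x, y)
    location under the softmax-normalized heatmap, avoiding both the
    non-differentiable argmax and its quantization to integer pixels."""
    h, w = heatmap.shape
    p = np.exp(heatmap - heatmap.max())  # stable softmax over all pixels
    p /= p.sum()
    ys, xs = np.mgrid[0:h, 0:w]          # pixel coordinate grids
    return float((p * xs).sum()), float((p * ys).sum())
```

Because the output is a weighted sum of coordinates, it can fall between pixels, which is what removes the quantization error mentioned in the abstract.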

536 citations

Proceedings ArticleDOI
01 Jul 2017
TL;DR: In this paper, the authors explore 3D human pose estimation from a single RGB image, using a simple architecture that reasons through intermediate 2D pose predictions, and demonstrate that their approach outperforms almost all state-of-the-art 3D pose estimation systems.
Abstract: We explore 3D human pose estimation from a single RGB image. While many approaches try to directly predict 3D pose from image measurements, we explore a simple architecture that reasons through intermediate 2D pose predictions. Our approach is based on two key observations: (1) deep neural nets have revolutionized 2D pose estimation, producing accurate 2D predictions even for poses with self-occlusions; (2) big datasets of 3D mocap data are now readily available, making it tempting to lift predicted 2D poses to 3D through simple memorization (e.g., nearest neighbors). The resulting architecture is straightforward to implement with off-the-shelf 2D pose estimation systems and 3D mocap libraries. Importantly, we demonstrate that such methods outperform almost all state-of-the-art 3D pose estimation systems, most of which directly try to regress 3D pose from 2D measurements.
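The "simple memorization" idea the abstract alludes to can be sketched as a nearest-neighbor lookup into a mocap library: find the library pose whose 2D projection best matches the predicted 2D pose and return its 3D joints. This is an illustrative sketch under assumed array shapes, not the paper's learned lifting network.

```python
import numpy as np

def lift_2d_to_3d(pred_2d, mocap_2d, mocap_3d):
    """Lift a predicted 2D pose to 3D by nearest-neighbor lookup.

    pred_2d:  (J, 2) predicted 2D joint locations
    mocap_2d: (N, J, 2) 2D projections of N library poses
    mocap_3d: (N, J, 3) corresponding 3D poses
    Returns the 3D pose whose 2D projection is closest to pred_2d."""
    dists = np.linalg.norm(mocap_2d - pred_2d, axis=(1, 2))  # per-pose distance
    return mocap_3d[np.argmin(dists)]
```

In practice such a lookup would be preceded by normalization (root-centering, scale) so that 2D distances are comparable across poses and cameras.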

465 citations

Proceedings ArticleDOI
26 Feb 2018
TL;DR: It is shown that a single architecture can be used to solve the two problems in an efficient way and still achieves state-of-the-art results, and that optimization from end-to-end leads to significantly higher accuracy than separated learning.
Abstract: Action recognition and human pose estimation are closely related, but both problems are generally handled as distinct tasks in the literature. In this work, we propose a multitask framework for joint 2D and 3D pose estimation from still images and human action recognition from video sequences. We show that a single architecture can be used to solve the two problems in an efficient way and still achieve state-of-the-art results. Additionally, we demonstrate that end-to-end optimization leads to significantly higher accuracy than separate learning. The proposed architecture can be trained seamlessly with data from different categories simultaneously. The reported results on four datasets (MPII, Human3.6M, Penn Action and NTU) demonstrate the effectiveness of our method on the targeted tasks.

455 citations