Author

Shih-En Wei

Other affiliations: Academia Sinica, Carnegie Mellon University, National Taiwan University

Bio: Shih-En Wei is an academic researcher at Facebook. The author has contributed to research on pose and augmented reality, has an h-index of 13, and has co-authored 33 publications receiving 10,678 citations. Previous affiliations include Academia Sinica and Carnegie Mellon University.

Papers

Proceedings Article•DOI•

Realtime Multi-person 2D Pose Estimation Using Part Affinity Fields

Zhe Cao¹, Tomas Simon¹, Shih-En Wei¹, Yaser Sheikh¹•Institutions (1)

Carnegie Mellon University¹

21 Jul 2017

TL;DR: This work uses a nonparametric representation, Part Affinity Fields (PAFs), to learn to associate body parts with individuals in the image, and achieves state-of-the-art performance on the MPII Multi-Person benchmark.

Abstract: We present an approach to efficiently detect the 2D pose of multiple people in an image. The approach uses a nonparametric representation, which we refer to as Part Affinity Fields (PAFs), to learn to associate body parts with individuals in the image. The architecture encodes global context, allowing a greedy bottom-up parsing step that maintains high accuracy while achieving realtime performance, irrespective of the number of people in the image. The architecture is designed to jointly learn part locations and their association via two branches of the same sequential prediction process. Our method placed first in the inaugural COCO 2016 keypoints challenge, and significantly exceeds the previous state-of-the-art result on the MPII Multi-Person benchmark, both in performance and efficiency.

3,958 citations

Posted Content•

Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields

Zhe Cao¹, Tomas Simon¹, Shih-En Wei¹, Yaser Sheikh¹•Institutions (1)

Carnegie Mellon University¹

24 Nov 2016-arXiv: Computer Vision and Pattern Recognition

TL;DR: This work presents an approach to efficiently detect the 2D pose of multiple people in an image using a nonparametric representation, which it refers to as Part Affinity Fields (PAFs), to learn to associate body parts with individuals in the image.

3,791 citations

Journal Article•DOI•

OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields

Zhe Cao¹, Gines Hidalgo², Tomas Simon³, Shih-En Wei³, Yaser Sheikh²•Institutions (3)

University of California, Berkeley¹, Carnegie Mellon University², Facebook³

01 Jan 2021-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: OpenPose uses Part Affinity Fields (PAFs) to learn to associate body parts with individuals in the image, achieving high accuracy and realtime performance regardless of the number of people in the image.

Abstract: Realtime multi-person 2D pose estimation is a key component in enabling machines to have an understanding of people in images and videos. In this work, we present a realtime approach to detect the 2D pose of multiple people in an image. The proposed method uses a nonparametric representation, which we refer to as Part Affinity Fields (PAFs), to learn to associate body parts with individuals in the image. This bottom-up system achieves high accuracy and realtime performance, regardless of the number of people in the image. In previous work, PAFs and body part location estimation were refined simultaneously across training stages. We demonstrate that a PAF-only refinement rather than both PAF and body part location refinement results in a substantial increase in both runtime performance and accuracy. We also present the first combined body and foot keypoint detector, based on an internal annotated foot dataset that we have publicly released. We show that the combined detector not only reduces the inference time compared to running them sequentially, but also maintains the accuracy of each component individually. This work has culminated in the release of OpenPose, the first open-source realtime system for multi-person 2D pose detection, including body, foot, hand, and facial keypoints.

2,911 citations

Proceedings Article•DOI•

Convolutional Pose Machines

Shih-En Wei¹, Varun Ramakrishna¹, Takeo Kanade¹, Yaser Sheikh¹•Institutions (1)

Carnegie Mellon University¹

30 Jan 2016

TL;DR: In this paper, a convolutional network is incorporated into the pose machine framework for learning image features and image-dependent spatial models for the task of pose estimation, which can implicitly model long-range dependencies between variables in structured prediction tasks such as articulated pose estimation.

Abstract: Pose Machines provide a sequential prediction framework for learning rich implicit spatial models. In this work we show a systematic design for how convolutional networks can be incorporated into the pose machine framework for learning image features and image-dependent spatial models for the task of pose estimation. The contribution of this paper is to implicitly model long-range dependencies between variables in structured prediction tasks such as articulated pose estimation. We achieve this by designing a sequential architecture composed of convolutional networks that directly operate on belief maps from previous stages, producing increasingly refined estimates for part locations, without the need for explicit graphical model-style inference. Our approach addresses the characteristic difficulty of vanishing gradients during training by providing a natural learning objective function that enforces intermediate supervision, thereby replenishing back-propagated gradients and conditioning the learning procedure. We demonstrate state-of-the-art performance and outperform competing methods on standard benchmarks including the MPII, LSP, and FLIC datasets.

2,687 citations

Posted Content•

OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields

Zhe Cao¹, Gines Hidalgo², Tomas Simon³, Shih-En Wei³, Yaser Sheikh²•Institutions (3)

University of California, Berkeley¹, Carnegie Mellon University², Facebook³

18 Dec 2018-arXiv: Computer Vision and Pattern Recognition

TL;DR: This work releases OpenPose, the first open-source realtime system for multi-person 2D pose detection, including body, foot, hand, and facial keypoints, and presents the first combined body and foot keypoint detector, based on an internal annotated foot dataset.

986 citations

Cited by

Journal Article•DOI•

I and i

Kevin Barraclough

08 Dec 2001-BMJ

TL;DR: The author reflects on i, the square root of minus one, which seemed an odd beast when first encountered at school: an intruder hovering on the edge of reality.

Abstract: There is, I think, something ethereal about i —the square root of minus one. I remember first hearing about it at school. It seemed an odd beast at that time—an intruder hovering on the edge of reality. Usually familiarity dulls this sense of the bizarre, but in the case of i it was the reverse: over the years the sense of its surreal nature intensified. It seemed that it was impossible to write mathematics that described the real world in …

33,785 citations

Proceedings Article•DOI•

Mask R-CNN

Kaiming He¹, Georgia Gkioxari¹, Piotr Dollár², Ross Girshick²•Institutions (2)

Facebook¹, École Centrale Paris²

20 Mar 2017

TL;DR: This work presents a conceptually simple, flexible, and general framework for object instance segmentation, which extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition.

Abstract: We present a conceptually simple, flexible, and general framework for object instance segmentation. Our approach efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance. The method, called Mask R-CNN, extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition. Mask R-CNN is simple to train and adds only a small overhead to Faster R-CNN, running at 5 fps. Moreover, Mask R-CNN is easy to generalize to other tasks, e.g., allowing us to estimate human poses in the same framework. We show top results in all three tracks of the COCO suite of challenges, including instance segmentation, bounding-box object detection, and person keypoint detection. Without tricks, Mask R-CNN outperforms all existing, single-model entries on every task, including the COCO 2016 challenge winners. We hope our simple and effective approach will serve as a solid baseline and help ease future research in instance-level recognition. Code will be made available.

14,299 citations

Proceedings Article•

Mask R-CNN

Kaiming He¹, Georgia Gkioxari², Piotr Dollár³, Ross Girshick³•Institutions (3)

Microsoft¹, University of California², École Centrale Paris³

20 Mar 2017

TL;DR: This work presents a conceptually simple, flexible, and general framework for object instance segmentation that outperforms all existing, single-model entries on every task, including the COCO 2016 challenge winners.

Abstract: We present a conceptually simple, flexible, and general framework for object instance segmentation. Our approach efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance. The method, called Mask R-CNN, extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition. Mask R-CNN is simple to train and adds only a small overhead to Faster R-CNN, running at 5 fps. Moreover, Mask R-CNN is easy to generalize to other tasks, e.g., allowing us to estimate human poses in the same framework. We show top results in all three tracks of the COCO suite of challenges, including instance segmentation, bounding-box object detection, and person keypoint detection. Without bells and whistles, Mask R-CNN outperforms all existing, single-model entries on every task, including the COCO 2016 challenge winners. We hope our simple and effective approach will serve as a solid baseline and help ease future research in instance-level recognition. Code has been made available at: this https URL

11,343 citations

Proceedings Article•DOI•

Realtime Multi-person 2D Pose Estimation Using Part Affinity Fields

Zhe Cao¹, Tomas Simon¹, Shih-En Wei¹, Yaser Sheikh¹•Institutions (1)

Carnegie Mellon University¹

21 Jul 2017

3,958 citations

Book Chapter•DOI•

Stacked Hourglass Networks for Human Pose Estimation

Alejandro Newell¹, Kaiyu Yang¹, Jia Deng¹•Institutions (1)

University of Michigan¹

08 Oct 2016

TL;DR: This work introduces a novel convolutional network architecture for the task of human pose estimation that is described as a “stacked hourglass” network based on the successive steps of pooling and upsampling that are done to produce a final set of predictions.

Abstract: This work introduces a novel convolutional network architecture for the task of human pose estimation. Features are processed across all scales and consolidated to best capture the various spatial relationships associated with the body. We show how repeated bottom-up, top-down processing used in conjunction with intermediate supervision is critical to improving the performance of the network. We refer to the architecture as a “stacked hourglass” network based on the successive steps of pooling and upsampling that are done to produce a final set of predictions. State-of-the-art results are achieved on the FLIC and MPII benchmarks outcompeting all recent methods.

3,865 citations
