Author

Michael S. Ryoo

Bio: Michael S. Ryoo is an academic researcher at Stony Brook University. His research topics include activity recognition and convolutional neural networks. He has an h-index of 34 and has co-authored 136 publications receiving 6,435 citations. His previous affiliations include the California Institute of Technology and Indiana University.


Papers
Journal Article · DOI
TL;DR: This article provides a detailed overview of various state-of-the-art research papers on human activity recognition, discussing both the methodologies developed for simple human actions and those for high-level activities.
Abstract: Human activity recognition is an important area of computer vision research. Its applications include surveillance systems, patient monitoring systems, and a variety of systems that involve interactions between persons and electronic devices such as human-computer interfaces. Most of these applications require an automated recognition of high-level activities, composed of multiple simple (or atomic) actions of persons. This article provides a detailed overview of various state-of-the-art research papers on human activity recognition. We discuss both the methodologies developed for simple human actions and those for high-level activities. An approach-based taxonomy is chosen that compares the advantages and limitations of each approach. Recognition methodologies for an analysis of the simple actions of a single person are first presented in the article. Space-time volume approaches and sequential approaches that represent and recognize activities directly from input images are discussed. Next, hierarchical recognition methodologies for high-level activities are presented and compared. Statistical approaches, syntactic approaches, and description-based approaches for hierarchical recognition are discussed in the article. In addition, we further discuss the papers on the recognition of human-object interactions and group activities. Public datasets designed for the evaluation of the recognition methodologies are illustrated in our article as well, comparing the methodologies' performances. This review will provide the impetus for future research in more productive areas.
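As a rough illustration of the space-time volume idea surveyed above (this sketch is not from the article itself; spacetime_volume_score and its normalization are hypothetical), an activity can be treated as a 3-D XYT template and recognized by sliding it along a query video and scoring the normalized cross-correlation:

import numpy as np

def spacetime_volume_score(template, video):
    # template, video: grayscale volumes of shape (T, H, W) with
    # matching H and W; slide the template along the time axis and
    # return the best normalized cross-correlation score in [-1, 1].
    t_len = template.shape[0]
    t_norm = (template - template.mean()) / (template.std() + 1e-8)
    best = -np.inf
    for start in range(video.shape[0] - t_len + 1):
        window = video[start:start + t_len]
        w_norm = (window - window.mean()) / (window.std() + 1e-8)
        best = max(best, float((t_norm * w_norm).mean()))
    return best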

2,084 citations

Proceedings Article · DOI
06 Nov 2011
TL;DR: A new recognition methodology named dynamic bag-of-words is developed, which considers the sequential nature of human activities while maintaining the advantages of the bag-of-words representation in handling noisy observations, and reliably recognizes ongoing activities from streaming videos with high accuracy.
Abstract: In this paper, we present a novel approach to human activity prediction. Human activity prediction is a probabilistic process of inferring ongoing activities from videos containing only the onsets (i.e., the beginning parts) of the activities. The goal is to enable early recognition of unfinished activities, as opposed to the after-the-fact classification of completed activities. Activity prediction methodologies are particularly necessary for surveillance systems that are required to prevent crimes and dangerous activities from occurring. We probabilistically formulate the activity prediction problem and introduce new methodologies designed for prediction. We represent an activity as an integral histogram of spatio-temporal features, efficiently modeling how feature distributions change over time. A new recognition methodology named dynamic bag-of-words is developed, which considers the sequential nature of human activities while maintaining the bag-of-words' advantages in handling noisy observations. Our experiments confirm that our approach reliably recognizes ongoing activities from streaming videos with high accuracy.
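A minimal sketch of the integral-histogram idea described above, assuming spatio-temporal features have already been quantized into visual words per frame (the function names and the per-frame word lists are illustrative, not the paper's code). Cumulative histograms over time let the histogram of any temporal segment be recovered by a single subtraction, which is what makes matching sub-segments of an ongoing activity cheap:

import numpy as np

def integral_histogram(frame_words, vocab_size):
    # frame_words[t] lists the visual-word indices detected at frame t.
    # Returns I of shape (T + 1, vocab_size) such that the histogram of
    # any segment [a, b) is I[b] - I[a], obtained in O(vocab_size).
    I = np.zeros((len(frame_words) + 1, vocab_size))
    for t, words in enumerate(frame_words):
        I[t + 1] = I[t]
        for w in words:
            I[t + 1, w] += 1
    return I

def segment_histogram(I, a, b):
    # Histogram of features observed in frames a..b-1.
    return I[b] - I[a]

Dynamic bag-of-words can then divide the observation so far into sub-segments and compare each segment histogram (via segment_histogram) against the corresponding portion of each activity model.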

617 citations

Proceedings Article · DOI
01 Sep 2009
TL;DR: A novel matching method, the spatio-temporal relationship match, is designed to measure structural similarity between sets of features extracted from two videos, thereby enabling detection and localization of complex non-periodic activities.
Abstract: Human activity recognition is a challenging task, especially when the background is unknown or changing, and when scale or illumination differs across videos. Approaches utilizing spatio-temporal local features have proved able to cope with such difficulties, but they have mainly focused on classifying short videos of simple periodic actions. In this paper, we present a new activity recognition methodology that overcomes the limitations of previous approaches using local features. We introduce a novel matching method, the spatio-temporal relationship match, designed to measure structural similarity between sets of features extracted from two videos. Our match hierarchically considers spatio-temporal relationships among feature points, thereby enabling detection and localization of complex non-periodic activities. In contrast to previous approaches that 'classify' videos, our approach is designed to 'detect and localize' all occurring activities in continuous videos where multiple actors and pedestrians are present. We implement and test our methodology on a newly introduced dataset containing videos of multiple interacting persons and individual pedestrians. The results confirm that our system is able to recognize complex non-periodic activities (e.g., 'push' and 'hug') from sets of spatio-temporal features even when multiple activities are present in the scene.
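A crude sketch of the flavor of such a match (the relation vocabulary, thresholds, and scoring below are assumptions for illustration; the paper's actual match is hierarchical and more discriminative): pairwise spatio-temporal relations among feature points are histogrammed, and the two videos' relation histograms are intersected.

from collections import Counter
import numpy as np

def relation(p, q, tau=10, rho=30):
    # Qualitative spatio-temporal relation between two features given
    # as (x, y, t) triples; thresholds tau (frames) and rho (pixels)
    # are illustrative, not values from the paper.
    if q[2] - p[2] > tau:
        temporal = 'before'
    elif p[2] - q[2] > tau:
        temporal = 'after'
    else:
        temporal = 'co-occur'
    spatial = 'near' if np.hypot(p[0] - q[0], p[1] - q[1]) < rho else 'far'
    return (temporal, spatial)

def relationship_match_score(feats1, feats2):
    # Histogram the pairwise relations within each video and take the
    # normalized histogram intersection as a structural-similarity score.
    def hist(feats):
        return Counter(relation(p, q)
                       for i, p in enumerate(feats) for q in feats[i + 1:])
    h1, h2 = hist(feats1), hist(feats2)
    overlap = sum(min(h1[k], h2[k]) for k in set(h1) | set(h2))
    return overlap / max(sum(h1.values()), sum(h2.values()), 1)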

612 citations

Proceedings Article · DOI
23 Jun 2013
TL;DR: This paper investigates multi-channel kernels to integrate global and local motion information, and presents a new activity learning/recognition methodology that explicitly considers temporal structures displayed in first-person activity videos.
Abstract: This paper discusses the problem of recognizing interaction-level human activities from a first-person viewpoint. The goal is to enable an observer (e.g., a robot or a wearable camera) to understand 'what activity others are performing to it' from continuous video inputs. These include friendly interactions such as 'a person hugging the observer' as well as hostile interactions like 'punching the observer' or 'throwing objects at the observer', whose videos involve a large amount of camera ego-motion caused by physical interactions. The paper investigates multi-channel kernels to integrate global and local motion information, and presents a new activity learning/recognition methodology that explicitly considers the temporal structures displayed in first-person activity videos. In our experiments, we not only show classification results on segmented videos, but also confirm that our new approach is able to reliably detect activities from continuous videos.
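A minimal sketch of a multi-channel kernel of the kind investigated here, assuming one histogram per channel (e.g., a global motion descriptor and a local spatio-temporal feature histogram); the chi-square combination and the weights are a common formulation, not necessarily the paper's exact kernel:

import numpy as np

def chi2_distance(h1, h2):
    # Chi-square distance between two L1-normalized histograms.
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + 1e-10))

def multichannel_kernel(x_channels, y_channels, weights):
    # K(x, y) = exp(-sum_c w_c * chi2(x_c, y_c)): each channel
    # contributes its own distance, so complementary global and local
    # motion cues are integrated and one noisy channel cannot dominate.
    d = sum(w * chi2_distance(xc, yc)
            for xc, yc, w in zip(x_channels, y_channels, weights))
    return np.exp(-d)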

323 citations

Proceedings Article · DOI
17 Jun 2006
TL;DR: The system was tested to represent and recognize eight types of interactions: approach, depart, point, shake-hands, hug, punch, kick, and push, and the results show that the system is able to represent composite actions and interactions naturally.
Abstract: This paper describes a general methodology for automated recognition of complex human activities. The methodology uses a context-free grammar (CFG) based representation scheme to represent composite actions and interactions. The CFG-based representation enables us to formally define complex human activities based on simple actions or movements. Human activities are classified into three categories: atomic action, composite action, and interaction. Our system is not only able to represent complex human activities formally, but also able to recognize represented actions and interactions with high accuracy. Image sequences are processed to extract poses and gestures. Based on gestures, the system detects actions and interactions occurring in a sequence of image frames. Our results show that the system is able to represent composite actions and interactions naturally. The system was tested to represent and recognize eight types of interactions: approach, depart, point, shake-hands, hug, punch, kick, and push. The experiments show that the system can recognize sequences of represented composite actions and interactions with a high recognition rate.
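To make the CFG-based representation concrete, here is a toy grammar and recognizer (the productions and gesture names below are invented for illustration; the paper's grammar is richer): a composite interaction such as PUSH is defined in terms of atomic gestures and sub-activities, and a detected gesture sequence is accepted if it can be derived from the start symbol.

# Nonterminals expand to sequences of atomic gestures or other
# nonterminals; symbols absent from GRAMMAR are terminal gestures.
GRAMMAR = {
    'PUSH':     [['APPROACH', 'stretch_arm', 'touched', 'DEPART']],
    'APPROACH': [['move_forward'], ['move_forward', 'APPROACH']],
    'DEPART':   [['move_back'], ['move_back', 'DEPART']],
}

def derives(symbol, tokens):
    # True if `tokens` (a tuple of atomic gestures) can be derived
    # from `symbol` under GRAMMAR.
    if symbol not in GRAMMAR:  # terminal: must match a single token
        return len(tokens) == 1 and tokens[0] == symbol
    return any(matches(prod, tokens) for prod in GRAMMAR[symbol])

def matches(production, tokens):
    # Try every split of `tokens` across the symbols of one production.
    if not production:
        return len(tokens) == 0
    head, rest = production[0], production[1:]
    return any(derives(head, tokens[:i]) and matches(rest, tokens[i:])
               for i in range(1, len(tokens) + 1))

# derives('PUSH', ('move_forward', 'stretch_arm', 'touched', 'move_back'))
# -> True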

287 citations


Cited by
Proceedings Article · DOI
01 Dec 2013
TL;DR: Dense trajectories were recently shown to be an efficient video representation for action recognition, achieving state-of-the-art results on a variety of datasets; this paper improves them further by estimating camera motion and correcting for it.
Abstract: Recently, dense trajectories were shown to be an efficient video representation for action recognition and achieved state-of-the-art results on a variety of datasets. This paper improves their performance by taking camera motion into account to correct them. To estimate camera motion, we match feature points between frames using SURF descriptors and dense optical flow, which are shown to be complementary. These matches are then used to robustly estimate a homography with RANSAC. Human motion is in general different from camera motion and generates inconsistent matches; to improve the estimation, a human detector is employed to remove these matches. Given the estimated camera motion, we remove trajectories consistent with it. We also use this estimation to cancel out camera motion from the optical flow, which significantly improves motion-based descriptors such as HOF and MBH. Experimental results on four challenging action datasets (i.e., Hollywood2, HMDB51, Olympic Sports and UCF50) significantly outperform the current state of the art.
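A condensed sketch of the camera-motion correction pipeline using OpenCV (ORB features are substituted here for the paper's SURF descriptors, since SURF requires opencv-contrib, and the human-detector masking step is omitted):

import cv2
import numpy as np

def estimate_camera_motion(prev_gray, curr_gray):
    # Estimate a frame-to-frame homography from keypoint matches,
    # robustly via RANSAC.
    orb = cv2.ORB_create(nfeatures=2000)
    k1, d1 = orb.detectAndCompute(prev_gray, None)
    k2, d2 = orb.detectAndCompute(curr_gray, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(d1, d2)
    src = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
    return H

def cancel_camera_flow(flow, H, shape):
    # Subtract the flow induced by homography H from a dense flow
    # field of shape (h, w, 2), leaving (approximately) human motion.
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float32)
    pts = np.stack([xs.ravel(), ys.ravel()], axis=1).reshape(-1, 1, 2)
    warped = cv2.perspectiveTransform(pts, H).reshape(h, w, 2)
    camera_flow = warped - np.stack([xs, ys], axis=2)
    return flow - camera_flow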

3,487 citations

Journal Article · DOI
TL;DR: This survey reviews recent trends in video-based human capture and analysis and discusses open problems that future research must address to achieve automatic visual analysis of human movement.

2,738 citations

Proceedings Article · DOI
27 Jun 2016
TL;DR: This work proposes an LSTM model that learns general human movement and predicts pedestrians' future trajectories, outperforming state-of-the-art methods on some of the evaluated public datasets.
Abstract: Pedestrians follow different trajectories to avoid obstacles and accommodate fellow pedestrians. Any autonomous vehicle navigating such a scene should be able to foresee the future positions of pedestrians and accordingly adjust its path to avoid collisions. This problem of trajectory prediction can be viewed as a sequence generation task, where we are interested in predicting the future trajectory of people based on their past positions. Following the recent success of Recurrent Neural Network (RNN) models for sequence prediction tasks, we propose an LSTM model which can learn general human movement and predict their future trajectories. This is in contrast to traditional approaches which use hand-crafted functions such as Social forces. We demonstrate the performance of our method on several public datasets. Our model outperforms state-of-the-art methods on some of these datasets. We also analyze the trajectories predicted by our model to demonstrate the motion behaviour learned by our model.
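A minimal single-pedestrian sketch of the idea in PyTorch (module and function names are illustrative; the paper's full Social LSTM additionally pools hidden states across neighboring pedestrians, which is omitted here):

import torch
import torch.nn as nn

class TrajectoryLSTM(nn.Module):
    # Encodes a pedestrian's past (x, y) positions and regresses the
    # next displacement.
    def __init__(self, hidden=64):
        super().__init__()
        self.embed = nn.Linear(2, hidden)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)

    def forward(self, past):                  # past: (batch, T, 2)
        h, _ = self.lstm(torch.relu(self.embed(past)))
        return self.head(h[:, -1])            # predicted next (dx, dy)

def predict(model, past, steps=12):
    # Roll the model forward to generate a short future trajectory.
    traj, window = [], past
    with torch.no_grad():
        for _ in range(steps):
            nxt = window[:, -1] + model(window)
            traj.append(nxt)
            window = torch.cat([window, nxt.unsqueeze(1)], dim=1)
    return torch.stack(traj, dim=1)           # (batch, steps, 2)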

2,587 citations

Journal Article · DOI
TL;DR: A detailed overview of current advances in vision-based human action recognition is provided, including a discussion of the limitations of the state of the art and an outline of promising directions for research.

2,282 citations