Amir Shahroudy

Bio: Amir Shahroudy is an academic researcher from Nanyang Technological University. The author has contributed to research on topics including feature extraction and deep learning. The author has an h-index of 4 and has co-authored 5 publications receiving 1510 citations.

Papers

Posted Content

Recent Advances in Convolutional Neural Networks

Jiuxiang Gu, Zhenhua Wang, Jason Kuen, Lianyang Ma, Amir Shahroudy, Bing Shuai, Ting Liu, Xingxing Wang, Li Wang, Gang Wang, Jianfei Cai, Tsuhan Chen

22 Dec 2015-arXiv: Computer Vision and Pattern Recognition

TL;DR: This paper details the improvements of CNN on different aspects, including layer design, activation function, loss function, regularization, optimization and fast computation, and introduces various applications of convolutional neural networks in computer vision, speech and natural language processing.

Abstract: In the last few years, deep learning has led to very good performance on a variety of problems, such as visual recognition, speech recognition and natural language processing. Among different types of deep neural networks, convolutional neural networks have been most extensively studied. Leveraging the rapid growth in the amount of annotated data and the great improvements in the strength of graphics processing units, research on convolutional neural networks has emerged swiftly and achieved state-of-the-art results on various tasks. In this paper, we provide a broad survey of the recent advances in convolutional neural networks. We detail the improvements of CNN on different aspects, including layer design, activation function, loss function, regularization, optimization and fast computation. We also introduce various applications of convolutional neural networks in computer vision, speech and natural language processing.

1,302 citations

Journal Article

Multimodal Multipart Learning for Action Recognition in Depth Videos

Amir Shahroudy¹, Tian-Tsong Ng², Qingxiong Yang³, Gang Wang¹

Nanyang Technological University¹, Institute for Infocomm Research Singapore², City University of Hong Kong³

01 Oct 2016-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: This work proposes a joint sparse regression based learning method that uses structured sparsity to model each action as a combination of multimodal features from a sparse set of body parts, with a heterogeneous set of depth and skeleton based features representing the dynamics and appearance of the parts.

Abstract: The articulated and complex nature of human actions makes the task of action recognition difficult. One approach to handle this complexity is dividing it into the kinetics of body parts and analyzing the actions based on these partial descriptors. We propose a joint sparse regression based learning method which utilizes the structured sparsity to model each action as a combination of multimodal features from a sparse set of body parts. To represent dynamics and appearance of parts, we employ a heterogeneous set of depth and skeleton based features. The proper structure of multimodal multipart features is formulated into the learning framework via the proposed hierarchical mixed norm, to regularize the structured features of each part and to apply sparsity between them, in favor of a group feature selection. Our experimental results demonstrate the effectiveness of the proposed learning method, which outperforms other methods on all three tested datasets while saturating one of them by achieving perfect accuracy.
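The abstract does not spell out the form of the hierarchical mixed norm, so the following is only a rough sketch of the general idea of group sparsity, using the standard L2,1 norm (the sum over groups of each group's Euclidean norm) as a stand-in. The function names, the grouping of weights by body part, and the example values are all hypothetical and are not taken from the paper.

```python
import math


def l21_mixed_norm(groups):
    """Sum over groups of the Euclidean (L2) norm of each group's weights."""
    return sum(math.sqrt(sum(w * w for w in group)) for group in groups)


def group_soft_threshold(groups, threshold):
    """Proximal step for the L2,1 norm.

    Each group is shrunk toward zero as a whole; a group whose L2 norm is
    at or below the threshold becomes exactly zero (the group is dropped).
    """
    shrunk = []
    for group in groups:
        norm = math.sqrt(sum(w * w for w in group))
        scale = max(0.0, 1.0 - threshold / norm) if norm > 0 else 0.0
        shrunk.append([scale * w for w in group])
    return shrunk


# Hypothetical weights: one group of feature weights per body part.
part_weights = [[3.0, 4.0], [0.1, 0.2], [0.0, 0.0]]
penalty = l21_mixed_norm(part_weights)
selected = group_soft_threshold(part_weights, threshold=1.0)
kept_parts = [any(w != 0.0 for w in group) for group in selected]
print(penalty, kept_parts)
```

Under these assumptions, the penalty is small within a group but pushes whole groups to zero, which is the sense in which a mixed norm can favor selecting a sparse set of parts rather than individual features.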

128 citations

Proceedings Article

SSNet: Scale Selection Network for Online 3D Action Prediction

Jun Liu¹, Amir Shahroudy¹, Gang Wang², Ling-Yu Duan³, Alex C. Kot¹

Nanyang Technological University¹, Alibaba Group², Peking University³

18 Jun 2018

TL;DR: This paper focuses on online action prediction in streaming 3D skeleton sequences and proposes a novel window scale selection scheme to make the network focus on the performed part of the ongoing action and try to suppress the noise from the previous actions at each time step.

Abstract: In action prediction (early action recognition), the goal is to predict the class label of an ongoing action using the part observed so far. In this paper, we focus on online action prediction in streaming 3D skeleton sequences. A dilated convolutional network is introduced to model the motion dynamics in the temporal dimension via a sliding window over the time axis. As there are significant temporal scale variations of the observed part of the ongoing action at different progress levels, we propose a novel window scale selection scheme to make our network focus on the performed part of the ongoing action and try to suppress the noise from the previous actions at each time step. Furthermore, an activation sharing scheme is proposed to deal with the overlapping computations among the adjacent steps, which allows our model to run more efficiently. Extensive experiments on two challenging datasets show the effectiveness of the proposed action prediction framework.
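As a minimal illustration of what a dilated convolution over the time axis computes, the sketch below implements a one-dimensional dilated convolution that only looks at past frames, together with the receptive field of a stack of such layers. The abstract gives no kernel sizes, dilation rates, or padding scheme, so the past-only (causal) form, the zero padding, and all names and values here are assumptions for illustration, not the SSNet architecture.

```python
def dilated_causal_conv1d(signal, kernel, dilation):
    """Compute out[t] = sum_k kernel[k] * signal[t - k * dilation].

    Samples before the start of the sequence are treated as zero, so the
    output at time t depends only on frames observed up to t.
    """
    out = []
    for t in range(len(signal)):
        total = 0.0
        for k, weight in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:
                total += weight * signal[idx]
        out.append(total)
    return out


def receptive_field(kernel_size, dilations):
    """Number of time steps covered by a stack of dilated layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Hypothetical single-channel sequence and a two-tap kernel.
frames = [1, 2, 3, 4, 5]
print(dilated_causal_conv1d(frames, kernel=[1, 1], dilation=2))
print(receptive_field(kernel_size=2, dilations=[1, 2, 4]))
```

With dilations that double per layer, the window of frames seen by the top layer grows quickly with depth, so different layers correspond to different temporal scales.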

61 citations

Proceedings Article

Multi-modal feature fusion for action recognition in RGB-D sequences

Amir Shahroudy¹, Gang Wang¹, Tian-Tsong Ng²

Nanyang Technological University¹, Institute for Infocomm Research Singapore²

21 May 2014

TL;DR: This paper proposes a new hierarchical bag-of-words feature fusion technique based on multi-view structured sparsity learning to fuse atomic features from RGB and skeletons for the task of action recognition.

Abstract: Microsoft Kinect's output is a multi-modal signal which gives RGB videos, depth sequences and skeleton information simultaneously. Various action recognition techniques focused on different single modalities of the signals and built their classifiers over the features extracted from one of these channels. For better recognition performance, it's desirable to fuse this multi-modal information into an integrated set of discriminative features. Most current fusion methods merged heterogeneous features in a holistic manner and ignored the complementary properties of these modalities at finer levels. In this paper, we proposed a new hierarchical bag-of-words feature fusion technique based on multi-view structured sparsity learning to fuse atomic features from RGB and skeletons for the task of action recognition.

59 citations

Dissertation

Activity recognition in depth videos

Amir Shahroudy

01 Jan 2016

TL;DR: This thesis proposes a joint sparse regression based learning method which utilizes the structured sparsity to model each action as a combination of multimodal features from a sparse set of body parts, and proposes a new deep autoencoder-based correlation-independence factorization network to separate input multi-modality signals into a hierarchy of extracted components.

Abstract: The introduction of depth sensors has made a big impact on research in visual recognition. By providing 3D information, these cameras help us to have a view-invariant and robust representation of the observed scenes and human bodies. Detection and 3D localization of human body parts are done more accurately and more efficiently in depth maps in comparison with RGB counterparts. Even with the 3D structure of the body parts available, the articulated and complex nature of human actions makes the task of action recognition difficult. One approach to handle this complexity is dividing it into the kinetics of body parts and analyzing the actions based on the partial descriptors. As the first work in this thesis, we propose a joint sparse regression based learning method which utilizes the structured sparsity to model each action as a combination of multimodal features from a sparse set of body parts. To represent dynamics and appearance of parts, we employ a heterogeneous set of depth and skeleton based features. The proper structure of multimodal multipart features is formulated into the learning framework via the proposed hierarchical mixed norm, to regularize the structured features of each part and to apply sparsity between them, in favor of a group feature selection. Our experimental results demonstrate the effectiveness of the proposed learning method, which outperforms other methods on all three tested datasets while saturating one of them by achieving perfect accuracy. In addition to depth based representation of human actions, commonly used 3D sensors also provide RGB videos. It is generally accepted that each of these two modalities has different strengths and limitations for the task of action recognition. Therefore, analysis of the RGB+D videos can help us to better study the complementary properties of these two types of modalities and achieve higher levels of performance.
In the second work, we propose a new deep autoencoder-based correlation-independence factorization network to separate input multimodal signals into a hierarchy of extracted components. Further, based on the structure of the features, a structured sparsity learning machine is proposed which utilizes mixed norms to apply regularization within components and group selection between them for better classification performance. Our experimental results show the effectiveness of our cross-modality feature analysis

Cited by

Journal Article

A survey on deep learning in medical image analysis

Geert Litjens¹, Thijs Kooi¹, Babak Ehteshami Bejnordi¹, Arnaud Arindra Adiyoso Setio¹, Francesco Ciompi¹, Mohsen Ghafoorian¹, Jeroen van der Laak¹, Bram van Ginneken¹, Clara I. Sánchez¹

Radboud University Nijmegen¹

01 Dec 2017-Medical Image Analysis

TL;DR: This paper reviews the major deep learning concepts pertinent to medical image analysis and summarizes over 300 contributions to the field, most of which appeared in the last year, to survey the use of deep learning for image classification, object detection, segmentation, registration, and other tasks.

8,730 citations

Journal Article

Deep convolutional neural networks for image classification: A comprehensive review

Waseem Rawat¹, Zenghui Wang¹

University of South Africa¹

01 Sep 2017-Neural Computation

TL;DR: This review, which focuses on the application of CNNs to image classification tasks, covers their development, from their predecessors up to recent state-of-the-art deep learning systems.

Abstract: Convolutional neural networks (CNNs) have been applied to visual tasks since the late 1980s. However, despite a few scattered applications, they were dormant until the mid-2000s when developments in computing power and the advent of large amounts of labeled data, supplemented by improved algorithms, contributed to their advancement and brought them to the forefront of a neural network renaissance that has seen rapid progression since 2012. In this review, which focuses on the application of CNNs to image classification tasks, we cover their development, from their predecessors up to recent state-of-the-art deep learning systems. Along the way, we analyze (1) their early successes, (2) their role in the deep learning renaissance, (3) selected symbolic works that have contributed to their recent popularity, and (4) several improvement attempts by reviewing contributions and challenges of over 300 publications. We also introduce some of their current trends and remaining challenges.

2,366 citations

Journal Article

Deep Learning for Generic Object Detection: A Survey

Li Liu¹, Li Liu², Wanli Ouyang³, Xiaogang Wang⁴, Paul Fieguth⁵, Jie Chen¹, Xinwang Liu², Matti Pietikäinen¹

University of Oulu¹, National University of Defense Technology², University of Sydney³, The Chinese University of Hong Kong⁴, University of Waterloo⁵

01 Feb 2020-International Journal of Computer Vision

TL;DR: A comprehensive survey of the recent achievements in this field brought about by deep learning techniques, covering many aspects of generic object detection: detection frameworks, object feature representation, object proposal generation, context modeling, training strategies, and evaluation metrics.

Abstract: Object detection, one of the most fundamental and challenging problems in computer vision, seeks to locate object instances from a large number of predefined categories in natural images. Deep learning techniques have emerged as a powerful strategy for learning feature representations directly from data and have led to remarkable breakthroughs in the field of generic object detection. Given this period of rapid evolution, the goal of this paper is to provide a comprehensive survey of the recent achievements in this field brought about by deep learning techniques. More than 300 research contributions are included in this survey, covering many aspects of generic object detection: detection frameworks, object feature representation, object proposal generation, context modeling, training strategies, and evaluation metrics. We finish the survey by identifying promising directions for future research.

1,897 citations

Journal Article

A systematic study of the class imbalance problem in convolutional neural networks

Mateusz Buda¹, Atsuto Maki², Maciej A. Mazurowski¹

Duke University¹, Royal Institute of Technology²

01 Oct 2018-Neural Networks

TL;DR: The effect of class imbalance on classification performance is detrimental; the method of addressing class imbalance that emerged as dominant in almost all analyzed scenarios was oversampling; and thresholding should be applied to compensate for prior class probabilities when overall number of properly classified cases is of interest.

1,777 citations

Posted Content

NTU RGB+D: A Large Scale Dataset for 3D Human Activity Analysis

Amir Shahroudy¹, Jun Liu², Tian-Tsong Ng¹, Gang Wang²

Institute for Infocomm Research Singapore¹, Nanyang Technological University²

11 Apr 2016-arXiv: Computer Vision and Pattern Recognition

TL;DR: In this paper, a large-scale dataset for RGB+D human action recognition was introduced with more than 56 thousand video samples and 4 million frames, collected from 40 distinct subjects.

Abstract: Recent approaches in depth-based human activity analysis achieved outstanding performance and proved the effectiveness of 3D representation for classification of action classes. Currently available depth-based and RGB+D-based action recognition benchmarks have a number of limitations, including the lack of training samples, distinct class labels, camera views and variety of subjects. In this paper we introduce a large-scale dataset for RGB+D human action recognition with more than 56 thousand video samples and 4 million frames, collected from 40 distinct subjects. Our dataset contains 60 different action classes including daily, mutual, and health-related actions. In addition, we propose a new recurrent neural network structure to model the long-term temporal correlation of the features for each body part, and utilize them for better action classification. Experimental results show the advantages of applying deep learning methods over state-of-the-art hand-crafted features on the suggested cross-subject and cross-view evaluation criteria for our dataset. The introduction of this large scale dataset will enable the community to apply, develop and adapt various data-hungry learning techniques for the task of depth-based and RGB+D-based human activity analysis.

1,448 citations
