Author

Chenyang Si

Bio: Chenyang Si is an academic researcher from the Chinese Academy of Sciences. The author has contributed to research in topics: Computer science & Discriminative model. The author has an h-index of 9 and has co-authored 19 publications receiving 807 citations.

Papers
Proceedings ArticleDOI
Chenyang Si, Wentao Chen, Wei Wang, Liang Wang, Tieniu Tan
15 Jun 2019
TL;DR: Si et al. propose an attention enhanced graph convolutional LSTM network (AGC-LSTM) for human action recognition from skeleton data, which can not only capture discriminative features in spatial configuration and temporal dynamics but also explore the co-occurrence relationship between spatial and temporal domains.
Abstract: Skeleton-based action recognition is an important task that requires the adequate understanding of movement characteristics of a human action from the given skeleton sequence. Recent studies have shown that exploring spatial and temporal features of the skeleton sequence is vital for this task. Nevertheless, how to effectively extract discriminative spatial and temporal features is still a challenging problem. In this paper, we propose a novel Attention Enhanced Graph Convolutional LSTM Network (AGC-LSTM) for human action recognition from skeleton data. The proposed AGC-LSTM can not only capture discriminative features in spatial configuration and temporal dynamics but also explore the co-occurrence relationship between spatial and temporal domains. We also present a temporal hierarchical architecture to increase temporal receptive fields of the top AGC-LSTM layer, which boosts the ability to learn the high-level semantic representation and significantly reduces the computation cost. Furthermore, to select discriminative spatial information, the attention mechanism is employed to enhance information of key joints in each AGC-LSTM layer. Experimental results on two datasets are provided: NTU RGB+D dataset and Northwestern-UCLA dataset. The comparison results demonstrate the effectiveness of our approach and show that our approach outperforms the state-of-the-art methods on both datasets.
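The core idea lends itself to a compact illustration. Below is a minimal, hedged PyTorch sketch of an AGC-LSTM-style cell: the gate transforms of a standard LSTM are replaced by graph convolutions over the skeleton joints, and a soft spatial attention re-weights key joints in the hidden state. All module names, tensor shapes, and the exact attention form are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the AGC-LSTM idea (illustrative, not the paper's code).
import torch
import torch.nn as nn


class GraphConv(nn.Module):
    """One-hop graph convolution: aggregate neighbor joints, then project."""

    def __init__(self, in_dim, out_dim, adj):
        super().__init__()
        self.register_buffer("adj", adj)   # (J, J) normalized adjacency
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, x):  # x: (batch, J, in_dim)
        return self.proj(torch.einsum("ij,bjd->bid", self.adj, x))


class AGCLSTMCell(nn.Module):
    """LSTM-style cell over per-joint features with graph-convolutional gates."""

    def __init__(self, in_dim, hid_dim, adj):
        super().__init__()
        # All four gates computed jointly from [input, hidden] via graph conv.
        self.gates = GraphConv(in_dim + hid_dim, 4 * hid_dim, adj)
        self.att = nn.Linear(hid_dim, 1)   # spatial attention over joints

    def forward(self, x, state):  # x: (batch, J, in_dim)
        h, c = state
        i, f, g, o = self.gates(torch.cat([x, h], dim=-1)).chunk(4, dim=-1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        # Attention: emphasize informative (key) joints in the hidden state.
        w = torch.softmax(self.att(h), dim=1)   # (batch, J, 1)
        return h * (1.0 + w), c                 # residual-style re-weighting
```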

435 citations

Posted Content
Chenyang Si, Wentao Chen, Wei Wang, Liang Wang, Tieniu Tan
TL;DR: A novel Attention Enhanced Graph Convolutional LSTM Network (AGC-LSTM) for human action recognition from skeleton data can not only capture discriminative features in spatial configuration and temporal dynamics but also explore the co-occurrence relationship between spatial and temporal domains.
Abstract: Skeleton-based action recognition is an important task that requires the adequate understanding of movement characteristics of a human action from the given skeleton sequence. Recent studies have shown that exploring spatial and temporal features of the skeleton sequence is vital for this task. Nevertheless, how to effectively extract discriminative spatial and temporal features is still a challenging problem. In this paper, we propose a novel Attention Enhanced Graph Convolutional LSTM Network (AGC-LSTM) for human action recognition from skeleton data. The proposed AGC-LSTM can not only capture discriminative features in spatial configuration and temporal dynamics but also explore the co-occurrence relationship between spatial and temporal domains. We also present a temporal hierarchical architecture to increase temporal receptive fields of the top AGC-LSTM layer, which boosts the ability to learn the high-level semantic representation and significantly reduces the computation cost. Furthermore, to select discriminative spatial information, the attention mechanism is employed to enhance information of key joints in each AGC-LSTM layer. Experimental results on two datasets are provided: NTU RGB+D dataset and Northwestern-UCLA dataset. The comparison results demonstrate the effectiveness of our approach and show that our approach outperforms the state-of-the-art methods on both datasets.

382 citations

Book ChapterDOI
Chenyang Si, Ya Jing, Wei Wang, Liang Wang, Tieniu Tan
08 Sep 2018
TL;DR: A novel model with spatial reasoning and temporal stack learning (SR-TSL) for skeleton-based action recognition, which consists of a spatial reasoning network (SRN) and a temporal stack learning network (TSLN).
Abstract: Skeleton-based action recognition has made great progress recently, but many problems still remain unsolved. For example, the representations of skeleton sequences captured by most of the previous methods lack spatial structure information and detailed temporal dynamics features. In this paper, we propose a novel model with spatial reasoning and temporal stack learning (SR-TSL) for skeleton-based action recognition, which consists of a spatial reasoning network (SRN) and a temporal stack learning network (TSLN). The SRN can capture the high-level spatial structural information within each frame by a residual graph neural network, while the TSLN can model the detailed temporal dynamics of skeleton sequences by a composition of multiple skip-clip LSTMs. During training, we propose a clip-based incremental loss to optimize the model. We perform extensive experiments on the SYSU 3D Human-Object Interaction dataset and NTU RGB+D dataset and verify the effectiveness of each network of our model. The comparison results illustrate that our approach achieves much better results than the state-of-the-art methods.
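The clip-based incremental loss can be sketched briefly. One plausible reading, shown below in PyTorch, is that each clip emits a class prediction and later clips (which have observed more of the action) are penalized more heavily; the linear weighting here is an illustrative assumption, not the paper's exact formulation.

```python
# Hedged sketch of a clip-based incremental loss (weighting is assumed).
import torch
import torch.nn.functional as F


def clip_incremental_loss(clip_logits, target):
    """clip_logits: (num_clips, batch, num_classes); target: (batch,)."""
    num_clips = clip_logits.size(0)
    loss = 0.0
    for k in range(num_clips):
        weight = (k + 1) / num_clips   # later clips weighted more heavily
        loss = loss + weight * F.cross_entropy(clip_logits[k], target)
    return loss
```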

307 citations

Proceedings ArticleDOI
18 Jun 2018
TL;DR: This paper proposes a pose-based human image synthesis method which can keep the human posture unchanged in novel viewpoints, and adopts multistage adversarial losses separately for the foreground and background generation, which fully exploits the multi-modal characteristics of generative loss to generate more realistic-looking images.
Abstract: Human image synthesis has extensive practical applications, e.g., person re-identification and data augmentation for human pose estimation. However, it is much more challenging than rigid object synthesis, e.g., cars and chairs, due to the variability of human posture. In this paper, we propose a pose-based human image synthesis method which can keep the human posture unchanged in novel viewpoints. Furthermore, we adopt multistage adversarial losses separately for the foreground and background generation, which fully exploits the multi-modal characteristics of generative loss to generate more realistic-looking images. We perform extensive experiments on the Human3.6M dataset and verify the effectiveness of each stage of our method. The generated human images not only keep the same pose as the input image, but also have clear detailed foreground and background. The quantitative comparison results illustrate that our approach achieves much better results than several state-of-the-art methods.
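To make the foreground/background split concrete, here is a hedged PyTorch sketch of a generator-side objective with separate adversarial losses for the two regions. The discriminators `d_fg` and `d_bg` and the person mask are assumed inputs; the paper's actual multistage pipeline is more involved.

```python
# Illustrative sketch: separate adversarial losses for fore/background.
import torch
import torch.nn.functional as F


def generator_adv_loss(d_fg, d_bg, fake_img, mask):
    """Generator-side loss: fool both region discriminators.
    mask: (batch, 1, H, W) soft person mask splitting fore/background."""
    fg = fake_img * mask
    bg = fake_img * (1.0 - mask)
    logits_fg, logits_bg = d_fg(fg), d_bg(bg)
    # Non-saturating GAN loss on each region, summed.
    loss_fg = F.binary_cross_entropy_with_logits(
        logits_fg, torch.ones_like(logits_fg))
    loss_bg = F.binary_cross_entropy_with_logits(
        logits_bg, torch.ones_like(logits_bg))
    return loss_fg + loss_bg
```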

83 citations

Journal ArticleDOI
Ya Jing, Chenyang Si, Junbo Wang, Wei Wang, Liang Wang, Tieniu Tan
03 Apr 2020
TL;DR: Jing et al. propose a pose-guided multi-granularity attention network (PMA) to exploit multilevel corresponding visual contents, which employs pose information to learn latent semantic alignment between visual body parts and textual noun phrases.
Abstract: Text-based person search aims to retrieve the corresponding person images from an image database given a sentence describing the person, which holds great potential for various applications such as video surveillance. Extracting visual contents corresponding to the human description is the key to this cross-modal matching problem. Moreover, correlated images and descriptions involve different granularities of semantic relevance, which is usually ignored in previous methods. To exploit the multilevel corresponding visual contents, we propose a pose-guided multi-granularity attention network (PMA). Firstly, we propose a coarse alignment network (CA) to select the image regions related to the global description by a similarity-based attention. To further capture the phrase-related visual body part, a fine-grained alignment network (FA) is proposed, which employs pose information to learn latent semantic alignment between visual body part and textual noun phrase. To verify the effectiveness of our model, we perform extensive experiments on the CUHK Person Description Dataset (CUHK-PEDES), which is currently the only available dataset for text-based person search. Experimental results show that our approach outperforms the state-of-the-art methods by 15% in terms of the top-1 metric.
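The similarity-based attention in the coarse alignment step can be illustrated compactly. In the PyTorch sketch below, the global sentence embedding attends over image region features and the match is scored by cosine similarity; shapes and the temperature are illustrative assumptions rather than the paper's exact design.

```python
# Hedged sketch of similarity-based attention for coarse alignment.
import torch
import torch.nn.functional as F


def coarse_alignment_score(regions, sentence, temperature=0.1):
    """regions: (batch, R, D) region features; sentence: (batch, D)."""
    # Similarity of each region to the sentence drives the attention.
    sim = torch.einsum("brd,bd->br", F.normalize(regions, dim=-1),
                       F.normalize(sentence, dim=-1))
    attn = torch.softmax(sim / temperature, dim=-1)         # (batch, R)
    attended = torch.einsum("br,brd->bd", attn, regions)    # pooled visual
    return F.cosine_similarity(attended, sentence, dim=-1)  # match score
```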

63 citations


Cited by
Proceedings ArticleDOI
Ke Cheng, Yifan Zhang, Xiangyu He, Weihan Chen, Jian Cheng, Hanqing Lu
14 Jun 2020
TL;DR: The proposed Shift-GCN notably exceeds the state-of-the-art methods with more than 10 times less computational complexity, and is composed of novel shift graph operations and lightweight point-wise convolutions.
Abstract: Action recognition with skeleton data is attracting more attention in computer vision. Recently, graph convolutional networks (GCNs), which model the human body skeletons as spatiotemporal graphs, have obtained remarkable performance. However, the computational complexity of GCN-based methods is quite heavy, typically over 15 GFLOPs for one action sample. Recent works even reach about 100 GFLOPs. Another shortcoming is that the receptive fields of both the spatial graph and the temporal graph are inflexible. Although some works enhance the expressiveness of the spatial graph by introducing incremental adaptive modules, their performance is still limited by regular GCN structures. In this paper, we propose a novel shift graph convolutional network (Shift-GCN) to overcome both shortcomings. Instead of using heavy regular graph convolutions, our Shift-GCN is composed of novel shift graph operations and lightweight point-wise convolutions, where the shift graph operations provide flexible receptive fields for both the spatial graph and the temporal graph. On three datasets for skeleton-based action recognition, the proposed Shift-GCN notably exceeds the state-of-the-art methods with more than 10 times less computational complexity.
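The shift-plus-pointwise recipe is simple enough to sketch. Below is a hedged PyTorch illustration in which a parameter-free circular shift mixes channels across joints and a lightweight point-wise (1x1) transform does the learning; the specific shift pattern follows the non-local spatial shift variant, and all details are illustrative rather than the authors' implementation.

```python
# Hedged sketch of a shift-graph layer: shift channels across joints,
# then learn with a cheap point-wise transform.
import torch
import torch.nn as nn


class ShiftGraphLayer(nn.Module):
    def __init__(self, channels, num_joints):
        super().__init__()
        # Channel c of joint v is read from joint (v + c) mod J.
        j = torch.arange(num_joints).unsqueeze(1)           # (J, 1)
        c = torch.arange(channels).unsqueeze(0)             # (1, C)
        self.register_buffer("src", (j + c) % num_joints)   # (J, C)
        self.pointwise = nn.Linear(channels, channels)      # acts as 1x1 conv

    def forward(self, x):  # x: (batch, J, C)
        src = self.src.unsqueeze(0).expand(x.size(0), -1, -1)
        shifted = torch.gather(x, dim=1, index=src)          # circular shift
        return torch.relu(self.pointwise(shifted))
```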

467 citations

Proceedings ArticleDOI
Chenyang Si, Wentao Chen, Wei Wang, Liang Wang, Tieniu Tan
15 Jun 2019
TL;DR: Si et al. propose an attention enhanced graph convolutional LSTM network (AGC-LSTM) for human action recognition from skeleton data, which can not only capture discriminative features in spatial configuration and temporal dynamics but also explore the co-occurrence relationship between spatial and temporal domains.
Abstract: Skeleton-based action recognition is an important task that requires the adequate understanding of movement characteristics of a human action from the given skeleton sequence. Recent studies have shown that exploring spatial and temporal features of the skeleton sequence is vital for this task. Nevertheless, how to effectively extract discriminative spatial and temporal features is still a challenging problem. In this paper, we propose a novel Attention Enhanced Graph Convolutional LSTM Network (AGC-LSTM) for human action recognition from skeleton data. The proposed AGC-LSTM can not only capture discriminative features in spatial configuration and temporal dynamics but also explore the co-occurrence relationship between spatial and temporal domains. We also present a temporal hierarchical architecture to increase temporal receptive fields of the top AGC-LSTM layer, which boosts the ability to learn the high-level semantic representation and significantly reduces the computation cost. Furthermore, to select discriminative spatial information, the attention mechanism is employed to enhance information of key joints in each AGC-LSTM layer. Experimental results on two datasets are provided: NTU RGB+D dataset and Northwestern-UCLA dataset. The comparison results demonstrate the effectiveness of our approach and show that our approach outperforms the state-of-the-art methods on both datasets.

435 citations

Journal ArticleDOI
TL;DR: A comprehensive review is presented covering LSTM's formulation and training, relevant applications reported in the literature, and code resources implementing this model for a toy example.
Abstract: Long short-term memory (LSTM) has transformed both machine learning and neurocomputing fields. According to several online sources, this model has improved Google's speech recognition, greatly improved machine translations on Google Translate, and the answers of Amazon's Alexa. This neural system is also employed by Facebook, reaching over 4 billion LSTM-based translations per day as of 2017. Interestingly, recurrent neural networks had shown rather modest performance until LSTM showed up. One reason for the success of this recurrent network lies in its ability to handle the exploding/vanishing gradient problem, which is a difficult issue to circumvent when training recurrent or very deep neural networks. In this paper, we present a comprehensive review that covers LSTM's formulation and training, relevant applications reported in the literature, and code resources implementing this model for a toy example.
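For reference, the standard LSTM cell formulation that such a review covers is, in the usual notation (sigma is the logistic sigmoid, \odot the element-wise product):

```latex
% Standard LSTM cell equations.
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i)         && \text{(input gate)} \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f)         && \text{(forget gate)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o)         && \text{(output gate)} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c)  && \text{(candidate cell)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t   && \text{(cell update)} \\
h_t &= o_t \odot \tanh(c_t)                        && \text{(hidden output)}
\end{aligned}
```

The forget gate f_t multiplying the previous cell state c_{t-1} is precisely the mechanism that lets gradients flow across many time steps, addressing the exploding/vanishing gradient problem the abstract mentions.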

412 citations

Posted Content
Chenyang Si, Wentao Chen, Wei Wang, Liang Wang, Tieniu Tan
TL;DR: A novel Attention Enhanced Graph Convolutional LSTM Network (AGC-LSTM) for human action recognition from skeleton data can not only capture discriminative features in spatial configuration and temporal dynamics but also explore the co-occurrence relationship between spatial and temporal domains.
Abstract: Skeleton-based action recognition is an important task that requires the adequate understanding of movement characteristics of a human action from the given skeleton sequence. Recent studies have shown that exploring spatial and temporal features of the skeleton sequence is vital for this task. Nevertheless, how to effectively extract discriminative spatial and temporal features is still a challenging problem. In this paper, we propose a novel Attention Enhanced Graph Convolutional LSTM Network (AGC-LSTM) for human action recognition from skeleton data. The proposed AGC-LSTM can not only capture discriminative features in spatial configuration and temporal dynamics but also explore the co-occurrence relationship between spatial and temporal domains. We also present a temporal hierarchical architecture to increase temporal receptive fields of the top AGC-LSTM layer, which boosts the ability to learn the high-level semantic representation and significantly reduces the computation cost. Furthermore, to select discriminative spatial information, the attention mechanism is employed to enhance information of key joints in each AGC-LSTM layer. Experimental results on two datasets are provided: NTU RGB+D dataset and Northwestern-UCLA dataset. The comparison results demonstrate the effectiveness of our approach and show that our approach outperforms the state-of-the-art methods on both datasets.

382 citations

Proceedings ArticleDOI
Yingfan Huang, Huikun Bi, Zhaoxin Li, Tianlu Mao, Zhaoqi Wang
01 Oct 2019
TL;DR: This work proposes a Spatial-Temporal Graph Attention network (STGAT), based on a sequence-to-sequence architecture to predict future trajectories of pedestrians, which achieves superior performance on two publicly available crowd datasets and produces more "socially" plausible trajectories for pedestrians.
Abstract: Human trajectory prediction is challenging and critical in various applications (e.g., autonomous vehicles and social robots). Because of the continuity and foresight of the pedestrian movements, the moving pedestrians in crowded spaces will consider both spatial and temporal interactions to avoid future collisions. However, most of the existing methods ignore the temporal correlations of interactions with other pedestrians involved in a scene. In this work, we propose a Spatial-Temporal Graph Attention network (STGAT), based on a sequence-to-sequence architecture to predict future trajectories of pedestrians. Besides the spatial interactions captured by the graph attention mechanism at each time-step, we adopt an extra LSTM to encode the temporal correlations of interactions. Through comparisons with state-of-the-art methods, our model achieves superior performance on two publicly available crowd datasets (ETH and UCY) and produces more "socially" plausible trajectories for pedestrians.
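The per-time-step interaction modeling can be sketched briefly. Below is a minimal PyTorch illustration of graph attention over the pedestrians in one scene at one time step; in STGAT a separate LSTM would then encode how these interactions evolve over time. Names and shapes are assumptions, not the authors' code.

```python
# Hedged sketch: graph attention over pedestrians at a single time step.
import torch
import torch.nn as nn


class PedestrianGraphAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, h):  # h: (num_peds, dim), one scene, one time step
        # Scaled dot-product scores: who influences whom in the scene.
        scores = self.q(h) @ self.k(h).t() / h.size(-1) ** 0.5
        attn = torch.softmax(scores, dim=-1)
        return h + attn @ self.v(h)   # residual interaction update
```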

369 citations