
Author

Chang Tang

Bio: Chang Tang is an academic researcher from China University of Geosciences (Wuhan). The author has contributed to research in the topics of cluster analysis and graph (abstract data type). The author has an h-index of 25 and has co-authored 67 publications receiving 2,034 citations. Previous affiliations of Chang Tang include Information Technology University and Tianjin University.
Papers

Journal Article
Pichao Wang, Wanqing Li, Zhimin Gao, Jing Zhang, +2 more (2 institutions)
TL;DR: The proposed method maintained its performance on the large dataset, whereas the performance of existing methods decreased with the increased number of actions, and the method achieved 2–9% better results on most of the individual datasets.
Abstract: This paper proposes a new method, i.e., weighted hierarchical depth motion maps (WHDMM) + three-channel deep convolutional neural networks (3ConvNets), for human action recognition from depth maps on small training datasets. Three strategies are developed to leverage the capability of ConvNets in mining discriminative features for recognition. First, different viewpoints are mimicked by rotating the 3-D points of the captured depth maps. This not only synthesizes more data, but also makes the trained ConvNets view-tolerant. Second, WHDMMs at several temporal scales are constructed to encode the spatiotemporal motion patterns of actions into 2-D spatial structures. The 2-D spatial structures are further enhanced for recognition by converting the WHDMMs into pseudocolor images. Finally, the three ConvNets are initialized with the models obtained from ImageNet and fine-tuned independently on the color-coded WHDMMs constructed in three orthogonal planes. The proposed algorithm was evaluated on the MSRAction3D, MSRAction3DExt, UTKinect-Action, and MSRDailyActivity3D datasets using cross-subject protocols. In addition, the method was evaluated on the large dataset constructed from the above datasets. The proposed method achieved 2–9% better results on most of the individual datasets. Furthermore, the proposed method maintained its performance on the large dataset, whereas the performance of existing methods decreased with the increased number of actions.
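The viewpoint-mimicking step can be pictured as back-projecting the depth map to a 3-D point cloud, rotating it, and re-rendering from a virtual camera. Below is a minimal NumPy sketch under a pinhole-camera assumption; the intrinsics `fx`/`fy`, the simple z-buffering, and the absence of hole filling are simplifications for illustration, not the paper's actual rendering pipeline.

```python
import numpy as np

def rotate_depth_view(depth, angle_deg, fx=365.0, fy=365.0):
    """Re-render a depth map from a virtual camera rotated about the
    vertical axis. fx/fy are assumed pinhole intrinsics (illustrative)."""
    h, w = depth.shape
    cx, cy = w / 2.0, h / 2.0

    # Back-project every pixel to a 3-D point in camera coordinates.
    v, u = np.mgrid[0:h, 0:w]
    z = depth.astype(np.float64)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy

    # Rotate the point cloud about the y (vertical) axis.
    t = np.deg2rad(angle_deg)
    xr = np.cos(t) * x + np.sin(t) * z
    zr = -np.sin(t) * x + np.cos(t) * z

    # Re-project onto the virtual image plane, keeping valid points only.
    mask = (z > 0) & (zr > 0)
    ur = np.round(xr[mask] * fx / zr[mask] + cx).astype(int)
    vr = np.round(y[mask] * fy / zr[mask] + cy).astype(int)
    zm = zr[mask]
    keep = (ur >= 0) & (ur < w) & (vr >= 0) & (vr < h)

    # Simple z-buffer: keep the nearest surface per target pixel.
    out = np.full((h, w), np.inf)
    np.minimum.at(out, (vr[keep], ur[keep]), zm[keep])
    out[np.isinf(out)] = 0.0
    return out
```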

257 citations


Journal Article
Jing Zhang, Wanqing Li, Philip Ogunbona, Pichao Wang, +1 more (2 institutions)
Abstract: Human action recognition from RGB-D (Red, Green, Blue and Depth) data has attracted increasing attention since the first work reported in 2010. Over this period, many benchmark datasets have been created to facilitate the development and evaluation of new algorithms. This raises the question of which dataset to select and how to use it in providing a fair and objective comparative evaluation against state-of-the-art methods. To address this issue, this paper provides a comprehensive review of the most commonly used action recognition related RGB-D video datasets, including 27 single-view datasets, 10 multi-view datasets, and 7 multi-person datasets. The detailed information and analysis of these datasets are a useful resource in guiding insightful selection of datasets for future research. In addition, the issues with current algorithm evaluation vis-à-vis limitations of the available datasets and evaluation protocols are also highlighted, resulting in a number of recommendations for collection of new datasets and use of evaluation protocols.
Highlights:
- A detailed review and in-depth analysis of 44 publicly available RGB-D-based action datasets.
- Recommendations on the selection of datasets and evaluation protocols for use in future research.
- Identification of some limitations of these datasets and evaluation protocols.
- Recommendations on future creation of datasets and use of evaluation protocols.

216 citations


Proceedings Article
Pichao Wang, Wanqing Li, Zhimin Gao, Chang Tang, +2 more (1 institution)
13 Oct 2015
TL;DR: Through transfer learning from models originally trained on ImageNet for image classification, the three ConvNets are trained independently on the color-coded DMMs constructed in three orthogonal planes, achieving state-of-the-art results on these datasets.
Abstract: In this paper, we propose to adopt ConvNets to recognize human actions from depth maps on relatively small datasets based on Depth Motion Maps (DMMs). In particular, three strategies are developed to effectively leverage the capability of ConvNets in mining discriminative features for recognition. Firstly, different viewpoints are mimicked by rotating virtual cameras around the subject represented by the 3D points of the captured depth maps. This not only synthesizes more data from the captured ones, but also makes the trained ConvNets view-tolerant. Secondly, DMMs are constructed and further enhanced for recognition by encoding them into pseudo-RGB images, turning the spatio-temporal motion patterns into textures and edges. Lastly, through transfer learning from models originally trained on ImageNet for image classification, the three ConvNets are trained independently on the color-coded DMMs constructed in three orthogonal planes. The proposed algorithm was extensively evaluated on the MSRAction3D, MSRAction3DExt and UTKinect-Action datasets and achieved state-of-the-art results on these datasets.
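As a rough picture of how a depth motion map accumulates motion energy, the sketch below sums absolute frame-to-frame differences of three projected views. The binary occupancy projection for the side and top views and the `z_bins`/`z_max` quantization are assumptions standing in for the paper's actual projections.

```python
import numpy as np

def project_three_views(depth, z_bins=256, z_max=4000.0):
    """Project one depth frame onto front (XY), side (YZ) and top (XZ)
    planes. Side/top use binary occupancy over quantized depth."""
    h, w = depth.shape
    front = depth.astype(np.float64)
    zi = np.clip((depth / z_max * (z_bins - 1)).astype(int), 0, z_bins - 1)
    side = np.zeros((h, z_bins))
    top = np.zeros((z_bins, w))
    ys, xs = np.where(depth > 0)
    side[ys, zi[ys, xs]] = 1.0
    top[zi[ys, xs], xs] = 1.0
    return front, side, top

def depth_motion_maps(frames):
    """frames: (T, H, W) depth sequence -> [DMM_front, DMM_side, DMM_top],
    each the sum of absolute differences between consecutive projections."""
    views = [project_three_views(f) for f in frames]
    dmms = []
    for v in range(3):
        seq = [fv[v] for fv in views]
        dmms.append(sum(np.abs(b - a) for a, b in zip(seq, seq[1:])))
    return dmms
```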

109 citations


Journal Article
Xinwang Liu, Xinzhong Zhu, Miaomiao Li, Lei Wang, +5 more (7 institutions)
TL;DR: This work proposes Late Fusion Incomplete Multi-view Clustering (LF-IMVC) which effectively and efficiently integrates the incomplete clustering matrices generated by incomplete views and develops a three-step iterative algorithm to solve the resultant optimization problem with linear computational complexity and theoretically prove its convergence.
Abstract: Incomplete multi-view clustering optimally integrates a group of pre-specified incomplete views to improve clustering performance. Among various excellent solutions, multiple kernel k-means with incomplete kernels forms a benchmark, which redefines the incomplete multi-view clustering as a joint optimization problem where the imputation and clustering are alternatively performed until convergence. However, the comparatively intensive computational and storage complexities preclude it from practical applications. To address these issues, we propose Late Fusion Incomplete Multi-view Clustering (LF-IMVC) which effectively and efficiently integrates the incomplete clustering matrices generated by incomplete views. Specifically, our algorithm jointly learns a consensus clustering matrix, imputes each incomplete base matrix, and optimizes the corresponding permutation matrices. We develop a three-step iterative algorithm to solve the resultant optimization problem with linear computational complexity and theoretically prove its convergence. Further, we conduct comprehensive experiments to study the proposed LF-IMVC in terms of clustering accuracy, running time, advantages of late fusion multi-view clustering, evolution of the learned consensus clustering matrix, parameter sensitivity and convergence. As indicated, our algorithm significantly and consistently outperforms some state-of-the-art algorithms with much less running time and memory.
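The late-fusion idea can be illustrated by the alternating scheme below: learn a consensus clustering matrix H and per-view rotations W_p that best align each view's base clustering matrix H_p with the consensus, i.e., maximize sum_p tr(W_p^T H_p^T H). This is a minimal NumPy sketch of late fusion on complete views; the paper's imputation of incomplete base matrices is omitted, and `base_partitions` (orthonormal n-by-k matrices) is an assumed input.

```python
import numpy as np

def late_fusion_consensus(base_partitions, k, n_iter=50, seed=0):
    """Alternately learn a consensus clustering matrix H and per-view
    rotations W_p aligning each base clustering matrix H_p with H.
    base_partitions: list of (n, k) orthonormal matrices, one per view."""
    rng = np.random.default_rng(seed)
    n = base_partitions[0].shape[0]
    H = np.linalg.qr(rng.standard_normal((n, k)))[0]
    Ws = [np.eye(k) for _ in base_partitions]
    for _ in range(n_iter):
        # Update H: maximize tr(H^T M) s.t. H^T H = I, via SVD of M.
        M = sum(Hp @ Wp for Hp, Wp in zip(base_partitions, Ws))
        U, _, Vt = np.linalg.svd(M, full_matrices=False)
        H = U @ Vt
        # Update each W_p: an orthogonal Procrustes problem, also by SVD.
        Ws = []
        for Hp in base_partitions:
            U, _, Vt = np.linalg.svd(Hp.T @ H, full_matrices=False)
            Ws.append(U @ Vt)
    return H  # cluster the rows of H with k-means to get labels
```

Each update is a Procrustes-type problem solved in closed form by an SVD, which is why the per-iteration cost stays linear in the number of samples n.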

105 citations


Proceedings Article
Pichao Wang, Wanqing Li, Zhimin Gao, Yuyao Zhang, +2 more (2 institutions)
01 Jul 2017
TL;DR: A new representation, Scene Flow to Action Map (SFAM), that describes several long-term spatio-temporal dynamics for action recognition from RGB-D data and takes better advantage of ConvNet models trained on ImageNet.
Abstract: Scene flow describes the motion of 3D objects in the real world and could potentially be the basis of a good feature for 3D action recognition. However, its use for action recognition, especially in the context of convolutional neural networks (ConvNets), has not been previously studied. In this paper, we propose the extraction and use of scene flow for action recognition from RGB-D data. Previous works have considered the depth and RGB modalities as separate channels and extracted features for later fusion. We take a different approach and consider the modalities as one entity, thus allowing feature extraction for action recognition from the beginning. Two key questions about the use of scene flow for action recognition are addressed: how to organize the scene flow vectors and how to represent the long-term dynamics of videos based on scene flow. In order to calculate the scene flow correctly on the available datasets, we propose an effective self-calibration method to align the RGB and depth data spatially without knowledge of the camera parameters. Based on the scene flow vectors, we propose a new representation, namely, Scene Flow to Action Map (SFAM), that describes several long-term spatio-temporal dynamics for action recognition. We adopt a channel transform kernel to transform the scene flow vectors to an optimal color space analogous to RGB. This transformation takes better advantage of ConvNet models trained on ImageNet. Experimental results indicate that this new representation can surpass the performance of state-of-the-art methods on two large public datasets.
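A minimal sketch of the channel-transform step: mix the per-pixel (dx, dy, dz) scene-flow components with a 3x3 kernel and rescale to an 8-bit image. The function name, the identity default for `A`, and the min-max scaling are placeholders; the paper selects a transform suited to ImageNet-trained ConvNets rather than using the identity.

```python
import numpy as np

def flow_to_pseudo_rgb(scene_flow, A=None):
    """Map per-pixel scene-flow vectors (dx, dy, dz) to a pseudo-RGB image.
    A is a 3x3 channel-transform kernel; identity here as a placeholder."""
    A = np.eye(3) if A is None else np.asarray(A)
    mixed = scene_flow.reshape(-1, 3) @ A.T       # mix the flow channels
    lo, hi = mixed.min(axis=0), mixed.max(axis=0)
    rgb = (mixed - lo) / np.maximum(hi - lo, 1e-12) * 255.0
    return rgb.reshape(scene_flow.shape).astype(np.uint8)
```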

96 citations


Cited by

Journal Article
Jiuxiang Gu, Zhenhua Wang, Jason Kuen, Lianyang Ma, +7 more (1 institution)
Abstract: In the last few years, deep learning has led to very good performance on a variety of problems, such as visual recognition, speech recognition and natural language processing. Among different types of deep neural networks, convolutional neural networks have been most extensively studied. Leveraging the rapid growth in the amount of annotated data and the great improvements in the strength of graphics processing units, research on convolutional neural networks has emerged swiftly and achieved state-of-the-art results on various tasks. In this paper, we provide a broad survey of the recent advances in convolutional neural networks. We detail the improvements of CNN on different aspects, including layer design, activation function, loss function, regularization, optimization and fast computation. Besides, we also introduce various applications of convolutional neural networks in computer vision, speech and natural language processing.
Highlights:
- We give an overview of the basic components of CNN.
- We discuss the improvements of CNN on different aspects, namely, layer design, activation function, loss function, regularization, optimization and fast computation.
- We introduce the applications of CNN to various tasks, including image classification, object detection, object tracking, pose estimation, text detection, visual saliency detection, action recognition, scene labeling, and speech and natural language processing.
- We discuss the challenges in CNN and give several future research directions.

1,445 citations


Posted Content
TL;DR: This paper details the improvements of CNN on different aspects, including layer design, activation function, loss function, regularization, optimization and fast computation, and introduces various applications of convolutional neural networks in computer vision, speech and natural language processing.
Abstract: In the last few years, deep learning has led to very good performance on a variety of problems, such as visual recognition, speech recognition and natural language processing. Among different types of deep neural networks, convolutional neural networks have been most extensively studied. Leveraging the rapid growth in the amount of annotated data and the great improvements in the strength of graphics processing units, research on convolutional neural networks has emerged swiftly and achieved state-of-the-art results on various tasks. In this paper, we provide a broad survey of the recent advances in convolutional neural networks. We detail the improvements of CNN on different aspects, including layer design, activation function, loss function, regularization, optimization and fast computation. Besides, we also introduce various applications of convolutional neural networks in computer vision, speech and natural language processing.

1,299 citations


Proceedings Article
Amir Shahroudy, Jun Liu, Tian-Tsong Ng, Gang Wang (2 institutions)
01 Jun 2016
TL;DR: A large-scale dataset for RGB+D human action recognition with more than 56 thousand video samples and 4 million frames, collected from 40 distinct subjects is introduced and a new recurrent neural network structure is proposed to model the long-term temporal correlation of the features for each body part, and utilize them for better action classification.
Abstract: Recent approaches in depth-based human activity analysis have achieved outstanding performance and proved the effectiveness of 3D representation for classification of action classes. Currently available depth-based and RGB+D-based action recognition benchmarks have a number of limitations, including the lack of training samples, distinct class labels, camera views and variety of subjects. In this paper we introduce a large-scale dataset for RGB+D human action recognition with more than 56 thousand video samples and 4 million frames, collected from 40 distinct subjects. Our dataset contains 60 different action classes including daily, mutual, and health-related actions. In addition, we propose a new recurrent neural network structure to model the long-term temporal correlation of the features for each body part, and utilize them for better action classification. Experimental results show the advantages of applying deep learning methods over state-of-the-art hand-crafted features on the suggested cross-subject and cross-view evaluation criteria for our dataset. The introduction of this large-scale dataset will enable the community to apply, develop and adapt various data-hungry learning techniques for the task of depth-based and RGB+D-based human activity analysis.
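For the per-body-part recurrent idea, a hedged PyTorch sketch is given below: one LSTM per body-part group, with the final hidden states concatenated for classification. The grouping into five parts, the layer sizes, and the plain `nn.LSTM` cells are illustrative assumptions; this is not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class PartAwareRNN(nn.Module):
    """One LSTM per body-part group; final hidden states are fused for
    classification. Grouping and sizes are illustrative assumptions."""

    def __init__(self, joints_per_part=5, num_parts=5, hidden=128,
                 num_classes=60):
        super().__init__()
        self.lstms = nn.ModuleList(
            nn.LSTM(joints_per_part * 3, hidden, batch_first=True)
            for _ in range(num_parts))
        self.fc = nn.Linear(num_parts * hidden, num_classes)

    def forward(self, parts):
        # parts: list of (batch, time, joints_per_part * 3) tensors,
        # e.g. torso, left/right arm, left/right leg joint coordinates.
        feats = [lstm(x)[0][:, -1] for lstm, x in zip(self.lstms, parts)]
        return self.fc(torch.cat(feats, dim=1))
```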

971 citations


Posted Content
Amir Shahroudy, Jun Liu, Tian-Tsong Ng, Gang Wang (2 institutions)
Abstract: Recent approaches in depth-based human activity analysis achieved outstanding performance and proved the effectiveness of 3D representation for classification of action classes. Currently available depth-based and RGB+D-based action recognition benchmarks have a number of limitations, including the lack of training samples, distinct class labels, camera views and variety of subjects. In this paper we introduce a large-scale dataset for RGB+D human action recognition with more than 56 thousand video samples and 4 million frames, collected from 40 distinct subjects. Our dataset contains 60 different action classes including daily, mutual, and health-related actions. In addition, we propose a new recurrent neural network structure to model the long-term temporal correlation of the features for each body part, and utilize them for better action classification. Experimental results show the advantages of applying deep learning methods over state-of-the-art hand-crafted features on the suggested cross-subject and cross-view evaluation criteria for our dataset. The introduction of this large scale dataset will enable the community to apply, develop and adapt various data-hungry learning techniques for the task of depth-based and RGB+D-based human activity analysis.

939 citations


Journal Article
TL;DR: An independence criterion based on the eigen-spectrum of covariance operators in reproducing kernel Hilbert spaces (RKHSs), consisting of an empirical estimate of the Hilbert-Schmidt norm of the cross-covariance operator, or HSIC, is proposed.
Abstract: We propose an independence criterion based on the eigen-spectrum of covariance operators in reproducing kernel Hilbert spaces (RKHSs), consisting of an empirical estimate of the Hilbert-Schmidt norm of the cross-covariance operator (we term this a Hilbert-Schmidt Independence Criterion, or HSIC). This approach has several advantages, compared with previous kernel-based independence criteria. First, the empirical estimate is simpler than any other kernel dependence test, and requires no user-defined regularisation. Second, there is a clearly defined population quantity which the empirical estimate approaches in the large sample limit, with exponential convergence guaranteed between the two: this ensures that independence tests based on HSIC do not suffer from slow learning rates. Finally, we show in the context of independent component analysis (ICA) that the performance of HSIC is competitive with that of previously published kernel-based criteria, and of other recently published ICA methods.
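The empirical HSIC statistic has a well-known closed form, HSIC = tr(KHLH) / (n-1)^2, where K and L are kernel Gram matrices of the two samples and H = I - (1/n)11^T is the centering matrix. A small NumPy version with RBF kernels is sketched below; the median-heuristic bandwidth is a common convention, not something the abstract prescribes.

```python
import numpy as np

def rbf_gram(X, sigma=None):
    """RBF kernel Gram matrix; bandwidth defaults to the median heuristic."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    if sigma is None:
        sigma = np.sqrt(0.5 * np.median(d2[d2 > 0]))
    return np.exp(-d2 / (2.0 * sigma ** 2))

def hsic(X, Y):
    """Biased empirical HSIC estimate: tr(K H L H) / (n - 1)^2."""
    n = X.shape[0]
    Hc = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    K, L = rbf_gram(X), rbf_gram(Y)
    return np.trace(K @ Hc @ L @ Hc) / (n - 1) ** 2
```

An independence test then compares this statistic against a null distribution, e.g., one obtained by permuting the rows of one sample.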

891 citations


Network Information
Related Authors (5)
Pichao Wang

89 papers, 3.3K citations

94% related
Xinwang Liu

202 papers, 5.1K citations

85% related
Jianping Yin

64 papers, 1.2K citations

77% related
Philip Ogunbona

182 papers, 4.4K citations

76% related
Wanqing Li

274 papers, 9.2K citations

75% related
Performance
Metrics

Author's H-index: 25

No. of papers from the Author in previous years
Year    Papers
2022    1
2021    9
2020    12
2019    15
2018    10
2017    7