scispace - formally typeset
Search or ask a question
Author

Xiangbo Shu

Bio: Xiangbo Shu is an academic researcher from Nanjing University of Science and Technology. The author has contributed to research in topics: Computer science & Artificial intelligence. The author has an hindex of 21, co-authored 65 publications receiving 1572 citations. Previous affiliations of Xiangbo Shu include National University of Singapore & Nanjing University.

Papers published on a yearly basis

Papers
More filters
Proceedings ArticleDOI
13 Oct 2015
TL;DR: This paper develops a novel deep network structure, capable of transferring labeling information across heterogeneous domains, especially from text domain to image domain, and presents a novel architecture of DTNs to translate cross-domain information from text to image.
Abstract: In recent years, deep networks have been successfully applied to model image concepts and achieved competitive performance on many data sets. In spite of impressive performance, the conventional deep networks can be subjected to the decayed performance if we have insufficient training examples. This problem becomes extremely severe for deep networks with powerful representation structure, making them prone to over fitting by capturing nonessential or noisy information in a small data set. In this paper, to address this challenge, we will develop a novel deep network structure, capable of transferring labeling information across heterogeneous domains, especially from text domain to image domain. This weakly-shared Deep Transfer Networks (DTNs) can adequately mitigate the problem of insufficient image training data by bringing in rich labels from the text domain. Specifically, we present a novel architecture of DTNs to translate cross-domain information from text to image. To share the labels between two domains, we will build multiple weakly shared layers of features. It allows to represent both shared inter-domain features and domain-specific features, making this structure more flexible and powerful in capturing complex data of different domains jointly than the strongly shared layers. Experiments on real world dataset will show its competitive performance as compared with the other state-of-the-art methods.

202 citations

Proceedings ArticleDOI
01 Jun 2016
TL;DR: A recurrent face aging (RFA) framework based on a recurrent neural network which can identify the ages of people from 0 to 80 is introduced and demonstrates the proposed RFA provides better aging faces over other state-of-the-art age progression methods.
Abstract: Modeling the aging process of human face is important for cross-age face verification and recognition. In this paper, we introduce a recurrent face aging (RFA) framework based on a recurrent neural network which can identify the ages of people from 0 to 80. Due to the lack of labeled face data of the same person captured in a long range of ages, traditional face aging models usually split the ages into discrete groups and learn a one-step face feature transformation for each pair of adjacent age groups. However, those methods neglect the in-between evolving states between the adjacent age groups and the synthesized faces often suffer from severe ghosting artifacts. Since human face aging is a smooth progression, it is more appropriate to age the face by going through smooth transition states. In this way, the ghosting artifacts can be effectively eliminated and the intermediate aged faces between two discrete age groups can also be obtained. Towards this target, we employ a twolayer gated recurrent unit as the basic recurrent module whose bottom layer encodes a young face to a latent representation and the top layer decodes the representation to a corresponding older face. The experimental results demonstrate our proposed RFA provides better aging faces over other state-of-the-art age progression methods.

202 citations

Journal ArticleDOI
TL;DR: This work proposes a novel Coherence Constrained Graph LSTM (CCG-LSTM) with STCC and GCC to effectively recognize group activity, by modeling the relevant motions of individuals while suppressing the irrelevant motions.
Abstract: This work aims to address the group activity recognition problem by exploring human motion characteristics. Traditional methods hold that the motions of all persons contribute equally to the group activity, which suppresses the contributions of some relevant motions to the whole activity while overstates some irrelevant motions. To handle this problem, we present a Spatio-Temporal Context Coherence (STCC) constraint and a Global Context Coherence (GCC) constraint to capture the relevant motions and quantify their contributions to the group activity, respectively. Based on this, we propose a novel Coherence Constrained Graph LSTM (CCG-LSTM) with STCC and GCC to effectively recognize group activity, by modeling the relevant motions of individuals while suppressing the irrelevant motions. Specifically, to capture the relevant motions, we build the CCG-LSTM with a temporal confidence gate and a spatial confidence gate to control the memory state updating in terms of the temporally previous state and the spatially neighboring states, respectively. Besides, an attention mechanism is employed to quantify the contribution of a certain motion by measuring the consistency between itself and the whole activity at each time step. Finally, we conduct experiments on two widely-used datasets to illustrate the effectiveness of the proposed CCG-LSTM compared with the state-of-the-arts methods.

140 citations

Journal ArticleDOI
18 Nov 2016
TL;DR: A novel generalized deep transfer networks (DTNs) capable of transferring label information across heterogeneous domains, textual domain to visual domain, and to share the labels between two domains are proposed, able to generate domain-specific and shared interdomain features.
Abstract: In recent years, deep neural networks have been successfully applied to model visual concepts and have achieved competitive performance on many tasks. Despite their impressive performance, traditional deep networks are subjected to the decayed performance under the condition of lacking sufficient training data. This problem becomes extremely severe for deep networks trained on a very small dataset, making them overfitting by capturing nonessential or noisy information in the training set. Toward this end, we propose a novel generalized deep transfer networks (DTNs), capable of transferring label information across heterogeneous domains, textual domain to visual domain. The proposed framework has the ability to adequately mitigate the problem of insufficient training images by bringing in rich labels from the textual domain. Specifically, to share the labels between two domains, we build parameter- and representation-shared layers. They are able to generate domain-specific and shared interdomain features, making this architecture flexible and powerful in capturing complex information from different domains jointly. To evaluate the proposed method, we release a new dataset extended from NUS-WIDE at http://imag.njust.edu.cn/NUS-WIDE-128.html. Experimental results on this dataset show the superior performance of the proposed DTNs compared to existing state-of-the-art methods.

137 citations

Journal ArticleDOI
TL;DR: A novel Hierarchical Long Short-Term Concurrent Memory (H-LSTCM) is proposed to model the long-term inter-related dynamics among a group of persons for recognizing human interactions by comparing against baseline and state-of-the-art methods.
Abstract: In this work, we aim to address the problem of human interaction recognition in videos by exploring the long-term inter-related dynamics among multiple persons. Recently, Long Short-Term Memory (LSTM) has become a popular choice to model individual dynamic for single-person action recognition due to its ability to capture the temporal motion information in a range. However, most existing LSTM-based methods focus only on capturing the dynamics of human interaction by simply combining all dynamics of individuals or modeling them as a whole. Such methods neglect the inter-related dynamics of how human interactions change over time. To this end, we propose a novel Hierarchical Long Short-Term Concurrent Memory (H-LSTCM) to model the long-term inter-related dynamics among a group of persons for recognizing human interactions. Specifically, we first feed each person's static features into a Single-Person LSTM to model the single-person dynamic. Subsequently, at one time step, the outputs of all Single-Person LSTM units are fed into a novel Concurrent LSTM (Co-LSTM) unit, which mainly consists of multiple sub-memory units, a new cell gate, and a new co-memory cell. In the Co-LSTM unit, each sub-memory unit stores individual motion information, while this Co-LSTM unit selectively integrates and stores inter-related motion information between multiple interacting persons from multiple sub-memory units via the cell gate and co-memory cell, respectively. Extensive experiments on several public datasets validate the effectiveness of the proposed H-LSTCM by comparing against baseline and state-of-the-art methods.

131 citations


Cited by
More filters
Journal ArticleDOI
TL;DR: Deep domain adaptation has emerged as a new learning technique to address the lack of massive amounts of labeled data as discussed by the authors, which leverages deep networks to learn more transferable representations by embedding domain adaptation in the pipeline of deep learning.

1,211 citations

Proceedings ArticleDOI
27 Feb 2017
TL;DR: In this article, a conditional adversarial autoencoder (CAAE) is proposed to learn a face manifold, traversing on which smooth age progression and regression can be realized simultaneously.
Abstract: If I provide you a face image of mine (without telling you the actual age when I took the picture) and a large amount of face images that I crawled (containing labeled faces of different ages but not necessarily paired), can you show me what I would look like when I am 80 or what I was like when I was 5? The answer is probably a No. Most existing face aging works attempt to learn the transformation between age groups and thus would require the paired samples as well as the labeled query image. In this paper, we look at the problem from a generative modeling perspective such that no paired samples is required. In addition, given an unlabeled image, the generative model can directly produce the image with desired age attribute. We propose a conditional adversarial autoencoder (CAAE) that learns a face manifold, traversing on which smooth age progression and regression can be realized simultaneously. In CAAE, the face is first mapped to a latent vector through a convolutional encoder, and then the vector is projected to the face manifold conditional on age through a deconvolutional generator. The latent vector preserves personalized face features (i.e., personality) and the age condition controls progression vs. regression. Two adversarial networks are imposed on the encoder and generator, respectively, forcing to generate more photo-realistic faces. Experimental results demonstrate the appealing performance and flexibility of the proposed framework by comparing with the state-of-the-art and ground truth.

766 citations

01 Jan 1983
TL;DR: The neocognitron recognizes stimulus patterns correctly without being affected by shifts in position or even by considerable distortions in shape of the stimulus patterns.
Abstract: Suggested by the structure of the visual nervous system, a new algorithm is proposed for pattern recognition. This algorithm can be realized with a multilayered network consisting of neuron-like cells. The network, “neocognitron”, is self-organized by unsupervised learning, and acquires the ability to recognize stimulus patterns according to the differences in their shapes: Any patterns which we human beings judge to be alike are also judged to be of the same category by the neocognitron. The neocognitron recognizes stimulus patterns correctly without being affected by shifts in position or even by considerable distortions in shape of the stimulus patterns.

649 citations

Proceedings ArticleDOI
26 Jul 2017
TL;DR: This paper presents the first, to the best of knowledge, manually collected "in-the-wild" age database, dubbed AgeDB, containing images annotated with accurate to the year, noise-free labels, which renders AgeDB suitable when performing experiments on age-invariant face verification, age estimation and face age progression "in the wild".
Abstract: Over the last few years, increased interest has arisen with respect to age-related tasks in the Computer Vision community. As a result, several "in-the-wild" databases annotated with respect to the age attribute became available in the literature. Nevertheless, one major drawback of these databases is that they are semi-automatically collected and annotated and thus they contain noisy labels. Therefore, the algorithms that are evaluated in such databases are prone to noisy estimates. In order to overcome such drawbacks, we present in this paper the first, to the best of knowledge, manually collected "in-the-wild" age database, dubbed AgeDB, containing images annotated with accurate to the year, noise-free labels. As demonstrated by a series of experiments utilizing state-of-the-art algorithms, this unique property renders AgeDB suitable when performing experiments on age-invariant face verification, age estimation and face age progression "in-the-wild".

520 citations