
Showing papers by "Guo-Jun Qi" published in 2015


Proceedings ArticleDOI
10 Aug 2015
TL;DR: It is demonstrated that the rich content and linkage information in a heterogeneous network can be captured by a multi-resolution deep embedding function, so that similarities among cross-modal data can be measured directly in a common embedding space.
Abstract: Data embedding is used in many machine learning applications to create low-dimensional feature representations that preserve the structure of data points in their original space. In this paper, we examine the scenario of a heterogeneous network with nodes and content of various types. Such networks are notoriously difficult to mine because of the bewildering combination of heterogeneous contents and structures. The creation of a multidimensional embedding of such data opens the door to the use of a wide variety of off-the-shelf mining techniques for multidimensional data. Despite the importance of this problem, limited effort has been made toward embedding a network of scalable, dynamic and heterogeneous data. In such cases, both the content and the linkage structure provide important cues for creating a unified feature representation of the underlying network. In this paper, we design a deep embedding algorithm for networked data. A highly nonlinear multi-layered embedding function is used to capture the complex interactions between the heterogeneous data in a network. Our goal is to create a multi-resolution deep embedding function that reflects both the local and global network structures, and makes the resulting embedding useful for a variety of data mining tasks. In particular, we demonstrate that the rich content and linkage information in a heterogeneous network can be captured by such an approach, so that similarities among cross-modal data can be measured directly in a common embedding space. Once this goal has been achieved, a wide variety of data mining problems can be solved by applying off-the-shelf algorithms designed for handling vector representations. Our experiments on real-world network datasets show the effectiveness and scalability of the proposed algorithm as compared to state-of-the-art embedding methods.
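
As a rough illustration of the idea (not the authors' architecture), the hypothetical sketch below maps two content types through separate nonlinear encoders into a common embedding space and trains them so that linked cross-modal pairs score higher than shuffled pairs. All module names, dimensions, and the margin loss are assumptions made for the example.

```python
import torch
import torch.nn as nn

class ModalityEncoder(nn.Module):
    """Small nonlinear encoder for one content type (e.g. text or image features)."""
    def __init__(self, in_dim, emb_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(),
            nn.Linear(128, emb_dim))
    def forward(self, x):
        return self.net(x)

text_enc, img_enc = ModalityEncoder(300), ModalityEncoder(512)
text_x, img_x = torch.randn(8, 300), torch.randn(8, 512)   # toy batch of linked pairs
z_t, z_i = text_enc(text_x), img_enc(img_x)

# Linked cross-modal pairs should score higher than shuffled (negative) pairs.
pos = (z_t * z_i).sum(dim=1)
neg = (z_t * z_i[torch.randperm(8)]).sum(dim=1)
loss = torch.clamp(1.0 - pos + neg, min=0).mean()          # margin ranking loss
loss.backward()
```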

594 citations


Proceedings ArticleDOI
07 Dec 2015
TL;DR: In this article, a differential recurrent neural network (dRNN) is proposed to learn complex time-series representations via high-order derivatives of states: the change in information gain caused by the salient motions between successive frames is quantified by the Derivative of States (DoS), and the proposed LSTM model is therefore termed the differential RNN.
Abstract: The long short-term memory (LSTM) neural network is capable of processing complex sequential information since it utilizes special gating schemes for learning representations from long input sequences. It has the potential to model any time-series or sequential data, where the current hidden state has to be considered in the context of the past hidden states. This property makes LSTM an ideal choice to learn the complex dynamics of various actions. Unfortunately, conventional LSTMs do not consider the impact of spatio-temporal dynamics corresponding to the given salient motion patterns when they gate the information that ought to be memorized through time. To address this problem, we propose a differential gating scheme for the LSTM neural network, which emphasizes the change in information gain caused by the salient motions between successive frames. This change in information gain is quantified by the Derivative of States (DoS), and thus the proposed LSTM model is termed the differential Recurrent Neural Network (dRNN). We demonstrate the effectiveness of the proposed model by automatically recognizing actions from real-world 2D and 3D human action datasets. Our study is one of the first works towards demonstrating the potential of learning complex time-series representations via high-order derivatives of states.
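
A minimal, hypothetical sketch of the differential gating idea: a toy LSTM-style cell whose gates also receive a Derivative of States, approximated here as the difference between successive cell states. This is an illustration under that simplifying assumption, not the authors' exact dRNN formulation.

```python
import torch
import torch.nn as nn

class DoSLSTMCell(nn.Module):
    """Toy LSTM cell whose gates also see the Derivative of States (DoS),
    approximated as the difference of successive cell states."""
    def __init__(self, in_dim, hid):
        super().__init__()
        # gates take the input, previous hidden state, and previous DoS
        self.gates = nn.Linear(in_dim + 2 * hid, 4 * hid)
    def forward(self, x, h, c, dos):
        z = self.gates(torch.cat([x, h, dos], dim=1))
        i, f, o, g = z.chunk(4, dim=1)
        c_new = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h_new = torch.sigmoid(o) * torch.tanh(c_new)
        return h_new, c_new, c_new - c            # new DoS for the next step

cell, hid = DoSLSTMCell(32, 64), 64
h = c = dos = torch.zeros(1, hid)
for x_t in torch.randn(10, 1, 32):                # 10-frame toy sequence
    h, c, dos = cell(x_t, h, c, dos)
```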

437 citations


Proceedings ArticleDOI
13 Oct 2015
TL;DR: This paper develops a novel deep network structure capable of transferring labeling information across heterogeneous domains, especially from the text domain to the image domain, and presents a novel architecture of DTNs to translate cross-domain information from text to image.
Abstract: In recent years, deep networks have been successfully applied to model image concepts and achieved competitive performance on many data sets. In spite of this impressive performance, conventional deep networks can suffer degraded performance when training examples are insufficient. This problem becomes extremely severe for deep networks with powerful representation structure, making them prone to overfitting by capturing nonessential or noisy information in a small data set. In this paper, to address this challenge, we develop a novel deep network structure capable of transferring labeling information across heterogeneous domains, especially from the text domain to the image domain. These weakly-shared Deep Transfer Networks (DTNs) can adequately mitigate the problem of insufficient image training data by bringing in rich labels from the text domain. Specifically, we present a novel architecture of DTNs to translate cross-domain information from text to image. To share labels between the two domains, we build multiple weakly shared layers of features. This allows representing both shared inter-domain features and domain-specific features, making the structure more flexible and powerful than strongly shared layers in jointly capturing complex data from different domains. Experiments on a real-world dataset show competitive performance compared with other state-of-the-art methods.
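
The sketch below illustrates one plausible reading of "weakly shared" layers: each domain keeps its own top layer, but an L2 penalty pulls the two layers' weights toward each other instead of hard-tying them. The dimensions, penalty form, and loss are assumptions for illustration only, not the paper's DTN architecture.

```python
import torch
import torch.nn as nn

# Hypothetical weak sharing: each domain keeps its own top layer, but their
# weights are pulled together by an L2 penalty instead of being hard-tied.
text_low  = nn.Sequential(nn.Linear(300, 128), nn.ReLU())   # domain-specific layers
image_low = nn.Sequential(nn.Linear(512, 128), nn.ReLU())
text_top, image_top = nn.Linear(128, 10), nn.Linear(128, 10)

def weak_share_penalty(a, b, lam=0.1):
    """Penalize disagreement between the two 'weakly shared' layers."""
    return lam * ((a.weight - b.weight).pow(2).sum() + (a.bias - b.bias).pow(2).sum())

xt, xi = torch.randn(4, 300), torch.randn(4, 512)            # toy text/image batches
y = torch.randint(0, 10, (4,))                                # shared label space
ce = nn.CrossEntropyLoss()
loss = ce(text_top(text_low(xt)), y) + ce(image_top(image_low(xi)), y) \
       + weak_share_penalty(text_top, image_top)
loss.backward()
```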

202 citations


Posted Content
TL;DR: This study proposes a differential gating scheme for the LSTM neural network, which emphasizes the change in information gain caused by the salient motions between successive frames; the model is thus termed the differential Recurrent Neural Network (dRNN).
Abstract: The long short-term memory (LSTM) neural network is capable of processing complex sequential information since it utilizes special gating schemes for learning representations from long input sequences. It has the potential to model any sequential time-series data, where the current hidden state has to be considered in the context of the past hidden states. This property makes LSTM an ideal choice to learn the complex dynamics of various actions. Unfortunately, conventional LSTMs do not consider the impact of spatio-temporal dynamics corresponding to the given salient motion patterns when they gate the information that ought to be memorized through time. To address this problem, we propose a differential gating scheme for the LSTM neural network, which emphasizes the change in information gain caused by the salient motions between successive frames. This change in information gain is quantified by the Derivative of States (DoS), and thus the proposed LSTM model is termed the differential Recurrent Neural Network (dRNN). We demonstrate the effectiveness of the proposed model by automatically recognizing actions from real-world 2D and 3D human action datasets. Our study is one of the first works towards demonstrating the potential of learning complex time-series representations via high-order derivatives of states.

119 citations


Proceedings ArticleDOI
07 Jun 2015
TL;DR: Sparse composite quantization is developed, which constructs sparse dictionaries; the benefit is that the distance evaluation between the query and a dictionary element (a sparse vector) is accelerated by efficient sparse vector operations, substantially reducing the cost of distance table computation.
Abstract: Quantization techniques have shown competitive performance in approximate nearest neighbor search. The state-of-the-art algorithm, composite quantization, takes advantage of compositionality, i.e., the vector approximation accuracy, as opposed to product quantization and Cartesian k-means. However, we have observed that the runtime cost of computing the distance table in composite quantization, which is used as a lookup table for fast distance computation, becomes non-negligible in real applications, e.g., reordering the candidates retrieved from the inverted index when handling very large scale databases. To address this problem, we develop a novel approach, called sparse composite quantization, which constructs sparse dictionaries. The benefit is that the distance evaluation between the query and a dictionary element (a sparse vector) is accelerated by efficient sparse vector operations, and thus the cost of distance table computation is substantially reduced. Experimental results on large scale ANN retrieval tasks (1M SIFTs and 1B SIFTs) and applications to object retrieval show that the proposed approach yields competitive performance: superior search accuracy to product quantization and Cartesian k-means with almost the same computing cost, and much faster ANN search than composite quantization at the same level of accuracy.
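
A simplified sketch of why sparse dictionary elements cut the cost of the distance (lookup) table: each table entry is an inner product between the query and a dictionary element, which costs only O(nnz) when the element is stored sparsely. The dictionaries and the encoding below are random stand-ins, not learned composite-quantization codebooks.

```python
import numpy as np
from scipy import sparse

# Toy setup: M dictionaries of K sparse elements each, in a d-dim space.
M, K, d = 4, 256, 128
dicts = [sparse.random(K, d, density=0.1, format="csr") for _ in range(M)]

def distance_table(query):
    """Inner products between the query and every dictionary element.
    With sparse elements, each row costs O(nnz) instead of O(d)."""
    return [D @ query for D in dicts]             # list of length-K arrays

def asymmetric_score(codes, table):
    """Approximate <query, x> for a database item encoded as one element
    per dictionary (codes[m] indexes dictionary m)."""
    return sum(table[m][codes[m]] for m in range(M))

q = np.random.randn(d)
table = distance_table(q)
codes = np.random.randint(0, K, size=M)           # a toy encoding
print(asymmetric_score(codes, table))
```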

88 citations


Proceedings ArticleDOI
22 Jun 2015
TL;DR: This work proposes a temporal order-preserving dynamic quantization method to extract the most discriminative patterns of the action sequence, and presents a multimodal feature fusion method derived in this dynamic quantization framework to exploit the different discriminative capabilities of features from multiple modalities.
Abstract: Recent commodity depth cameras have been widely used in video game, business, and surveillance applications and have dramatically changed the way of human-computer interaction. They provide rich multimodal information that can be used to interpret the human-centric environment. However, modeling the temporal dynamics of human actions remains a great challenge, and adequately modeling the patterns of these actions offers great potential to further enhance retrieval accuracy. To address this challenge, we propose a temporal order-preserving dynamic quantization method to extract the most discriminative patterns of the action sequence. We further present a multimodal feature fusion method that can be derived in this dynamic quantization framework to exploit the different discriminative capabilities of features from multiple modalities. Experiments on three public human action datasets show that the proposed technique achieves state-of-the-art performance.
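
The following toy dynamic-programming sketch conveys the flavor of temporal order-preserving quantization: a frame sequence is split into k contiguous (order-preserving) segments minimizing within-segment deviation, and each segment is summarized by one code. The cost function and the per-segment pooling are assumptions; the paper's actual method and its multimodal fusion are not reproduced here.

```python
import numpy as np

def order_preserving_segments(frames, k):
    """Split a T x d frame sequence into k contiguous segments that minimize
    within-segment squared deviation, via dynamic programming (a toy stand-in
    for the paper's dynamic quantization)."""
    T = len(frames)
    def seg_cost(i, j):                     # cost of frames[i:j]
        seg = frames[i:j]
        return float(((seg - seg.mean(axis=0)) ** 2).sum())
    INF = float("inf")
    cost = np.full((k + 1, T + 1), INF)
    cut = np.zeros((k + 1, T + 1), dtype=int)
    cost[0, 0] = 0.0
    for m in range(1, k + 1):
        for t in range(m, T + 1):
            for s in range(m - 1, t):
                c = cost[m - 1, s] + seg_cost(s, t)
                if c < cost[m, t]:
                    cost[m, t], cut[m, t] = c, s
    bounds, t = [], T                        # backtrack the segment boundaries
    for m in range(k, 0, -1):
        s = cut[m, t]
        bounds.append((s, t))
        t = s
    return bounds[::-1]

frames = np.random.randn(30, 16)             # toy 30-frame feature sequence
segments = order_preserving_segments(frames, k=4)
pattern = np.stack([frames[a:b].mean(axis=0) for a, b in segments])  # one code per segment
```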

34 citations


Journal ArticleDOI
TL;DR: This paper reports the first direct synthesis of CZTS nanocrystals in a formamide solvent system without using long hydrocarbon chain organic ligands, an approach that helps solve the problem of forming dense thin films from loose nanocrystal films.
Abstract: The first direct synthesis of CZTS nanocrystals in a formamide solvent system without using long hydrocarbon chain organic ligands is reported. The kesterite CZTS nanocrystals possess a mean size of 5.2 ± 1.2 nm. No secondary phases have been detected within the known limitations of XRD and Raman measurements. Experimental evidence suggests that excess S2− is present on the surface of the nanocrystals, accounting for their dispersibility in polar solvents. The nanocrystals also exhibit a smaller weight loss of 8.7% at 500 °C compared to 24.4% for those capped by oleylamine. A description of the formation of CZTS FA nanocrystals and the role of formamide during synthesis is proposed. Annealing of spin-coated nanocrystal thin-films highlighted the difficulty of forming dense films from loose nanocrystal films. This work shows that this can be overcome using compaction with a combination of a reasonably soft metal and silicone. A means to compact the film uniformly on a centimeter scale with reduced delamination is thus demonstrated. Annealed compacted films possess crystal grains with a favorable size on the order of microns. More significantly, a large-grain layer is formed without an unwanted residual fine-grain underlayer. The absence of a fine-grain underlayer shows that this ligand exchange-free strategy is effective in resolving a key challenge associated with the nanocrystal approach of making CZTS thin-films while simultaneously being low-cost and having a smaller environmental footprint. The strategy presented here is equally applicable to other nanocrystal approaches requiring the synthesis of dense thin-films from nanocrystal films.

27 citations


Proceedings ArticleDOI
10 Aug 2015
TL;DR: A new deep learning architecture is introduced that exploits the spatial and temporal nature of neuronal activation data, featuring a dynamically programmed layer that is critical in determining the alignment between the activations of pair-wise combinations of neurons.
Abstract: This paper explores the idea of using a deep neural network architecture with dynamically programmed layers for the brain connectome prediction problem. Understanding the brain connectome structure is a very interesting and challenging problem. It is critical in the research on epilepsy and other neuropathological diseases. We introduce a new deep learning architecture that exploits the spatial and temporal nature of the neuronal activation data. The architecture consists of a combination of a convolutional layer and a recurrent layer for predicting the connectome of neurons based on their time series of activation data. The key contribution of this paper is a dynamically programmed layer that is critical in determining the alignment between the neuronal activations of pair-wise combinations of neurons.
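
Since the dynamically programmed layer scores the alignment between pair-wise neuronal activation traces, the sketch below uses a classic dynamic-programming alignment (DTW) between two 1-D traces as a stand-in. The paper's layer is learned end-to-end inside the network; this is only an illustrative analogue of the alignment idea.

```python
import numpy as np

def dtw_alignment_cost(a, b):
    """Classic dynamic-programming (DTW) alignment cost between two 1-D
    activation traces; a toy stand-in for a dynamically programmed layer
    that scores pair-wise neuron alignment."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            D[i, j] = d + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

rng = np.random.default_rng(0)
trace_a, trace_b = rng.standard_normal(50), rng.standard_normal(50)
print(dtw_alignment_cost(trace_a, trace_b))   # lower cost = better-aligned pair
```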

15 citations


Posted Content
TL;DR: This paper proposes a decentralized recommender system by formulating the popular collaborative filtering model as a decentralized matrix completion problem over a set of users, and demonstrates that the decentralized algorithm achieves performance competitive with other methods.
Abstract: This paper proposes a decentralized recommender system by formulating the popular collaborative filtering (CF) model as a decentralized matrix completion problem over a set of users. In this way, data storage and computation are fully distributed. Each user exchanges limited information with its local neighborhood, thus avoiding centralized fusion. Advantages of the proposed system include protection of user privacy, as well as better scalability and robustness. We compare our proposed algorithm with several state-of-the-art algorithms on the FlickerUserFavor dataset, and demonstrate that the decentralized algorithm achieves performance competitive with the others.
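
A toy sketch of decentralized matrix completion under assumed details: each user keeps its own ratings row, its own user factor, and a local copy of the item factors, updates them from local observations, and then gossip-averages the item factors with ring neighbors only. The learning rate, ring topology, and update rule are illustrative choices, not the paper's algorithm.

```python
import numpy as np

# Toy decentralized matrix completion: each user keeps its own ratings row and
# latent vector, and only averages the item-factor matrix with graph neighbors.
rng = np.random.default_rng(0)
n_users, n_items, k, lr = 6, 20, 4, 0.05
R = rng.integers(1, 6, size=(n_users, n_items)).astype(float)
mask = rng.random((n_users, n_items)) < 0.3          # observed entries
U = rng.standard_normal((n_users, k)) * 0.1
V = [rng.standard_normal((n_items, k)) * 0.1 for _ in range(n_users)]  # local item-factor copies
neighbors = {i: [(i - 1) % n_users, (i + 1) % n_users] for i in range(n_users)}  # ring graph

for it in range(200):
    for i in range(n_users):
        obs = np.where(mask[i])[0]
        err = R[i, obs] - V[i][obs] @ U[i]           # local prediction error
        U[i] += lr * err @ V[i][obs]                 # update own user factor
        V[i][obs] += lr * np.outer(err, U[i])        # update local item copy
    # gossip step: average item factors with immediate neighbors only
    V = [(V[i] + sum(V[j] for j in neighbors[i])) / (1 + len(neighbors[i]))
         for i in range(n_users)]
```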

11 citations


Proceedings ArticleDOI
10 Aug 2015
TL;DR: A novel dynamic prediction model is developed that uses the notion of state-stacked sparseness to select a subset of the most critical sensors as a function of evolving system state.
Abstract: An important problem in large-scale sensor mining is that of selecting relevant sensors for prediction purposes. Selecting small subsets of sensors, also referred to as active sensors, often leads to lower operational costs, and it reduces the noise and information overload for prediction. Existing sensor selection and prediction models either select a set of sensors a priori, or they use adaptive algorithms to determine the most relevant sensors for prediction. Sensor data sets often show dynamically varying patterns, because of which it is suboptimal to select a fixed subset of active sensors. To address this problem, we develop a novel dynamic prediction model that uses the notion of hidden system states to dynamically select a varying subset of sensors. These hidden system states are automatically learned by our model in a data-driven manner. The proposed algorithm can rapidly switch between different sets of active sensors when the model detects the (periodic or intermittent) change in the system state. We derive the dynamic sensor selection strategy by minimizing the error rates in tracking and predicting sensor readings over time. We introduce the notion of state-stacked sparseness to select a subset of the most critical sensors as a function of evolving system state. We present experimental results on two real sensor datasets, corresponding to oil drilling rig sensors and intensive care unit (ICU) sensors, and demonstrate the superiority of our approach with respect to other models.
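
As a loose, hypothetical illustration of state-dependent sensor selection (not the paper's state-stacked sparseness model): one sparse Lasso predictor is fit per known system state, so that each state activates a different small sensor subset. State detection itself is omitted and the states are given in this toy example.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Hypothetical sketch: one sparse (Lasso) predictor per hidden system state,
# so the set of "active" sensors can change when the state switches.
rng = np.random.default_rng(1)
n, d, n_states = 400, 30, 2
state = (np.arange(n) // 200) % n_states              # toy: state flips halfway
X = rng.standard_normal((n, d))                       # sensor readings
true_w = np.zeros((n_states, d))
true_w[0, :3], true_w[1, 10:13] = 1.0, 1.0            # each state relies on 3 sensors
y = np.einsum("ij,ij->i", X, true_w[state]) + 0.01 * rng.standard_normal(n)

models = [Lasso(alpha=0.05).fit(X[state == s], y[state == s]) for s in range(n_states)]
for s, m in enumerate(models):
    active = np.flatnonzero(np.abs(m.coef_) > 1e-6)   # sensors kept active in state s
    print(f"state {s}: active sensors {active.tolist()}")
```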

7 citations


Journal ArticleDOI
TL;DR: The twenty papers in this special section aim at providing a forum to present recent advancements in deep learning research that directly concerns the multimedia community.
Abstract: The twenty papers in this special section aim at providing a forum to present recent advancements in deep learning research that directly concerns the multimedia community. Specifically, deep learning research has produced algorithms that build deep nonlinear representations to mimic how the brain perceives and understands multimodal information, ranging from low-level signals such as images and audio to high-level semantic data such as natural language. For multimedia research, it is especially important to develop deep networks that capture the dependencies between different genres of data, building joint deep representations for diverse modalities.

Journal ArticleDOI
TL;DR: This paper proposes learning methods that perform link inference by transferring link information from a source network to the target network, re-sampling the source network to rectify cross-network bias.
Abstract: Link prediction is one of the most fundamental problems in graph modeling and mining. It has been studied in a wide range of scenarios, from uncovering missing links between different entities in databases, to recommending relations between people in social networks. In this problem, we wish to predict unseen links in a growing target network by exploiting existing structures in source networks. Most existing methods assume that abundant links are available in the target network to build a model for link prediction. However, in many scenarios, the target network may be too sparse to enable a robust inference process, which makes link prediction challenging with the paucity of link data. On the other hand, in many cases, other (more densely linked) auxiliary networks may be available that contain link structure similar and relevant to that of the target network. The linkage information in these existing networks can be used in conjunction with the node attribute information in both networks to make more accurate link recommendations. Thus, this paper proposes the use of learning methods to perform link inference by transferring link information from the source network to the target network. We also note that the source network may contain link information irrelevant to the target network. This leads to cross-network bias between the networks, which makes a link model built upon the source network misaligned with the link structure of the target network. Therefore, we re-sample the source network to rectify this cross-network bias by maximizing the cross-network relevance measured by the node attributes, while preserving as much link information as possible to avoid the loss of source link structure caused by the re-sampling algorithm. The link model based on the re-sampled source network can make more accurate link predictions on the target network with aligned link structures across the networks. We present experimental results illustrating the effectiveness of the approach.
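
A small sketch of attribute-driven re-sampling, with all specifics assumed: each source node gets a relevance weight from its best cosine match against the target nodes' attributes, and the source network is re-sampled in proportion to those weights before a link model would be fit. The paper's joint objective (relevance plus preservation of link richness) is not implemented here.

```python
import numpy as np

# Toy re-sampling of a source network by attribute relevance to the target:
# source nodes whose attributes look like target nodes get sampled more often.
rng = np.random.default_rng(0)
src_attr = rng.standard_normal((100, 16))     # source node attributes
tgt_attr = rng.standard_normal((40, 16))      # target node attributes

def unit(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

# relevance of each source node = best cosine match against any target node
rel = (unit(src_attr) @ unit(tgt_attr).T).max(axis=1)
rel = np.clip(rel, 1e-6, None)                # keep weights positive
prob = rel / rel.sum()

# draw a re-sampled (with replacement) set of source nodes; a link model would
# then be trained only on edges among these sampled nodes
sampled = rng.choice(len(src_attr), size=80, replace=True, p=prob)
```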

Journal ArticleDOI
01 Jul 2015
TL;DR: The authors propose an ontological random forest algorithm in which the splitting of decision trees is determined by semantic relations among categories, and hierarchical features are automatically learned by multiple-instance learning to capture visual dissimilarities at different concept levels.
Abstract: Previous image classification approaches mostly neglect semantics, which has two major limitations. First, categories are simply treated independently while in fact they have semantic overlaps. For example, "sedan" is a specific kind of "car". Therefore, it is unreasonable to train a classifier to distinguish between "sedan" and "car". Second, the image feature representations used for classifying different categories are the same. However, the human perception system is believed to use different features for different objects. In this paper, we leverage semantic ontologies to solve the aforementioned problems. We propose an ontological random forest algorithm in which the splitting of decision trees is determined by semantic relations among categories. Hierarchical features are then automatically learned by multiple-instance learning to capture visual dissimilarities at different concept levels. The approach is tested on two image classification datasets. Experimental results demonstrate that it not only outperforms state-of-the-art results but also identifies semantic visual features.
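
A very rough two-stage stand-in for semantics-aware splitting (not the authors' ontological random forest, and without their multiple-instance feature learning): classify the coarse ontology node first, then the fine-grained category within the predicted coarse group. The ontology, labels, and features below are toy data.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical two-level stand-in: split by the coarse ontology node first
# ("vehicle" vs "animal"), then by the fine category inside that group.
ontology = {"sedan": "vehicle", "truck": "vehicle", "cat": "animal", "dog": "animal"}
fine = list(ontology)
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 20))                    # toy image features
y_fine = rng.choice(fine, size=200)
y_coarse = np.array([ontology[c] for c in y_fine])

coarse_clf = DecisionTreeClassifier(max_depth=3).fit(X, y_coarse)
fine_clfs = {g: DecisionTreeClassifier(max_depth=3).fit(X[y_coarse == g], y_fine[y_coarse == g])
             for g in set(ontology.values())}

x = X[:1]
group = coarse_clf.predict(x)[0]
print(group, fine_clfs[group].predict(x)[0])          # coarse node, then fine category
```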

Posted Content
TL;DR: This work proposes a novel hash learning framework that encodes features' rank orders instead of numeric values in a number of optimal low-dimensional ranking subspaces, and presents two versions of the algorithm: one with independent optimization of each hash bit and the other exploiting a sequential learning framework.
Abstract: The era of Big Data has spawned unprecedented interest in developing hashing algorithms for efficient storage and fast nearest neighbor search. Most existing methods learn hash functions that are numeric quantizations of feature values in a projected feature space. In this work, we propose a novel hash learning framework that encodes features' rank orders instead of numeric values in a number of optimal low-dimensional ranking subspaces. We formulate the ranking subspace learning problem as the optimization of a piece-wise linear convex-concave function and present two versions of our algorithm: one with independent optimization of each hash bit and the other exploiting a sequential learning framework. Our work is a generalization of the Winner-Take-All (WTA) hash family and naturally enjoys all the numeric stability benefits of rank correlation measures while being optimized to achieve high precision at very short code lengths. We compare with several state-of-the-art hashing algorithms in both supervised and unsupervised settings, showing superior performance on a number of data sets.
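
Because the work generalizes Winner-Take-All (WTA) hashing, a minimal WTA-style sketch is shown below: each code is the argmax position among the first K features of a random permutation, and similarity is the fraction of matching codes. The learned ranking subspaces of the paper are replaced here by random permutations.

```python
import numpy as np

def wta_hash(x, perms, K=4):
    """Winner-Take-All style rank-order hash: for each permutation, the code
    is the index of the largest value among the first K permuted features."""
    return np.array([int(np.argmax(x[p[:K]])) for p in perms])

rng = np.random.default_rng(0)
d, n_codes = 64, 16
perms = [rng.permutation(d) for _ in range(n_codes)]

a, b = rng.standard_normal(d), rng.standard_normal(d)
ha, hb = wta_hash(a, perms), wta_hash(b, perms)
similarity = (ha == hb).mean()      # fraction of matching codes (rank-order agreement)
```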

Posted Content
Jun Ye, Hao Hu, Kai Li, Guo-Jun Qi, Kien A. Hua 
TL;DR: This work proposes to treat 3D human action recognition as a video-level hashing problem and introduces a novel First-Take-All (FTA) hashing algorithm capable of hashing an entire video into hash codes of fixed length, demonstrating that the FTA algorithm produces a compact representation of the video that is invariant to temporal translation and warping as well as variations in motion scale and execution rate.
Abstract: With the prevalence of commodity depth cameras, the new paradigm of user interfaces based on 3D motion capturing and recognition has dramatically changed the way humans interact with computers. Human action recognition, as one of the key components in these devices, plays an important role in guaranteeing the quality of the user experience. Although model-driven methods have achieved huge success, they cannot provide a scalable solution for efficiently storing, retrieving and recognizing actions in large-scale applications. These models are also vulnerable to temporal translation and warping, as well as variations in motion scales and execution rates. To address these challenges, we propose to treat 3D human action recognition as a video-level hashing problem and propose a novel First-Take-All (FTA) hashing algorithm capable of hashing an entire video into hash codes of fixed length. We demonstrate that the FTA algorithm produces a compact representation of the video that is invariant to the above-mentioned variations, so that action recognition can be solved by an efficient nearest neighbor search using the Hamming distance between FTA hash codes. Experiments on public 3D human action datasets show that the FTA algorithm can reach a recognition accuracy above 80% with about 15 bits per frame, given that the videos in these datasets contain roughly 65 frames each.
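
A toy interpretation of First-Take-All hashing, with the exact code definition assumed rather than taken from the paper: within each small group of random projections, the code records which projection reaches its maximum response earliest in the video, so the code depends on temporal ordering rather than absolute feature values.

```python
import numpy as np

def fta_hash(video, proj_groups):
    """Toy First-Take-All code: for each group of projections, record which
    projection reaches its maximum response earliest in the video."""
    codes = []
    for group in proj_groups:                       # group: (g, d) directions
        responses = video @ group.T                 # (T, g) responses over time
        first_peak = responses.argmax(axis=0)       # frame index of each max
        codes.append(int(first_peak.argmin()))      # which direction peaks first
    return np.array(codes)

rng = np.random.default_rng(0)
T, d, g, n_groups = 65, 48, 4, 20
proj_groups = [rng.standard_normal((g, d)) for _ in range(n_groups)]
video = rng.standard_normal((T, d))                 # toy 65-frame skeleton features
code = fta_hash(video, proj_groups)                 # length-20 code, each entry in [0, g)
```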

01 Jan 2015
TL;DR: This paper proposes the use of learning methods to perform link inference by transferring the link information from the source network to the target network and presents experimental results illustrating the effectiveness of the approach.
Abstract: Link prediction is one of the most fundamental problems in graph modeling and mining. It has been studied in a wide range of scenarios, from uncovering missing links between different entities in databases, to recommending relations between people in social networks. In this problem, we wish to predict unseen links in a growing target network by exploiting existing structures in source networks. Most existing methods assume that abundant links are available in the target network to build a model for link prediction. However, in many scenarios, the target network may be too sparse to enable a robust inference process, which makes link prediction challenging with the paucity of link data. On the other hand, in many cases, other (more densely linked) auxiliary networks may be available that contain link structure similar and relevant to that of the target network. The linkage information in these existing networks can be used in conjunction with the node attribute information in both networks to make more accurate link recommendations. Thus, this paper proposes the use of learning methods to perform link inference by transferring link information from the source network to the target network. We also note that the source network may contain link information irrelevant to the target network. This leads to cross-network bias between the networks, which makes a link model built upon the source network misaligned with the link structure of the target network. Therefore, we re-sample the source network to rectify this cross-network bias by maximizing the cross-network relevance measured by the node attributes, while preserving as much link information as possible to avoid the loss of source link structure caused by the re-sampling algorithm. The link model based on the re-sampled source network can make more accurate link predictions on the target network with aligned link structures across the networks. We present experimental results illustrating the effectiveness of the approach.
Index Terms: Link prediction, link transfer, cross-network bias, node attribution, link richness