scispace - formally typeset
Search or ask a question
Author

Ming Ji

Other affiliations: Zhejiang University, IBM
Bio: Ming Ji is an academic researcher from University of Illinois at Urbana–Champaign. The author has contributed to research in topics: Semi-supervised learning & Ranking SVM. The author has an hindex of 12, co-authored 24 publications receiving 989 citations. Previous affiliations of Ming Ji include Zhejiang University & IBM.

Papers
More filters
Book ChapterDOI
20 Sep 2010
TL;DR: This paper considers the transductive classification problem on heterogeneous networked data which share a common topic and proposes a novel graph-based regularization framework, GNetMine, to model the link structure in information networks with arbitrary network schema and arbitrary number of object/link types.
Abstract: A heterogeneous information network is a network composed of multiple types of objects and links. Recently, it has been recognized that strongly-typed heterogeneous information networks are prevalent in the real world. Sometimes, label information is available for some objects. Learning from such labeled and unlabeled data via transductive classification can lead to good knowledge extraction of the hidden network structure. However, although classification on homogeneous networks has been studied for decades, classification on heterogeneous networks has not been explored until recently. In this paper, we consider the transductive classification problem on heterogeneous networked data which share a common topic. Only some objects in the given network are labeled, and we aim to predict labels for all types of the remaining objects. A novel graph-based regularization framework, GNetMine, is proposed to model the link structure in information networks with arbitrary network schema and arbitrary number of object/link types. Specifically, we explicitly respect the type differences by preserving consistency over each relation graph corresponding to each type of links separately. Efficient computational schemes are then introduced to solve the corresponding optimization problem. Experiments on the DBLP data set show that our algorithm significantly improves the classification accuracy over existing state-of-the-art methods.

247 citations

Proceedings ArticleDOI
21 Aug 2011
TL;DR: A novel ranking-based iterative classification framework that generates more accurate classes than the state-of-art classification methods on networked data, but also provides meaningful ranking of objects within each class, serving as a more informative view of the data than traditional classification.
Abstract: It has been recently recognized that heterogeneous information networks composed of multiple types of nodes and links are prevalent in the real world. Both classification and ranking of the nodes (or data objects) in such networks are essential for network analysis. However, so far these approaches have generally been performed separately. In this paper, we combine ranking and classification in order to perform more accurate analysis of a heterogeneous information network. Our intuition is that highly ranked objects within a class should play more important roles in classification. On the other hand, class membership information is important for determining a quality ranking over a dataset. We believe it is therefore beneficial to integrate classification and ranking in a simultaneous, mutually enhancing process, and to this end, propose a novel ranking-based iterative classification framework, called RankClass. Specifically, we build a graph-based ranking model to iteratively compute the ranking distribution of the objects within each class. At each iteration, according to the current ranking results, the graph structure used in the ranking algorithm is adjusted so that the sub-network corresponding to the specific class is emphasized, while the rest of the network is weakened. As our experiments show, integrating ranking with classification not only generates more accurate classes than the state-of-art classification methods on networked data, but also provides meaningful ranking of objects within each class, serving as a more informative view of the data than traditional classification.

187 citations

Journal ArticleDOI
TL;DR: A moving object data mining system, MoveMine, which integrates multiple data mining functions, including sophisticated pattern mining and trajectory analysis is introduced, which will benefit scientists and other users to carry out versatile analysis tasks to analyze object movement regularities and anomalies.
Abstract: With the maturity and wide availability of GPS, wireless, telecommunication, and Web technologies, massive amounts of object movement data have been collected from various moving object targets, such as animals, mobile devices, vehicles, and climate radars. Analyzing such data has deep implications in many applications, such as, ecological study, traffic control, mobile communication management, and climatological forecast. In this article, we focus our study on animal movement data analysis and examine advanced data mining methods for discovery of various animal movement patterns. In particular, we introduce a moving object data mining system, MoveMine, which integrates multiple data mining functions, including sophisticated pattern mining and trajectory analysis. In this system, two interesting moving object pattern mining functions are newly developed: (1) periodic behavior mining and (2) swarm pattern mining. For mining periodic behaviors, a reference location-based method is developed, which first detects the reference locations, discovers the periods in complex movements, and then finds periodic patterns by hierarchical clustering. For mining swarm patterns, an efficient method is developed to uncover flexible moving object clusters by relaxing the popularly-enforced collective movement constraints.In the MoveMine system, a set of commonly used moving object mining functions are built and a user-friendly interface is provided to facilitate interactive exploration of moving object data mining and flexible tuning of the mining constraints and parameters. MoveMine has been tested on multiple kinds of real datasets, especially for MoveBank applications and other moving object data analysis. The system will benefit scientists and other users to carry out versatile analysis tasks to analyze object movement regularities and anomalies. Moreover, it will benefit researchers to realize the importance and limitations of current techniques and promote future studies on moving object data mining. As expected, a mastery of animal movement patterns and trends will improve our understanding of the interactions between and the changes of the animal world and the ecosystem and therefore help ensure the sustainability of our ecosystem.

137 citations

Proceedings ArticleDOI
06 Jun 2010
TL;DR: The system, MoveMine, is designed for sophisticated moving object data mining by integrating several attractive functions including moving object pattern mining and trajectory mining and a user-friendly interface is provided to facilitate interactive exploration of mining results and flexible tuning of the underlying methods.
Abstract: With the maturity of GPS, wireless, and Web technologies, increasing amounts of movement data collected from various moving objects, such as animals, vehicles, mobile devices, and climate radars, have become widely available. Analyzing such data has broad applications, e.g., in ecological study, vehicle control, mobile communication management, and climatological forecast. However, few data mining tools are available for flexible and scalable analysis of massive-scale moving object data. Our system, MoveMine, is designed for sophisticated moving object data mining by integrating several attractive functions including moving object pattern mining and trajectory mining. We explore the state-of-the-art and novel techniques at implementation of the selected functions. A user-friendly interface is provided to facilitate interactive exploration of mining results and flexible tuning of the underlying methods. Since MoveMine is tested on multiple kinds of real data sets, it will benefit users to carry out versatile analysis on these kinds of data. At the same time, it will benefit researchers to realize the importance and limitations of current techniques as well as the potential future studies in moving object data mining.

109 citations

Journal ArticleDOI
TL;DR: This paper considers the feature selection problem in unsupervised learning scenarios, which is particularly difficult due to the absence of class labels that would guide the search for relevant information, and proposes two novel feature selection algorithms which aim to minimize the expected prediction error of the regularized regression model.
Abstract: In many information processing tasks, one is often confronted with very high-dimensional data. Feature selection techniques are designed to find the meaningful feature subset of the original features which can facilitate clustering, classification, and retrieval. In this paper, we consider the feature selection problem in unsupervised learning scenarios, which is particularly difficult due to the absence of class labels that would guide the search for relevant information. Based on Laplacian regularized least squares, which finds a smooth function on the data manifold and minimizes the empirical loss, we propose two novel feature selection algorithms which aim to minimize the expected prediction error of the regularized regression model. Specifically, we select those features such that the size of the parameter covariance matrix of the regularized regression model is minimized. Motivated from experimental design, we use trace and determinant operators to measure the size of the covariance matrix. Efficient computational schemes are also introduced to solve the corresponding optimization problems. Extensive experimental results over various real-life data sets have demonstrated the superiority of the proposed algorithms.

89 citations


Cited by
More filters
Journal ArticleDOI
Raghu K. Ganti1, Fan Ye1, Hui Lei1
TL;DR: The need for a unified architecture for mobile crowdsensing is argued and the requirements it must satisfy are envisioned.
Abstract: An emerging category of devices at the edge of the Internet are consumer-centric mobile sensing and computing devices, such as smartphones, music players, and in-vehicle sensors. These devices will fuel the evolution of the Internet of Things as they feed sensor data to the Internet at a societal scale. In this article, we examine a category of applications that we term mobile crowdsensing, where individuals with sensing and computing devices collectively share data and extract information to measure and map phenomena of common interest. We present a brief overview of existing mobile crowdsensing applications, explain their unique characteristics, illustrate various research challenges, and discuss possible solutions. Finally, we argue the need for a unified architecture and envision the requirements it must satisfy.

1,833 citations

Proceedings ArticleDOI
04 Aug 2017
TL;DR: Two scalable representation learning models, namely metapath2vec and metapATH2vec++, are developed that are able to not only outperform state-of-the-art embedding models in various heterogeneous network mining tasks, but also discern the structural and semantic correlations between diverse network objects.
Abstract: We study the problem of representation learning in heterogeneous networks. Its unique challenges come from the existence of multiple types of nodes and links, which limit the feasibility of the conventional network embedding techniques. We develop two scalable representation learning models, namely metapath2vec and metapath2vec++. The metapath2vec model formalizes meta-path-based random walks to construct the heterogeneous neighborhood of a node and then leverages a heterogeneous skip-gram model to perform node embeddings. The metapath2vec++ model further enables the simultaneous modeling of structural and semantic correlations in heterogeneous networks. Extensive experiments show that metapath2vec and metapath2vec++ are able to not only outperform state-of-the-art embedding models in various heterogeneous network mining tasks, such as node classification, clustering, and similarity search, but also discern the structural and semantic correlations between diverse network objects.

1,794 citations

Journal ArticleDOI
TL;DR: In GNMF, an affinity graph is constructed to encode the geometrical information and a matrix factorization is sought, which respects the graph structure, and the empirical study shows encouraging results of the proposed algorithm in comparison to the state-of-the-art algorithms on real-world problems.
Abstract: Recently, multiple graph regularizer based methods have shown promising performances in data representation However, the parameter choice of the regularizer is crucial to the performance of clustering and its optimal value changes for different real datasets To deal with this problem, we propose a novel method called Parameter-less Auto-weighted Multiple Graph regularized Nonnegative Matrix Factorization (PAMGNMF) in this paper PAMGNMF employs the linear combination of multiple simple graphs to approximate the manifold structure of data as previous methods do Moreover, the proposed method can automatically learn an optimal weight for each graph without introducing an additive parameter Therefore, the proposed PAMGNMF method is easily applied to practical problems Extensive experimental results on different real-world datasets have demonstrated that the proposed method achieves better performance than the state-of-the-art approaches

1,082 citations

Proceedings Article
19 Jun 2016
TL;DR: In this article, a semi-supervised learning framework based on graph embeddings is proposed, where given a graph between instances, an embedding for each instance is trained to jointly predict the class label and the neighborhood context in the graph.
Abstract: We present a semi-supervised learning framework based on graph embeddings. Given a graph between instances, we train an embedding for each instance to jointly predict the class label and the neighborhood context in the graph. We develop both transductive and inductive variants of our method. In the transductive variant of our method, the class labels are determined by both the learned embeddings and input feature vectors, while in the inductive variant, the embeddings are defined as a parametric function of the feature vectors, so predictions can be made on instances not seen during training. On a large and diverse set of benchmark tasks, including text classification, distantly supervised entity extraction, and entity classification, we show improved performance over many of the existing models.

1,012 citations

Proceedings ArticleDOI
24 Feb 2014
TL;DR: This paper proposes to combine heterogeneous relationship information for each user differently and aim to provide high-quality personalized recommendation results using user implicit feedback data and personalized recommendation models.
Abstract: Among different hybrid recommendation techniques, network-based entity recommendation methods, which utilize user or item relationship information, are beginning to attract increasing attention recently. Most of the previous studies in this category only consider a single relationship type, such as friendships in a social network. In many scenarios, the entity recommendation problem exists in a heterogeneous information network environment. Different types of relationships can be potentially used to improve the recommendation quality. In this paper, we study the entity recommendation problem in heterogeneous information networks. Specifically, we propose to combine heterogeneous relationship information for each user differently and aim to provide high-quality personalized recommendation results using user implicit feedback data and personalized recommendation models. In order to take full advantage of the relationship heterogeneity in information networks, we first introduce meta-path-based latent features to represent the connectivity between users and items along different types of paths. We then define recommendation models at both global and personalized levels and use Bayesian ranking optimization techniques to estimate the proposed models. Empirical studies show that our approaches outperform several widely employed or the state-of-the-art entity recommendation techniques.

674 citations