Home
/
Authors
/
Yintao Yu

Author

Yintao Yu

University of Illinois at Urbana–Champaign

Bio: Yintao Yu is an academic researcher from University of Illinois at Urbana–Champaign. The author has contributed to research in topics: Heterogeneous network & Topic model. The author has an hindex of 10, co-authored 12 publications receiving 1475 citations.

Topics: Heterogeneous network, Topic model, Ranking, Node (networking), Cube (algebra) ...read more

Papers

PDF

Open Access

More filters

Proceedings Article•DOI•

Ranking-based clustering of heterogeneous information networks with star network schema

[...]

Yizhou Sun¹, Yintao Yu¹, Jiawei Han¹•Institutions (1)

University of Illinois at Urbana–Champaign¹

28 Jun 2009

TL;DR: This paper studies clustering of multi-typed heterogeneous networks with a star network schema and proposes a novel algorithm, NetClus, that utilizes links across multityped objects to generate high-quality net-clusters and generates informative clusters.

...read moreread less

Abstract: A heterogeneous information network is an information networkcomposed of multiple types of objects. Clustering on such a network may lead to better understanding of both hidden structures of the network and the individual role played by every object in each cluster. However, although clustering on homogeneous networks has been studied over decades, clustering on heterogeneous networks has not been addressed until recently.A recent study proposed a new algorithm, RankClus, for clustering on bi-typed heterogeneous networks. However, a real-world network may consist of more than two types, and the interactions among multi-typed objects play a key role at disclosing the rich semantics that a network carries. In this paper, we study clustering of multi-typed heterogeneous networks with a star network schema and propose a novel algorithm, NetClus, that utilizes links across multityped objects to generate high-quality net-clusters. An iterative enhancement method is developed that leads to effective ranking-based clustering in such heterogeneous networks. Our experiments on DBLP data show that NetClus generates more accurate clustering results than the baseline topic model algorithm PLSA and the recently proposed algorithm, RankClus. Further, NetClus generates informative clusters, presenting good ranking and cluster membership information for each attribute object in each net-cluster.

...read moreread less

546 citations

Proceedings Article•DOI•

Mining advisor-advisee relationships from research publication networks

[...]

Chi Wang¹, Jiawei Han¹, Yuntao Jia¹, Jie Tang², Duo Zhang¹, Yintao Yu¹, Jingyi Guo² - Show less +3 more•Institutions (2)

University of Illinois at Urbana–Champaign¹, Tsinghua University²

25 Jul 2010

TL;DR: A time-constrained probabilistic factor graph model (TPFG), which takes a research publication network as input and models the advisor-advisee relationship mining problem using a jointly likelihood objective function is proposed and an efficient learning algorithm is designed to optimize the objective function.

...read moreread less

Abstract: Information network contains abundant knowledge about relationships among people or entities. Unfortunately, such kind of knowledge is often hidden in a network where different kinds of relationships are not explicitly categorized. For example, in a research publication network, the advisor-advisee relationships among researchers are hidden in the coauthor network. Discovery of those relationships can benefit many interesting applications such as expert finding and research community analysis. In this paper, we take a computer science bibliographic network as an example, to analyze the roles of authors and to discover the likely advisor-advisee relationships. In particular, we propose a time-constrained probabilistic factor graph model (TPFG), which takes a research publication network as input and models the advisor-advisee relationship mining problem using a jointly likelihood objective function. We further design an efficient learning algorithm to optimize the objective function. Based on that our model suggests and ranks probable advisors for every author. Experimental results show that the proposed approach infer advisor-advisee relationships efficiently and achieves a state-of-the-art accuracy (80-90%). We also apply the discovered advisor-advisee relationships to bole search, a specific expert finding task and empirical study shows that the search performance can be effectively improved (+4.09% by NDCG@5).

...read moreread less

212 citations

Proceedings Article•DOI•

Fast computation of SimRank for static and dynamic information networks

[...]

Cuiping Li¹, Jiawei Han², Guoming He¹, Xin Jin², Yizhou Sun², Yintao Yu², Tianyi Wu² - Show less +3 more•Institutions (2)

Renmin University of China¹, University of Illinois at Urbana–Champaign²

22 Mar 2010

TL;DR: A family of novel approximate SimRank computation algorithms for static and dynamic information networks are developed and their corresponding theoretical justification and analysis are given.

...read moreread less

Abstract: Information networks are ubiquitous in many applications and analysis on such networks has attracted significant attention in the academic communities. One of the most important aspects of information network analysis is to measure similarity between nodes in a network. SimRank is a simple and influential measure of this kind, based on a solid theoretical "random surfer" model. Existing work computes SimRank similarity scores in an iterative mode. We argue that the iterative method can be infeasible and inefficient when, as in many real-world scenarios, the networks change dynamically and frequently. We envision non-iterative method to bridge the gap. It allows users not only to update the similarity scores incrementally, but also to derive similarity scores for an arbitrary subset of nodes. To enable the non-iterative computation, we propose to rewrite the SimRank equation into a non-iterative form by using the Kronecker product and vectorization operators. Based on this, we develop a family of novel approximate SimRank computation algorithms for static and dynamic information networks, and give their corresponding theoretical justification and analysis. The non-iterative method supports efficient processing of various node analysis including similarity tracking and centrality tracking on evolving information networks. The effectiveness and efficiency of our proposed methods are evaluated on synthetic and real data sets.

...read moreread less

171 citations

Proceedings Article•DOI•

iTopicModel: Information Network-Integrated Topic Modeling

[...]

Yizhou Sun¹, Jiawei Han¹, Jing Gao¹, Yintao Yu¹•Institutions (1)

University of Illinois at Urbana–Champaign¹

06 Dec 2009

TL;DR: A novel topic modeling framework is proposed, which builds a unified generative topic model that is able to consider both text and structure information for documents, and a graphical model is proposed to describe the generative model.

...read moreread less

Abstract: Document networks, i.e., networks associated with text information, are becoming increasingly popular due to the ubiquity of Web documents, blogs, and various kinds of online data. In this paper, we propose a novel topic modeling framework for document networks, which builds a unified generative topic model that is able to consider both text and structure information for documents. A graphical model is proposed to describe the generative model. On the top layer of this graphical model, we define a novel multivariate Markov Random Field for topic distribution random variables for each document, to model the dependency relationships among documents over the network structure. On the bottom layer, we follow the traditional topic model to model the generation of text for each document. A joint distribution function for both the text and structure of the documents is thus provided. A solution to estimate this topic model is given, by maximizing the log-likelihood of the joint probability. Some important practical issues in real applications are also discussed, including how to decide the topic number and how to choose a good network structure. We apply the model on two real datasets, DBLP and Cora, and the experiments show that this model is more effective in comparison with the state-of-the-art topic modeling algorithms.

...read moreread less

145 citations

Journal Article•DOI•

MoveMine: Mining moving object data for discovery of animal movement patterns

[...]

Zhenhui Li¹, Jiawei Han¹, Ming Ji¹, Lu-An Tang¹, Yintao Yu¹, Bolin Ding¹, Jae-Gil Lee², Roland Kays - Show less +4 more•Institutions (2)

University of Illinois at Urbana–Champaign¹, KAIST²

15 Jul 2011-ACM Transactions on Intelligent Systems and Technology

TL;DR: A moving object data mining system, MoveMine, which integrates multiple data mining functions, including sophisticated pattern mining and trajectory analysis is introduced, which will benefit scientists and other users to carry out versatile analysis tasks to analyze object movement regularities and anomalies.

...read moreread less

Abstract: With the maturity and wide availability of GPS, wireless, telecommunication, and Web technologies, massive amounts of object movement data have been collected from various moving object targets, such as animals, mobile devices, vehicles, and climate radars. Analyzing such data has deep implications in many applications, such as, ecological study, traffic control, mobile communication management, and climatological forecast. In this article, we focus our study on animal movement data analysis and examine advanced data mining methods for discovery of various animal movement patterns. In particular, we introduce a moving object data mining system, MoveMine, which integrates multiple data mining functions, including sophisticated pattern mining and trajectory analysis. In this system, two interesting moving object pattern mining functions are newly developed: (1) periodic behavior mining and (2) swarm pattern mining. For mining periodic behaviors, a reference location-based method is developed, which first detects the reference locations, discovers the periods in complex movements, and then finds periodic patterns by hierarchical clustering. For mining swarm patterns, an efficient method is developed to uncover flexible moving object clusters by relaxing the popularly-enforced collective movement constraints.In the MoveMine system, a set of commonly used moving object mining functions are built and a user-friendly interface is provided to facilitate interactive exploration of moving object data mining and flexible tuning of the mining constraints and parameters. MoveMine has been tested on multiple kinds of real datasets, especially for MoveBank applications and other moving object data analysis. The system will benefit scientists and other users to carry out versatile analysis tasks to analyze object movement regularities and anomalies. Moreover, it will benefit researchers to realize the importance and limitations of current techniques and promote future studies on moving object data mining. As expected, a mastery of animal movement patterns and trends will improve our understanding of the interactions between and the changes of the animal world and the ecosystem and therefore help ensure the sustainability of our ecosystem.

...read moreread less

137 citations

Cited by

PDF

Open Access

More filters

Journal Article•DOI•

Multilayer Networks

[...]

Mikko Kivelä, Alexandre Arenas, Marc Barthelemy, James P. Gleeson, Yamir Moreno, Mason A. Porter - Show less +2 more

27 Sep 2013-arXiv: Physics and Society

TL;DR: In most natural and engineered systems, a set of entities interact with each other in complicated patterns that can encompass multiple types of relationships, change in time, and include other types of complications.

...read moreread less

Abstract: In most natural and engineered systems, a set of entities interact with each other in complicated patterns that can encompass multiple types of relationships, change in time, and include other types of complications Such systems include multiple subsystems and layers of connectivity, and it is important to take such "multilayer" features into account to try to improve our understanding of complex systems Consequently, it is necessary to generalize "traditional" network theory by developing (and validating) a framework and associated tools to study multilayer systems in a comprehensive fashion The origins of such efforts date back several decades and arose in multiple disciplines, and now the study of multilayer networks has become one of the most important directions in network science In this paper, we discuss the history of multilayer networks (and related concepts) and review the exploding body of work on such networks To unify the disparate terminology in the large body of recent work, we discuss a general framework for multilayer networks, construct a dictionary of terminology to relate the numerous existing concepts to each other, and provide a thorough discussion that compares, contrasts, and translates between related notions such as multilayer networks, multiplex networks, interdependent networks, networks of networks, and many others We also survey and discuss existing data sets that can be represented as multilayer networks We review attempts to generalize single-layer-network diagnostics to multilayer networks We also discuss the rapidly expanding research on multilayer-network models and notions like community structure, connected components, tensor decompositions, and various types of dynamical processes on multilayer networks We conclude with a summary and an outlook

...read moreread less

1,934 citations

Journal Article•DOI•

Mobile crowdsensing: current state and future challenges

[...]

Raghu K. Ganti¹, Fan Ye¹, Hui Lei¹•Institutions (1)

IBM¹

10 Nov 2011-IEEE Communications Magazine

TL;DR: The need for a unified architecture for mobile crowdsensing is argued and the requirements it must satisfy are envisioned.

...read moreread less

Abstract: An emerging category of devices at the edge of the Internet are consumer-centric mobile sensing and computing devices, such as smartphones, music players, and in-vehicle sensors. These devices will fuel the evolution of the Internet of Things as they feed sensor data to the Internet at a societal scale. In this article, we examine a category of applications that we term mobile crowdsensing, where individuals with sensing and computing devices collectively share data and extract information to measure and map phenomena of common interest. We present a brief overview of existing mobile crowdsensing applications, explain their unique characteristics, illustrate various research challenges, and discuss possible solutions. Finally, we argue the need for a unified architecture and envision the requirements it must satisfy.

...read moreread less

1,833 citations

Proceedings Article•DOI•

metapath2vec: Scalable Representation Learning for Heterogeneous Networks

[...]

Yuxiao Dong¹, Nitesh V. Chawla¹, Ananthram Swami²•Institutions (2)

University of Notre Dame¹, United States Army Research Laboratory²

04 Aug 2017

TL;DR: Two scalable representation learning models, namely metapath2vec and metapATH2vec++, are developed that are able to not only outperform state-of-the-art embedding models in various heterogeneous network mining tasks, but also discern the structural and semantic correlations between diverse network objects.

...read moreread less

Abstract: We study the problem of representation learning in heterogeneous networks. Its unique challenges come from the existence of multiple types of nodes and links, which limit the feasibility of the conventional network embedding techniques. We develop two scalable representation learning models, namely metapath2vec and metapath2vec++. The metapath2vec model formalizes meta-path-based random walks to construct the heterogeneous neighborhood of a node and then leverages a heterogeneous skip-gram model to perform node embeddings. The metapath2vec++ model further enables the simultaneous modeling of structural and semantic correlations in heterogeneous networks. Extensive experiments show that metapath2vec and metapath2vec++ are able to not only outperform state-of-the-art embedding models in various heterogeneous network mining tasks, such as node classification, clustering, and similarity search, but also discern the structural and semantic correlations between diverse network objects.

...read moreread less

1,794 citations

Journal Article•DOI•

PathSim: meta path-based top-K similarity search in heterogeneous information networks

[...]

Yizhou Sun¹, Jiawei Han¹, Xifeng Yan², Philip S. Yu³, Tianyi Wu⁴ - Show less +1 more•Institutions (4)

University of Illinois at Urbana–Champaign¹, University of California, Santa Barbara², University of Illinois at Chicago³, Microsoft⁴

01 Aug 2011

TL;DR: Under the meta path framework, a novel similarity measure called PathSim is defined that is able to find peer objects in the network (e.g., find authors in the similar field and with similar reputation), which turns out to be more meaningful in many scenarios compared with random-walk based similarity measures.

...read moreread less

Abstract: Similarity search is a primitive operation in database and Web search engines. With the advent of large-scale heterogeneous information networks that consist of multi-typed, interconnected objects, such as the bibliographic networks and social media networks, it is important to study similarity search in such networks. Intuitively, two objects are similar if they are linked by many paths in the network. However, most existing similarity measures are defined for homogeneous networks. Different semantic meanings behind paths are not taken into consideration. Thus they cannot be directly applied to heterogeneous networks.In this paper, we study similarity search that is defined among the same type of objects in heterogeneous networks. Moreover, by considering different linkage paths in a network, one could derive various similarity semantics. Therefore, we introduce the concept of meta path-based similarity, where a meta path is a path consisting of a sequence of relations defined between different object types (i.e., structural paths at the meta level). No matter whether a user would like to explicitly specify a path combination given sufficient domain knowledge, or choose the best path by experimental trials, or simply provide training examples to learn it, meta path forms a common base for a network-based similarity search engine. In particular, under the meta path framework we define a novel similarity measure called PathSim that is able to find peer objects in the network (e.g., find authors in the similar field and with similar reputation), which turns out to be more meaningful in many scenarios compared with random-walk based similarity measures. In order to support fast online query processing for PathSim queries, we develop an efficient solution that partially materializes short meta paths and then concatenates them online to compute top-k results. Experiments on real data sets demonstrate the effectiveness and efficiency of our proposed paradigm.

...read moreread less

1,583 citations

Journal Article•DOI•

Defining and evaluating network communities based on ground-truth

[...]

Jaewon Yang¹, Jure Leskovec¹•Institutions (1)

Stanford University¹

01 Jan 2015-Knowledge and Information Systems

TL;DR: In this article, the authors distinguish between structural and functional definitions of network communities and identify networks with explicitly labeled functional communities to which they refer as ground-truth communities, where nodes explicitly state their community memberships and use such social groups to define a reliable and robust notion of groundtruth communities.

...read moreread less

Abstract: Nodes in real-world networks organize into densely linked communities where edges appear with high concentration among the members of the community. Identifying such communities of nodes has proven to be a challenging task due to a plethora of definitions of network communities, intractability of methods for detecting them, and the issues with evaluation which stem from the lack of a reliable gold-standard ground-truth. In this paper, we distinguish between structural and functional definitions of network communities. Structural definitions of communities are based on connectivity patterns, like the density of connections between the community members, while functional definitions are based on (often unobserved) common function or role of the community members in the network. We argue that the goal of network community detection is to extract functional communities based on the connectivity structure of the nodes in the network. We then identify networks with explicitly labeled functional communities to which we refer as ground-truth communities. In particular, we study a set of 230 large real-world social, collaboration, and information networks where nodes explicitly state their community memberships. For example, in social networks, nodes explicitly join various interest-based social groups. We use such social groups to define a reliable and robust notion of ground-truth communities. We then propose a methodology, which allows us to compare and quantitatively evaluate how different structural definitions of communities correspond to ground-truth functional communities. We study 13 commonly used structural definitions of communities and examine their sensitivity, robustness and performance in identifying the ground-truth. We show that the 13 structural definitions are heavily correlated and naturally group into four classes. We find that two of these definitions, Conductance and Triad participation ratio, consistently give the best performance in identifying ground-truth communities. We also investigate a task of detecting communities given a single seed node. We extend the local spectral clustering algorithm into a heuristic parameter-free community detection method that easily scales to networks with more than 100 million nodes. The proposed method achieves 30 % relative improvement over current local clustering methods.

...read moreread less

1,518 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse