Home
/
Authors
/
Juanzi Li

Author

Juanzi Li

Other affiliations: Université de Montréal

Bio: Juanzi Li is an academic researcher from Tsinghua University. The author has contributed to research in topics: Ontology (information science) & Semantic Web. The author has an hindex of 43, co-authored 267 publications receiving 8106 citations. Previous affiliations of Juanzi Li include Université de Montréal.

Papers published on a yearly basis

2023
2022
2021
2020
2019
2018
2017
2016
2015
2014
2013
2012
2011
2010
2009
2008
2007
2006
2005
2004
2003
1999

Papers

PDF

Open Access

More filters

Proceedings Article•DOI•

ArnetMiner: extraction and mining of academic social networks

[...]

Jie Tang¹, Jing Zhang¹, Limin Yao¹, Juanzi Li¹, Li Zhang², Zhong Su² - Show less +2 more•Institutions (2)

Tsinghua University¹, IBM²

24 Aug 2008

TL;DR: The architecture and main features of the ArnetMiner system, which aims at extracting and mining academic social networks, are described and a unified modeling approach to simultaneously model topical aspects of papers, authors, and publication venues is proposed.

...read moreread less

Abstract: This paper addresses several key issues in the ArnetMiner system, which aims at extracting and mining academic social networks. Specifically, the system focuses on: 1) Extracting researcher profiles automatically from the Web; 2) Integrating the publication data into the network from existing digital libraries; 3) Modeling the entire academic network; and 4) Providing search services for the academic network. So far, 448,470 researcher profiles have been extracted using a unified tagging approach. We integrate publications from online Web databases and propose a probabilistic framework to deal with the name ambiguity problem. Furthermore, we propose a unified modeling approach to simultaneously model topical aspects of papers, authors, and publication venues. Search services such as expertise search and people association search have been provided based on the modeling results. In this paper, we describe the architecture and main features of the system. We also present the empirical evaluation of the proposed methods.

...read moreread less

2,058 citations

Journal Article•DOI•

RiMOM: A Dynamic Multistrategy Ontology Alignment Framework

[...]

Juanzi Li¹, Jie Tang¹, Yi Li¹, Qiong Luo¹•Institutions (1)

Tsinghua University¹

01 Aug 2009-IEEE Transactions on Knowledge and Data Engineering

TL;DR: This paper presents a dynamic multistrategy ontology alignment framework, named RiMOM, and proposes a systematic approach to quantitatively estimate the similarity characteristics for each alignment task and a strategy selection method to automatically combine the matching strategies based on two estimated factors.

...read moreread less

Abstract: Ontology alignment identifies semantically matching entities in different ontologies. Various ontology alignment strategies have been proposed; however, few systems have explored how to automatically combine multiple strategies to improve the matching effectiveness. This paper presents a dynamic multistrategy ontology alignment framework, named RiMOM. The key insight in this framework is that similarity characteristics between ontologies may vary widely. We propose a systematic approach to quantitatively estimate the similarity characteristics for each alignment task and propose a strategy selection method to automatically combine the matching strategies based on two estimated factors. In the approach, we consider both textual and structural characteristics of ontologies. With RiMOM, we participated in the 2006 and 2007 campaigns of the Ontology Alignment Evaluation Initiative (OAEI). Our system is among the top three performers in benchmark data sets.

...read moreread less

444 citations

Proceedings Article•DOI•

OpenKE: An Open Toolkit for Knowledge Embedding

[...]

Xu Han¹, Shulin Cao¹, Xin Lv¹, Yankai Lin¹, Zhiyuan Liu¹, Maosong Sun¹, Juanzi Li¹ - Show less +3 more•Institutions (1)

Tsinghua University¹

01 Nov 2018

TL;DR: An open toolkit for knowledge embedding, which provides a unified framework and various fundamental models to embed knowledge graphs into a continuous low-dimensional space and the embeddings of some existing large-scale knowledge graphs pre-trained by OpenKE are available.

...read moreread less

Abstract: We release an open toolkit for knowledge embedding (OpenKE), which provides a unified framework and various fundamental models to embed knowledge graphs into a continuous low-dimensional space OpenKE prioritizes operational efficiency to support quick model validation and large-scale knowledge representation learning Meanwhile, OpenKE maintains sufficient modularity and extensibility to easily incorporate new models into the framework Besides the toolkit, the embeddings of some existing large-scale knowledge graphs pre-trained by OpenKE are also available, which can be directly applied for many applications including information retrieval, personalized recommendation and question answering The toolkit, documentation, and pre-trained embeddings are all released on http://openkethunlporg/

...read moreread less

297 citations

Posted Content•

KEPLER: A Unified Model for Knowledge Embedding and Pre-trained Language Representation

[...]

Xiaozhi Wang¹, Tianyu Gao¹, Zhaocheng Zhu¹, Zhengyan Zhang¹, Zhiyuan Liu², Juanzi Li³, Jian Tang⁴ - Show less +3 more•Institutions (4)

Tsinghua University¹, Princeton University², Université de Montréal³, HEC Montréal⁴

13 Nov 2019-arXiv: Computation and Language

TL;DR: A unified model for Knowledge Embedding and Pre-trained LanguagERepresentation (KEPLER), which can not only better integrate factual knowledge into PLMs but also produce effective text-enhanced KE with the strong PLMs is proposed.

...read moreread less

Abstract: Pre-trained language representation models (PLMs) cannot well capture factual knowledge from text. In contrast, knowledge embedding (KE) methods can effectively represent the relational facts in knowledge graphs (KGs) with informative entity embeddings, but conventional KE models cannot take full advantage of the abundant textual information. In this paper, we propose a unified model for Knowledge Embedding and Pre-trained LanguagE Representation (KEPLER), which can not only better integrate factual knowledge into PLMs but also produce effective text-enhanced KE with the strong PLMs. In KEPLER, we encode textual entity descriptions with a PLM as their embeddings, and then jointly optimize the KE and language modeling objectives. Experimental results show that KEPLER achieves state-of-the-art performances on various NLP tasks, and also works remarkably well as an inductive KE model on KG link prediction. Furthermore, for pre-training and evaluating KEPLER, we construct Wikidata5M, a large-scale KG dataset with aligned entity descriptions, and benchmark state-of-the-art KE methods on it. It shall serve as a new KE benchmark and facilitate the research on large KG, inductive KE, and KG with text. The source code can be obtained from this https URL.

...read moreread less

269 citations

Proceedings Article•DOI•

Understanding retweeting behaviors in social networks

[...]

Zi Yang¹, Jingyi Guo¹, Keke Cai², Jie Tang¹, Juanzi Li¹, Li Zhang², Zhong Su² - Show less +3 more•Institutions (2)

Tsinghua University¹, IBM²

26 Oct 2010

TL;DR: This paper proposes a factor graph model to predict users' retweeting behaviors and shows that this method can achieve a precision of 28.81% and recall of 37.33% for prediction of the retweet behaviors.

...read moreread less

Abstract: Retweeting is an important action (behavior) on Twitter, indicating the behavior that users re-post microblogs of their friends. While much work has been conducted for mining textual content that users generate or analyzing the social network structure, few publications systematically study the underlying mechanism of the retweeting behaviors. In this paper, we perform an interesting analysis for the problem on Twitter. We have found that almost 25.5% of the tweets posted by users are actually retweeted from friends' blog spaces. Our investigation unveils that for the retweet behaviors, some statistics still follows the power law distribution, while some others violate the state-of-the-art distribution for Web. Based on these important observations, we propose a factor graph model to predict users' retweeting behaviors. Experimental results on the Twitter data set show that our method can achieve a precision of 28.81% and recall of 37.33% for prediction of the retweet behaviors.

...read moreread less

265 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55

Collapse

Cited by

PDF

Open Access

More filters

Proceedings Article•DOI•

Random graphs

[...]

Alan Frieze¹•Institutions (1)

Carnegie Mellon University¹

22 Jan 2006

TL;DR: Some of the major results in random graphs and some of the more challenging open problems are reviewed, including those related to the WWW.

...read moreread less

Abstract: We will review some of the major results in random graphs and some of the more challenging open problems. We will cover algorithmic and structural questions. We will touch on newer models, including those related to the WWW.

...read moreread less

7,116 citations

Journal Article•DOI•

A Comprehensive Survey on Graph Neural Networks

[...]

Zonghan Wu¹, Shirui Pan², Fengwen Chen¹, Guodong Long¹, Chengqi Zhang¹, Philip S. Yu³ - Show less +2 more•Institutions (3)

University of Technology, Sydney¹, Monash University, Clayton campus², University of Illinois at Chicago³

01 Jan 2021-IEEE Transactions on Neural Networks

TL;DR: This article provides a comprehensive overview of graph neural networks (GNNs) in data mining and machine learning fields and proposes a new taxonomy to divide the state-of-the-art GNNs into four categories, namely, recurrent GNNS, convolutional GNN’s, graph autoencoders, and spatial–temporal Gnns.

...read moreread less

Abstract: Deep learning has revolutionized many machine learning tasks in recent years, ranging from image classification and video processing to speech recognition and natural language understanding. The data in these tasks are typically represented in the Euclidean space. However, there is an increasing number of applications, where data are generated from non-Euclidean domains and are represented as graphs with complex relationships and interdependency between objects. The complexity of graph data has imposed significant challenges on the existing machine learning algorithms. Recently, many studies on extending deep learning approaches for graph data have emerged. In this article, we provide a comprehensive overview of graph neural networks (GNNs) in data mining and machine learning fields. We propose a new taxonomy to divide the state-of-the-art GNNs into four categories, namely, recurrent GNNs, convolutional GNNs, graph autoencoders, and spatial–temporal GNNs. We further discuss the applications of GNNs across various domains and summarize the open-source codes, benchmark data sets, and model evaluation of GNNs. Finally, we propose potential research directions in this rapidly growing field.

...read moreread less

4,584 citations

Journal Article•

Thinking fast and slow.

[...]

Neil McGlynn

01 Dec 2014-Australian Veterinary Journal

TL;DR: Prospect Theory led cognitive psychology in a new direction that began to uncover other human biases in thinking that are probably not learned but are part of the authors' brain’s wiring.

...read moreread less

Abstract: In 1974 an article appeared in Science magazine with the dry-sounding title “Judgment Under Uncertainty: Heuristics and Biases” by a pair of psychologists who were not well known outside their discipline of decision theory. In it Amos Tversky and Daniel Kahneman introduced the world to Prospect Theory, which mapped out how humans actually behave when faced with decisions about gains and losses, in contrast to how economists assumed that people behave. Prospect Theory turned Economics on its head by demonstrating through a series of ingenious experiments that people are much more concerned with losses than they are with gains, and that framing a choice from one perspective or the other will result in decisions that are exactly the opposite of each other, even if the outcomes are monetarily the same. Prospect Theory led cognitive psychology in a new direction that began to uncover other human biases in thinking that are probably not learned but are part of our brain’s wiring.

...read moreread less

4,351 citations

Proceedings Article•DOI•

LINE: Large-scale Information Network Embedding

[...]

Jian Tang¹, Meng Qu², Mingzhe Wang², Ming Zhang², Jun Yan¹, Qiaozhu Mei³ - Show less +2 more•Institutions (3)

Microsoft¹, Peking University², University of Michigan³

18 May 2015

TL;DR: A novel network embedding method called the ``LINE,'' which is suitable for arbitrary types of information networks: undirected, directed, and/or weighted, and optimizes a carefully designed objective function that preserves both the local and global network structures.

...read moreread less

Abstract: This paper studies the problem of embedding very large information networks into low-dimensional vector spaces, which is useful in many tasks such as visualization, node classification, and link prediction. Most existing graph embedding methods do not scale for real world information networks which usually contain millions of nodes. In this paper, we propose a novel network embedding method called the ``LINE,'' which is suitable for arbitrary types of information networks: undirected, directed, and/or weighted. The method optimizes a carefully designed objective function that preserves both the local and global network structures. An edge-sampling algorithm is proposed that addresses the limitation of the classical stochastic gradient descent and improves both the effectiveness and the efficiency of the inference. Empirical experiments prove the effectiveness of the LINE on a variety of real-world information networks, including language networks, social networks, and citation networks. The algorithm is very efficient, which is able to learn the embedding of a network with millions of vertices and billions of edges in a few hours on a typical single machine. The source code of the LINE is available online\footnote{\url{https://github.com/tangjianpku/LINE}}.

...read moreread less

3,492 citations

Proceedings Article•DOI•

LINE: Large-scale Information Network Embedding

[...]

Jian Tang¹, Meng Qu², Mingzhe Wang², Ming Zhang², Jun Yan¹, Qiaozhu Mei³ - Show less +2 more•Institutions (3)

Microsoft¹, Peking University², University of Michigan³

12 Mar 2015-arXiv: Learning

TL;DR: LINE as discussed by the authors proposes a network embedding method called LINE, which is suitable for arbitrary types of information networks: undirected, directed, and/or weighted, and optimizes a carefully designed objective function that preserves both the local and global network structures.

...read moreread less

Abstract: This paper studies the problem of embedding very large information networks into low-dimensional vector spaces, which is useful in many tasks such as visualization, node classification, and link prediction. Most existing graph embedding methods do not scale for real world information networks which usually contain millions of nodes. In this paper, we propose a novel network embedding method called the "LINE," which is suitable for arbitrary types of information networks: undirected, directed, and/or weighted. The method optimizes a carefully designed objective function that preserves both the local and global network structures. An edge-sampling algorithm is proposed that addresses the limitation of the classical stochastic gradient descent and improves both the effectiveness and the efficiency of the inference. Empirical experiments prove the effectiveness of the LINE on a variety of real-world information networks, including language networks, social networks, and citation networks. The algorithm is very efficient, which is able to learn the embedding of a network with millions of vertices and billions of edges in a few hours on a typical single machine. The source code of the LINE is available online.

...read moreread less

3,447 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse