scispace - formally typeset
Search or ask a question
Author

Jing Zhang

Other affiliations: Renmin University of China
Bio: Jing Zhang is an academic researcher from Tsinghua University. The author has contributed to research in topics: Social network & Topic model. The author has an hindex of 21, co-authored 39 publications receiving 3702 citations. Previous affiliations of Jing Zhang include Renmin University of China.

Papers
More filters
Proceedings ArticleDOI
Jie Tang1, Jing Zhang1, Limin Yao1, Juanzi Li1, Li Zhang2, Zhong Su2 
24 Aug 2008
TL;DR: The architecture and main features of the ArnetMiner system, which aims at extracting and mining academic social networks, are described and a unified modeling approach to simultaneously model topical aspects of papers, authors, and publication venues is proposed.
Abstract: This paper addresses several key issues in the ArnetMiner system, which aims at extracting and mining academic social networks. Specifically, the system focuses on: 1) Extracting researcher profiles automatically from the Web; 2) Integrating the publication data into the network from existing digital libraries; 3) Modeling the entire academic network; and 4) Providing search services for the academic network. So far, 448,470 researcher profiles have been extracted using a unified tagging approach. We integrate publications from online Web databases and propose a probabilistic framework to deal with the name ambiguity problem. Furthermore, we propose a unified modeling approach to simultaneously model topical aspects of papers, authors, and publication venues. Search services such as expertise search and people association search have been provided based on the modeling results. In this paper, we describe the architecture and main features of the system. We also present the empirical evaluation of the proposed methods.

2,058 citations

Proceedings ArticleDOI
TL;DR: Graph Contrastive Coding (GCC) is designed --- a self-supervised graph neural network pre-training framework --- to capture the universal network topological properties across multiple networks and leverage contrastive learning to empower graph neural networks to learn the intrinsic and transferable structural representations.
Abstract: Graph representation learning has emerged as a powerful technique for addressing real-world problems Various downstream graph learning tasks have benefited from its recent developments, such as node classification, similarity search, and graph classification However, prior arts on graph representation learning focus on domain specific problems and train a dedicated model for each graph dataset, which is usually non-transferable to out-of-domain data Inspired by the recent advances in pre-training from natural language processing and computer vision, we design Graph Contrastive Coding (GCC) -- a self-supervised graph neural network pre-training framework -- to capture the universal network topological properties across multiple networks We design GCC's pre-training task as subgraph instance discrimination in and across networks and leverage contrastive learning to empower graph neural networks to learn the intrinsic and transferable structural representations We conduct extensive experiments on three graph learning tasks and ten graph datasets The results show that GCC pre-trained on a collection of diverse datasets can achieve competitive or better performance to its task-specific and trained-from-scratch counterparts This suggests that the pre-training and fine-tuning paradigm presents great potential for graph representation learning

325 citations

Proceedings ArticleDOI
23 Aug 2020
TL;DR: GCC as mentioned in this paper proposes a self-supervised graph neural network pre-training framework to capture the universal network topological properties across multiple networks and leverage contrastive learning to empower graph neural networks.
Abstract: Graph representation learning has emerged as a powerful technique for addressing real-world problems. Various downstream graph learning tasks have benefited from its recent developments, such as node classification, similarity search, and graph classification. However, prior arts on graph representation learning focus on domain specific problems and train a dedicated model for each graph dataset, which is usually non-transferable to out-of-domain data. Inspired by the recent advances in pre-training from natural language processing and computer vision, we design Graph Contrastive Coding (GCC) --- a self-supervised graph neural network pre-training framework --- to capture the universal network topological properties across multiple networks. We design GCC's pre-training task as subgraph instance discrimination in and across networks and leverage contrastive learning to empower graph neural networks to learn the intrinsic and transferable structural representations. We conduct extensive experiments on three graph learning tasks and ten graph datasets. The results show that GCC pre-trained on a collection of diverse datasets can achieve competitive or better performance to its task-specific and trained-from-scratch counterparts. This suggests that the pre-training and fine-tuning paradigm presents great potential for graph representation learning.

320 citations

Proceedings Article
Jing Zhang1, Biao Liu1, Jie Tang1, Ting Chen1, Juanzi Li1 
03 Aug 2013
TL;DR: An interesting phenomenon of social influence locality in a large microblogging network, which suggests that users' behaviors are mainly influenced by close friends in their ego networks, is studied.
Abstract: We study an interesting phenomenon of social influence locality in a large microblogging network, which suggests that users' behaviors are mainly influenced by close friends in their ego networks. We provide a formal definition for the notion of social influence locality and develop two instantiation functions based on pairwise influence and structural diversity. The defined influence locality functions have strong predictive power. Without any additional features, we can obtain a F1-score of 71.65% for predicting users' retweet behaviors by training a logistic regression classifier based on the defined functions. Our analysis also reveals several intriguing discoveries. For example, though the probability of a user retweeting a microblog is positively correlated with the number of friends who have retweeted the microblog, it is surprisingly negatively correlated with the number of connected circles that are formed by those friends.

248 citations

Book ChapterDOI
09 Apr 2007
TL;DR: Experimental results show that the proposed propagation-based approach can outperform the baseline approach and take into consideration both person local information and network information (e.g. relationships between persons).
Abstract: This paper addresses the issue of expert finding in a social network. The task of expert finding, as one of the most important research issues in social networks, is aimed at identifying persons with relevant expertise or experience for a given topic. In this paper, we propose a propagation-based approach that takes into consideration of both person local information and network information (e.g. relationships between persons). Experimental results show that our approach can outperform the baseline approach.

245 citations


Cited by
More filters
Journal ArticleDOI
TL;DR: This article provides a comprehensive overview of graph neural networks (GNNs) in data mining and machine learning fields and proposes a new taxonomy to divide the state-of-the-art GNNs into four categories, namely, recurrent GNNS, convolutional GNN’s, graph autoencoders, and spatial–temporal Gnns.
Abstract: Deep learning has revolutionized many machine learning tasks in recent years, ranging from image classification and video processing to speech recognition and natural language understanding. The data in these tasks are typically represented in the Euclidean space. However, there is an increasing number of applications, where data are generated from non-Euclidean domains and are represented as graphs with complex relationships and interdependency between objects. The complexity of graph data has imposed significant challenges on the existing machine learning algorithms. Recently, many studies on extending deep learning approaches for graph data have emerged. In this article, we provide a comprehensive overview of graph neural networks (GNNs) in data mining and machine learning fields. We propose a new taxonomy to divide the state-of-the-art GNNs into four categories, namely, recurrent GNNs, convolutional GNNs, graph autoencoders, and spatial–temporal GNNs. We further discuss the applications of GNNs across various domains and summarize the open-source codes, benchmark data sets, and model evaluation of GNNs. Finally, we propose potential research directions in this rapidly growing field.

4,584 citations

Proceedings ArticleDOI
18 May 2015
TL;DR: A novel network embedding method called the ``LINE,'' which is suitable for arbitrary types of information networks: undirected, directed, and/or weighted, and optimizes a carefully designed objective function that preserves both the local and global network structures.
Abstract: This paper studies the problem of embedding very large information networks into low-dimensional vector spaces, which is useful in many tasks such as visualization, node classification, and link prediction. Most existing graph embedding methods do not scale for real world information networks which usually contain millions of nodes. In this paper, we propose a novel network embedding method called the ``LINE,'' which is suitable for arbitrary types of information networks: undirected, directed, and/or weighted. The method optimizes a carefully designed objective function that preserves both the local and global network structures. An edge-sampling algorithm is proposed that addresses the limitation of the classical stochastic gradient descent and improves both the effectiveness and the efficiency of the inference. Empirical experiments prove the effectiveness of the LINE on a variety of real-world information networks, including language networks, social networks, and citation networks. The algorithm is very efficient, which is able to learn the embedding of a network with millions of vertices and billions of edges in a few hours on a typical single machine. The source code of the LINE is available online\footnote{\url{https://github.com/tangjianpku/LINE}}.

3,492 citations

Proceedings ArticleDOI
TL;DR: LINE as discussed by the authors proposes a network embedding method called LINE, which is suitable for arbitrary types of information networks: undirected, directed, and/or weighted, and optimizes a carefully designed objective function that preserves both the local and global network structures.
Abstract: This paper studies the problem of embedding very large information networks into low-dimensional vector spaces, which is useful in many tasks such as visualization, node classification, and link prediction. Most existing graph embedding methods do not scale for real world information networks which usually contain millions of nodes. In this paper, we propose a novel network embedding method called the "LINE," which is suitable for arbitrary types of information networks: undirected, directed, and/or weighted. The method optimizes a carefully designed objective function that preserves both the local and global network structures. An edge-sampling algorithm is proposed that addresses the limitation of the classical stochastic gradient descent and improves both the effectiveness and the efficiency of the inference. Empirical experiments prove the effectiveness of the LINE on a variety of real-world information networks, including language networks, social networks, and citation networks. The algorithm is very efficient, which is able to learn the embedding of a network with millions of vertices and billions of edges in a few hours on a typical single machine. The source code of the LINE is available online.

3,447 citations

Posted Content
TL;DR: A detailed review over existing graph neural network models is provided, systematically categorize the applications, and four open problems for future research are proposed.
Abstract: Lots of learning tasks require dealing with graph data which contains rich relation information among elements. Modeling physics systems, learning molecular fingerprints, predicting protein interface, and classifying diseases demand a model to learn from graph inputs. In other domains such as learning from non-structural data like texts and images, reasoning on extracted structures (like the dependency trees of sentences and the scene graphs of images) is an important research topic which also needs graph reasoning models. Graph neural networks (GNNs) are neural models that capture the dependence of graphs via message passing between the nodes of graphs. In recent years, variants of GNNs such as graph convolutional network (GCN), graph attention network (GAT), graph recurrent network (GRN) have demonstrated ground-breaking performances on many deep learning tasks. In this survey, we propose a general design pipeline for GNN models and discuss the variants of each component, systematically categorize the applications, and propose four open problems for future research.

2,494 citations

Proceedings ArticleDOI
04 Aug 2017
TL;DR: Two scalable representation learning models, namely metapath2vec and metapATH2vec++, are developed that are able to not only outperform state-of-the-art embedding models in various heterogeneous network mining tasks, but also discern the structural and semantic correlations between diverse network objects.
Abstract: We study the problem of representation learning in heterogeneous networks. Its unique challenges come from the existence of multiple types of nodes and links, which limit the feasibility of the conventional network embedding techniques. We develop two scalable representation learning models, namely metapath2vec and metapath2vec++. The metapath2vec model formalizes meta-path-based random walks to construct the heterogeneous neighborhood of a node and then leverages a heterogeneous skip-gram model to perform node embeddings. The metapath2vec++ model further enables the simultaneous modeling of structural and semantic correlations in heterogeneous networks. Extensive experiments show that metapath2vec and metapath2vec++ are able to not only outperform state-of-the-art embedding models in various heterogeneous network mining tasks, such as node classification, clustering, and similarity search, but also discern the structural and semantic correlations between diverse network objects.

1,794 citations