Showing papers by "Jiawei Han published in 2020"

PDF

Open Access

Proceedings Article•

On the Variance of the Adaptive Learning Rate and Beyond

[...]

Liyuan Liu¹, Haoming Jiang², Pengcheng He³, Weizhu Chen³, Xiaodong Liu⁴, Jianfeng Gao³, Jiawei Han¹ - Show less +3 more•Institutions (4)

University of Illinois at Urbana–Champaign¹, Georgia Institute of Technology², Microsoft³, China Pharmaceutical University⁴

30 Apr 2020

TL;DR: The authors proposed Rectified Adam (RAdam), a variant of Adam, by introducing a term to rectify the variance of the adaptive learning rate, which is a variance reduction technique.

...read moreread less

Abstract: The learning rate warmup heuristic achieves remarkable success in stabilizing training, accelerating convergence and improving generalization for adaptive stochastic optimization algorithms like RMSprop and Adam. Pursuing the theory behind warmup, we identify a problem of the adaptive learning rate -- its variance is problematically large in the early stage, and presume warmup works as a variance reduction technique. We provide both empirical and theoretical evidence to verify our hypothesis. We further propose Rectified Adam (RAdam), a novel variant of Adam, by introducing a term to rectify the variance of the adaptive learning rate. Experimental results on image classification, language modeling, and neural machine translation verify our intuition and demonstrate the efficacy and robustness of RAdam.

...read moreread less

536 citations

Proceedings Article•DOI•

Understanding the Difficulty of Training Transformers

[...]

Liyuan Liu¹, Xiaodong Liu², Jianfeng Gao², Weizhu Chen², Jiawei Han¹ - Show less +1 more•Institutions (2)

University of Illinois at Urbana–Champaign¹, Microsoft²

17 Apr 2020

TL;DR: In this article, the authors identify an amplification effect that influences training substantially and propose an adaptive model initialization to stabilize the early stage's training and unleash its full potential in the late stage.

...read moreread less

Abstract: Transformers have proved effective in many NLP tasks. However, their training requires non-trivial efforts regarding carefully designing cutting-edge optimizers and learning rate schedulers (e.g., conventional SGD fails to train Transformers effectively). Our objective here is to understand __what complicates Transformer training__ from both empirical and theoretical perspectives. Our analysis reveals that unbalanced gradients are not the root cause of the instability of training. Instead, we identify an amplification effect that influences training substantially—for each layer in a multi-layer Transformer model, heavy dependency on its residual branch makes training unstable, since it amplifies small parameter perturbations (e.g., parameter updates) and results in significant disturbances in the model output. Yet we observe that a light dependency limits the model potential and leads to inferior trained models. Inspired by our analysis, we propose Admin (Adaptive model initialization) to stabilize the early stage’s training and unleash its full potential in the late stage. Extensive experiments show that Admin is more stable, converges faster, and leads to better performance

...read moreread less

151 citations

Journal Article•DOI•

Heterogeneous Network Representation Learning: A Unified Framework with Survey and Benchmark

[...]

Carl Yang¹, Yuxin Xiao², Yu Zhang², Yizhou Sun³, Jiawei Han² - Show less +1 more•Institutions (3)

Emory University¹, University of Illinois at Urbana–Champaign², University of California, Los Angeles³

21 Dec 2020-IEEE Transactions on Knowledge and Data Engineering

TL;DR: This work provides a generic paradigm for the systematic categorization and analysis over the merits of various existing HNE algorithms, and creates four benchmark datasets with various properties regarding scale, structure, attribute/label availability, and \etc.~from different sources towards handy and fair evaluations of H NE algorithms.

...read moreread less

Abstract: Since real-world objects and their interactions are often multi-modal and multi-typed, heterogeneous networks have been widely used as a more powerful, realistic, and generic superclass of traditional homogeneous networks (graphs). Meanwhile, representation learning (a.k.a. embedding) has recently been intensively studied and shown effective for various network mining and analytical tasks. In this work, we aim to provide a unified framework to deeply summarize and evaluate existing research on heterogeneous network embedding (HNE), which includes but goes beyond a normal survey. Since there has already been a broad body of HNE algorithms, as the first contribution of this work, we provide a generic paradigm for the systematic categorization and analysis over the merits of various existing HNE algorithms. Moreover, existing HNE algorithms, though mostly claimed generic, are often evaluated on different datasets. As the second contribution, we create four benchmark datasets with various properties regarding scale, structure, attribute/label availability, and etc. from different sources, towards handy and fair evaluations of HNE algorithms. As the third contribution, we carefully refactor and amend the implementations and create friendly interfaces for eleven popular HNE algorithms, and provide all-around comparisons among them over multiple tasks and experimental settings.

...read moreread less

132 citations

Posted Content•

Generation-Augmented Retrieval for Open-domain Question Answering

[...]

Yuning Mao¹, Pengcheng He², Xiaodong Liu³, Yelong Shen², Jianfeng Gao², Jiawei Han⁴, Weizhu Chen² - Show less +3 more•Institutions (4)

University of Illinois at Urbana–Champaign¹, Microsoft², Edinburgh Napier University³, Google⁴

17 Sep 2020-arXiv: Computation and Language

TL;DR: It is shown that generating diverse contexts for a query is beneficial as fusing their results consistently yields better retrieval accuracy, and as sparse and dense representations are often complementary, GAR can be easily combined with DPR to achieve even better performance.

...read moreread less

Abstract: We propose Generation-Augmented Retrieval (GAR) for answering open-domain questions, which augments a query through text generation of heuristically discovered relevant contexts without external resources as supervision. We demonstrate that the generated contexts substantially enrich the semantics of the queries and GAR with sparse representations (BM25) achieves comparable or better performance than state-of-the-art dense retrieval methods such as DPR. We show that generating diverse contexts for a query is beneficial as fusing their results consistently yields better retrieval accuracy. Moreover, as sparse and dense representations are often complementary, GAR can be easily combined with DPR to achieve even better performance. GAR achieves state-of-the-art performance on Natural Questions and TriviaQA datasets under the extractive QA setup when equipped with an extractive reader, and consistently outperforms other retrieval methods when the same generative reader is used.

...read moreread less

105 citations

Posted Content•

Text Classification Using Label Names Only: A Language Model Self-Training Approach

[...]

Yu Meng¹, Yunyi Zhang¹, Jiaxin Huang¹, Chenyan Xiong², Heng Ji¹, Chao Zhang³, Jiawei Han¹ - Show less +3 more•Institutions (3)

University of Illinois at Urbana–Champaign¹, University of California, Los Angeles², Beihang University³

14 Oct 2020-arXiv: Computation and Language

TL;DR: This paper uses pre-trained neural language models both as general linguistic knowledge sources for category understanding and as representation learning models for document classification, and achieves around 90% accuracy on four benchmark datasets.

...read moreread less

Abstract: Current text classification methods typically require a good number of human-labeled documents as training data, which can be costly and difficult to obtain in real applications. Humans can perform classification without seeing any labeled examples but only based on a small set of words describing the categories to be classified. In this paper, we explore the potential of only using the label name of each class to train classification models on unlabeled data, without using any labeled documents. We use pre-trained neural language models both as general linguistic knowledge sources for category understanding and as representation learning models for document classification. Our method (1) associates semantically related words with the label names, (2) finds category-indicative words and trains the model to predict their implied categories, and (3) generalizes the model via self-training. We show that our model achieves around 90% accuracy on four benchmark datasets including topic and sentiment classification without using any labeled documents but learning from unlabeled data supervised by at most 3 words (1 in most cases) per class as the label name.

...read moreread less

89 citations

Journal Article•DOI•

Unsupervised Attributed Multiplex Network Embedding

[...]

Chanyoung Park¹, Donghyun Kim², Jiawei Han¹, Hwanjo Yu³•Institutions (3)

University of Illinois at Urbana–Champaign¹, Yahoo!², Pohang University of Science and Technology³

03 Apr 2020

TL;DR: DMGI as discussed by the authors proposes a consensus regularization framework that minimizes the disagreements among the relation-type specific node embeddings, and a universal discriminator that discriminates true samples regardless of the relation types.

...read moreread less

Abstract: Nodes in a multiplex network are connected by multiple types of relations. However, most existing network embedding methods assume that only a single type of relation exists between nodes. Even for those that consider the multiplexity of a network, they overlook node attributes, resort to node labels for training, and fail to model the global properties of a graph. We present a simple yet effective unsupervised network embedding method for attributed multiplex network called DMGI, inspired by Deep Graph Infomax (DGI) that maximizes the mutual information between local patches of a graph, and the global representation of the entire graph. We devise a systematic way to jointly integrate the node embeddings from multiple graphs by introducing 1) the consensus regularization framework that minimizes the disagreements among the relation-type specific node embeddings, and 2) the universal discriminator that discriminates true samples regardless of the relation types. We also show that the attention mechanism infers the importance of each relation type, and thus can be useful for filtering unnecessary relation types as a preprocessing step. Extensive experiments on various downstream tasks demonstrate that DMGI outperforms the state-of-the-art methods, even though DMGI is fully unsupervised.

...read moreread less

88 citations

Posted Content•

COVID-19 Literature Knowledge Graph Construction and Drug Repurposing Report Generation

[...]

01 Jul 2020-arXiv: Computation and Language

TL;DR: A novel and comprehensive knowledge discovery framework, COVID-KG, to extract fine-grained multimedia knowledge elements (entities, relations and events) from scientific literature and exploit the constructed multimedia knowledge graphs (KGs) for question answering and report generation.

...read moreread less

Abstract: To combat COVID-19, both clinicians and scientists need to digest vast amounts of relevant biomedical knowledge in scientific literature to understand the disease mechanism and related biological functions. We have developed a novel and comprehensive knowledge discovery framework, COVID-KG to extract fine-grained multimedia knowledge elements (entities and their visual chemical structures, relations, and events) from scientific literature. We then exploit the constructed multimedia knowledge graphs (KGs) for question answering and report generation, using drug repurposing as a case study. Our framework also provides detailed contextual sentences, subfigures, and knowledge subgraphs as evidence.

...read moreread less

79 citations

Proceedings Article•DOI•

A Data-Driven Graph Generative Model for Temporal Interaction Networks

[...]

Dawei Zhou¹, Lecheng Zheng¹, Jiawei Han¹, Jingrui He¹•Institutions (1)

University of Illinois at Urbana–Champaign¹

23 Aug 2020

TL;DR: This work proposes an end-to-end deep generative framework named TagGen, which outperforms all baselines in the temporal interaction network generation problem, and significantly boosts the performance of the prediction models in the tasks of anomaly detection and link prediction.

...read moreread less

Abstract: Deep graph generative models have recently received a surge of attention due to its superiority of modeling realistic graphs in a variety of domains, including biology, chemistry, and social science. Despite the initial success, most, if not all, of the existing works are designed for static networks. Nonetheless, many realistic networks are intrinsically dynamic and presented as a collection of system logs (i.e., timestamped interactions/edges between entities), which pose a new research direction for us: how can we synthesize realistic dynamic networks by directly learning from the system logs? In addition, how can we ensure the generated graphs preserve both the structural and temporal characteristics of the real data? To address these challenges, we propose an end-to-end deep generative framework named TagGen. In particular, we start with a novel sampling strategy for jointly extracting structural and temporal context information from temporal networks. On top of that, TagGen parameterizes a bi-level self-attention mechanism together with a family of local operations to generate temporal random walks. At last, a discriminator gradually selects generated temporal random walks, that are plausible in the input data, and feeds them to an assembling module for generating temporal networks. The experimental results in seven real-world data sets across a variety of metrics demonstrate that (1) TagGen outperforms all baselines in the temporal interaction network generation problem, and (2) TagGen significantly boosts the performance of the prediction models in the tasks of anomaly detection and link prediction.

...read moreread less

68 citations

Proceedings Article•DOI•

Text Classification Using Label Names Only: A Language Model Self-Training Approach

[...]

Yu Meng¹, Yunyi Zhang¹, Jiaxin Huang¹, Chenyan Xiong², Heng Ji¹, Chao Zhang³, Jiawei Han¹ - Show less +3 more•Institutions (3)

University of Illinois at Urbana–Champaign¹, University of California, Los Angeles², Beihang University³

31 Oct 2020

TL;DR: The authors used pre-trained neural language models both as general linguistic knowledge sources for category understanding and as representation learning models for document classification, and achieved 90% accuracy on four benchmark datasets including topic and sentiment classification without using any labeled documents.

...read moreread less

67 citations

Proceedings Article•DOI•

Discriminative Topic Mining via Category-Name Guided Text Embedding

[...]

Yu Meng¹, Jiaxin Huang¹, Guangyuan Wang¹, Zihan Wang¹, Chao Zhang², Yu Zhang¹, Jiawei Han¹ - Show less +3 more•Institutions (2)

University of Illinois at Urbana–Champaign¹, Georgia Institute of Technology²

20 Apr 2020

TL;DR: In this article, a new task, discriminative topic mining, is proposed, which leverages a set of user-provided category names to mine discriminating topics from text corpora, which helps a user understand clearly and distinctively the topics he/she is most interested in.

...read moreread less

Abstract: Mining a set of meaningful and distinctive topics automatically from massive text corpora has broad applications. Existing topic models, however, typically work in a purely unsupervised way, which often generate topics that do not fit users’ particular needs and yield suboptimal performance on downstream tasks. We propose a new task, discriminative topic mining, which leverages a set of user-provided category names to mine discriminative topics from text corpora. This new task not only helps a user understand clearly and distinctively the topics he/she is most interested in, but also benefits directly keyword-driven classification tasks. We develop CatE, a novel category-name guided text embedding method for discriminative topic mining, which effectively leverages minimal user guidance to learn a discriminative embedding space and discover category representative terms in an iterative manner. We conduct a comprehensive set of experiments to show that CatE mines high-quality set of topics guided by category names only, and benefits a variety of downstream applications including weakly-supervised classification and lexical entailment direction identification.

...read moreread less

54 citations

Proceedings Article•DOI•

AutoKnow: Self-Driving Knowledge Collection for Products of Thousands of Types

[...]

Xin Luna Dong¹, Xiang He¹, Andrey Kan¹, Xian Li¹, Yan Liang¹, Jun Ma¹, Yifan Ethan Xu¹, Chenwei Zhang¹, Tong Zhao¹, Gabriel Blanco Saldana¹, Saurabh Deshpande¹, Alexandre Michetti Manduca¹, Jay Ren¹, Surender Pal Singh¹, Fan Xiao¹, Haw-Shiuan Chang², Giannis Karamanolakis³, Yuning Mao⁴, Yaqing Wang⁵, Christos Faloutsos⁶, Andrew McCallum², Jiawei Han⁴ - Show less +18 more•Institutions (6)

Amazon.com¹, University of Massachusetts Amherst², Columbia University³, University of Illinois at Urbana–Champaign⁴, University at Buffalo⁵, Carnegie Mellon University⁶

24 Jun 2020-arXiv: Artificial Intelligence

TL;DR: AutoKnow, the automatic (self-driving) system that addresses challenges of organizing information about products poses many additional challenges, including sparsity and noise of structured data for products, complexity of the domain with millions of product types and thousands of attributes, heterogeneity across large number of categories, as well as large and constantly growing number of products.

...read moreread less

Abstract: Can one build a knowledge graph (KG) for all products in the world? Knowledge graphs have firmly established themselves as valuable sources of information for search and question answering, and it is natural to wonder if a KG can contain information about products offered at online retail sites There have been several successful examples of generic KGs, but organizing information about products poses many additional challenges, including sparsity and noise of structured data for products, complexity of the domain with millions of product types and thousands of attributes, heterogeneity across large number of categories, as well as large and constantly growing number of products We describe AutoKnow, our automatic (self-driving) system that addresses these challenges The system includes a suite of novel techniques for taxonomy construction, product property identification, knowledge extraction, anomaly detection, and synonym discovery AutoKnow is (a) automatic, requiring little human intervention, (b) multi-scalable, scalable in multiple dimensions (many domains, many products, and many attributes), and (c) integrative, exploiting rich customer behavior logs AutoKnow has been operational in collecting product knowledge for over 11K product types

...read moreread less

Proceedings Article•DOI•

When Do GNNs Work: Understanding and Improving Neighborhood Aggregation

[...]

Yiqing Xie¹, Yiqing Xie², Sha Li², Carl Yang³, Raymond Chi-Wing Wong¹, Jiawei Han² - Show less +2 more•Institutions (3)

Hong Kong University of Science and Technology¹, University of Illinois at Urbana–Champaign², Emory University³

09 Jul 2020

TL;DR: This paper proposes novel metrics that quantitatively measure these two circumstances and integrate them into an Adaptive-layer module and shows that allowing for node-specific aggregation degrees have significant advantage over current GNNs.

...read moreread less

Abstract: Graph Neural Networks (GNNs) have been shown to be powerful in a wide range of graph-related tasks. While there exist various GNN models, a critical common ingredient is neighborhood aggregation, where the embedding of each node is updated by referring to the embedding of its neighbors. This paper aims to provide a better understanding of this mechanism by asking the following question: Is neighborhood aggregation always necessary and beneficial? In short, the answer is no. We carve out two conditions under which neighborhood aggregation is not helpful: (1) when a node’s neighbors are highly dissimilar and (2) when a node’s embedding is already similar to that of its neighbors. We propose novel metrics that quantitatively measure these two circumstances and integrate them into an Adaptive-layer module. Our experiments show that allowing for node-specific aggregation degrees have significant advantage over current GNNs.

...read moreread less

Posted Content•

Transfer Learning of Graph Neural Networks with Ego-graph Information Maximization

[...]

Qi Zhu¹, Yidan Xu¹, Haonan Wang², Chao Zhang³, Jiawei Han⁴, Carl Yang¹ - Show less +2 more•Institutions (4)

University of Illinois at Urbana–Champaign¹, Emory University², University of Washington³, Georgia Institute of Technology⁴

11 Sep 2020-arXiv: Learning

TL;DR: This work proposes a novel view towards the essential graph information and advocate the capturing of it as the goal of transferable GNN training, which motivates the design of The authors', a novel GNN framework based on ego-graph information maximization to analytically achieve this goal.

...read moreread less

Abstract: Graph neural networks (GNNs) have been shown with superior performance in various applications, but training dedicated GNNs can be costly for large-scale graphs. Some recent work started to study the pre-training of GNNs. However, none of them provide theoretical insights into the design of their frameworks, or clear requirements and guarantees towards the transferability of GNNs. In this work, we establish a theoretically grounded and practically useful framework for the transfer learning of GNNs. Firstly, we propose a novel view towards the essential graph information and advocate the capturing of it as the goal of transferable GNN training, which motivates the design of Ours, a novel GNN framework based on ego-graph information maximization to analytically achieve this goal. Secondly, we specify the requirement of structure-respecting node features as the GNN input, and derive a rigorous bound of GNN transferability based on the difference between the local graph Laplacians of the source and target graphs. Finally, we conduct controlled synthetic experiments to directly justify our theoretical conclusions. Extensive experiments on real-world networks towards role identification show consistent results in the rigorously analyzed setting of direct-transfering, while those towards large-scale relation prediction show promising results in the more generalized and practical setting of transfering with fine-tuning.

...read moreread less

Posted Content•

Heterogeneous Network Representation Learning: Survey, Benchmark, Evaluation, and Beyond

[...]

Carl Yang, Yuxin Xiao, Yu Zhang, Yizhou Sun, Jiawei Han - Show less +1 more

01 Apr 2020

TL;DR: This work aims to provide a unified framework to deeply summarize and evaluate existing research on heterogeneous network embedding (HNE), and provides a generic paradigm for the systematic categorization and analysis over the merits of various existing HNE algorithms.

...read moreread less

Abstract: Since real-world objects and their interactions are often multi-modal and multi-typed, heterogeneous networks have been widely used as a more powerful, realistic, and generic superclass of traditional homogeneous networks (graphs). Meanwhile, representation learning (a.k.a. embedding) has recently been intensively studied and shown effective for various network mining and analytical tasks. In this work, we aim to provide a unified framework to deeply summarize and evaluate existing research on heterogeneous network embedding (HNE), which includes but goes beyond a normal survey. Since there has already been a broad body of HNE algorithms, as the first contribution of this work, we provide a generic paradigm for the systematic categorization and analysis over the merits of various existing HNE algorithms. Moreover, existing HNE algorithms, though mostly claimed generic, are often evaluated on different datasets. Understandable due to the application favor of HNE, such indirect comparisons largely hinder the proper attribution of improved task performance towards effective data preprocessing and novel technical design, especially considering the various ways possible to construct a heterogeneous network from real-world application data. Therefore, as the second contribution, we create four benchmark datasets with various properties regarding scale, structure, attribute/label availability, and etc. from different sources, towards handy and fair evaluations of HNE algorithms. As the third contribution, we carefully refactor and amend the implementations and create friendly interfaces for eleven popular HNE algorithms, and provide all-around comparisons among them over multiple tasks and experimental settings.

...read moreread less

Posted Content•

Comprehensive Named Entity Recognition on CORD-19 with Distant or Weak Supervision

[...]

Xuan Wang, Xiangchen Song, Bangzheng Li, Yingjun Guan, Jiawei Han - Show less +1 more

27 Mar 2020-arXiv: Computation and Language

TL;DR: This CORD-NER dataset with comprehensive named entity recognition (NER) on the COVID-19 Open Research Dataset Challenge (CORD-19) corpus covers 75 fine-grained entity types, which may benefit research on CO VID-19 related virus, spreading mechanisms, and potential vaccines.

...read moreread less

Abstract: We created this CORD-NER dataset with comprehensive named entity recognition (NER) on the COVID-19 Open Research Dataset Challenge (CORD-19) corpus (2020-03-13). This CORD-NER dataset covers 75 fine-grained entity types: In addition to the common biomedical entity types (e.g., genes, chemicals and diseases), it covers many new entity types related explicitly to the COVID-19 studies (e.g., coronaviruses, viral proteins, evolution, materials, substrates and immune responses), which may benefit research on COVID-19 related virus, spreading mechanisms, and potential vaccines. CORD-NER annotation is a combination of four sources with different NER methods. The quality of CORD-NER annotation surpasses SciSpacy (over 10% higher on the F1 score based on a sample set of documents), a fully supervised BioNER tool. Moreover, CORD-NER supports incrementally adding new documents as well as adding new entity types when needed by adding dozens of seeds as the input examples. We will constantly update CORD-NER based on the incremental updates of the CORD-19 corpus and the improvement of our system.

...read moreread less

Proceedings Article•DOI•

Collective Multi-type Entity Alignment Between Knowledge Graphs

[...]

Qi Zhu¹, Hao Wei², Bunyamin Sisman², Da Zheng², Christos Faloutsos³, Xin Luna Dong², Jiawei Han¹ - Show less +3 more•Institutions (3)

University of Illinois at Urbana–Champaign¹, Amazon.com², Carnegie Mellon University³

20 Apr 2020

TL;DR: A novel collective aggregation function tailored for Multi-type entity Alignment, called CG-MuAlign, that relieves the incompleteness of knowledge graphs via both cross-graph and self attentions, and scales up efficiently with mini-batch training paradigm and effective neighborhood sampling strategy.

...read moreread less

Abstract: Knowledge graph (e.g. Freebase, YAGO) is a multi-relational graph representing rich factual information among entities of various types. Entity alignment is the key step towards knowledge graph integration from multiple sources. It aims to identify entities across different knowledge graphs that refer to the same real world entity. However, current entity alignment systems overlook the sparsity of different knowledge graphs and can not align multi-type entities by one single model. In this paper, we present a Collective Graph neural network for Multi-type entity Alignment, called CG-MuAlign. Different from previous work, CG-MuAlign jointly aligns multiple types of entities, collectively leverages the neighborhood information and generalizes to unlabeled entity types. Specifically, we propose novel collective aggregation function tailored for this task, that (1) relieves the incompleteness of knowledge graphs via both cross-graph and self attentions, (2) scales up efficiently with mini-batch training paradigm and effective neighborhood sampling strategy. We conduct experiments on real world knowledge graphs with millions of entities and observe the superior performance beyond existing methods. In addition, the running time of our approach is much less than the current state-of-the-art deep learning methods.

...read moreread less

Proceedings Article•DOI•

TaxoExpan: Self-supervised Taxonomy Expansion with Position-Enhanced Graph Neural Network

[...]

Jiaming Shen¹, Zhihong Shen², Chenyan Xiong², Chi Wang², Kuansan Wang², Jiawei Han¹ - Show less +2 more•Institutions (2)

University of Illinois at Urbana–Champaign¹, Microsoft²

20 Apr 2020

TL;DR: Wang et al. as discussed by the authors propose a self-supervised framework, named TaxoExpan, which automatically generates a set of query concept, anchor concept pairs from the existing taxonomy as training data.

...read moreread less

Abstract: Taxonomies consist of machine-interpretable semantics and provide valuable knowledge for many web applications. For example, online retailers (e.g., Amazon and eBay) use taxonomies for product recommendation, and web search engines (e.g., Google and Bing) leverage taxonomies to enhance query understanding. Enormous efforts have been made on constructing taxonomies either manually or semi-automatically. However, with the fast-growing volume of web content, existing taxonomies will become outdated and fail to capture emerging knowledge. Therefore, in many applications, dynamic expansions of an existing taxonomy are in great demand. In this paper, we study how to expand an existing taxonomy by adding a set of new concepts. We propose a novel self-supervised framework, named TaxoExpan, which automatically generates a set of ⟨query concept, anchor concept⟩ pairs from the existing taxonomy as training data. Using such self-supervision data, TaxoExpan learns a model to predict whether a query concept is the direct hyponym of an anchor concept. We develop two innovative techniques in TaxoExpan: (1) a position-enhanced graph neural network that encodes the local structure of an anchor concept in the existing taxonomy, and (2) a noise-robust training objective that enables the learned model to be insensitive to the label noise in the self-supervision data. Extensive experiments on three large-scale datasets from different domains demonstrate both the effectiveness and the efficiency of TaxoExpan for taxonomy expansion.

...read moreread less

Proceedings Article•DOI•

NetTaxo: Automated Topic Taxonomy Construction from Text-Rich Network

[...]

Jingbo Shang¹, Xinyang Zhang², Liyuan Liu², Sha Li², Jiawei Han² - Show less +1 more•Institutions (2)

University of California, San Diego¹, University of Illinois at Urbana–Champaign²

20 Apr 2020

TL;DR: This paper proposes NetTaxo, a novel automatic topic taxonomy construction framework, which goes beyond the existing paradigm and allows text data to collaborate with network structure and learns term embeddings from both text and network as contexts.

...read moreread less

Abstract: The automated construction of topic taxonomies can benefit numerous applications, including web search, recommendation, and knowledge discovery. One of the major advantages of automatic taxonomy construction is the ability to capture corpus-specific information and adapt to different scenarios. To better reflect the characteristics of a corpus, we take the meta-data of documents into consideration and view the corpus as a text-rich network. In this paper, we propose NetTaxo, a novel automatic topic taxonomy construction framework, which goes beyond the existing paradigm and allows text data to collaborate with network structure. Specifically, we learn term embeddings from both text and network as contexts. Network motifs are adopted to capture appropriate network contexts. We conduct an instance-level selection for motifs, which further refines term embedding according to the granularity and semantics of each taxonomy node. Clustering is then applied to obtain sub-topics under a taxonomy node. Extensive experiments on two real-world datasets demonstrate the superiority of our method over the state-of-the-art, and further verify the effectiveness and importance of instance-level motif selection.

...read moreread less

Posted Content•

CoDA: Contrast-enhanced and Diversity-promoting Data Augmentation for Natural Language Understanding

[...]

Yanru Qu¹, Dinghan Shen², Yelong Shen², Sandra Sajeev², Jiawei Han¹, Weizhu Chen² - Show less +2 more•Institutions (2)

University of Illinois at Urbana–Champaign¹, Microsoft²

16 Oct 2020-arXiv: Computation and Language

TL;DR: A novel data augmentation framework dubbed CoDA is proposed, which synthesizes diverse and informative augmented examples by integrating multiple transformations organically by introducing a contrastive regularization objective to capture the global relationship among all the data samples.

...read moreread less

Abstract: Data augmentation has been demonstrated as an effective strategy for improving model generalization and data efficiency. However, due to the discrete nature of natural language, designing label-preserving transformations for text data tends to be more challenging. In this paper, we propose a novel data augmentation framework dubbed CoDA, which synthesizes diverse and informative augmented examples by integrating multiple transformations organically. Moreover, a contrastive regularization objective is introduced to capture the global relationship among all the data samples. A momentum encoder along with a memory bank is further leveraged to better estimate the contrastive loss. To verify the effectiveness of the proposed framework, we apply CoDA to Transformer-based models on a wide range of natural language understanding tasks. On the GLUE benchmark, CoDA gives rise to an average improvement of 2.2% while applied to the RoBERTa-large model. More importantly, it consistently exhibits stronger results relative to several competitive data augmentation and adversarial training base-lines (including the low-resource settings). Extensive experiments show that the proposed contrastive objective can be flexibly combined with various data augmentation approaches to further boost their performance, highlighting the wide applicability of the CoDA framework.

...read moreread less

Proceedings Article•DOI•

Inf-VAE: A Variational Autoencoder Framework to Integrate Homophily and Influence in Diffusion Prediction

[...]

Aravind Sankar¹, Xinyang Zhang¹, Adit Krishnan¹, Jiawei Han¹•Institutions (1)

University of Illinois at Urbana–Champaign¹

20 Jan 2020

TL;DR: Zhang et al. as discussed by the authors proposed a variational autoencoder framework (Inf-VAE) to jointly embed homophily and influence through proximity-preserving social and position-encoded temporal latent variables.

...read moreread less

Abstract: Recent years have witnessed tremendous interest in understanding and predicting information spread on social media platforms such as Twitter, Facebook, etc. Existing diffusion prediction methods primarily exploit the sequential order of influenced users by projecting diffusion cascades onto their local social neighborhoods. However, this fails to capture global social structures that do not explicitly manifest in any of the cascades, resulting in poor performance for inactive users with limited historical activities. In this paper, we present a novel variational autoencoder framework (Inf-VAE) to jointly embed homophily and influence through proximity-preserving social and position-encoded temporal latent variables. To model social homophily, Inf-VAE utilizes powerful graph neural network architectures to learn social variables that selectively exploit the social connections of users. Given a sequence of seed user activations, Inf-VAE uses a novel expressive co-attentive fusion network that jointly attends over their social and temporal variables to predict the set of all influenced users. Our experimental results on multiple real-world social network datasets, including Digg, Weibo, and Stack-Exchanges demonstrate significant gains (22% MAP@10) for Inf-VAE over state-of-the-art diffusion prediction models; we achieve massive gains for users with sparse activities, and users who lack direct social neighbors in seed sets.

...read moreread less

Proceedings Article•DOI•

Empower Entity Set Expansion via Language Model Probing

[...]

Yunyi Zhang¹, Jiaming Shen¹, Jingbo Shang², Jiawei Han¹•Institutions (2)

University of Illinois at Urbana–Champaign¹, University of California, San Diego²

01 Jul 2020

TL;DR: This study proposes a novel iterative set expansion framework that leverages automatically generated class names to address the semantic drift issue and generates high-quality class names and outperforms previous state-of-the-art methods significantly.

...read moreread less

Abstract: Entity set expansion, aiming at expanding a small seed entity set with new entities belonging to the same semantic class, is a critical task that benefits many downstream NLP and IR applications, such as question answering, query understanding, and taxonomy construction. Existing set expansion methods bootstrap the seed entity set by adaptively selecting context features and extracting new entities. A key challenge for entity set expansion is to avoid selecting ambiguous context features which will shift the class semantics and lead to accumulative errors in later iterations. In this study, we propose a novel iterative set expansion framework that leverages automatically generated class names to address the semantic drift issue. In each iteration, we select one positive and several negative class names by probing a pre-trained language model, and further score each candidate entity based on selected class names. Experiments on two datasets show that our framework generates high-quality class names and outperforms previous state-of-the-art methods significantly.

...read moreread less

Proceedings Article•DOI•

MultiSage: Empowering GCN with Contextualized Multi-Embeddings on Web-Scale Multipartite Networks

[...]

Carl Yang¹, Aditya Pal, Andrew Zhai, Nikil Pancha, Jiawei Han², Charles J. Rosenberg, Jure Leskovec - Show less +3 more•Institutions (2)

Emory University¹, University of Illinois at Urbana–Champaign²

23 Aug 2020

TL;DR: A contextualized GCN engine is presented by modeling the multipartite networks of target nodes and their intermediatecontext nodes that specify the contexts of their interactions to achieve interaction contextualization by treating neighboring target nodes based on intermediate context nodes.

...read moreread less

Abstract: Graph convolutional networks (GCNs) are a powerful class of graph neural networks. Trained in a semi-supervised end-to-end fashion, GCNs can learn to integrate node features and graph structures to generate high-quality embeddings that can be used for various downstream tasks like search and recommendation. However, existing GCNs mostly work on homogeneous graphs and consider a single embedding for each node, which do not sufficiently model the multi-facet nature and complex interaction of nodes in real-world networks. Here, we present a contextualized GCN engine by modeling the multipartite networks of target nodes and their intermediatecontext nodes that specify the contexts of their interactions. Towards the neighborhood aggregation process, we devise a contextual masking operation at the feature level and a contextual attention mechanism at the node level to achieve interaction contextualization by treating neighboring target nodes based on intermediate context nodes. Consequently, we compute multiple embeddings for target nodes that capture their diverse facets and different interactions during graph convolution, which is useful for fine-grained downstream applications. To enable efficient web-scale training, we build a parallel random walk engine to pre-sample contextualized neighbors, and a Hadoop2-based data provider pipeline to pre-join training data, dynamically reduce multi-GPU training time, and avoid high memory cost. Extensive experiments on the bipartite Pinterest graph and tripartite OAG graph corroborate the advantage of the proposed system.

...read moreread less

Proceedings Article•DOI•

Hierarchical Topic Mining via Joint Spherical Tree and Text Embedding

[...]

Yu Meng¹, Yunyi Zhang¹, Jiaxin Huang¹, Yu Zhang¹, Chao Zhang², Jiawei Han¹ - Show less +2 more•Institutions (2)

University of Illinois at Urbana–Champaign¹, Georgia Institute of Technology²

23 Aug 2020

TL;DR: In this paper, a hierarchical topic mining task is proposed to mine a set of representative terms for each category from a text corpus to help a user comprehend his/her interested topics.

...read moreread less

Abstract: Mining a set of meaningful topics organized into a hierarchy is intuitively appealing since topic correlations are ubiquitous in massive text corpora. To account for potential hierarchical topic structures, hierarchical topic models generalize flat topic models by incorporating latent topic hierarchies into their generative modeling process. However, due to their purely unsupervised nature, the learned topic hierarchy often deviates from users' particular needs or interests. To guide the hierarchical topic discovery process with minimal user supervision, we propose a new task, Hierarchical Topic Mining, which takes a category tree described by category names only, and aims to mine a set of representative terms for each category from a text corpus to help a user comprehend his/her interested topics. We develop a novel joint tree and text embedding method along with a principled optimization procedure that allows simultaneous modeling of the category tree structure and the corpus generative process in the spherical space for effective category-representative term discovery. Our comprehensive experiments show that our model, named JoSH, mines a high-quality set of hierarchical topics with high efficiency and benefits weakly-supervised hierarchical text classification tasks.

...read moreread less

Proceedings Article•DOI•

Hierarchical Topic Mining via Joint Spherical Tree and Text Embedding

[...]

Yu Meng¹, Yunyi Zhang¹, Jiaxin Huang¹, Yu Zhang¹, Chao Zhang², Jiawei Han¹ - Show less +2 more•Institutions (2)

University of Illinois at Urbana–Champaign¹, Georgia Institute of Technology²

18 Jul 2020-arXiv: Computation and Language

TL;DR: This work proposes a new task, Hierarchical Topic Mining, which takes a category tree described by category names only, and aims to mine a set of representative terms for each category from a text corpus to help a user comprehend his/her interested topics.

...read moreread less

Posted Content•

Constrained Abstractive Summarization: Preserving Factual Consistency with Constrained Generation.

[...]

Yuning Mao, Xiang Ren, Heng Ji, Jiawei Han

24 Oct 2020-arXiv: Computation and Language

TL;DR: This paper proposed constrained abstractive summarization (CAS), a general setup that preserves the factual consistency of abstractive summaries by specifying tokens as constraints that must be present in the summary.

...read moreread less

Abstract: Summaries generated by abstractive summarization are supposed to only contain statements entailed by the source documents. However, state-of-the-art abstractive methods are still prone to hallucinate content inconsistent with the source documents. In this paper, we propose constrained abstractive summarization (CAS), a general setup that preserves the factual consistency of abstractive summarization by specifying tokens as constraints that must be present in the summary. We explore the feasibility of using lexically constrained decoding, a technique applicable to any abstractive method with beam search decoding, to fulfill CAS and conduct experiments in two scenarios: (1) Standard summarization without human involvement, where keyphrase extraction is used to extract constraints from source documents; (2) Interactive summarization with human feedback, which is simulated by taking missing tokens in the reference summaries as constraints. Automatic and human evaluations on two benchmark datasets demonstrate that CAS improves the quality of abstractive summaries, especially on factual consistency. In particular, we observe up to 11.2 ROUGE-2 gains when several ground-truth tokens are used as constraints in the interactive summarization scenario.

...read moreread less

Posted Content•

Generating Representative Headlines for News Stories

[...]

Xiaotao Gu¹, Yuning Mao¹, Jiawei Han¹, Jialu Liu², Hongkun Yu², You Wu², Cong Yu², Daniel Finnie², Jiaqi Zhai², Nicholas Zukoski² - Show less +6 more•Institutions (2)

University of Illinois at Urbana–Champaign¹, Google²

26 Jan 2020-arXiv: Computation and Language

TL;DR: The first manually curated benchmark dataset on headline generation for news stories, NewSHead, is published, which contains 367K stories, 6.5 times larger than the current largest multi-document summarization dataset.

...read moreread less

Abstract: Millions of news articles are published online every day, which can be overwhelming for readers to follow. Grouping articles that are reporting the same event into news stories is a common way of assisting readers in their news consumption. However, it remains a challenging research problem to efficiently and effectively generate a representative headline for each story. Automatic summarization of a document set has been studied for decades, while few studies have focused on generating representative headlines for a set of articles. Unlike summaries, which aim to capture most information with least redundancy, headlines aim to capture information jointly shared by the story articles in short length, and exclude information that is too specific to each individual article. In this work, we study the problem of generating representative headlines for news stories. We develop a distant supervision approach to train large-scale generation models without any human annotation. This approach centers on two technical components. First, we propose a multi-level pre-training framework that incorporates massive unlabeled corpus with different quality-vs.-quantity balance at different levels. We show that models trained within this framework outperform those trained with pure human curated corpus. Second, we propose a novel self-voting-based article attention layer to extract salient information shared by multiple articles. We show that models that incorporate this layer are robust to potential noises in news stories and outperform existing baselines with or without noises. We can further enhance our model by incorporating human labels, and we show our distant supervision approach significantly reduces the demand on labeled data.

...read moreread less

Proceedings Article•DOI•

Generating Representative Headlines for News Stories

[...]

Xiaotao Gu¹, Yuning Mao¹, Jiawei Han¹, Jialu Liu², You Wu², Cong Yu², Daniel Finnie², Hongkun Yu², Jiaqi Zhai², Nicholas Zukoski² - Show less +6 more•Institutions (2)

University of Illinois at Urbana–Champaign¹, Google²

20 Apr 2020

TL;DR: This article proposed a multi-level pre-training framework that incorporates massive unlabeled corpus with different quality-vs-quantity balance at different levels to generate representative headlines for news stories.

...read moreread less

Abstract: Millions of news articles are published online every day, which can be overwhelming for readers to follow. Grouping articles that are reporting the same event into news stories is a common way of assisting readers in their news consumption. However, it remains a challenging research problem to efficiently and effectively generate a representative headline for each story. Automatic summarization of a document set has been studied for decades, while few studies have focused on generating representative headlines for a set of articles. Unlike summaries, which aim to capture most information with least redundancy, headlines aim to capture information jointly shared by the story articles in short length and exclude information specific to each individual article. In this work, we study the problem of generating representative headlines for news stories. We develop a distant supervision approach to train large-scale generation models without any human annotation. The proposed approach centers on two technical components. First, we propose a multi-level pre-training framework that incorporates massive unlabeled corpus with different quality-vs.-quantity balance at different levels. We show that models trained within the multi-level pre-training framework outperform those only trained with human-curated corpus. Second, we propose a novel self-voting-based article attention layer to extract salient information shared by multiple articles. We show that models that incorporate this attention layer are robust to potential noises in news stories and outperform existing baselines on both clean and noisy datasets. We further enhance our model by incorporating human labels, and show that our distant supervision approach significantly reduces the demand on labeled data. Finally, to serve the research community, we publish the first manually curated benchmark dataset on headline generation for news stories, NewSHead, which contains 367K stories (each with 3-5 articles), 6.5 times larger than the current largest multi-document summarization dataset.

...read moreread less

Proceedings Article•DOI•

Relation Learning on Social Networks with Multi-Modal Graph Edge Variational Autoencoders

[...]

Carl Yang¹, Jieyu Zhang¹, Haonan Wang¹, Sha Li¹, Myungwan Kim², Matthew Walker², Yiou Xiao², Jiawei Han¹ - Show less +4 more•Institutions (2)

University of Illinois at Urbana–Champaign¹, LinkedIn²

20 Jan 2020

TL;DR: This work aims to develop a principled and principled frame- work that can pro le user relations as edge semantics in social networks by integrating multi-modal signals in the presence of noisy and incomplete data.

...read moreread less

Abstract: While node semantics have been extensively explored in social networks, little research attention has been paid to pro le edge semantics, i.e., social relations. Ideal edge semantics should not only show that two users are connected, but also why they know each other and what they share in common. However, relations in social networks are often hard to pro le, due to noisy multi-modal signals and limited user-generated ground-truth labels. In this work, we aim to develop a uni ed and principled frame- work that can pro le user relations as edge semantics in social networks by integrating multi-modal signals in the presence of noisy and incomplete data. Our framework is also exible towards limited or missing supervision. Speci cally, we assume a latent distribution of multiple relations underlying each user link, and learn them with multi-modal graph edge variational autoencoders. We encode the network data with a graph convolutional network, and decode arbitrary signals with multiple reconstruction networks. Extensive experiments and case studies on two public DBLP author networks and two internal LinkedIn member networks demonstrate the superior e ectiveness and e ciency of our proposed model.

...read moreread less

Proceedings Article•DOI•

Guiding Corpus-based Set Expansion by Auxiliary Sets Generation and Co-Expansion

[...]

Jiaxin Huang¹, Yiqing Xie², Yu Meng¹, Jiaming Shen¹, Yunyi Zhang¹, Jiawei Han¹ - Show less +2 more•Institutions (2)

University of Illinois at Urbana–Champaign¹, Hong Kong University of Science and Technology²

20 Apr 2020

TL;DR: Set-CoExpan as discussed by the authors proposes a new framework that automatically generates auxiliary sets as negative sets that are closely related to the target set of user's interest, and then performs multiple sets co-expansion that extracts discriminative features by comparing target set with auxiliary sets, to form multiple cohesive sets.

...read moreread less

Abstract: Given a small set of seed entities (e.g., “USA”, “Russia”), corpus-based set expansion is to induce an extensive set of entities which share the same semantic class (Country in this example) from a given corpus. Set expansion benefits a wide range of downstream applications in knowledge discovery, such as web search, taxonomy construction, and query suggestion. Existing corpus-based set expansion algorithms typically bootstrap the given seeds by incorporating lexical patterns and distributional similarity. However, due to no negative sets provided explicitly, these methods suffer from semantic drift caused by expanding the seed set freely without guidance. We propose a new framework, Set-CoExpan, that automatically generates auxiliary sets as negative sets that are closely related to the target set of user’s interest, and then performs multiple sets co-expansion that extracts discriminative features by comparing target set with auxiliary sets, to form multiple cohesive sets that are distinctive from one another, thus resolving the semantic drift issue. In this paper we demonstrate that by generating auxiliary sets, we can guide the expansion process of target set to avoid touching those ambiguous areas around the border with auxiliary sets, and we show that Set-CoExpan outperforms strong baseline methods significantly.

...read moreread less

Proceedings Article•DOI•

Co-Embedding Network Nodes and Hierarchical Labels with Taxonomy Based Generative Adversarial Networks

[...]

Carl Yang¹, Jieyu Zhang¹, Jiawei Han¹•Institutions (1)

University of Illinois at Urbana–Champaign¹

01 Nov 2020

TL;DR: TaxoGAN as discussed by the authors proposes to co-embed network nodes and hierarchical labels through a hierarchical network generation process, which models the child labels and network nodes of each parent label in an individual embedding space while learning to transfer network proximity among the spaces of hierarchical labels.

...read moreread less

Abstract: Network embedding aims at transferring node proximity in networks into distributed vectors, which can be leveraged in various downstream applications. Recent research has shown that nodes in a network can often be organized in latent hierarchical structures, but without a particular underlying taxonomy, the learned node embedding is less useful nor interpretable. In this work, we aim to improve network embedding by modeling the conditional node proximity in networks indicated by node labels residing in real taxonomies. In the meantime, we also aim to model the hierarchical label proximity in the given taxonomies, which is too coarse by solely looking at the hierarchical topologies. To this end, we propose TaxoGAN to co-embed network nodes and hierarchical labels, through a hierarchical network generation process. Particularly, TaxoGAN models the child labels and network nodes of each parent label in an individual embedding space while learning to transfer network proximity among the spaces of hierarchical labels through stacked network generators and embedding encoders. To enable robust and efficient model inference, we further develop a hierarchical adversarial training process. Comprehensive experiments and case studies on four real-world datasets of networks with hierarchical labels demonstrate the utility of TaxoGAN in improving network embedding on traditional tasks of node classification and link prediction, as well as novel tasks like conditional proximity search and fine-grained taxonomy layout.

...read moreread less