Showing papers by "Soumen Chakrabarti published in 2020"


Proceedings ArticleDOI
01 Jul 2020
TL;DR: IMoJIE is presented, an extension to CopyAttention, which produces the next extraction conditioned on all previously extracted tuples, establishing a new state of the art for the task.
Abstract: While traditional systems for Open Information Extraction were statistical and rule-based, recently neural models have been introduced for the task. Our work builds upon CopyAttention, a sequence generation OpenIE model (Cui et al., 2018). Our analysis reveals that CopyAttention produces a constant number of extractions per sentence, and its extracted tuples often express redundant information. We present IMoJIE, an extension to CopyAttention, which produces the next extraction conditioned on all previously extracted tuples. This approach overcomes both shortcomings of CopyAttention, resulting in a variable number of diverse extractions per sentence. We train IMoJIE on training data bootstrapped from extractions of several non-neural systems, which have been automatically filtered to reduce redundancy and noise. IMoJIE outperforms CopyAttention by about 18 F1 pts, and a BERT-based strong baseline by 2 F1 pts, establishing a new state of the art for the task.

56 citations
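
The iterative decoding idea lends itself to a compact sketch. Below is a minimal, illustrative Python rendering of generating one extraction at a time while re-encoding the history; the `[SEP]`-joined context, the `<eoe>` stop marker, and the toy model are assumptions, not the paper's exact input format.

```python
# Minimal sketch of IMoJIE-style iterative extraction (hypothetical helper
# names; the real system uses a BERT encoder with a copy-attention decoder).
from typing import Callable, List

EOE = "<eoe>"  # hypothetical end-of-extractions marker

def imojie_extract(sentence: str,
                   seq2seq: Callable[[str], str],
                   max_extractions: int = 10) -> List[str]:
    """Generate extractions one at a time, re-encoding the sentence
    together with all previously emitted tuples, so each new extraction
    is conditioned on the full history (the key idea in the paper)."""
    extractions: List[str] = []
    for _ in range(max_extractions):
        # Append prior tuples to the input; the decoder sees them and
        # can avoid redundant or repeated extractions.
        context = " [SEP] ".join([sentence] + extractions)
        tup = seq2seq(context)
        if tup == EOE:          # model signals it has nothing new to add
            break
        extractions.append(tup)
    return extractions

# Toy stand-in for the trained model, just to make the loop executable.
def toy_model(context: str) -> str:
    return "(Obama; was born in; Hawaii)" if "[SEP]" not in context else EOE

print(imojie_extract("Obama was born in Hawaii.", toy_model))
```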


Posted Content
TL;DR: This paper presents an iterative labeling-based system that establishes a new state of the art for OpenIE, while extracting 10x faster, through a novel Iterative Grid Labeling (IGL) architecture, which treats OpenIE as a 2-D grid labeling task.
Abstract: A recent state-of-the-art neural open information extraction (OpenIE) system generates extractions iteratively, requiring repeated encoding of partial outputs. This comes at a significant computational cost. On the other hand, sequence labeling approaches for OpenIE are much faster, but worse in extraction quality. In this paper, we bridge this trade-off by presenting an iterative labeling-based system that establishes a new state of the art for OpenIE, while extracting 10x faster. This is achieved through a novel Iterative Grid Labeling (IGL) architecture, which treats OpenIE as a 2-D grid labeling task. We improve its performance further by applying coverage (soft) constraints on the grid at training time. Moreover, on observing that the best OpenIE systems falter at handling coordination structures, our OpenIE system also incorporates a new coordination analyzer built with the same IGL architecture. This IGL based coordination analyzer helps our OpenIE system handle complicated coordination structures, while also establishing a new state of the art on the task of coordination analysis, with a 12.3 pts improvement in F1 over previous analyzers. Our OpenIE system, OpenIE6, beats the previous systems by as much as 4 pts in F1, while being much faster.

48 citations
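
The grid-decoding idea can be sketched compactly: the sentence is encoded once, and each iteration labels every word for one extraction slot. The label set, stopping rule, and toy tagger below are illustrative assumptions, not the paper's exact scheme.

```python
# Minimal sketch of Iterative Grid Labeling (IGL) inference, with a toy
# tagger in place of the trained transformer.
from typing import List

def igl_decode(words, tag_row, max_rows=5):
    """Fill a 2-D grid (extraction slots x words) one row per iteration.
    The sentence is encoded once; each iteration labels every word for
    one extraction, so decoding avoids the repeated encoding of partial
    outputs that generation-based OpenIE systems pay for."""
    grid: List[List[str]] = []
    for _ in range(max_rows):
        row = tag_row(words, grid)            # one label per word
        if all(t == "NONE" for t in row):     # empty row => stop
            break
        grid.append(row)
    return grid

def toy_tag_row(words, grid):                 # stand-in for the neural tagger
    if grid:
        return ["NONE"] * len(words)
    return ["ARG1", "REL", "ARG2", "NONE"]

words = "Alice visited Paris .".split()
print(igl_decode(words, toy_tag_row))
# [['ARG1', 'REL', 'ARG2', 'NONE']] -> (Alice; visited; Paris)
```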


Proceedings ArticleDOI
01 Nov 2020
TL;DR: In this article, the authors propose TIMEPLEX, a joint temporal knowledge base completion (TKBC) method, in which entities, relations and time are all embedded in a uniform, compatible space.
Abstract: Research on temporal knowledge bases, which associate a relational fact (s,r,o) with a validity time period (or time instant), is in its early days. Our work considers predicting missing entities (link prediction) and missing time intervals (time prediction) as joint Temporal Knowledge Base Completion (TKBC) tasks, and presents TIMEPLEX, a novel TKBC method, in which entities, relations, and time are all embedded in a uniform, compatible space. TIMEPLEX exploits the recurrent nature of some facts/events and temporal interactions between pairs of relations, yielding state-of-the-art results on both prediction tasks. We also find that existing TKBC models heavily overestimate link prediction performance due to imperfect evaluation mechanisms. In response, we propose improved TKBC evaluation protocols for both link and time prediction tasks, dealing with subtle issues that arise from the partial overlap of time intervals in gold instances and system predictions.

37 citations
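
The notion of embedding entities, relations, and time in one compatible space can be illustrated with a simplified multiplicative score over complex-valued embeddings. The time-modulated term below is an illustrative composition, not TIMEPLEX's actual scoring function, and the dimensions are made up.

```python
# Simplified sketch of a time-aware multiplicative score in the spirit of
# TIMEPLEX; the exact factorization is illustrative.
import numpy as np

rng = np.random.default_rng(0)
D = 8  # embedding dimension (illustrative)

# Complex-valued embeddings for a subject, relation, object, and timestamp,
# all living in the same compatible space so they can interact.
s, r, o, t = (rng.standard_normal(D) + 1j * rng.standard_normal(D)
              for _ in range(4))

def score(s, r, o, t):
    """ComplEx-style triple score plus a multiplicative time interaction."""
    base = np.real(np.sum(s * r * np.conj(o)))           # time-agnostic term
    time_term = np.real(np.sum(s * r * t * np.conj(o)))  # time-modulated term
    return base + time_term

print(score(s, r, o, t))
```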


Journal ArticleDOI
TL;DR: This work proposes a new, space-efficient, computationally lightweight, two-stage framework for relatedness computation, which shows better agreement with human judgment than existing proposals both on the new dataset and on an established one.
Abstract: Many text mining tasks, such as clustering, classification, retrieval, and named entity linking, benefit from a measure of relatedness between entities in a knowledge graph. We present a thorough study of all entity relatedness measures in recent literature based on Wikipedia as the knowledge graph. To facilitate this study, we introduce a new dataset with human judgments of entity relatedness. No clear dominance is seen between measures based on textual similarity and graph proximity. Some of the better measures involve expensive global graph computations. We propose a new, space-efficient, computationally lightweight, two-stage framework for relatedness computation. In the first stage, a small weighted subgraph is dynamically grown around the two query entities; in the second stage, relatedness is derived based on computations on this subgraph. Our system shows better agreement with human judgment than existing proposals both on the new dataset and on an established one. Our framework also shows improvements with respect to the state-of-the-art on three different extrinsic evaluations in the domains of ranking entity pairs, entity linking, and synonym extraction.

15 citations
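
The two-stage structure is easy to sketch: stage 1 grows a small weighted subgraph around the two query entities; stage 2 computes relatedness on that subgraph only. The budgeted BFS and the simple neighbor-overlap score below are placeholders for the paper's actual growth policy and subgraph computation.

```python
# Minimal sketch of the two-stage relatedness framework (illustrative).
from collections import deque

def grow_subgraph(adj, seeds, budget=50):
    """Stage 1: expand from both query entities until a node budget is hit,
    keeping the neighborhood small enough for cheap stage-2 computation."""
    kept, frontier = set(seeds), deque(seeds)
    while frontier and len(kept) < budget:
        u = frontier.popleft()
        for v in adj.get(u, ()):
            if v not in kept:
                kept.add(v)
                frontier.append(v)
    return {u: [v for v in adj.get(u, ()) if v in kept] for u in kept}

def relatedness(adj, a, b):
    """Stage 2: score on the small subgraph only (here: neighbor overlap)."""
    sub = grow_subgraph(adj, [a, b])
    na, nb = set(sub.get(a, ())), set(sub.get(b, ()))
    return len(na & nb) / max(1, len(na | nb))

graph = {"Einstein": ["Physics", "Princeton"],
         "Bohr": ["Physics", "Copenhagen"],
         "Physics": ["Einstein", "Bohr"]}
print(relatedness(graph, "Einstein", "Bohr"))
```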


Proceedings ArticleDOI
23 Aug 2020
TL;DR: This work proposes a novel framework, ChatterNet, which is the first that can model and predict user engagement without considering the underlying user network, and shows considerable improvements beyond recent state-of-the-art models of engagement prediction.
Abstract: Modeling user engagement dynamics on social media has compelling applications in market trend analysis, user-persona detection, and political discourse mining. Most existing approaches depend heavily on knowledge of the underlying user network. However, a large number of discussions happen on platforms that either lack any reliable social network (news portal, blogs, Buzzfeed) or reveal only partially the inter-user ties (Reddit, Stackoverflow). Many approaches require observing a discussion for some considerable period before they can make useful predictions. In real-time streaming scenarios, observations incur costs. Lastly, most models do not capture complex interactions between exogenous events (such as news articles published externally) and in-network effects (such as follow-up discussions on Reddit) to determine engagement levels. To address the three limitations noted above, we propose a novel framework, ChatterNet, which, to our knowledge, is the first that can model and predict user engagement without considering the underlying user network. Given streams of timestamped news articles and discussions, the task is to observe the streams for a short period leading up to a time horizon, then predict chatter: the volume of discussions through a specified period after the horizon. ChatterNet processes text from news and discussions using a novel time-evolving recurrent network architecture that captures both temporal properties within news and discussions, as well as influence of news on discussions. We report on extensive experiments using a two-month-long discussion corpus of Reddit, and a contemporaneous corpus of online news articles from the Common Crawl. ChatterNet shows considerable improvements beyond recent state-of-the-art models of engagement prediction. Detailed studies controlling observation and prediction windows, over 43 different subreddits, yield further useful insights.

15 citations
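
A highly simplified rendering of the coupled-streams idea: one recurrent state for the news stream, one for the discussion stream, with the news state feeding into the discussion update to model exogenous influence. All shapes, transitions, and the linear read-out are illustrative, not ChatterNet's architecture.

```python
# Toy sketch of two coupled time-evolving recurrent states (illustrative).
import numpy as np

rng = np.random.default_rng(0)
D = 16
Wn = rng.standard_normal((D, D)) * 0.1   # news-state transition
Wd = rng.standard_normal((D, D)) * 0.1   # discussion-state transition
Wnd = rng.standard_normal((D, D)) * 0.1  # news -> discussion influence
w_out = rng.standard_normal(D) * 0.1     # chatter read-out

def step(h_news, h_disc, x_news, x_disc):
    """One time slot: update both streams; the discussion state also
    receives the current news state, modeling exogenous influence."""
    h_news = np.tanh(Wn @ h_news + x_news)
    h_disc = np.tanh(Wd @ h_disc + Wnd @ h_news + x_disc)
    return h_news, h_disc

h_n = h_d = np.zeros(D)
for _ in range(10):                      # observe 10 slots of the streams
    x_n, x_d = rng.standard_normal(D), rng.standard_normal(D)  # text encodings
    h_n, h_d = step(h_n, h_d, x_n, x_d)

chatter = float(w_out @ h_d)             # predicted discussion-volume proxy
print(chatter)
```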


Posted Content
TL;DR: This paper shows that simply applying this training regime to a basic model like ComplEx yields near-SOTA performance on all the datasets, and highlights how various multiplicative methods recently proposed in the literature benefit from this trick and become indistinguishable in terms of performance on most datasets.
Abstract: Knowledge Base Completion has been a very active area recently, where multiplicative models have generally outperformed additive and other deep learning methods -- such as GNN-, CNN-, and path-based models. Several recent KBC papers propose architectural changes, new training methods, or even a new problem reformulation. They evaluate their methods on standard benchmark datasets - FB15k, FB15k-237, WN18, WN18RR, and Yago3-10. Recently, some papers discussed how 1-N scoring can speed up training and evaluation. In this paper, we show that simply applying this training regime to a basic model like ComplEx yields near-SOTA performance on all the datasets -- we call this model COMPLEX-V2. We also highlight how various multiplicative methods recently proposed in the literature benefit from this trick and become indistinguishable in terms of performance on most datasets. This paper calls for a reassessment of their individual value, in light of these findings.

13 citations
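
The 1-N scoring trick is simple to illustrate: instead of scoring one (s, r, o) triple at a time, a (subject, relation) pair is scored against every entity with one matrix product. The sketch below uses the standard ComplEx score; entity counts and dimensions are illustrative.

```python
# Minimal sketch of 1-N scoring for ComplEx.
import numpy as np

rng = np.random.default_rng(0)
N, D = 1000, 32                       # illustrative entity count / dimension
E = rng.standard_normal((N, D)) + 1j * rng.standard_normal((N, D))
R = rng.standard_normal((5, D)) + 1j * rng.standard_normal((5, D))

def score_all_objects(s_idx, r_idx):
    """ComplEx score Re(<e_s, r, conj(e_o)>) for all N candidate objects
    at once, instead of one (s, r, o) triple at a time."""
    q = E[s_idx] * R[r_idx]           # (D,) query in the complex space
    return np.real(E.conj() @ q)      # (N,) scores via one matvec

scores = score_all_objects(s_idx=0, r_idx=2)
print(scores.shape, int(scores.argmax()))
```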


Posted Content
TL;DR: PermGNN is proposed, which aggregates neighbor features using a recurrent, order-sensitive aggregator and directly minimizes an LP loss while it is 'attacked' by an adversarial generator of neighbor permutations, and has more expressive power compared to earlier symmetric aggregators.
Abstract: After observing a snapshot of a social network, a link prediction (LP) algorithm identifies node pairs between which new edges will likely materialize in the future. Most LP algorithms estimate a score for currently non-neighboring node pairs, and rank them by this score. Recent LP systems compute this score by comparing dense, low-dimensional vector representations of nodes. Graph neural networks (GNNs), in particular graph convolutional networks (GCNs), are popular examples. For two nodes to be meaningfully compared, their embeddings should be indifferent to reordering of their neighbors. GNNs typically use simple, symmetric set aggregators to ensure this property, but this design decision has been shown to produce representations with limited expressive power. Sequence encoders are more expressive, but are permutation sensitive by design. Recent efforts to overcome this dilemma turn out to be unsatisfactory for LP tasks. In response, we propose PermGNN, which aggregates neighbor features using a recurrent, order-sensitive aggregator and directly minimizes an LP loss while it is 'attacked' by an adversarial generator of neighbor permutations. By design, PermGNN has more expressive power compared to earlier symmetric aggregators. Next, we devise an optimization framework to map PermGNN's node embeddings to a suitable locality-sensitive hash, which speeds up reporting the top-K most likely edges for the LP task. Our experiments on diverse datasets show that PermGNN outperforms several state-of-the-art link predictors by a significant margin, and can predict the most likely edges fast.

6 citations
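
The training signal can be sketched conceptually: an order-sensitive LSTM aggregates neighbor features, and among a few neighbor permutations we back-propagate through the worst (most adversarial) one. Shapes and the squared-error loss are illustrative, not the paper's exact objective or its adversarial generator.

```python
# Conceptual sketch of worst-case-permutation training (illustrative).
import itertools
import torch

torch.manual_seed(0)
D = 8
lstm = torch.nn.LSTM(input_size=D, hidden_size=D, batch_first=True)

def embed(neigh_feats):                       # (k, D) -> (D,)
    out, _ = lstm(neigh_feats.unsqueeze(0))   # order-sensitive aggregation
    return out[0, -1]                         # last hidden state

neigh = torch.randn(3, D)                     # features of 3 neighbors
target = torch.randn(D)

# 'Attack': evaluate permutations of the neighbors and train on the one
# with the largest loss, pushing the encoder toward permutation robustness.
losses = []
for perm in itertools.permutations(range(3)):
    z = embed(neigh[list(perm)])
    losses.append(((z - target) ** 2).mean())
worst = torch.stack(losses).max()
worst.backward()                              # gradients w.r.t. worst case
print(float(worst))
```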


Journal ArticleDOI
TL;DR: This paper proposes RefOrCite, a new model that allows copying of both the references from, and the citations to, an existing node, and differs from the Forest Fire model in ways that make it amenable to mean-field analysis for degree distribution, triangle count, and densification.

6 citations
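
One plausible reading of the copying mechanism can be sketched as a growth step: a new paper picks an existing anchor node and may copy both the anchor's outgoing references and its incoming citations. The uniform anchor choice and copy probability below are made-up parameters, not the paper's fitted model.

```python
# Illustrative sketch of a RefOrCite-style growth step.
import random

random.seed(0)

def add_paper(out_edges, in_edges, p_copy=0.5):
    """Add one node; copy each reference of, and each citation to, a
    uniformly chosen anchor node with probability p_copy."""
    new = len(out_edges)
    refs, citers = set(), set()
    if new:
        anchor = random.randrange(new)
        refs = {anchor} | {v for v in out_edges[anchor]
                           if random.random() < p_copy}
        # Copying citations: some papers citing the anchor also cite the
        # new node (the model allows edges in both directions to grow).
        citers = {u for u in in_edges[anchor] if random.random() < p_copy}
    out_edges.append(refs)
    in_edges.append(citers)
    for v in refs:
        in_edges[v].add(new)
    for u in citers:
        out_edges[u].add(new)

out_edges, in_edges = [], []
for _ in range(100):
    add_paper(out_edges, in_edges)
print(sum(len(e) for e in out_edges), "edges after 100 papers")
```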


Proceedings ArticleDOI
01 Nov 2020
TL;DR: In this article, an iterative grid labeling (IGL) architecture is proposed for OpenIE, which treats OpenIE as a 2-D grid labeling task and achieves state-of-the-art performance.
Abstract: A recent state-of-the-art neural open information extraction (OpenIE) system generates extractions iteratively, requiring repeated encoding of partial outputs. This comes at a significant computational cost. On the other hand, sequence labeling approaches for OpenIE are much faster, but worse in extraction quality. In this paper, we bridge this trade-off by presenting an iterative labeling-based system that establishes a new state of the art for OpenIE, while extracting 10x faster. This is achieved through a novel Iterative Grid Labeling (IGL) architecture, which treats OpenIE as a 2-D grid labeling task. We improve its performance further by applying coverage (soft) constraints on the grid at training time. Moreover, on observing that the best OpenIE systems falter at handling coordination structures, our OpenIE system also incorporates a new coordination analyzer built with the same IGL architecture. This IGL based coordination analyzer helps our OpenIE system handle complicated coordination structures, while also establishing a new state of the art on the task of coordination analysis, with a 12.3 pts improvement in F1 over previous analyzers. Our OpenIE system - OpenIE6 - beats the previous systems by as much as 4 pts in F1, while being much faster.

6 citations


Proceedings ArticleDOI
25 Jul 2020
TL;DR: This paper presents an effective, feature and structure-aware, end-to-end trainable neural match scoring system for graphs, and shows the efficacy of this method, compared to competitive baseline approaches.
Abstract: Graph retrieval from a large corpus of graphs has a wide variety of applications, e.g., sentence retrieval using words and dependency parse trees for question answering, image retrieval using scene graphs, and molecule discovery from a set of existing molecular graphs. In such graph search applications, nodes, edges and associated features bear distinctive physical significance. Therefore, a unified, trainable search model that efficiently returns corpus graphs that are highly relevant to a query graph has immense potential impact. In this paper, we present an effective, feature and structure-aware, end-to-end trainable neural match scoring system for graphs. We achieve this by constructing the product graph between the query and a candidate graph in the corpus, and then conducting a family of random walks on the product graph, which are then aggregated into the match score using a network whose parameters can be trained. Experiments show the efficacy of our method, compared to competitive baseline approaches.

4 citations
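
The product-graph construction is easy to demonstrate: the adjacency matrix of the (tensor) product of the query and candidate graphs is the Kronecker product of their adjacencies, and length-k walks in it correspond to simultaneous length-k walks in both graphs. The fixed decay weights below stand in for the paper's trainable aggregation network.

```python
# Minimal sketch of random-walk match scoring on a product graph.
import numpy as np

def match_score(Aq, Ac, max_len=4, decay=0.5):
    P = np.kron(Aq, Ac)                    # product-graph adjacency
    x = np.ones(P.shape[0]) / P.shape[0]   # uniform start distribution
    score, walk = 0.0, x
    for k in range(1, max_len + 1):
        walk = P.T @ walk                  # advance the walk one step
        score += (decay ** k) * walk.sum() # aggregate walk mass per length
    return score

Aq = np.array([[0, 1], [1, 0]], float)     # tiny query graph
Ac = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], float)  # candidate graph
print(match_score(Aq, Ac))
```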


Proceedings ArticleDOI
05 Jan 2020
TL;DR: This paper outlines the architecture of the Kauwa-Kaate fact-checking system, which supports querying based on text as well as on images and videos; the latter features are very important since many fake news articles are based on images and videos.
Abstract: Fake news spread via social media is a major problem today. It is not easy with current-generation tools to check if a particular article is genuine or contains fake news. While there are many Web sites today that debunk viral fake news, checking if a particular article has been debunked or is true is not easy for an end-user. Search engines like Google do not make it easy to check a complete article since they limit the number of query keywords. In this paper, we outline the architecture of the Kauwa-Kaate system for fact-checking articles. Queried articles are searched against articles crawled from fact-checking sites, as well as against articles crawled from trusted news sites. Our system supports querying based on text as well as on images and video; the latter features are very important since many fake news articles are based on images and videos. We also describe the user interfaces which we will use to demonstrate the Kauwa-Kaate system.
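
The text-matching backbone such a system needs can be sketched with standard retrieval machinery: match a whole queried article against crawled fact-check and trusted-news articles, with no keyword limit. TF-IDF retrieval below is an illustrative stand-in, not necessarily the matcher Kauwa-Kaate actually uses.

```python
# Illustrative full-article matching against a crawled corpus.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "Fact check: the viral claim about X was debunked on 3 March.",
    "Trusted report: city council approves new budget for schools.",
]
query_article = "A viral post claims X; here is what we found..."

vec = TfidfVectorizer(stop_words="english")
M = vec.fit_transform(corpus)               # index crawled articles
q = vec.transform([query_article])          # encode the whole query article
scores = cosine_similarity(q, M)[0]
best = scores.argmax()
print(f"closest crawled article: #{best} (score {scores[best]:.2f})")
```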

Proceedings ArticleDOI
20 Apr 2020
TL;DR: This tutorial reviews the cross-community co-evolution of question answering (QA) with the advent of large-scale knowledge graphs, continuous representations of text and graphs, and deep sequence analysis, covering KG-based question answering (KGQA) and its recently proposed corpus-based counterparts.
Abstract: We will review cross-community co-evolution of question answering (QA) with the advent of large-scale knowledge graphs (KGs), continuous representations of text and graphs, and deep sequence analysis. Early QA systems were information retrieval (IR) systems enhanced to extract named entity spans from high-scoring passages. Starting with WordNet, a series of structured curations of language and world knowledge, called KGs, enabled further improvements. Corpora are unstructured and messy to exploit for QA. If a question can be answered using the KG alone, it is attractive to ‘interpret’ the free-form question into a structured query, which is then executed on the structured KG. This process is called KGQA. Answers can be high-quality and explainable if the KG has an answer, but manual curation results in low coverage. KGs were soon found useful to harness corpus information. Named entity mention spans could be tagged with fine-grained types (e.g., scientist), or even specific entities (e.g., Einstein). The QA system can learn to decompose a query into functional parts, e.g., “which scientist” and “played the violin”. With the increasing success of such systems, ambition grew to address multi-hop or multi-clause queries, e.g., “the father of the director of La La Land teaches at which university?” or “who directed an award-winning movie and is the son of a Princeton University professor?” Questions limited to simple path traversals in KGs have been encoded to a vector representation, which a decoder then uses to guide the KG traversal. Recently, corpus counterparts of such strategies have also been proposed. However, for general multi-clause queries that do not necessarily translate to paths, and seek to bind multiple variables to satisfy multiple clauses, or involve logic, comparison, aggregation and other arithmetic, neural programmer-interpreter systems have seen some success. Our key focus will be on identifying situations where manual introduction of structural bias is essential for accuracy, as against cases where sufficient data can get around distant or no supervision.
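
The KGQA pipeline the tutorial describes can be illustrated on a toy scale: interpret a free-form question into a structured query (a type constraint plus a relation clause) and execute it on a small KG. The hard-coded structured form below, using the abstract's own "which scientist played the violin" example, is a stand-in for a learned semantic parser.

```python
# Toy illustration of interpreting a question into a structured KG query.
KG = {  # (subject, relation, object) triples
    ("Einstein", "type", "scientist"),
    ("Einstein", "plays", "violin"),
    ("Curie", "type", "scientist"),
}

def answer(qtype: str, relation: str, obj: str):
    """Bind ?x such that (?x, type, qtype) and (?x, relation, obj)."""
    typed = {s for (s, r, o) in KG if r == "type" and o == qtype}
    return [s for (s, r, o) in KG
            if s in typed and r == relation and o == obj]

# "which scientist played the violin?" -> structured form below
print(answer("scientist", "plays", "violin"))   # ['Einstein']
```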

Posted Content
TL;DR: The authors proposed IMoJIE, an extension to CopyAttention, which produces the next extraction conditioned on all previously extracted tuples, resulting in a variable number of diverse extractions per sentence.
Abstract: While traditional systems for Open Information Extraction were statistical and rule-based, recently neural models have been introduced for the task. Our work builds upon CopyAttention, a sequence generation OpenIE model (Cui et al., 2018). Our analysis reveals that CopyAttention produces a constant number of extractions per sentence, and its extracted tuples often express redundant information. We present IMoJIE, an extension to CopyAttention, which produces the next extraction conditioned on all previously extracted tuples. This approach overcomes both shortcomings of CopyAttention, resulting in a variable number of diverse extractions per sentence. We train IMoJIE on training data bootstrapped from extractions of several non-neural systems, which have been automatically filtered to reduce redundancy and noise. IMoJIE outperforms CopyAttention by about 18 F1 pts, and a BERT-based strong baseline by 2 F1 pts, establishing a new state of the art for the task.

Posted Content
TL;DR: ChatterNet processes streams of news and discussions with a time-evolving recurrent network architecture that captures both temporal properties within news and discussions, as well as the influence of news on discussions, in order to predict user engagement.
Abstract: Modeling user engagement dynamics on social media has compelling applications in user-persona detection and political discourse mining. Most existing approaches depend heavily on knowledge of the underlying user network. However, a large number of discussions happen on platforms that either lack any reliable social network or reveal only partially the inter-user ties (Reddit, Stackoverflow). Many approaches require observing a discussion for some considerable period before they can make useful predictions. In real-time streaming scenarios, observations incur costs. Lastly, most models do not capture complex interactions between exogenous events (such as news articles published externally) and in-network effects (such as follow-up discussions on Reddit) to determine engagement levels. To address the three limitations noted above, we propose a novel framework, ChatterNet, which, to our knowledge, is the first that can model and predict user engagement without considering the underlying user network. Given streams of timestamped news articles and discussions, the task is to observe the streams for a short period leading up to a time horizon, then predict chatter: the volume of discussions through a specified period after the horizon. ChatterNet processes text from news and discussions using a novel time-evolving recurrent network architecture that captures both temporal properties within news and discussions, as well as the influence of news on discussions. We report on extensive experiments using a two-month-long discussion corpus of Reddit, and a contemporaneous corpus of online news articles from the Common Crawl. ChatterNet shows considerable improvements beyond recent state-of-the-art models of engagement prediction. Detailed studies controlling observation and prediction windows, over 43 different subreddits, yield further useful insights.

Posted Content
TL;DR: This paper presents TIMEPLEX, a novel time-aware KBC method, that also automatically exploits the recurrent nature of some relations and temporal interactions between pairs of relations, and achieves state-of-the-art performance on both prediction tasks.
Abstract: Temporal knowledge bases associate relational (s,r,o) triples with a set of times (or a single time instant) when the relation is valid. While time-agnostic KB completion (KBC) has witnessed significant research, temporal KB completion (TKBC) is in its early days. In this paper, we consider predicting missing entities (link prediction) and missing time intervals (time prediction) as joint TKBC tasks where entities, relations, and time are all embedded in a uniform, compatible space. We present TIMEPLEX, a novel time-aware KBC method, that also automatically exploits the recurrent nature of some relations and temporal interactions between pairs of relations. TIMEPLEX achieves state-of-the-art performance on both prediction tasks. We also find that existing TKBC models heavily overestimate link prediction performance due to imperfect evaluation mechanisms. In response, we propose improved TKBC evaluation protocols for both link and time prediction tasks, dealing with subtle issues that arise from the partial overlap of time intervals in gold instances and system predictions.

Proceedings ArticleDOI
01 Nov 2020
TL;DR: In this paper, an unsupervised, corpus-based sketch to register to the service is used, and the server modifies its network mildly to accommodate client sketches, and occasionally trains the augmented network over existing clients.
Abstract: State-of-the-art NLP inference uses enormous neural architectures and models trained for GPU-months, well beyond the reach of most consumers of NLP. This has led to one-size-fits-all public API-based NLP service models by major AI companies, serving millions of clients. They cannot afford traditional fine tuning for individual clients. Many clients cannot even afford significant fine tuning, and own little or no labeled data. Recognizing that word usage and salience diversity across clients leads to reduced accuracy, we initiate a study of practical and lightweight adaptation of centralized NLP services to clients. Each client uses an unsupervised, corpus-based sketch to register to the service. The server modifies its network mildly to accommodate client sketches, and occasionally trains the augmented network over existing clients. When a new client registers with its sketch, it gets immediate accuracy benefits. We demonstrate the proposed architecture using sentiment labeling, NER, and predictive language modeling.
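
The registration-and-conditioning pattern can be sketched minimally: each client sends an unsupervised corpus sketch (here, a hashed token-frequency vector), and an auxiliary network maps it to a client vector that conditions the main labeling network. All shapes, layers, and the hashing trick are illustrative assumptions.

```python
# Minimal sketch of client-sketch conditioned serving (illustrative).
import numpy as np

rng = np.random.default_rng(0)
V, D, C = 100, 16, 3                 # vocab size, hidden dim, num labels
W_aux = rng.standard_normal((D, V)) * 0.1     # auxiliary: sketch -> client vec
W_in = rng.standard_normal((D, V)) * 0.1      # main network input layer
W_out = rng.standard_normal((C, 2 * D)) * 0.1 # main network output layer

def client_sketch(corpus_tokens, vocab_size=V):
    """Unsupervised registration: a normalized token-frequency sketch."""
    v = np.zeros(vocab_size)
    for tok in corpus_tokens:
        v[hash(tok) % vocab_size] += 1
    return v / max(1, v.sum())

def label(x_bow, sketch):
    """Main network conditioned on the client vector from the sketch."""
    client_vec = np.tanh(W_aux @ sketch)
    h = np.tanh(W_in @ x_bow)
    logits = W_out @ np.concatenate([h, client_vec])
    return int(logits.argmax())

sketch = client_sketch("great phone battery but poor camera".split())
x = client_sketch("battery life is great".split())  # reuse BOW encoding
print(label(x, sketch))
```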

Posted Content
TL;DR: This work initiates a study of practical and lightweight adaptation of centralized NLP services to clients, and demonstrates the proposed architecture using sentiment labeling, NER, and predictive language modeling.
Abstract: State-of-the-art NLP inference uses enormous neural architectures and models trained for GPU-months, well beyond the reach of most consumers of NLP. This has led to one-size-fits-all public API-based NLP service models by major AI companies, serving large numbers of clients. Neither (hardware deficient) clients nor (heavily subscribed) servers can afford traditional fine tuning. Many clients own little or no labeled data. We initiate a study of adaptation of centralized NLP services to clients, and present one practical and lightweight approach. Each client uses an unsupervised, corpus-based sketch to register to the service. The server uses an auxiliary network to map the sketch to an abstract vector representation, which then informs the main labeling network. When a new client registers with its sketch, it gets immediate accuracy benefits. We demonstrate the success of the proposed architecture using sentiment labeling, NER, and predictive language modeling.