Author

Srikanta Bedathur

Bio: Srikanta Bedathur is an academic researcher from the Indian Institute of Technology Delhi. The author has contributed to research in topics including Computer science and SPARQL. The author has an h-index of 21 and has co-authored 108 publications receiving 1680 citations. Previous affiliations of Srikanta Bedathur include IBM and the Indraprastha Institute of Information Technology.


Papers
Posted Content
TL;DR: DataVizard, as mentioned in this paper, is a system that automatically recommends the most appropriate visual presentation for structured data, whether it is the result of a structured query such as SQL or a data table with an associated short description (e.g., tables from the Web).
Abstract: Selecting the appropriate visual presentation of the data, such that it preserves the semantics of the underlying data and at the same time provides an intuitive summary, is an important, often final, step of data analytics. Unfortunately, this is also a step involving significant human effort, starting from the selection of groups of columns in the structured results from analytics stages, to the selection of the right visualization by experimenting with various alternatives. In this paper, we describe our DataVizard system aimed at reducing this overhead by automatically recommending the most appropriate visual presentation for the structured result. Specifically, we consider the following two scenarios: first, when one needs to visualize the results of a structured query such as SQL; and second, when one has acquired a data table with an associated short description (e.g., tables from the Web). Using a corpus of real-world database queries (and their results) and a number of statistical tables crawled from the Web, we show that DataVizard is capable of recommending visual presentations with high accuracy. We also present the results of a user survey that we conducted in order to assess user views of the suitability of the presented charts vis-a-vis the plain text captions of the data.
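
To make the recommendation step concrete, here is a minimal, hypothetical Python sketch of the kind of column-type heuristics such a system could start from; the function name, rules, and type labels are illustrative assumptions, not DataVizard's actual (learned) model.

```python
# Illustrative sketch only: DataVizard derives its recommendations from real
# query corpora; this toy rule set merely shows the shape of the problem.

def recommend_chart(columns):
    """columns: list of (name, dtype) pairs, where dtype is 'categorical',
    'numeric', or 'temporal'. Returns a suggested chart type."""
    dtypes = [dtype for _, dtype in columns]
    if dtypes == ['temporal', 'numeric']:
        return 'line chart'      # a measure trending over time
    if dtypes == ['categorical', 'numeric']:
        return 'bar chart'       # per-category comparison
    if dtypes.count('numeric') == 2:
        return 'scatter plot'    # relationship between two measures
    return 'table'               # fall back to plain tabular display

print(recommend_chart([('year', 'temporal'), ('revenue', 'numeric')]))
# -> line chart
```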

1 citation

01 Jan 2010
TL;DR: The aim is to efficiently identify interesting time points in Web archives, under the assumption that a result list for a given query is received in standard relevance order from an existing retrieval system; an early termination technique, shown to be very effective, is also proposed.
Abstract: Large scale text archives are increasingly becoming available on the Web. Exploring their evolving contents along both text and temporal dimensions enables us to realize their full potential. Standard keyword queries facilitate exploration along the text dimension only. Recently proposed time-travel keyword queries enable query processing along both dimensions, but require the user to be aware of the exact time point of interest. This may be impractical if the user does not know the history of the query within the collection or is not familiar with the topic. In this work, our aim is to efficiently identify interesting time points in Web archives with an assumption that we receive a result list for a given query in standard relevance-order from an existing retrieval system. We consider two forms of Web archives: (i) one where documents have a publication time-stamp and never change (such as news archives), and (ii) the archives where documents undergo revisions, and are thus versioned. In both settings, we define interestingness as the change in top-k result set of two consecutive time-points. The key step in our solution is the maintenance of top-k results valid at each time-point of the archive, which can then be used to compute the interestingness scores for the time-points. We propose two techniques to realize efficient identification of interesting time points: (i) For the case when documents once published never change, we have a simple but effective technique. (ii) For the more general case with versioned documents, we develop an extension to the segment tree which makes it rank-aware and dynamic. To further improve efficiency, we propose an early termination technique which is proven to be very effective. Our methods are shown to be effective in efficiently finding interesting time points in a set of experiments using the New York Times news archive and the Wikipedia versioned archive.
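
The interestingness measure lends itself to a short sketch. The following Python is an illustration under stated assumptions: the top-k result lists are already materialized per time point, and "change in the top-k result set" is read as the normalized symmetric difference; the paper's rank-aware segment tree and early-termination machinery are not reproduced.

```python
# Hedged sketch: score each transition between consecutive time points by how
# much the top-k result set changes (one plausible reading of the paper's
# "change in top-k result set" definition of interestingness).

def interestingness(topk_by_time, k=10):
    """topk_by_time: list of ranked document-id lists, one per time point.
    Returns one score per transition between consecutive time points."""
    scores = []
    for prev, curr in zip(topk_by_time, topk_by_time[1:]):
        a, b = set(prev[:k]), set(curr[:k])
        scores.append(len(a ^ b) / (2 * k))  # 0 = identical, 1 = disjoint
    return scores

timeline = [
    ['d1', 'd2', 'd3'],   # t0
    ['d1', 'd2', 'd3'],   # t1: nothing changed
    ['d9', 'd2', 'd7'],   # t2: two results replaced -> interesting
]
print(interestingness(timeline, k=3))  # -> [0.0, 0.666...]
```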

1 citation

Posted Content
TL;DR: RotatE-Box as discussed by the authors is a combination of RotatE and box embeddings for answering regular expression queries (containing disjunction and Kleene plus operators) over incomplete KBs.
Abstract: We propose the novel task of answering regular expression queries (containing disjunction (∨) and Kleene plus (+) operators) over incomplete KBs. The answer set of these queries potentially has a large number of entities, hence previous works for single-hop queries in KBC that model a query as a point in high-dimensional space are not as effective. In response, we develop RotatE-Box, a novel combination of RotatE and box embeddings. It can model more relational inference patterns compared to existing embedding-based models. Furthermore, we define baseline approaches for embedding-based KBC models to handle regex operators. We demonstrate the performance of RotatE-Box on two new regex-query datasets introduced in this paper, including one where the queries are harvested based on actual user query logs. We find that our final RotatE-Box model significantly outperforms models based on just RotatE and just box embeddings.
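
Since RotatE's scoring function is published, a tiny sketch can show the rotation idea the paper builds on; the box-embedding extension and the regex operators are omitted, and all values below are toy data.

```python
# Sketch of the RotatE idea underlying RotatE-Box: a relation is a rotation in
# the complex plane, and a triple (h, r, t) scores highly when rotating h by r
# lands near t. The paper's box extension is not reproduced here.
import numpy as np

def rotate_score(h, theta, t):
    """h, t: complex entity embeddings; theta: relation rotation angles.
    Less negative scores indicate more plausible triples."""
    r = np.exp(1j * theta)               # unit-modulus complex rotation
    return -np.linalg.norm(h * r - t, ord=1)

dim = 4
rng = np.random.default_rng(0)
h = rng.standard_normal(dim) + 1j * rng.standard_normal(dim)
theta = rng.uniform(-np.pi, np.pi, dim)
t_true = h * np.exp(1j * theta)          # tail exactly matching the rotation
t_rand = rng.standard_normal(dim) + 1j * rng.standard_normal(dim)
print(rotate_score(h, theta, t_true))    # ~0.0 (perfect match)
print(rotate_score(h, theta, t_rand))    # clearly more negative
```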
Journal Article
TL;DR: This work introduces a novel hierarchical data structure called BloomSampleTree that helps us design efficient algorithms to extract an almost uniform sample from the set stored in a Bloom filter and also allows us to reconstruct the set efficiently.
Abstract: In this paper, we address the problem of sampling from a set and reconstructing a set stored as a Bloom filter. To the best of our knowledge our work is the first to address this question. We introduce a novel hierarchical data structure called BloomSampleTree that helps us design efficient algorithms to extract an almost uniform sample from the set stored in a Bloom filter and also allows us to reconstruct the set efficiently. In the case where the hash functions used in the Bloom filter implementation are partially invertible, in the sense that it is easy to calculate the set of elements that map to a particular hash value, we propose a second, more space-efficient method called HashInvert for the reconstruction. We study the properties of these two methods both analytically as well as experimentally. We provide bounds on run times for both methods and sample quality for the BloomSampleTree based algorithm, and show through an extensive experimental evaluation that our methods are efficient and effective.
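
For context, here is a sketch of the problem setting only: a plain Bloom filter together with the naive probe-every-candidate reconstruction baseline. The BloomSampleTree itself, which avoids this exhaustive scan with a hierarchy of filters, is not reproduced, and the parameters below are arbitrary.

```python
# Problem-setting sketch: a standard Bloom filter plus the naive baseline that
# reconstructs the stored set by testing every candidate in a known universe.
import hashlib

class BloomFilter:
    def __init__(self, m=1024, k=3):
        self.m, self.k, self.bits = m, k, bytearray(m)

    def _positions(self, item):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def might_contain(self, item):
        return all(self.bits[p] for p in self._positions(item))

def reconstruct(bf, universe):
    """Naive baseline: probe every candidate; false positives may slip in."""
    return [x for x in universe if bf.might_contain(x)]

bf = BloomFilter()
for s in ("alice", "bob", "carol"):
    bf.add(s)
print(reconstruct(bf, ["alice", "bob", "carol", "dave", "erin"]))
```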
Proceedings Article
02 Jul 2018
TL;DR: A graph-based approach called DeepAntara is devised, and its performance is shown on the change-tracking task over multiple sentence pairs extracted from different versions of publicly available financial CRS treaties.
Abstract: Businesses need to adhere to certain regulations to remain compliant. When a business expands or moves to a new geography, it finds itself subject to a slightly different set of regulations. Regulations themselves also change over time and force the business to change its internal workings to remain compliant. When a compliance officer is presented with a new regulatory document, he has to manually compare corresponding sentences between the previous and the new version. While most studies in text mining have focused on measuring textual similarity, textual entailment detection, paraphrase identification, etc., there has been very little focus on the problem of change tracking (CT). Change tracking can be defined as the task of identifying the phrase pair(s) that capture the semantic difference between two given sentences, and it plays an important role in domains such as financial regulatory compliance, where core changes introduced by regulators to existing regulations need to be identified quickly. Naturally, change tracking has to satisfy minimality and comprehensiveness requirements even in the presence of complex language structure, context dependence, and paraphrasing between compared sentences. In this paper, we address these challenges, devise a graph-based approach called DeepAntara, and show its performance on the change tracking task over multiple sentence pairs extracted from different versions of publicly available financial CRS treaties.
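
To make the change-tracking task concrete, the toy baseline below extracts differing phrase pairs with a word-level diff; it is far weaker than the graph-based DeepAntara approach, which must also cope with paraphrasing and context dependence, and the example sentences are invented.

```python
# Toy illustration of change tracking only, not the paper's method: a
# word-level diff that returns the phrase pairs differing between versions.
import difflib

def phrase_changes(old, new):
    """Return (old_phrase, new_phrase) pairs that differ between sentences."""
    a, b = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=a, b=b)
    changes = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            changes.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return changes

old = "Institutions must report accounts exceeding 10,000 USD annually."
new = "Institutions must report accounts exceeding 50,000 USD quarterly."
print(phrase_changes(old, new))
# -> [('10,000', '50,000'), ('annually.', 'quarterly.')]
```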

Cited by
Christopher M. Bishop
01 Jan 2006
TL;DR: This textbook covers probability distributions, linear models for regression and classification, neural networks, kernel methods, graphical models, mixture models and EM, approximate inference, sampling methods, sequential data, and combining models.
Abstract: Probability Distributions; Linear Models for Regression; Linear Models for Classification; Neural Networks; Kernel Methods; Sparse Kernel Machines; Graphical Models; Mixture Models and EM; Approximate Inference; Sampling Methods; Continuous Latent Variables; Sequential Data; Combining Models.

10,141 citations

Journal Article
TL;DR: Recent progress on link prediction algorithms is summarized, emphasizing the contributions from physical perspectives and approaches, such as random-walk-based methods and maximum likelihood methods.
Abstract: Link prediction in complex networks has attracted increasing attention from both the physical and computer science communities. The algorithms can be used to extract missing information, identify spurious interactions, evaluate network evolving mechanisms, and so on. This article summarizes recent progress on link prediction algorithms, emphasizing the contributions from physical perspectives and approaches, such as the random-walk-based methods and the maximum likelihood methods. We also introduce three typical applications: reconstruction of networks, evaluation of network evolving mechanisms, and classification of partially labeled networks. Finally, we introduce some applications and outline future challenges of link prediction algorithms.
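
As an illustration of the similarity-based family this survey covers, here is a small sketch of the resource-allocation index, one classical local predictor; the graph below is toy data.

```python
# Sketch of the resource-allocation index: score a non-adjacent node pair by
# summing 1/degree over their common neighbors (higher = more likely link).
from collections import defaultdict

def resource_allocation(edges):
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    scores = {}
    nodes = sorted(adj)
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            if v in adj[u]:
                continue  # only score currently missing links
            common = adj[u] & adj[v]
            scores[(u, v)] = sum(1.0 / len(adj[w]) for w in common)
    return scores

edges = [("a", "b"), ("b", "c"), ("a", "d"), ("d", "c"), ("c", "e")]
print(resource_allocation(edges))  # ('a', 'c') scores highest here
```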

2,530 citations

Journal Article
TL;DR: YAGO2 as mentioned in this paper is an extension of the YAGO knowledge base, in which entities, facts, and events are anchored in both time and space, and it contains 447 million facts about 9.8 million entities.

1,186 citations

Journal Article
TL;DR: YAGO is a large ontology with high coverage and precision, based on a clean logical model with decidable consistency, which allows representing n-ary relations in a natural way while maintaining compatibility with RDFS.

912 citations