Author

Srikanta Bedathur

Bio: Srikanta Bedathur is an academic researcher from the Indian Institute of Technology Delhi. The author has contributed to research in the topics of Computer science and SPARQL, has an h-index of 21, and has co-authored 108 publications receiving 1,680 citations. Previous affiliations of Srikanta Bedathur include IBM and the Indraprastha Institute of Information Technology.


Papers
01 Jan 2009
TL;DR: The goal is to build a scalable peer-to-peer framework for web archival and to further support time-travel search over it; an initial design covering crawling, persistent storage, and indexing is provided, and the partitioning strategies for historical analysis of the data are analyzed.
Abstract: The World Wide Web has become a key source of knowledge pertaining to almost every walk of life. The goal is to build a scalable peer-to-peer framework for web archival and to further support time-travel search over it. We provide an initial design with crawling, persistent storage, and indexing, and also analyze the partitioning strategies for historical analysis of data. Peer-to-peer (p2p) systems are a good fit here, but they suffer from churn and communication overhead and hence require controlled replication for availability and load balancing. The core contribution is an index organization that temporally partitions the time-travel index lists to support efficient time-travel search. We also analyze the partitioning strategies in terms of replication to improve availability while keeping the overall blowup of the index in check. We present various heuristic approaches with a detailed experimental analysis exploring the behavior of the partitioning algorithms in a distributed setting.
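The temporal partitioning idea can be made concrete with a small sketch. The snippet below only illustrates the general technique of splitting a time-travel index list into fixed time partitions; the partition size, the posting layout, and all function names are assumptions made for the example, not the paper's actual design.

```python
from collections import defaultdict

# Illustrative sketch: each posting carries a validity interval, and the index
# list for a term is split into fixed time partitions so that a time-travel
# query only has to scan the partitions overlapping the query time.

PARTITION_LENGTH = 30 * 24 * 3600  # hypothetical partition size: 30 days, in seconds

def partition_key(timestamp):
    """Map a timestamp to the temporal partition it falls into."""
    return timestamp // PARTITION_LENGTH

def build_partitioned_index(postings):
    """postings: iterable of (doc_id, score, valid_from, valid_to) tuples."""
    partitions = defaultdict(list)
    for doc_id, score, valid_from, valid_to in postings:
        # A posting is replicated into every partition its validity interval overlaps.
        for key in range(partition_key(valid_from), partition_key(valid_to) + 1):
            partitions[key].append((doc_id, score, valid_from, valid_to))
    return partitions

def time_travel_lookup(partitions, query_time):
    """Return the postings valid at query_time, touching only one partition."""
    candidates = partitions.get(partition_key(query_time), [])
    return [p for p in candidates if p[2] <= query_time <= p[3]]
```

The trade-off the abstract refers to shows up directly here: a posting whose validity interval spans several partitions is replicated into each of them, which inflates the index, but in return a time-travel lookup touches only a single partition.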

1 citations

Book ChapterDOI
11 May 2021
TL;DR: In this article, the authors propose a Dual-Network Hawkes Process (DNHP) to model bursty diffusion of text-based events over a social network of user nodes, where closeness of nodes is captured using topic-topic, user-user, and user-topic interactions.
Abstract: We address the problem of modeling bursty diffusion of text-based events over a social network of user nodes. The purpose is to recover, disentangle, and analyze overlapping social conversations from the perspective of user-topic preferences, user-user connection strengths and, importantly, topic transitions. For this, we propose a Dual-Network Hawkes Process (DNHP), which executes over a graph whose nodes are (user, topic) pairs, and closeness of nodes is captured using topic-topic, user-user, and user-topic interactions. No existing Hawkes Process model captures such multiple interactions simultaneously. Additionally, unlike existing Hawkes Process based models, where event times are generated first and event topics are conditioned on the event times, the DNHP is more faithful to the underlying social process by making the event times depend on interacting (user, topic) pairs. We develop a Gibbs sampling algorithm for estimating the three network parameters that allows evidence to flow between the parameter spaces. Using experiments over a large real collection of tweets by US politicians, we show that the DNHP generalizes better than state-of-the-art models and also provides interesting insights about user and topic transitions.
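For readers unfamiliar with Hawkes processes, the following equation sketches a generic intensity of the kind such models use, with the excitation factored into user-user and topic-topic terms; it is an illustrative form under assumed notation ($\mu$, $A$, $B$, $\kappa$), not necessarily the exact DNHP parameterization.

$$\lambda_{(u,k)}(t) \;=\; \mu_{u,k} \;+\; \sum_{i:\; t_i < t} A_{u_i u}\, B_{k_i k}\, \kappa(t - t_i)$$

Here $\mu_{u,k}$ is a base rate for user $u$ on topic $k$, $A_{u_i u}$ captures how strongly user $u_i$ influences user $u$, $B_{k_i k}$ captures topic transitions, and $\kappa$ is a decaying temporal kernel such as an exponential. Every past event $(u_i, k_i, t_i)$ thus raises the rate of related (user, topic) pairs, producing the bursty cascades the abstract describes.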

1 citations

Posted Content
TL;DR: IterefinE, as discussed by the authors, uses ontological information and inference rules to improve the quality of KGs and infer higher quality new facts for downstream application tasks such as KG-based question answering.
Abstract: Knowledge Graphs (KGs) extracted from text sources are often noisy and lead to poor performance in downstream application tasks such as KG-based question answering. While much of the recent activity is focused on addressing the sparsity of KGs by using embeddings for inferring new facts, the issue of cleaning up noise in KGs through the KG refinement task is not as actively studied. Most successful techniques for KG refinement make use of inference rules and reasoning over ontologies. Barring a few exceptions, embeddings do not make use of ontological information, and their performance in the KG refinement task is not well understood. In this paper, we present a KG refinement framework called IterefinE which iteratively combines two techniques: one which uses ontological information and inference rules, PSL-KGI, and KG embeddings such as ComplEx and ConvE which do not. As a result, IterefinE is able to exploit not only the ontological information to improve the quality of predictions, but also the power of KG embeddings which (implicitly) perform longer chains of reasoning. The IterefinE framework operates in a co-training mode and results in an explicit type-supervised embedding of the refined KG from PSL-KGI, which we call TypeE-X. Our experiments over a range of KG benchmarks show that the embeddings we produce are able to reject noisy facts from the KG and at the same time infer higher quality new facts, resulting in up to 9% improvement in overall weighted F1 score.
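As a rough idea of how such an iterative combination might look, here is a hedged sketch of a co-training-style loop. The callables it expects (a rule-based refiner, an embedding trainer, an embedding scorer), the equal weighting, and the acceptance threshold are all placeholders invented for illustration, not the authors' actual pipeline or API.

```python
def iterefine_style_loop(noisy_kg, ontology, rule_refiner, embed_trainer, embed_scorer,
                         n_iterations=3, threshold=0.5):
    """Hedged sketch of a co-training-style KG refinement loop.

    rule_refiner(kg, ontology) -> (fact_scores, entity_types)  # e.g. a PSL-KGI-like step
    embed_trainer(kg, types)   -> embedding model               # e.g. ComplEx / ConvE
    embed_scorer(model, fact)  -> confidence in [0, 1]
    All three are caller-supplied placeholders; this is not the authors' API.
    """
    kg = set(noisy_kg)
    for _ in range(n_iterations):
        # Step 1: rule/ontology-based refinement scores facts and infers entity types.
        fact_scores, entity_types = rule_refiner(kg, ontology)
        # Step 2: train a KG embedding, using the inferred types as extra supervision.
        model = embed_trainer(kg, entity_types)
        # Step 3: keep a fact only if the combined rule + embedding signal is strong
        # enough, then feed the refined KG into the next iteration.
        kg = {f for f in kg
              if 0.5 * fact_scores.get(f, 0.0) + 0.5 * embed_scorer(model, f) >= threshold}
    return kg
```

The point of the loop is that the rule-based and embedding-based signals correct each other across iterations, which is the co-training behavior the abstract describes.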

1 citations

Journal ArticleDOI
TL;DR: In this paper, the problem of sampling from a set and reconstructing a set stored as a Bloom filter is addressed, and a hierarchical data structure called BloomSampleTree is introduced to extract an almost uniform sample from the set.
Abstract: In this paper, we address the problem of sampling from a set and reconstructing a set stored as a Bloom filter. To the best of our knowledge our work is the first to address this question. We introduce a novel hierarchical data structure called BloomSampleTree that helps us design efficient algorithms to extract an almost uniform sample from the set stored in a Bloom filter and also allows us to reconstruct the set efficiently. In the case where the hash functions used in the Bloom filter implementation are partially invertible, in the sense that it is easy to calculate the set of elements that map to a particular hash value, we propose a second, more space-efficient method called HashInvert for the reconstruction. We study the properties of these two methods both analytically as well as experimentally. We provide bounds on run times for both methods and sample quality for the BloomSampleTree based algorithm, and show through an extensive experimental evaluation that our methods are efficient and effective.
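To make the problem statement concrete, the sketch below implements a tiny Bloom filter and a naive reconstruction that probes every element of a known finite universe. This baseline is not the paper's algorithm, and the class and function names are invented for the example; the BloomSampleTree replaces the exhaustive scan with a hierarchy of pre-built range filters so that whole sub-ranges of the universe can be pruned or sampled from at once.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter, used only to make the problem concrete."""
    def __init__(self, m=1024, k=4):
        self.m, self.k = m, k
        self.bits = [False] * m

    def _positions(self, item):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        return all(self.bits[pos] for pos in self._positions(item))

def naive_reconstruct(bloom, universe):
    """Baseline reconstruction: probe every candidate in a known finite universe."""
    return [x for x in universe if bloom.might_contain(x)]

# Usage: reconstruct a small set (the result may also contain false positives).
bf = BloomFilter()
for x in (3, 17, 42):
    bf.add(x)
print(naive_reconstruct(bf, range(100)))  # includes 3, 17, 42 plus any false positives
```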

1 citations


Cited by
Christopher M. Bishop
01 Jan 2006
TL;DR: This book covers probability distributions, linear models for regression and classification, neural networks, kernel methods, sparse kernel machines, graphical models, mixture models and EM, approximate inference, sampling methods, continuous latent variables, sequential data, and combining models.
Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

10,141 citations

Journal ArticleDOI
TL;DR: Recent progress on link prediction algorithms is summarized, emphasizing the contributions from physical perspectives and approaches, such as random-walk-based methods and maximum likelihood methods.
Abstract: Link prediction in complex networks has attracted increasing attention from both the physics and computer science communities. The algorithms can be used to extract missing information, identify spurious interactions, evaluate network evolving mechanisms, and so on. This article summarizes recent progress on link prediction algorithms, emphasizing the contributions from physical perspectives and approaches, such as the random-walk-based methods and the maximum likelihood methods. We also introduce three typical applications: reconstruction of networks, evaluation of network evolving mechanisms, and classification of partially labeled networks. Finally, we introduce some applications and outline future challenges of link prediction algorithms.
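As a minimal illustration of the random-walk flavor of methods the survey covers, the snippet below scores a candidate link by the two-step random-walk transition probability between its endpoints. This is just one simple member of that family; the graph and function names are made up for the example.

```python
# Score a candidate link (x, y) by the probability that a two-step uniform
# random walk starting at x ends at y. Higher scores suggest a more likely link.

def two_step_walk_score(adj, x, y):
    """adj: dict mapping node -> set of neighbors (undirected graph)."""
    if not adj.get(x):
        return 0.0
    score = 0.0
    for z in adj[x]:
        if y in adj.get(z, set()) and adj[z]:
            # probability of stepping x -> z -> y
            score += (1.0 / len(adj[x])) * (1.0 / len(adj[z]))
    return score

# Example: a and d are not connected but share two common neighbors (b and c).
graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c"},
}
print(two_step_walk_score(graph, "a", "d"))  # 1/3
```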

2,530 citations

Journal ArticleDOI
TL;DR: YAGO2 as mentioned in this paper is an extension of the YAGO knowledge base, in which entities, facts, and events are anchored in both time and space, and it contains 447 million facts about 9.8 million entities.

1,186 citations

Journal ArticleDOI
TL;DR: YAGO is a large ontology with high coverage and precision, based on a clean logical model with decidable consistency, which allows representing n-ary relations in a natural way while maintaining compatibility with RDFS.
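A hedged illustration of the n-ary relation idea: a fact with more than two arguments (for example, a prize won in a particular year) can be broken into binary statements by giving the base fact an identifier that further statements refer to. The identifiers and predicate names below are invented for illustration and are not YAGO's actual vocabulary.

```python
# Invented example of reifying an n-ary fact via fact identifiers; the ids and
# predicate names are illustrative only, not YAGO's real vocabulary.
facts = {
    "#1": ("Albert_Einstein", "hasWonPrize", "Nobel_Prize_in_Physics"),
    "#2": ("#1", "occursIn", "1921"),  # a statement about fact #1 supplies the time argument
}
```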

912 citations