
Showing papers by "Srikanta Bedathur published in 2021"


Proceedings ArticleDOI
14 Aug 2021
TL;DR: Fast Random projection-based One-Class Classification (FROCC) as mentioned in this paper is an efficient, scalable and easily parallelizable method for one-class classification with provable theoretical guarantees, which transforms the training data by projecting it onto a set of random unit vectors chosen uniformly and independently from the unit sphere.
Abstract: Several applications, like malicious URL detection and web spam detection, require classification on very high-dimensional data. In such cases anomalous data is hard to find but normal data is easily available. As such it is increasingly common to use a one-class classifier (OCC). Unfortunately, most OCC algorithms cannot scale to datasets with extremely high dimensions. In this paper, we present Fast Random projection-based One-Class Classification (FROCC), an extremely efficient, scalable and easily parallelizable method for one-class classification with provable theoretical guarantees. Our method is based on the simple idea of transforming the training data by projecting it onto a set of random unit vectors that are chosen uniformly and independently from the unit sphere, and bounding the regions based on separation of the data. FROCC can be naturally extended with kernels. We provide a new theoretical framework to prove that FROCC generalizes well in the sense that it is stable and has low bias for some parameter settings. We then develop a fast scalable approximation of FROCC using vectorization, exploiting data sparsity and parallelism to develop a new implementation called ParDFROCC. ParDFROCC achieves up to 2 percentage points better ROC than the next best baseline, with up to 12× speedup in training and test times over a range of state-of-the-art benchmarks for the OCC task.
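
To make the projection-and-bounding idea concrete, here is a minimal NumPy sketch of the core mechanism, under the simplifying assumption that each projection is bounded by a single [min, max] interval; the paper's epsilon-separated intervals, kernel extension, and the vectorized ParDFROCC implementation are omitted.

```python
# Minimal sketch of the FROCC idea, simplified to one interval per projection.
import numpy as np

class SimpleFROCC:
    def __init__(self, n_vectors=100, seed=0):
        self.n_vectors = n_vectors
        self.rng = np.random.default_rng(seed)

    def fit(self, X):
        # Random unit vectors drawn uniformly from the unit sphere.
        W = self.rng.standard_normal((self.n_vectors, X.shape[1]))
        self.W = W / np.linalg.norm(W, axis=1, keepdims=True)
        P = X @ self.W.T                       # project the training data
        self.lo, self.hi = P.min(axis=0), P.max(axis=0)
        return self

    def predict(self, X):
        P = X @ self.W.T
        inside = (P >= self.lo) & (P <= self.hi)
        return inside.all(axis=1).astype(int)  # 1 = normal, 0 = anomaly
```

A point is flagged anomalous as soon as any of its projections falls outside the range spanned by the training data; the full method refines each projection into separated intervals rather than one bounding interval.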

8 citations


Proceedings ArticleDOI
26 Oct 2021
TL;DR: In this article, a transfer learning framework called REFORMD is proposed for continuous-time location prediction for regions with sparse checkin data, where the authors model user-specific checkin-sequences in a region using a marked temporal point process (MTPP) with normalizing flows to learn the inter-checkin time and geo-distributions.
Abstract: There exists a high variability in mobility data volumes across different regions, which deteriorates the performance of spatial recommender systems that rely on region-specific data. In this paper, we propose a novel transfer learning framework called REFORMD, for continuous-time location prediction for regions with sparse checkin data. Specifically, we model user-specific checkin-sequences in a region using a marked temporal point process (MTPP) with normalizing flows to learn the inter-checkin time and geo-distributions. Later, we transfer the model parameters of spatial and temporal flows trained on a data-rich origin region for the next check-in and time prediction in a target region with scarce checkin data. We capture the evolving region-specific checkin dynamics for MTPP and spatial-temporal flows by maximizing the joint likelihood of the next checkin with three channels: (1) checkin-category prediction, (2) checkin-time prediction, and (3) travel distance prediction. Extensive experiments on different user mobility datasets across the U.S. and Japan show that our model significantly outperforms state-of-the-art methods for modeling continuous-time sequences. Moreover, we also show that REFORMD can be easily adapted for product recommendations, i.e., sequences without any spatial component.
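
A rough, purely illustrative skeleton of the transfer step follows. The model submodules and `*_nll` likelihood methods are hypothetical names (the abstract does not specify the architecture); only the overall recipe — reuse the origin region's spatial and temporal flows and fit the target region by maximizing the three-channel joint likelihood — comes from the text above.

```python
# Illustrative transfer skeleton; all module/method names are assumptions.
import copy
import torch

def transfer_to_target(source_model, target_loader, epochs=10, lr=1e-3):
    model = copy.deepcopy(source_model)
    # Keep the spatial/temporal flows trained on the data-rich origin region;
    # reinitialize only the region-specific category head (hypothetical name).
    model.category_head.reset_parameters()
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for batch in target_loader:
            # Joint negative log-likelihood over the three channels described
            # above: check-in category, check-in time, and travel distance.
            loss = (model.category_nll(batch)
                    + model.time_nll(batch)
                    + model.distance_nll(batch))
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```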

8 citations


Proceedings ArticleDOI
14 Aug 2021
TL;DR: The 2nd International Workshop on Data Quality Assessment for Machine Learning (DQAML'21) is organized in conjunction with the Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD) as discussed by the authors.
Abstract: The 2nd International Workshop on Data Quality Assessment for Machine Learning (DQAML'21) is organized in conjunction with the Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD). This workshop aims to serve as a forum for the presentation of research related to data quality assessment and remediation in the AI/ML pipeline. Data quality is a critical issue in the data preparation phase and involves numerous challenging problems related to the detection, remediation, visualization and evaluation of data issues. The workshop aims to provide a platform for researchers and practitioners to discuss such challenges across different data modalities, such as structured, time-series, text and graph data. The aim is to attract perspectives from both industrial and academic circles.

3 citations



Book ChapterDOI
11 May 2021
TL;DR: In the last several years, AI/ML technologies have become pervasive in academia and industry, finding their utility in newer and more challenging applications, as discussed by the authors.
Abstract: In the last several years, AI/ML technologies have become pervasive in academia and industry, finding their utility in newer and more challenging applications.

2 citations


Proceedings ArticleDOI
26 Oct 2021
TL;DR: In this paper, the authors propose a system, HAPPI (How Provenance of Probabilistic Inference), to handle query processing and inference over probabilistic knowledge graphs.
Abstract: Knowledge graphs (KG) model relationships between entities as labeled edges (or facts). They are mostly constructed using a suite of automated extractors, thereby inherently leading to uncertainty in the extracted facts. Modeling the uncertainty as probabilistic confidence scores results in a probabilistic knowledge graph. Graph queries over such probabilistic KGs require answer computation along with the computation of result probabilities, i.e., probabilistic inference. We propose a system, HAPPI (How Provenance of Probabilistic Inference), to handle such query processing and inference. Complying with the standard provenance semiring model, we propose a novel commutative semiring to symbolically compute the probability of the result of a query. These provenance-polynomial-like symbolic expressions encode fine-grained information about the probability computation process. We leverage this encoding to efficiently compute as well as maintain probabilities of results even as the underlying KG changes. Focusing on conjunctive basic graph pattern queries, we observe that HAPPI is more efficient than knowledge compilation for answering commonly occurring queries with a lower range of probability derivation complexity. We propose an adaptive system that leverages the strengths of both HAPPI and compilation-based techniques, not only to perform efficient probabilistic inference and compute its provenance, but also to incrementally maintain both.
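
For intuition only, the snippet below computes a query-answer probability by brute-force possible-world enumeration over a toy probabilistic KG of three facts; HAPPI's semiring provenance exists precisely to avoid this exponential enumeration and to support incremental maintenance. The facts and the derivation (f1 AND f2) OR f3 are invented for illustration.

```python
# Toy possible-worlds evaluation of a provenance expression over
# independent probabilistic facts; invented example, not HAPPI itself.
from itertools import product

facts = {"f1": 0.9, "f2": 0.7, "f3": 0.5}  # fact -> confidence score

def provenance(world):
    # Boolean provenance of one query answer: derived via (f1 AND f2) OR f3.
    return (world["f1"] and world["f2"]) or world["f3"]

def answer_probability():
    prob = 0.0
    names = list(facts)
    for bits in product([True, False], repeat=len(names)):
        world = dict(zip(names, bits))
        if provenance(world):
            # Probability of this world under fact independence.
            w = 1.0
            for name, present in world.items():
                w *= facts[name] if present else 1 - facts[name]
            prob += w
    return prob

print(answer_probability())  # 0.9*0.7 + 0.5 - 0.9*0.7*0.5 = 0.815
```

The symbolic expression lets the probability be recomputed locally when a single fact's confidence changes, rather than re-enumerating worlds; that is the maintenance advantage the abstract describes.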

1 citation


Posted Content
TL;DR: In this paper, the authors address the problem of learning low-dimensional representation of entities on relational databases consisting of multiple tables and propose an attention-based model to learn embeddings for entities in the relational database.
Abstract: In this paper, we address the problem of learning low-dimensional representations of entities in relational databases consisting of multiple tables. Embeddings help to capture the semantics encoded in the database and can be used in a variety of settings, like auto-completion of tables, fully-neural query processing of relational join queries, seamlessly handling missing values, and more. Current work is restricted to working with just a single table, or uses pretrained embeddings over an external corpus, making it unsuitable for use in real-world databases. In this work, we look into ways of using attention-based models to learn embeddings for entities in the relational database. We are inspired by BERT-style pretraining methods and are interested in observing how they can be extended for representation learning on structured databases. We evaluate our approach on the autocompletion of relational databases and achieve improvements over standard baselines.
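
A minimal sketch of what BERT-style masked-cell pretraining over a serialized table might look like, under heavy assumptions: cells are mapped to a flat token vocabulary, one cell per row is masked, and recovering it doubles as the autocompletion objective. The paper's actual tokenization and architecture are not specified here.

```python
# Masked-cell pretraining sketch; vocabulary, sizes, and single-table
# serialization are illustrative assumptions.
import torch
import torch.nn as nn

MASK_ID = 0  # reserved token id for the masked cell

class RowEncoder(nn.Module):
    """BERT-style encoder: each table row is a sequence of cell tokens."""
    def __init__(self, vocab_size, dim=64, heads=4, layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, layers)
        self.out = nn.Linear(dim, vocab_size)

    def forward(self, token_ids):                 # (batch, n_columns)
        return self.out(self.encoder(self.embed(token_ids)))

def masked_cell_step(model, rows, opt):
    """Mask one random cell per row and train the model to recover it."""
    batch = torch.arange(rows.size(0))
    pos = torch.randint(rows.size(1), (rows.size(0),))
    targets = rows[batch, pos]
    inputs = rows.clone()
    inputs[batch, pos] = MASK_ID
    logits = model(inputs)[batch, pos]            # logits at masked positions
    loss = nn.functional.cross_entropy(logits, targets)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```

After pretraining, the encoder states (or the embedding table itself) serve as entity embeddings, and predicting a masked position ranks candidate values for autocompletion.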

1 citation


Posted Content
TL;DR: REFORMD, as discussed by the authors, is a transfer learning framework for continuous-time location prediction for regions with sparse checkin data, which learns the inter-checkin time and geo-distributions by maximizing the joint likelihood of the next checkin with three channels: (1) checkin-category prediction, (2) checkin-time prediction, and (3) travel-distance prediction.
Abstract: There exists a high variability in mobility data volumes across different regions, which deteriorates the performance of spatial recommender systems that rely on region-specific data. In this paper, we propose a novel transfer learning framework called REFORMD, for continuous-time location prediction for regions with sparse checkin data. Specifically, we model user-specific checkin-sequences in a region using a marked temporal point process (MTPP) with normalizing flows to learn the inter-checkin time and geo-distributions. Later, we transfer the model parameters of spatial and temporal flows trained on a data-rich origin region for the next check-in and time prediction in a target region with scarce checkin data. We capture the evolving region-specific checkin dynamics for MTPP and spatial-temporal flows by maximizing the joint likelihood of the next checkin with three channels: (1) checkin-category prediction, (2) checkin-time prediction, and (3) travel distance prediction. Extensive experiments on different user mobility datasets across the U.S. and Japan show that our model significantly outperforms state-of-the-art methods for modeling continuous-time sequences. Moreover, we also show that REFORMD can be easily adapted for product recommendations, i.e., sequences without any spatial component.

1 citation


Posted Content
TL;DR: In this paper, the authors propose a system called HAPPI (How Provenance of Probabilistic Inference) to handle query processing over probabilistic knowledge graphs.
Abstract: Knowledge graphs (KG) that model the relationships between entities as labeled edges (or facts) in a graph are mostly constructed using a suite of automated extractors, thereby inherently leading to uncertainty in the extracted facts. Modeling the uncertainty as probabilistic confidence scores results in a probabilistic knowledge graph. Graph queries over such probabilistic KGs require answer computation along with the computation of those result probabilities, aka probabilistic inference. We propose a system, HAPPI (How Provenance of Probabilistic Inference), to handle such query processing. Complying with the standard provenance semiring model, we propose a novel commutative semiring to symbolically compute the probability of the result of a query. These provenance-polynomial-like symbolic expressions encode fine-grained information about the probability computation process. We leverage this encoding to efficiently compute as well as maintain the probability of results as the underlying KG changes. Focusing on a popular class of conjunctive basic graph pattern queries on the KG, we compare the performance of HAPPI against a possible-world model of computation and a knowledge compilation tool over two large datasets. We also propose an adaptive system that leverages the strengths of both HAPPI and compilation-based techniques. Since existing systems for probabilistic databases mostly focus on query computation, they default to re-computation when facts in the KG are updated. HAPPI, on the other hand, does not just perform probabilistic inference and maintain its provenance, but also provides a mechanism to incrementally maintain both as the KG changes. We extend this maintainability as part of our proposed adaptive system.

1 citation


Book ChapterDOI
11 May 2021
TL;DR: In this article, the authors propose a Dual-Network Hawkes Process (DNHP) to model bursty diffusion of text-based events over a social network of user nodes, where closeness of nodes is captured using topic-topic, user-user, and user-topic interactions.
Abstract: We address the problem of modeling bursty diffusion of text-based events over a social network of user nodes. The purpose is to recover, disentangle and analyze overlapping social conversations from the perspective of user-topic preferences, user-user connection strengths and, importantly, topic transitions. For this, we propose a Dual-Network Hawkes Process (DNHP), which executes over a graph whose nodes are user-topic pairs, and closeness of nodes is captured using topic-topic, user-user, and user-topic interactions. No existing Hawkes Process model captures such multiple interactions simultaneously. Additionally, unlike existing Hawkes Process based models, where event times are generated first and event topics are conditioned on the event times, the DNHP is more faithful to the underlying social process by making the event times depend on interacting (user, topic) pairs. We develop a Gibbs sampling algorithm for estimating the three network parameters that allows evidence to flow between the parameter spaces. Using experiments over a large real collection of tweets by US politicians, we show that the DNHP generalizes better than state-of-the-art models, and also provides interesting insights about user and topic transitions.
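
One way to picture an intensity over (user, topic) nodes is the sketch below, assuming an exponential decay kernel and a purely multiplicative combination of the three interaction matrices; DNHP's exact parameterization and its Gibbs sampler are beyond this snippet.

```python
# Illustrative factorized Hawkes intensity over (user, topic) pairs.
# The multiplicative form and exponential kernel are assumptions.
import numpy as np

def intensity(u, k, t, events, mu, UU, TT, UT, beta=1.0):
    """Intensity of (user u, topic k) at time t, given past
    events as a list of (t_i, u_i, k_i) triples."""
    lam = mu[u, k]  # base rate for this (user, topic) pair
    for t_i, u_i, k_i in events:
        if t_i < t:
            # A past event excites (u, k) in proportion to user-user
            # closeness, topic-topic closeness, and u's topic preference,
            # with exponentially decaying influence over time.
            lam += UU[u_i, u] * TT[k_i, k] * UT[u, k] * np.exp(-beta * (t - t_i))
    return lam
```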

1 citation


Posted Content
TL;DR: The TechTrack dataset, as mentioned in this paper, is a dataset for tracking entities in technical procedures; prepared by annotating open-domain articles from WikiHow, it consists of 1351 procedures and contains more than 1200 unique entities, with an average of 4.7 entities per procedure.
Abstract: We introduce TechTrack, a new dataset for tracking entities in technical procedures. The dataset, prepared by annotating open-domain articles from WikiHow, consists of 1351 procedures, e.g., "How to connect a printer", and identifies more than 1200 unique entities, with an average of 4.7 entities per procedure. We evaluate the performance of state-of-the-art models on the entity-tracking task and find that they are well below human annotation performance. We describe how TechTrack can be used to take forward research on understanding procedures from temporal texts.