Author

Srijan Kumar

Bio: Srijan Kumar is an academic researcher from Georgia Institute of Technology. The author has contributed to research in topics including Computer science and Misinformation. The author has an h-index of 20 and has co-authored 57 publications receiving 2,529 citations. Previous affiliations of Srijan Kumar include University of Maryland, College Park and the Association for Computing Machinery.

Papers published on a yearly basis

Papers
Proceedings ArticleDOI
21 Mar 2016
TL;DR: This paper expands the mining strategy space to include novel "stubborn" strategies that, for a large range of parameters, earn the miner more revenue, and shows how a miner can further amplify its gain by non-trivially composing mining attacks with network-level eclipse attacks.
Abstract: Selfish mining, originally discovered by Eyal et al. [9], is a well-known attack where a selfish miner, under certain conditions, can gain a disproportionate share of reward by deviating from the honest behavior. In this paper, we expand the mining strategy space to include novel "stubborn" strategies that, for a large range of parameters, earn the miner more revenue. Consequently, we show that the selfish mining attack is not (in general) optimal. Further, we show how a miner can further amplify its gain by non-trivially composing mining attacks with network-level eclipse attacks. We show, surprisingly, that given the attacker's best strategy, in some cases victims of an eclipse attack can actually benefit from being eclipsed!
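
For context on the baseline being beaten, the selfish miner's expected relative revenue has a closed form due to Eyal and Sirer, parameterized by the attacker's hash-power share alpha and the fraction gamma of honest miners that build on the attacker's block during a tie. A minimal sketch (the formula is Eyal and Sirer's; the sample values are illustrative):

```python
def selfish_mining_revenue(alpha: float, gamma: float) -> float:
    """Expected relative revenue of the selfish miner (Eyal-Sirer closed form)."""
    num = alpha * (1 - alpha) ** 2 * (4 * alpha + gamma * (1 - 2 * alpha)) - alpha ** 3
    den = 1 - alpha * (1 + (2 - alpha) * alpha)
    return num / den

# Honest mining earns a share equal to alpha; selfish mining beats it only
# above a gamma-dependent threshold (alpha > 1/3 when gamma = 0).
for alpha in (0.25, 0.30, 0.35, 0.40):
    print(f"alpha={alpha:.2f}  honest={alpha:.3f}  "
          f"selfish={selfish_mining_revenue(alpha, gamma=0.0):.3f}")
```

The stubborn strategies proposed in the paper dominate this baseline over a large range of (alpha, gamma).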

420 citations

Proceedings ArticleDOI
01 Dec 2016
TL;DR: This paper proposes two novel measures of node behavior: the goodness of a node intuitively captures how much this node is liked/trusted by other nodes, while the fairness of a node captures how fair the node is in rating other nodes' likeability or trust level.
Abstract: Weighted signed networks (WSNs) are networks in which edges are labeled with positive and negative weights. WSNs can capture like/dislike, trust/distrust, and other social relationships between people. In this paper, we consider the problem of predicting the weights of edges in such networks. We propose two novel measures of node behavior: the goodness of a node intuitively captures how much this node is liked/trusted by other nodes, while the fairness of a node captures how fair the node is in rating other nodes' likeability or trust level. We provide axioms that these two notions need to satisfy and show that past work does not meet these requirements for WSNs. We provide a mutually recursive definition of these two concepts and prove that they converge to a unique solution in linear time. We use the two measures to predict the edge weight in WSNs. Furthermore, we show that when compared against several individual algorithms from both the signed and unsigned social network literature, our fairness and goodness metrics almost always have the best predictive power. We then use these as features in different multiple regression models and show that we can predict edge weights on 2 Bitcoin WSNs, an Epinions WSN, 2 WSNs derived from Wikipedia, and a WSN derived from Twitter with more accurate results than past work. Moreover, fairness and goodness metrics form the most significant feature for prediction in most (but not all) cases.
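
A minimal sketch of the mutually recursive fixed-point iteration, assuming edge weights scaled to [-1, 1]; the update rules follow the abstract's intuition and a reading of the paper, so treat the exact normalizations as approximate:

```python
from collections import defaultdict

def fairness_goodness(edges, iters=100):
    """Fixed-point iteration for fairness (raters) and goodness (ratees).

    edges: dict mapping (u, v) -> rating weight, assumed scaled to [-1, 1].
    """
    out_edges, in_edges, nodes = defaultdict(list), defaultdict(list), set()
    for (u, v), w in edges.items():
        out_edges[u].append((v, w))
        in_edges[v].append((u, w))
        nodes.update((u, v))

    fairness = {n: 1.0 for n in nodes}  # in [0, 1]: how fair a rater is
    goodness = {n: 1.0 for n in nodes}  # in [-1, 1]: how liked/trusted a node is
    for _ in range(iters):
        for v in nodes:  # goodness: fairness-weighted mean of incoming ratings
            if in_edges[v]:
                goodness[v] = (sum(fairness[u] * w for u, w in in_edges[v])
                               / len(in_edges[v]))
        for u in nodes:  # fairness: 1 minus mean deviation from goodness
            if out_edges[u]:
                err = (sum(abs(w - goodness[v]) for v, w in out_edges[u])
                       / len(out_edges[u]))
                fairness[u] = 1 - err / 2  # |w - g| <= 2 keeps this in [0, 1]
    return fairness, goodness

# Toy WSN: "c" rates against consensus, so its fairness drops below "a"'s.
f, g = fairness_goodness({("a", "b"): 0.9, ("c", "b"): -1.0, ("a", "c"): 0.5})
```

Alternating the two updates until they stabilize mirrors the paper's claim of convergence to a unique solution in linear time per iteration.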

372 citations

Proceedings ArticleDOI
TL;DR: JODIE employs two coupled recurrent neural networks to update the embeddings of a user and an item at every interaction, together with a projection operator that estimates future embeddings used to predict future user-item interactions.
Abstract: Modeling sequential interactions between users and items/products is crucial in domains such as e-commerce, social networking, and education. Representation learning presents an attractive opportunity to model the dynamic evolution of users and items, where each user/item can be embedded in a Euclidean space and its evolution can be modeled by an embedding trajectory in this space. However, existing dynamic embedding methods generate embeddings only when users take actions and do not explicitly model the future trajectory of the user/item in the embedding space. Here we propose JODIE, a coupled recurrent neural network model that learns the embedding trajectories of users and items. JODIE employs two recurrent neural networks to update the embedding of a user and an item at every interaction. Crucially, JODIE also models the future embedding trajectory of a user/item. To this end, it introduces a novel projection operator that learns to estimate the embedding of the user at any time in the future. These estimated embeddings are then used to predict future user-item interactions. To make the method scalable, we develop a t-Batch algorithm that creates time-consistent batches and leads to 9x faster training. We conduct six experiments to validate JODIE on two prediction tasks---future interaction prediction and state change prediction---using four real-world datasets. We show that JODIE outperforms six state-of-the-art algorithms in these tasks by at least 20% in predicting future interactions and 12% in state change prediction.
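
A sketch of the projection operator's general form in PyTorch: the embedding is drifted element-wise by a learned function of elapsed time. All names here are illustrative, not JODIE's released code, and the exact parameterization is an assumption:

```python
import torch
import torch.nn as nn

class Projection(nn.Module):
    """Project a user embedding forward in time: u_hat = (1 + w(dt)) * u."""
    def __init__(self, dim: int):
        super().__init__()
        self.time_scale = nn.Linear(1, dim)  # learns a per-dimension drift from dt

    def forward(self, user_emb: torch.Tensor, delta_t: torch.Tensor) -> torch.Tensor:
        # delta_t: (batch, 1), elapsed time since each user's last interaction
        return (1 + self.time_scale(delta_t)) * user_emb

proj = Projection(dim=128)
u = torch.randn(32, 128)   # current user embeddings
dt = torch.rand(32, 1)     # normalized elapsed times
u_future = proj(u, dt)     # estimated embeddings at time t + dt
```

With dt = 0 the projection reduces (up to the learned bias) to the current embedding, which is the behavior the abstract describes: trajectories evolve smoothly between observed interactions.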

297 citations

Proceedings ArticleDOI
11 Apr 2016
TL;DR: This paper studies false information on Wikipedia by focusing on the hoax articles that have been created throughout its history, and assesses the real-world impact of hoax articles by measuring how long they survive before being debunked, how many pageviews they receive, and how heavily they are referred to by documents on the Web.
Abstract: Wikipedia is a major source of information for many people. However, false information on Wikipedia raises concerns about its credibility. One way in which false information may be presented on Wikipedia is in the form of hoax articles, i.e., articles containing fabricated facts about nonexistent entities or events. In this paper we study false information on Wikipedia by focusing on the hoax articles that have been created throughout its history. We make several contributions. First, we assess the real-world impact of hoax articles by measuring how long they survive before being debunked, how many pageviews they receive, and how heavily they are referred to by documents on the Web. We find that, while most hoaxes are detected quickly and have little impact on Wikipedia, a small number of hoaxes survive long and are well cited across the Web. Second, we characterize the nature of successful hoaxes by comparing them to legitimate articles and to failed hoaxes that were discovered shortly after being created. We find characteristic differences in terms of article structure and content, embeddedness into the rest of Wikipedia, and features of the editor who created the hoax. Third, we successfully apply our findings to address a series of classification tasks, most notably to determine whether a given article is a hoax. And finally, we describe and evaluate a task involving humans distinguishing hoaxes from non-hoaxes. We find that humans are not particularly good at the task and that our automated classifier outperforms them by a big margin.
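
The classification tasks reduce to standard supervised learning over the feature groups named above. A hedged sketch (features, values, and the model choice are invented for illustration; the paper's exact features differ):

```python
from sklearn.ensemble import RandomForestClassifier

# One row per article; columns mirror the abstract's feature groups
# (all numbers invented): [wiki-link density, article length,
#  inlinks from other articles, creator account age (days), creator edits]
X = [
    [0.02,  350,  0,   1,   3],   # hypothetical hoax
    [0.01,  500,  1,   2,   5],   # hypothetical hoax
    [0.15, 2200, 48, 900, 120],   # hypothetical legitimate article
    [0.11, 1800, 30, 400,  80],   # hypothetical legitimate article
]
y = [1, 1, 0, 0]  # 1 = hoax, 0 = legitimate

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.predict([[0.03, 400, 1, 2, 4]]))  # poorly embedded, new editor -> hoax
```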

290 citations

Proceedings ArticleDOI
02 Feb 2018
TL;DR: The REV2 algorithm, a system to identify fraudulent users on rating platforms, is developed; it is guaranteed to converge, has linear time complexity, and outperforms nine existing algorithms in detecting fair and unfair users.
Abstract: Rating platforms enable large-scale collection of user opinion about items (e.g., products or other users). However, untrustworthy users give fraudulent ratings for excessive monetary gains. In this paper, we present REV2, a system to identify such fraudulent users. We propose three interdependent intrinsic quality metrics---fairness of a user, reliability of a rating and goodness of a product. The fairness and reliability quantify the trustworthiness of a user and rating, respectively, and goodness quantifies the quality of a product. Intuitively, a user is fair if it provides reliable scores that are close to the goodness of products. We propose six axioms to establish the interdependency between the scores, and then formulate a mutually recursive definition that satisfies these axioms. We extend the formulation to address the cold-start problem and incorporate behavior properties. We develop the REV2 algorithm to calculate these intrinsic quality scores for all users, ratings, and products. We show that this algorithm is guaranteed to converge and has linear time complexity. By conducting extensive experiments on five rating datasets, we show that REV2 outperforms nine existing algorithms in detecting fair and unfair users. We reported the 150 most unfair users in the Flipkart network to their review fraud investigators, and 127 users were identified as being fraudulent (84.6% accuracy). The REV2 algorithm is being deployed at Flipkart.
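
A minimal sketch of the three mutually recursive updates, assuming ratings scaled to [-1, 1]; the cold-start smoothing and behavior terms the abstract mentions are omitted, and the equal weighting is an approximation:

```python
def rev2(ratings, iters=100):
    """Iterate fairness (users), reliability (ratings), goodness (products).

    ratings: dict mapping (user, product) -> score, assumed scaled to [-1, 1].
    """
    users = {u for u, _ in ratings}
    products = {p for _, p in ratings}
    fairness = {u: 1.0 for u in users}
    goodness = {p: 1.0 for p in products}
    reliability = {e: 1.0 for e in ratings}

    for _ in range(iters):
        for p in products:  # goodness: reliability-weighted mean score
            rated = [(u, s) for (u, q), s in ratings.items() if q == p]
            goodness[p] = (sum(reliability[(u, p)] * s for u, s in rated)
                           / len(rated))
        for (u, p), s in ratings.items():
            # A rating is reliable if its author is fair and its score
            # agrees with the product's goodness (|s - g| <= 2 after scaling).
            reliability[(u, p)] = (fairness[u]
                                   + (1 - abs(s - goodness[p]) / 2)) / 2
        for u in users:  # fairness: mean reliability of the user's ratings
            given = [e for e in ratings if e[0] == u]
            fairness[u] = sum(reliability[e] for e in given) / len(given)
    return fairness, reliability, goodness
```

Users whose ratings consistently disagree with consensus goodness end up with low fairness, which is the signal used to flag fraud.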

277 citations


Cited by
Journal ArticleDOI
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).
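
The mail-filtering example in the fourth category is easy to make concrete: the learner induces the user's rules from labeled examples rather than having a programmer write them. A minimal sketch with scikit-learn (messages and labels are invented):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Messages this user has kept (0) or rejected (1) -- invented examples.
mail = [
    "meeting moved to 3pm",        # kept
    "win a free prize now",        # rejected
    "quarterly report attached",   # kept
    "exclusive offer click here",  # rejected
]
labels = [0, 1, 0, 1]

# The pipeline learns this user's filtering rules from examples alone,
# and can be refit as the user keeps labeling new mail.
spam_filter = make_pipeline(CountVectorizer(), MultinomialNB())
spam_filter.fit(mail, labels)
print(spam_filter.predict(["free prize offer", "report for the meeting"]))
```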

13,246 citations

Proceedings ArticleDOI
25 Jun 2017
TL;DR: An overview of blockchain architecture is provided, typical consensus algorithms used in different blockchains are compared, and possible future trends for blockchain are laid out.
Abstract: Blockchain, the foundation of Bitcoin, has received extensive attention recently. Blockchain serves as an immutable ledger which allows transactions to take place in a decentralized manner. Blockchain-based applications are springing up, covering numerous fields including financial services, reputation systems, and the Internet of Things (IoT). However, there are still many challenges of blockchain technology, such as scalability and security problems, waiting to be overcome. This paper presents a comprehensive overview of blockchain technology. We first provide an overview of blockchain architecture and compare some typical consensus algorithms used in different blockchains. Furthermore, technical challenges and recent advances are briefly listed. We also lay out possible future trends for blockchain.
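
The "immutable ledger" boils down to hash chaining plus a consensus rule such as proof of work. A toy sketch of both ideas (illustrative only, not any production protocol):

```python
import hashlib
import json

def mine_block(prev_hash: str, transactions: list, difficulty: int = 4) -> dict:
    """Proof of work: find a nonce whose block hash has `difficulty` leading zeros."""
    nonce = 0
    while True:
        block = {"prev": prev_hash, "txs": transactions, "nonce": nonce}
        digest = hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()
        if digest.startswith("0" * difficulty):
            return {**block, "hash": digest}
        nonce += 1

genesis = mine_block("0" * 64, ["coinbase -> alice"])
block1 = mine_block(genesis["hash"], ["alice -> bob: 5"])
# Tampering with genesis's transactions changes its hash and breaks block1's
# `prev` link, so any change forces re-mining every subsequent block.
```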

2,642 citations

Posted Content
TL;DR: PyTorch Geometric is introduced, a library for deep learning on irregularly structured input data such as graphs, point clouds and manifolds, built upon PyTorch, and a comprehensive comparative study of the implemented methods in homogeneous evaluation scenarios is performed.
Abstract: We introduce PyTorch Geometric, a library for deep learning on irregularly structured input data such as graphs, point clouds and manifolds, built upon PyTorch. In addition to general graph data structures and processing methods, it contains a variety of recently published methods from the domains of relational learning and 3D data processing. PyTorch Geometric achieves high data throughput by leveraging sparse GPU acceleration, by providing dedicated CUDA kernels and by introducing efficient mini-batch handling for input examples of different size. In this work, we present the library in detail and perform a comprehensive comparative study of the implemented methods in homogeneous evaluation scenarios.
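
A minimal usage sketch of the library's core objects: a toy graph pushed through one graph-convolution layer (API as commonly documented; defaults may vary across versions):

```python
import torch
from torch_geometric.data import Data
from torch_geometric.nn import GCNConv

# A toy undirected graph: 3 nodes, edges 0-1 and 1-2 (both directions).
edge_index = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 1]], dtype=torch.long)
x = torch.randn(3, 16)                    # 16-dim features per node
data = Data(x=x, edge_index=edge_index)

conv = GCNConv(in_channels=16, out_channels=8)
out = conv(data.x, data.edge_index)       # message passing over the sparse graph
print(out.shape)                          # torch.Size([3, 8])
```

The sparse edge_index representation is what lets the library exploit the GPU kernels and mini-batching the abstract describes.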

2,308 citations

Journal ArticleDOI
TL;DR: The blockchain taxonomy is given, typical blockchain consensus algorithms are introduced, typical blockchain applications are reviewed, and future directions in blockchain technology are pointed out.
Abstract: Blockchain has numerous benefits such as decentralisation, persistency, anonymity and auditability. There is a wide spectrum of blockchain applications ranging from cryptocurrency, financial services, risk management, internet of things (IoT) to public and social services. Although a number of studies focus on using the blockchain technology in various application aspects, there is no comprehensive survey on the blockchain technology in both technological and application perspectives. To fill this gap, we conduct a comprehensive survey on the blockchain technology. In particular, this paper gives the blockchain taxonomy, introduces typical blockchain consensus algorithms, reviews blockchain applications and discusses technical challenges as well as recent advances in tackling the challenges. Moreover, this paper also points out the future directions in the blockchain technology.

1,928 citations

Proceedings ArticleDOI
01 Nov 2019
TL;DR: SciBERT leverages unsupervised pretraining on a large multi-domain corpus of scientific publications to improve performance on downstream scientific NLP tasks and demonstrates statistically significant improvements over BERT.
Abstract: Obtaining large-scale annotated data for NLP tasks in the scientific domain is challenging and expensive. We release SciBERT, a pretrained language model based on BERT (Devlin et al., 2018) to address the lack of high-quality, large-scale labeled scientific data. SciBERT leverages unsupervised pretraining on a large multi-domain corpus of scientific publications to improve performance on downstream scientific NLP tasks. We evaluate on a suite of tasks including sequence tagging, sentence classification and dependency parsing, with datasets from a variety of scientific domains. We demonstrate statistically significant improvements over BERT and achieve new state-of-the-art results on several of these tasks. The code and pretrained models are available at https://github.com/allenai/scibert/.
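
The released checkpoints load through the Hugging Face transformers API; a minimal sketch (the model identifier is the commonly published one, so verify against the repository linked above):

```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")
model = AutoModel.from_pretrained("allenai/scibert_scivocab_uncased")

inputs = tokenizer("The transcription factor binds the promoter region.",
                   return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, seq_len, 768) contextual embeddings
```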

1,864 citations