Showing papers in "arXiv: Social and Information Networks in 2017"

Posted Content•

Inductive Representation Learning on Large Graphs

William L. Hamilton, Rex Ying, Jure Leskovec

07 Jun 2017-arXiv: Social and Information Networks

TL;DR: GraphSAGE, a general inductive framework, leverages node feature information (e.g., text attributes) to efficiently generate node embeddings for previously unseen data, and it outperforms strong baselines on three inductive node-classification benchmarks.

Abstract: Low-dimensional embeddings of nodes in large graphs have proved extremely useful in a variety of prediction tasks, from content recommendation to identifying protein functions. However, most existing approaches require that all nodes in the graph are present during training of the embeddings; these previous approaches are inherently transductive and do not naturally generalize to unseen nodes. Here we present GraphSAGE, a general, inductive framework that leverages node feature information (e.g., text attributes) to efficiently generate node embeddings for previously unseen data. Instead of training individual embeddings for each node, we learn a function that generates embeddings by sampling and aggregating features from a node's local neighborhood. Our algorithm outperforms strong baselines on three inductive node-classification benchmarks: we classify the category of unseen nodes in evolving information graphs based on citation and Reddit post data, and we show that our algorithm generalizes to completely unseen graphs using a multi-graph dataset of protein-protein interactions.
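
The sample-and-aggregate idea described in this abstract can be illustrated with a minimal sketch. It assumes a mean aggregator, uniform neighbor sampling without replacement, and a toy graph; the names `sample_neighbors` and `embed` are hypothetical, and the learned weights and nonlinearity of a trained model are deliberately left out. It is not the authors' implementation.

```python
import random

def sample_neighbors(adj, node, k, rng):
    """Return at most k neighbors of `node`, sampled uniformly without replacement."""
    neighbors = adj.get(node, [])
    if len(neighbors) <= k:
        return list(neighbors)
    return rng.sample(neighbors, k)

def embed(features, adj, node, k, rng):
    """Concatenate a node's own features with the mean of its sampled neighbors' features.

    A trained model would also apply learned weights and a nonlinearity;
    both are omitted in this sketch.
    """
    own = features[node]
    sampled = sample_neighbors(adj, node, k, rng)
    if not sampled:
        # Isolated node: pad with zeros so every embedding has the same length.
        return own + [0.0] * len(own)
    mean = [
        sum(features[n][i] for n in sampled) / len(sampled)
        for i in range(len(own))
    ]
    return own + mean

# Toy data (hypothetical). Node "d" stands in for a node that appears after training.
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0], "d": [0.5, 0.5]}
adj = {"a": ["b", "c"], "b": ["a"], "c": ["a"], "d": ["a", "b"]}
rng = random.Random(0)
print(embed(features, adj, "d", k=2, rng=rng))
```

Because `embed` depends only on features and local neighborhoods, it can be applied to node "d" without storing a per-node embedding table, which is the inductive property the abstract emphasizes.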

7,926 citations

Journal Article•DOI•

Natural Scales in Geographical Patterns

Telmo Menezes¹, Camille Roth²

China Merchants Bank¹, Centre national de la recherche scientifique²

04 Apr 2017-arXiv: Social and Information Networks

TL;DR: The detection of phase transitions constitutes the first objective method of characterising endogenous, natural scales of human movement and allows us to draw discrete multi-scale geographical boundaries, potentially capable of providing key insights in fields such as epidemiology or cultural contagion.

Abstract: Human mobility is known to be distributed across several orders of magnitude of physical distances, which makes it generally difficult to endogenously find or define typical and meaningful scales. Relevant analyses, from movements to geographical partitions, seem to be relative to some ad-hoc scale, or no scale at all. Relying on geotagged data collected from photo-sharing social media, we apply community detection to movement networks constrained by increasing percentiles of the distance distribution. Using a simple parameter-free discontinuity detection algorithm, we discover clear phase transitions in the community partition space. The detection of these phases constitutes the first objective method of characterising endogenous, natural scales of human movement. Our study covers nine regions, ranging from cities to countries of various sizes and a transnational area. For all regions, the number of natural scales is remarkably low (2 or 3). Further, our results hint at scale-related behaviours rather than scale-related users. The partitions of the natural scales allow us to draw discrete multi-scale geographical boundaries, potentially capable of providing key insights in fields such as epidemiology or cultural contagion where the introduction of spatial boundaries is pivotal.

1,543 citations

Journal Article•DOI•

Cooperative Game Theory Approaches for Network Partitioning

Konstantin Avrachenkov¹, Aleksei Y. Kondratev², Vladimir V. Mazalov²

French Institute for Research in Computer Science and Automation¹, Russian Academy of Sciences²

12 Jul 2017-arXiv: Social and Information Networks

TL;DR: The paper proposes to use the methods of cooperative game theory that highlight not only the link density but also the mechanisms of cluster formation, and suggests two approaches from cooperative game theory, one based on the Myerson value and the other on hedonic games.

Abstract: The paper is devoted to game-theoretic methods for community detection in networks. The traditional methods for detecting community structure are based on selecting denser subgraphs inside the network. Here we propose to use the methods of cooperative game theory that highlight not only the link density but also the mechanisms of cluster formation. Specifically, we suggest two approaches from cooperative game theory: the first approach is based on the Myerson value, whereas the second approach is based on hedonic games. Both approaches can detect clusters at various resolutions. However, the tuning of the resolution parameter in the hedonic games approach is particularly intuitive. Furthermore, the modularity-based approach and its generalizations can be viewed as particular cases of the hedonic games.

1,191 citations

Posted Content•

Fake News Detection on Social Media: A Data Mining Perspective

Kai Shu¹, Amy Sliva², Suhang Wang¹, Jiliang Tang³, Huan Liu¹

Arizona State University¹, Charles River Laboratories², Michigan State University³

07 Aug 2017-arXiv: Social and Information Networks

TL;DR: This survey presents a comprehensive review of detecting fake news on social media, including fake news characterizations on psychology and social theories, existing algorithms from a data mining perspective, evaluation metrics and representative datasets, and future research directions for fake news detection on social media.

Abstract: Social media for news consumption is a double-edged sword. On the one hand, its low cost, easy access, and rapid dissemination of information lead people to seek out and consume news from social media. On the other hand, it enables the wide spread of "fake news", i.e., low quality news with intentionally false information. The extensive spread of fake news has the potential for extremely negative impacts on individuals and society. Therefore, fake news detection on social media has recently become an emerging research area that is attracting tremendous attention. Fake news detection on social media presents unique characteristics and challenges that make existing detection algorithms from traditional news media ineffective or not applicable. First, fake news is intentionally written to mislead readers to believe false information, which makes it difficult and nontrivial to detect based on news content; therefore, we need to include auxiliary information, such as user social engagements on social media, to help make a determination. Second, exploiting this auxiliary information is challenging in and of itself as users' social engagements with fake news produce data that is big, incomplete, unstructured, and noisy. Because the issue of fake news detection on social media is both challenging and relevant, we conducted this survey to further facilitate research on the problem. In this survey, we present a comprehensive review of detecting fake news on social media, including fake news characterizations on psychology and social theories, existing algorithms from a data mining perspective, evaluation metrics and representative datasets. We also discuss related research areas, open problems, and future research directions for fake news detection on social media.

887 citations

Posted Content•

Representation Learning on Graphs: Methods and Applications

William L. Hamilton, Rex Ying, Jure Leskovec

17 Sep 2017-arXiv: Social and Information Networks

TL;DR: In this article, the authors provide a conceptual review of representation learning on graphs, including matrix factorization-based methods, random-walk based algorithms, and graph neural networks, and highlight a number of important applications and directions for future work.

Abstract: Machine learning on graphs is an important and ubiquitous task with applications ranging from drug design to friendship recommendation in social networks. The primary challenge in this domain is finding a way to represent, or encode, graph structure so that it can be easily exploited by machine learning models. Traditionally, machine learning approaches relied on user-defined heuristics to extract features encoding structural information about a graph (e.g., degree statistics or kernel functions). However, recent years have seen a surge in approaches that automatically learn to encode graph structure into low-dimensional embeddings, using techniques based on deep learning and nonlinear dimensionality reduction. Here we provide a conceptual review of key advancements in this area of representation learning on graphs, including matrix factorization-based methods, random-walk based algorithms, and graph neural networks. We review methods to embed individual nodes as well as approaches to embed entire (sub)graphs. In doing so, we develop a unified framework to describe these recent approaches, and we highlight a number of important applications and directions for future work.

853 citations

Proceedings Article•DOI•

Network Embedding as Matrix Factorization: Unifying DeepWalk, LINE, PTE, and node2vec

Jiezhong Qiu¹, Yuxiao Dong², Hao Ma², Jian Li¹, Kuansan Wang², Jie Tang¹

Tsinghua University¹, Microsoft²

09 Oct 2017-arXiv: Social and Information Networks

TL;DR: The NetMF method offers significant improvements over DeepWalk and LINE for conventional network mining tasks and provides the theoretical connections between skip-gram based network embedding algorithms and the theory of graph Laplacian.

Abstract: Since the invention of word2vec, the skip-gram model has significantly advanced the research of network embedding, such as the recent emergence of the DeepWalk, LINE, PTE, and node2vec approaches. In this work, we show that all of the aforementioned models with negative sampling can be unified into the matrix factorization framework with closed forms. Our analysis and proofs reveal that: (1) DeepWalk empirically produces a low-rank transformation of a network's normalized Laplacian matrix; (2) LINE, in theory, is a special case of DeepWalk when the size of vertices' context is set to one; (3) As an extension of LINE, PTE can be viewed as the joint factorization of multiple networks' Laplacians; (4) node2vec is factorizing a matrix related to the stationary distribution and transition probability tensor of a 2nd-order random walk. We further provide the theoretical connections between skip-gram based network embedding algorithms and the theory of graph Laplacian. Finally, we present the NetMF method as well as its approximation algorithm for computing network embedding. Our method offers significant improvements over DeepWalk and LINE for conventional network mining tasks. This work lays the theoretical foundation for skip-gram based network embedding methods, leading to a better understanding of latent network representation learning.

604 citations

Posted Content•

Online Human-Bot Interactions: Detection, Estimation, and Characterization

Onur Varol¹, Emilio Ferrara², Clayton A. Davis¹, Filippo Menczer¹, Alessandro Flammini¹

Indiana University¹, University of Southern California²

09 Mar 2017-arXiv: Social and Information Networks

TL;DR: This work presents a framework to detect social bots on Twitter, and describes several subclasses of accounts, including spammers, self promoters, and accounts that post content from connected applications.

Abstract: Increasing evidence suggests that a growing amount of social media content is generated by autonomous entities known as social bots. In this work we present a framework to detect such entities on Twitter. We leverage more than a thousand features extracted from public data and meta-data about users: friends, tweet content and sentiment, network patterns, and activity time series. We benchmark the classification framework by using a publicly available dataset of Twitter bots. This training data is enriched by a manually annotated collection of active Twitter users that include both humans and bots of varying sophistication. Our models yield high accuracy and agreement with each other and can detect bots of different nature. Our estimates suggest that between 9% and 15% of active Twitter accounts are bots. Characterizing ties among accounts, we observe that simple bots tend to interact with bots that exhibit more human-like behaviors. Analysis of content flows reveals retweet and mention strategies adopted by bots to interact with different target groups. Using clustering analysis, we characterize several subclasses of accounts, including spammers, self promoters, and accounts that post content from connected applications.

496 citations

Proceedings Article•DOI•

struc2vec: Learning Node Representations from Structural Identity

Leonardo F. R. Ribeiro, Pedro Savarese, Daniel R. Figueiredo

11 Apr 2017-arXiv: Social and Information Networks

TL;DR: struc2vec uses a hierarchy to measure node similarity at different scales and constructs a multilayer graph to encode structural similarities and generate structural context for nodes, which improves performance on classification tasks that depend more on structural identity.

Abstract: Structural identity is a concept of symmetry in which network nodes are identified according to the network structure and their relationship to other nodes. Structural identity has been studied in theory and practice over the past decades, but only recently has it been addressed with representational learning techniques. This work presents struc2vec, a novel and flexible framework for learning latent representations for the structural identity of nodes. struc2vec uses a hierarchy to measure node similarity at different scales, and constructs a multilayer graph to encode structural similarities and generate structural context for nodes. Numerical experiments indicate that state-of-the-art techniques for learning node representations fail in capturing stronger notions of structural identity, while struc2vec exhibits much superior performance in this task, as it overcomes limitations of prior approaches. As a consequence, numerical experiments indicate that struc2vec improves performance on classification tasks that depend more on structural identity.

472 citations

Posted Content•

This Just In: Fake News Packs a Lot in Title, Uses Simpler, Repetitive Content in Text Body, More Similar to Satire than Real News

Benjamin D. Horne¹, Sibel Adali¹

Rensselaer Polytechnic Institute¹

28 Mar 2017-arXiv: Social and Information Networks

TL;DR: Overall title structure and the use of proper nouns in titles are very significant in differentiating fake from real, leading to the conclusion that fake news is targeted for audiences who are not likely to read beyond titles and is aimed at creating mental associations between entities and claims.

Abstract: The problem of fake news has gained a lot of attention as it is claimed to have had a significant impact on 2016 US Presidential Elections. Fake news is not a new problem and its spread in social networks is well-studied. Often an underlying assumption in fake news discussion is that it is written to look like real news, fooling the reader who does not check for reliability of the sources or the arguments in its content. Through a unique study of three data sets and features that capture the style and the language of articles, we show that this assumption is not true. Fake news in most cases is more similar to satire than to real news, leading us to conclude that persuasion in fake news is achieved through heuristics rather than the strength of arguments. We show overall title structure and the use of proper nouns in titles are very significant in differentiating fake from real. This leads us to conclude that fake news is targeted for audiences who are not likely to read beyond titles and is aimed at creating mental associations between entities and claims.

430 citations

Proceedings Article•DOI•

The Slashdot Zoo: Mining a Social Network with Negative Edges

Jérôme Kunegis¹, Andreas Lommatzsch¹, Christian Bauckhage²

Technical University of Berlin¹, Deutsche Telekom²

31 Oct 2017-arXiv: Social and Information Networks

TL;DR: The corpus of user relationships of the Slashdot technology news site is analysed and it is shown that the network exhibits multiplicative transitivity which allows algebraic methods based on matrix multiplication to be used.

Abstract: We analyse the corpus of user relationships of the Slashdot technology news site. The data was collected from the Slashdot Zoo feature where users of the website can tag other users as friends and foes, providing positive and negative endorsements. We adapt social network analysis techniques to the problem of negative edge weights. In particular, we consider signed variants of global network characteristics such as the clustering coefficient, node-level characteristics such as centrality and popularity measures, and link-level characteristics such as distances and similarity measures. We evaluate these measures on the task of identifying unpopular users, as well as on the task of predicting the sign of links and show that the network exhibits multiplicative transitivity which allows algebraic methods based on matrix multiplication to be used. We compare our methods to traditional methods which are only suitable for positively weighted edges.
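
The multiplicative transitivity mentioned in this abstract can be illustrated with a small sketch of sign prediction by matrix multiplication. It assumes a signed adjacency matrix with +1 for a friend link, -1 for a foe link, and 0 for no link, and it uses only two-step paths; the helper names `matmul` and `predict_sign` are hypothetical, and this is an illustration of the idea rather than the authors' method.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    inner, cols = len(b), len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]

def predict_sign(adj, i, j):
    """Predict the sign of a link i -> j from two-step paths.

    Each path i -> k -> j contributes the product of its two signs, so a
    friend's foe counts as -1 and a foe's foe counts as +1. Returns +1, -1,
    or 0 when the paths cancel out or do not exist.
    """
    score = matmul(adj, adj)[i][j]
    return (score > 0) - (score < 0)

# Hypothetical three-user examples.
friend_then_foe = [[0, 1, 0], [0, 0, -1], [0, 0, 0]]   # 0 likes 1, 1 dislikes 2
foe_then_foe = [[0, -1, 0], [0, 0, -1], [0, 0, 0]]     # 0 dislikes 1, 1 dislikes 2
print(predict_sign(friend_then_foe, 0, 2))
print(predict_sign(foe_then_foe, 0, 2))
```

Higher powers of the matrix would take longer paths into account in the same way; the sketch stops at the square to keep the arithmetic easy to follow.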

423 citations

Posted Content•

The spread of fake news by social bots

Chengcheng Shao, Giovanni Luca Ciampaglia, Onur Varol, Alessandro Flammini, Filippo Menczer

24 Jul 2017-arXiv: Social and Information Networks

TL;DR: Analysis of 14 million messages spreading 400 thousand claims on Twitter during and following the 2016 U.S. presidential campaign and election suggests that curbing social bots may be an effective strategy for mitigating the spread of online misinformation.

Abstract: The massive spread of fake news has been identified as a major global risk and has been alleged to influence elections and threaten democracies. Communication, cognitive, social, and computer scientists are engaged in efforts to study the complex causes for the viral diffusion of digital misinformation and to develop solutions, while search and social media platforms are beginning to deploy countermeasures. However, to date, these efforts have been mainly informed by anecdotal evidence rather than systematic data. Here we analyze 14 million messages spreading 400 thousand claims on Twitter during and following the 2016 U.S. presidential campaign and election. We find evidence that social bots play a key role in the spread of fake news. Accounts that actively spread misinformation are significantly more likely to be bots. Automated accounts are particularly active in the early spreading phases of viral claims, and tend to target influential users. Humans are vulnerable to this manipulation, retweeting bots who post false news. Successful sources of false and biased claims are heavily supported by social bots. These results suggest that curbing social bots may be an effective strategy for mitigating the spread of online misinformation.

Journal Article•DOI•

The spread of low-credibility content by social bots

Chengcheng Shao¹, Giovanni Luca Ciampaglia¹, Onur Varol¹, Kai-Cheng Yang¹, Alessandro Flammini¹, Filippo Menczer¹

Indiana University¹

24 Jul 2017-arXiv: Social and Information Networks

TL;DR: Bots are found to play a major role in the spread of low-credibility content on Twitter, and control measures for limiting the spread of misinformation are suggested.

Abstract: The massive spread of digital misinformation has been identified as a major global risk and has been alleged to influence elections and threaten democracies. Communication, cognitive, social, and computer scientists are engaged in efforts to study the complex causes for the viral diffusion of misinformation online and to develop solutions, while search and social media platforms are beginning to deploy countermeasures. With few exceptions, these efforts have been mainly informed by anecdotal evidence rather than systematic data. Here we analyze 14 million messages spreading 400 thousand articles on Twitter during and following the 2016 U.S. presidential campaign and election. We find evidence that social bots played a disproportionate role in amplifying low-credibility content. Accounts that actively spread articles from low-credibility sources are significantly more likely to be bots. Automated accounts are particularly active in amplifying content in the very early spreading moments, before an article goes viral. Bots also target users with many followers through replies and mentions. Humans are vulnerable to this manipulation, retweeting bots who post links to low-credibility content. Successful low-credibility sources are heavily supported by social bots. These results suggest that curbing social bots may be an effective strategy for mitigating the spread of online misinformation.

Posted Content•

A Survey on Network Embedding

Peng Cui¹, Xiao Wang¹, Jian Pei², Wenwu Zhu¹

Tsinghua University¹, Simon Fraser University²

23 Nov 2017-arXiv: Social and Information Networks

TL;DR: This survey categorizes and reviews network embedding methods, which assign nodes in a network to low-dimensional representations while preserving the network structure, and points out future research directions.

Abstract: Network embedding assigns nodes in a network to low-dimensional representations and effectively preserves the network structure. Recently, significant progress has been made toward this emerging network analysis paradigm. In this survey, we focus on categorizing and then reviewing the current development of network embedding methods, and point out future research directions. We first summarize the motivation of network embedding. We discuss the classical graph embedding algorithms and their relationship with network embedding. Afterwards and primarily, we provide a comprehensive overview of a large number of network embedding methods in a systematic manner, covering the structure- and property-preserving network embedding methods, the network embedding methods with side information, and the advanced information preserving network embedding methods. Moreover, we review several evaluation approaches for network embedding and some useful online resources, including network data sets and software. Finally, we discuss the framework of exploiting these network embedding methods to build an effective system and point out some potential future directions.

Proceedings Article•DOI•

Attributed Network Embedding for Learning in a Dynamic Environment

Jundong Li¹, Harsh Dani¹, Xia Hu², Jiliang Tang³, Yi Chang⁴, Huan Liu¹

Arizona State University¹, Texas A&M University², Michigan State University³, Huawei⁴

06 Jun 2017-arXiv: Social and Information Networks

TL;DR: In this paper, the authors propose a dynamic attributed network embedding framework (DANE), which first provides an offline method for a consensus embedding and then leverages matrix perturbation theory to maintain the freshness of the end embedding results in an online manner.

Abstract: Network embedding leverages the node proximity manifested to learn a low-dimensional node vector representation for each node in the network. The learned embeddings could advance various learning tasks such as node classification, network clustering, and link prediction. Most, if not all, of the existing works are performed in the context of plain and static networks. Nonetheless, in reality, network structure often evolves over time with the addition and deletion of links and nodes. Also, a vast majority of real-world networks are associated with a rich set of node attributes, and their attribute values also change naturally, with the emergence of new content patterns and the fading of old content patterns. These changing characteristics motivate us to seek an effective embedding representation to capture network and attribute evolving patterns, which is of fundamental importance for learning in a dynamic environment. To the best of our knowledge, we are the first to tackle this problem, which poses the following two challenges: (1) the inherently correlated network and node attributes could be noisy and incomplete, which necessitates a robust consensus representation to capture their individual properties and correlations; (2) the embedding learning needs to be performed in an online fashion to adapt to the changes accordingly. In this paper, we tackle this problem by proposing a novel dynamic attributed network embedding framework, DANE. In particular, DANE first provides an offline method for a consensus embedding and then leverages matrix perturbation theory to maintain the freshness of the end embedding results in an online manner. We perform extensive experiments on both synthetic and real attributed networks to corroborate the effectiveness and efficiency of the proposed framework.

Proceedings Article•DOI•

The paradigm-shift of social spambots: Evidence, theories, and tools for the arms race

Stefano Cresci¹, Roberto Di Pietro², Marinella Petrocchi³, Angelo Spognardi, Maurizio Tesconi³

University of Pisa¹, University of Padua², National Research Council³

11 Jan 2017-arXiv: Social and Information Networks

TL;DR: An extensive study of the rise of a new generation of spambots on Twitter and quantitative evidence that a paradigm-shift exists in spambot design is provided, which calls for new approaches capable of turning the tide in the fight against this rising phenomenon.

Abstract: Recent studies in social media spam and automation provide anecdotal argumentation of the rise of a new generation of spambots, so-called social spambots. Here, for the first time, we extensively study this novel phenomenon on Twitter and we provide quantitative evidence that a paradigm-shift exists in spambot design. First, we measure Twitter's current capabilities of detecting the new social spambots. Later, we assess the human performance in discriminating between genuine accounts, social spambots, and traditional spambots. Then, we benchmark several state-of-the-art techniques proposed by the academic literature. Results show that neither Twitter, nor humans, nor cutting-edge applications are currently capable of accurately detecting the new social spambots. Our results call for new approaches capable of turning the tide in the fight against this rising phenomenon. We conclude by reviewing the latest literature on spambot detection and we highlight an emerging common research trend based on the analysis of collective behaviors. Insights derived from both our extensive experimental campaign and survey shed light on the most promising directions of research and lay the foundations for the arms race against the novel social spambots. Finally, to foster research on this novel phenomenon, we make publicly available to the scientific community all the datasets used in this study.

Proceedings Article•DOI•

Learning Structural Node Embeddings Via Diffusion Wavelets

Claire Donnat¹, Marinka Zitnik¹, David Hallac¹, Jure Leskovec¹

Stanford University¹

27 Oct 2017-arXiv: Social and Information Networks

TL;DR: GraphWave is developed, a method that represents each node's network neighborhood via a low-dimensional embedding by leveraging heat wavelet diffusion patterns and mathematically proves that nodes with similar network neighborhoods will have similar GraphWave embeddings even though these nodes may reside in very different parts of the network, and the method scales linearly with the number of edges.

Abstract: Nodes residing in different parts of a graph can have similar structural roles within their local network topology. The identification of such roles provides key insight into the organization of networks and can be used for a variety of machine learning tasks. However, learning structural representations of nodes is a challenging problem, and it has typically involved manually specifying and tailoring topological features for each node. In this paper, we develop GraphWave, a method that represents each node's network neighborhood via a low-dimensional embedding by leveraging heat wavelet diffusion patterns. Instead of training on hand-selected features, GraphWave learns these embeddings in an unsupervised way. We mathematically prove that nodes with similar network neighborhoods will have similar GraphWave embeddings even though these nodes may reside in very different parts of the network, and our method scales linearly with the number of edges. Experiments in a variety of different settings demonstrate GraphWave's real-world potential for capturing structural roles in networks, and our approach outperforms existing state-of-the-art baselines in every experiment, by as much as 137%.

Proceedings Article•DOI•

Quantifying Search Bias: Investigating Sources of Bias for Political Searches in Social Media

Juhi Kulshrestha¹, Motahhare Eslami², Johnnatan Messias¹, Muhammad Bilal Zafar¹, Saptarshi Ghosh³, Krishna P. Gummadi¹, Karrie Karahalios²

Max Planck Society¹, University of Illinois at Urbana–Champaign², Indian Institute of Engineering Science and Technology, Shibpur³

05 Apr 2017-arXiv: Social and Information Networks

TL;DR: This paper proposes a framework to quantify the bias arising from the input data separately from the bias arising from the ranking system, applies it to politics-related queries on Twitter, and finds that both contribute significantly to the bias in the search results.

Abstract: Search systems in online social media sites are frequently used to find information about ongoing events and people. For topics with multiple competing perspectives, such as political events or political candidates, bias in the top ranked results significantly shapes public opinion. However, bias does not emerge from an algorithm alone. It is important to distinguish between the bias that arises from the data that serves as the input to the ranking system and the bias that arises from the ranking system itself. In this paper, we propose a framework to quantify these distinct biases and apply this framework to politics-related queries on Twitter. We found that both the input data and the ranking system contribute significantly to produce varying amounts of bias in the search results and in different ways. We discuss the consequences of these biases and possible mechanisms to signal this bias in social media search systems' interfaces.
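
The separation of input-data bias from ranking-system bias described in this abstract can be sketched as follows. The abstract does not give formulas, so everything here is an assumption for illustration: each item carries a bias score in [-1, 1], input bias is the plain average over all relevant items, output bias is a top-weighted average over the ranked list, and ranking bias is the difference between the two. The function names are hypothetical.

```python
def input_bias(scores):
    """Average bias of all items fed to the ranking system (assumed definition)."""
    return sum(scores) / len(scores)

def output_bias(ranked_scores):
    """Top-weighted bias of a ranked list (assumed definition).

    Averages the running mean bias at each rank, so items near the top
    influence the result more than items further down.
    """
    running_sum = 0.0
    total = 0.0
    for rank, score in enumerate(ranked_scores, start=1):
        running_sum += score
        total += running_sum / rank
    return total / len(ranked_scores)

def ranking_bias(all_scores, ranked_scores):
    """Bias attributed to the ranking step: output bias minus input bias."""
    return output_bias(ranked_scores) - input_bias(all_scores)

# Hypothetical bias scores: the pool is balanced, but the ranking puts
# positively scored items first.
pool = [1.0, -1.0, 1.0, -1.0]
ranked = [1.0, 1.0, -1.0, -1.0]
print(input_bias(pool), output_bias(ranked), ranking_bias(pool, ranked))
```

In this toy case the input is unbiased on average, so any bias in the ranked list is attributed to the ordering itself.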

Posted Content•

Structural Deep Embedding for Hyper-Networks

Ke Tu¹, Peng Cui¹, Xiao Wang¹, Fei Wang², Wenwu Zhu¹•Institutions (2)

Tsinghua University¹, Cornell University²

28 Nov 2017-arXiv: Social and Information Networks

TL;DR: Deep Hyper-Network Embedding (DHNE) is a deep model that realizes a non-linear tuplewise similarity function while preserving both local and global proximities in the embedding space.

Abstract: Network embedding has recently attracted much attention in data mining. Existing network embedding methods mainly focus on networks with pairwise relationships. In the real world, however, the relationships among data points can go beyond pairwise, i.e., three or more objects are involved in each relationship represented by a hyperedge, thus forming hyper-networks. These hyper-networks pose great challenges to existing network embedding methods when the hyperedges are indecomposable, that is to say, any subset of nodes in a hyperedge cannot form another hyperedge. These indecomposable hyperedges are especially common in heterogeneous networks. In this paper, we propose a novel Deep Hyper-Network Embedding (DHNE) model to embed hyper-networks with indecomposable hyperedges. More specifically, we theoretically prove that any linear similarity metric in embedding space commonly used in existing methods cannot maintain the indecomposability property in hyper-networks, and thus propose a new deep model to realize a non-linear tuplewise similarity function while preserving both local and global proximities in the formed embedding space. We conduct extensive experiments on four different types of hyper-networks, including a GPS network, an online social network, a drug network and a semantic network. The empirical results demonstrate that our method can significantly and consistently outperform the state-of-the-art algorithms.

Journal Article•DOI•

Tweet for Behavior Change: Using Social Media for the Dissemination of Public Health Messages

Aisling Gough¹, Ruth F. Hunter¹, Oluwaseun Ajao¹, Anna Jurek¹, Gary McKeown¹, Jun Hong¹, Eimear Barrett¹, Marbeth Ferguson, Gerry McElwee, Miriam McCarthy, Frank Kee¹•Institutions (1)

Queen's University Belfast¹

26 Mar 2017-arXiv: Social and Information Networks

TL;DR: Shocking and humorous messages generated the greatest impressions and engagement, while information-based messages were most likely to be shared; a Web-based survey suggested that the campaign might have contributed to improved knowledge and attitudes toward skin cancer among the target population.

Abstract: Background: Social media public health campaigns have the advantage of tailored messaging at low cost and large reach, but little is known about what would determine their feasibility as tools for inducing attitude and behavior change. Objective: The aim of this study was to test the feasibility of designing, implementing, and evaluating a social media-enabled intervention for skin cancer prevention. Conclusions: Social media-disseminated public health messages reached more than 23% of the Northern Ireland population. A Web-based survey suggested that the campaign might have contributed to improved knowledge and attitudes toward skin cancer among the target population. Findings suggested that shocking and humorous messages generated the greatest impressions and engagement, but information-based messages were likely to be shared most. The extent of behavioral change as a result of the campaign remains to be explored; however, the change of attitudes and knowledge is promising. Social media is an inexpensive, effective method for delivering public health messages. However, existing and traditional process evaluation methods may not be suitable for social media.

Journal Article•DOI•

Social Fingerprinting: detection of spambot groups through DNA-inspired behavioral modeling

Stefano Cresci, Roberto Di Pietro¹, Marinella Petrocchi, Angelo Spognardi², Maurizio Tesconi•Institutions (2)

Bell Labs¹, Technical University of Denmark²

13 Mar 2017-arXiv: Social and Information Networks

TL;DR: The Social Fingerprinting technique discriminates between spambots and genuine accounts in both a supervised and an unsupervised fashion, relying on a limited number of lightweight account characteristics.

Abstract: Spambot detection in online social networks is a long-lasting challenge involving the study and design of detection techniques capable of efficiently identifying ever-evolving spammers. Recently, a new wave of social spambots has emerged, with advanced human-like characteristics that allow them to go undetected even by current state-of-the-art algorithms. In this paper, we show that efficient spambot detection can be achieved via an in-depth analysis of their collective behaviors, exploiting the digital DNA technique for modeling the behaviors of social network users. Inspired by its biological counterpart, in the digital DNA representation the behavioral lifetime of a digital account is encoded in a sequence of characters. Then, we define a similarity measure for such digital DNA sequences. We build upon digital DNA and the similarity between groups of users to characterize both genuine accounts and spambots. Leveraging such characterization, we design the Social Fingerprinting technique, which is able to discriminate among spambots and genuine accounts in both a supervised and an unsupervised fashion. We finally evaluate the effectiveness of Social Fingerprinting and we compare it with three state-of-the-art detection algorithms. Among the peculiarities of our approach is the possibility to apply off-the-shelf DNA analysis techniques to study online users' behaviors and to efficiently rely on a limited number of lightweight account characteristics.
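
The encoding step can be illustrated with a small sketch. The action-to-character mapping, the choice of the longest common substring as the similarity measure, and the normalization are all assumptions made for illustration; the abstract only states that an account's behavior is encoded as a character sequence and that a similarity measure is defined over such sequences.

```python
# Hypothetical alphabet: one character per type of account action.
ACTION_TO_BASE = {"tweet": "A", "reply": "C", "retweet": "T"}


def encode(actions):
    """Encode a chronological list of actions as a digital DNA string."""
    return "".join(ACTION_TO_BASE[action] for action in actions)


def longest_common_substring(a, b):
    """Return the longest contiguous substring shared by a and b."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run that ended at a[i-2], b[j-2].
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


def similarity(a, b):
    """Shared-run length divided by the shorter sequence length (0.0 if empty)."""
    shorter = min(len(a), len(b))
    if shorter == 0:
        return 0.0
    return len(longest_common_substring(a, b)) / shorter


dna_one = encode(["tweet", "tweet", "retweet", "retweet", "reply"])
dna_two = encode(["reply", "tweet", "retweet", "retweet", "tweet"])
print(dna_one, dna_two, similarity(dna_one, dna_two))
```

Accounts driven by the same automation would be expected to share long identical runs of actions, which a measure of this kind rewards.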

Journal Article•DOI•

NetSpam: a Network-based Spam Detection Framework for Reviews in Online Social Media

Saeedreza Shehnepoor¹, Mostafa Salehi¹, Reza Farahbakhsh², Noel Crespi²•Institutions (2)

University of Tehran¹, Institut Mines-Télécom²

10 Mar 2017-arXiv: Social and Information Networks

TL;DR: A novel framework named NetSpam is proposed, which utilizes spam features for modeling review datasets as heterogeneous information networks to map the spam detection procedure into a classification problem in such networks.

Abstract: Nowadays, many people rely on content available in social media when making decisions (e.g., reviews and feedback on a topic or product). The possibility that anybody can leave a review provides a golden opportunity for spammers to write spam reviews about products and services for different interests. Identifying these spammers and the spam content is a hot topic of research, and although a considerable number of studies have been done recently toward this end, the methodologies put forth so far still barely detect spam reviews, and none of them show the importance of each extracted feature type. In this study, we propose a novel framework, named NetSpam, which utilizes spam features for modeling review datasets as heterogeneous information networks to map the spam detection procedure into a classification problem in such networks. Using the importance of spam features helps us to obtain better results in terms of different metrics on real-world review datasets from the Yelp and Amazon websites. The results show that NetSpam outperforms the existing methods and that, among four categories of features (review-behavioral, user-behavioral, review-linguistic, and user-linguistic), the first type performs better than the other categories.

Posted Content•

On the Dynamics of Deterministic Epidemic Propagation over Networks

Wenjun Mei¹, Shadi Mohagheghi¹, Sandro Zampieri², Francesco Bullo¹•Institutions (2)

University of California, Santa Barbara¹, University of Padua²

11 Jan 2017-arXiv: Social and Information Networks

TL;DR: This paper reviews a class of deterministic nonlinear models for the propagation of infectious diseases over contact networks with strongly-connected topologies and provides a comprehensive nonlinear analysis of equilibria, stability properties, convergence, monotonicity, positivity, and threshold conditions.

Abstract: In this work we review a class of deterministic nonlinear models for the propagation of infectious diseases over contact networks with strongly-connected topologies. We consider network models for susceptible-infected (SI), susceptible-infected-susceptible (SIS), and susceptible-infected-recovered (SIR) settings. In each setting, we provide a comprehensive nonlinear analysis of equilibria, stability properties, convergence, monotonicity, positivity, and threshold conditions. For the network SI setting, specific contributions include establishing its equilibria, stability, and positivity properties. For the network SIS setting, we review a well-known deterministic model, provide novel results on the computation and characterization of the endemic state (when the system is above the epidemic threshold), and present alternative proofs for some of its properties. Finally, for the network SIR setting, we propose novel results for transient behavior, threshold conditions, stability properties, and asymptotic convergence. These results are analogous to those well-known for the scalar case. In addition, we provide a novel iterative algorithm to compute the asymptotic state of the network SIR system.
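
For orientation, a sketch of a deterministic network SIS model, assuming the commonly used form dx_i/dt = β(1 − x_i) Σ_j a_ij x_j − γ x_i, where x_i is the infection probability of node i, β the infection rate and γ the recovery rate. The abstract does not state the equations, so this form, the forward-Euler integration and all names below are assumptions. For this form, the quantity β·ρ(A)/γ (with ρ(A) the spectral radius of the adjacency matrix) acts as the epidemic threshold: the infection dies out when it is below 1 and settles at an endemic state when it is above 1.

```python
import numpy as np


def threshold_ratio(adj, beta, gamma):
    """Return beta * spectral_radius(A) / gamma for the assumed SIS model."""
    adj = np.asarray(adj, dtype=float)
    spectral_radius = np.max(np.abs(np.linalg.eigvals(adj)))
    return beta * spectral_radius / gamma


def simulate_sis(adj, beta, gamma, x0, dt=0.01, steps=5000):
    """Integrate the assumed network SIS dynamics with forward Euler."""
    adj = np.asarray(adj, dtype=float)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(steps):
        dx = beta * (1.0 - x) * (adj @ x) - gamma * x
        x = np.clip(x + dt * dx, 0.0, 1.0)  # keep probabilities in [0, 1]
    return x


# Complete graph on 4 nodes: spectral radius 3.
complete4 = np.ones((4, 4)) - np.eye(4)
x_start = np.full(4, 0.1)

below = simulate_sis(complete4, beta=0.1, gamma=1.0, x0=x_start)  # ratio 0.3
above = simulate_sis(complete4, beta=1.0, gamma=1.0, x0=x_start)  # ratio 3.0
print(below.max(), above)
```

On this symmetric example the endemic state can be checked by hand: setting the derivative to zero with all x_i equal gives 1 − x = γ/(3β), i.e. x = 2/3 for β = γ = 1.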

Posted Content•

Beyond News Contents: The Role of Social Context for Fake News Detection

Kai Shu¹, Suhang Wang², Huan Liu¹•Institutions (2)

Arizona State University¹, Pennsylvania State University²

20 Dec 2017-arXiv: Social and Information Networks

TL;DR: A tri-relationship embedding framework, TriFN, is proposed; it models publisher-news relations and user-news interactions simultaneously for fake news classification and significantly outperforms other baseline methods for fake news detection.

Abstract: Social media is becoming popular for news consumption due to its fast dissemination, easy access, and low cost. However, it also enables the wide propagation of fake news, i.e., news with intentionally false information. Detecting fake news is an important task, which not only ensures that users receive authentic information but also helps maintain a trustworthy news ecosystem. The majority of existing detection algorithms focus on finding clues from news contents, which are generally not effective because fake news is often intentionally written to mislead users by mimicking true news. Therefore, we need to explore auxiliary information to improve detection. The social context during the news dissemination process on social media forms the inherent tri-relationship, the relationship among publishers, news pieces, and users, which has the potential to improve fake news detection. For example, partisan-biased publishers are more likely to publish fake news, and low-credibility users are more likely to share fake news. In this paper, we study the novel problem of exploiting social context for fake news detection. We propose a tri-relationship embedding framework TriFN, which models publisher-news relations and user-news interactions simultaneously for fake news classification. We conduct experiments on two real-world datasets, which demonstrate that the proposed approach significantly outperforms other baseline methods for fake news detection.

Posted Content•

Listening to Chaotic Whispers: A Deep Learning Framework for News-oriented Stock Trend Prediction

Ziniu Hu¹, Weiqing Liu¹, Jiang Bian¹, Xuanzhe Liu², Tie-Yan Liu¹•Institutions (2)

Microsoft¹, Peking University²

06 Dec 2017-arXiv: Social and Information Networks

TL;DR: A Hybrid Attention Network (HAN) is designed to predict the stock trend based on the sequence of recent related news, and a self-paced learning mechanism is applied to make learning effective and efficient.

Abstract: Stock trend prediction plays a critical role in seeking maximized profit from stock investment. However, precise trend prediction is very difficult because of the highly volatile and non-stationary nature of the stock market. Exploding information on the Internet, together with the advancing development of natural language processing and text mining techniques, has enabled investors to unveil market trends and volatility from online content. Unfortunately, the quality, trustworthiness and comprehensiveness of online content related to the stock market vary drastically, and a large portion consists of low-quality news, comments, or even rumors. To address this challenge, we imitate the learning process of human beings facing such chaotic online news, driven by three principles: sequential content dependency, diverse influence, and effective and efficient learning. In this paper, to capture the first two principles, we design a Hybrid Attention Network to predict the stock trend based on the sequence of recent related news. Moreover, we apply the self-paced learning mechanism to imitate the third principle. Extensive experiments on real-world stock market data demonstrate the effectiveness of our approach.

Posted Content•

Semi-supervised Embedding in Attributed Networks with Outliers

Jiongqian Liang, Peter Jacobs, Jiankai Sun, Srinivasan Parthasarathy

23 Mar 2017-arXiv: Social and Information Networks

TL;DR: SEANO is a novel framework that learns a low-dimensional vector representation capturing the topological proximity, attribute affinity and label similarity of vertices in a partially labeled attributed network (PLAN); a subset of its parameters is interpretable as an outlier score and can significantly outperform baseline methods at detecting network outliers.

Abstract: In this paper, we propose a novel framework, called Semi-supervised Embedding in Attributed Networks with Outliers (SEANO), to learn a low-dimensional vector representation that systematically captures the topological proximity, attribute affinity and label similarity of vertices in a partially labeled attributed network (PLAN). Our method is designed to work in both transductive and inductive settings while explicitly alleviating noise effects from outliers. Experimental results on various datasets drawn from the web, text and image domains demonstrate the advantages of SEANO over state-of-the-art methods in semi-supervised classification under transductive as well as inductive settings. We also show that a subset of parameters in SEANO is interpretable as an outlier score and can significantly outperform baseline methods when applied for detecting network outliers. Finally, we present the use of SEANO in a challenging real-world setting, flood mapping of satellite images, and show that it is able to outperform modern remote sensing algorithms for this task.

Posted Content•

Social Bots: Human-Like by Means of Human Control?

Christian Grimme¹, Mike Preuss¹, Lena Adam¹, Heike Trautmann¹•Institutions (1)

University of Münster¹

23 Jun 2017-arXiv: Social and Information Networks

TL;DR: In this paper, the authors discuss how bot capabilities can be extended and controlled by integrating humans into the process and reason that this is currently the most promising way to go in order to realize effective interactions with other humans.

Abstract: Social bots are currently regarded as an influential but also somewhat mysterious factor in public discourse and opinion making. They are considered to be capable of massively distributing propaganda in social and online media, and their application is even suspected to be partly responsible for recent election results. Astonishingly, the term 'Social Bot' is not well defined and different scientific disciplines use divergent definitions. This work starts with a balanced definition attempt, before providing an overview of how social bots actually work (taking the example of Twitter) and what their current technical limitations are. Despite recent research progress in Deep Learning and Big Data, there are many activities bots cannot handle well. We then discuss how bot capabilities can be extended and controlled by integrating humans into the process and reason that this is currently the most promising way to go in order to realize effective interactions with other humans.

Posted Content•

Network Community Detection: A Review and Visual Survey

Bisma S. Khan, Muaz A. Niazi

03 Aug 2017-arXiv: Social and Information Networks

TL;DR: This visual survey of key community detection literature uses CiteSpace and scientometric analyses to identify the most influential, central, and active nodes, and finds that Yong Wang is a pivot node with the highest centrality.

Abstract: Community structure is an important area of research. It has received considerable attention from the scientific community. Despite its importance, one of the key problems in locating information about community detection is the diverse spread of related articles across various disciplines. To the best of our knowledge, there is no current comprehensive review of recent literature that applies scientometric analysis based on complex network analysis to all relevant articles from the Web of Science (WoS). Here we present a visual survey of key literature using CiteSpace. The idea is to identify emerging trends besides using network techniques to examine the evolution of the domain. Towards that end, we identify the most influential, central, as well as active nodes using scientometric analyses. We examine authors, key articles, cited references, core subject categories, key journals, institutions, as well as countries. The exploration of the scientometric literature of the domain reveals that Yong Wang is a pivot node with the highest centrality. Additionally, we have observed that Mark Newman is the most highly cited author in the network. We have also identified that the journal "Reviews of Modern Physics" has the strongest citation burst. In terms of cited documents, an article by Andrea Lancichinetti has the highest centrality score. We have also discovered that the key publications in this domain originate from the United States, whereas Scotland has the strongest and longest citation burst. Additionally, we have found that the categories of "Computer Science" and "Engineering" lead other categories based on frequency and centrality respectively.

Posted Content•

The Fake News Spreading Plague: Was it Preventable?

Eni Mustafaraj¹, Panagiotis Takis Metaxas¹•Institutions (1)

Wellesley College¹

20 Mar 2017-arXiv: Social and Information Networks

TL;DR: In this paper, the authors outline the recipe of how social networks are used to spread misinformation and indicate how it was successfully used to spread fake news during the 2016 U.S. Presidential Election.

Abstract: In 2010, a paper entitled "From Obscurity to Prominence in Minutes: Political Speech and Real-time search" won the Best Paper Prize of the Web Science 2010 Conference. Among its findings were the discovery and documentation of what was termed a "Twitter-bomb", an organized effort to spread misinformation about the Democratic candidate Martha Coakley through anonymous Twitter accounts. In this paper, after summarizing the details of that event, we outline the recipe of how social networks are used to spread misinformation. One of the most important steps in such a recipe is the "infiltration" of a community of users who are already engaged in conversations about a topic, to use them as organic spreaders of misinformation in their extended subnetworks. Then, we take this misinformation spreading recipe and indicate how it was successfully used to spread fake news during the 2016 U.S. Presidential Election. The main differences between the scenarios are the use of Facebook instead of Twitter, and the respective motivations (in 2010: political influence; in 2016: financial benefit through online advertising). After situating these events in the broader context of exploiting the Web, we seize this opportunity to address limitations of the reach of research findings and to start a conversation about how communities of researchers can increase their impact on real-world societal issues.

Posted Content•

Catching Zika Fever: Application of Crowdsourcing and Machine Learning for Tracking Health Misinformation on Twitter

Amira Ghenai¹, Yelena Mejova²•Institutions (2)

University of Waterloo¹, Qatar Computing Research Institute²

12 Jul 2017-arXiv: Social and Information Networks

TL;DR: An extremely bursty behavior of rumor-related topics is discovered, and it is shown that, once the questionable topic is detected, it is possible to identify rumor-bearing tweets using automated techniques.

Abstract: In February 2016, the World Health Organization declared the Zika outbreak a Public Health Emergency of International Concern. With developing evidence that it can cause birth defects, and the Summer Olympics coming up in the worst affected country, Brazil, the virus caught fire on social media. In this work, we use Zika as a case study in building a tool for tracking the misinformation around health concerns on Twitter. We collect more than 13 million tweets -- spanning the initial reports in February 2016 and the Summer Olympics -- regarding the Zika outbreak and track rumors outlined by the World Health Organization and the Snopes fact-checking website. The tool pipeline, which incorporates health professionals, crowdsourcing, and machine learning, allows us to capture health-related rumors around the world, as well as clarification campaigns by reputable health organizations. In the case of Zika, we discover an extremely bursty behavior of rumor-related topics, and show that, once the questionable topic is detected, it is possible to identify rumor-bearing tweets using automated techniques. Thus, we illustrate insights the proposed tools provide into potentially harmful information on social media, allowing public health researchers and practitioners to respond with a targeted and timely action.

Posted Content•

Diving Deep into Clickbaits: Who Use Them to What Extents in Which Topics with What Effects?

Main Uddin Rony¹, Naeemul Hassan¹, Mohammad Abu Yousuf²•Institutions (2)

University of Mississippi¹, University of Oklahoma²

28 Mar 2017-arXiv: Social and Information Networks

TL;DR: This paper analyzes 1.67 million Facebook posts created by 153 media organizations to understand the extent of clickbait practice, its impact and user engagement, using a clickbait detection model built on distributed sub-word embeddings learned from a large corpus.

Abstract: The use of alluring headlines (clickbait) to tempt readers has become a growing practice. To survive in the highly competitive media industry, most online media, including the mainstream ones, have started following this practice. Although the widespread practice of clickbait makes readers' reliance on media vulnerable, a large-scale analysis to reveal this fact is still absent. In this paper, we analyze 1.67 million Facebook posts created by 153 media organizations to understand the extent of clickbait practice, its impact and user engagement by using our own clickbait detection model. The model uses distributed sub-word embeddings learned from a large corpus. The accuracy of the model is 98.3%. Powered with this model, we further study the distribution of topics in clickbait and non-clickbait contents.

