Showing papers in "arXiv: Social and Information Networks in 2017"


Posted Content
TL;DR: GraphSAGE is a general, inductive framework that leverages node feature information (e.g., text attributes) to efficiently generate node embeddings for previously unseen data; it outperforms strong baselines on three inductive node-classification benchmarks.
Abstract: Low-dimensional embeddings of nodes in large graphs have proved extremely useful in a variety of prediction tasks, from content recommendation to identifying protein functions. However, most existing approaches require that all nodes in the graph are present during training of the embeddings; these previous approaches are inherently transductive and do not naturally generalize to unseen nodes. Here we present GraphSAGE, a general, inductive framework that leverages node feature information (e.g., text attributes) to efficiently generate node embeddings for previously unseen data. Instead of training individual embeddings for each node, we learn a function that generates embeddings by sampling and aggregating features from a node's local neighborhood. Our algorithm outperforms strong baselines on three inductive node-classification benchmarks: we classify the category of unseen nodes in evolving information graphs based on citation and Reddit post data, and we show that our algorithm generalizes to completely unseen graphs using a multi-graph dataset of protein-protein interactions.

7,926 citations
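
The sample-and-aggregate idea described in the abstract above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a plain mean aggregator with no learned weights or nonlinearity, and the helper names (`sample_neighbors`, `embed`), the toy graph, and the sample sizes are all hypothetical.

```python
import random

def sample_neighbors(adj, node, k, rng):
    """Draw k neighbors of `node` uniformly with replacement (fixed-size sample)."""
    neighbors = adj[node]
    if not neighbors:
        return []
    return [rng.choice(neighbors) for _ in range(k)]

def mean(vectors):
    """Element-wise mean of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def embed(adj, features, node, depth, k, rng):
    """Hypothetical helper: build an embedding from features, not from a lookup table.

    depth == 0 returns the raw features. Each further level concatenates the
    node's previous-level vector with the mean of its sampled neighbors'
    previous-level vectors. A trained model would apply learned weights and a
    nonlinearity at this point; that step is omitted here.
    """
    if depth == 0:
        return list(features[node])
    own = embed(adj, features, node, depth - 1, k, rng)
    sampled = sample_neighbors(adj, node, k, rng)
    if sampled:
        agg = mean([embed(adj, features, n, depth - 1, k, rng) for n in sampled])
    else:
        agg = [0.0] * len(own)
    return own + agg

# Toy graph (assumed data): "a" is linked to "b" and "c".
adj = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.0, 1.0]}
rng = random.Random(0)
vec = embed(adj, features, "a", depth=1, k=3, rng=rng)

# A node that arrives later needs only its features and edges, no retraining.
adj["d"] = ["a"]
adj["a"].append("d")
features["d"] = [0.5, 0.5]
new_vec = embed(adj, features, "d", depth=1, k=2, rng=rng)
```

Because the embedding is computed by a function of features and sampled neighborhoods rather than stored per node, the late-arriving node `d` is handled by the same call as the original nodes, which is the inductive property the abstract emphasizes.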


Journal ArticleDOI
TL;DR: The detection of phase transitions constitutes the first objective method of characterising endogenous, natural scales of human movement and allows us to draw discrete multi-scale geographical boundaries, potentially capable of providing key insights in fields such as epidemiology or cultural contagion.
Abstract: Human mobility is known to be distributed across several orders of magnitude of physical distances, which makes it generally difficult to endogenously find or define typical and meaningful scales. Relevant analyses, from movements to geographical partitions, seem to be relative to some ad-hoc scale, or no scale at all. Relying on geotagged data collected from photo-sharing social media, we apply community detection to movement networks constrained by increasing percentiles of the distance distribution. Using a simple parameter-free discontinuity detection algorithm, we discover clear phase transitions in the community partition space. The detection of these phases constitutes the first objective method of characterising endogenous, natural scales of human movement. Our study covers nine regions, ranging from cities to countries of various sizes and a transnational area. For all regions, the number of natural scales is remarkably low (2 or 3). Further, our results hint at scale-related behaviours rather than scale-related users. The partitions of the natural scales allow us to draw discrete multi-scale geographical boundaries, potentially capable of providing key insights in fields such as epidemiology or cultural contagion where the introduction of spatial boundaries is pivotal.

1,543 citations


Journal ArticleDOI
TL;DR: The paper proposes to use the methods of cooperative game theory that highlight not only the link density but also the mechanisms of cluster formation, and suggests two approaches from cooperative game theory, based on the Myerson value and on hedonic games.
Abstract: The paper is devoted to game-theoretic methods for community detection in networks. The traditional methods for detecting community structure are based on selecting denser subgraphs inside the network. Here we propose to use the methods of cooperative game theory that highlight not only the link density but also the mechanisms of cluster formation. Specifically, we suggest two approaches from cooperative game theory: the first approach is based on the Myerson value, whereas the second approach is based on hedonic games. Both approaches allow clusters to be detected at various resolutions. However, the tuning of the resolution parameter in the hedonic games approach is particularly intuitive. Furthermore, the modularity-based approach and its generalizations can be viewed as particular cases of the hedonic games.

1,191 citations


Posted Content
TL;DR: This survey presents a comprehensive review of detecting fake news on social media, including fake news characterizations on psychology and social theories, existing algorithms from a data mining perspective, evaluation metrics and representative datasets, and future research directions for fake news detection on social media.
Abstract: Social media for news consumption is a double-edged sword. On the one hand, its low cost, easy access, and rapid dissemination of information lead people to seek out and consume news from social media. On the other hand, it enables the wide spread of "fake news", i.e., low quality news with intentionally false information. The extensive spread of fake news has the potential for extremely negative impacts on individuals and society. Therefore, fake news detection on social media has recently become an emerging research that is attracting tremendous attention. Fake news detection on social media presents unique characteristics and challenges that make existing detection algorithms from traditional news media ineffective or not applicable. First, fake news is intentionally written to mislead readers to believe false information, which makes it difficult and nontrivial to detect based on news content; therefore, we need to include auxiliary information, such as user social engagements on social media, to help make a determination. Second, exploiting this auxiliary information is challenging in and of itself as users' social engagements with fake news produce data that is big, incomplete, unstructured, and noisy. Because the issue of fake news detection on social media is both challenging and relevant, we conducted this survey to further facilitate research on the problem. In this survey, we present a comprehensive review of detecting fake news on social media, including fake news characterizations on psychology and social theories, existing algorithms from a data mining perspective, evaluation metrics and representative datasets. We also discuss related research areas, open problems, and future research directions for fake news detection on social media.

887 citations


Posted Content
TL;DR: In this article, the authors provide a conceptual review of representation learning on graphs, including matrix factorization-based methods, random-walk based algorithms, and graph neural networks, and highlight a number of important applications and directions for future work.
Abstract: Machine learning on graphs is an important and ubiquitous task with applications ranging from drug design to friendship recommendation in social networks. The primary challenge in this domain is finding a way to represent, or encode, graph structure so that it can be easily exploited by machine learning models. Traditionally, machine learning approaches relied on user-defined heuristics to extract features encoding structural information about a graph (e.g., degree statistics or kernel functions). However, recent years have seen a surge in approaches that automatically learn to encode graph structure into low-dimensional embeddings, using techniques based on deep learning and nonlinear dimensionality reduction. Here we provide a conceptual review of key advancements in this area of representation learning on graphs, including matrix factorization-based methods, random-walk based algorithms, and graph neural networks. We review methods to embed individual nodes as well as approaches to embed entire (sub)graphs. In doing so, we develop a unified framework to describe these recent approaches, and we highlight a number of important applications and directions for future work.

853 citations


Proceedings ArticleDOI
Jiezhong Qiu, Yuxiao Dong, Hao Ma, Jian Li, Kuansan Wang, Jie Tang
TL;DR: The NetMF method offers significant improvements over DeepWalk and LINE for conventional network mining tasks and provides the theoretical connections between skip-gram based network embedding algorithms and the theory of graph Laplacian.
Abstract: Since the invention of word2vec, the skip-gram model has significantly advanced the research of network embedding, such as the recent emergence of the DeepWalk, LINE, PTE, and node2vec approaches. In this work, we show that all of the aforementioned models with negative sampling can be unified into the matrix factorization framework with closed forms. Our analysis and proofs reveal that: (1) DeepWalk empirically produces a low-rank transformation of a network's normalized Laplacian matrix; (2) LINE, in theory, is a special case of DeepWalk when the size of vertices' context is set to one; (3) As an extension of LINE, PTE can be viewed as the joint factorization of multiple networks' Laplacians; (4) node2vec is factorizing a matrix related to the stationary distribution and transition probability tensor of a 2nd-order random walk. We further provide the theoretical connections between skip-gram based network embedding algorithms and the theory of graph Laplacian. Finally, we present the NetMF method as well as its approximation algorithm for computing network embedding. Our method offers significant improvements over DeepWalk and LINE for conventional network mining tasks. This work lays the theoretical foundation for skip-gram based network embedding methods, leading to a better understanding of latent network representation learning.

604 citations
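
For reference, the closed forms behind claims (1) and (2) are commonly written as below. This is recalled from the published NetMF paper rather than taken from the abstract above, so treat the notation as an assumption: A is the adjacency matrix, D the diagonal degree matrix, vol(G) the sum of all degrees, T the context window size, and b the number of negative samples; the logarithm is applied element-wise.

```latex
\text{DeepWalk:}\quad
\log\!\left( \frac{\operatorname{vol}(G)}{b\,T}
  \Big( \sum_{r=1}^{T} \big(D^{-1}A\big)^{r} \Big) D^{-1} \right)
\qquad
\text{LINE:}\quad
\log\!\left( \frac{\operatorname{vol}(G)}{b}\, D^{-1} A\, D^{-1} \right)
```

Setting T = 1 in the DeepWalk expression leaves only the r = 1 term and reproduces the LINE expression, which matches claim (2) that LINE is the special case with context size one.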


Posted Content
TL;DR: This work presents a framework to detect social bots on Twitter, and describes several subclasses of accounts, including spammers, self promoters, and accounts that post content from connected applications.
Abstract: Increasing evidence suggests that a growing amount of social media content is generated by autonomous entities known as social bots. In this work we present a framework to detect such entities on Twitter. We leverage more than a thousand features extracted from public data and meta-data about users: friends, tweet content and sentiment, network patterns, and activity time series. We benchmark the classification framework by using a publicly available dataset of Twitter bots. This training data is enriched by a manually annotated collection of active Twitter users that include both humans and bots of varying sophistication. Our models yield high accuracy and agreement with each other and can detect bots of different nature. Our estimates suggest that between 9% and 15% of active Twitter accounts are bots. Characterizing ties among accounts, we observe that simple bots tend to interact with bots that exhibit more human-like behaviors. Analysis of content flows reveals retweet and mention strategies adopted by bots to interact with different target groups. Using clustering analysis, we characterize several subclasses of accounts, including spammers, self promoters, and accounts that post content from connected applications.

496 citations


Proceedings ArticleDOI
TL;DR: struc2vec uses a hierarchy to measure node similarity at different scales and constructs a multilayer graph to encode structural similarities and generate structural context for nodes, which improves performance on classification tasks that depend more on structural identity.
Abstract: Structural identity is a concept of symmetry in which network nodes are identified according to the network structure and their relationship to other nodes. Structural identity has been studied in theory and practice over the past decades, but only recently has it been addressed with representational learning techniques. This work presents struc2vec, a novel and flexible framework for learning latent representations for the structural identity of nodes. struc2vec uses a hierarchy to measure node similarity at different scales, and constructs a multilayer graph to encode structural similarities and generate structural context for nodes. Numerical experiments indicate that state-of-the-art techniques for learning node representations fail in capturing stronger notions of structural identity, while struc2vec exhibits much superior performance in this task, as it overcomes limitations of prior approaches. As a consequence, numerical experiments indicate that struc2vec improves performance on classification tasks that depend more on structural identity.

472 citations


Posted Content
TL;DR: Overall title structure and the use of proper nouns in titles are very significant in differentiating fake from real, leading to the conclusion that fake news is targeted for audiences who are not likely to read beyond titles and is aimed at creating mental associations between entities and claims.
Abstract: The problem of fake news has gained a lot of attention as it is claimed to have had a significant impact on 2016 US Presidential Elections. Fake news is not a new problem and its spread in social networks is well-studied. Often an underlying assumption in fake news discussion is that it is written to look like real news, fooling the reader who does not check for reliability of the sources or the arguments in its content. Through a unique study of three data sets and features that capture the style and the language of articles, we show that this assumption is not true. Fake news in most cases is more similar to satire than to real news, leading us to conclude that persuasion in fake news is achieved through heuristics rather than the strength of arguments. We show overall title structure and the use of proper nouns in titles are very significant in differentiating fake from real. This leads us to conclude that fake news is targeted for audiences who are not likely to read beyond titles and is aimed at creating mental associations between entities and claims.

430 citations


Proceedings ArticleDOI
TL;DR: The corpus of user relationships of the Slashdot technology news site is analysed and it is shown that the network exhibits multiplicative transitivity which allows algebraic methods based on matrix multiplication to be used.
Abstract: We analyse the corpus of user relationships of the Slashdot technology news site. The data was collected from the Slashdot Zoo feature where users of the website can tag other users as friends and foes, providing positive and negative endorsements. We adapt social network analysis techniques to the problem of negative edge weights. In particular, we consider signed variants of global network characteristics such as the clustering coefficient, node-level characteristics such as centrality and popularity measures, and link-level characteristics such as distances and similarity measures. We evaluate these measures on the task of identifying unpopular users, as well as on the task of predicting the sign of links and show that the network exhibits multiplicative transitivity which allows algebraic methods based on matrix multiplication to be used. We compare our methods to traditional methods which are only suitable for positively weighted edges.

423 citations
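
The multiplicative transitivity mentioned in the abstract above ("the enemy of my enemy is my friend") can be illustrated with a small sketch: the sign of a two-step path is the product of its edge signs, so the squared signed adjacency matrix sums those products over all intermediate users. The code below is an assumption-laden toy, not the paper's method: the function names are hypothetical, the matrix is symmetric for simplicity (the Slashdot Zoo data is directed), and only paths of length two are used.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    inner, cols = len(b), len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]

def predict_sign(adjacency, u, v):
    """Hypothetical predictor: sign of the summed two-step path products from u to v.

    Returns +1 (friend), -1 (foe), or 0 when there is no evidence or the
    positive and negative paths cancel out.
    """
    score = matmul(adjacency, adjacency)[u][v]
    return (score > 0) - (score < 0)

# Toy data (assumed): +1 = friend, -1 = foe, 0 = no tag.
# User 0 and user 2 both mark user 1 as a foe.
foes = [
    [0, -1, 0],
    [-1, 0, -1],
    [0, -1, 0],
]
# User 0 is a friend of user 1, who is a foe of user 2.
mixed = [
    [0, 1, 0],
    [1, 0, -1],
    [0, -1, 0],
]
shared_enemy = predict_sign(foes, 0, 2)    # (-1) * (-1) is positive
friend_of_foe = predict_sign(mixed, 0, 2)  # (+1) * (-1) is negative
```

Using matrix multiplication here is what lets the same algebra cover both positive and negative endorsements, whereas path counting on an unsigned graph would ignore the sign of each edge.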


Posted Content
TL;DR: Analysis of 14 million messages spreading 400 thousand claims on Twitter during and following the 2016 U.S. presidential campaign and election suggests that curbing social bots may be an effective strategy for mitigating the spread of online misinformation.
Abstract: The massive spread of fake news has been identified as a major global risk and has been alleged to influence elections and threaten democracies. Communication, cognitive, social, and computer scientists are engaged in efforts to study the complex causes for the viral diffusion of digital misinformation and to develop solutions, while search and social media platforms are beginning to deploy countermeasures. However, to date, these efforts have been mainly informed by anecdotal evidence rather than systematic data. Here we analyze 14 million messages spreading 400 thousand claims on Twitter during and following the 2016 U.S. presidential campaign and election. We find evidence that social bots play a key role in the spread of fake news. Accounts that actively spread misinformation are significantly more likely to be bots. Automated accounts are particularly active in the early spreading phases of viral claims, and tend to target influential users. Humans are vulnerable to this manipulation, retweeting bots who post false news. Successful sources of false and biased claims are heavily supported by social bots. These results suggest that curbing social bots may be an effective strategy for mitigating the spread of online misinformation.

Journal ArticleDOI
TL;DR: It is found that bots play a major role in the spread of low-credibility content on Twitter, and control measures for limiting the spread of misinformation are suggested.
Abstract: The massive spread of digital misinformation has been identified as a major global risk and has been alleged to influence elections and threaten democracies. Communication, cognitive, social, and computer scientists are engaged in efforts to study the complex causes for the viral diffusion of misinformation online and to develop solutions, while search and social media platforms are beginning to deploy countermeasures. With few exceptions, these efforts have been mainly informed by anecdotal evidence rather than systematic data. Here we analyze 14 million messages spreading 400 thousand articles on Twitter during and following the 2016 U.S. presidential campaign and election. We find evidence that social bots played a disproportionate role in amplifying low-credibility content. Accounts that actively spread articles from low-credibility sources are significantly more likely to be bots. Automated accounts are particularly active in amplifying content in the very early spreading moments, before an article goes viral. Bots also target users with many followers through replies and mentions. Humans are vulnerable to this manipulation, retweeting bots who post links to low-credibility content. Successful low-credibility sources are heavily supported by social bots. These results suggest that curbing social bots may be an effective strategy for mitigating the spread of online misinformation.

Posted Content
TL;DR: This survey categorizes and reviews network embedding methods, which assign nodes in a network to low-dimensional representations while preserving the network structure, and points out future research directions.
Abstract: Network embedding assigns nodes in a network to low-dimensional representations and effectively preserves the network structure. Recently, a significant amount of progress has been made toward this emerging network analysis paradigm. In this survey, we focus on categorizing and then reviewing the current development of network embedding methods, and point out future research directions. We first summarize the motivation of network embedding. We discuss the classical graph embedding algorithms and their relationship with network embedding. Afterwards and primarily, we provide a comprehensive overview of a large number of network embedding methods in a systematic manner, covering the structure- and property-preserving network embedding methods, the network embedding methods with side information and the advanced information preserving network embedding methods. Moreover, several evaluation approaches for network embedding and some useful online resources, including network data sets and software, are also reviewed. Finally, we discuss the framework of exploiting these network embedding methods to build an effective system and point out some potential future directions.

Proceedings ArticleDOI
TL;DR: In this paper, the authors propose a dynamic attributed network embedding framework (DANE), which first provides an offline method for a consensus embedding and then leverages matrix perturbation theory to maintain the freshness of the end embedding results in an online manner.
Abstract: Network embedding leverages the node proximity manifested to learn a low-dimensional node vector representation for each node in the network. The learned embeddings could advance various learning tasks such as node classification, network clustering, and link prediction. Most, if not all, of the existing works are overwhelmingly performed in the context of plain and static networks. Nonetheless, in reality, network structure often evolves over time with addition/deletion of links and nodes. Also, a vast majority of real-world networks are associated with a rich set of node attributes, and their attribute values are also naturally changing, with the emergence of new content patterns and the fading of old content patterns. These changing characteristics motivate us to seek an effective embedding representation to capture network and attribute evolving patterns, which is of fundamental importance for learning in a dynamic environment. To the best of our knowledge, we are the first to tackle this problem with the following two challenges: (1) the inherently correlated network and node attributes could be noisy and incomplete, which necessitates a robust consensus representation to capture their individual properties and correlations; (2) the embedding learning needs to be performed in an online fashion to adapt to the changes accordingly. In this paper, we tackle this problem by proposing a novel dynamic attributed network embedding framework - DANE. In particular, DANE first provides an offline method for a consensus embedding and then leverages matrix perturbation theory to maintain the freshness of the end embedding results in an online manner. We perform extensive experiments on both synthetic and real attributed networks to corroborate the effectiveness and efficiency of the proposed framework.

Proceedings ArticleDOI
TL;DR: An extensive study of the rise of a new generation of spambots on Twitter and quantitative evidence that a paradigm-shift exists in spambot design is provided, which calls for new approaches capable of turning the tide in the fight against this rising phenomenon.
Abstract: Recent studies in social media spam and automation provide anecdotal argumentation of the rise of a new generation of spambots, so-called social spambots. Here, for the first time, we extensively study this novel phenomenon on Twitter and we provide quantitative evidence that a paradigm-shift exists in spambot design. First, we measure Twitter's current capabilities of detecting the new social spambots. Later, we assess the human performance in discriminating between genuine accounts, social spambots, and traditional spambots. Then, we benchmark several state-of-the-art techniques proposed by the academic literature. Results show that neither Twitter, nor humans, nor cutting-edge applications are currently capable of accurately detecting the new social spambots. Our results call for new approaches capable of turning the tide in the fight against this rising phenomenon. We conclude by reviewing the latest literature on spambot detection and we highlight an emerging common research trend based on the analysis of collective behaviors. Insights derived from both our extensive experimental campaign and survey shed light on the most promising directions of research and lay the foundations for the arms race against the novel social spambots. Finally, to foster research on this novel phenomenon, we make publicly available to the scientific community all the datasets used in this study.

Proceedings ArticleDOI
TL;DR: GraphWave represents each node's network neighborhood via a low-dimensional embedding by leveraging heat wavelet diffusion patterns; the authors prove mathematically that nodes with similar network neighborhoods will have similar GraphWave embeddings even if they reside in very different parts of the network, and the method scales linearly with the number of edges.
Abstract: Nodes residing in different parts of a graph can have similar structural roles within their local network topology. The identification of such roles provides key insight into the organization of networks and can be used for a variety of machine learning tasks. However, learning structural representations of nodes is a challenging problem, and it has typically involved manually specifying and tailoring topological features for each node. In this paper, we develop GraphWave, a method that represents each node's network neighborhood via a low-dimensional embedding by leveraging heat wavelet diffusion patterns. Instead of training on hand-selected features, GraphWave learns these embeddings in an unsupervised way. We mathematically prove that nodes with similar network neighborhoods will have similar GraphWave embeddings even though these nodes may reside in very different parts of the network, and our method scales linearly with the number of edges. Experiments in a variety of different settings demonstrate GraphWave's real-world potential for capturing structural roles in networks, and our approach outperforms existing state-of-the-art baselines in every experiment, by as much as 137%.

Proceedings ArticleDOI
TL;DR: This paper proposes a framework to quantify these distinct biases and applies this framework to politics-related queries on Twitter and found that both the input data and the ranking system contribute significantly to produce varying amounts of bias in the search results.
Abstract: Search systems in online social media sites are frequently used to find information about ongoing events and people. For topics with multiple competing perspectives, such as political events or political candidates, bias in the top ranked results significantly shapes public opinion. However, bias does not emerge from an algorithm alone. It is important to distinguish between the bias that arises from the data that serves as the input to the ranking system and the bias that arises from the ranking system itself. In this paper, we propose a framework to quantify these distinct biases and apply this framework to politics-related queries on Twitter. We found that both the input data and the ranking system contribute significantly to produce varying amounts of bias in the search results and in different ways. We discuss the consequences of these biases and possible mechanisms to signal this bias in social media search systems' interfaces.

Posted Content
Ke Tu, Peng Cui, Xiao Wang, Fei Wang, Wenwu Zhu
TL;DR: Deep Hyper-Network Embedding (DHNE) is a deep model that realizes a non-linear tuplewise similarity function while preserving both local and global proximities in the formed embedding space.
Abstract: Network embedding has recently attracted a lot of attention in data mining. Existing network embedding methods mainly focus on networks with pairwise relationships. In the real world, however, the relationships among data points could go beyond pairwise, i.e., three or more objects are involved in each relationship represented by a hyperedge, thus forming hyper-networks. These hyper-networks pose great challenges to existing network embedding methods when the hyperedges are indecomposable, that is to say, any subset of nodes in a hyperedge cannot form another hyperedge. These indecomposable hyperedges are especially common in heterogeneous networks. In this paper, we propose a novel Deep Hyper-Network Embedding (DHNE) model to embed hyper-networks with indecomposable hyperedges. More specifically, we theoretically prove that any linear similarity metric in embedding space commonly used in existing methods cannot maintain the indecomposability property in hyper-networks, and thus propose a new deep model to realize a non-linear tuplewise similarity function while preserving both local and global proximities in the formed embedding space. We conduct extensive experiments on four different types of hyper-networks, including a GPS network, an online social network, a drug network and a semantic network. The empirical results demonstrate that our method can significantly and consistently outperform the state-of-the-art algorithms.

Journal ArticleDOI
TL;DR: Findings suggested that shocking and humorous messages generated the greatest impressions and engagement, but information-based messages were likely to be shared most; a Web-based survey suggested that the campaign might have contributed to improved knowledge and attitudes toward skin cancer among the target population.
Abstract: Background: Social media public health campaigns have the advantage of tailored messaging at low cost and large reach, but little is known about what would determine their feasibility as tools for inducing attitude and behavior change. Objective: The aim of this study was to test the feasibility of designing, implementing, and evaluating a social media-enabled intervention for skin cancer prevention. Conclusions: Social media-disseminated public health messages reached more than 23% of the Northern Ireland population. A Web-based survey suggested that the campaign might have contributed to improved knowledge and attitudes toward skin cancer among the target population. Findings suggested that shocking and humorous messages generated the greatest impressions and engagement, but information-based messages were likely to be shared most. The extent of behavioral change as a result of the campaign remains to be explored; however, the change of attitudes and knowledge is promising. Social media is an inexpensive, effective method for delivering public health messages. However, existing and traditional process evaluation methods may not be suitable for social media.

Journal ArticleDOI
TL;DR: The Social Fingerprinting technique is designed, which is able to discriminate among spambots and genuine accounts in both a supervised and an unsupervised fashion and to efficiently rely on a limited number of lightweight account characteristics.
Abstract: Spambot detection in online social networks is a long-lasting challenge involving the study and design of detection techniques capable of efficiently identifying ever-evolving spammers. Recently, a new wave of social spambots has emerged, with advanced human-like characteristics that allow them to go undetected even by current state-of-the-art algorithms. In this paper, we show that efficient spambot detection can be achieved via an in-depth analysis of their collective behaviors exploiting the digital DNA technique for modeling the behaviors of social network users. Inspired by its biological counterpart, in the digital DNA representation the behavioral lifetime of a digital account is encoded in a sequence of characters. Then, we define a similarity measure for such digital DNA sequences. We build upon digital DNA and the similarity between groups of users to characterize both genuine accounts and spambots. Leveraging such characterization, we design the Social Fingerprinting technique, which is able to discriminate among spambots and genuine accounts in both a supervised and an unsupervised fashion. We finally evaluate the effectiveness of Social Fingerprinting and we compare it with three state-of-the-art detection algorithms. Among the peculiarities of our approach is the possibility to apply off-the-shelf DNA analysis techniques to study online users' behaviors and to efficiently rely on a limited number of lightweight account characteristics.

Journal ArticleDOI
TL;DR: A novel framework is proposed, named NetSpam, which utilizes spam features for modeling review data sets as heterogeneous information networks to map spam detection procedure into a classification problem in such networks.
Abstract: Nowadays, many people rely on content available in social media when making decisions (e.g., reviews and feedback on a topic or product). The fact that anybody can leave a review provides a golden opportunity for spammers to write spam reviews about products and services for different interests. Identifying these spammers and the spam content is a hot topic of research. Although a considerable number of studies have been done recently toward this end, the methodologies put forth so far still barely detect spam reviews, and none of them shows the importance of each extracted feature type. In this study, we propose a novel framework, named NetSpam, which utilizes spam features for modeling review datasets as heterogeneous information networks to map the spam detection procedure into a classification problem in such networks. Using the importance of spam features helps us to obtain better results in terms of different metrics on real-world review datasets from the Yelp and Amazon websites. The results show that NetSpam outperforms the existing methods and that, among four categories of features (review-behavioral, user-behavioral, review-linguistic, and user-linguistic), the first type performs better than the other categories.

Posted Content
TL;DR: In this paper, a class of deterministic nonlinear models for the propagation of infectious diseases over contact networks with strongly-connected topologies is presented, and the authors provide a comprehensive nonlinear analysis of equilibria, stability properties, convergence, monotonicity, positivity, and threshold conditions.
Abstract: In this work we review a class of deterministic nonlinear models for the propagation of infectious diseases over contact networks with strongly-connected topologies. We consider network models for susceptible-infected (SI), susceptible-infected-susceptible (SIS), and susceptible-infected-recovered (SIR) settings. In each setting, we provide a comprehensive nonlinear analysis of equilibria, stability properties, convergence, monotonicity, positivity, and threshold conditions. For the network SI setting, specific contributions include establishing its equilibria, stability, and positivity properties. For the network SIS setting, we review a well-known deterministic model, provide novel results on the computation and characterization of the endemic state (when the system is above the epidemic threshold), and present alternative proofs for some of its properties. Finally, for the network SIR setting, we propose novel results for transient behavior, threshold conditions, stability properties, and asymptotic convergence. These results are analogous to those well-known for the scalar case. In addition, we provide a novel iterative algorithm to compute the asymptotic state of the network SIR system.
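As a rough illustration of the network SIS setting, the sketch below integrates the commonly used deterministic form dx_i/dt = beta (1 - x_i) sum_j a_ij x_j - gamma x_i with a forward Euler step. The exact model form, the parameter names `beta` and `gamma`, and the Euler discretization are assumptions here; the abstract does not spell out the equations.

```python
def sis_step(x, adj, beta, gamma, dt):
    """One forward Euler step of the assumed network SIS dynamics.

    x[i] is the infected fraction at node i; adj is a weighted adjacency matrix
    given as a list of rows.
    """
    n = len(x)
    new_x = []
    for i in range(n):
        # Infection pressure on node i from its neighbours.
        pressure = sum(adj[i][j] * x[j] for j in range(n))
        dx = beta * (1.0 - x[i]) * pressure - gamma * x[i]
        # Clamp to [0, 1] to guard against discretization overshoot.
        new_x.append(min(1.0, max(0.0, x[i] + dt * dx)))
    return new_x


def simulate_sis(x0, adj, beta, gamma, dt=0.01, steps=5000):
    """Iterate sis_step from the initial state x0 and return the final state."""
    x = list(x0)
    for _ in range(steps):
        x = sis_step(x, adj, beta, gamma, dt)
    return x
```

For this assumed form, the epidemic threshold is usually stated as beta * lambda_max(A) / gamma = 1, where lambda_max(A) is the largest eigenvalue of the adjacency matrix. On a triangle graph (lambda_max = 2), beta = 0.1 and gamma = 1 sit below the threshold and the infection dies out, while beta = gamma = 1 sits above it and every node settles near an endemic level of 0.5.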

Posted Content
TL;DR: A tri-relationship embedding framework, TriFN, is proposed, which models publisher-news relations and user-news interactions simultaneously for fake news classification and significantly outperforms other baseline methods for fake news detection.
Abstract: Social media is becoming popular for news consumption due to its fast dissemination, easy access, and low cost. However, it also enables the wide propagation of fake news, i.e., news with intentionally false information. Detecting fake news is an important task, which not only ensures that users receive authentic information but also helps maintain a trustworthy news ecosystem. The majority of existing detection algorithms focus on finding clues from news contents, which are generally not effective because fake news is often intentionally written to mislead users by mimicking true news. Therefore, we need to explore auxiliary information to improve detection. The social context during the news dissemination process on social media forms the inherent tri-relationship, the relationship among publishers, news pieces, and users, which has the potential to improve fake news detection. For example, partisan-biased publishers are more likely to publish fake news, and low-credibility users are more likely to share fake news. In this paper, we study the novel problem of exploiting social context for fake news detection. We propose a tri-relationship embedding framework, TriFN, which models publisher-news relations and user-news interactions simultaneously for fake news classification. We conduct experiments on two real-world datasets, which demonstrate that the proposed approach significantly outperforms other baseline methods for fake news detection.

Posted Content
Ziniu Hu1, Weiqing Liu1, Jiang Bian1, Xuanzhe Liu2, Tie-Yan Liu1 
TL;DR: A Hybrid Attention Network (HAN) is designed to predict the stock trend based on the sequence of recent related news, and the self-paced learning mechanism is applied to imitate the third principle.
Abstract: Stock trend prediction plays a critical role in seeking maximized profit from stock investment. However, precise trend prediction is very difficult due to the highly volatile and non-stationary nature of the stock market. The explosion of information on the Internet, together with the advancing development of natural language processing and text mining techniques, has enabled investors to unveil market trends and volatility from online content. Unfortunately, the quality, trustworthiness, and comprehensiveness of online content related to the stock market vary drastically, and a large portion consists of low-quality news, comments, or even rumors. To address this challenge, we imitate the learning process of human beings facing such chaotic online news, driven by three principles: sequential content dependency, diverse influence, and effective and efficient learning. In this paper, to capture the first two principles, we design a Hybrid Attention Network to predict the stock trend based on the sequence of recent related news. Moreover, we apply the self-paced learning mechanism to imitate the third principle. Extensive experiments on real-world stock market data demonstrate the effectiveness of our approach.

Posted Content
TL;DR: A novel framework, SEANO, is proposed to learn a low-dimensional vector representation that systematically captures the topological proximity, attribute affinity and label similarity of vertices in a partially labeled attributed network (PLAN); a subset of its parameters is interpretable as an outlier score and can significantly outperform baseline methods when applied to detecting network outliers.
Abstract: In this paper, we propose a novel framework, called Semi-supervised Embedding in Attributed Networks with Outliers (SEANO), to learn a low-dimensional vector representation that systematically captures the topological proximity, attribute affinity and label similarity of vertices in a partially labeled attributed network (PLAN). Our method is designed to work in both transductive and inductive settings while explicitly alleviating noise effects from outliers. Experimental results on various datasets drawn from the web, text and image domains demonstrate the advantages of SEANO over state-of-the-art methods in semi-supervised classification under transductive as well as inductive settings. We also show that a subset of parameters in SEANO is interpretable as an outlier score and can significantly outperform baseline methods when applied to detecting network outliers. Finally, we present the use of SEANO in a challenging real-world setting -- flood mapping of satellite images -- and show that it is able to outperform modern remote sensing algorithms for this task.

Posted Content
TL;DR: In this paper, the authors discuss how bot capabilities can be extended and controlled by integrating humans into the process and reason that this is currently the most promising way to go in order to realize effective interactions with other humans.
Abstract: Social bots are currently regarded as an influential but also somewhat mysterious factor in public discourse and opinion making. They are considered to be capable of massively distributing propaganda in social and online media, and their application is even suspected to be partly responsible for recent election results. Astonishingly, the term 'Social Bot' is not well defined and different scientific disciplines use divergent definitions. This work starts with a balanced definition attempt, before providing an overview of how social bots actually work (taking the example of Twitter) and what their current technical limitations are. Despite recent research progress in Deep Learning and Big Data, there are many activities bots cannot handle well. We then discuss how bot capabilities can be extended and controlled by integrating humans into the process and reason that this is currently the most promising way to go in order to realize effective interactions with other humans.

Posted Content
TL;DR: A visual survey of key literature using CiteSpace to identify the most influential, central, as well as active nodes using scientometric analyses and finds that Yong Wang is a pivot node with the highest centrality.
Abstract: Community structure is an important area of research. It has received considerable attention from the scientific community. Despite its importance, one of the key problems in locating information about community detection is the diverse spread of related articles across various disciplines. To the best of our knowledge, there is no current comprehensive review of recent literature that applies scientometric analysis based on complex network analysis to all relevant articles from the Web of Science (WoS). Here we present a visual survey of key literature using CiteSpace. The idea is to identify emerging trends besides using network techniques to examine the evolution of the domain. Towards that end, we identify the most influential, central, as well as active nodes using scientometric analyses. We examine authors, key articles, cited references, core subject categories, key journals, institutions, as well as countries. The exploration of the scientometric literature of the domain reveals that Yong Wang is a pivot node with the highest centrality. Additionally, we have observed that Mark Newman is the most highly cited author in the network. We have also identified that the journal "Reviews of Modern Physics" has the strongest citation burst. In terms of cited documents, an article by Andrea Lancichinetti has the highest centrality score. We have also discovered that the key publications in this domain originate from the United States, whereas Scotland has the strongest and longest citation burst. Additionally, we have found that the categories of "Computer Science" and "Engineering" lead other categories based on frequency and centrality respectively.

Posted Content
TL;DR: In this paper, the authors outline the recipe of how social networks are used to spread misinformation and show how it was successfully used to spread fake news during the 2016 U.S. presidential election.
Abstract: In 2010, a paper entitled "From Obscurity to Prominence in Minutes: Political Speech and Real-time search" won the Best Paper Prize of the Web Science 2010 Conference. Among its findings were the discovery and documentation of what was termed a "Twitter-bomb", an organized effort to spread misinformation about the Democratic candidate Martha Coakley through anonymous Twitter accounts. In this paper, after summarizing the details of that event, we outline the recipe of how social networks are used to spread misinformation. One of the most important steps in such a recipe is the "infiltration" of a community of users who are already engaged in conversations about a topic, to use them as organic spreaders of misinformation in their extended subnetworks. Then, we take this misinformation spreading recipe and indicate how it was successfully used to spread fake news during the 2016 U.S. Presidential Election. The main differences between the scenarios are the use of Facebook instead of Twitter, and the respective motivations (in 2010: political influence; in 2016: financial benefit through online advertising). After situating these events in the broader context of exploiting the Web, we seize this opportunity to address limitations of the reach of research findings and to start a conversation about how communities of researchers can increase their impact on real-world societal issues.

Posted Content
TL;DR: An extremely bursty behavior of rumor-related topics is discovered, and it is shown that, once the questionable topic is detected, it is possible to identify rumor-bearing tweets using automated techniques.
Abstract: In February 2016, the World Health Organization declared the Zika outbreak a Public Health Emergency of International Concern. With developing evidence that it can cause birth defects, and the Summer Olympics coming up in the worst affected country, Brazil, the virus caught fire on social media. In this work, we use Zika as a case study in building a tool for tracking the misinformation around health concerns on Twitter. We collect more than 13 million tweets -- spanning the initial reports in February 2016 and the Summer Olympics -- regarding the Zika outbreak and track rumors outlined by the World Health Organization and the Snopes fact-checking website. The tool pipeline, which incorporates health professionals, crowdsourcing, and machine learning, allows us to capture health-related rumors around the world, as well as clarification campaigns by reputable health organizations. In the case of Zika, we discover an extremely bursty behavior of rumor-related topics, and show that, once the questionable topic is detected, it is possible to identify rumor-bearing tweets using automated techniques. Thus, we illustrate the insights the proposed tool provides into potentially harmful information on social media, allowing public health researchers and practitioners to respond with targeted and timely action.

Posted Content
TL;DR: This paper analyzes 1.67 million Facebook posts created by 153 media organizations to understand the extent of clickbait practice, its impact and user engagement by using the model developed, which uses distributed sub-word embeddings learned from a large corpus.
Abstract: The use of alluring headlines (clickbait) to tempt readers has become a growing practice. To survive in the highly competitive media industry, most online media, including the mainstream ones, have started following this practice. Although the widespread practice of clickbait undermines readers' reliance on media, a large-scale analysis to reveal this fact is still absent. In this paper, we analyze 1.67 million Facebook posts created by 153 media organizations to understand the extent of clickbait practice, its impact and user engagement by using our own clickbait detection model. The model uses distributed sub-word embeddings learned from a large corpus. The accuracy of the model is 98.3%. Powered with this model, we further study the distribution of topics in clickbait and non-clickbait contents.