Showing papers in "arXiv: Social and Information Networks in 2018"


Journal Article
TL;DR: The Leiden algorithm is found to be faster than the Louvain algorithm and to uncover better partitions, while also guaranteeing that communities are connected.
Abstract: Community detection is often used to understand the structure of large and complex networks. One of the most popular algorithms for uncovering community structure is the so-called Louvain algorithm. We show that this algorithm has a major defect that largely went unnoticed until now: the Louvain algorithm may yield arbitrarily badly connected communities. In the worst case, communities may even be disconnected, especially when running the algorithm iteratively. In our experimental analysis, we observe that up to 25% of the communities are badly connected and up to 16% are disconnected. To address this problem, we introduce the Leiden algorithm. We prove that the Leiden algorithm yields communities that are guaranteed to be connected. In addition, we prove that, when the Leiden algorithm is applied iteratively, it converges to a partition in which all subsets of all communities are locally optimally assigned. Furthermore, by relying on a fast local move approach, the Leiden algorithm runs faster than the Louvain algorithm. We demonstrate the performance of the Leiden algorithm for several benchmark and real-world networks. We find that the Leiden algorithm is faster than the Louvain algorithm and uncovers better partitions, in addition to providing explicit guarantees.

592 citations
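
The defect described in the abstract above (communities that are internally disconnected) can be checked for any partition, whichever algorithm produced it. The following is a minimal sketch of such a check, not the Leiden or Louvain algorithm itself; the function name `disconnected_communities` and the toy graph are hypothetical and chosen only for illustration.

```python
from collections import deque


def disconnected_communities(adjacency, membership):
    """Return labels of communities whose induced subgraph is not connected.

    adjacency:  dict mapping node -> iterable of neighbouring nodes (undirected)
    membership: dict mapping node -> community label
    """
    # Group nodes by community label.
    members = {}
    for node, label in membership.items():
        members.setdefault(label, set()).add(node)

    bad = []
    for label, nodes in members.items():
        # Breadth-first search restricted to edges inside the community.
        start = next(iter(nodes))
        seen = {start}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adjacency.get(u, ()):
                if v in nodes and v not in seen:
                    seen.add(v)
                    queue.append(v)
        if seen != nodes:
            bad.append(label)
    return sorted(bad)


# Toy example (hypothetical data): a path graph 0-1-2-3-4.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
# Community "a" is split in two by node 2, which belongs to community "b".
split_partition = {0: "a", 1: "a", 2: "b", 3: "a", 4: "a"}
contiguous_partition = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b"}
```

Running the check on `split_partition` flags community "a", because nodes 0-1 and 3-4 can only reach each other through node 2, which lies outside the community.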


Posted Content
TL;DR: This work presents an efficient algorithm DynGEM, based on recent advances in deep autoencoders for graph embeddings, that can handle growing dynamic graphs, and has better running time than using static embedding methods on each snapshot of a dynamic graph.
Abstract: Embedding large graphs in low dimensional spaces has recently attracted significant interest due to its wide applications such as graph visualization, link prediction and node classification. Existing methods focus on computing the embedding for static graphs. However, many graphs in practical applications are dynamic and evolve constantly over time. Naively applying existing embedding algorithms to each snapshot of dynamic graphs independently usually leads to unsatisfactory performance in terms of stability, flexibility and efficiency. In this work, we present an efficient algorithm DynGEM based on recent advances in deep autoencoders for graph embeddings, to address this problem. The major advantages of DynGEM include: (1) the embedding is stable over time, (2) it can handle growing dynamic graphs, and (3) it has better running time than using static embedding methods on each snapshot of a dynamic graph. We test DynGEM on a variety of tasks including graph visualization, graph reconstruction, link prediction and anomaly detection (on both synthetic and real datasets). Experimental results demonstrate the superior stability and scalability of our approach.

285 citations


Posted Content
TL;DR: A comprehensive survey spanning diverse aspects of false information is presented, namely the actors involved in spreading false information, the rationale behind successfully deceiving readers, quantifying the impact of false information, and algorithms developed to detect false information.
Abstract: False information can be created and spread easily through the web and social media platforms, resulting in widespread real-world impact. Characterizing how false information proliferates on social platforms and why it succeeds in deceiving readers is critical to developing efficient detection algorithms and tools for early detection. A recent surge of research in this area has aimed to address the key issues using methods based on feature engineering, graph mining, and information modeling. The majority of the research has primarily focused on two broad categories of false information: opinion-based (e.g., fake reviews), and fact-based (e.g., false news and hoaxes). Therefore, in this work, we present a comprehensive survey spanning diverse aspects of false information, namely (i) the actors involved in spreading false information, (ii) rationale behind successfully deceiving readers, (iii) quantifying the impact of false information, (iv) measuring its characteristics across different dimensions, and finally, (v) algorithms developed to detect false information. In doing so, we create a unified framework to describe these recent methods and highlight a number of important directions for future research.

252 citations


Proceedings Article
TL;DR: Although an ideologically broad swath of Twitter users was exposed to Russian trolls in the period leading up to the 2016 U.S. Presidential election, it was mainly conservatives who helped amplify the trolls' message, which text analysis shows had a mostly conservative, pro-Trump agenda.
Abstract: Until recently, social media was seen to promote democratic discourse on social and political issues. However, this powerful communication platform has come under scrutiny for allowing hostile actors to exploit online discussions in an attempt to manipulate public opinion. A case in point is the ongoing US Congress investigation of Russian interference in the 2016 US election campaign, with Russia accused of using trolls (malicious accounts created to manipulate) and bots to spread misinformation and politically biased information. In this study, we explore the effects of this manipulation campaign, taking a closer look at users who re-shared the posts produced on Twitter by the Russian troll accounts publicly disclosed by the US Congress investigation. We collected a dataset with over 43 million election-related posts shared on Twitter between September 16 and October 21, 2016, by about 5.7 million distinct users. This dataset included accounts associated with the identified Russian trolls. We use label propagation to infer the ideology of all users based on the news sources they shared. This method enables us to classify a large number of users as liberal or conservative with precision and recall above 90%. Conservatives retweeted Russian trolls about 31 times more often than liberals and produced 36x more tweets. Additionally, most retweets of troll content originated from two Southern states: Tennessee and Texas. Using state-of-the-art bot detection techniques, we estimated that about 4.9% and 6.2% of liberal and conservative users respectively were bots. Text analysis on the content shared by trolls reveals that they had a mostly conservative, pro-Trump agenda. Although an ideologically broad swath of Twitter users was exposed to Russian trolls in the period leading up to the 2016 US Presidential election, it was mainly conservatives who helped amplify their message.

201 citations
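
The abstract above infers user ideology by label propagation from a set of users whose leaning is already known. The sketch below illustrates the general idea under simplifying assumptions: an undirected interaction graph, fixed seed labels, synchronous majority-vote updates, and ties left unresolved. It is not the authors' implementation, and the function name `propagate_labels`, the user names and the graph are hypothetical.

```python
def propagate_labels(adjacency, seeds, max_iters=20):
    """Spread seed labels over a graph by repeated neighbour majority vote.

    adjacency: dict mapping node -> list of neighbouring nodes
    seeds:     dict mapping node -> known label (never overwritten)
    Returns a dict with a label for every node that could be reached.
    """
    labels = dict(seeds)
    for _ in range(max_iters):
        updated = dict(labels)
        for node, neighbours in adjacency.items():
            if node in seeds:
                continue  # seed labels stay fixed
            # Count the labels currently held by this node's neighbours.
            counts = {}
            for nb in neighbours:
                if nb in labels:
                    counts[labels[nb]] = counts.get(labels[nb], 0) + 1
            if not counts:
                continue  # no labelled neighbour yet
            best = max(counts.values())
            winners = [lab for lab, c in counts.items() if c == best]
            if len(winners) == 1:  # on a tie, keep the previous state
                updated[node] = winners[0]
        if updated == labels:
            break  # converged
        labels = updated
    return labels


# Toy example (hypothetical data): a chain of six users with one seed at each end.
graph = {
    "u1": ["u2"],
    "u2": ["u1", "u3"],
    "u3": ["u2", "u4"],
    "u4": ["u3", "u5"],
    "u5": ["u4", "u6"],
    "u6": ["u5"],
}
seed_labels = {"u1": "liberal", "u6": "conservative"}
inferred = propagate_labels(graph, seed_labels)
```

In this toy chain the labels meet in the middle: `u2` and `u3` take the label of `u1`, while `u4` and `u5` take the label of `u6`.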


Posted Content
TL;DR: This work proposes an incremental and iterative methodology that utilizes the power of crowdsourcing to annotate a large-scale collection of tweets with a set of abuse-related labels, and identifies a reduced but robust set of labels.
Abstract: In recent years, offensive, abusive and hateful language, sexism, racism and other types of aggressive and cyberbullying behavior have been manifesting with increased frequency on many online social media platforms. In fact, past scientific work focused on studying these forms in popular media, such as Facebook and Twitter. Building on such work, we present an 8-month study of the various forms of abusive behavior on Twitter, in a holistic fashion. Departing from past work, we examine a wide variety of labeling schemes, which cover different forms of abusive behavior, at the same time. We propose an incremental and iterative methodology that utilizes the power of crowdsourcing to annotate a large-scale collection of tweets with a set of abuse-related labels. In fact, by applying our methodology, including statistical analysis for label merging or elimination, we identify a reduced but robust set of labels. Finally, we offer a first overview and findings of our collected and annotated dataset of 100 thousand tweets, which we make publicly available for further scientific exploration.

183 citations


Journal Article
TL;DR: In this paper, the authors bring a novel chapter of control theory, dealing with dynamic models of social networks and processes over them, to the attention of the broad research community, and focus on more recent models of social processes that have been developed concurrently with MAS theory.
Abstract: Recent years have witnessed a significant trend towards filling the gap between Social Network Analysis (SNA) and control theory. This trend was enabled by the introduction of new mathematical models describing dynamics of social groups, the development of algorithms and software for data analysis and the tremendous progress in understanding complex networks and multi-agent systems (MAS) dynamics. The aim of this tutorial is to highlight a novel chapter of control theory, dealing with dynamic models of social networks and processes over them, to the attention of the broad research community. In its first part [1], we have considered the most classical models of social dynamics, which have anticipated and to a great extent inspired the recent extensive studies on MAS and complex networks. This paper is the second part of the tutorial, and it is focused on more recent models of social processes that have been developed concurrently with MAS theory. Future perspectives of control in social and techno-social systems are also discussed.

176 citations


Posted Content
TL;DR: This paper measured trends in the diffusion of misinformation on Facebook and Twitter between January 2015 and July 2018, focusing on stories from 570 sites that have been identified as producers of false stories, and found that interactions with these sites rose steadily on both platforms through the end of 2016, then fell sharply on Facebook while they continued to rise on Twitter, with the ratio of Facebook engagements to Twitter shares falling by approximately 60 percent.
Abstract: We measure trends in the diffusion of misinformation on Facebook and Twitter between January 2015 and July 2018. We focus on stories from 570 sites that have been identified as producers of false stories. Interactions with these sites on both Facebook and Twitter rose steadily through the end of 2016. Interactions then fell sharply on Facebook while they continued to rise on Twitter, with the ratio of Facebook engagements to Twitter shares falling by approximately 60 percent. We see no similar pattern for other news, business, or culture sites, where interactions have been relatively stable over time and have followed similar trends on the two platforms both before and after the election.

172 citations


Journal Article
TL;DR: It is found that, while top influencers spreading traditional center and left leaning news largely influence the activity of Clinton supporters, this causality is reversed for the fake news: the activity of Trump supporters influences the dynamics of the top fake news spreaders.
Abstract: The dynamics and influence of fake news on Twitter during the 2016 US presidential election remain to be clarified. Here, we use a dataset of 171 million tweets in the five months preceding the election day to identify 30 million tweets, from 2.2 million users, which contain a link to news outlets. Based on a classification of news outlets curated by this http URL, we find that 25% of these tweets spread either fake or extremely biased news. We characterize the networks of information flow to find the most influential spreaders of fake and traditional news and use causal modeling to uncover how fake news influenced the presidential election. We find that, while top influencers spreading traditional center and left leaning news largely influence the activity of Clinton supporters, this causality is reversed for the fake news: the activity of Trump supporters influences the dynamics of the top fake news spreaders.

169 citations


Posted Content
TL;DR: GEMSEC places nodes in an abstract feature space where the vertex features minimize the negative log-likelihood of preserving sampled vertex neighborhoods, and incorporates known social network properties through a machine learning regularization.
Abstract: Modern graph embedding procedures can efficiently process graphs with millions of nodes. In this paper, we propose GEMSEC -- a graph embedding algorithm which learns a clustering of the nodes simultaneously with computing their embedding. GEMSEC is a general extension of earlier work in the domain of sequence-based graph embedding. GEMSEC places nodes in an abstract feature space where the vertex features minimize the negative log-likelihood of preserving sampled vertex neighborhoods, and it incorporates known social network properties through a machine learning regularization. We present two new social network datasets and show that by simultaneously considering the embedding and clustering problems with respect to social properties, GEMSEC extracts high-quality clusters competitive with or superior to other community detection algorithms. In experiments, the method is found to be computationally efficient and robust to the choice of hyperparameters.

148 citations


Posted Content
TL;DR: This study performs the first cross-sectional view of how hateful users diffuse hate content in online social media on Gab and finds that the hateful users are far more densely connected among themselves.
Abstract: Online social media platforms are afflicted with several issues, with hate speech at the forefront. The prevalence of online hate speech has fueled horrific real-world hate crimes such as the mass genocide of Rohingya Muslims, communal violence in Colombo and the recent massacre in the Pittsburgh synagogue. Consequently, it is imperative to understand the diffusion of such hateful content in an online setting. We conduct the first study that analyses the flow and dynamics of posts generated by hateful and non-hateful users on Gab (this http URL) over a massive dataset of 341K users and 21M posts. Our observations confirm that hateful content diffuses farther, wider and faster and has a greater outreach than content from non-hateful users. A deeper inspection into the profiles and network of hateful and non-hateful users reveals that the former are more influential, popular and cohesive. Thus, our research explores the interesting facets of diffusion dynamics of hateful users and broadens our understanding of hate speech in the online world.

134 citations


Posted Content
TL;DR: FakeNewsNet is a fake news data repository containing two comprehensive datasets that include news content, social context, and dynamic information.
Abstract: Social media has become a popular means for people to consume news. Meanwhile, it also enables the wide dissemination of fake news, i.e., news with intentionally false information, which brings significant negative effects to society. Thus, fake news detection is attracting increasing attention. However, fake news detection is a non-trivial task, which requires multi-source information such as news content, social context, and dynamic information. First, fake news is written to fool people, which makes it difficult to detect fake news simply based on news contents. In addition to news contents, we need to explore social contexts such as user engagements and social behaviors. For example, a credible user's comment that "this is a fake news" is a strong signal for detecting fake news. Second, dynamic information, such as how fake news and true news propagate and how users' opinions toward news pieces evolve, is very important for extracting useful patterns for (early) fake news detection and intervention. Thus, comprehensive datasets which contain news content, social context, and dynamic information could facilitate the study of fake news propagation, detection, and mitigation; while to the best of our knowledge, existing datasets only contain one or two aspects. Therefore, in this paper, to facilitate fake news-related research, we provide a fake news data repository FakeNewsNet, which contains two comprehensive datasets that include news content, social context, and dynamic information. We present a comprehensive description of dataset collection, demonstrate an exploratory analysis of this data repository from different perspectives, and discuss the benefits of FakeNewsNet for potential applications in fake news study on social media.

Proceedings Article
TL;DR: This paper provides the first characterization of Gab, finding that it is predominantly used for the dissemination and discussion of news and world events, that it attracts alt-right users, conspiracy theorists, and other trolls, and that hate speech on the platform is much more prevalent than on Twitter but less prevalent than on 4chan's Politically Incorrect board.
Abstract: Over the past few years, a number of new "fringe" communities, like 4chan or certain subreddits, have gained traction on the Web at a rapid pace. However, more often than not, little is known about how they evolve or what kind of activities they attract, even though recent research has shown that they influence how false information reaches mainstream communities. This motivates the need to monitor these communities and analyze their impact on the Web's information ecosystem. In August 2016, a new social network called Gab was created as an alternative to Twitter. It positions itself as putting "people and free speech first", welcoming users banned or suspended from other social networks. In this paper, we provide, to the best of our knowledge, the first characterization of Gab. We collect and analyze 22M posts produced by 336K users between August 2016 and January 2018, finding that Gab is predominantly used for the dissemination and discussion of news and world events, and that it attracts alt-right users, conspiracy theorists, and other trolls. We also measure the prevalence of hate speech on the platform, finding it to be much higher than Twitter, but lower than 4chan's Politically Incorrect board.

Book Chapter
TL;DR: This chapter reviews recent advancements in applied studies of complex contagions, particularly in the domains of health, innovation diffusion, social media, and politics, and suggests three directions for future research, including the roles of diversity and homophily in the dynamics of complex contagion.
Abstract: Since the publication of “Complex Contagions and the Weakness of Long Ties” in 2007, complex contagions have been studied across an enormous variety of social domains. In reviewing this decade of research, we discuss recent advancements in applied studies of complex contagions, particularly in the domains of health, innovation diffusion, social media, and politics. We also discuss how these empirical studies have spurred complementary advancements in the theoretical modeling of contagions, which concern the effects of network topology on diffusion, as well as the effects of individual-level attributes and thresholds. In synthesizing these developments, we suggest three main directions for future research. The first concerns the study of how multiple contagions interact within the same network and across networks, in what may be called an ecology of contagions. The second concerns the study of how the structure of thresholds and their behavioral consequences can vary by individual and social context. The third area concerns the roles of diversity and homophily in the dynamics of complex contagion, including both diversity of demographic profiles among local peers and the broader notion of structural diversity within a network. Throughout this discussion, we make an effort to highlight the theoretical and empirical opportunities that lie ahead.

Posted Content
TL;DR: This article models the effect of violent events on the volume and type of hateful speech on two social media platforms, Twitter and Reddit, and finds that extremist violence tends to lead to an increase in online hate speech, particularly in messages directly advocating violence.
Abstract: User-generated content online is shaped by many factors, including endogenous elements such as platform affordances and norms, as well as exogenous elements, in particular significant events. These impact what users say, how they say it, and when they say it. In this paper, we focus on quantifying the impact of violent events on various types of hate speech, from offensive and derogatory to intimidation and explicit calls for violence. We anchor this study in a series of attacks involving Arabs and Muslims as perpetrators or victims, occurring in Western countries, that have been covered extensively by news media. These attacks have fueled intense policy debates around immigration in various fora, including online media, which have been marred by racist prejudice and hateful speech. The focus of our research is to model the effect of the attacks on the volume and type of hateful speech on two social media platforms, Twitter and Reddit. Among other findings, we observe that extremist violence tends to lead to an increase in online hate speech, particularly on messages directly advocating violence. Our research has implications for the way in which hate speech online is monitored and suggests ways in which it could be fought.

Proceedings Article
TL;DR: In this paper, a hybrid of convolutional neural networks and long short-term recurrent neural network models is proposed to detect and classify fake news messages from Twitter posts, achieving 82% accuracy.
Abstract: The problem associated with the propagation of fake news continues to grow at an alarming scale. This trend has generated much interest from politics to academia and industry alike. We propose a framework that detects and classifies fake news messages from Twitter posts using a hybrid of convolutional neural networks and long short-term recurrent neural network models. The proposed work using this deep learning approach achieves 82% accuracy. Our approach intuitively identifies relevant features associated with fake news stories without previous knowledge of the domain.

Journal Article
TL;DR: Using graphs to model pairwise relationships between entities is a ubiquitous framework for studying complex systems and data; simplicial complexes extend this dyadic model of graphs to polyadic structures.
Abstract: Focusing on coupling between edges, we generalize the relationship between the normalized graph Laplacian and random walks on graphs by devising an appropriate normalization for the Hodge Laplacian -- the generalization of the graph Laplacian for simplicial complexes -- and relate this to a random walk on edges. Importantly, these random walks are intimately connected to the topology of the simplicial complex, just as random walks on graphs are related to the topology of the graph. This serves as a foundational step towards incorporating Laplacian-based analytics for higher-order interactions. We demonstrate how to use these dynamics for data analytics that extract information about the edge-space of a simplicial complex that complements and extends graph-based analysis. Specifically, we use our normalized Hodge Laplacian to derive spectral embeddings for examining trajectory data of ocean drifters near Madagascar and also develop a generalization of personalized PageRank for the edge-space of simplicial complexes to analyze a book co-purchasing dataset.
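
For orientation, the standard unnormalized objects that the abstract above builds on can be written down as follows. This is a reference sketch only: the notation is an assumption ($B_1$ is the node-to-edge incidence matrix, $B_2$ the edge-to-triangle incidence matrix, $A$ the adjacency matrix and $D$ the diagonal degree matrix of an unweighted graph), and the paper's own normalization of the Hodge Laplacian is not reproduced here.

```latex
% Graph Laplacian and its random-walk normalization
L_0 = B_1 B_1^{\top} = D - A,
\qquad
L_{\mathrm{rw}} = L_0 D^{-1} = I - A D^{-1}

% Hodge 1-Laplacian acting on the edge space of a simplicial complex
L_1 = B_1^{\top} B_1 + B_2 B_2^{\top}
```

Here $A D^{-1}$ is the column-stochastic transition matrix of the simple random walk on the graph, which is the link between the normalized graph Laplacian and random walks that the abstract generalizes to edges.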

Posted Content
TL;DR: In this article, the authors used community detection algorithms to automatically detect the emergent communities from the users' activity and to quantify the cohesiveness over time of the communities, finding that content consumption about vaccines is dominated by the echo-chamber effect and that polarization increased over years.
Abstract: Vaccine hesitancy has been recognized as a major global health threat. Having access to any type of information in social media has been suggested as a potentially powerful factor influencing hesitancy. Recent studies in fields other than vaccination show that access to a wide range of content through the Internet without intermediaries has resulted in major segregation of users into polarized groups. Users select the information adhering to their system of beliefs and tend to ignore dissenting information. In this paper we assess whether there is polarization in social media use in the field of vaccination. We perform a thorough quantitative analysis on Facebook analyzing 2.6M users interacting with 298,018 posts over a time span of seven years and 5 months. We used community detection algorithms to automatically detect the emergent communities from the users' activity and to quantify the cohesiveness over time of the communities. Our findings show that content consumption about vaccines is dominated by the echo-chamber effect and that polarization increased over years. Communities emerge from the users' consumption habits, i.e. the majority of users only consume information in favor of or against vaccines, not both. The existence of echo-chambers may explain why social-media campaigns providing accurate information may have limited reach, may be effective only in sub-groups and might even foment further polarization of opinions. The introduction of dissenting information into a sub-group is disregarded and can have a backfire effect, further reinforcing the existing opinions within the sub-group.

Proceedings Article
TL;DR: This work proposes REGAL (REpresentation learning-based Graph ALignment), a framework that leverages the power of automatically-learned node representations to match nodes across different graphs, and devise xNetMF, an elegant and principled node embedding formulation that uniquely generalizes to multi-network problems.
Abstract: Problems involving multiple networks are prevalent in many scientific and other domains. In particular, network alignment, or the task of identifying corresponding nodes in different networks, has applications across the social and natural sciences. Motivated by recent advancements in node representation learning for single-graph tasks, we propose REGAL (REpresentation learning-based Graph ALignment), a framework that leverages the power of automatically-learned node representations to match nodes across different graphs. Within REGAL we devise xNetMF, an elegant and principled node embedding formulation that uniquely generalizes to multi-network problems. Our results demonstrate the utility and promise of unsupervised representation learning-based network alignment in terms of both speed and accuracy. REGAL runs up to 30x faster in the representation learning stage than comparable methods, outperforms existing network alignment methods by 20 to 30% accuracy on average, and scales to networks with millions of nodes each.

Posted Content
TL;DR: This paper proposes GC-LSTM, a Graph Convolution Network (GCN) embedded Long Short-Term Memory network (LSTM), for end-to-end dynamic link prediction.
Abstract: Dynamic link prediction is a hot research topic in the complex networks area, especially for its wide applications in biology, social networks, economy and industry. Compared with static link prediction, the dynamic one is much more difficult since network structure evolves over time. Currently most research focuses on static link prediction, which cannot achieve expected performance in dynamic networks. Aiming at low AUC, high Error Rate, and add/remove link prediction difficulty, we propose GC-LSTM, a Graph Convolution Network (GCN) embedded Long Short-Term Memory network (LSTM), for end-to-end dynamic link prediction. To the best of our knowledge, it is the first time that GCN embedded LSTM is put forward for link prediction of dynamic networks. GCN in this new deep model is capable of node structure learning of the network snapshot for each time slice, while LSTM is responsible for temporal feature learning for the network snapshot. Besides, while current dynamic link prediction methods can only handle removed links, GC-LSTM can predict both added and removed links at the same time. Extensive experiments are carried out to test its performance in terms of prediction accuracy, Error Rate, add/remove link prediction and key link prediction. The results show that GC-LSTM outperforms current state-of-the-art methods.

Proceedings Article
TL;DR: The HEER algorithm is proposed, which embeds HINs via edge representations that are further coupled with properly-learned heterogeneous metrics; experiments demonstrate the effectiveness of the proposed HEER model and the utility of edge representations and heterogeneous metrics.
Abstract: Heterogeneous information networks (HINs) are ubiquitous in real-world applications. In the meantime, network embedding has emerged as a convenient tool to mine and learn from networked data. As a result, it is of interest to develop HIN embedding methods. However, the heterogeneity in HINs introduces not only rich information but also potentially incompatible semantics, which poses special challenges to embedding learning in HINs. With the intention to preserve the rich yet potentially incompatible information in HIN embedding, we propose to study the problem of comprehensive transcription of heterogeneous information networks. The comprehensive transcription of HINs also provides an easy-to-use approach to unleash the power of HINs, since it requires no additional supervision, expertise, or feature engineering. To cope with the challenges in the comprehensive transcription of HINs, we propose the HEER algorithm, which embeds HINs via edge representations that are further coupled with properly-learned heterogeneous metrics. To corroborate the efficacy of HEER, we conducted experiments on two large-scale real-world datasets with an edge reconstruction task and multiple case studies. Experiment results demonstrate the effectiveness of the proposed HEER model and the utility of edge representations and heterogeneous metrics. The code and data are available at this https URL.

Posted Content
TL;DR: In this paper, the authors present a survey on the progress in and around the TSS problem and discuss current research trends and future research directions.
Abstract: Given a social network with diffusion probabilities as edge weights and an integer k, which k nodes should be chosen for initial injection of information to maximize influence in the network? This problem is known as Target Set Selection in a social network (TSS Problem) and, more popularly, as the Social Influence Maximization Problem (SIM Problem). This has been an active area of research in the computational social network analysis domain for the past decade and a half or so. Due to its practical importance in various domains, such as viral marketing, targeted advertising, and personalized recommendation, the problem has been studied in different variants, and different solution methodologies have been proposed over the years. Hence, there is a need for an organized and comprehensive review on this topic. This paper presents a survey on the progress in and around the TSS Problem. Finally, it discusses current research trends and future research directions as well.
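
The problem statement in the abstract above (choose k seed nodes to maximize expected spread, given diffusion probabilities on edges) can be illustrated with a well-known baseline: greedy seed selection with Monte Carlo simulation of an independent cascade. The sketch below is one such baseline under stated assumptions, not a method attributed to the survey; the function names, the number of simulation runs and the toy graph are hypothetical.

```python
import random


def simulate_cascade(graph, seeds, rng):
    """Run one independent cascade and return the number of activated nodes.

    graph: dict mapping node -> dict of {neighbour: diffusion probability}
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in graph.get(u, {}).items():
                # Each newly active node gets one chance to activate each neighbour.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)


def expected_spread(graph, seeds, runs, rng):
    """Monte Carlo estimate of the expected number of activated nodes."""
    return sum(simulate_cascade(graph, seeds, rng) for _ in range(runs)) / runs


def greedy_seeds(graph, k, runs=200, seed=0):
    """Greedily pick up to k seeds, each time adding the best marginal node."""
    rng = random.Random(seed)  # fixed seed so repeated calls agree
    nodes = set(graph)
    for neighbours in graph.values():
        nodes.update(neighbours)
    chosen = []
    for _ in range(min(k, len(nodes))):
        best_node, best_spread = None, -1.0
        for candidate in sorted(nodes - set(chosen)):
            spread = expected_spread(graph, chosen + [candidate], runs, rng)
            if spread > best_spread:
                best_node, best_spread = candidate, spread
        chosen.append(best_node)
    return chosen


# Toy example (hypothetical data): probabilities of 1.0 make the outcome deterministic.
toy_graph = {"a": {"b": 1.0, "c": 1.0}, "d": {"e": 1.0}}
picked = greedy_seeds(toy_graph, k=2, runs=10)
```

On the toy graph the first pick is `a` (it reaches three nodes) and the second is `d` (it adds two more). With probabilities below 1.0 the estimate is noisy, so `runs` trades accuracy against running time.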

Proceedings Article
TL;DR: In this article, the authors study inter-community interactions across 36,000 communities on Reddit, examining cases where users of one community are mobilized by negative sentiment to comment in another community.
Abstract: Users organize themselves into communities on web platforms. These communities can interact with one another, often leading to conflicts and toxic interactions. However, little is known about the mechanisms of interactions between communities and how they impact users. Here we study intercommunity interactions across 36,000 communities on Reddit, examining cases where users of one community are mobilized by negative sentiment to comment in another community. We show that such conflicts tend to be initiated by a handful of communities---less than 1% of communities start 74% of conflicts. While conflicts tend to be initiated by highly active community members, they are carried out by significantly less active members. We find that conflicts are marked by formation of echo chambers, where users primarily talk to other users from their own community. In the long-term, conflicts have adverse effects and reduce the overall activity of users in the targeted communities. Our analysis of user interactions also suggests strategies for mitigating the negative impact of conflicts---such as increasing direct engagement between attackers and defenders. Further, we accurately predict whether a conflict will occur by creating a novel LSTM model that combines graph embeddings, user, community, and text features. This model can be used to create early-warning systems for community moderators to prevent conflicts. Altogether, this work presents a data-driven view of community interactions and conflict, and paves the way towards healthier online communities.

Posted Content
TL;DR: In this paper, a large multi-modal dataset collected from Twitter during different natural disasters was used to address a number of crisis response and management tasks for different humanitarian organizations, and three types of annotations were provided.
Abstract: During natural and man-made disasters, people use social media platforms such as Twitter to post textual and multimedia content to report updates about injured or dead people, infrastructure damage, and missing or found people among other information types. Studies have revealed that this online information, if processed in a timely and effective manner, is extremely useful for humanitarian organizations to gain situational awareness and plan relief operations. In addition to the analysis of textual content, recent studies have shown that imagery content on social media can boost disaster response significantly. Despite extensive research that mainly focuses on textual content to extract useful information, limited work has focused on the use of imagery content or the combination of both content types. One of the reasons is the lack of labeled imagery data in this domain. Therefore, in this paper, we aim to tackle this limitation by releasing a large multi-modal dataset collected from Twitter during different natural disasters. We provide three types of annotations, which are useful to address a number of crisis response and management tasks for different humanitarian organizations.

Journal ArticleDOI
TL;DR: A typology of the Web’s false-information ecosystem, composed of various types of false information, actors, and their motives is provided, which pays particular attention to political false information as it can have dire consequences to the community and previous work shows that this type of false information propagates faster and further when compared to other types of false information.
Abstract: A new era of Information Warfare has arrived. Various actors, including state-sponsored ones, are weaponizing information on Online Social Networks to run false information campaigns with targeted manipulation of public opinion on specific topics. These false information campaigns can have dire consequences to the public: mutating their opinions and actions, especially with respect to critical world events like major elections. Evidently, the problem of false information on the Web is a crucial one, and needs increased public awareness, as well as immediate attention from law enforcement agencies, public institutions, and in particular, the research community. In this paper, we take a step in this direction by providing a typology of the Web's false information ecosystem, comprising various types of false information, actors, and their motives. We report a comprehensive overview of existing research on the false information ecosystem by identifying several lines of work: 1) how the public perceives false information; 2) understanding the propagation of false information; 3) detecting and containing false information on the Web; and 4) false information on the political stage. In this work, we pay particular attention to political false information as: 1) it can have dire consequences to the community (e.g., when election results are mutated) and 2) previous work shows that this type of false information propagates faster and further when compared to other types of false information. Finally, for each of these lines of work, we report several future research directions that can help us better understand and mitigate the emerging problem of false information dissemination on the Web.

Posted Content
TL;DR: It is found that hate instigators target more popular and high-profile Twitter users, and that participating in hate speech can result in greater online visibility; these results advance the state of the art of understanding online hate speech engagement.
Abstract: While social media has become an empowering agent to individual voices and freedom of expression, it also facilitates anti-social behaviors including online harassment, cyberbullying, and hate speech. In this paper, we present the first comparative study of hate speech instigators and target users on Twitter. Through a multi-step classification process, we curate a comprehensive hate speech dataset capturing various types of hate. We study the distinctive characteristics of hate instigators and targets in terms of their profile self-presentation, activities, and online visibility. We find that hate instigators target more popular and high profile Twitter users, and that participating in hate speech can result in greater online visibility. We conduct a personality analysis of hate instigators and targets and show that both groups have eccentric personality facets that differ from the general Twitter population. Our results advance the state of the art of understanding online hate speech engagement.

Posted Content
TL;DR: This paper detects and measures the propagation of memes across multiple Web communities, using a processing pipeline based on perceptual hashing and clustering techniques, and a dataset of 160M images from 2.6B posts gathered from Twitter, Reddit, 4chan's Politically Incorrect board, and Gab, over the course of 13 months.
Abstract: Internet memes are increasingly used to sway and manipulate public opinion. This prompts the need to study their propagation, evolution, and influence across the Web. In this paper, we detect and measure the propagation of memes across multiple Web communities, using a processing pipeline based on perceptual hashing and clustering techniques, and a dataset of 160M images from 2.6B posts gathered from Twitter, Reddit, 4chan's Politically Incorrect board (/pol/), and Gab, over the course of 13 months. We group the images posted on fringe Web communities (/pol/, Gab, and The_Donald subreddit) into clusters, annotate them using meme metadata obtained from Know Your Meme, and also map images from mainstream communities (Twitter and Reddit) to the clusters. Our analysis provides an assessment of the popularity and diversity of memes in the context of each community, showing, e.g., that racist memes are extremely common in fringe Web communities. We also find a substantial number of politics-related memes on both mainstream and fringe Web communities, supporting media reports that memes might be used to enhance or harm politicians. Finally, we use Hawkes processes to model the interplay between Web communities and quantify their reciprocal influence, finding that /pol/ substantially influences the meme ecosystem with the number of memes it produces, while The_Donald has a higher success rate in pushing them to other communities.

Posted Content
TL;DR: This survey gives an overview of network embeddings by summarizing and categorizing recent advancements in this research field, and discusses network embedding methods under different scenarios, such as supervised versus unsupervised learning, learning embeddings for homogeneous networks versus for heterogeneous networks, etc.
Abstract: Network embedding methods aim at learning low-dimensional latent representations of nodes in a network. These representations can be used as features for a wide range of tasks on graphs such as classification, clustering, link prediction, and visualization. In this survey, we give an overview of network embeddings by summarizing and categorizing recent advancements in this research field. We first discuss the desirable properties of network embeddings and briefly introduce the history of network embedding algorithms. Then, we discuss network embedding methods under different scenarios, such as supervised versus unsupervised learning, learning embeddings for homogeneous networks versus for heterogeneous networks, etc. We further demonstrate the applications of network embeddings, and conclude the survey with future work in this area.

Proceedings ArticleDOI
TL;DR: The Network Laplacian Spectral Descriptor (NetLSD) as discussed by the authors is the first graph representation method that is invariant to the order of nodes and the sizes of compared graphs, adaptive to the scale of graph patterns, and scalable.
Abstract: Comparison among graphs is ubiquitous in graph analytics. However, it is a hard task in terms of the expressiveness of the employed similarity measure and the efficiency of its computation. Ideally, graph comparison should be invariant to the order of nodes and the sizes of compared graphs, adaptive to the scale of graph patterns, and scalable. Unfortunately, these properties have not been addressed together. Graph comparisons still rely on direct approaches, graph kernels, or representation-based methods, which are all inefficient and impractical for large graph collections. In this paper, we propose the Network Laplacian Spectral Descriptor (NetLSD): the first, to our knowledge, permutation- and size-invariant, scale-adaptive, and efficiently computable graph representation method that allows for straightforward comparisons of large graphs. NetLSD extracts a compact signature that inherits the formal properties of the Laplacian spectrum, specifically its heat or wave kernel; thus, it hears the shape of a graph. Our evaluation on a variety of real-world graphs demonstrates that it outperforms previous works in both expressiveness and efficiency.

Posted Content
TL;DR: This paper analyzed 27k tweets posted by 1k Twitter users identified as having ties with Russia's Internet Research Agency and thus likely state-sponsored trolls, and quantified the influence that trolls had on the dissemination of news on social platforms like Twitter, Reddit, and 4chan.
Abstract: Over the past couple of years, anecdotal evidence has emerged linking coordinated campaigns by state-sponsored actors with efforts to manipulate public opinion on the Web, often around major political events, through dedicated accounts, or "trolls." Although they are often involved in spreading disinformation on social media, there is little understanding of how these trolls operate, what type of content they disseminate, and most importantly their influence on the information ecosystem. In this paper, we shed light on these questions by analyzing 27K tweets posted by 1K Twitter users identified as having ties with Russia's Internet Research Agency and thus likely state-sponsored trolls. We compare their behavior to a random set of Twitter users, finding interesting differences in terms of the content they disseminate, the evolution of their account, as well as their general behavior and use of Twitter. Then, using Hawkes Processes, we quantify the influence that trolls had on the dissemination of news on social platforms like Twitter, Reddit, and 4chan. Overall, our findings indicate that Russian trolls managed to stay active for long periods of time and to reach a substantial number of Twitter users with their tweets. When looking at their ability to spread news content and make it viral, however, we find that their effect on social platforms was minor, with the significant exception of news published by the Russian state-sponsored news outlet RT (Russia Today).

Proceedings ArticleDOI
TL;DR: Experimental results show that the proposed GraphSGAN significantly outperforms several state-of-the-art methods and can also be trained using mini-batches, and thus enjoys a scalability advantage.
Abstract: We investigate how generative adversarial nets (GANs) can help semi-supervised learning on graphs. We first provide insights on the working principles of adversarial learning over graphs and then present GraphSGAN, a novel approach to semi-supervised learning on graphs. In GraphSGAN, generator and classifier networks play a novel competitive game. At equilibrium, the generator generates fake samples in low-density areas between subgraphs. In order to discriminate fake samples from the real ones, the classifier implicitly takes the density property of subgraphs into consideration. An efficient adversarial learning algorithm has been developed to improve traditional normalized graph Laplacian regularization with a theoretical guarantee. Experimental results on several different genres of datasets show that the proposed GraphSGAN significantly outperforms several state-of-the-art methods. GraphSGAN can also be trained using mini-batches, and thus enjoys a scalability advantage.