
Showing papers by "Aristides Gionis" published in 2017


Proceedings ArticleDOI
02 Feb 2017
TL;DR: This paper presents a simple model, based on a recently developed user-level controversy score, that is competitive with state-of-the-art link-prediction algorithms, and proposes an efficient algorithm that considers only a fraction of all possible combinations of edges.
Abstract: Society is often polarized by controversial issues that split the population into groups with opposing views. When such issues emerge on social media, we often observe the creation of `echo chambers', i.e., situations where like-minded people reinforce each other's opinion, but do not get exposed to the views of the opposing side. In this paper we study algorithmic techniques for bridging these chambers, and thus reducing controversy. Specifically, we represent the discussion on a controversial issue with an endorsement graph, and cast our problem as an edge-recommendation problem on this graph. The goal of the recommendation is to reduce the controversy score of the graph, which is measured by a recently developed metric based on random walks. At the same time, we take into account the acceptance probability of the recommended edge, which represents how likely the edge is to materialize in the endorsement graph. We propose a simple model, based on a recently developed user-level controversy score, that is competitive with state-of-the-art link-prediction algorithms. Our goal then becomes finding the edges that produce the largest reduction in the controversy score, in expectation. To solve this problem, we propose an efficient algorithm that considers only a fraction of all the possible combinations of edges. Experimental results show that our algorithm is more efficient than a simple greedy heuristic, while producing comparable score reduction. Finally, a comparison with other state-of-the-art edge-addition algorithms shows that this problem is fundamentally different from what has been studied in the literature.
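To make the objective concrete, the sketch below ranks candidate edges by acceptance-weighted expected reduction of a controversy score. The personalized-PageRank approximation of the random-walk controversy (RWC) score, the function names, and the caller-supplied accept_prob model are illustrative assumptions rather than the authors' implementation.

```python
# Sketch: rank candidate edges by expected reduction of a random-walk
# controversy (RWC) score. The personalized-PageRank approximation below and
# the caller-supplied accept_prob(u, v) model are illustrative assumptions.
import networkx as nx

def rwc_score(G, side_x, side_y, k=5):
    """Approximate RWC: mass of walks started on each side that is absorbed
    by that side's top-k hubs versus by the other side's hubs."""
    hubs_x = sorted(side_x, key=G.degree, reverse=True)[:k]
    hubs_y = sorted(side_y, key=G.degree, reverse=True)[:k]
    def mass(start_side, hubs):
        pr = nx.pagerank(G, personalization={v: 1.0 for v in start_side})
        return sum(pr[v] for v in hubs)
    pxx, pxy = mass(side_x, hubs_x), mass(side_x, hubs_y)
    pyy, pyx = mass(side_y, hubs_y), mass(side_y, hubs_x)
    return pxx * pyy - pxy * pyx

def rank_candidate_edges(G, side_x, side_y, candidates, accept_prob):
    """Expected reduction = acceptance probability * drop in the RWC score."""
    base = rwc_score(G, side_x, side_y)
    scored = []
    for u, v in candidates:
        H = G.copy()
        H.add_edge(u, v)
        reduction = base - rwc_score(H, side_x, side_y)
        scored.append(((u, v), accept_prob(u, v) * reduction))
    return sorted(scored, key=lambda t: t[1], reverse=True)
```

This sketch scores each candidate edge in isolation, whereas the algorithm in the paper searches over combinations of edges without enumerating all of them.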

168 citations


Posted Content
TL;DR: This paper addresses the problem of balancing information exposure in a social network; balance is modeled by a symmetric difference function, which is neither monotone nor submodular, and thus not amenable to existing approaches.
Abstract: Social media has brought a revolution in how people consume news. Beyond the undoubtedly large number of advantages brought by social-media platforms, a point of criticism has been the creation of echo chambers and filter bubbles, caused by social homophily and algorithmic personalization. In this paper we address the problem of balancing the information exposure in a social network. We assume that two opposing campaigns (or viewpoints) are present in the network, and that network nodes have different preferences towards these campaigns. Our goal is to find two sets of nodes to employ in the respective campaigns, so that the overall information exposure for the two campaigns is balanced. We formally define the problem, characterize its hardness, develop approximation algorithms, and present experimental evaluation results. Our model is inspired by the literature on influence maximization, but we offer significant novelties. First, balance of information exposure is modeled by a symmetric difference function, which is neither monotone nor submodular, and thus not amenable to existing approaches. Second, while previous papers consider a setting with selfish agents and provide bounds on best response strategies (i.e., move of the last player), we consider a setting with a centralized agent and provide bounds for a global objective function.
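A minimal sketch of the objective, assuming an independent-cascade propagation model with a uniform edge probability (a simplification of the setting above): two cascades are simulated from the seed sets and balance is measured as the number of nodes exposed to both campaigns or to neither, i.e., the complement of the symmetric difference. The greedy routine is only a heuristic here, since the objective is neither monotone nor submodular.

```python
# Sketch: Monte-Carlo evaluation of the balance objective and a greedy seed
# selection. Assumptions: independent-cascade propagation with a uniform
# edge probability p; adj maps each node to its out-neighbors.
import random

def cascade(adj, seeds, p, rng):
    """One independent-cascade run; returns the set of exposed nodes."""
    exposed, frontier = set(seeds), list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            for v in adj.get(u, ()):
                if v not in exposed and rng.random() < p:
                    exposed.add(v)
                    nxt.append(v)
        frontier = nxt
    return exposed

def balance(adj, n, s1, s2, p=0.1, runs=200, seed=0):
    """Expected number of nodes exposed to both campaigns or to neither."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        e1, e2 = cascade(adj, s1, p, rng), cascade(adj, s2, p, rng)
        total += n - len(e1 ^ e2)          # n - |symmetric difference|
    return total / runs

def greedy_seeds(adj, nodes, k, p=0.1):
    """Heuristic only: the objective is neither monotone nor submodular."""
    n, s1, s2 = len(nodes), set(), set()
    for _ in range(k):
        free = [v for v in nodes if v not in s1 | s2]
        cands = [(balance(adj, n, s1 | {v}, s2, p), 1, v) for v in free]
        cands += [(balance(adj, n, s1, s2 | {v}, p), 2, v) for v in free]
        _, side, v = max(cands)
        (s1 if side == 1 else s2).add(v)
    return s1, s2
```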

49 citations


Proceedings ArticleDOI
25 Jun 2017
TL;DR: This work is the first to study the dynamic evolution of polarized online debates at such scale and finds consistent evidence that increased collective attention is associated with increased network polarization and network concentration within each side of the debate.
Abstract: We study the evolution of long-lived controversial debates as manifested on Twitter from 2011 to 2016. Specifically, we explore how the structure of interactions and content of discussion varies with the level of collective attention, as evidenced by the number of users discussing a topic. Spikes in the volume of users typically correspond to external events that increase the public attention on the topic - as, for instance, discussions about 'gun control' often erupt after a mass shooting. This work is the first to study the dynamic evolution of polarized online debates at such scale. By employing a wide array of network and content analysis measures, we find consistent evidence that increased collective attention is associated with increased network polarization and network concentration within each side of the debate; and overall more uniform lexicon usage across all users.

38 citations


Journal ArticleDOI
TL;DR: It is proved that the problem of discovering dynamic dense subgraphs whose edges occur in short time intervals is NP-hard, and efficient algorithms are provided by adapting techniques for finding dense subgraphs.
Abstract: Online social networks are often defined by considering interactions of entities at an aggregate level. For example, a call graph is formed among individuals who have called each other at least once; or at least k times. Similarly, in social-media platforms, we consider implicit social networks among users who have interacted in some way, e.g., have made a conversation, have commented on each other's content, and so on. Such definitions have been used widely in the literature and they have offered significant insights regarding the structure of social networks. However, it is obvious that they suffer from a severe limitation: They neglect the precise time that interactions among the network entities occur. In this article, we consider interaction networks, where the data description contains not only information about the underlying topology of the social network, but also the exact time instances that network entities interact. In an interaction network, an edge is associated with a timestamp, and multiple edges may occur for the same pair of entities. Consequently, interaction networks offer a more fine-grained representation, which can be leveraged to reveal otherwise hidden dynamic phenomena. In the setting of interaction networks, we study the problem of discovering dynamic dense subgraphs whose edges occur in short time intervals. We view such subgraphs as fingerprints of dynamic activity occurring within network communities. Such communities represent groups of individuals who interact with each other in specific time instances, for example, a group of employees who work on a project and whose interaction intensifies before certain project milestones. We prove that the problem we define is NP-hard, and we provide efficient algorithms by adapting techniques for finding dense subgraphs. We also show how to speed up the proposed methods by exploiting concavity properties of our objective function and by means of fractional programming. We perform extensive evaluation of the proposed methods on synthetic and real datasets, which demonstrates the validity of our approach and shows that our algorithms can be used to obtain high-quality results.
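The basic quantity being optimized can be illustrated with a small sketch that evaluates the density of a node set restricted to interactions falling in a time interval; the full objective in the paper also constrains the total length of the intervals.

```python
# Sketch: evaluate the density |E(S, [a, b])| / |S| of a node set S restricted
# to interactions whose timestamps fall inside the interval [a, b]. The
# paper's full objective also constrains the interval lengths; this snippet
# only illustrates the basic quantity being optimized.
def interval_density(interactions, S, a, b):
    """interactions: iterable of (u, v, t) tuples; S: set of nodes."""
    S = set(S)
    edges = {frozenset((u, v))
             for u, v, t in interactions
             if a <= t <= b and u in S and v in S and u != v}
    return len(edges) / max(len(S), 1)

# Example: a small burst of activity among {1, 2, 3} around t = 10.
events = [(1, 2, 9), (2, 3, 10), (1, 3, 11), (1, 4, 50)]
print(interval_density(events, {1, 2, 3}, 8, 12))   # 3 edges / 3 nodes = 1.0
```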

30 citations


Journal ArticleDOI
TL;DR: It is shown that it is possible to detect controversy in social media by exploiting network motifs, i.e., local patterns of user interaction, which allows for a language-independent and fine-grained analysis of user discussions and their evolution over time.

29 citations


Proceedings ArticleDOI
06 Nov 2017
TL;DR: This paper studies the top-k densest-subgraph problem in the sliding-window model and proposes an efficient fully-dynamic algorithm that profits from the observation that updates only affect a limited region of the graph.
Abstract: Given a large graph, the densest-subgraph problem asks to find a subgraph with maximum average degree. When considering the top-k version of this problem, a naive solution is to iteratively find the densest subgraph and remove it in each iteration. However, such a solution is impractical due to high processing cost. The problem is further complicated when dealing with dynamic graphs, since adding or removing an edge requires re-running the algorithm. In this paper, we study the top-k densest-subgraph problem in the sliding-window model and propose an efficient fully-dynamic algorithm. The input of our algorithm consists of an edge stream, and the goal is to find the node-disjoint subgraphs that maximize the sum of their densities. In contrast to existing state-of-the-art solutions that require iterating over the entire graph upon any update, our algorithm profits from the observation that updates only affect a limited region of the graph. Therefore, the top-k densest subgraphs are maintained by only applying local updates. We provide a theoretical analysis of the proposed algorithm and show empirically that the algorithm often generates denser subgraphs than state-of-the-art competitors. Experiments show an improvement in efficiency of up to five orders of magnitude compared to state-of-the-art solutions.
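The sketch below illustrates the setting, assuming a time-based window of width W: edges of the last W time units are retained and k node-disjoint dense subgraphs are extracted with classic greedy peeling. It recomputes from the window snapshot for clarity, whereas the paper's contribution is maintaining the solution with local updates only.

```python
# Sketch: keep the edges of the last W time units and, on demand, extract k
# node-disjoint dense subgraphs by repeated greedy peeling (Charikar's
# 2-approximation for a single densest subgraph). The paper maintains this
# under local updates; here we simply recompute from the window snapshot.
from collections import deque
import networkx as nx

def densest_by_peeling(G):
    """Return (best_density, best_node_set) via greedy min-degree peeling."""
    H = G.copy()
    best_density, best_nodes = 0.0, set(H.nodes())
    while H.number_of_nodes() > 0:
        density = H.number_of_edges() / H.number_of_nodes()
        if density >= best_density:
            best_density, best_nodes = density, set(H.nodes())
        H.remove_node(min(H.nodes(), key=H.degree))
    return best_density, best_nodes

def top_k_dense(edges_in_window, k):
    G = nx.Graph()
    G.add_edges_from((u, v) for u, v, _ in edges_in_window)
    result = []
    for _ in range(k):
        if G.number_of_edges() == 0:
            break
        density, nodes = densest_by_peeling(G)
        result.append((density, nodes))
        G.remove_nodes_from(nodes)        # enforce node-disjointness
    return result

# Sliding-window maintenance over an edge stream of (u, v, t) items.
window, W = deque(), 3600
def update(u, v, t):
    window.append((u, v, t))
    while window and window[0][2] <= t - W:
        window.popleft()
```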

24 citations


Journal ArticleDOI
TL;DR: A new method analyzes team activity data by segmenting the overall activity stream into a sequence of potentially recurrent modes, which reflect different strategies adopted by a team and thus help to analyze and understand team tactics.
Abstract: Recent advances in data-acquisition technologies have equipped team coaches and sports analysts with the capability of collecting and analyzing detailed data of team activity in the field. It is now possible to monitor a sports event and record information regarding the position of the players in the field, passing the ball, coordinated moves, and so on. In this paper we propose a new method to analyze such team activity data. Our goal is to segment the overall activity stream into a sequence of potentially recurrent modes, which reflect different strategies adopted by a team, and thus, help to analyze and understand team tactics. We model team activity data as a temporal network, that is, a sequence of time-stamped edges that capture interactions between players. We then formulate the problem of identifying a small number of team modes and segmenting the overall timespan so that each segment can be mapped to one of the team modes; hence the set of modes summarizes the overall team activity. We prove that the resulting optimization problem is NP-hard, and we discuss its properties. We then present a number of different algorithms for solving the problem, including an approximation algorithm that is practical only for one mode, as well as heuristic methods based on iterative and greedy approaches. We benchmark the performance of our algorithms on real and synthetic datasets. Of all methods, the iterative algorithm provides the best combination of performance and running time. We demonstrate practical examples of the insights provided by our algorithms when mining real sports-activity data. In addition, we show the applicability of our algorithms on other types of data, such as social networks.
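As a rough illustration of the mode idea (not the paper's algorithm), the sketch below summarizes fixed-length windows of the time-stamped edge stream as pair-interaction count vectors and clusters them with k-means; the cluster labels along time then play the role of a segmentation into recurrent modes, whereas the paper fits modes and segment boundaries jointly.

```python
# Sketch: a crude stand-in for mode discovery. Fixed-length windows of the
# time-stamped edge stream are summarized as pair-interaction count vectors
# and clustered with k-means; each cluster plays the role of a recurrent
# "mode", and the labels along time give a segmentation.
import numpy as np
from sklearn.cluster import KMeans

def window_vectors(interactions, window, pairs):
    """interactions: (u, v, t) tuples; pairs: fixed list of (u, v) pairs."""
    t_max = max(t for _, _, t in interactions)
    index = {frozenset(p): i for i, p in enumerate(pairs)}
    n_windows = int(t_max // window) + 1
    X = np.zeros((n_windows, len(pairs)))
    for u, v, t in interactions:
        key = frozenset((u, v))
        if key in index:
            X[int(t // window), index[key]] += 1
    return X

def team_modes(interactions, window, pairs, n_modes):
    X = window_vectors(interactions, window, pairs)
    labels = KMeans(n_clusters=n_modes, n_init=10, random_state=0).fit_predict(X)
    return labels            # one mode label per time window
```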

21 citations


Proceedings ArticleDOI
13 Aug 2017
TL;DR: In this paper, the authors study the problem of inferring the strength of social ties in a given network, motivated by a recent approach by Sintos et al., which leverages the Strong Triadic Closure (STC) principle, a hypothesis rooted in social psychology.
Abstract: Online social networks are growing and becoming denser. The social connections of a given person may have very high variability: from close friends and relatives to acquaintances to people they hardly know. Inferring the strength of social ties is an important ingredient for modeling the interaction of users in a network and understanding their behavior. Furthermore, the problem has applications in computational social science, viral marketing, and people recommendation. In this paper we study the problem of inferring the strength of social ties in a given network. Our work is motivated by a recent approach by Sintos et al. [24], which leverages the Strong Triadic Closure (STC) principle, a hypothesis rooted in social psychology. To guide our inference process, in addition to the network structure, we also consider as input a collection of tight communities. Those are sets of vertices that we expect to be connected via strong ties. Such communities appear in different situations, e.g., when being part of a community implies a strong connection to one of the existing members. We consider two related problem formalizations that reflect the assumptions of our setting: small number of STC violations and strong-tie connectivity in the input communities. We show that both problem formulations are NP-hard. We also show that one problem formulation is hard to approximate, while for the second we develop an algorithm with an approximation guarantee. We validate the proposed method on real-world datasets by comparing with baselines that optimize STC violations and community connectivity separately.
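The two ingredients of the formulations can be checked with a short sketch: counting STC violations of a strong/weak edge labeling (wedges with two strong edges and no closing edge) and verifying that each input community stays connected using strong ties only. The function names and input conventions are illustrative.

```python
# Sketch: count Strong Triadic Closure (STC) violations for a given
# strong/weak edge labeling (a violation is a wedge u-v, u-w with both edges
# strong while v and w are not adjacent), and check that each input
# community is connected via strong ties only.
from itertools import combinations
import networkx as nx

def stc_violations(G, strong):
    """G: undirected graph; strong: set of frozenset({u, v}) strong edges."""
    violations = 0
    for u in G:
        strong_nbrs = [v for v in G[u] if frozenset((u, v)) in strong]
        for v, w in combinations(strong_nbrs, 2):
            if not G.has_edge(v, w):
                violations += 1
    return violations

def communities_strongly_connected(G, strong, communities):
    """True if every community is connected in the strong-tie subgraph."""
    S = nx.Graph()
    S.add_nodes_from(G)
    S.add_edges_from(tuple(e) for e in strong)
    return all(nx.is_connected(S.subgraph(c)) for c in communities)
```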

20 citations


Proceedings Article
01 Sep 2017
TL;DR: In this paper, the authors address the problem of balancing the information exposure in a social network, where two opposing campaigns (or viewpoints) are present in the network, and nodes have different preferences towards these campaigns.
Abstract: Social media has brought a revolution in how people consume news. Beyond the undoubtedly large number of advantages brought by social-media platforms, a point of criticism has been the creation of echo chambers and filter bubbles, caused by social homophily and algorithmic personalization. In this paper we address the problem of balancing the information exposure in a social network. We assume that two opposing campaigns (or viewpoints) are present in the network, and that network nodes have different preferences towards these campaigns. Our goal is to find two sets of nodes to employ in the respective campaigns, so that the overall information exposure for the two campaigns is balanced. We formally define the problem, characterize its hardness, develop approximation algorithms, and present experimental evaluation results. Our model is inspired by the literature on influence maximization, but we offer significant novelties. First, balance of information exposure is modeled by a symmetric difference function, which is neither monotone nor submodular, and thus not amenable to existing approaches. Second, while previous papers consider a setting with selfish agents and provide bounds on best response strategies (i.e., move of the last player), we consider a setting with a centralized agent and provide bounds for a global objective function.

19 citations


Proceedings ArticleDOI
06 Nov 2017
TL;DR: This paper introduces two novel relative-query strategies, TopMatchings and GibbsMatchings, which can be applied on top of any network alignment method that constructs and solves a bipartite matching problem.
Abstract: Network alignment is the problem of matching the nodes of two graphs, maximizing the similarity of the matched nodes and the edges between them. This problem is encountered in a wide array of applications---from biological networks to social networks to ontologies---where multiple networked data sources need to be integrated. Due to the difficulty of the task, an accurate alignment can rarely be found without human assistance. Thus, it is of great practical importance to develop network alignment algorithms that can optimally leverage experts who are able to provide the correct alignment for a small number of nodes. Yet, only a handful of existing works address this active network alignment setting. The majority of the existing active methods focus on absolute queries ("are nodes a and b the same or not?"), whereas we argue that it is generally easier for a human expert to answer relative queries ("which node in the set b1,...,bn is the most similar to node a?"). This paper introduces two novel relative-query strategies, TopMatchings and GibbsMatchings, which can be applied on top of any network alignment method that constructs and solves a bipartite matching problem. Our methods identify the most informative nodes to query by sampling the matchings of the bipartite graph associated with the network-alignment instance. We compare the proposed approaches to several commonly-used query strategies and perform experiments on both synthetic and real-world datasets. Our sampling-based strategies yield the highest overall performance, outperforming all the baseline methods by more than 15 percentage points in some cases. In terms of accuracy, TopMatchings and GibbsMatchings perform comparably. However, GibbsMatchings is significantly more scalable, although it requires tuning a temperature hyperparameter.
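A sketch of the sampling idea, with a stand-in sampler: instead of a Gibbs sampler, the similarity matrix is perturbed with Gumbel noise at a temperature T and the assignment problem is re-solved for each sample; the node whose matched partner varies the most across samples is queried next. Only the query-selection principle is taken from the abstract; the sampler and parameter names are assumptions.

```python
# Sketch: pick the most informative node to query by sampling bipartite
# matchings. Stand-in sampler: perturb the similarity matrix with Gumbel
# noise and re-solve the assignment problem; the node whose matched partner
# has the highest entropy across samples is queried next.
import numpy as np
from collections import Counter
from scipy.optimize import linear_sum_assignment

def sample_matchings(sim, n_samples=50, temperature=1.0, seed=0):
    rng = np.random.default_rng(seed)
    matches = []
    for _ in range(n_samples):
        noisy = sim + temperature * rng.gumbel(size=sim.shape)
        rows, cols = linear_sum_assignment(noisy, maximize=True)
        matches.append(dict(zip(rows, cols)))
    return matches

def most_informative_node(sim, **kwargs):
    matches = sample_matchings(sim, **kwargs)
    def entropy(counts, n):
        p = np.array(list(counts.values())) / n
        return float(-(p * np.log(p)).sum())
    entropies = {
        i: entropy(Counter(m[i] for m in matches), len(matches))
        for i in range(sim.shape[0])
    }
    return max(entropies, key=entropies.get)   # query this node next
```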

19 citations


Proceedings ArticleDOI
25 Jun 2017
TL;DR: The proposed Geo-Group-Recommender (GGR), a class of hybrid recommender systems that combines group geographical preferences (via Kernel Density Estimation), category and location features, and group check-ins, is shown to outperform a large number of other recommender systems.
Abstract: Location-Based Social Networks (LBSNs) enable their users to share with their friends the places they go to and whom they go with. Additionally, they provide users with recommendations for Points of Interest (POI) they have not visited before. This functionality is of great importance for users of LBSNs, as it allows them to discover interesting places in populous cities that are not easy to explore. For this reason, previous research has focused on providing recommendations to LBSN users. Nevertheless, while most existing work focuses on recommendations for individual users, techniques to provide recommendations to groups of users are scarce. In this paper, we consider the problem of recommending a list of POIs to a group of users in the areas that the group frequents. Our data consist of activity on Swarm, a social networking app by Foursquare, and our results demonstrate that our proposed Geo-Group-Recommender (GGR), a class of hybrid recommender systems that combines group geographical preferences (via Kernel Density Estimation), category and location features, and group check-ins, outperforms a large number of other recommender systems. Moreover, we find evidence that user preferences differ both in venue category and in location between individual and group activities. We also show that combining individual recommendations using group aggregation strategies is not as good as building a profile for a group. Our experiments show that GGR outperforms the baselines in terms of precision and recall at different cutoffs.
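The geographical component can be sketched as follows, assuming check-ins are given as (lat, lon) pairs: a kernel density estimate is fit over the group's past check-ins and candidate POIs are scored by the estimated density at their location. The full GGR model additionally combines category, location, and group check-in features.

```python
# Sketch of the geographical component only: fit a kernel density estimate
# over the group's past check-in coordinates and score candidate POIs by the
# estimated density at their location.
import numpy as np
from scipy.stats import gaussian_kde

def geo_scores(group_checkins, candidate_pois):
    """group_checkins: sequence of (lat, lon) pairs;
    candidate_pois: dict poi_id -> (lat, lon)."""
    kde = gaussian_kde(np.asarray(group_checkins).T)
    ids = list(candidate_pois)
    coords = np.array([candidate_pois[p] for p in ids]).T
    return dict(zip(ids, kde(coords)))

checkins = [(60.17, 24.94), (60.16, 24.95), (60.18, 24.94), (60.17, 24.96)]
pois = {"cafe": (60.17, 24.94), "far_away": (61.50, 23.80)}
print(geo_scores(checkins, pois))   # the nearby cafe gets the higher density
```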

Proceedings Article
01 Jan 2017
TL;DR: In this paper, the authors propose a method to detect controversy in social media by exploiting network motifs, i.e., local patterns of user interaction, which allows for a language-independent, fine-grained, and efficient-to-compute analysis of user discussions and their evolution over time.
Abstract: Among the topics discussed in Social Media, some lead to controversy. A number of recent studies have focused on the problem of identifying controversy in social media, mostly based on the analysis of textual content or on the global network structure. Such approaches have strong limitations due to the difficulty of understanding natural language, and of investigating the global network structure. In this work we show that it is possible to detect controversy in social media by exploiting network motifs, i.e., local patterns of user interaction. The proposed approach allows for a language-independent, fine-grained, and efficient-to-compute analysis of user discussions and their evolution over time. The supervised model exploiting motif patterns can achieve 85% accuracy, with an improvement of 7% compared to baselines using structural, propagation-based, and temporal network features.
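A hedged sketch of the pipeline, using networkx's 16-type directed triad census as a stand-in for the paper's motif set and logistic regression as the supervised model; the actual motif definitions and classifier may differ.

```python
# Sketch: represent each discussion by counts of local interaction patterns
# (here, the directed triad census stands in for the paper's motif set) and
# train a supervised classifier to separate controversial topics.
import numpy as np
import networkx as nx
from sklearn.linear_model import LogisticRegression

TRIAD_TYPES = ['003', '012', '102', '021D', '021U', '021C', '111D', '111U',
               '030T', '030C', '201', '120D', '120U', '120C', '210', '300']

def motif_features(interaction_graph):
    """interaction_graph: nx.DiGraph of user interactions for one topic."""
    census = nx.triadic_census(interaction_graph)
    counts = np.array([census[t] for t in TRIAD_TYPES], dtype=float)
    return counts / max(counts.sum(), 1.0)   # normalize per topic

def train_controversy_classifier(graphs, labels):
    X = np.vstack([motif_features(g) for g in graphs])
    return LogisticRegression(max_iter=1000).fit(X, labels)
```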

Journal ArticleDOI
TL;DR: It is proved that the basic problem of discrepancy maximization on graphs is NP-hard, and the performance of four heuristics for solving it is evaluated empirically.
Abstract: We study the problem of discrepancy maximization on graphs: given a set of nodes $Q$ of an underlying graph $G$, we aim to identify a connected subgraph of $G$ that contains many more nodes from $Q$ than other nodes. This variant of the discrepancy-maximization problem extends the well-known notion of “bump hunting” in the Euclidean space [1]. We consider the problem under two access models. In the unrestricted-access model, the whole graph $G$ is given as input, while in the local-access model we can only retrieve the neighbors of a given node in $G$ using a possibly slow and costly interface. We prove that the basic problem of discrepancy maximization on graphs is NP-hard, and empirically evaluate the performance of four heuristics for solving it. For the local-access model, we consider three different algorithms that aim to recover a part of $G$ large enough to contain an optimal solution, while using only a small number of calls to the neighbor-function interface. We perform a thorough experimental evaluation in order to understand the trade-offs between the proposed methods and their dependencies on characteristics of the input graph.
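For the unrestricted-access model, one simple heuristic (an illustration, not necessarily one of the four evaluated in the paper) grows a connected subgraph from a seed node in Q and adds boundary nodes while a linear discrepancy score alpha*|S ∩ Q| - beta*|S \ Q| keeps improving; the weights and the stopping rule are illustrative.

```python
# Sketch: grow a connected subgraph from a seed node, always adding the
# boundary node with the best marginal change of the linear discrepancy
# score alpha*|S & Q| - beta*|S - Q|. This toy greedy stops at the first
# non-improving step; stronger heuristics allow stepping through
# negative-gain nodes to reach further Q-nodes.
def greedy_discrepancy(adj, Q, seed, alpha=1.0, beta=1.0):
    Q = set(Q)
    S = {seed}
    score = alpha * len(S & Q) - beta * len(S - Q)
    while True:
        boundary = {v for u in S for v in adj[u]} - S
        if not boundary:
            break
        gains = {v: (alpha if v in Q else -beta) for v in boundary}
        v, gain = max(gains.items(), key=lambda kv: kv[1])
        if gain <= 0:
            break
        S.add(v)
        score += gain
    return S, score
```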

Posted Content
TL;DR: It is shown that it is possible to detect controversy in social media by exploiting network motifs, i.e., local patterns of user interaction, which allows for a language-independent, fine-grained, and efficient-to-compute analysis of user discussions and their evolution over time.
Abstract: Among the topics discussed in Social Media, some lead to controversy. A number of recent studies have focused on the problem of identifying controversy in social media, mostly based on the analysis of textual content or on the global network structure. Such approaches have strong limitations due to the difficulty of understanding natural language, and of investigating the global network structure. In this work we show that it is possible to detect controversy in social media by exploiting network motifs, i.e., local patterns of user interaction. The proposed approach allows for a language-independent, fine-grained, and efficient-to-compute analysis of user discussions and their evolution over time. The supervised model exploiting motif patterns can achieve 85% accuracy, with an improvement of 7% compared to baselines using structural, propagation-based, and temporal network features.

Proceedings ArticleDOI
03 Apr 2017
TL;DR: The demo provides one of the first steps in developing automated tools that help users explore, and possibly escape, their echo chambers by exposing users to information which presents a contrarian point of view.
Abstract: Polarized topics often spark discussion and debate on social media. Recent studies have shown that polarized debates have a specific clustered structure in the endorsement network, which indicates that users direct their endorsements mostly to ideas they already agree with. Understanding these polarized discussions and exposing social media users to content that broadens their views is of paramount importance. The contribution of this demonstration is two-fold. (i) A tool to visualize retweet networks about controversial issues on Twitter. By using our visualization, users can understand how polarized discussions are shaped on Twitter, and explore the positions of the various actors. (ii) A solution to reduce polarization of such discussions. We do so by exposing users to information which presents a contrarian point of view. Users can visually inspect our recommendations and understand why and how these would play out in terms of the retweet network. Our demo (https://users.ics.aalto.fi/kiran/reducingControversy/homepage) provides one of the first steps in developing automated tools that help users explore, and possibly escape, their echo chambers. The ideas in the demo can also help content providers design tools to broaden their reach to people with different political and ideological backgrounds.

Journal ArticleDOI
TL;DR: The approach, called Flan, introduces the idea of generalizing the facility location problem by adding a non-linear term to capture edge similarities and to infer the underlying entity network in order to solve the multiple network alignment problem.
Abstract: We propose a principled approach for the problem of aligning multiple partially overlapping networks. The objective is to map multiple graphs into a single graph while preserving vertex and edge similarities. The problem is inspired by the task of integrating partial views of a family tree (genealogical network) into one unified network, but it also has applications, for example, in social and biological networks. Our approach, called Flan, introduces the idea of generalizing the facility location problem by adding a non-linear term to capture edge similarities and to infer the underlying entity network. The problem is solved using an alternating optimization procedure with a Lagrangian relaxation. Flan has the advantage of being able to leverage prior information on the number of entities, so that when this information is available, Flan is shown to work robustly without the need to use any ground truth data for fine-tuning method parameters. Additionally, we present three multiple-network extensions to an existing state-of-the-art pairwise alignment method called Natalie. Extensive experiments on synthetic, as well as real-world datasets on social networks and genealogical networks, attest to the effectiveness of the proposed approaches which clearly outperform a popular multiple network alignment method called IsoRankN.

Posted Content
TL;DR: The demo provides one of the first steps in developing automated tools that help users explore, and possibly escape, their echo chambers and expose users to information which presents a contrarian point of view.
Abstract: Polarized topics often spark discussion and debate on social media. Recent studies have shown that polarized debates have a specific clustered structure in the endorsement network, which indicates that users direct their endorsements mostly to ideas they already agree with. Understanding these polarized discussions and exposing social media users to content that broadens their views is of paramount importance. The contribution of this demonstration is two-fold. (i) A tool to visualize retweet networks about controversial issues on Twitter. By using our visualization, users can understand how polarized discussions are shaped on Twitter, and explore the positions of the various actors. (ii) A solution to reduce polarization of such discussions. We do so by exposing users to information which presents a contrarian point of view. Users can visually inspect our recommendations and understand why and how these would play out in terms of the retweet network. Our demo (https://users.ics.aalto.fi/kiran/reducingControversy/homepage) provides one of the first steps in developing automated tools that help users explore, and possibly escape, their echo chambers. The ideas in the demo can also help content providers design tools to broaden their reach to people with different political and ideological backgrounds.

Proceedings ArticleDOI
31 Jul 2017
TL;DR: An expert-finding algorithm for Twitter is introduced, which can be generalized to find topical experts in any social network with endorsement features, and which significantly improves on query-dependent PageRank, outperforms the current publicly-known state-of-the-art methods, and is competitive with Twitter's own search system.
Abstract: Finding topical experts on micro-blogging sites, such as Twitter, is an essential information-seeking task. In this paper, we introduce an expert-finding algorithm for Twitter, which can be generalized to find topical experts in any social network with endorsement features. Our approach combines traditional link analysis with text mining. It relies on crowd-sourced data from Twitter lists to build a labeled directed graph called the endorsement graph, which captures topical expertise as perceived by users. Given a text query, our algorithm uses a dynamic topic-sensitive weighting scheme, which sets the weights on the edges of the graph. Then, it uses an improved version of query-dependent PageRank to find important nodes in the graph, which correspond to topical experts. In addition, we address the scalability and performance issues posed by large social networks by pruning the input graph via a focused-crawling algorithm. Extensive evaluation on a number of different topics demonstrates that the proposed approach significantly improves on query-dependent PageRank, outperforms the current publicly-known state-of-the-art methods, and is competitive with Twitter's own search system, while using less than 0.05% of all Twitter accounts.
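A compact sketch of the ranking step, assuming endorsement edges carry crowd-sourced labels (e.g., Twitter-list names): labels are turned into query-relevance edge weights and a personalized PageRank restarts from nodes that already receive on-topic endorsements. The keyword-overlap relevance function and restart distribution are placeholder assumptions, not the paper's weighting scheme.

```python
# Sketch: query-dependent ranking on a labeled endorsement graph. Edge labels
# are turned into query-relevance weights and personalized PageRank restarts
# from nodes that already receive on-topic endorsements.
import networkx as nx

def relevance(label, query):
    q = set(query.lower().split())
    return len(q & set(label.lower().split())) / max(len(q), 1)

def topical_experts(endorsements, query, top=10):
    """endorsements: iterable of (src, dst, label) triples."""
    G = nx.DiGraph()
    for src, dst, label in endorsements:
        w = relevance(label, query)
        if w > 0:
            old = G[src][dst]["weight"] if G.has_edge(src, dst) else 0.0
            G.add_edge(src, dst, weight=old + w)
    if G.number_of_edges() == 0:
        return []
    restart = {v: sum(d["weight"] for _, _, d in G.in_edges(v, data=True))
               for v in G}
    pr = nx.pagerank(G, personalization=restart, weight="weight")
    return sorted(pr, key=pr.get, reverse=True)[:top]
```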

Proceedings ArticleDOI
25 Jun 2017
TL;DR: In this paper, the authors propose an algorithmic solution to the problem of reducing polarization by exposing users to content that challenges their point of view, with the hope of broadening their perspective and thus reducing their polarity.
Abstract: Polarization is a troubling phenomenon that can lead to societal divisions and hurt the democratic process. It is therefore important to develop methods to reduce it. We propose an algorithmic solution to the problem of reducing polarization. The core idea is to expose users to content that challenges their point of view, with the hope of broadening their perspective, and thus reducing their polarity. Our method takes into account several aspects of the problem, such as the estimated polarity of the user, the probability of accepting the recommendation, the polarity of the content, and the popularity of the content being recommended. We evaluate our recommendations via a large-scale user study on Twitter users that were actively involved in the discussion of the US election results. Results show that, in most cases, the factors taken into account in the recommendation affect the users as expected, and thus capture the essential features of the problem.

Book ChapterDOI
18 Sep 2017
TL;DR: This problem of determining when entities are active based on their interactions with each other, referred to as the network-untangling problem, can be applied to discover timelines of events from complex interactions among entities.
Abstract: In this paper we study a problem of determining when entities are active based on their interactions with each other. More formally, we consider a set of entities V and a sequence of time-stamped edges E among the entities. Each edge \((u,v,t)\in E\) denotes an interaction between entities u and v that takes place at time t. We view this input as a temporal network. We then assume a simple activity model in which each entity is active during a short time interval. An interaction (u, v, t) can be explained if at least one of u or v are active at time t. Our goal is to reconstruct the activity intervals, for all entities in the network, so as to explain the observed interactions. This problem, which we refer to as the network-untangling problem, can be applied to discover timelines of events from complex interactions among entities.
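The covering condition behind the problem can be illustrated with a naive greedy baseline (not one of the chapter's algorithms): interactions are processed in time order and, whenever neither endpoint's current activity interval covers the timestamp, the interval of the endpoint whose extension is cheapest is stretched.

```python
# Sketch: a naive greedy baseline for the covering condition of the
# network-untangling problem: every interaction (u, v, t) must be covered by
# u's or v's activity interval. The chapter proposes principled algorithms;
# this only illustrates the objective.
def untangle(interactions):
    intervals = {}                      # node -> (start, end)
    def cost(node, t):
        if node not in intervals:
            return 0.0
        lo, hi = intervals[node]
        return max(lo - t, t - hi, 0.0)
    for u, v, t in sorted(interactions, key=lambda e: e[2]):
        if u in intervals and cost(u, t) == 0:
            continue                    # already covered by u
        if v in intervals and cost(v, t) == 0:
            continue                    # already covered by v
        node = u if cost(u, t) <= cost(v, t) else v
        lo, hi = intervals.get(node, (t, t))
        intervals[node] = (min(lo, t), max(hi, t))
    return intervals
```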

Book ChapterDOI
29 Nov 2017
TL;DR: This chapter describes how other graph-related problems, such as prediction, learning, and summarization, can be solved by applying out-of-the-box algorithms devised for event-interval sequences.
Abstract: Given a social network with dynamic interactions, how can we discover frequent interactions between groups of entities? What are the temporal patterns exhibited by these interactions? Which entities interact frequently with each other before, during, or after others have stopped or started? Such dynamic-network datasets are becoming prevalent, as modern data-gathering capabilities make it possible to record not only a static view of the network structure, but also detailed activity of the network entities and interactions along the network edges. Analysis of dynamic networks has applications in telecommunication networks, social network analysis, computational biology, and more. We study the problem of mining interactions in dynamic graphs. We assume that these interactions are not instantaneous, but more naturally, each interaction has a duration. We solve the problem of mining dynamic graphs by establishing a novel connection with the problem of mining event-interval sequences, and adapting methods from the latter domain. We apply the proposed methods to a real-world social network and to dynamic graphs from the field of sports. In addition, having established the aforementioned equivalence between the two pattern-mining settings, we proceed to describe how other graph-related problems, such as prediction, learning, and summarization, can be solved by applying out-of-the-box algorithms devised for event-interval sequences. In light of these results, we conjecture that there may be further connections between the two research domains, and the two communities should work closer to share goals and methodology.

Proceedings Article
01 Jan 2017
TL;DR: It is found that increased activity is typically associated with increased polarization; however, there is no consistent long-term trend in polarization over time among the topics the authors study.
Abstract: We explore how the polarization around controversial topics evolves on Twitter - over a long period of time (2011 to 2016), and also as a response to major external events that lead to increased related activity. We find that increased activity is typically associated with increased polarization; however, we find no consistent long-term trend in polarization over time among the topics we study.

Posted Content
TL;DR: A machine-learning approach is used to learn a liberal-conservative ideology space on Twitter, and it is shown how the learned latent space can be used to develop exploratory and interactive interfaces that can help users in diffusing their information filter bubble.
Abstract: People are shifting from traditional news sources to online news at an incredibly fast rate. However, the technology behind online news consumption promotes content that confirms the users' existing point of view. This phenomenon has led to polarization of opinions and intolerance towards opposing views. Thus, a key problem is to model information filter bubbles on social media and design methods to eliminate them. In this paper, we use a machine-learning approach to learn a liberal-conservative ideology space on Twitter, and show how we can use the learned latent space to tackle the filter bubble problem. We model the problem of learning the liberal-conservative ideology space of social media users and media sources as a constrained non-negative matrix-factorization problem. Our model incorporates the social-network structure and content-consumption information in a joint factorization problem with shared latent factors. We validate our model and solution on a real-world Twitter dataset consisting of controversial topics, and show that we are able to separate users by ideology with over 90% purity. When applied to media sources, our approach estimates ideology scores that are highly correlated (Pearson correlation 0.9) with ground-truth ideology scores. Finally, we demonstrate the utility of our model in real-world scenarios, by illustrating how the learned ideology latent space can be used to develop exploratory and interactive interfaces that can help users in diffusing their information filter bubble.
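A much-simplified stand-in for the model, for illustration only: a user-by-source consumption matrix is factorized with off-the-shelf NMF and the relative loading on two latent components is read as an ideology proxy. The paper's method additionally shares latent factors with the social-network structure and imposes constraints that anchor the liberal-conservative axis.

```python
# Sketch: a simplified stand-in for the paper's constrained joint
# factorization. Only the consumption matrix is factorized here, with
# off-the-shelf NMF, and one latent contrast is read as an ideology proxy.
import numpy as np
from sklearn.decomposition import NMF

def ideology_scores(consumption, n_components=2, random_state=0):
    """consumption: array (n_users, n_sources) of retweet/click counts."""
    model = NMF(n_components=n_components, init="nndsvda",
                random_state=random_state, max_iter=500)
    U = model.fit_transform(consumption)       # user factors
    V = model.components_                      # source factors
    # Relative loading on the two components as a crude left-right score.
    user_scores = (U[:, 0] - U[:, 1]) / (U.sum(axis=1) + 1e-9)
    source_scores = (V[0] - V[1]) / (V.sum(axis=0) + 1e-9)
    return user_scores, source_scores
```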

Proceedings ArticleDOI
03 Apr 2017
TL;DR: AncestryAI, an open-source tool for automatically linking historical records and exploring the resulting family trees, includes a record-linkage method for computing the probabilities of candidate matches, which allows users to either directly identify the next ancestor or narrow down the search.
Abstract: Many people are excited to discover their ancestors and thus decide to take up genealogy. However, the process of finding the ancestors is often very laborious since it involves comparing a large number of historical birth records and trying to manually match the people mentioned in them. We have developed AncestryAI, an open-source tool for automatically linking historical records and exploring the resulting family trees. We introduce a record-linkage method for computing the probabilities of the candidate matches, which allows the users to either directly identify the next ancestor or narrow down the search. We also propose an efficient layout algorithm for drawing and navigating genealogical graphs. The tool is additionally used to crowdsource training and evaluation data so as to improve the matching algorithm. Our objective is to build a large genealogical graph, which could be used to resolve various interesting questions in the areas of computational social science, genetics, and evolutionary studies. The tool is openly available at: http://emalmi.kapsi.fi/ancestryai/.
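The record-linkage step can be sketched as turning candidate-match scores into probabilities; here name similarity and a plausible parent-child age gap are the only features, with hand-picked weights, so this does not reproduce AncestryAI's trained model.

```python
# Sketch: turn candidate-match scores into probabilities for one child
# record. Name similarity and a plausible parent-child age gap are the only
# features, with illustrative hand-picked weights.
import math
from difflib import SequenceMatcher

def match_probabilities(child, candidates):
    """child: dict with 'parent_name' and 'birth_year';
    candidates: dicts with 'name' and 'birth_year'."""
    def score(cand):
        name_sim = SequenceMatcher(None, child["parent_name"], cand["name"]).ratio()
        gap = child["birth_year"] - cand["birth_year"]
        age_ok = 1.0 if 15 <= gap <= 60 else 0.0
        return 3.0 * name_sim + age_ok
    z = [math.exp(score(c)) for c in candidates]
    total = sum(z)
    return [p / total for p in z]

child = {"parent_name": "Johan Eriksson", "birth_year": 1852}
cands = [{"name": "Johan Ericsson", "birth_year": 1820},
         {"name": "Matti Virtanen", "birth_year": 1825}]
print(match_probabilities(child, cands))   # first candidate dominates
```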

Book ChapterDOI
11 Jun 2017
TL;DR: In this article, two variants of the community-aware sparsification problem are introduced, leading to sparsifiers that satisfy different connectedness community properties; hardness results are proved and effective approximation algorithms are devised.
Abstract: Network sparsification aims to reduce the number of edges of a network while maintaining its structural properties; such properties include shortest paths, cuts, spectral measures, or network modularity. Sparsification has multiple applications, such as speeding up graph-mining algorithms, graph visualization, as well as identifying the important network edges. In this paper we consider a novel formulation of the network-sparsification problem. In addition to the network, we also consider as input a set of communities. The goal is to sparsify the network so as to preserve the network structure with respect to the given communities. We introduce two variants of the community-aware sparsification problem, leading to sparsifiers that satisfy different connectedness community properties. From the technical point of view, we prove hardness results and devise effective approximation algorithms. Our experimental results on a large collection of datasets demonstrate the effectiveness of our algorithms.
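A simple baseline for the connectivity-preserving variant (not the paper's approximation algorithms): for every input community, keep a spanning forest of its induced subgraph, so that each community remains as connected as it was originally while few edges survive.

```python
# Sketch: a baseline community-aware sparsifier. For every input community,
# keep a spanning forest of its induced subgraph; the union of these forests
# preserves connectedness within each community with few edges.
import networkx as nx

def community_sparsifier(G, communities):
    H = nx.Graph()
    H.add_nodes_from(G.nodes())
    for community in communities:
        sub = G.subgraph(community)
        for component in nx.connected_components(sub):
            tree = nx.minimum_spanning_tree(G.subgraph(component))
            H.add_edges_from(tree.edges())
    return H
```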

Posted Content
TL;DR: This article studies the evolution of long-lived controversial debates as manifested on Twitter from 2011 to 2016 and explores how the structure of interactions and content of discussion varies with the level of collective attention, as evidenced by the number of users discussing a topic.
Abstract: We study the evolution of long-lived controversial debates as manifested on Twitter from 2011 to 2016. Specifically, we explore how the structure of interactions and content of discussion varies with the level of collective attention, as evidenced by the number of users discussing a topic. Spikes in the volume of users typically correspond to external events that increase the public attention on the topic -- as, for instance, discussions about `gun control' often erupt after a mass shooting. This work is the first to study the dynamic evolution of polarized online debates at such scale. By employing a wide array of network and content analysis measures, we find consistent evidence that increased collective attention is associated with increased network polarization and network concentration within each side of the debate; and overall more uniform lexicon usage across all users.

Posted Content
TL;DR: In this article, the authors propose a model where each individual starts with a latent profile and arrives to a conformed profile through a dynamic conformation process, which takes into account the individual's social interaction and the tendency to conform with one's social environment.
Abstract: Motivated by applications that arise in online social media and collaboration networks, there has been a lot of work on community-search and team-formation problems. In the former class of problems, the goal is to find a subgraph that satisfies a certain connectivity requirement and contains a given collection of seed nodes. In the latter class of problems, on the other hand, the goal is to find individuals who collectively have the skills required for a task and form a connected subgraph with certain properties. In this paper, we extend both the community-search and the team-formation problems by associating each individual with a profile. The profile is a numeric score that quantifies the position of an individual with respect to a topic. We adopt a model where each individual starts with a latent profile and arrives to a conformed profile through a dynamic conformation process, which takes into account the individual's social interaction and the tendency to conform with one's social environment. In this framework, social tension arises from the differences between the conformed profiles of neighboring individuals as well as from differences between individuals' conformed and latent profiles. Given a network of individuals, their latent profiles and this conformation process, we extend the community-search and the team-formation problems by requiring the output subgraphs to have low social tension. From the technical point of view, we study the complexity of these problems and propose algorithms for solving them effectively. Our experimental evaluation in a number of social networks reveals the efficacy and efficiency of our methods.
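A sketch of the conformation process and the tension objective, assuming a Friedkin-Johnsen-style averaging update (the paper's exact dynamics may differ): each conformed profile is the average of the latent profile and the neighbors' conformed profiles, and the social tension of a candidate subgraph sums disagreement along its edges plus each member's deviation from their latent profile.

```python
# Sketch: a Friedkin-Johnsen-style conformation process (assumed here) and
# the induced social tension of a candidate subgraph: disagreement along its
# edges plus each member's deviation from their latent profile.
import networkx as nx

def conformed_profiles(G, latent, iters=100):
    z = dict(latent)
    for _ in range(iters):
        z = {v: (latent[v] + sum(z[u] for u in G[v])) / (1 + G.degree(v))
             for v in G}
    return z

def social_tension(G, latent, nodes):
    z = conformed_profiles(G, latent)
    sub = G.subgraph(nodes)
    edge_term = sum((z[u] - z[v]) ** 2 for u, v in sub.edges())
    anchor_term = sum((z[v] - latent[v]) ** 2 for v in nodes)
    return edge_term + anchor_term
```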

Book ChapterDOI
27 Apr 2017
TL;DR: A model where each individual starts with a latent profile and arrives to a conformed profile through a dynamic conformation process, which takes into account the individual's social interaction and the tendency to conform with one's social environment is adopted.
Abstract: Motivated by applications that arise in online social media and collaboration networks, there has been a lot of work on community-search. In this class of problems, the goal is to find a subgraph that satisfies a certain connectivity requirement and contains a given collection of seed nodes. In this paper, we extend the community-search problem by associating each individual with a profile. The profile is a numeric score that quantifies the position of an individual with respect to a topic. We adopt a model where each individual starts with a latent profile and arrives to a conformed profile through a dynamic conformation process, which takes into account the individual's social interaction and the tendency to conform with one's social environment. In this framework, social tension arises from the differences between the conformed profiles of neighboring individuals as well as from the differences between individuals' conformed and latent profiles. Given a network of individuals, their latent profiles and this conformation process, we extend the community-search problem by requiring the output subgraphs to have low social tension. From the technical point of view, we study the complexity of this problem and propose algorithms for solving it effectively. Our experimental evaluation in a number of social networks reveals the efficacy and efficiency of our methods.

Posted Content
TL;DR: This paper explores how the polarization around controversial topics evolves on Twitter over a long period of time (2011 to 2016), and also as a response to major external events that lead to increased related activity.
Abstract: We explore how the polarization around controversial topics evolves on Twitter - over a long period of time (2011 to 2016), and also as a response to major external events that lead to increased related activity. We find that increased activity is typically associated with increased polarization; however, we find no consistent long-term trend in polarization over time among the topics we study.

Posted Content
TL;DR: This paper introduces two variants of the community-aware sparsification problem, leading to sparsifiers that satisfy different connectedness community properties, and proves hardness results and devises effective approximation algorithms.
Abstract: Network sparsification aims to reduce the number of edges of a network while maintaining its structural properties; such properties include shortest paths, cuts, spectral measures, or network modularity. Sparsification has multiple applications, such as speeding up graph-mining algorithms, graph visualization, as well as identifying the important network edges. In this paper we consider a novel formulation of the network-sparsification problem. In addition to the network, we also consider as input a set of communities. The goal is to sparsify the network so as to preserve the network structure with respect to the given communities. We introduce two variants of the community-aware sparsification problem, leading to sparsifiers that satisfy different connectedness community properties. From the technical point of view, we prove hardness results and devise effective approximation algorithms. Our experimental results on a large collection of datasets demonstrate the effectiveness of our algorithms.