Showing papers in "Social Network Analysis and Mining in 2016"


Journal ArticleDOI
TL;DR: This work proposes a novel technique that keeps the community structure up-to-date following the addition or removal of nodes and edges, and performs a local modularity optimization that maximizes the modularity gain function only for those communities where nodes and edges were edited, keeping the rest of the network unchanged.
Abstract: The amount and variety of data generated by today’s online social and telecommunication network services are changing the way researchers analyze social networks. Handling fast-evolving networks with millions of nodes and edges is, among other factors, the main challenge. Community detection algorithms must also be updated or improved under these conditions. Previous state-of-the-art algorithms based on modularity optimization (e.g., the Louvain algorithm) provide fast, efficient and robust community detection on large static networks. Nonetheless, due to the high computing complexity of these algorithms, using batch techniques in dynamic networks requires performing community detection for the whole network at each evolution step. This proves to be computationally expensive and unstable in terms of community tracking. Our contribution is a novel technique that keeps the community structure up-to-date following the addition or removal of nodes and edges. The proposed algorithm performs a local modularity optimization that maximizes the modularity gain function only for those communities where nodes and edges were edited, keeping the rest of the network unchanged. The effectiveness of our algorithm is demonstrated by comparison with other state-of-the-art community detection algorithms with respect to Newman’s Modularity, Modularity with Split Penalty, Modularity Density, number of detected communities and running time.
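For readers unfamiliar with the quantity being optimized, the following is a minimal sketch of Newman's modularity for an undirected, unweighted graph. It is not the authors' incremental algorithm; the function name `modularity` and the input layout (an edge list plus a node-to-community mapping) are assumptions made for illustration.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity Q of an undirected, unweighted graph.

    edges: iterable of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every node to a community label.
    (Both input conventions are illustrative assumptions.)
    """
    edges = list(edges)
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)    # edges with both endpoints in community c
    degree_sum = defaultdict(int)  # sum of node degrees in community c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    # Q = sum over communities of (internal edge share - expected share)
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(modularity(edges, split))
```

Because each community contributes its own term to the sum, an edit that touches only a few communities changes only their terms, which is what makes a local re-optimization plausible.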

64 citations


Journal ArticleDOI
TL;DR: It is argued that researchers interested in tweet sentiment prevalence should switch to quantification-specific learning algorithms and evaluation measures, which produce substantially better class frequency estimates than a state-of-the-art classification-oriented algorithm routinely used in TSC.
Abstract: Sentiment classification has become a ubiquitous enabling technology in the Twittersphere, since classifying tweets according to the sentiment they convey towards a given entity (be it a product, a person, a political party, or a policy) has many applications in political science, social science, market research, and many others. In this paper, we contend that most previous studies dealing with tweet sentiment classification (TSC) use a suboptimal approach. The reason is that the final goal of most such studies is not estimating the class label (e.g., Positive, Negative, or Neutral) of individual tweets, but estimating the relative frequency (a.k.a. “prevalence”) of the different classes in the dataset. The latter task is called quantification, and recent research has convincingly shown that it should be tackled as a task of its own, using learning algorithms and evaluation measures different from those used for classification. In this paper, we show (by carrying out experiments using two learners, seven quantification-specific algorithms, and 11 TSC datasets) that using quantification-specific algorithms produces substantially better class frequency estimates than a state-of-the-art classification-oriented algorithm routinely used in TSC. We thus argue that researchers interested in tweet sentiment prevalence should switch to quantification-specific (instead of classification-specific) learning algorithms and evaluation measures.
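The difference between classification and quantification can be sketched with two textbook estimators for the binary case. The abstract does not name the seven algorithms it evaluates, so "classify and count" and "adjusted classify and count" are used here only as a commonly cited illustration; the function names and the 0/1 prediction encoding are assumptions.

```python
def classify_and_count(predictions):
    """Naive prevalence estimate: share of items predicted positive (0/1 labels)."""
    return sum(predictions) / len(predictions)


def adjusted_classify_and_count(predictions, tpr, fpr):
    """Correct the naive estimate with the classifier's true and false
    positive rates, which are assumed to come from held-out labelled data."""
    cc = classify_and_count(predictions)
    if tpr == fpr:
        return cc  # the classifier carries no signal, so no correction applies
    estimate = (cc - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, estimate))  # clip to a valid prevalence


# 100 tweets, 40 of them predicted Positive by a hypothetical classifier.
predictions = [1] * 40 + [0] * 60
print(classify_and_count(predictions))
print(adjusted_classify_and_count(predictions, tpr=0.8, fpr=0.2))
```

The correction follows from the identity cc = p * tpr + (1 - p) * fpr, solved for the true prevalence p. A classifier can therefore be reasonably accurate on individual tweets and still give a biased class frequency.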

63 citations


Journal ArticleDOI
TL;DR: A survey is done for event detection techniques in OSN based on social text streams—newswire, web forums, emails, blogs and microblogs, for natural disasters, trending or emerging topics and public opinion-based events.
Abstract: Online social networks (OSNs) have become an important platform for detecting real-world events in recent years. These real-world events are detected by analyzing the huge volume of social-stream data available on different OSN platforms. Event detection has become significant because such data contain substantial information describing different scenarios during events or crises. This information further helps to enable contextual decision making regarding the event location, content and temporal specifications. Several studies exist, which offer a plethora of frameworks and tools for detecting and analyzing events, used for applications like crisis management, monitoring and predicting events on different OSN platforms. In this paper, a survey is done for event detection techniques in OSN based on social text streams (newswire, web forums, emails, blogs and microblogs) for natural disasters, trending or emerging topics and public opinion-based events. The work done and the open problems are explicitly mentioned for each social stream. Further, this paper elucidates the list of event detection tools available for researchers.

53 citations


Journal ArticleDOI
TL;DR: It is suggested that the in-degree of the tweet that initiates a reply tree may play an important role in forming the global shape of the reply tree.
Abstract: Structure of networks constructed from mentioning relationships between posts in online media may be valuable for understanding how information and opinions spread in these media. We crawled Twitter to collect tweets and replies to construct a large number of so-called reply trees, each of which was rooted at a tweet and joined by replies. Consistent with the previous literature, we found that the empirical trees were characterized by some long path-like reply trees, large star-like trees, and long irregular trees, although their frequencies were not high. We tested several branching process models to explain the empirical frequency of these types of reply trees as well as more basic quantities such as the distributions of the size and depth of the reply tree. Based on our modeling results, we suggest that the in-degree of the tweet that initiates a reply tree (i.e., the number of times that the tweet is directly mentioned by other reply posts) may play an important role in forming the global shape of the reply tree.

50 citations


Journal ArticleDOI
TL;DR: This manuscript proposes user activity features, quality of answer features, linguistic features and temporal features to identify distinguishing patterns between experts and non-experts, and develops statistical models based on the features to automatically detect experts.
Abstract: Quora is a fast growing social QA (2) propose user activity features, quality of answer features, linguistic features and temporal features to identify distinguishing patterns between experts and non-experts; and (3) develop statistical models based on the features to automatically detect experts. Our experimental results show that our classifiers effectively identify experts in general topics and a specific topic, achieving up to 97 % accuracy and 0.987 AUC.

44 citations


Journal ArticleDOI
TL;DR: This paper enhances recommendation algorithms used in social networks by taking into account qualitative aspects of the recommended items, such as price and reliability, the influencing factors between social network users, the social network user behavior regarding their purchases in different item categories and the semantic categorization of the products to be recommended.
Abstract: One of the major problems in the domain of social networks is the handling and diffusion of the vast, dynamic and disparate information created by its users. In this context, the information contributed by users can be exploited to generate recommendations for other users. Relevant recommender systems take into account static data from users’ profiles, such as location, age or gender, complemented with dynamic aspects stemming from the user behavior and/or social network state such as user preferences, items’ general acceptance and influence from social friends. In this paper, we enhance recommendation algorithms used in social networks by taking into account qualitative aspects of the recommended items, such as price and reliability, the influencing factors between social network users, the social network user behavior regarding their purchases in different item categories and the semantic categorization of the products to be recommended. The inclusion of these aspects leads to more accurate recommendations and diffusion of better user-targeted information. This allows for better exploitation of the limited recommendation space, and therefore, online advertisement efficiency is raised.

42 citations


Journal ArticleDOI
TL;DR: It is found that hashtags related to Nigerian sociopolitical issues, including the #bringbackourgirls hashtag, are more likely to be adopted among densely connected users with multiple network neighbors who have also adopted the hashtag, compared to mainstream news hashtags.
Abstract: Social media sites such as Facebook and Twitter provide highly granular time-stamped data about the interactions and communications between people and provide us unprecedented opportunities for empirically testing theory about information flow in social networks. Using publicly available data from Twitter’s free API (Application Program Interface), we track the adoption of popular hashtags in Nigeria during 2014. These hashtags reference online marketing campaigns, major news stories, and events and issues specific to Nigeria, including reactions to the kidnapping of 276 schoolgirls in Northeastern Nigeria by the Islamic extremist group Boko Haram. We find that hashtags related to Nigerian sociopolitical issues, including the #bringbackourgirls hashtag, which was associated with protests against the Nigerian government’s response to the kidnapping, are more likely to be adopted among densely connected users with multiple network neighbors who have also adopted the hashtag, compared to mainstream news hashtags. This association between adoption threshold and local network structure is consistent with theory about the spread of complex contagions, a type of social contagion which requires social reinforcement from multiple adopting neighbors. Theory also predicts the need for a critical mass of adopters before the contagion can become viral. We illustrate this with the #bringbackourgirls hashtag by identifying the point at which the local social movement transforms into a more widespread phenomenon. We also show that these results are robust across both the follow and reply/mention/retweet networks on Twitter. Our analysis involves data mining records of hashtag adoption and of the social connections between adopters.
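The notion of a complex contagion, where adoption needs reinforcement from several adopting neighbors, can be illustrated with a simple threshold model. This is a generic sketch, not the authors' analysis; the function name `spread`, the toy graph and the fixed integer threshold are all assumptions.

```python
def spread(adjacency, seeds, threshold):
    """Return the set of adopters once no further node can adopt.

    adjacency: dict mapping node -> set of neighbours (undirected graph).
    threshold: minimum number of adopting neighbours a node needs
               before it adopts (an illustrative, fixed value).
    """
    adopted = set(seeds)
    changed = True
    while changed:
        changed = False
        for node, neighbours in adjacency.items():
            if node not in adopted and len(neighbours & adopted) >= threshold:
                adopted.add(node)
                changed = True
    return adopted


# Two triangles sharing the edge b-c, plus a pendant node e attached to d.
adjacency = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}
print(sorted(spread(adjacency, {"a", "b"}, threshold=2)))
print(sorted(spread(adjacency, {"a", "b"}, threshold=1)))
```

With a threshold of 2 the hashtag moves through the densely connected part of the toy graph but stops at the pendant node, whereas a threshold of 1 (a simple contagion) reaches every node.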

41 citations


Journal ArticleDOI
TL;DR: It is found that spammers contributing to pornographic content follow legitimate Twitter users and send URLs that link users to pornographic sites, in what is the first attempt to analyze and categorize the behavior of pornographic users in Twitter as spammers.
Abstract: Social spam is a huge and complicated problem plaguing social networking sites in several ways. This includes posts, reviews or blogs containing product promotions and contests, adult content and general spam. It has been found that social media websites such as Twitter are also acting as distributors of pornographic content, although this is against their own stated policies. In this paper, we have reviewed the case of Twitter and found that spammers contributing to pornographic content follow legitimate Twitter users and send URLs that link users to pornographic sites. Behavioral analysis of this type of spammer has been conducted using graph-based as well as content-based information fetched using simple text operators to study their characteristics. In the present study, about 74,000 tweets containing pornographic adult content posted by around 18,000 users have been collected and analyzed. The analysis shows that the users posting pornographic content fulfill the characteristics of spammers as stated by the rules and guidelines of Twitter. It has been observed that the illegitimate use of social media for spreading social spam has been growing at a fast pace, with the network companies turning a blind eye toward this growing problem. Clearly, there is an immense requirement to build an effective solution to remove objectionable and slanderous content as stated above from social networking websites to promote and protect public decency and the welfare of children and adults. It is also essential in order to enhance the experience of genuine users of social media and protect them from harm to their public identity on the World Wide Web. Further in this paper, classification of pornographic spammers and genuine users has also been performed using a machine learning technique. Experimental results show that a Random Forest classifier is able to predict pornographic spammers with a reasonably high accuracy of 91.96 %. To the best of our knowledge, this is the first attempt to analyze and categorize the behavior of pornographic users in Twitter as spammers. So far, work has been done on identifying spammers, but it has not specifically targeted pornographic spammers.

40 citations


Journal ArticleDOI
TL;DR: This article reviews a wide range of features for characterizing Twitter users and shows that most features deemed efficient for predicting online influence, such as the numbers of retweets and followers, are not relevant to the offline influence detection problem; it proposes several content-based approaches instead.
Abstract: Many works related to Twitter aim at characterizing its users in some way: role on the service (spammers, bots, organizations, etc.), nature of the user (socio-professional category, age, etc.), topics of interest, and others. However, for a given user classification problem, it is very difficult to select a set of appropriate features, because the many features described in the literature are very heterogeneous, with name overlaps and collisions, and numerous very close variants. In this article, we review a wide range of such features. In order to present a clear state-of-the-art description, we unify their names, definitions and relationships, and we propose a new, neutral, typology. We then illustrate the interest of our review by applying a selection of these features to the offline influence detection problem. This task consists in identifying users who are influential in real life, based on their Twitter account and related data. We show that most features deemed efficient to predict online influence, such as the numbers of retweets and followers, are not relevant to this problem. However, we propose several content-based approaches to label Twitter users as influencers or not. We also rank them according to a predicted influence level. Our proposals are evaluated over the CLEF RepLab 2014 dataset, and outmatch state-of-the-art methods.

35 citations


Journal ArticleDOI
TL;DR: The taxonomy of game models and their characteristics along with their performance are provided and the interesting applications of game theory for social networks are discussed and further research directions are provided as well as some open challenges.
Abstract: Community detection in social networks has received much attention from the researchers of multiple disciplines due to its impactful applications such as recommendation systems, link prediction, and anomaly detection. The focus of community detection is to determine the more dense subgraphs of the network which are called communities. The nodes of the community are expected to have similar features and interests. Assuming the nodes as selfish agents, the evolution of communities can be effectively modelled as a community formation game. Game theory provides a systematic framework to model the competition and coordination among the players. In the past decade, there are several contributions from the domain of game theory to address the problem of community detection in social networks. In this paper, we make a comprehensive survey that studies and provides an insight into available game theory-based community detection algorithms. The current study provides the taxonomy of game models and their characteristics along with their performance. We discuss the interesting applications of game theory for social networks and also provide further research directions as well as some open challenges.

34 citations


Journal ArticleDOI
TL;DR: This work has been extended by moving to a Machine Learning Approach which treats the prediction process as a classification problem, and shows that using both classical and ensemble classifiers outperforms baseline algorithms when applied individually.
Abstract: The growth of social networks has lately attracted both academic and industrial researchers to study the ties between people, and how the social networks evolve with time. Social networks like Facebook, Twitter and Flickr require efficient and accurate methods to recommend friends to their users in the network. Several algorithms have been developed to recommend friends or predict the likelihood of future links. Two main approaches are used: Score-based Approaches and Machine Learning Approaches. In a previous work, a score-based method was used based on topological, node and social features to calculate similarity between users and determine the likelihood of forming future links. This work has been extended by moving to a Machine Learning Approach which treats the prediction process as a classification problem. The classifier predicts the class of each edge, whether it exists or doesn’t exist. Machine Learning Approaches have the benefit of adding all similarity indices needed as the feature set fed to the classifier. In the Score-based Approach, by contrast, when multiple features with associated weights were used, the performance was sensitive to the values of those weights. When machine learning is applied, the learning process is performed by the classifier, which is fed eight similarity indices representing connectivity, community, interaction and trust in the social network. When the indices are combined, a much higher prediction accuracy than that of the previous Score-based Approach is obtained. In order to evaluate the correctness of the proposed model, it has been applied on a real dataset of 2.974k users on the Twitter social network. Experiments show that using both classical and ensemble classifiers outperforms baseline algorithms when applied individually.

Journal ArticleDOI
TL;DR: The Focal Structures Analysis (FSA) methodology is developed to extract key sets of individuals, called focal structures, in a social network, and goes beyond the traditional unit of analysis, which is an individual or a set of influential individuals, and places focal structures between the individuals and communities/clusters as the unit of analysis.
Abstract: Identifying influential individuals is a well-known approach in extracting actionable knowledge in a network. Existing studies suggest measures to identify influential individuals, i.e., they focus on the question “which individuals are best connected to others or have the most influence?”. Such individuals, however, may not represent the context (relationships, interactions, etc.) entirely in a social network. For example, it is nearly an impossible task for a single individual to organize a mass protest of the scale of the Saudi Arabian women’s 2013 Oct26Driving campaign, the 2012 Occupy Wall Street and the 2011 Arab Spring. Similarly, other events such as mobilizing the 2013 Taksim square-Gezi Park protesters, coordinating crisis response for natural disasters (e.g., the 2010 Haiti earthquake), or even organizing flash mobs would require a key set of individuals rather than a single or the most influential individual in a social network. An alternate line of research dealing with community or cluster identification approaches extract subnetworks of individuals. However, these structures may not represent the key sets of individuals that could coordinate the social processes mentioned above. Therefore, we develop the Focal Structures Analysis (FSA) methodology to extract such key sets of individuals, called focal structures, in a social network. This research goes beyond the traditional unit of analysis, which is an individual or a set of influential individuals, and places focal structures between the individuals and communities/clusters as the unit of analysis. To the best of our knowledge, this type of work is the first effort in identifying influential sets of individuals and would open up new directions for researchers to develop new methods in social network analysis.

Journal ArticleDOI
TL;DR: In this paper, the authors study the research studies that are helpful for user characterization as online users may not always reveal their true identity or attributes, focusing on user attribute determination such as gender and age, user behavior analysis such as motives for deception, mental models that are indicators of user behavior, user categorization such as bots versus humans, and entity matching on different social networks.
Abstract: Online social network analysis has attracted great attention, with a vast number of users sharing information and the availability of APIs that help to crawl online social network data. In this paper, we survey the research studies that are helpful for user characterization, as online users may not always reveal their true identity or attributes. We especially focus on user attribute determination such as gender and age; user behavior analysis such as motives for deception; mental models that are indicators of user behavior; user categorization such as bots versus humans; and entity matching on different social networks. We believe our summary of the analysis of user characterization will provide important insights to researchers and better services to online users.

Journal ArticleDOI
TL;DR: There is significant evidence that suspended users exist on the periphery of social networks on Twitter and consequently that removing them has little impact on network structure, and prior attempts to distinguish among different types of suspended users are improved by using a much larger dataset.
Abstract: Social media is rapidly becoming a medium of choice for understanding the cultural pulse of a region; e.g. for identifying what the population is concerned with and what kind of help is needed in a crisis. To assess this cultural pulse, it is critical to have an accurate assessment of who is saying what. Unfortunately, social media is also the home of users who engage in disruptive, disingenuous, and potentially illegal activity. A range of users, both human and non-human, carry out such social cyber-attacks. We ask, to what extent does the presence or absence of such users influence our ability to assess the cultural pulse of a region? Our prior research on this topic showed that Twitter-based network structures and content are unstable and can be highly impacted by the removal of suspended users. Because of this, statistical techniques can be established to differentiate potential types of suspended and non-suspended users. In this extended paper, we develop additional experiments to explore the spatial patterns of suspended users, and we further consider how these users affect structural and content concentrations via the development of new metrics and new analyses. We find significant evidence that suspended users exist on the periphery of social networks on Twitter and consequently that removing them has little impact on network structure. We also improve prior attempts to distinguish among different types of suspended users by using a much larger dataset. Finally, we conduct a temporal sentiment analysis to illustrate differences between suspended users and non-suspended users on this dimension.

Journal ArticleDOI
TL;DR: In this article, the first systematic conceptual and experimental comparison of edge sparsification methods on a diverse set of network properties is presented; the methods can be understood as rating edges by importance and then filtering globally or locally by these scores.
Abstract: Sparsification reduces the size of networks while preserving structural and statistical properties of interest. Various sparsifying algorithms have been proposed in different contexts. We contribute the first systematic conceptual and experimental comparison of edge sparsification methods on a diverse set of network properties. It is shown that they can be understood as methods for rating edges by importance and then filtering globally or locally by these scores. We show that applying a local filtering technique improves the preservation of all kinds of properties. In addition, we propose a new sparsification method (Local Degree) which preserves edges leading to local hub nodes. All methods are evaluated on a set of social networks from Facebook, Google+, Twitter and LiveJournal with respect to network properties including diameter, connected components, community structure, multiple node centrality measures and the behavior of epidemic simulations. To assess the preservation of the community structure, we also include experiments on synthetically generated networks with ground truth communities. Experiments with our implementations of the sparsification methods (included in the open-source network analysis tool suite NetworKit) show that many network properties can be preserved down to about 20 % of the original set of edges for sparse graphs with a reasonable density. The experimental results allow us to differentiate the behavior of different methods and show which method is suitable with respect to which property. While our Local Degree method is best for preserving connectivity and short distances, other newly introduced local variants are best for preserving the community structure.
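As an illustration of "preserving edges leading to local hub nodes", the sketch below lets every node keep the edges to its highest-degree neighbors. The abstract does not state how many edges each node keeps, so the rule used here (the top floor(deg(v)^alpha) neighbors, with `alpha` as a tuning parameter) is an assumption, as are the function name and the adjacency-dict input. This is not the NetworKit implementation.

```python
import math


def local_degree_sparsify(adjacency, alpha=0.5):
    """Keep, for each node, the edges to its highest-degree neighbours.

    adjacency: dict mapping node -> set of neighbours (undirected graph).
    alpha: assumed exponent in [0, 1]; node v keeps floor(deg(v) ** alpha) edges.
    Returns a set of frozenset({u, v}) edges; an edge survives if either
    endpoint selects it.
    """
    degree = {v: len(neighbours) for v, neighbours in adjacency.items()}
    kept = set()
    for v, neighbours in adjacency.items():
        k = math.floor(degree[v] ** alpha)
        # Rank neighbours by degree, highest first; ties broken by label.
        ranked = sorted(neighbours, key=lambda u: (-degree[u], u))
        for u in ranked[:k]:
            kept.add(frozenset((u, v)))
    return kept


# A hub "h" with four leaves, plus one extra edge between leaves "a" and "b".
adjacency = {
    "h": {"a", "b", "c", "d"},
    "a": {"h", "b"},
    "b": {"h", "a"},
    "c": {"h"},
    "d": {"h"},
}
for edge in sorted(tuple(sorted(e)) for e in local_degree_sparsify(adjacency)):
    print(edge)
```

In the toy graph every edge to the hub survives and the leaf-to-leaf edge is dropped, so the graph stays connected through the hub, in line with the abstract's remark that the method suits connectivity and short distances.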

Journal ArticleDOI
TL;DR: A mathematical model of news spreading from some posts displayed in an online social network using the epidemiological modeling technique is proposed and criteria of rumor detection and verification for the model are proposed.
Abstract: People in the modern world use social networking Web sites to communicate with others, known or unknown, to get others' opinions and to share their own. Posts and weblogs affect the human mind, at least for some time, and play an important role in people's decisions. However, the content of a post may be either genuine information or misinformation, i.e., just a rumor, and people find it hard to tell the two apart. It is important to decide whether a post is information or just a rumor, because a rumor may lead the majority to support a wrong decision. In this paper, a mathematical framework is presented related to these matters. Firstly, we propose a mathematical model of news spreading from some posts displayed in an online social network. The model of news propagation is developed using the epidemiological modeling technique. Then, we propose criteria of rumor detection and verification for the model. In the case of a rumor, a revised model is proposed with media awareness as a control strategy for reducing rumor spreading.

Journal ArticleDOI
TL;DR: In this paper, the authors define the addition and multiplication of temporal quantities in a way that can be used to define the addition and multiplication of temporal networks, and develop fast algorithms for the proposed operations.
Abstract: In a temporal network, the presence and activity of nodes and links can change through time. To describe temporal networks we introduce the notion of temporal quantities. We define the addition and multiplication of temporal quantities in a way that can be used for the definition of addition and multiplication of temporal networks. The corresponding algebraic structures are semirings. The usual approach to (data) analysis of temporal networks is to transform the network into a sequence of time slices—static networks corresponding to selected time intervals and analyze each of them using standard methods to produce a sequence of results. The approach proposed in this paper enables us to compute these results directly. We developed fast algorithms for the proposed operations. They are available as an open source Python library TQ (Temporal Quantities) and a program Ianus. The proposed approach enables us to treat as temporal quantities also other network characteristics such as degrees, connectivity components, centrality measures, Pathfinder skeleton, etc. To illustrate the developed tools we present some results from the analysis of Franzosi’s violence network and Corman’s Reuters terror news network.

Journal ArticleDOI
TL;DR: This article proposes D-GT (Dynamic Game-Theoretic community detection), which models each network node as a rational agent that periodically plays a community membership game with its neighbors, and evaluates it against benchmark methods for detecting communities in dynamic networks.
Abstract: Most real-world social networks are inherently dynamic, composed of communities that are constantly changing in membership. To track these evolving communities, we need dynamic community detection techniques. This article evaluates the performance of a set of game-theoretic approaches for identifying communities in dynamic networks. Our method, D-GT (Dynamic Game-Theoretic community detection), models each network node as a rational agent who periodically plays a community membership game with its neighbors. During game play, nodes seek to maximize their local utility by joining or leaving the communities of network neighbors. The community structure emerges after the game reaches a Nash equilibrium. Compared to the benchmark community detection methods, D-GT more accurately predicts the number of communities and finds community assignments with a higher normalized mutual information, while retaining a good modularity.

Journal ArticleDOI
TL;DR: This paper investigates the top Weibo accounts whose follower lists duplicate or nearly duplicate each other (hereafter called near-duplicates), and proposes a novel fake account detection method based on the very purpose of the existence of these accounts.
Abstract: Weibo is the Chinese counterpart of Twitter, which has attracted hundreds of millions of users. Just like other Online Social Networks (hereafter OSNs), Weibo has a large number of fake accounts. They are created to sell their following links to customers, who want to boost their follower counts. These bogus accounts are difficult to identify individually, especially when they are created by sophisticated programs or controlled by human beings directly. This paper proposes a novel fake account detection method that is based on the very purpose of the existence of these accounts: they are created to follow their targets en masse, resulting in high-overlapping between the follower lists of their customers. This paper investigates the top Weibo accounts whose follower lists duplicate or nearly duplicate each other (hereafter called near-duplicates). Discovering near-duplicates is a challenging task. The network is large; the data in its entirety are not available; the pair-wise comparison is very expensive. We developed a sampling-based approach to discover all the near-duplicates of the top accounts, who have at least 50,000 followers. In the experiment, we found 395 near-duplicates, which leads us to 11.90 million fake accounts (4.56 % of total users) who send 741.10 million links (9.50 % of the entire edges). Furthermore, we characterize four typical structures of the spammers, cluster these spammers into 34 groups, and analyze the properties of each group.
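What "near-duplicate follower lists" means can be sketched with a set-overlap measure. The abstract does not say which similarity measure or cutoff the authors use, and it stresses that exhaustive pairwise comparison is too expensive at Weibo's scale, which is why they sample. The brute-force version below, with Jaccard similarity and a `min_similarity` cutoff, is only an assumed small-scale illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two collections of follower IDs."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def near_duplicates(followers, min_similarity=0.9):
    """Return account pairs whose follower sets overlap heavily.

    followers: dict mapping account -> iterable of follower IDs.
    min_similarity: assumed cutoff; not taken from the paper.
    """
    return [
        (x, y)
        for x, y in combinations(sorted(followers), 2)
        if jaccard(followers[x], followers[y]) >= min_similarity
    ]


followers = {
    "A": range(100),      # followers 0..99
    "B": range(95),       # followers 0..94, almost the same crowd as A
    "C": range(50, 150),  # followers 50..149, only partial overlap
}
print(near_duplicates(followers))
```

Two unrelated popular accounts would rarely share almost all of their followers, so a very high overlap is the signal that the same purchased accounts follow both.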

Journal ArticleDOI
TL;DR: This article focuses on the role that social media can play during natural disasters, using the recent case of the Chennai floods in India.
Abstract: Social media has altered the way individuals communicate. Individuals feel more connected on Facebook and Twitter, with greater freedom to chat and to share pictures and videos. Hence, social media is widely employed by companies to promote their products and services and to establish better customer relationships. Owing to the increasing popularity of these social media platforms, their usage is also expanding significantly. Various studies have discussed the importance of social media in the corporate world for effective marketing communication, customer relationships, and firm performance, but no studies have focused on the social role of social media, i.e., in disaster resilience in India. Various academicians and practitioners have advocated the importance and use of social media in disaster resilience. This article focuses on the role that social media can play during natural disasters, using the recent case of the Chennai floods in India. This study provides a better understanding of the role social media can play in natural disaster resilience in the Indian context.

Journal ArticleDOI
TL;DR: In this paper, the authors investigate the global board interlock network, covering 400,000 firms linked through 1,700,000 edges representing shared directors between these firms, and use the concept of centrality to assess the embeddedness of firms from a particular country within the global network.
Abstract: Corporations across the world are highly interconnected in a large global network of corporate control. This paper investigates the global board interlock network, covering 400,000 firms linked through 1,700,000 edges representing shared directors between these firms. The main focus is on the concept of centrality, which is used to investigate the embeddedness of firms from a particular country within the global network. The study results in three contributions. First, to the best of our knowledge, for the first time we can investigate the topology as well as the concept of centrality in corporate networks at a global scale, allowing for the largest cross-country comparison ever done in the interlocking directorates literature. We demonstrate, among other things, extremely similar network topologies, yet large differences between countries when it comes to the relation between economic prominence indicators and firm centrality. Second, we introduce two new metrics that are specifically suitable for comparing the centrality ranking of a partition to that of the full network. Using the notion of centrality persistence, we propose to measure the persistence of a partition’s centrality ranking in the full network. In the board interlock network, it allows us to assess the extent to which the footprint of a national network is still present within the global network. Next, the measure of centrality ranking dominance tells us whether a partition (country) is more dominant at the top or the bottom of the centrality ranking of the full (global) network. Finally, comparing these two new measures of persistence and dominance between different countries allows us to classify these countries based on their embeddedness, measured using the relation between the centrality of a country’s firms on the national and the global scale of the board interlock network.

Journal ArticleDOI
TL;DR: This research paper performs a thorough investigation of cyberbullying instances in Vine, a video-based online social network, and trains different classifiers based upon the labeled media sessions to detect instances of cyberbullying.
Abstract: The last decade has experienced an exponential growth of popularity in online social networks. This growth in popularity has also paved the way for the threat of cyberbullying to grow to an extent that was never seen before. Online social network users are now constantly under the threat of cyberbullying from predators and stalkers. In our research paper, we perform a thorough investigation of cyberbullying instances in Vine, a video-based online social network. We collect a set of media sessions (shared videos with their associated meta-data) and then label them for cyberaggression and cyberbullying using CrowdFlower, a crowd-sourcing website. We also perform a second survey that labels the videos’ contents and the emotions exhibited. After the labeling of the media sessions, we provide a detailed analysis of the media sessions to investigate the cyberbullying and cyberaggression behavior in Vine. After the analysis, we train different classifiers based upon the labeled media sessions. We then investigate, evaluate and compare the classifiers’ performance in detecting instances of cyberbullying.

Journal ArticleDOI
TL;DR: It is found that social media and news are more informative than other data sources, including the political event databases, and enhance the prediction performance; however, social media increases the variation in the performance metrics.
Abstract: Civil unrest events (protests, strikes, and "occupy" events) range from small, nonviolent protests that address specific issues to events that turn into large-scale riots. Detecting and forecasting these events is of key interest to social scientists and policy makers because they can lead to significant societal and cultural changes. We forecast civil unrest events in six countries in Latin America on a daily basis, from November 2012 through August 2014, using multiple data sources that capture social, political and economic contexts within which civil unrest occurs. The models contain predictors extracted from social media sites (Twitter and blogs) and news sources, in addition to volume of requests to Tor, a widely used anonymity network. Two political event databases and country-specific exchange rates are also used. Our forecasting models are evaluated using a Gold Standard Report (GSR), which is compiled by an independent group of social scientists and subject matter experts. We use logistic regression models with Lasso to select a sparse feature set from our diverse datasets. The experimental results, measured by F1-scores, are in the range 0.68 to 0.95, and demonstrate the efficacy of using a multi-source approach for predicting civil unrest. Case studies illustrate the insights into unrest events that are obtained with our method. The ablation study demonstrates the relative value of data sources for prediction. We find that social media and news are more informative than other data sources, including the political event databases, and enhance the prediction performance. However, social media increases the variation in the performance metrics.

Journal ArticleDOI
TL;DR: This work proposes a framework for geolocating tweets that are not geotagged and aims at providing accurate geolocation estimates at fine grain (i.e., within a city) by exploiting the similarities in the content between this post and a set of geotagged tweets.
Abstract: The rise in the use of social networks in recent years has resulted in an abundance of information on different aspects of everyday social activities that is available online, with the most prominent and timely source of such information being Twitter. This has resulted in a proliferation of tools and applications that can help end users and large-scale event organizers to better plan and manage their activities. In this process of analysis of the information originating from social networks, an important aspect is that of the geographic coordinates, i.e., geolocalization, of the relevant information, which is necessary for several applications (e.g., on trending venues, traffic jams). Unfortunately, only a very small percentage of Twitter posts are geotagged, which significantly restricts the applicability and utility of such applications. In this work, we address this problem by proposing a framework for geolocating tweets that are not geotagged. Our solution is general and estimates the location from which a post was generated by exploiting the similarities in the content between this post and a set of geotagged tweets, as well as their time-evolution characteristics. Contrary to previous approaches, our framework aims at providing accurate geolocation estimates at fine grain (i.e., within a city). The experimental evaluation with real data demonstrates the efficiency and effectiveness of our approach.

Journal ArticleDOI
TL;DR: A contagion model is used to predict the near-quadratic scaling for the disaster response case, which suggests that diffusion is present in the emergency response case, while regular charity does not spread via social networks.
Abstract: We study the relationship between chatter on social media and observed actions concerning charitable donation. One hypothesis is that a fraction of those who act will also tweet about it, implying a linear relation. However, if contagion is present, we expect a superlinear scaling. We consider two scenarios: donations in response to a natural disaster, and regular donations. We empirically validate the model using two location-paired sets of social media and donation data, corresponding to the two scenarios. Results show a quadratic relation between chatter and action in the emergency response case. In the case of regular donations, we observe a near-linear relation. Additionally, regular donations can be explained by demographic factors, while for a disaster response social media is a much better predictor of action. A contagion model is used to predict the near-quadratic scaling for the disaster response case. This suggests that diffusion is present in the emergency response case, while regular charity does not spread via social networks. Understanding the scaling behavior that relates social media chatter to physical actions is an important step in estimating the extent of a response and for determining social media strategies to affect the response.
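
The distinction between linear and superlinear scaling can be checked by estimating the exponent k in y ≈ c·x^k, which is the slope of a least-squares line in log-log space. The sketch below is an illustration of that standard technique, not the authors' procedure; the function name, the synthetic data, and the choice of which quantity plays x and which plays y are assumptions.

```python
import math


def scaling_exponent(xs, ys):
    """Estimate k in y ~ c * x**k as the least-squares slope of log(y) on log(x).

    All values must be strictly positive.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    sxx = sum((a - mean_x) ** 2 for a in lx)
    return sxy / sxx


# Synthetic, noise-free data (hypothetical): one linear and one quadratic relation.
x = [10, 20, 40, 80]
linear_y = [0.5 * v for v in x]
quadratic_y = [0.01 * v ** 2 for v in x]

k_linear = scaling_exponent(x, linear_y)        # close to 1
k_quadratic = scaling_exponent(x, quadratic_y)  # close to 2
print(round(k_linear, 3), round(k_quadratic, 3))
```

An exponent near 1 is consistent with the "fraction of those who act also tweet" hypothesis, while an exponent clearly above 1 is the superlinear signature the abstract associates with contagion.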

Journal ArticleDOI
TL;DR: This paper surveys different ways used for building systems for subjective and sentiment analysis for languages other than English and presents a separate section devoted to Arabic sentiment analysis.
Abstract: Subjective and sentiment analysis have gained considerable attention recently. Most of the resources and systems built so far are for English. The need for designing systems for other languages is increasing. This paper surveys different ways used for building systems for subjective and sentiment analysis for languages other than English. There are three different types of approaches used for building these systems. The first (and the best) type is language-specific systems. The second type involves reusing or transferring sentiment resources from English to the target language. The third type is based on using language-independent methods. The paper presents a separate section devoted to Arabic sentiment analysis.

Journal ArticleDOI
TL;DR: This work introduces similarity measures that capture the unique features and characteristics of the online dating network, for example, the interest similarity between two users if they send messages to the same users, and attractiveness similarity if they receive messages from the same users.
Abstract: Online dating sites have become popular platforms for people to look for potential romantic partners. Different from traditional user-item recommendations where the goal is to match items (e.g., books, videos) with a user’s interests, a recommendation system for online dating aims to match people who are mutually interested in and likely to communicate with each other. We introduce similarity measures that capture the unique features and characteristics of the online dating network, for example, the interest similarity between two users if they send messages to the same users, and attractiveness similarity if they receive messages from the same users. A reciprocal score that measures the compatibility between a user and each potential dating candidate is computed, and the recommendation list is generated to include users with top scores. The performance of our proposed recommendation system is evaluated on a real-world dataset from a major online dating site in China. The results show that our recommendation algorithms significantly outperform previously proposed approaches, and the collaborative filtering-based algorithms achieve much better performance than content-based algorithms in both precision and recall. Our results also reveal interesting behavioral differences between male and female users when it comes to looking for potential dates. In particular, males tend to be focused on their own interest and oblivious toward their attractiveness to potential dates, while females are more conscious of their own attractiveness to the other side of the line.
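
As a rough illustration of the two similarity notions named above, the sketch below derives "sent to" and "received from" sets from a message log and compares users by set overlap. The overlap formula (intersection over union), the function names, and the toy log are assumptions; the abstract does not give the authors' exact definitions, and the reciprocal score is not reproduced here.

```python
from collections import defaultdict


def build_contact_sets(messages):
    """Turn (sender, receiver) pairs into per-user contact sets."""
    sent_to = defaultdict(set)
    received_from = defaultdict(set)
    for sender, receiver in messages:
        sent_to[sender].add(receiver)
        received_from[receiver].add(sender)
    return sent_to, received_from


def overlap(a, b):
    """Intersection over union of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def interest_similarity(sent_to, u, v):
    """Users are similar in interest if they message the same people."""
    return overlap(sent_to.get(u, set()), sent_to.get(v, set()))


def attractiveness_similarity(received_from, u, v):
    """Users are similar in attractiveness if the same people message them."""
    return overlap(received_from.get(u, set()), received_from.get(v, set()))


# Hypothetical message log: (sender, receiver).
messages = [
    ("m1", "f1"), ("m1", "f2"),
    ("m2", "f1"), ("m2", "f2"),
    ("m3", "f2"),
]
sent_to, received_from = build_contact_sets(messages)
print(interest_similarity(sent_to, "m1", "m2"))
print(attractiveness_similarity(received_from, "f1", "f2"))
```

In this toy log, m1 and m2 message exactly the same users, while f1 and f2 share two of three distinct senders.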

Journal ArticleDOI
TL;DR: This paper proposes a supervised learning approach which exploits features computed by time-aware forecasts of topological measures calculated between node pairs, and instantiate the interaction prediction problem in two disjoint applicative scenarios: intra-community and inter-community link prediction.
Abstract: Due to the growing availability of Internet services in the last decade, interactions between people have become increasingly easy to establish. For example, we can have an intercontinental job interview, or we can send real-time multimedia content to any of our friends using just a smartphone. All these kinds of human activity generate digital footprints that describe complex, rapidly evolving network structures. In such a dynamic scenario, one of the most challenging tasks involves the prediction of future interactions between pairs of actors (i.e., users in online social networks, researchers in collaboration networks). In this paper, we approach this problem by leveraging network dynamics: to this end, we propose a supervised learning approach which exploits features computed by time-aware forecasts of topological measures calculated between node pairs. Moreover, since real social networks are generally composed of weakly connected modules, we instantiate the interaction prediction problem in two disjoint applicative scenarios: intra-community and inter-community link prediction. Experimental results on real time-stamped networks show that our approach is able to reach high accuracy. Furthermore, we analyze the performance of our methodology when varying the types of features, community discovery algorithms and forecast methods.

Journal ArticleDOI
TL;DR: This paper presents an ongoing research project aiming to explore how approaches and techniques at the boundaries between Network analysis, Legal informatics and Visualization can help shed new light on legal matters.
Abstract: In recent years, the encounter between network analysis (NA) and Law has raised new challenges at both the scientific and the application level. On the one hand, it is fostering new computationally inspired approaches to visualize, retrieve, manipulate and analyze legal information; on the other hand, it is inspiring the creation of innovative tools allowing legal scholars without technical skills to start dealing with NA and visual analytics on their own. This paper presents an ongoing research project aiming to explore how approaches and techniques at the boundaries between Network analysis, Legal informatics and Visualization can help shed new light on legal matters. Attention is focused on EuCaseNet, an online toolkit allowing legal scholars to apply NA and visual analytics techniques to the entire corpus of EU case law.

Journal ArticleDOI
TL;DR: This work proposes a heuristic based on the hitting time statistics of a surrogate random walk process that can be used to approximate the maximum likelihood estimator of the rumor source.
Abstract: We consider the problem of inferring the source of a rumor in a given large network. We assume that the rumor propagates in the network through a discrete time susceptible-infected model. Input to our problem includes information regarding the entire network, an infected subgraph of the network observed at some known time instant, and the probability of one-hop rumor propagation. We propose a heuristic based on the hitting time statistics of a surrogate random walk process that can be used to approximate the maximum likelihood estimator of the rumor source. We test the performance of our heuristic on some standard synthetic and real-world network datasets and show that it outperforms many centrality-based heuristics that have traditionally been used in rumor source inference literature. Through time complexity analysis and extensive experimental evaluation, we demonstrate that our heuristic is computationally efficient for large, undirected and dense non-tree networks.