Showing papers in "IEEE Transactions on Computational Social Systems in 2016"
TL;DR: A new information propagation model based on a heterogeneous user representation and modeling approach is developed that is able to differentiate rumors from credible messages through observing distinctions in their respective propagation patterns in social media.
Abstract: In the midst of today’s pervasive influence of social media content and activities, information credibility has increasingly become a major issue. Accordingly, identifying false information, e.g., rumors circulated in social media environments, attracts expanding research attention and growing interest. Many previous studies have exploited user-independent features for rumor detection. These prior investigations uniformly treat all users relevant to the propagation of a social media message as instances of a generic entity. Such a modeling approach usually adopts a homogeneous network to represent all users, a practice that ignores the variety across an entire user population in a social media environment. Recognizing this limitation in modeling methodologies, this paper explores user-specific features in a social media environment for rumor detection. The new approach hypothesizes that whether a user tends to spread a rumor message depends on specific attributes of the user in addition to content characteristics of the message itself. Under this hypothesis, the information propagation patterns of rumors versus those of credible messages in a social media environment are differentiable. To explore and exploit this hypothesis, we develop a new information propagation model based on a heterogeneous user representation and modeling approach. By applying the new approach, we are able to differentiate rumors from credible messages through observing distinctions in their respective propagation patterns in social media. The experimental results show that the new information propagation model based on heterogeneous user representation can effectively distinguish rumors from credible social media content. Our experimental findings further show that rumors are more likely to spread among certain user groups.
TL;DR: A novel approach to incorporate spatial, temporal, and social context into a traditional collaborative filtering algorithm is introduced, and it is demonstrated that this approach is at the least competitive with existing state-of-the-art location recommenders.
Abstract: Location-based social networks (LBSNs) such as Foursquare, Brightkite, and Gowalla are a growing area where recommendation algorithms find a practical application. With an ever-increasing variety of venues to choose from, deciding on a destination can be overwhelming. Recommenders aid their users in the decision-making process by providing a list of locations likely to be relevant to the user’s needs and interests. Traditional collaborative filtering algorithms consider relationships between users and locations, finding users to be similar only if their location histories overlap. However, the availability of spatial, temporal, and social information in an LBSN offers an opportunity to improve the quality of a recommendation engine. Social network data allows us to connect users who can directly influence each other’s decisions. Temporal data allows us to account for the drifting preferences of users, giving more weight to recent location visits over historical selections, and taking advantage of repetitive behaviors. Spatial information allows us to focus recommendations on locations close to the user, keeping our recommendations relevant as a user travels. We introduce a novel approach to incorporate spatial, temporal, and social context into a traditional collaborative filtering algorithm. We evaluate our method on data sets collected from three LBSNs, and demonstrate that our approach is at the least competitive with existing state-of-the-art location recommenders.
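As a rough illustration of the temporal weighting described in this abstract, the sketch below builds time-decayed visit profiles and compares users by cosine similarity, so that users are similar only where their location histories overlap. The exponential half-life decay, the function names, and the sample visits are assumptions made for illustration; the abstract does not specify the paper's actual weighting scheme.

```python
import math


def decayed_profile(visits, now, half_life_days=30.0):
    """Turn (location, day) visits into {location: weight}.

    Assumption: a visit's weight halves every `half_life_days`,
    so recent visits count more than older ones.
    """
    profile = {}
    for location, day in visits:
        age = now - day
        profile[location] = profile.get(location, 0.0) + 0.5 ** (age / half_life_days)
    return profile


def cosine(p, q):
    """Cosine similarity of two sparse profiles; 0.0 when nothing overlaps."""
    dot = sum(w * q[loc] for loc, w in p.items() if loc in q)
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    norm = norm_p * norm_q
    return dot / norm if norm else 0.0


# Hypothetical check-in histories: (location, day of visit), with "today" = day 100.
alice = decayed_profile([("cafe", 95), ("gym", 90)], now=100)
bob = decayed_profile([("cafe", 99)], now=100)
carol = decayed_profile([("museum", 98)], now=100)

print(cosine(alice, bob))    # positive: both visited the cafe
print(cosine(alice, carol))  # zero: no shared locations
```

Spatial and social context could enter the same way, for example as additional multiplicative weights on each visit, but that extension is likewise an assumption rather than the published method.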
TL;DR: The results suggest that interactions reiterate the information contained in friendship ties sufficiently well to serve as a proxy when the majority of a network is unobserved.
Abstract: While privacy preserving mechanisms, such as hiding one’s friends list, may be available to withhold personal information on online social networking sites, it is not obvious to what degree a user’s social behavior renders such an attempt futile. In this paper, we study the impact of additional interaction information on the inference of links between nodes in partially covert networks. This investigation is based on the assumption that interaction might be a proxy for connectivity patterns in online social networks. For this purpose, we use data collected from 586 Facebook profiles consisting of friendship ties (conceptualized as the network) and comments on wall posts (serving as interaction information) by a total of 64 000 users. The link-inference problem is formulated as a binary classification problem using a comprehensive set of features and multiple supervised learning algorithms. Our results suggest that interactions reiterate the information contained in friendship ties sufficiently well to serve as a proxy when the majority of a network is unobserved.
TL;DR: This paper proposes a signaling game approach, namely, Sig4UDD, to study the impact of uncertain cooperation among well-behaved and SS nodes on the performance of data forwarding, and establishes a belief system to help SS nodes predict the type of their opponents and take appropriate actions to maximize their utilities.
Abstract: Cooperative data delivery among mobile nodes can improve the performance of data delivery in mobile social networks. However, data routing in the presence of socially selfish (SS) nodes is challenging, because these nodes adjust their level of cooperation based on their social features and ties to achieve their social objectives. This issue becomes more challenging when they avoid revealing their reactions to incoming messages, which leads to data forwarding under uncertain behavior. In this paper, we propose a signaling game approach, namely, Sig4UDD, to study the impact of uncertain cooperation among well-behaved and SS nodes on the performance of data forwarding. In Sig4UDD, we employ Bayesian Nash equilibrium to analyze one-stage interactions among nodes. Then, perfect Bayesian equilibrium is applied to analyze their multistage interactions. In this stage, we establish a belief system to help SS nodes predict the type of their opponents and take appropriate actions to maximize their utilities. To update the beliefs of SS nodes, we devise the weighted social distance metric to measure the global social distance among nodes. Finally, we compare the performance of Sig4UDD to some benchmark cooperative and noncooperative data forwarding protocols using Reality Mining and Social Evolution data sets.
TL;DR: The reported results shed light on the sensitivity of betweenness, closeness, and degree centrality metrics to fused graph inputs and the role of HVI identification as a test and evaluation tool for fusion process optimization.
Abstract: This paper reports on the utility of social network analysis methods in the data fusion domain. Given fused data that combine multiple intelligence reports from the same environment, social network extraction and high value individual (HVI) identification are of interest. The research on the feasibility of such activities may help not only in methodological developments in network science but also in testing and evaluation of fusion quality. This paper offers a parallel computing-based methodology to extract a social network of individuals from fused data, captured as a cumulative associated data graph (CDG). To obtain the desired social network, two approaches including a hop count weighted and a path salience approach are developed and compared. A supervised learning framework is implemented for parameterizing the extraction algorithms. Parameters utilized in the extraction algorithm consider paths between individuals within the social network, weighing relationships between these individuals based on the count weighted and the path salience calculation methodologies. An overall link strength value is then calculated by aggregating path hop count weights and saliences between unique individual pairs for the hop count weighted and path salience approaches, respectively. Ordered centrality-based HVI lists are obtained from the CDGs constructed from the Sunni criminal thread and Bath’est resurgence threads of the SYNCOIN data set, under various fusion system settings. The reported results shed light on the sensitivity of betweenness, closeness, and degree centrality metrics to fused graph inputs and the role of HVI identification as a test and evaluation tool for fusion process optimization. The computational results demonstrate superiority of path salience approach in identifying HVIs. The insights generated by these approaches and directions for future research are discussed.
TL;DR: This paper proposes the influence-distance-based effector detection problem and provides a 3-approximation approach and proves that the optimal MLE can be obtained in polynomial time for connected directed acyclic graphs.
Abstract: In a social network, influence diffusion is the process of spreading innovations from user to user. An activation state identifies the active users who have adopted the target innovation. Given an activation state of a certain diffusion, effector detection aims to reveal the active users who are able to best explain the observed state. In this paper, we tackle the effector detection problem from two perspectives. The first approach is based on the influence distance that measures the chance that an active user can activate its neighbors. For a certain pair of users, the shorter the influence distance, the higher the probability that one can activate the other. Given an activation state, the effectors are expected to have short influence distance to active users and long influence distance to inactive users. Based on this idea, we propose the influence-distance-based effector detection problem and provide a 3-approximation. Second, we address the effector detection problem by the maximum likelihood estimation (MLE) approach. We prove that the optimal MLE can be obtained in polynomial time for connected directed acyclic graphs. For general graphs, we first extract a directed acyclic subgraph that can well preserve the information in the original graph and then apply the MLE approach to the extracted subgraph to obtain the effectors. The effectiveness of our algorithms is experimentally verified via simulations on a real-world social network.
TL;DR: By applying a fluid limit theorem for jump Markov processes, a system of differential equations for the density functions of opinions for large networks is derived and it is shown that the equilibrium points corresponding to consensus and polarization are the only stable equilibrium points.
Abstract: In this paper, we propose a new model for binary opinion dynamics in a (fully connected) structurally balanced network. In a structurally balanced network, agents are classified into two clusters and two agents in the same cluster (resp. different clusters) are connected with a positive (resp. negative) edge. Initially, every agent is randomly assigned one of the two opinions. In every time slot, three agents are randomly selected to have their opinions updated. If the three agents belong to the same cluster, the majority rule (MR) is used to update their opinions. On the other hand, if the three agents belong to two different clusters, with probability $p$, a consensus is reached by the MR, and with probability $1-p$, a polarization (in line with the signs of the three edges) is reached. The probability $p$, called the rationality probability, plays a significant role in measuring how rationally the agents in a network behave when they encounter different opinions. By applying a fluid limit theorem for jump Markov processes, we derive a system of differential equations for the density functions of opinions for large networks. We show that the equilibrium points corresponding to consensus and polarization are the only stable equilibrium points. All other equilibrium points are unstable. As such, as time goes on, the network eventually reaches a consensus or a polarization, depending on the rationality probability and the initial state of the network.
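The update rule in this abstract can be sketched as a small simulation. The code below is a minimal illustration, not the authors' implementation: opinions are coded as 0/1, and the polarization step is an assumed reading of "in line with the signs of the three edges" in which the two agents sharing a cluster adopt the trio's majority opinion and the lone agent adopts the opposite one. The function names and parameters are hypothetical.

```python
import random


def majority(values):
    """Majority value among three binary opinions (0 or 1)."""
    return 1 if sum(values) >= 2 else 0


def step(opinions, clusters, p, rng):
    """Pick three agents at random and update their opinions in place."""
    trio = rng.sample(range(len(opinions)), 3)
    maj = majority([opinions[i] for i in trio])
    same_cluster = len({clusters[i] for i in trio}) == 1
    if same_cluster or rng.random() < p:
        # Same cluster, or a mixed trio acting "rationally": majority rule.
        for i in trio:
            opinions[i] = maj
    else:
        # Assumed polarization rule: the cluster holding two of the three
        # agents takes the majority opinion, the lone agent the opposite.
        counts = {}
        for i in trio:
            counts[clusters[i]] = counts.get(clusters[i], 0) + 1
        pair_cluster = max(counts, key=counts.get)
        for i in trio:
            opinions[i] = maj if clusters[i] == pair_cluster else 1 - maj


def simulate(n=200, p=0.5, steps=20000, seed=0):
    """Run the dynamics on two equal clusters with random initial opinions."""
    rng = random.Random(seed)
    clusters = [0 if i < n // 2 else 1 for i in range(n)]
    opinions = [rng.randint(0, 1) for _ in range(n)]
    for _ in range(steps):
        step(opinions, clusters, p, rng)
    return opinions, clusters


opinions, clusters = simulate(p=0.5)
for c in (0, 1):
    members = [o for o, cl in zip(opinions, clusters) if cl == c]
    print(f"cluster {c}: fraction holding opinion 1 = {sum(members) / len(members):.2f}")
```

Varying `p` and the seed gives a quick feel for when runs drift toward consensus (both clusters at the same opinion) or polarization (clusters at opposite opinions); it does not replace the fluid-limit analysis in the paper.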
TL;DR: The analysis of temporal causality of CSN sentiment dynamics offers new insights that the designers, managers, and moderators of an online community, such as CSN, can utilize to facilitate and enhance the interactions so as to better meet the social support needs of the CSN participants.
Abstract: Online health communities (OHCs) constitute a useful source of information and social support for patients. American Cancer Society’s Cancer Survivor Network (CSN), a 173 000-member community, is the largest online network for cancer patients, survivors, and caregivers. A discussion thread in CSN is often initiated by a cancer survivor seeking support from other members of CSN. Discussion threads are multiparty conversations that often provide a source of social support, e.g., by bringing about a change of sentiment from negative to positive on the part of the thread originator. While previous studies regarding cancer survivors have shown that the members of an OHC derive benefits from their participation in such communities, causal accounts of the factors that contribute to the observed benefits have been lacking. We introduce a novel framework to examine the temporal causality of sentiment dynamics in the CSN. We construct a probabilistic computation tree logic representation and a corresponding probabilistic Kripke structure to represent and reason about the changes in sentiments of posts in a thread over time. We use a sentiment classifier trained using machine learning on a set of posts manually tagged with sentiment labels to classify posts as expressing either positive or negative sentiment. We analyze the probabilistic Kripke structure to identify the prima facie causes of sentiment change on the part of the thread originators in the CSN forum and their significance. We find that the sentiment of replies appears to causally influence the sentiment of the thread originator. Our experiments also show that the conclusions are robust with respect to the choice of 1) the classification threshold of the sentiment classifier and 2) the specific sentiment classifier used.
We also extend the basic framework for temporal causality analysis to incorporate the uncertainty in the states of the probabilistic Kripke structure resulting from the use of an imperfect state transducer (in our case, the sentiment classifier). Our analysis of temporal causality of CSN sentiment dynamics offers new insights that the designers, managers, and moderators of an online community, such as CSN, can utilize to facilitate and enhance the interactions so as to better meet the social support needs of the CSN participants. The proposed methodology for the analysis of temporal causality has broad applicability in a variety of settings where the dynamics of the underlying system can be modeled in terms of state variables that change in response to internal or external inputs.
TL;DR: The clustering results suggest that some symptom clusters are consistent across social media data and clinical data, such as gastrointestinal-related symptoms, menopausal symptoms, mood-change symptoms, cognitive impairment, and pain-related symptoms.
Abstract: Most cancer patients, including patients with breast cancer, experience multiple symptoms simultaneously while receiving active treatment. Some symptoms tend to occur together and may be related, such as hot flashes and night sweats. Co-occurring symptoms may have a multiplicative effect on patients’ functioning, mental health, and quality of life. Symptom clusters in the context of oncology were originally described as groups of three or more related symptoms. Some authors have suggested symptom clusters may have practical applications, such as the formulation of more effective therapeutic interventions that address the combined effects of symptoms rather than treating each symptom separately. Most studies that have sought to identify clusters in breast cancer survivors have relied on traditional research studies. Social media, such as online health-related forums, contain a bevy of user-generated content in the form of threads and posts, and could be used as a data source to identify and characterize symptom clusters among cancer patients. This paper seeks to determine patterns of symptom clusters in breast cancer survivors derived from both social media and research study data using improved $K$-medoid clustering. A total of 50 426 publicly available messages were collected from Medhelp.com and 653 questionnaires were collected as part of a research study. The network of symptoms built from social media was sparse compared with that of the research study data, making the social media data easier to partition. The proposed revised $K$-medoid clustering helps to improve the clustering performance by reassigning some of the negative-average silhouette width (ASW) symptoms to other clusters after initial $K$-medoid clustering. This retains an overall nondecreasing ASW and avoids the problem of trapping in local optima. The overall ASW, individual ASW, and improved interpretation of the final clustering solution suggest improvement.
The clustering results suggest that some symptom clusters are consistent across social media data and clinical data, such as gastrointestinal-related symptoms, menopausal symptoms, mood-change symptoms, cognitive impairment, and pain-related symptoms. We recommend an integrative approach taking advantage of both data sources. Social media data could provide context for the interpretation of clustering results derived from research study data, while research study data could compensate for the risk of lower precision and recall found using social media data.
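The silhouette width that drives the reassignment step in this abstract can be computed as follows. This is a minimal sketch of the standard silhouette definition, s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance from item i to the other members of its cluster and b(i) is the smallest mean distance to any other cluster. It only flags items with negative width as reassignment candidates; the paper's revised $K$-medoid procedure itself is not reproduced here. The function name and the toy data are assumptions, and at least two clusters are required.

```python
def silhouette_widths(dist, labels):
    """Per-item silhouette widths from a distance matrix and cluster labels.

    Assumes at least two distinct labels. Singleton clusters get width 0.0
    by the usual convention.
    """
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)

    widths = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            widths.append(0.0)
            continue
        # a: mean distance to the rest of the item's own cluster.
        a = sum(dist[i][j] for j in own) / len(own)
        # b: mean distance to the nearest other cluster.
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom else 0.0)
    return widths


# Toy data: five points on a line; the point at 2 is deliberately mislabeled.
points = [0, 1, 2, 10, 11]
labels = [0, 0, 1, 1, 1]
dist = [[abs(x - y) for y in points] for x in points]

widths = silhouette_widths(dist, labels)
asw = sum(widths) / len(widths)
candidates = [i for i, w in enumerate(widths) if w < 0]
print(candidates)  # indices with negative silhouette width
print(round(asw, 3))
```

Moving a flagged item to its nearest other cluster and recomputing the average silhouette width is one simple way to check whether a reassignment keeps the overall ASW from decreasing.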
TL;DR: It is proposed that a blend of directional advancement and the mixing of schools of thought is essential for the steady development of a particular field of research.
Abstract: We investigate the Geom collaboration network under the random matrix theory framework. While the spectral density, exhibiting a triangular shape with high degeneracy at zero, emphasizes the complexity of interactions in the underlying system, the spectral fluctuations provide a measure of the complexity. The short-range correlations follow the random matrix prediction, suggesting the existence of a minimal amount of randomness in the interactions between authors, whereas the long-range correlations deviating from the random matrix prediction imply more directionality in collaboration behavior, leading to less randomness. A higher degeneracy at the −1 eigenvalue in the Geom collaboration network as compared with its configuration model indicates a large number of close-to-complete subgraphs in the network, suggesting collaboration groups among scientists. These structures can be considered to convey the same school of thought, whereas the randomness in spectra might arise from the intermingling of different collaboration modules. These results lead us to propose that a blend of directional advancement and the mixing of schools of thought is essential for the steady development of a particular field of research.
TL;DR: This paper attempts to perform a multivariate analysis of video call record data collected from a wide area organizational network over a period of time and exhibits deviations from the conventional machine learning paradigms.
Abstract: Integration of physical processes with the computing world is driving newer challenges for networking frameworks. Cyber physical social systems (CPSSs) are another upcoming paradigm that encompasses the ever-growing interaction between the physical, social, and cyber worlds. As communication networks form the basis of these interactions, a cognitive evaluation of networks is called for. This CPSS driven network evolution was a direction motivating this paper. With the implementation of the next generation networks, traffic from real-time interactive services, such as video conferencing, is surpassing that of conventional transactional services. As such multimedia data transportation over IP networks has stringent quality constraints in terms of required bandwidth, latency, and jitter, legacy networks with no quality of service face challenges in terms of performance. We attempt to perform a multivariate analysis of video call record data collected from a wide area organizational network over a period of time. Learning-based prediction is attempted by training four classifiers: naive Bayes, $k$-nearest neighbor, decision tree, and support vector machine. Two independent sets of experiments were conducted, one for bandwidth prediction and one for destination prediction. Both discrete and continuous-valued predictors were involved in the training. Performance evaluation of the generated hypothesis in both cases was conducted using tenfold cross validation. Combined analysis using the assorted combinations of attributes was conducted, and thereafter, the effect of each feature was evaluated through singular attribute portioning. This paper presents observations that deviate from the conventional machine learning paradigms. An attempt to increase the prediction accuracy of the classifiers was made through the boosting ensemble methodology. However, only a minuscule gain in performance was achieved.
A maximum prediction accuracy of 81% for bandwidth and 60% for destination was obtained. The low accuracy of conventionally better-performing algorithms is explained mathematically. Divergence of the obtained results from the accepted patterns poses an open research problem, particularly with respect to the nature and peculiarities of the data set. The proposed learning technique can have potential applications in social, tactical, and strategic spheres.
TL;DR: This paper models the task of intent detection as a binary classification problem, and thus for each question, two classes are defined: subjective and objective, and finds that the two types of questions exhibited very different characteristics.
Abstract: The explosive popularity of social networking sites has provided an additional venue for online information seeking. By posting questions in their status updates, more and more people are turning to social networks to fulfill their information needs. Given that understanding individuals’ information needs could improve the performance of question answering, in this paper, we model the task of intent detection as a binary classification problem, and thus for each question, two classes are defined: subjective and objective. We use a comprehensive set of lexical, syntactical, and contextual features to build the classifier and the experimental results show satisfactory classification performance. By applying the classifier on a larger dataset, we then present in-depth analyses to compare subjective and objective questions, in terms of the way they are being asked and answered. We find that the two types of questions exhibited very different characteristics, and further validate the expected benefits of differentiating questions according to their subjectivity orientations.
TL;DR: This paper shows the existence of stable nodes in various networks and indicates that the design of the consensus approach based on the properties of the stable nodes can further improve the stability of the rank orders.
Abstract: In complex network analysis, the problem of ranking individual nodes based on their importance has attracted increasing attention from the scientific community owing to its wide range of applications, such as identification of influential spreaders for viral marketing or epidemic control, bottlenecks for traffic congestion control, and so on. The growing literature proposes a number of measures to determine the rank order of the network entities where complete information about the nodes and their interaction is available. Degree centrality, PageRank, eigenvector centrality, and closeness centrality are a few such popular measures. In most real-life scenarios, however, the information about the underlying network is incomplete or affected by noise. The few works that study the effects of incomplete information on the rank orders show the vulnerability of the rank orders in various topologies. In this paper, we investigate the effects of noise, both random and nonrandom, on the aggregated rank orders determined from the degree, PageRank, eigenvector centrality, and closeness centrality-based rankings. This paper reveals an important insight that even the simple Borda Count ranking has the potential to improve on the accuracy of rank orders in networks with uncertainty. This paper shows the existence of stable nodes in various networks and indicates that the design of the consensus approach based on the properties of the stable nodes can further improve the stability of the rank orders.
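The Borda Count aggregation mentioned in this abstract can be illustrated in a few lines. The sketch below assumes every input ranking lists the same nodes, best first; a node in position k of an n-node ranking earns n - 1 - k points, and the consensus order sorts nodes by total points. The four input rankings are made-up examples standing in for degree, PageRank, eigenvector, and closeness orderings, and the alphabetical tie-break is an assumption.

```python
from collections import defaultdict


def borda_aggregate(rankings):
    """Combine several best-first rankings of the same nodes by Borda Count."""
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, node in enumerate(ranking):
            # Top position earns n - 1 points, last position earns 0.
            scores[node] += n - 1 - position
    # Highest total first; ties broken by node label for determinism.
    return sorted(scores, key=lambda node: (-scores[node], node))


# Hypothetical rankings of four nodes under four centrality measures.
degree_rank = ["a", "b", "c", "d"]
pagerank_rank = ["b", "a", "c", "d"]
eigenvector_rank = ["a", "c", "b", "d"]
closeness_rank = ["b", "a", "d", "c"]

consensus = borda_aggregate(
    [degree_rank, pagerank_rank, eigenvector_rank, closeness_rank]
)
print(consensus)  # → ['a', 'b', 'c', 'd']
```

Because each measure contributes only positions rather than raw scores, a single noisy measure can shift a node by a few points at most, which is one intuition for why a simple aggregate can be more stable than any individual ranking.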
TL;DR: A novel approach for harnessing a collective (crowdsourced) predictive ability available through publicly made technology-related statements by automatically determining significant convergences on technology forecasts is described.
Abstract: Efforts to predict emerging, new, or disruptive technologies use various analyses and data sources to derive indicators and subsequent forecasts about technological innovations, including quantitative (such as bibliometric analysis) and qualitative methods (such as expert elicitation). We describe a novel approach for harnessing a collective (crowdsourced) predictive ability available through publicly made technology-related statements by automatically determining significant convergences on technology forecasts. We evaluate our approach using a corpus of science-related articles and demonstrate that passive crowdsourcing may be a powerful source of technology-related predictive intelligence.
TL;DR: This paper proposes a new trust model referred to as “Web of credit (WoC),” where one gives credit to those others one has interacted with based on the quality of the information one’s peers have provided, and contributes a WoC-based trust inference algorithm that is adaptive to the change of user profiles by automatically redistributing credit and reinferring trust measures within the network.
Abstract: Trust is a pivotal element of any information system that allows users to share, communicate, interact, or collaborate with one another. Trust inference is particularly crucial for online social networks where interaction with acquaintances or even anonymous strangers is widely a norm. In the past decade, a number of trust inference algorithms have been proposed to address this issue, which are primarily based either on the “reputation” or the “Web of trust (WoT)” model. The reputation-based model supports objective inference of a universal reputation for each user by analyzing the interaction histories among the users; however, it does not allow individual users to specify personalized trust measures for the same other users. In contrast, the WoT-based model allows each individual user to specify a trust value for their direct neighbors within a trust network. However, the accuracy of such a subjective trust value is questionable and further subject to loss in the course of propagating trust measures to nonneighboring users in the network. In this paper, we propose a new trust model referred to as “Web of credit (WoC),” where one gives credit to those others one has interacted with based on the quality of the information one’s peers have provided. Credit flows from one user to another within a trust network, forming trust relationships. This new model combines the objectivism from the reputation-based model for credit assignment by exploiting the actual interaction histories among users in the form of online rating data and the individualism from the WoT-based model for personalized trust measures. We further contribute a WoC-based trust inference algorithm that is adaptive to the change of user profiles by automatically redistributing credit and reinferring trust measures within the network. 
Experiments with two real-world data sets have shown that the WoC-based trust inference algorithm is not only able to infer more accurate trust measures than both reputation-based and WoT-based algorithms do but also fast enough to be a viable solution for real-time trust inference in large-scale trust networks.
TL;DR: The emerging fields of social computing and computational social science have been driven by the transdisciplinary efforts of scientists, researchers, and scholars from many fields of study.
Abstract: The emerging fields of social computing and computational social science have been driven by the transdisciplinary efforts of scientists, researchers, and scholars from many fields of study. Collaborations across the social, natural, physical, information, computer, mathematical, and health sciences in addition to the humanities have resulted in significant advancements in our understanding of human behavior.
TL;DR: Presents corrections to the paper, “A performance evaluation of machine learning-based streaming spam tweets detection,” (Chen, C., et al.).
Abstract: Presents corrections to the paper, “A performance evaluation of machine learning-based streaming spam tweets detection,” (Chen, C., et al.), IEEE Trans. Comput. Social Syst., vol. 2, no. 3, pp. 65–76, Sep. 2015.