Author

Yang Liu

Bio: Yang Liu is an academic researcher from New Jersey Institute of Technology. The author has contributed to research in topics: Rumor & Population. The author has an h-index of 4 and has co-authored 4 publications receiving 80 citations.

Papers
Journal ArticleDOI
TL;DR: A new information propagation model based on a heterogeneous user representation and modeling approach is developed that is able to differentiate rumors from credible messages through observing distinctions in their respective propagation patterns in social media.
Abstract: In the midst of today’s pervasive influence of social media content and activities, information credibility has increasingly become a major issue. Accordingly, identifying false information, e.g., rumors circulated in social media environments, attracts expanding research attention and growing interests. Many previous studies have exploited user-independent features for rumor detection. These prior investigations uniformly treat all users relevant to the propagation of a social media message as instances of a generic entity. Such a modeling approach usually adopts a homogeneous network to represent all users, the practice of which ignores the variety across an entire user population in a social media environment. Recognizing this limitation in modeling methodologies, this paper explores user-specific features in a social media environment for rumor detection. The new approach hypothesizes that whether a user tends to spread a rumor message depends on specific attributes of the user in addition to content characteristics of the message itself. Under this hypothesis, the information propagation patterns of rumors versus those of credible messages in a social media environment are differentiable. To explore and exploit this hypothesis, we develop a new information propagation model based on a heterogeneous user representation and modeling approach. By applying the new approach, we are able to differentiate rumors from credible messages through observing distinctions in their respective propagation patterns in social media. The experimental results show that the new information propagation model based on heterogeneous user representation can effectively distinguish rumors from credible social media content. Our experimental findings further show that rumors are more likely to spread among certain user groups.

67 citations

Book ChapterDOI
31 Mar 2015
TL;DR: Experimental results show that the new information propagation model based on a heterogeneous user representation can effectively distinguish rumors from credible social media content.
Abstract: In the midst of today’s pervasive influence of social media content and activities, information credibility has increasingly become a major issue. Accordingly, identifying false information, e.g. rumors circulated in social media environments, attracts expanding research attention and growing interests. Many previous studies have exploited user-independent features for rumor detection. These prior investigations uniformly treat all users relevant to the propagation of a social media message as instances of a generic entity. Such a modeling approach usually adopts a homogeneous network to represent all users, the practice of which ignores the variety across an entire user population in a social media environment. Recognizing this limitation of modeling methodologies, this study explores user-specific features in a social media environment for rumor detection. The new approach hypothesizes that whether a user tends to spread a rumor is dependent upon specific attributes of the user in addition to content characteristics of the message itself. Under this hypothesis, information propagation patterns of rumors versus those of credible messages in a social media environment are systematically differentiable. To explore and exploit this hypothesis, we develop a new information propagation model based on a heterogeneous user representation for rumor recognition. The new approach is capable of differentiating rumors from credible messages through observing distinctions in their respective propagation patterns in social media. Experimental results show that the new information propagation model based on heterogeneous user representation can effectively distinguish rumors from credible social media content.

17 citations

Proceedings Article
14 Nov 2014
TL;DR: A new integrated biomedical NLP pipeline is developed that automatically extracts a comprehensive set of patient demographics and medical information from online health forums and can be adopted to construct structured personal health profiles from unstructured user-contributed content on eHealth social media sites.
Abstract: Natural language processing has been successfully leveraged to extract patient information from unstructured clinical text. However, the majority of the existing work targets a specific category of clinical information through individual efforts. In the midst of the Health 2.0 wave, online health forums increasingly host abundant and diverse health-related information regarding the demographics and medical information of patients who are either actively participating in or passively reported at these forums. The potential categories of such information span a wide spectrum, whose extraction requires a systematic and comprehensive approach beyond the traditional isolated efforts that specialize in harvesting information of single categories. In this paper, we develop a new integrated biomedical NLP pipeline that automatically extracts a comprehensive set of patient demographics and medical information from online health forums. The pipeline can be adopted to construct structured personal health profiles from unstructured user-contributed content on eHealth social media sites. This paper describes key aspects of the pipeline and reports experimental results that show the system’s satisfactory performance in accomplishing a series of NLP tasks of extracting patient information from online health forums.

10 citations

Journal ArticleDOI
01 Jun 2017
TL;DR: This paper introduces a novel Local Context‐Aware LDA Model (LC‐LDA), which is capable of observing a local context comprising a rich collection of documents that may directly or indirectly influence the topic distributions of a target document.
Abstract: With the rapid development of the Internet and its applications, growing volumes of documents increasingly become interconnected to form large-scale document networks. Accordingly, topic modeling in a network of documents has been attracting continuous research attention. Most of the existing network-based topic models assume that topics in a document are influenced by its directly linked neighboring documents in a document network and overlook the potential influence from indirectly linked ones. The existing work also has not carefully modeled variations of such influence among neighboring documents. Recognizing these modeling limitations, this paper introduces a novel Local Context-Aware LDA Model (LC-LDA), which is capable of observing a local context comprising a rich collection of documents that may directly or indirectly influence the topic distributions of a target document. The proposed model can also differentiate the respective influence of each document in the local context on the target document according to both structural and temporal relationships between the two documents. The proposed model is extensively evaluated through multiple document clustering and classification tasks conducted over several large-scale document sets. Evaluation results clearly and consistently demonstrate the effectiveness and superiority of the new model with respect to several state-of-the-art peer models.

8 citations


Cited by
Journal ArticleDOI
12 Jan 2017-PLOS ONE
TL;DR: In this paper, a comprehensive set of user, structural, linguistic, and temporal features is examined and their relative strength is compared using near-complete Twitter data, and a new rumor classification algorithm is suggested that achieves competitive accuracy over both short and long time windows.
Abstract: This study determines the major difference between rumors and non-rumors and explores rumor classification performance levels over varying time windows, from the first three days to nearly two months. A comprehensive set of user, structural, linguistic, and temporal features was examined and their relative strength was compared using near-complete Twitter data. Our contribution lies in providing deep insight into the cumulative spreading patterns of rumors over time as well as in tracking the precise changes in predictive powers across rumor features. Statistical analysis finds that structural and temporal features distinguish rumors from non-rumors over a long-term window, yet they are not available during the initial propagation phase. In contrast, user and linguistic features are readily available and act as a good indicator during the initial propagation phase. Based on these findings, we suggest a new rumor classification algorithm that achieves competitive accuracy over both short and long time windows. These findings provide new insights for explaining rumor mechanism theories and for identifying features of early rumor detection.

314 citations

Journal ArticleDOI
TL;DR: In this article, the authors introduce a framework for promptly identifying polarizing content on social media and thus predicting future fake news topics, based on a series of characteristics related to users' behavior on online social media.
Abstract: Users’ polarization and confirmation bias play a key role in misinformation spreading on online social media. Our aim is to use this information to determine in advance potential targets for hoaxes and fake news. In this article, we introduce a framework for promptly identifying polarizing content on social media and, thus, “predicting” future fake news topics. We validate the performances of the proposed methodology on a massive Italian Facebook dataset, showing that we are able to identify topics that are susceptible to misinformation with 77% accuracy. Moreover, such information may be embedded as a new feature in an additional classifier able to recognize fake news with 91% accuracy. The novelty of our approach consists in taking into account a series of characteristics related to users’ behavior on online social media such as Facebook, making a first, important step towards the mitigation of misinformation phenomena by supporting the identification of potential misinformation targets and thus the design of tailored counter-narratives.

185 citations

Journal ArticleDOI
TL;DR: A state-of-the-art review of automated misinformation detection in social networks where deep learning (DL) is used to automatically process data and create patterns to make decisions not only to extract global features but also to achieve better results.
Abstract: Recently, the use of social networks such as Facebook, Twitter, and Sina Weibo has become an inseparable part of our daily lives. They are considered a convenient platform for users to share personal messages, pictures, and videos. However, while people enjoy social networks, many deceptive activities such as fake news or rumors can mislead users into believing misinformation. Besides, the spread of massive amounts of misinformation in social networks has become a global risk. Therefore, misinformation detection (MID) in social networks has gained a great deal of attention and is considered an emerging area of research interest. We find that several studies related to MID have addressed new research problems and techniques. While important, however, the automated detection of misinformation is difficult to accomplish, as it requires an advanced model to understand how related or unrelated the reported information is when compared to real information. The existing studies have mainly focused on three broad categories of misinformation: false information, fake news, and rumor detection. Therefore, related to the previous issues, we present a comprehensive survey of automated misinformation detection on (i) false information, (ii) rumors, (iii) spam, (iv) fake news, and (v) disinformation. We provide a state-of-the-art review on MID where deep learning (DL) is used to automatically process data and create patterns to make decisions not only to extract global features but also to achieve better results. We further show that DL is an effective and scalable technique for the state-of-the-art MID. Finally, we suggest several open issues that currently limit real-world implementation and point to future directions along this dimension.

125 citations

Journal ArticleDOI
TL;DR: A framework to train and validate Latent Dirichlet Allocation (LDA), the simplest and most popular topic modeling algorithm, using e-petition data is described and findings have significant implications for developing LDA tools and assuring validity and interpretability of LDA content analysis.
Abstract: E-petitions have become a popular vehicle for political activism, but studying them has been difficult because efficient methods for analyzing their content are currently lacking. Researchers have used topic modeling for content analysis, but current practices carry some serious limitations. While modeling may be more efficient than manually reading each petition, it generally relies on unsupervised machine learning and so requires a dependable training and validation process. This paper therefore describes a framework to train and validate Latent Dirichlet Allocation (LDA), the simplest and most popular topic modeling algorithm, using e-petition data. With rigorous training and evaluation, 87% of LDA-generated topics made sense to human judges. Topics also aligned well with results from an independent content analysis by the Pew Research Center, and were strongly associated with corresponding social events. Computer-assisted content analysts can benefit from our guidelines to supervise every process of training and evaluation of LDA. Software developers can benefit from learning the demands of social scientists when using LDA for content analysis. These findings have significant implications for developing LDA tools and assuring validity and interpretability of LDA content analysis. In addition, LDA topics can have some advantages over subjects extracted by manual content analysis by reflecting multiple themes expressed in texts, by extracting new themes that are not highlighted by human coders, and by being less prone to human bias.

113 citations

Journal ArticleDOI
TL;DR: This study reviews 21 online social network analysis problems together with related studies, and serves as a reference work for researchers interested in analyzing online social network data and social network problems.
Abstract: The use of online social networks has grown significantly in recent years, as Internet use has become widespread worldwide and as technological infrastructure and products have evolved. It has become easier to reach online social networking sites such as Facebook, Twitter, Instagram and LinkedIn via the internet and web 3.0 technologies. Thus, people have shared their views on many different topics and their emotions with other users more widely on these platforms. This means that a huge amount of data is created on platforms where millions of people connect with each other through social networks. At the same time, the development of high-speed, complex computational paradigms allows analysis of this valuable data by means of social network analysis methods. Our goal for this paper is to present a review of novel and popular online social network analysis problems with related applications and a reference work for researchers interested in analyzing online social network data and social network problems. Unlike other individual studies, we have gathered 21 online social network problems and defined them with related studies. Thus, this study is original in presenting an important source of research that explains the problems of online social networks and the studies performed in this area.

105 citations