Journal Article

Fake News Detection on Social Media: A Data Mining Perspective

01 Sep 2017, SIGKDD Explorations (ACM), Vol. 19, Iss. 1, pp. 22-36
TL;DR: This survey presents a comprehensive review of detecting fake news on social media, including fake news characterizations on psychology and social theories, existing algorithms from a data mining perspective, evaluation metrics, and representative datasets.
Abstract: Social media for news consumption is a double-edged sword. On the one hand, its low cost, easy access, and rapid dissemination of information lead people to seek out and consume news from social media. On the other hand, it enables the wide spread of "fake news", i.e., low quality news with intentionally false information. The extensive spread of fake news has the potential for extremely negative impacts on individuals and society. Therefore, fake news detection on social media has recently become an emerging research that is attracting tremendous attention. Fake news detection on social media presents unique characteristics and challenges that make existing detection algorithms from traditional news media ineffective or not applicable. First, fake news is intentionally written to mislead readers to believe false information, which makes it difficult and nontrivial to detect based on news content; therefore, we need to include auxiliary information, such as user social engagements on social media, to help make a determination. Second, exploiting this auxiliary information is challenging in and of itself as users' social engagements with fake news produce data that is big, incomplete, unstructured, and noisy. Because the issue of fake news detection on social media is both challenging and relevant, we conducted this survey to further facilitate research on the problem. In this survey, we present a comprehensive review of detecting fake news on social media, including fake news characterizations on psychology and social theories, existing algorithms from a data mining perspective, evaluation metrics and representative datasets. We also discuss related research areas, open problems, and future research directions for fake news detection on social media.
Citations
Proceedings Article
19 Jul 2018
TL;DR: An end-to-end framework named Event Adversarial Neural Network (EANN), which can derive event-invariant features and thus benefit the detection of fake news on newly arrived events, is proposed.
Abstract: As news reading on social media becomes more and more popular, fake news becomes a major issue concerning the public and government. The fake news can take advantage of multimedia content to mislead readers and get dissemination, which can cause negative effects or even manipulate the public events. One of the unique challenges for fake news detection on social media is how to identify fake news on newly emerged events. Unfortunately, most of the existing approaches can hardly handle this challenge, since they tend to learn event-specific features that can not be transferred to unseen events. In order to address this issue, we propose an end-to-end framework named Event Adversarial Neural Network (EANN), which can derive event-invariant features and thus benefit the detection of fake news on newly arrived events. It consists of three main components: the multi-modal feature extractor, the fake news detector, and the event discriminator. The multi-modal feature extractor is responsible for extracting the textual and visual features from posts. It cooperates with the fake news detector to learn the discriminable representation for the detection of fake news. The role of event discriminator is to remove the event-specific features and keep shared features among events. Extensive experiments are conducted on multimedia datasets collected from Weibo and Twitter. The experimental results show our proposed EANN model can outperform the state-of-the-art methods, and learn transferable feature representations.

627 citations

Journal Article
TL;DR: A fake news data repository, FakeNewsNet, is presented; it contains two comprehensive data sets with diverse features in news content, social context, and spatiotemporal information, and its benefits for potential applications in fake news research on social media are discussed.
Abstract: Social media has become a popular means for people to consume and share the news. At the same time, however, it has also enabled the wide dissemination of fake news, that is, news with intentionally false information, causing significant negative effects on society. To mitigate this problem, the research of fake news detection has recently received a lot of attention. Despite several existing computational solutions on the detection of fake news, the lack of comprehensive and community-driven fake news data sets has become one of major roadblocks. Not only existing data sets are scarce, they do not contain a myriad of features often required in the study such as news content, social context, and spatiotemporal information. Therefore, in this article, to facilitate fake news-related research, we present a fake news data repository FakeNewsNet, which contains two comprehensive data sets with diverse features in news content, social context, and spatiotemporal information. We present a comprehensive description of the FakeNewsNet, demonstrate an exploratory analysis of two data sets from different perspectives, and discuss the benefits of the FakeNewsNet for potential applications on fake news study on social media.

577 citations

Journal Article
TL;DR: This survey provides a thorough review of techniques for manipulating face images including DeepFake methods, and methods to detect such manipulations, with special attention to the latest generation of DeepFakes.

502 citations

Journal Article
TL;DR: To address the spread of misinformation, the frontline healthcare providers should be equipped with the most recent research findings and accurate information, and advanced technologies like natural language processing or data mining approaches should be applied in the detection and removal of online content with no scientific basis from all social media platforms.
Abstract: The coronavirus disease 2019 (COVID-19) pandemic has not only caused significant challenges for health systems all over the globe but also fueled the surge of numerous rumors, hoaxes, and misinformation, regarding the etiology, outcomes, prevention, and cure of the disease. Such spread of misinformation is masking healthy behaviors and promoting erroneous practices that increase the spread of the virus and ultimately result in poor physical and mental health outcomes among individuals. Myriad incidents of mishaps caused by these rumors have been reported globally. To address this issue, the frontline healthcare providers should be equipped with the most recent research findings and accurate information. The mass media, healthcare organization, community-based organizations, and other important stakeholders should build strategic partnerships and launch common platforms for disseminating authentic public health messages. Also, advanced technologies like natural language processing or data mining approaches should be applied in the detection and removal of online content with no scientific basis from all social media platforms. Furthermore, these practices should be controlled with regulatory and law enforcement measures alongside ensuring telemedicine-based services providing accurate information on COVID-19.

474 citations

Journal Article
TL;DR: This survey presents a comprehensive overview of the findings to date relating to fake news, characterizing the negative impact of online fake news and the state of the art in detection methods.
Abstract: Over the recent years, the growth of online social media has greatly facilitated the way people communicate with each other. Users of online social media share information, connect with other people and stay informed about trending events. However, much recent information appearing on social media is dubious and, in some cases, intended to mislead. Such content is often called fake news. Large amounts of online fake news has the potential to cause serious problems in society. Many point to the 2016 U.S. presidential election campaign as having been influenced by fake news. Subsequent to this election, the term has entered the mainstream vernacular. Moreover it has drawn the attention of industry and academia, seeking to understand its origins, distribution and effects. Of critical interest is the ability to detect when online content is untrue and intended to mislead. This is technically challenging for several reasons. Using social media tools, content is easily generated and quickly spread, leading to a large volume of content to analyse. Online information is very diverse, covering a large number of subjects, which contributes complexity to this task. The truth and intent of any statement often cannot be assessed by computers alone, so efforts must depend on collaboration between humans and technology. For instance, some content that is deemed by experts of being false and intended to mislead are available. While these sources are in limited supply, they can form a basis for such a shared effort. In this survey, we present a comprehensive overview of the finding to date relating to fake news. We characterize the negative impact of online fake news, and the state-of-the-art in detection methods. Many of these rely on identifying features of the users, content, and context that indicate misinformation. We also study existing datasets that have been used for classifying fake news. Finally, we propose promising research directions for online fake news analysis.

449 citations

References
Proceedings Article
01 Jul 2018
TL;DR: The authors report on a comparative style analysis of hyperpartisan (extremely one-sided) news and fake news, showing that 97% of the 299 fake news articles identified are also hyperpartisan.
Abstract: We report on a comparative style analysis of hyperpartisan (extremely one-sided) news and fake news. A corpus of 1,627 articles from 9 political publishers, three each from the mainstream, the hyperpartisan left, and the hyperpartisan right, have been fact-checked by professional journalists at BuzzFeed: 97% of the 299 fake news articles identified are also hyperpartisan. We show how a style analysis can distinguish hyperpartisan news from the mainstream (F1 = 0.78), and satire from both (F1 = 0.81). But stylometry is no silver bullet as style-based fake news detection does not work (F1 = 0.46). We further reveal that left-wing and right-wing news share significantly more stylistic similarities than either does with the mainstream. This result is robust: it has been confirmed by three different modeling approaches, one of which employs Unmasking in a novel way. Applications of our results include partisanship detection and pre-screening for semi-automatic fake news detection.

341 citations

Journal Article
TL;DR: This survey focuses on providing a comprehensive overview of truth discovery methods, and summarizing them from different aspects, and offers some guidelines on how to apply these approaches in application domains.
Abstract: Thanks to information explosion, data for the objects of interest can be collected from increasingly more sources. However, for the same object, there usually exist conflicts among the collected multi-source information. To tackle this challenge, truth discovery, which integrates multi-source noisy information by estimating the reliability of each source, has emerged as a hot topic. Several truth discovery methods have been proposed for various scenarios, and they have been successfully applied in diverse application domains. In this survey, we focus on providing a comprehensive overview of truth discovery methods, and summarizing them from different aspects. We also discuss some future directions of truth discovery research. We hope that this survey will promote a better understanding of the current progress on truth discovery, and offer some guidelines on how to apply these approaches in application domains.

331 citations
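
The truth discovery idea summarized in the entry above (integrating conflicting multi-source claims while estimating the reliability of each source) can be illustrated with a minimal Python sketch. It is not taken from the cited survey: the function name `discover_truths`, the smoothed-agreement reliability score, the fixed iteration count, and the toy claims are all assumptions chosen for illustration.

```python
from collections import defaultdict


def discover_truths(claims, iterations=10):
    """Illustrative truth discovery loop (hypothetical helper, not from the survey).

    claims maps each source name to a dict of {object: claimed value}.
    Returns the estimated truth per object and a reliability weight per source.
    """
    # Start by trusting every source equally.
    weights = {source: 1.0 for source in claims}
    truths = {}
    for _ in range(iterations):
        # Step 1: weighted vote per object, using the current source weights.
        votes = defaultdict(lambda: defaultdict(float))
        for source, observed in claims.items():
            for obj, value in observed.items():
                votes[obj][value] += weights[source]
        # Ties are broken deterministically by picking the smallest value.
        truths = {
            obj: max(sorted(candidates), key=candidates.get)
            for obj, candidates in votes.items()
        }
        # Step 2: re-estimate each source's reliability as its smoothed
        # agreement rate with the current truth estimates (an assumed scoring
        # rule; published methods use a variety of reliability models).
        for source, observed in claims.items():
            agree = sum(1 for obj, value in observed.items() if truths[obj] == value)
            weights[source] = (agree + 1) / (len(observed) + 2)
    return truths, weights


# Toy data: three sources report values for three objects and partly disagree.
example_claims = {
    "source_a": {"x": "1", "y": "2", "z": "3"},
    "source_b": {"x": "1", "y": "2", "z": "4"},
    "source_c": {"x": "9", "y": "2", "z": "3"},
}
example_truths, example_weights = discover_truths(example_claims)
```

In this toy run, the source that agrees with the consensus on every object ends up with the highest weight, so its claims count for more in later voting rounds.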

Journal Article
TL;DR: The findings show that communities’ emotional behavior is affected by the users’ involvement inside the echo chamber, and that, on average, more active users show a faster shift towards the negativity than less active ones.
Abstract: Recent findings showed that users on Facebook tend to select information that adhere to their system of beliefs and to form polarized groups - i.e., echo chambers. Such a tendency dominates information cascades and might affect public debates on social relevant issues. In this work we explore the structural evolution of communities of interest by accounting for users emotions and engagement. Focusing on the Facebook pages reporting on scientific and conspiracy content, we characterize the evolution of the size of the two communities by fitting daily resolution data with three growth models - i.e. the Gompertz model, the Logistic model, and the Log-logistic model. Although all the models appropriately describe the data structure, the Logistic one shows the best fit. Then, we explore the interplay between emotional state and engagement of users in the group dynamics. Our findings show that communities' emotional behavior is affected by the users' involvement inside the echo chamber. Indeed, to an higher involvement corresponds a more negative approach. Moreover, we observe that, on average, more active users show a faster shift towards the negativity than less active ones.

326 citations

Journal Article
TL;DR: This paper explores the key role of image content in automatic news verification on microblogs and proposes several visual and statistical features that characterize the differing image distribution patterns of fake and real news events for detecting fake news.
Abstract: Microblog has been a popular media platform for reporting and propagating news. However, fake news spreading on microblogs would severely jeopardize its public credibility. To identify the truthfulness of news on microblogs, images are very crucial content. In this paper, we explore the key role of image content in the task of automatic news verification on microblogs. Existing approaches to news verification depend on features extracted mainly from the text content of news tweets, while image features for news verification are often ignored. According to our study, however, images are very popular and have a great influence on microblogs news propagation. In addition, fake and real news events have different image distribution patterns. Therefore, we propose several visual and statistical features to characterize these patterns visually and statistically for detecting fake news. Experiments on a real-world multimedia dataset collected from Sina Weibo validate the effectiveness of our proposed image features. The news verification performance of our method outperforms baseline methods. To the best of our knowledge, this is the first attempt that systematically explores image features on news verification task.

323 citations

Proceedings Article
12 Feb 2016
TL;DR: This paper discovers conflicting viewpoints in news tweets with a topic model method, and builds a credibility propagation network of tweets linked with supporting or opposing relations that generates the final evaluation result for news.
Abstract: Fake news spreading in social media severely jeopardizes the veracity of online content. Fortunately, with the interactive and open features of microblogs, skeptical and opposing voices against fake news always arise along with it. The conflicting information, ignored by existing studies, is crucial for news verification. In this paper, we take advantage of this "wisdom of crowds" information to improve news verification by mining conflicting viewpoints in microblogs. First, we discover conflicting viewpoints in news tweets with a topic model method. Based on identified tweets' viewpoints, we then build a credibility propagation network of tweets linked with supporting or opposing relations. Finally, with iterative deduction, the credibility propagation on the network generates the final evaluation result for news. Experiments conducted on a real-world data set show that the news verification performance of our approach significantly outperforms those of the baseline approaches.

318 citations
