Journal ArticleDOI

Fake News Detection on Social Media: A Data Mining Perspective

01 Sep 2017-Sigkdd Explorations (ACM)-Vol. 19, Iss: 1, pp 22-36
TL;DR: Shu et al. present a comprehensive review of detecting fake news on social media, including fake news characterizations based on psychology and social theories, existing algorithms from a data mining perspective, and evaluation metrics and representative datasets.
Abstract: Social media for news consumption is a double-edged sword. On the one hand, its low cost, easy access, and rapid dissemination of information lead people to seek out and consume news from social media. On the other hand, it enables the wide spread of "fake news", i.e., low-quality news with intentionally false information. The extensive spread of fake news has the potential for extremely negative impacts on individuals and society. Therefore, fake news detection on social media has recently become an emerging research topic that is attracting tremendous attention. Fake news detection on social media presents unique characteristics and challenges that make existing detection algorithms from traditional news media ineffective or not applicable. First, fake news is intentionally written to mislead readers into believing false information, which makes it difficult and nontrivial to detect based on news content alone; therefore, we need to include auxiliary information, such as user social engagements on social media, to help make a determination. Second, exploiting this auxiliary information is challenging in and of itself, as users' social engagements with fake news produce data that is big, incomplete, unstructured, and noisy. Because the issue of fake news detection on social media is both challenging and relevant, we conducted this survey to further facilitate research on the problem. In this survey, we present a comprehensive review of detecting fake news on social media, including fake news characterizations based on psychology and social theories, existing algorithms from a data mining perspective, and evaluation metrics and representative datasets. We also discuss related research areas, open problems, and future research directions for fake news detection on social media.
Citations
Proceedings ArticleDOI
19 Jul 2018
TL;DR: An end-to-end framework named Event Adversarial Neural Network (EANN), which can derive event-invariant features and thus benefit the detection of fake news on newly arrived events, is proposed.
Abstract: As news reading on social media becomes more and more popular, fake news becomes a major issue concerning the public and government. Fake news can take advantage of multimedia content to mislead readers and gain dissemination, which can cause negative effects or even manipulate public events. One of the unique challenges for fake news detection on social media is how to identify fake news on newly emerged events. Unfortunately, most of the existing approaches can hardly handle this challenge, since they tend to learn event-specific features that cannot be transferred to unseen events. In order to address this issue, we propose an end-to-end framework named Event Adversarial Neural Network (EANN), which can derive event-invariant features and thus benefit the detection of fake news on newly arrived events. It consists of three main components: the multi-modal feature extractor, the fake news detector, and the event discriminator. The multi-modal feature extractor is responsible for extracting the textual and visual features from posts. It cooperates with the fake news detector to learn the discriminable representation for the detection of fake news. The role of the event discriminator is to remove event-specific features and keep the features shared among events. Extensive experiments are conducted on multimedia datasets collected from Weibo and Twitter. The experimental results show that our proposed EANN model can outperform the state-of-the-art methods and learn transferable feature representations.
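The min-max idea behind EANN can be sketched as a single scalar objective: the feature extractor minimises the detection loss while maximising the event discriminator's loss, which rewards event-invariant features. A minimal pure-Python sketch, with illustrative function names and a plain scalar combination — the paper realises the adversarial part inside a neural network with a gradient-reversal layer:

```python
import math

def cross_entropy(probs, labels):
    """Mean negative log-likelihood of the true class of each example.
    `probs` is a list of per-class probability lists; `labels` holds the
    true class index of each example."""
    return -sum(math.log(p[y] + 1e-12) for p, y in zip(probs, labels)) / len(labels)

def eann_objective(detect_probs, detect_labels, event_probs, event_labels, lam=1.0):
    """EANN-style min-max objective (sketch): detection loss minus the
    event discriminator's loss, weighted by the trade-off `lam`. Lowering
    this value means detecting fake news well while *confusing* the event
    discriminator, i.e. learning event-invariant features."""
    return (cross_entropy(detect_probs, detect_labels)
            - lam * cross_entropy(event_probs, event_labels))
```

An event discriminator at chance over K events contributes roughly -lam * ln(K), so the extractor is explicitly rewarded for making events indistinguishable.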

627 citations

Journal ArticleDOI
TL;DR: A fake news data repository, FakeNewsNet, is presented, containing two comprehensive data sets with diverse features in news content, social context, and spatiotemporal information; its potential applications to fake news studies on social media are also discussed.
Abstract: Social media has become a popular means for people to consume and share the news. At the same time, however, it has also enabled the wide dissemination of fake news, that is, news with intentionally false information, causing significant negative effects on society. To mitigate this problem, the research of fake news detection has recently received a lot of attention. Despite several existing computational solutions for the detection of fake news, the lack of comprehensive and community-driven fake news data sets has become one of the major roadblocks. Not only are existing data sets scarce, but they also do not contain the myriad of features often required in the study, such as news content, social context, and spatiotemporal information. Therefore, in this article, to facilitate fake news-related research, we present a fake news data repository, FakeNewsNet, which contains two comprehensive data sets with diverse features in news content, social context, and spatiotemporal information. We present a comprehensive description of FakeNewsNet, demonstrate an exploratory analysis of the two data sets from different perspectives, and discuss the benefits of FakeNewsNet for potential applications in fake news studies on social media.
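The three feature groups the abstract names — news content, social context, and spatiotemporal information — can be pictured as one record per news piece. The schema below is a hypothetical sketch; the actual FakeNewsNet file layout and field names differ:

```python
from dataclasses import dataclass, field

# Hypothetical record layout mirroring FakeNewsNet's three feature
# groups; the repository's real files and field names are different.
@dataclass
class NewsRecord:
    # News content
    headline: str
    body: str
    # Social context (engagement on Twitter)
    tweet_ids: list = field(default_factory=list)
    retweet_ids: list = field(default_factory=list)
    # Spatiotemporal information
    timestamps: list = field(default_factory=list)
    user_locations: list = field(default_factory=list)
    # Ground-truth veracity from fact-checking sources
    label: str = "unverified"
```

Keeping the groups separate in the schema matches how detection models typically consume them: content features feed text/image encoders, while social and spatiotemporal features feed propagation-based models.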

577 citations

Journal ArticleDOI
TL;DR: This survey provides a thorough review of techniques for manipulating face images including DeepFake methods, and methods to detect such manipulations, with special attention to the latest generation of DeepFakes.

502 citations

Journal ArticleDOI
TL;DR: To address the spread of misinformation, frontline healthcare providers should be equipped with the most recent research findings and accurate information, and advanced technologies like natural language processing or data mining approaches should be applied in the detection and removal of online content with no scientific basis from all social media platforms.
Abstract: The coronavirus disease 2019 (COVID-19) pandemic has not only caused significant challenges for health systems all over the globe but also fueled the surge of numerous rumors, hoaxes, and misinformation regarding the etiology, outcomes, prevention, and cure of the disease. Such spread of misinformation is masking healthy behaviors and promoting erroneous practices that increase the spread of the virus and ultimately result in poor physical and mental health outcomes among individuals. Myriad incidents of mishaps caused by these rumors have been reported globally. To address this issue, the frontline healthcare providers should be equipped with the most recent research findings and accurate information. The mass media, healthcare organizations, community-based organizations, and other important stakeholders should build strategic partnerships and launch common platforms for disseminating authentic public health messages. Also, advanced technologies like natural language processing or data mining approaches should be applied in the detection and removal of online content with no scientific basis from all social media platforms. Furthermore, these practices should be controlled with regulatory and law enforcement measures alongside ensuring telemedicine-based services providing accurate information on COVID-19.

474 citations

Journal ArticleDOI
TL;DR: A comprehensive overview of the findings to date relating to fake news is presented, characterizing the negative impact of online fake news and the state of the art in detection methods.
Abstract: Over recent years, the growth of online social media has greatly facilitated the way people communicate with each other. Users of online social media share information, connect with other people and stay informed about trending events. However, much recent information appearing on social media is dubious and, in some cases, intended to mislead. Such content is often called fake news. Large amounts of online fake news have the potential to cause serious problems in society. Many point to the 2016 U.S. presidential election campaign as having been influenced by fake news. Subsequent to this election, the term has entered the mainstream vernacular. Moreover, it has drawn the attention of industry and academia, seeking to understand its origins, distribution and effects. Of critical interest is the ability to detect when online content is untrue and intended to mislead. This is technically challenging for several reasons. Using social media tools, content is easily generated and quickly spread, leading to a large volume of content to analyse. Online information is very diverse, covering a large number of subjects, which contributes complexity to this task. The truth and intent of any statement often cannot be assessed by computers alone, so efforts must depend on collaboration between humans and technology. For instance, collections of content deemed by experts to be false and intended to mislead are available. While these sources are in limited supply, they can form a basis for such a shared effort. In this survey, we present a comprehensive overview of the findings to date relating to fake news. We characterize the negative impact of online fake news and the state of the art in detection methods. Many of these rely on identifying features of the users, content, and context that indicate misinformation. We also study existing datasets that have been used for classifying fake news.
Finally, we propose promising research directions for online fake news analysis.

449 citations

References
Proceedings Article
21 Apr 2015
TL;DR: CREDBANK is a corpus of tweets, topics, events and associated human credibility judgements, built by systematically combining machine and human computation, designed to enable new research on online information credibility in fields such as social science, data mining and health.
Abstract: Social media has quickly risen to prominence as a news source, yet lingering doubts remain about its ability to spread rumor and misinformation. Systematically studying this phenomenon, however, has been difficult due to the need to collect large-scale, unbiased data along with in-situ judgements of its accuracy. In this paper we present CREDBANK, a corpus designed to bridge this gap by systematically combining machine and human computation. Specifically, CREDBANK is a corpus of tweets, topics, events and associated human credibility judgements. It is based on the real-time tracking of more than 1 billion streaming tweets over a period of more than three months, computational summarizations of those tweets, and intelligent routings of the tweet streams to human annotators — within a few hours of those events unfolding on Twitter. In total CREDBANK comprises more than 60 million tweets grouped into 1049 real-world events, each annotated by 30 human annotators. As an example, with CREDBANK one can quickly calculate that roughly 24% of the events in the global tweet stream are not perceived as credible. We have made CREDBANK publicly available, and hope it will enable new research questions related to online information credibility in fields such as social science, data mining and health.
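The headline statistic — roughly 24% of events not perceived as credible — comes from aggregating the 30 per-event annotator judgements. A small sketch of that aggregation, where the agreement threshold is an illustrative choice and not CREDBANK's exact rule:

```python
def not_credible_share(event_ratings, agreement=0.7):
    """Share of events NOT perceived as credible. Each element of
    `event_ratings` is one event's annotator votes (1 = credible,
    0 = not credible). An event counts as perceived credible when at
    least `agreement` of its annotators voted credible; the 0.7
    threshold is a hypothetical stand-in for the corpus's own rule."""
    def perceived_credible(votes):
        return sum(votes) / len(votes) >= agreement

    flagged = sum(1 for votes in event_ratings if not perceived_credible(votes))
    return flagged / len(event_ratings)
```

With 1049 events and 30 annotators each, this kind of majority-style aggregation is what lets a researcher recompute the "not perceived as credible" fraction under different strictness settings.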

227 citations

Proceedings ArticleDOI
18 Aug 2016
TL;DR: The authors propose clickbait detection and personalized blocking approaches, and build a browser extension which warns the readers of different media sites about the possibility of being baited by such headlines.
Abstract: Most of the online news media outlets rely heavily on the revenues generated from the clicks made by their readers, and due to the presence of numerous such outlets, they need to compete with each other for reader attention. To attract the readers to click on an article and subsequently visit the media site, the outlets often come up with catchy headlines accompanying the article links, which lure the readers to click on the link. Such headlines are known as clickbaits. While these baits may trick the readers into clicking, in the long run, clickbaits usually don't live up to the expectation of the readers, and leave them disappointed. In this work, we attempt to automatically detect clickbaits and then build a browser extension which warns the readers of different media sites about the possibility of being baited by such headlines. The extension also offers each reader an option to block clickbaits she doesn't want to see. Then, using such reader choices, the extension automatically blocks similar clickbaits during her future visits. We run extensive offline and online experiments across multiple media sites and find that the proposed clickbait detection and the personalized blocking approaches perform very well, achieving 93% accuracy in detecting and 89% accuracy in blocking clickbaits.
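The kind of surface cues a clickbait detector looks at can be illustrated with a few headline heuristics. This is only a toy sketch — the cue list and scoring rule below are invented for illustration; the paper's detector learns from a much richer feature set (sentence structure, word patterns, n-grams):

```python
import re

# Words that defer information to the article ("forward references") are
# a classic clickbait cue; this short list is illustrative, not the
# paper's feature set.
FORWARD_REFERENCES = {"this", "these", "here", "why", "what", "who"}

def headline_features(headline):
    """Extract a few simple surface cues from a headline."""
    words = re.findall(r"[A-Za-z']+", headline.lower())
    return {
        "num_words": len(words),
        "starts_with_number": headline[:1].isdigit(),  # "10 Things ..."
        "has_forward_reference": any(w in FORWARD_REFERENCES for w in words),
        "exclamation": "!" in headline,
    }

def looks_like_clickbait(headline):
    """Hypothetical rule: flag a headline when two or more cues fire.
    A real detector would train a classifier on such features instead."""
    f = headline_features(headline)
    score = (f["starts_with_number"] + f["has_forward_reference"]
             + f["exclamation"] + (f["num_words"] >= 10))
    return score >= 2
```

In the paper's setting, the same feature-then-classify pipeline runs inside a browser extension, which also personalizes blocking from each reader's choices.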

225 citations

Journal ArticleDOI
TL;DR: Face2Face as mentioned in this paper is an approach for real-time facial reenactment of a monocular target video sequence (e.g., Youtube video) by animating the facial expressions of the target video by a source actor and re-render the manipulated output video in a photo-realistic fashion.
Abstract: Face2Face is an approach for real-time facial reenactment of a monocular target video sequence (e.g., Youtube video). The source sequence is also a monocular video stream, captured live with a commodity webcam. Our goal is to animate the facial expressions of the target video by a source actor and re-render the manipulated output video in a photo-realistic fashion. To this end, we first address the under-constrained problem of facial identity recovery from monocular video by non-rigid model-based bundling. At run time, we track facial expressions of both source and target video using a dense photometric consistency measure. Reenactment is then achieved by fast and efficient deformation transfer between source and target. The mouth interior that best matches the re-targeted expression is retrieved from the target sequence and warped to produce an accurate fit. Finally, we convincingly re-render the synthesized target face on top of the corresponding video stream such that it seamlessly blends with the real-world illumination. We demonstrate our method in a live setup, where Youtube videos are reenacted in real time. This live setup has also been shown at SIGGRAPH Emerging Technologies 2016, by Thies et al. where it won the Best in Show Award.

220 citations

Journal ArticleDOI
TL;DR: It was demonstrated that perceived realism of fake news is stronger among individuals with high exposure to fake news and low exposure to hard news than among those with high exposure to both fake and hard news.
Abstract: This research assesses possible associations between viewing fake news (i.e., political satire) and attitudes of inefficacy, alienation, and cynicism toward political candidates. Using survey data collected during the 2006 Israeli election campaign, the study provides evidence for an indirect positive effect of fake news viewing in fostering the feelings of inefficacy, alienation, and cynicism, through the mediator variable of perceived realism of fake news. Within this process, hard news viewing serves as a moderator of the association between viewing fake news and their perceived realism. It was also demonstrated that perceived realism of fake news is stronger among individuals with high exposure to fake news and low exposure to hard news than among those with high exposure to both fake and hard news. Overall, this study contributes to the scientific knowledge regarding the influence of the interaction between various types of media use on political effects.

212 citations

Book ChapterDOI
12 Sep 2006
TL;DR: The authors propose a variable-length n-gram approach, inspired by previous work on selecting variable-length word sequences, for authorship identification, using a subset of the new Reuters corpus consisting of texts on the same topic by 50 different authors.
Abstract: Automatic authorship identification offers a valuable tool for supporting crime investigation and security. It can be seen as a multi-class, single-label text categorization task. Character n-grams are a very successful approach to represent text for stylistic purposes since they are able to capture nuances in lexical, syntactical, and structural level. So far, character n-grams of fixed length have been used for authorship identification. In this paper, we propose a variable-length n-gram approach inspired by previous work for selecting variable-length word sequences. Using a subset of the new Reuters corpus, consisting of texts on the same topic by 50 different authors, we show that the proposed approach is at least as effective as information gain for selecting the most significant n-grams although the feature sets produced by the two methods have few common members. Moreover, we explore the significance of digits for distinguishing between authors showing that an increase in performance can be achieved using simple text pre-processing.
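Extracting character n-grams of varying length is straightforward; the interesting part is selecting the significant ones. The sketch below builds the variable-length n-gram representation and ranks by corpus frequency as a simple stand-in — the paper's own extraction procedure and the information-gain baseline are not reproduced here:

```python
from collections import Counter

def char_ngrams(text, n_min=2, n_max=4):
    """Count all character n-grams of length n_min..n_max (inclusive).
    Character n-grams capture lexical, syntactic, and structural cues
    at once, which is why they work well for stylistic analysis."""
    grams = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            grams[text[i:i + n]] += 1
    return grams

def top_features(texts, k=100, n_min=2, n_max=4):
    """Rank variable-length n-grams across a corpus. Plain frequency is
    used here only to sketch the representation; the paper selects
    significant n-grams with a dedicated procedure instead."""
    total = Counter()
    for t in texts:
        total.update(char_ngrams(t, n_min, n_max))
    return [g for g, _ in total.most_common(k)]
```

Each document is then represented by its counts over the selected feature set and fed to a standard multi-class text classifier.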

210 citations
