Journal Article

Fake News Detection on Social Media: A Data Mining Perspective

01 Sep 2017-Sigkdd Explorations (ACM)-Vol. 19, Iss: 1, pp 22-36
TL;DR: This survey presents a comprehensive review of fake news detection on social media, covering fake news characterizations from psychology and social theories, existing algorithms from a data mining perspective, evaluation metrics, and representative datasets.
Abstract: Social media for news consumption is a double-edged sword. On the one hand, its low cost, easy access, and rapid dissemination of information lead people to seek out and consume news from social media. On the other hand, it enables the wide spread of "fake news", i.e., low quality news with intentionally false information. The extensive spread of fake news has the potential for extremely negative impacts on individuals and society. Therefore, fake news detection on social media has recently become an emerging research area that is attracting tremendous attention. Fake news detection on social media presents unique characteristics and challenges that make existing detection algorithms from traditional news media ineffective or not applicable. First, fake news is intentionally written to mislead readers to believe false information, which makes it difficult and nontrivial to detect based on news content; therefore, we need to include auxiliary information, such as user social engagements on social media, to help make a determination. Second, exploiting this auxiliary information is challenging in and of itself as users' social engagements with fake news produce data that is big, incomplete, unstructured, and noisy. Because the issue of fake news detection on social media is both challenging and relevant, we conducted this survey to further facilitate research on the problem. In this survey, we present a comprehensive review of detecting fake news on social media, including fake news characterizations on psychology and social theories, existing algorithms from a data mining perspective, evaluation metrics and representative datasets. We also discuss related research areas, open problems, and future research directions for fake news detection on social media.
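The abstract mentions evaluation metrics without listing them here. As a minimal sketch, assuming detection is framed as binary classification with "fake" as the positive class (label 1), the usual accuracy, precision, recall, and F1 scores could be computed as follows. The function name and the toy labels are illustrative assumptions, not taken from the survey.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1, treating label 1 ("fake") as positive.

    Hypothetical helper for illustration; labels are assumed to be 0 or 1.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # fake, flagged as fake
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # real, flagged as fake
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # fake, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # real, passed as real

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy example: 4 fake and 4 real articles, with one miss and one false alarm.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
```

Precision and recall are reported separately because fake news datasets are often imbalanced, in which case accuracy alone can look high even when few fake items are caught.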
Citations
Proceedings Article
19 Jul 2018
TL;DR: An end-to-end framework named Event Adversarial Neural Network (EANN), which can derive event-invariant features and thus benefit the detection of fake news on newly arrived events, is proposed.
Abstract: As news reading on social media becomes more and more popular, fake news becomes a major issue concerning the public and government. Fake news can take advantage of multimedia content to mislead readers and gain dissemination, which can cause negative effects or even manipulate public events. One of the unique challenges for fake news detection on social media is how to identify fake news on newly emerged events. Unfortunately, most of the existing approaches can hardly handle this challenge, since they tend to learn event-specific features that cannot be transferred to unseen events. In order to address this issue, we propose an end-to-end framework named Event Adversarial Neural Network (EANN), which can derive event-invariant features and thus benefit the detection of fake news on newly arrived events. It consists of three main components: the multi-modal feature extractor, the fake news detector, and the event discriminator. The multi-modal feature extractor is responsible for extracting the textual and visual features from posts. It cooperates with the fake news detector to learn the discriminable representation for the detection of fake news. The role of the event discriminator is to remove the event-specific features and keep shared features among events. Extensive experiments are conducted on multimedia datasets collected from Weibo and Twitter. The experimental results show our proposed EANN model can outperform the state-of-the-art methods, and learn transferable feature representations.

627 citations

Journal Article
TL;DR: A fake news data repository, FakeNewsNet, is presented; it contains two comprehensive data sets with diverse features in news content, social context, and spatiotemporal information, and its potential applications to the study of fake news on social media are discussed.
Abstract: Social media has become a popular means for people to consume and share the news. At the same time, however, it has also enabled the wide dissemination of fake news, that is, news with intentionally false information, causing significant negative effects on society. To mitigate this problem, the research of fake news detection has recently received a lot of attention. Despite several existing computational solutions on the detection of fake news, the lack of comprehensive and community-driven fake news data sets has become one of the major roadblocks. Not only are existing data sets scarce, they also do not contain a myriad of features often required in the study such as news content, social context, and spatiotemporal information. Therefore, in this article, to facilitate fake news-related research, we present a fake news data repository FakeNewsNet, which contains two comprehensive data sets with diverse features in news content, social context, and spatiotemporal information. We present a comprehensive description of the FakeNewsNet, demonstrate an exploratory analysis of two data sets from different perspectives, and discuss the benefits of the FakeNewsNet for potential applications on fake news study on social media.

577 citations

Journal Article
TL;DR: This survey provides a thorough review of techniques for manipulating face images including DeepFake methods, and methods to detect such manipulations, with special attention to the latest generation of DeepFakes.

502 citations

Journal Article
TL;DR: To address the spread of misinformation, the frontline healthcare providers should be equipped with the most recent research findings and accurate information, and advanced technologies like natural language processing or data mining approaches should be applied in the detection and removal of online content with no scientific basis from all social media platforms.
Abstract: The coronavirus disease 2019 (COVID-19) pandemic has not only caused significant challenges for health systems all over the globe but also fueled the surge of numerous rumors, hoaxes, and misinformation regarding the etiology, outcomes, prevention, and cure of the disease. Such spread of misinformation is masking healthy behaviors and promoting erroneous practices that increase the spread of the virus and ultimately result in poor physical and mental health outcomes among individuals. Myriad incidents of mishaps caused by these rumors have been reported globally. To address this issue, the frontline healthcare providers should be equipped with the most recent research findings and accurate information. The mass media, healthcare organizations, community-based organizations, and other important stakeholders should build strategic partnerships and launch common platforms for disseminating authentic public health messages. Also, advanced technologies like natural language processing or data mining approaches should be applied in the detection and removal of online content with no scientific basis from all social media platforms. Furthermore, these practices should be controlled with regulatory and law enforcement measures alongside ensuring telemedicine-based services providing accurate information on COVID-19.

474 citations

Journal Article
TL;DR: A comprehensive overview of the findings to date relating to fake news is presented, and the negative impact of online fake news and the state of the art in detection methods are characterized.
Abstract: Over the recent years, the growth of online social media has greatly facilitated the way people communicate with each other. Users of online social media share information, connect with other people and stay informed about trending events. However, much recent information appearing on social media is dubious and, in some cases, intended to mislead. Such content is often called fake news. Large amounts of online fake news have the potential to cause serious problems in society. Many point to the 2016 U.S. presidential election campaign as having been influenced by fake news. Subsequent to this election, the term has entered the mainstream vernacular. Moreover, it has drawn the attention of industry and academia, seeking to understand its origins, distribution and effects. Of critical interest is the ability to detect when online content is untrue and intended to mislead. This is technically challenging for several reasons. Using social media tools, content is easily generated and quickly spread, leading to a large volume of content to analyse. Online information is very diverse, covering a large number of subjects, which contributes complexity to this task. The truth and intent of any statement often cannot be assessed by computers alone, so efforts must depend on collaboration between humans and technology. For instance, some content that has been deemed by experts to be false and intended to mislead is available. While these sources are in limited supply, they can form a basis for such a shared effort. In this survey, we present a comprehensive overview of the findings to date relating to fake news. We characterize the negative impact of online fake news, and the state-of-the-art in detection methods. Many of these rely on identifying features of the users, content, and context that indicate misinformation. We also study existing datasets that have been used for classifying fake news. Finally, we propose promising research directions for online fake news analysis.

449 citations

References
Journal Article
TL;DR: The authors proposed a simple stance detection system that outperforms submissions from all 19 teams that participated in the SemEval-2016 shared task and showed that although knowing the sentiment expressed by a tweet is beneficial for stance classification, it alone is not sufficient.
Abstract: We can often detect from a person’s utterances whether he or she is in favor of or against a given target entity—one’s stance toward the target. However, a person may express the same stance toward a target by using negative or positive language. Here for the first time we present a dataset of tweet–target pairs annotated for both stance and sentiment. The targets may or may not be referred to in the tweets, and they may or may not be the target of opinion in the tweets. Partitions of this dataset were used as training and test sets in a SemEval-2016 shared task competition. We propose a simple stance detection system that outperforms submissions from all 19 teams that participated in the shared task. Additionally, access to both stance and sentiment annotations allows us to explore several research questions. We show that although knowing the sentiment expressed by a tweet is beneficial for stance classification, it alone is not sufficient. Finally, we use additional unlabeled data through distant supervision techniques and word embeddings to further improve stance classification.

289 citations

Proceedings Article
11 Apr 2016
TL;DR: Hoaxy is introduced, a platform for the collection, detection, and analysis of online misinformation and its related fact-checking efforts, and a preliminary analysis of a sample of public tweets containing both fake news and fact checking is presented.
Abstract: Massive amounts of misinformation have been observed to spread in uncontrolled fashion across social media. Examples include rumors, hoaxes, fake news, and conspiracy theories. At the same time, several journalistic organizations devote significant efforts to high-quality fact checking of online claims. The resulting information cascades contain instances of both accurate and inaccurate information, unfold over multiple time scales, and often reach audiences of considerable size. All these factors pose challenges for the study of the social dynamics of online news sharing. Here we introduce Hoaxy, a platform for the collection, detection, and analysis of online misinformation and its related fact-checking efforts. We discuss the design of the platform and present a preliminary analysis of a sample of public tweets containing both fake news and fact checking. We find that, in the aggregate, the sharing of fact-checking content typically lags that of misinformation by 10-20 hours. Moreover, fake news are dominated by very active users, while fact checking is a more grass-roots activity. With the increasing risks connected to massive online misinformation, social news observatories have the potential to help researchers, journalists, and the general public understand the dynamics of real and fake news sharing.

280 citations

Proceedings Article
20 May 2012
TL;DR: It is shown that using a large feature set, it is possible to distinguish regular documents from deceptive documents with 96.6% accuracy (F-measure) and an analysis of linguistic features that can be modified to hide writing style is presented.
Abstract: In digital forensics, questions often arise about the authors of documents: their identity, demographic background, and whether they can be linked to other documents. The field of stylometry uses linguistic features and machine learning techniques to answer these questions. While stylometry techniques can identify authors with high accuracy in non-adversarial scenarios, their accuracy is reduced to random guessing when faced with authors who intentionally obfuscate their writing style or attempt to imitate that of another author. While these results are good for privacy, they raise concerns about fraud. We argue that some linguistic features change when people hide their writing style and by identifying those features, stylistic deception can be recognized. The major contribution of this work is a method for detecting stylistic deception in written documents. We show that using a large feature set, it is possible to distinguish regular documents from deceptive documents with 96.6% accuracy (F-measure). We also present an analysis of linguistic features that can be modified to hide writing style.

276 citations

Proceedings Article
01 Jan 2012
TL;DR: A credibility analysis approach enhanced with event graph-based optimization is proposed for automatically assessing the credibility of popular Twitter events; its methods are shown to be significantly more accurate than a decision tree classifier approach.
Abstract: Though Twitter acts as a realtime news source with people acting as sensors and sending event updates from all over the world, rumors spread via Twitter have been noted to cause considerable damage. Given a set of popular Twitter events along with related users and tweets, we study the problem of automatically assessing the credibility of such events. We propose a credibility analysis approach enhanced with event graph-based optimization to solve the problem. First we experiment by performing PageRank-like credibility propagation on a multi-typed network consisting of events, tweets, and users. Further, within each iteration, we enhance the basic trust analysis by updating event credibility scores using regularization on a new graph of events. Our experiments using events extracted from two tweet feed datasets, each with millions of tweets, show that our event graph optimization approach outperforms the basic credibility analysis approach. Also, our methods are significantly more accurate (∼86%) than the decision tree classifier approach (∼72%).
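To illustrate what PageRank-like credibility propagation over events, tweets, and users can look like, here is a deliberately simplified sketch. It is not the authors' update rule and omits their event-graph regularization step: the averaging scheme, the damping value, and all names (`propagate_credibility`, `tweet_user`, `tweet_event`, `prior`) are assumptions made for illustration.

```python
def _mean_by_group(scores, group_of):
    """Average the scores of all items that map to the same group."""
    totals, counts = {}, {}
    for item, group in group_of.items():
        totals[group] = totals.get(group, 0.0) + scores[item]
        counts[group] = counts.get(group, 0) + 1
    return {g: totals[g] / counts[g] for g in totals}


def propagate_credibility(tweet_user, tweet_event, prior,
                          iterations=20, damping=0.85):
    """Iteratively share credibility between tweets, their users and their events.

    tweet_user / tweet_event map each tweet id to its author / event.
    prior maps each tweet id to an initial credibility in [0, 1]
    (for example, the output of a content classifier; assumed here).
    """
    tweet_score = dict(prior)
    for _ in range(iterations):
        # Users and events inherit the mean credibility of their tweets.
        user_score = _mean_by_group(tweet_score, tweet_user)
        event_score = _mean_by_group(tweet_score, tweet_event)
        # Each tweet is pulled toward its user and event scores, while the
        # (1 - damping) term keeps it anchored to its own prior.
        tweet_score = {
            t: damping * 0.5 * (user_score[tweet_user[t]]
                                + event_score[tweet_event[t]])
               + (1 - damping) * prior[t]
            for t in prior
        }
    # Recompute group scores so they match the final tweet scores.
    return (tweet_score,
            _mean_by_group(tweet_score, tweet_user),
            _mean_by_group(tweet_score, tweet_event))


# Toy network: two users, two events, four tweets.
tweet_user = {"t1": "u1", "t2": "u1", "t3": "u2", "t4": "u2"}
tweet_event = {"t1": "e1", "t2": "e1", "t3": "e2", "t4": "e2"}
prior = {"t1": 0.9, "t2": 0.8, "t3": 0.2, "t4": 0.3}
tweets, users, events = propagate_credibility(tweet_user, tweet_event, prior)
```

Because every update is a weighted average of values in [0, 1], the scores stay in that range; in the toy network, event `e1` ends up with a higher credibility score than `e2`.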

273 citations

Journal Article
TL;DR: This paper studies the use of forward-referring headlines in online news journalism through an analysis of 100,000 headlines from 10 different Danish news websites; the results show that commercialization and tabloidization seem to lead to a recurrent use of forward-reference in Danish online news headlines.

257 citations
