Journal ArticleDOI

Fake News Detection on Social Media: A Data Mining Perspective

01 Sep 2017-Sigkdd Explorations (ACM)-Vol. 19, Iss: 1, pp 22-36
TL;DR: Shu et al. present a comprehensive review of detecting fake news on social media, including fake news characterizations based on psychology and social theories, existing algorithms from a data mining perspective, evaluation metrics, and representative datasets.
Abstract: Social media for news consumption is a double-edged sword. On the one hand, its low cost, easy access, and rapid dissemination of information lead people to seek out and consume news from social media. On the other hand, it enables the wide spread of "fake news", i.e., low-quality news with intentionally false information. The extensive spread of fake news has the potential for extremely negative impacts on individuals and society. Therefore, fake news detection on social media has recently become an emerging research topic that is attracting tremendous attention. Fake news detection on social media presents unique characteristics and challenges that make existing detection algorithms from traditional news media ineffective or not applicable. First, fake news is intentionally written to mislead readers into believing false information, which makes it difficult and nontrivial to detect based on news content alone; therefore, we need to include auxiliary information, such as user social engagements on social media, to help make a determination. Second, exploiting this auxiliary information is challenging in and of itself, as users' social engagements with fake news produce data that is big, incomplete, unstructured, and noisy. Because the issue of fake news detection on social media is both challenging and relevant, we conducted this survey to further facilitate research on the problem. In this survey, we present a comprehensive review of detecting fake news on social media, including fake news characterizations based on psychology and social theories, existing algorithms from a data mining perspective, evaluation metrics, and representative datasets. We also discuss related research areas, open problems, and future research directions for fake news detection on social media.
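The abstract's central claim is that news content alone is often insufficient and that auxiliary social-engagement signals should be combined with it. As a minimal illustration of that idea (not a method from the survey itself; the column names, toy data, and model choice are all assumptions), the sketch below joins TF-IDF text features with hypothetical engagement counts in a single scikit-learn pipeline:

```python
# Illustrative sketch only (not the survey's own method): a classifier that
# combines news-content features (TF-IDF of the article text) with auxiliary
# social-engagement features (hypothetical share and reply counts).
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Invented toy data; real fake-news datasets are far larger.
df = pd.DataFrame({
    "text": ["Scientists confirm chocolate cures all disease",
             "City council approves new budget for road repairs",
             "Celebrity secretly replaced by clone, insiders say",
             "Local school wins regional science fair"],
    "share_count": [5400, 120, 8900, 80],
    "reply_count": [2100, 15, 3300, 10],
    "label": [1, 0, 1, 0],   # 1 = fake, 0 = real
})

features = ColumnTransformer([
    ("content", TfidfVectorizer(ngram_range=(1, 2)), "text"),
    ("social", StandardScaler(), ["share_count", "reply_count"]),
])

model = Pipeline([("features", features),
                  ("clf", LogisticRegression(max_iter=1000))])
model.fit(df[["text", "share_count", "reply_count"]], df["label"])
print(model.predict(df[["text", "share_count", "reply_count"]]))
```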
Citations
Journal ArticleDOI
TL;DR: The authors conducted a systematic literature review of empirical research on the machine learning (ML) models for stance detection that were published from January 2015 to October 2022 and analyzed 96 primary studies, which spanned eight categories of ML techniques.
Abstract: Stance detection is an evolving opinion mining research area motivated by the vast increase in the variety and volume of user-generated content. In this regard, considerable research has recently been carried out in the area of stance detection. In this study, we review the different techniques proposed in the literature for stance detection as well as other applications such as rumor veracity detection. In particular, we conducted a systematic literature review of empirical research on machine learning (ML) models for stance detection published from January 2015 to October 2022. We analyzed 96 primary studies, which spanned eight categories of ML techniques. In this paper, we categorize the analyzed studies according to a taxonomy of six dimensions: approaches, target dependency, applications, modeling, language, and resources. We further classify and analyze the corresponding techniques from each dimension's perspective and highlight their strengths and weaknesses. The analysis reveals that deep learning models that adopt a mechanism of self-attention have been used more frequently than the other approaches. It is worth noting that emerging ML techniques such as few-shot learning and multitask learning have been used extensively for stance detection. A major conclusion of our analysis is that, although ML models have shown promise in this field, their application in the real world is still limited. Our analysis lists challenges and gaps to be addressed in future research. Furthermore, the taxonomy presented can assist researchers in developing and positioning new techniques for stance detection-related applications.
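The analysis finds that self-attention (transformer) models dominate recent stance detection work. As a rough sketch of how such a model can be prototyped (the model choice, label phrasing, and input format are assumptions, not the survey's experimental setup), one option is Hugging Face's zero-shot classification pipeline:

```python
# Illustrative sketch only: prototyping stance detection with a self-attention
# (transformer) model via the zero-shot classification pipeline.
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

claim = "Vitamin C cures the common cold."
post = "There is no clinical evidence that vitamin C prevents or cures colds."

result = classifier(
    f"Claim: {claim} Post: {post}",
    candidate_labels=["supports the claim", "denies the claim",
                      "questions the claim", "is unrelated to the claim"],
)
print(result["labels"][0])  # highest-scoring stance label
```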

3 citations

01 Jan 2019
TL;DR: This thesis investigates the technical elements and social context of Google's Reviewed Claims and finds that untrained crowd workers may not be ideal candidates for modeling complex values in sociotechnical systems.
Abstract: In the era of misinformation and machine learning, the fact-checking community is eager to develop automated fact-checking techniques that can detect misinformation and present fact-checks alongside problematic content. This thesis explores the technical elements and social context of one such "claim matching" system, Google's Reviewed Claims. The Reviewed Claims feature was one of the few user-facing interfaces in the complex socio-technical system between fact-checking organizations, news publishers, Google, and online information seekers. This thesis addresses the following research questions: RQ1: How accurate was Google's Reviewed Claims feature? RQ2: Is it possible to create a consensus definition for "relevant fact-checks" to enable the development of more successful automated fact-checking systems? RQ3: How do different actors in the fact-checking ecosystem define relevance? I investigate these research questions through a series of methods including qualitative coding, qualitative content analysis, quantitative data analysis, and user studies. To answer RQ1, I qualitatively label the relevance of 118 algorithmically assigned fact-checks and find that 21% of fact-checks are not relevant to their assigned article. To address RQ2, I find that three independent raters using a survey are only able to come to "fair-moderate agreement" about whether the algorithmically assigned fact-checks are relevant to the matched articles. A reconciliation process substantially raised their agreement. This indicates that further discussion may create a common understanding of relevance among information seekers. Using raters' open-ended justification responses, I generated six categories of justifications. To further evaluate whether information seekers shared a common definition of relevance, I asked Amazon Mechanical Turk workers to classify six different algorithmically assigned fact-checks and found that crowd workers were more likely to find the matched content relevant and were unable to agree on the justifications. With regard to RQ3, a sociotechnical analysis finds that the fact-checking ecosystem is fraught with distrust and conflicting incentives between individual actors (news publishers distrust fact-checking organizations and platforms, fact-checking organizations distrust platforms, etc.). Given the distrust among actors, future systems should be interpretable and transparent about their definition of "relevance" as well as the ways in which the claim matching is performed. Fact-checking depends on nuance and context, and AI is not yet technically sophisticated enough to account for these variables. As such, human-in-the-loop models seem essential to future automated fact-checking approaches. However, the results of this thesis indicate that untrained crowd workers may not be ideal candidates for modeling complex values in sociotechnical systems.
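The thesis reports only "fair-moderate agreement" among three independent raters. As a small illustration of how such chance-corrected agreement is typically computed (the ratings below are invented; the thesis's own coding scheme is richer), Fleiss' kappa can be calculated with statsmodels:

```python
# Illustrative sketch only: chance-corrected agreement for three raters
# labeling fact-check relevance. The ratings are invented.
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Rows = items (algorithmically matched fact-checks), columns = raters.
# 1 = "relevant", 0 = "not relevant".
ratings = np.array([
    [1, 1, 1],
    [1, 0, 1],
    [0, 0, 1],
    [1, 1, 0],
    [0, 0, 0],
    [1, 1, 1],
])

table, _ = aggregate_raters(ratings)   # items x categories count table
print(f"Fleiss' kappa: {fleiss_kappa(table):.2f}")
```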

3 citations

Journal ArticleDOI
01 Mar 2022
TL;DR: The authors found that for liberal users all misinformation labels reduced the perceived accuracy and believability of fake posts regardless of post ideology, whereas for conservative users the labels' efficacy depended on ideological consistency: algorithmic labels were more effective than community labels in reducing the perceived accuracy and believability of fake conservative posts.
Abstract: Hyper-partisan misinformation has become a major public concern. In order to examine what type of misinformation label can mitigate hyper-partisan misinformation sharing on social media, we conducted a 4 (label type: algorithm, community, third-party fact-checker, and no label) X 2 (post ideology: liberal vs. conservative) between-subjects online experiment (N = 1,677) in the context of COVID-19 health information. The results suggest that for liberal users, all labels reduced the perceived accuracy and believability of fake posts regardless of the posts' ideology. In contrast, for conservative users, the efficacy of the labels depended on whether the posts were ideologically consistent: algorithmic labels were more effective in reducing the perceived accuracy and believability of fake conservative posts compared to community labels, whereas all labels were effective in reducing their belief in liberal posts. Our results shed light on the differing effects of various misinformation labels dependent on people's political ideology.
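A 4 x 2 between-subjects design like this is conventionally analyzed with a two-way ANOVA on the outcome measures. The sketch below shows that kind of analysis on simulated data; the column names, cell sizes, and rating scale are assumptions, not the study's materials:

```python
# Illustrative sketch only: a two-way ANOVA for a 4 (label type) x 2
# (post ideology) between-subjects design, on simulated ratings.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(0)
n_per_cell = 30
labels = ["algorithm", "community", "fact_checker", "no_label"]
ideologies = ["liberal", "conservative"]

rows = []
for label in labels:
    for ideology in ideologies:
        # Simulated perceived-accuracy ratings on a 1-7 scale.
        scores = rng.normal(loc=4.0, scale=1.0, size=n_per_cell).clip(1, 7)
        rows += [{"label_type": label, "post_ideology": ideology, "accuracy": s}
                 for s in scores]
df = pd.DataFrame(rows)

model = ols("accuracy ~ C(label_type) * C(post_ideology)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))  # main effects and interaction
```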

3 citations

Journal ArticleDOI
TL;DR: A comprehensive overview of false news detection is provided in this article, in which the authors clarify the problem definition by explaining different types of false information (such as fake news, rumor, clickbait, satire, and hoax) with real-life examples.
Abstract: Fake news has become an industry of its own, in which users are paid to write fake news and create clickbait content to lure the audience. Clearly, the detection of fake news is a crucial problem, and several studies have proposed machine-learning-based techniques to combat it. Existing surveys review proposed solutions, while this survey presents several aspects that need to be considered before designing an effective solution. To this end, we provide a comprehensive overview of false news detection. The survey presents (1) clarity on the problem definition by explaining different types of false information (such as fake news, rumor, clickbait, satire, and hoax) with real-life examples, (2) a list of actors involved in spreading false information, (3) actions taken by service providers, (4) a list of publicly available datasets for fake news in three different formats, i.e., texts, images, and videos, (5) a novel three-phase detection model based on the time of detection, (6) four different taxonomies to classify research from novel viewpoints in order to provide a succinct roadmap for future research, and (7) key bibliometric indicators. In a nutshell, the survey focuses on three key aspects represented as the three T's: Typology of false information, Time of detection, and Taxonomies to classify research. Finally, by reviewing and summarizing several studies on fake news, we outline some potential research directions.

3 citations

Journal ArticleDOI
TL;DR: An external push strategy is suggested that reinforces the emergence dimensions relative to the topic's early stages and leads to a higher level in every dimension for confronting the problem of false and unverified information.
Abstract: The spread of false and unverified information has the potential to inflict damage by harming the reputation of individuals or organisations, shaking financial markets, and influencing crowd decisions in important events. This phenomenon needs to be properly curbed, otherwise it can contaminate other aspects of our social life. In this regard, academia as a key institution against false and unverified information is expected to play a pivotal role. Despite a great deal of research in this arena, the amount of progress by academia is not clear yet. This can lead to misjudgements about the performance of the topic of interest that can ultimately result in wrong science policies regarding academic efforts for quelling false and unverified information. In this research, we address this issue by assessing the readiness of academia in the topic of false and unverified information. To this end, we adopt the emergence framework and measure its dimensions (novelty, growth, coherence, and impact) over more than 21,000 articles published by academia about false and unverified information. Our results show the current body of research has had organic growth so far, which is not promising enough for confronting the problem of false and unverified information. To tackle this problem, we suggest an external push strategy that, compared to the early stages of the topic of interest, reinforces the emergence dimensions and leads to a higher level in every dimension.
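The growth dimension of the emergence framework can be illustrated, in a very reduced form, by tracking publication counts per year. The sketch below uses an invented year list and a simple compound annual growth rate; the paper's actual indicators for novelty, growth, coherence, and impact are more elaborate:

```python
# Illustrative sketch only: a crude growth indicator for a research topic,
# computed from publication years. The year list is invented.
from collections import Counter

pub_years = [2014, 2015, 2015, 2016, 2016, 2016, 2017, 2017, 2017, 2017,
             2018, 2018, 2018, 2018, 2018, 2019, 2019, 2019, 2019, 2019, 2019]

counts = Counter(pub_years)
first, last = min(counts), max(counts)
cagr = (counts[last] / counts[first]) ** (1 / (last - first)) - 1

print(dict(sorted(counts.items())))
print(f"Compound annual growth rate of publications: {cagr:.0%}")
```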

3 citations

References
Journal ArticleDOI
28 May 2015-Nature
TL;DR: Deep learning is making major advances in solving problems that have resisted the best attempts of the artificial intelligence community for many years, and will have many more successes in the near future because it requires very little engineering by hand and can easily take advantage of increases in the amount of available computation and data.
Abstract: Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction. These methods have dramatically improved the state-of-the-art in speech recognition, visual object recognition, object detection and many other domains such as drug discovery and genomics. Deep learning discovers intricate structure in large data sets by using the backpropagation algorithm to indicate how a machine should change its internal parameters that are used to compute the representation in each layer from the representation in the previous layer. Deep convolutional nets have brought about breakthroughs in processing images, video, speech and audio, whereas recurrent nets have shone light on sequential data such as text and speech.
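To make the abstract's description concrete, the sketch below trains a tiny multilayer network with backpropagation on random data; it is a generic illustration of "multiple processing layers" and gradient-based learning, not a reproduction of any model from the paper:

```python
# Illustrative sketch only: a small multilayer network trained with
# backpropagation on random data.
import torch
from torch import nn

model = nn.Sequential(                 # three stacked processing layers
    nn.Linear(20, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 2),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(256, 20)               # random inputs
y = torch.randint(0, 2, (256,))        # random binary labels

for step in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()                    # backpropagation computes the gradients
    optimizer.step()                   # gradient step updates the parameters

print(f"final training loss: {loss.item():.3f}")
```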

46,982 citations

Book ChapterDOI
TL;DR: In this paper, the authors present a critique of expected utility theory as a descriptive model of decision making under risk, and develop an alternative model, called prospect theory, in which value is assigned to gains and losses rather than to final assets and in which probabilities are replaced by decision weights.
Abstract: This paper presents a critique of expected utility theory as a descriptive model of decision making under risk, and develops an alternative model, called prospect theory. Choices among risky prospects exhibit several pervasive effects that are inconsistent with the basic tenets of utility theory. In particular, people underweight outcomes that are merely probable in comparison with outcomes that are obtained with certainty. This tendency, called the certainty effect, contributes to risk aversion in choices involving sure gains and to risk seeking in choices involving sure losses. In addition, people generally discard components that are shared by all prospects under consideration. This tendency, called the isolation effect, leads to inconsistent preferences when the same choice is presented in different forms. An alternative theory of choice is developed, in which value is assigned to gains and losses rather than to final assets and in which probabilities are replaced by decision weights. The value function is normally concave for gains, commonly convex for losses, and is generally steeper for losses than for gains. Decision weights are generally lower than the corresponding probabilities, except in the range of low probabilities. Overweighting of low probabilities may contribute to the attractiveness of both insurance and gambling. Expected utility theory has dominated the analysis of decision making under risk. It has been generally accepted as a normative model of rational choice (24), and widely applied as a descriptive model of economic behavior, e.g. (15, 4). Thus, it is assumed that all reasonable people would wish to obey the axioms of the theory (47, 36), and that most people actually do, most of the time. The present paper describes several classes of choice problems in which preferences systematically violate the axioms of expected utility theory. In the light of these observations we argue that utility theory, as it is commonly interpreted and applied, is not an adequate descriptive model and we propose an alternative account of choice under risk.
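For reference, the core of the model the abstract describes can be written compactly (standard notation for a simple prospect (x, p; y, q), not the paper's exact formulation): decision weights replace stated probabilities, and value is defined over gains and losses relative to a reference point.

```latex
% Prospect theory evaluation of a regular prospect (x, p; y, q):
% \pi is the decision-weight function, v the value function.
V(x, p;\, y, q) = \pi(p)\, v(x) + \pi(q)\, v(y),
\qquad v(0) = 0, \quad \pi(0) = 0, \quad \pi(1) = 1
```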

35,067 citations

Book ChapterDOI
09 Jan 2004
TL;DR: A theory of intergroup conflict and some preliminary data relating to the theory are presented in this article, with the analysis limited to cases where the salient dimensions of intergroup differentiation are those involving scarce resources.
Abstract: This chapter presents an outline of a theory of intergroup conflict and some preliminary data relating to the theory. Much of the work on the social psychology of intergroup relations has focused on patterns of individual prejudices and discrimination and on the motivational sequences of interpersonal interaction. The intensity of explicit intergroup conflicts of interests is closely related in human cultures to the degree of opprobrium attached to the notion of "renegade" or "traitor." The basic and highly reliable finding is that the trivial, ad hoc intergroup categorization leads to in-group favoritism and discrimination against the out-group. Many orthodox definitions of "social groups" are unduly restrictive when applied to the context of intergroup relations. The equation of social competition and intergroup conflict rests on the assumptions concerning an "ideal type" of social stratification in which the salient dimensions of intergroup differentiation are those involving scarce resources.

14,812 citations

Journal ArticleDOI
TL;DR: Cumulative prospect theory applies to uncertain as well as to risky prospects with any number of outcomes and allows different weighting functions for gains and for losses; two principles, diminishing sensitivity and loss aversion, are invoked to explain the characteristic curvature of the value function and the weighting functions.
Abstract: We develop a new version of prospect theory that employs cumulative rather than separable decision weights and extends the theory in several respects. This version, called cumulative prospect theory, applies to uncertain as well as to risky prospects with any number of outcomes, and it allows different weighting functions for gains and for losses. Two principles, diminishing sensitivity and loss aversion, are invoked to explain the characteristic curvature of the value function and the weighting functions. A review of the experimental evidence and the results of a new experiment confirm a distinctive fourfold pattern of risk attitudes: risk aversion for gains and risk seeking for losses of high probability; risk seeking for gains and risk aversion for losses of low probability. Expected utility theory reigned for several decades as the dominant normative and descriptive model of decision making under uncertainty, but it has come under serious question in recent years. There is now general agreement that the theory does not provide an adequate description of individual choice: a substantial body of evidence shows that decision makers systematically violate its basic tenets. Many alternative models have been proposed in response to this empirical challenge (for reviews, see Camerer, 1989; Fishburn, 1988; Machina, 1987). Some time ago we presented a model of choice, called prospect theory, which explained the major violations of expected utility theory in choices between risky prospects with a small number of outcomes (Kahneman and Tversky, 1979; Tversky and Kahneman, 1986). The key elements of this theory are 1) a value function that is concave for gains, convex for losses, and steeper for losses than for gains, and 2) a nonlinear transformation of the probability scale, which overweights small probabilities and underweights moderate and high probabilities.
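The functional forms most commonly paired with cumulative prospect theory can be sketched directly. The parameter values below are the median estimates reported by Tversky and Kahneman (1992): alpha = beta = 0.88, lambda = 2.25, and weighting exponents of 0.61 for gains and 0.69 for losses. Treat the snippet as an illustration of the value and weighting functions, not code from the paper:

```python
# Illustrative sketch only: the value function and inverse-S probability
# weighting commonly used with cumulative prospect theory, with the 1992
# median parameter estimates (assumed here, not taken from this page).

def value(x, alpha=0.88, beta=0.88, lam=2.25):
    """Value function: concave for gains, convex and steeper for losses."""
    return x ** alpha if x >= 0 else -lam * ((-x) ** beta)

def weight(p, gamma):
    """Inverse-S weighting: overweights low, underweights high probabilities."""
    return p ** gamma / ((p ** gamma + (1 - p) ** gamma) ** (1 / gamma))

# A 50% chance to gain 100 versus a 50% chance to lose 100:
# the loss prospect looms larger in magnitude (loss aversion).
print(weight(0.5, 0.61) * value(100))    # subjective value of the gain prospect
print(weight(0.5, 0.69) * value(-100))   # subjective value of the loss prospect
```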

13,433 citations

Trending Questions (1)
Issue of fake news

The paper discusses the issue of fake news on social media and its potential negative impacts on individuals and society.