Journal ArticleDOI

Fake News Detection on Social Media: A Data Mining Perspective

01 Sep 2017 · SIGKDD Explorations (ACM) · Vol. 19, Iss. 1, pp. 22-36
TL;DR: Shu et al. present a comprehensive review of detecting fake news on social media, covering fake news characterizations based on psychology and social theories, existing algorithms from a data mining perspective, evaluation metrics, and representative datasets.
Abstract: Social media for news consumption is a double-edged sword. On the one hand, its low cost, easy access, and rapid dissemination of information lead people to seek out and consume news from social media. On the other hand, it enables the wide spread of "fake news", i.e., low-quality news with intentionally false information. The extensive spread of fake news has the potential for extremely negative impacts on individuals and society. Therefore, fake news detection on social media has recently become an emerging research area attracting tremendous attention. Fake news detection on social media presents unique characteristics and challenges that make existing detection algorithms from traditional news media ineffective or not applicable. First, fake news is intentionally written to mislead readers into believing false information, which makes it difficult and nontrivial to detect based on news content alone; therefore, we need to include auxiliary information, such as user social engagements on social media, to help make a determination. Second, exploiting this auxiliary information is challenging in and of itself, as users' social engagements with fake news produce data that is big, incomplete, unstructured, and noisy. Because the issue of fake news detection on social media is both challenging and relevant, we conducted this survey to further facilitate research on the problem. In this survey, we present a comprehensive review of detecting fake news on social media, including fake news characterizations based on psychology and social theories, existing algorithms from a data mining perspective, and evaluation metrics and representative datasets. We also discuss related research areas, open problems, and future research directions for fake news detection on social media.
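A minimal sketch of the survey's central point: content features alone are often insufficient, so auxiliary social-engagement features are fused with them in a single classifier. The data, feature names, and model below are hypothetical placeholders, not the survey's method.

```python
# Sketch: early fusion of news-content and social-context features.
# Toy data; illustrates the idea, not any method from the survey.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["shocking cure doctors hate this trick",
         "senate passes annual budget bill"]
# Hypothetical engagement features: [shares, unique users, avg. account age]
social = np.array([[950, 120, 30.0],
                   [40, 35, 900.0]])
labels = np.array([1, 0])  # 1 = fake, 0 = real

X_text = TfidfVectorizer().fit_transform(texts).toarray()
X = np.hstack([X_text, social])  # concatenate both signal types

clf = LogisticRegression(max_iter=1000).fit(X, labels)
print(clf.predict(X))  # sanity check on the training examples
```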
Citations
Journal ArticleDOI
TL;DR: The authors investigated how exposure to and trust in information sources, and anxiety and depression, are associated with conspiracy and misinformation beliefs in eight countries/regions (Belgium, Canada, England, Philippines, Hong Kong, New Zealand, United States, Switzerland) during the COVID-19 pandemic.
Abstract: While COVID-19 spreads aggressively and rapidly across the globe, many societies have also witnessed the spread of other viral phenomena like misinformation, conspiracy theories, and general mass suspicions about what is really going on. This study investigates how exposure to and trust in information sources, and anxiety and depression, are associated with conspiracy and misinformation beliefs in eight countries/regions (Belgium, Canada, England, Philippines, Hong Kong, New Zealand, United States, Switzerland) during the COVID-19 pandemic. Data were collected in an online survey fielded from May 29, 2020 to June 12, 2020, resulting in a multinational representative sample of 8,806 adult respondents. Results indicate that greater exposure to traditional media (television, radio, newspapers) is associated with lower conspiracy and misinformation beliefs, while exposure to politicians, digital media, and personal contacts is associated with greater conspiracy and misinformation beliefs. Exposure to health experts is associated with lower conspiracy beliefs only. Higher feelings of depression are also associated with greater conspiracy and misinformation beliefs. We also found relevant group and country differences. We discuss the implications of these results.
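In simplified form, associations of this kind could be estimated with an ordinary least-squares regression of belief scores on exposure and mental-health measures. The column names and data below are invented for illustration; the study's actual models, scales, and controls are not reproduced here.

```python
# Sketch: regress conspiracy-belief scores on media exposure and depression.
# Hypothetical column names and toy data; not the study's actual model.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "conspiracy_belief": [2.1, 3.8, 1.5, 4.2, 2.9, 3.3],
    "traditional_media": [5, 1, 6, 0, 3, 2],   # exposure frequency
    "digital_media":     [1, 6, 0, 5, 4, 3],
    "depression":        [1.0, 3.5, 0.5, 4.0, 2.0, 2.5],
})

# A negative coefficient on traditional_media and positive ones on
# digital_media / depression would mirror the reported associations.
model = smf.ols(
    "conspiracy_belief ~ traditional_media + digital_media + depression",
    data=df,
).fit()
print(model.params)
```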

118 citations

Journal ArticleDOI
TL;DR: This article proposes a preventative approach: a novel blockchain-based solution for IoFMT that incorporates a gamification component, using a customized Proof-of-Authority consensus algorithm along with a weighted-ranking algorithm that serves as the incentive mechanism for determining the integrity of news items.
Abstract: The concept of Fake Media, or the Internet of Fake Media Things (IoFMT), has emerged in different domains of digital society such as politics, news, and social media. Because the integrity of media is compromised quite frequently, fundamental changes are needed to prevent the further spread of IoFMT. With today’s advancements in Artificial Intelligence (AI) and Deep Learning (DL), such compromises may be profoundly limited. Providing proof of authenticity that establishes the authorship and integrity of digital content has become a pressing need. Blockchain, a promising decentralized secure platform, has been advocated to help combat the authenticity aspect of fake media in contexts where resistance to the modification of data is important. Although some blockchain-based methods have been proposed to address authentication problems, most current studies rest on unrealistic assumptions and after-the-fact mechanisms. In this article, we propose a preventative approach using a novel blockchain-based solution for IoFMT that incorporates a gamification component. More specifically, the proposed approach uses a customized Proof-of-Authority consensus algorithm, along with a weighted-ranking algorithm serving as an incentive mechanism in the gamification component, to determine the integrity of news items. Although our approach focuses on fake news, the framework could well be extended to other types of digital content. A proof-of-concept implementation is developed to outline the advantages of the proposed solution.
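To make the incentive idea concrete, here is a toy version of a reputation-weighted vote of the kind a weighted-ranking mechanism might aggregate. The weighting scheme and threshold are invented for illustration; they are not the paper's consensus or ranking algorithms.

```python
# Sketch: reputation-weighted voting on a news item's integrity.
# Weights and threshold are illustrative, not the paper's algorithm.
def weighted_integrity(votes, reputations, threshold=0.5):
    """votes[i] is 1 (authentic) or 0 (fake); reputations[i] >= 0."""
    total = sum(reputations)
    score = sum(v * r for v, r in zip(votes, reputations)) / total
    return score, ("authentic" if score >= threshold else "fake")

# Three validators; the high-reputation validator dominates the outcome.
print(weighted_integrity(votes=[1, 0, 0], reputations=[0.9, 0.2, 0.1]))
```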

113 citations

Journal ArticleDOI
07 Nov 2019
TL;DR: This paper studies how scientific papers represent human research subjects in HCML, identifies five discourses in that literature, and shows how they create paradoxical subject and object representations of the human, which may inadvertently risk dehumanization.
Abstract: "Human-centered machine learning" (HCML) combines human insights and domain expertise with data-driven predictions to answer societal questions. This area's inherent interdisciplinarity causes tensions in the obligations researchers have to the humans whose data they use. This paper studies how scientific papers represent human research subjects in HCML. Using mental health status prediction on social media as a case study, we conduct thematic discourse analysis on 55 papers to examine these representations. We identify five discourses that weave a complex narrative of who the human subject is in this research: Disorder/Patient, Social Media, Scientific, Data/Machine Learning, and Person. We show how these five discourses create paradoxical subject and object representations of the human, which may inadvertently risk dehumanization. We also discuss the tensions and impacts of interdisciplinary research; the risks of this work to scientific rigor, online communities, and mental health; and guidelines for stronger HCML research in this nascent area.

110 citations

Proceedings ArticleDOI
28 Aug 2018
TL;DR: A weakly supervised approach that automatically collects a large-scale but very noisy training dataset comprising hundreds of thousands of tweets; despite this unclean, inaccurate dataset, it is possible to detect fake news with an F1 score of up to 0.9.
Abstract: The problem of automatic detection of fake news in social media, e.g., on Twitter, has recently drawn some attention. Although, from a technical perspective, it can be regarded as a straightforward binary classification problem, the major challenge is the collection of large enough training corpora, since manual annotation of tweets as fake or non-fake news is an expensive and tedious endeavor. In this paper, we discuss a weakly supervised approach, which automatically collects a large-scale, but very noisy training dataset comprising hundreds of thousands of tweets. During collection, we automatically label tweets by their source, i.e., trustworthy or untrustworthy source, and train a classifier on this dataset. We then use that classifier for a different classification target, i.e., the classification of fake and non-fake tweets. Although the labels are not accurate according to the new classification target (not all tweets by an untrustworthy source need to be fake news, and vice versa), we show that despite this unclean, inaccurate dataset, it is possible to detect fake news with an F1 score of up to 0.9.
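A miniature version of the weak-supervision recipe described above: label training tweets by the trustworthiness of their source, train on those noisy labels, then evaluate against a small manually labeled set for the real target. The sources, tweets, and labels below are toy placeholders, not the paper's corpus or model.

```python
# Sketch: source-based weak labels -> classifier -> evaluate on clean labels.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import f1_score

UNTRUSTWORTHY = {"hoaxwire"}  # hypothetical source list

train = [("hoaxwire", "aliens endorse candidate"),
         ("hoaxwire", "miracle pill melts fat overnight"),
         ("newsdesk", "court upholds ruling on appeal"),
         ("newsdesk", "storm expected to reach coast friday")]
# Weak label: 1 if the tweet came from an untrustworthy source.
weak_labels = [1 if src in UNTRUSTWORTHY else 0 for src, _ in train]

vec = CountVectorizer()
X = vec.fit_transform(text for _, text in train)
clf = MultinomialNB().fit(X, weak_labels)

# Evaluate on a (tiny) manually labeled set for the *real* target:
# fake vs. non-fake tweets, independent of source.
test_texts = ["miracle pill shocks doctors", "senate vote scheduled monday"]
test_labels = [1, 0]
pred = clf.predict(vec.transform(test_texts))
print("F1:", f1_score(test_labels, pred))
```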

106 citations

Proceedings Article
01 Aug 2018
TL;DR: This paper introduces approaches to combine information from multiple sources and to discriminate between different degrees of fakeness, and proposes a Multi-source Multi-class Fake news Detection framework (MMFD), which combines automated feature extraction, multi-source fusion, and automated detection of degrees of fakeness into a coherent and interpretable model.
Abstract: Fake news spreading through media outlets poses a real threat to the trustworthiness of information, and detecting fake news has attracted increasing attention in recent years. Fake news is typically written intentionally to mislead readers, which makes detection based on news content alone tremendously challenging. Meanwhile, fake news may contain true evidence to mimic real news and can present different degrees of fakeness, which further exacerbates the detection difficulty. On the other hand, the spread of fake news produces various types of data from different perspectives. These multiple sources provide rich contextual information about fake news and offer unprecedented opportunities for advanced fake news detection. In this paper, we study fake news detection with different degrees of fakeness by integrating multiple sources. In particular, we introduce approaches to combine information from multiple sources and to discriminate between different degrees of fakeness, and propose a Multi-source Multi-class Fake news Detection framework (MMFD), which combines automated feature extraction, multi-source fusion, and automated detection of degrees of fakeness into a coherent and interpretable model. Experimental results on real-world data demonstrate the effectiveness of the proposed framework, and extensive experiments are further conducted to understand how it works.
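One way to picture the multi-source fusion outlined above: encode each source (e.g., content and social context) separately, concatenate the embeddings, and classify into several degrees of fakeness. This PyTorch sketch is an invented simplification, not the actual MMFD architecture; all dimensions and class names are assumptions.

```python
# Sketch: per-source encoders -> fused representation -> multi-class fakeness.
import torch
import torch.nn as nn

class MultiSourceClassifier(nn.Module):
    def __init__(self, content_dim=300, social_dim=20, n_classes=4):
        super().__init__()
        self.content_enc = nn.Sequential(nn.Linear(content_dim, 64), nn.ReLU())
        self.social_enc = nn.Sequential(nn.Linear(social_dim, 16), nn.ReLU())
        # e.g. true / mostly-true / mostly-fake / fake (hypothetical classes)
        self.head = nn.Linear(64 + 16, n_classes)

    def forward(self, content, social):
        fused = torch.cat([self.content_enc(content),
                           self.social_enc(social)], dim=-1)
        return self.head(fused)  # logits over degrees of fakeness

model = MultiSourceClassifier()
logits = model(torch.randn(8, 300), torch.randn(8, 20))
print(logits.shape)  # torch.Size([8, 4])
```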

106 citations

References
Journal ArticleDOI
28 May 2015 · Nature
TL;DR: Deep learning is making major advances in solving problems that have resisted the best attempts of the artificial intelligence community for many years, and will have many more successes in the near future because it requires very little engineering by hand and can easily take advantage of increases in the amount of available computation and data.
Abstract: Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction. These methods have dramatically improved the state-of-the-art in speech recognition, visual object recognition, object detection and many other domains such as drug discovery and genomics. Deep learning discovers intricate structure in large data sets by using the backpropagation algorithm to indicate how a machine should change its internal parameters that are used to compute the representation in each layer from the representation in the previous layer. Deep convolutional nets have brought about breakthroughs in processing images, video, speech and audio, whereas recurrent nets have shone light on sequential data such as text and speech.
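As a bare-bones illustration of the mechanism described here, the sketch below trains a two-layer network by backpropagating the prediction error through its layers, on toy XOR data. It is a generic textbook example, not code from the paper.

```python
# Sketch: a tiny two-layer network trained by backpropagation on XOR.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)  # XOR labels

W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)
lr = 0.5

for _ in range(5000):
    h = np.tanh(X @ W1 + b1)                  # hidden representation
    out = 1 / (1 + np.exp(-(h @ W2 + b2)))    # sigmoid output
    d_out = out - y                           # cross-entropy gradient at output
    d_h = (d_out @ W2.T) * (1 - h ** 2)       # chain rule through tanh
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(0)

print(out.round(2).ravel())  # should approach [0, 1, 1, 0]
```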

46,982 citations

Book ChapterDOI
TL;DR: In this paper, the authors present a critique of expected utility theory as a descriptive model of decision making under risk, and develop an alternative model, called prospect theory, in which value is assigned to gains and losses rather than to final assets and in which probabilities are replaced by decision weights.
Abstract: This paper presents a critique of expected utility theory as a descriptive model of decision making under risk, and develops an alternative model, called prospect theory. Choices among risky prospects exhibit several pervasive effects that are inconsistent with the basic tenets of utility theory. In particular, people underweight outcomes that are merely probable in comparison with outcomes that are obtained with certainty. This tendency, called the certainty effect, contributes to risk aversion in choices involving sure gains and to risk seeking in choices involving sure losses. In addition, people generally discard components that are shared by all prospects under consideration. This tendency, called the isolation effect, leads to inconsistent preferences when the same choice is presented in different forms. An alternative theory of choice is developed, in which value is assigned to gains and losses rather than to final assets and in which probabilities are replaced by decision weights. The value function is normally concave for gains, commonly convex for losses, and is generally steeper for losses than for gains. Decision weights are generally lower than the corresponding probabilities, except in the range of low probabilities. Overweighting of low probabilities may contribute to the attractiveness of both insurance and gambling. Expected utility theory has dominated the analysis of decision making under risk. It has been generally accepted as a normative model of rational choice (24), and widely applied as a descriptive model of economic behavior, e.g. (15, 4). Thus, it is assumed that all reasonable people would wish to obey the axioms of the theory (47, 36), and that most people actually do, most of the time. The present paper describes several classes of choice problems in which preferences systematically violate the axioms of expected utility theory. In the light of these observations we argue that utility theory, as it is commonly interpreted and applied, is not an adequate descriptive model and we propose an alternative account of choice under risk.
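The value-function shape described in this abstract is commonly written in the later parameterization of Tversky and Kahneman (1992), shown here as a standard reference formula rather than this paper's own notation:

```latex
% Common parameterization: concave for gains, convex for losses,
% and steeper for losses (loss aversion, \lambda > 1).
v(x) =
\begin{cases}
  x^{\alpha}              & \text{if } x \ge 0,\\[2pt]
  -\lambda\,(-x)^{\beta}  & \text{if } x < 0,
\end{cases}
\qquad 0 < \alpha,\ \beta \le 1,\quad \lambda > 1.

% A prospect's overall value then replaces probabilities with
% decision weights \pi(\cdot):
V = \sum_i \pi(p_i)\, v(x_i).
```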

35,067 citations

Book ChapterDOI
09 Jan 2004
TL;DR: A theory of intergroup conflict and some preliminary data relating to it are presented in this chapter; the analysis focuses on cases where the salient dimensions of intergroup differentiation involve scarce resources.
Abstract: This chapter presents an outline of a theory of intergroup conflict and some preliminary data relating to the theory. Much of the work on the social psychology of intergroup relations has focused on patterns of individual prejudices and discrimination and on the motivational sequences of interpersonal interaction. The intensity of explicit intergroup conflicts of interests is closely related in human cultures to the degree of opprobrium attached to the notion of "renegade" or "traitor." The basic and highly reliable finding is that the trivial, ad hoc intergroup categorization leads to in-group favoritism and discrimination against the out-group. Many orthodox definitions of "social groups" are unduly restrictive when applied to the context of intergroup relations. The equation of social competition and intergroup conflict rests on the assumptions concerning an "ideal type" of social stratification in which the salient dimensions of intergroup differentiation are those involving scarce resources.

14,812 citations

Journal ArticleDOI
TL;DR: Cumulative prospect theory applies to uncertain as well as risky prospects with any number of outcomes and allows different weighting functions for gains and for losses; two principles, diminishing sensitivity and loss aversion, are invoked to explain the characteristic curvature of the value function and the weighting functions.
Abstract: We develop a new version of prospect theory that employs cumulative rather than separable decision weights and extends the theory in several respects. This version, called cumulative prospect theory, applies to uncertain as well as to risky prospects with any number of outcomes, and it allows different weighting functions for gains and for losses. Two principles, diminishing sensitivity and loss aversion, are invoked to explain the characteristic curvature of the value function and the weighting functions. A review of the experimental evidence and the results of a new experiment confirm a distinctive fourfold pattern of risk attitudes: risk aversion for gains and risk seeking for losses of high probability; risk seeking for gains and risk aversion for losses of low probability. Expected utility theory reigned for several decades as the dominant normative and descriptive model of decision making under uncertainty, but it has come under serious question in recent years. There is now general agreement that the theory does not provide an adequate description of individual choice: a substantial body of evidence shows that decision makers systematically violate its basic tenets. Many alternative models have been proposed in response to this empirical challenge (for reviews, see Camerer, 1989; Fishburn, 1988; Machina, 1987). Some time ago we presented a model of choice, called prospect theory, which explained the major violations of expected utility theory in choices between risky prospects with a small number of outcomes (Kahneman and Tversky, 1979; Tversky and Kahneman, 1986). The key elements of this theory are 1) a value function that is concave for gains, convex for losses, and steeper for losses than for gains, and 2) a nonlinear transformation of the probability scale, which overweights small probabilities and underweights moderate and high probabilities.
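For reference, the probability weighting function commonly associated with cumulative prospect theory takes this inverse-S form (quoted from the standard 1992 presentation; parameter estimates vary, and gains and losses may use different exponents):

```latex
% Probability weighting: overweights low probabilities and
% underweights moderate-to-high ones.
w(p) = \frac{p^{\gamma}}{\bigl(p^{\gamma} + (1 - p)^{\gamma}\bigr)^{1/\gamma}},
\qquad 0 < \gamma < 1.
```

Overweighting of small p under this curve yields risk seeking for low-probability gains and risk aversion for low-probability losses, which matches the fourfold pattern described in the abstract.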

13,433 citations

Trending Questions (1)
Issue of fake news

The paper discusses the issue of fake news on social media, i.e., the wide spread of intentionally false, low-quality news, its potential negative impacts on individuals and society, and approaches for detecting it.