Journal ArticleDOI

Fake News Detection on Social Media: A Data Mining Perspective

01 Sep 2017-Sigkdd Explorations (ACM)-Vol. 19, Iss: 1, pp 22-36
TL;DR: This survey presents a comprehensive review of detecting fake news on social media, including fake news characterizations based on psychology and social theories, existing algorithms from a data mining perspective, evaluation metrics, and representative datasets.
Abstract: Social media for news consumption is a double-edged sword. On the one hand, its low cost, easy access, and rapid dissemination of information lead people to seek out and consume news from social media. On the other hand, it enables the wide spread of "fake news", i.e., low-quality news with intentionally false information. The extensive spread of fake news has the potential for extremely negative impacts on individuals and society. Therefore, fake news detection on social media has recently become an emerging research topic that is attracting tremendous attention. Fake news detection on social media presents unique characteristics and challenges that make existing detection algorithms from traditional news media ineffective or not applicable. First, fake news is intentionally written to mislead readers into believing false information, which makes it difficult and nontrivial to detect based on news content alone; therefore, we need to include auxiliary information, such as user social engagements on social media, to help make a determination. Second, exploiting this auxiliary information is challenging in and of itself, as users' social engagements with fake news produce data that is big, incomplete, unstructured, and noisy. Because the issue of fake news detection on social media is both challenging and relevant, we conducted this survey to further facilitate research on the problem. In this survey, we present a comprehensive review of detecting fake news on social media, including fake news characterizations based on psychology and social theories, existing algorithms from a data mining perspective, and evaluation metrics and representative datasets. We also discuss related research areas, open problems, and future research directions for fake news detection on social media.
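The survey's core argument is that content features alone are insufficient, so auxiliary social-context signals must be combined with them. A minimal sketch of that feature-fusion idea, with all feature names and values invented for illustration (not the survey's actual feature set):

```python
# Hypothetical feature fusion: concatenate content-based features with
# social-engagement features into one vector for a downstream classifier.
def fuse(content_feats, social_feats):
    return list(content_feats) + list(social_feats)

article = {
    "content": [0.8, 0.1, 0.3],   # e.g. style / sentiment scores (invented)
    "social":  [120, 0.65, 4],    # e.g. shares, bot-like ratio, cascade depth (invented)
}
x = fuse(article["content"], article["social"])
```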
Citations
Posted Content
TL;DR: A novel dataset that can be used to prioritize check-worthy posts from multi-media content in Hindi, which is unique in its focus on user-generated content, language, and multi-modality.
Abstract: The volume of content and misinformation on social media is rapidly increasing. There is a need for systems that can support fact checkers by prioritizing content that needs to be fact checked. Prior research on prioritizing content for fact-checking has focused on news media articles, predominantly in the English language. Increasingly, misinformation is found in user-generated content. In this paper, we present a novel dataset that can be used to prioritize check-worthy posts from multi-media content in Hindi. It is unique in its 1) focus on user-generated content, 2) language, and 3) accommodation of multi-modality in social media posts. In addition, we also provide metadata for each post, such as the number of shares and likes of the post on ShareChat, a popular Indian social media platform, which allows for correlative analysis around virality and misinformation. The data is accessible on Zenodo (this https URL) under a Creative Commons Attribution License (CC BY 4.0).

2 citations

Proceedings ArticleDOI
07 Apr 2022
TL;DR: In this article, the authors used the Logistic Regression (LR) and Support Vector Machine (SVM) algorithms to detect fake news and found that the LR algorithm appears to be more accurate than the SVM algorithm in identifying whether news is fake or not.
Abstract: Aim: To perform accurate fake news detection using Logistic Regression (LR) and compare textual-property accuracy with the Support Vector Machine (SVM) algorithm. Materials and Methods: The analysis for fake news detection in this proposed research was done using machine learning algorithms, namely the LR algorithm (N = 311) and the SVM algorithm (N = 311), with G power 80% and an alpha value of 0.05. Results: The accuracy of fake news detection was analyzed using the LR and SVM algorithms. The accuracy of the LR algorithm appears to be 95.12%, and the accuracy of the SVM algorithm appears to be 91.68%. With a significance value of 0.079 for accuracy and 0.125 for precision, there is no statistically significant difference between the sample groups. Conclusion: The LR algorithm appears to be more accurate than the SVM algorithm in identifying whether news is fake or not.
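The paper's exact LR-versus-SVM pipeline is not reproduced here, but the general approach of training a linear classifier on bag-of-words text features can be sketched with a stdlib-only logistic regression on a toy corpus. All headlines and labels below are invented for illustration, not the paper's data:

```python
import math

# Toy corpus: 1 = fake, 0 = real (hypothetical examples)
docs = [
    ("shocking miracle cure doctors hate", 1),
    ("you won't believe this secret trick", 1),
    ("celebrity endorses miracle weight loss", 1),
    ("government confirms secret alien cure", 1),
    ("senate passes annual budget bill", 0),
    ("local council approves new park", 0),
    ("university publishes climate study", 0),
    ("central bank holds interest rates", 0),
]

vocab = sorted({w for text, _ in docs for w in text.split()})
idx = {w: i for i, w in enumerate(vocab)}

def featurize(text):
    # Bag-of-words count vector over the training vocabulary.
    v = [0.0] * len(vocab)
    for w in text.split():
        if w in idx:
            v[idx[w]] += 1.0
    return v

# Plain logistic regression trained with batch gradient descent.
w = [0.0] * len(vocab)
b = 0.0
lr = 0.5
for _ in range(200):
    gw = [0.0] * len(vocab)
    gb = 0.0
    for text, y in docs:
        x = featurize(text)
        p = 1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
        err = p - y
        for i, xi in enumerate(x):
            gw[i] += err * xi
        gb += err
    w = [wi - lr * gi / len(docs) for wi, gi in zip(w, gw)]
    b -= lr * gb / len(docs)

def predict(text):
    # Decision threshold at probability 0.5, i.e. logit 0.
    z = sum(wi * xi for wi, xi in zip(w, featurize(text))) + b
    return 1 if z > 0 else 0
```

An SVM differs mainly in its loss (hinge instead of log loss); on features like these the two often perform similarly, which is why the paper compares them statistically.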

2 citations

Journal ArticleDOI
TL;DR: The authors investigate the propagation of two distinct narratives, namely conspiracy information and scientific information, and find that conspiracy information triggers larger cascades, involves more users and generations, persists longer, and is more viral and bursty than science information.
Abstract: With the emergence and rapid proliferation of social media platforms and social networking sites, recent years have witnessed a surge of misinformation spreading in our daily life. Drawing on a large-scale dataset which covers more than 1.4M posts and 18M comments from an online social media platform, we investigate the propagation of two distinct narratives: (i) conspiracy information, whose claims are generally unsubstantiated and thus referred to as misinformation to some extent, and (ii) scientific information, whose origins are generally readily identifiable and verifiable. We find that conspiracy cascades tend to propagate in a multigenerational branching process, whereas science cascades are more likely to grow in a breadth-first manner. Specifically, conspiracy information triggers larger cascades, involves more users and generations, persists longer, and is more viral and bursty than science information. Content analysis reveals that conspiracy cascades contain more negative words and emotional words which convey anger, fear, disgust, surprise, and trust. We also find that conspiracy cascades are much more concerned with political and controversial topics. After applying machine learning models, we achieve an AUC score of nearly 90% in discriminating conspiracy from science narratives using the constructed features. We further investigate users' roles during the growth of cascades. In contrast with the previous assumption that misinformation is primarily driven by a small set of users, we find that conspiracy cascades are more likely to be controlled by a broader set of users than science cascades, imposing new challenges on the management of misinformation.
Although political affinity is thought to affect the consumption of misinformation, there is very little evidence that the political orientation of the information source plays a role during the propagation of conspiracy information; instead, we find that conspiracy information from media outlets with left or right orientation triggers smaller cascades and is less viral than information from online social media platforms (e.g., Twitter and Imgur) whose political orientations are unclear. Our study provides complementary evidence to current misinformation research and has practical policy implications for stemming the propagation and mitigating the influence of misinformation online.
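The roughly 90% AUC reported above is the probability that a randomly chosen conspiracy cascade is scored higher by the model than a randomly chosen science cascade (ties counting half). A minimal stdlib-only computation of AUC from scores and binary labels, using toy values rather than the paper's data:

```python
def auc(scores, labels):
    # Normalized Mann-Whitney U statistic: fraction of (positive, negative)
    # pairs where the positive example receives the higher score.
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```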

2 citations

Proceedings ArticleDOI
04 Jul 2022
TL;DR: An initial proof of concept of a deep learning model for identifying fake news spreaders in social media, focusing not only on the characteristics of the shared content but also on user interactions and the resulting content propagation tree structures is presented.
Abstract: Even though the Internet and social media are usually safe and enjoyable, communication through social media also bears risks. For more than ten years, there have been concerns regarding the manipulation of public opinion through the social Web. In particular, misinformation spreading has proven effective in influencing people, their beliefs and behaviors, from swaying opinions on elections to having direct consequences on health during the COVID-19 pandemic. Most techniques in the literature focus on identifying the individual pieces of misinformation or fake news based on a set of stylistic, content-derived features, user profiles or sharing statistics. Recently, those methods have been extended to identify spreaders. However, they are not enough to effectively detect either fake content or the users spreading it. In this context, this paper presents an initial proof of concept of a deep learning model for identifying fake news spreaders in social media, focusing not only on the characteristics of the shared content but also on user interactions and the resulting content propagation tree structures. Although preliminary, an experimental evaluation over COVID-related data showed promising results, significantly outperforming other alternatives in the literature.
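The propagation tree structures mentioned above can be summarized by simple shape statistics such as depth (generations of resharing) and maximum breadth, which is also how the conspiracy-versus-science study earlier on this page distinguishes deep multigenerational cascades from broad breadth-first ones. A small sketch with an invented edge-list representation (child post, parent post):

```python
from collections import defaultdict

def cascade_stats(edges):
    """edges: list of (child, parent); the root has parent None.
    Returns (depth, max_breadth) of the propagation tree."""
    children = defaultdict(list)
    root = None
    for child, parent in edges:
        if parent is None:
            root = child
        else:
            children[parent].append(child)
    depth, max_breadth, level = 0, 1, [root]
    while level:
        max_breadth = max(max_breadth, len(level))
        nxt = [c for node in level for c in children[node]]
        if nxt:
            depth += 1
        level = nxt
    return depth, max_breadth

# A chain of reposts is deep; a star around one post is broad.
chain = [("a", None), ("b", "a"), ("c", "b"), ("d", "c")]
star  = [("a", None), ("b", "a"), ("c", "a"), ("d", "a")]
```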

2 citations

Proceedings ArticleDOI
01 Jan 2023
TL;DR: In this paper, the authors interview fact-checkers, journalists, trust and safety specialists, researchers, and analysts who work in different organizations tackling problematic information across the world, and use their findings to derive a cybersecurity-inspired framework to characterize the threat of disinformation.
Abstract: Disinformation can be used to sway public opinion toward a certain political or economic direction, adversely impact public health, and mobilize groups to engage in violent disobedience. A major challenge in mitigation is scarcity: disinformation is widespread, but its mitigators are few. In this work, we interview fact-checkers, journalists, trust and safety specialists, researchers, and analysts who work in different organizations tackling problematic information across the world. From this interview study, we develop an understanding of the reality of combating disinformation across domains, and we use our findings to derive a cybersecurity-inspired framework to characterize the threat of disinformation. While related work has developed similar frameworks for conducting analyses and assessments, our work is distinct in providing the means to thoroughly consider the attacker side, including their tactics and approaches. We demonstrate the applicability of our framework on several examples of recent disinformation campaigns.

2 citations

References
Journal ArticleDOI
28 May 2015-Nature
TL;DR: Deep learning is making major advances in solving problems that have resisted the best attempts of the artificial intelligence community for many years, and will have many more successes in the near future because it requires very little engineering by hand and can easily take advantage of increases in the amount of available computation and data.
Abstract: Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction. These methods have dramatically improved the state-of-the-art in speech recognition, visual object recognition, object detection and many other domains such as drug discovery and genomics. Deep learning discovers intricate structure in large data sets by using the backpropagation algorithm to indicate how a machine should change its internal parameters that are used to compute the representation in each layer from the representation in the previous layer. Deep convolutional nets have brought about breakthroughs in processing images, video, speech and audio, whereas recurrent nets have shone light on sequential data such as text and speech.
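The abstract's description of backpropagation ("indicate how a machine should change its internal parameters") amounts to applying the chain rule layer by layer. A minimal sketch for a one-hidden-layer network with squared-error loss; weights and inputs are arbitrary illustration values, and the gradient can be checked against a finite difference:

```python
import math

def forward(x, w1, w2):
    # Hidden layer with tanh activation, linear output.
    h = [math.tanh(sum(wi * xi for wi, xi in zip(row, x))) for row in w1]
    y = sum(wi * hi for wi, hi in zip(w2, h))
    return h, y

def loss_and_grads(x, target, w1, w2):
    h, y = forward(x, w1, w2)
    loss = 0.5 * (y - target) ** 2
    dy = y - target                       # dL/dy
    g2 = [dy * hi for hi in h]            # dL/dw2: chain rule through output
    g1 = [[dy * w2[j] * (1 - h[j] ** 2) * xi for xi in x]
          for j in range(len(w1))]        # dL/dw1: chain rule through tanh
    return loss, g1, g2
```

Gradient descent then updates each weight against its gradient; deep learning stacks many such layers, with backpropagation propagating `dy` backward through all of them.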

46,982 citations

Book ChapterDOI
TL;DR: In this paper, the authors present a critique of expected utility theory as a descriptive model of decision making under risk, and develop an alternative model, called prospect theory, in which value is assigned to gains and losses rather than to final assets and in which probabilities are replaced by decision weights.
Abstract: This paper presents a critique of expected utility theory as a descriptive model of decision making under risk, and develops an alternative model, called prospect theory. Choices among risky prospects exhibit several pervasive effects that are inconsistent with the basic tenets of utility theory. In particular, people underweight outcomes that are merely probable in comparison with outcomes that are obtained with certainty. This tendency, called the certainty effect, contributes to risk aversion in choices involving sure gains and to risk seeking in choices involving sure losses. In addition, people generally discard components that are shared by all prospects under consideration. This tendency, called the isolation effect, leads to inconsistent preferences when the same choice is presented in different forms. An alternative theory of choice is developed, in which value is assigned to gains and losses rather than to final assets and in which probabilities are replaced by decision weights. The value function is normally concave for gains, commonly convex for losses, and is generally steeper for losses than for gains. Decision weights are generally lower than the corresponding probabilities, except in the range of low probabilities. Overweighting of low probabilities may contribute to the attractiveness of both insurance and gambling. EXPECTED UTILITY THEORY has dominated the analysis of decision making under risk. It has been generally accepted as a normative model of rational choice (24), and widely applied as a descriptive model of economic behavior, e.g. (15, 4). Thus, it is assumed that all reasonable people would wish to obey the axioms of the theory (47, 36), and that most people actually do, most of the time. The present paper describes several classes of choice problems in which preferences systematically violate the axioms of expected utility theory.
In the light of these observations, we argue that utility theory, as it is commonly interpreted and applied, is not an adequate descriptive model, and we propose an alternative account of choice under risk.
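The value function's shape described above (concave for gains, convex for losses, steeper for losses) is often illustrated with the parametric form later estimated by Tversky and Kahneman (roughly α ≈ β ≈ 0.88, λ ≈ 2.25); a small sketch with those parameters as defaults:

```python
def value(x, alpha=0.88, beta=0.88, lam=2.25):
    # Prospect-theory value of a gain/loss x relative to a reference point:
    # concave power function for gains, convex (and steeper, via the loss
    # aversion coefficient lam) for losses.
    return x ** alpha if x >= 0 else -lam * ((-x) ** beta)
```

With these defaults, a loss looms larger than an equal-sized gain: |value(-100)| exceeds value(100) by the factor λ.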

35,067 citations

Book ChapterDOI
09 Jan 2004
TL;DR: A theory of intergroup conflict and some preliminary data relating to the theory are presented in this article, though the analysis is limited to cases where the salient dimensions of intergroup differentiation are those involving scarce resources.
Abstract: This chapter presents an outline of a theory of intergroup conflict and some preliminary data relating to the theory. Much of the work on the social psychology of intergroup relations has focused on patterns of individual prejudices and discrimination and on the motivational sequences of interpersonal interaction. The intensity of explicit intergroup conflicts of interests is closely related in human cultures to the degree of opprobrium attached to the notion of "renegade" or "traitor." The basic and highly reliable finding is that the trivial, ad hoc intergroup categorization leads to in-group favoritism and discrimination against the out-group. Many orthodox definitions of "social groups" are unduly restrictive when applied to the context of intergroup relations. The equation of social competition and intergroup conflict rests on the assumptions concerning an "ideal type" of social stratification in which the salient dimensions of intergroup differentiation are those involving scarce resources.

14,812 citations

Journal ArticleDOI
TL;DR: The authors develop cumulative prospect theory, which applies to uncertain as well as to risky prospects with any number of outcomes and allows different weighting functions for gains and for losses; two principles, diminishing sensitivity and loss aversion, are invoked to explain the characteristic curvature of the value function and the weighting functions.
Abstract: We develop a new version of prospect theory that employs cumulative rather than separable decision weights and extends the theory in several respects. This version, called cumulative prospect theory, applies to uncertain as well as to risky prospects with any number of outcomes, and it allows different weighting functions for gains and for losses. Two principles, diminishing sensitivity and loss aversion, are invoked to explain the characteristic curvature of the value function and the weighting functions. A review of the experimental evidence and the results of a new experiment confirm a distinctive fourfold pattern of risk attitudes: risk aversion for gains and risk seeking for losses of high probability; risk seeking for gains and risk aversion for losses of low probability. Expected utility theory reigned for several decades as the dominant normative and descriptive model of decision making under uncertainty, but it has come under serious question in recent years. There is now general agreement that the theory does not provide an adequate description of individual choice: a substantial body of evidence shows that decision makers systematically violate its basic tenets. Many alternative models have been proposed in response to this empirical challenge (for reviews, see Camerer, 1989; Fishburn, 1988; Machina, 1987). Some time ago we presented a model of choice, called prospect theory, which explained the major violations of expected utility theory in choices between risky prospects with a small number of outcomes (Kahneman and Tversky, 1979; Tversky and Kahneman, 1986). The key elements of this theory are 1) a value function that is concave for gains, convex for losses, and steeper for losses than for gains, and 2) a nonlinear transformation of the probability scale, which overweights small probabilities and underweights moderate and high probabilities.
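The nonlinear probability weighting described above is commonly written as w(p) = p^γ / (p^γ + (1−p)^γ)^(1/γ); with γ < 1 it overweights low probabilities and underweights moderate-to-high ones, producing the fourfold pattern of risk attitudes. A small sketch (γ = 0.61 is the paper's estimate for gains):

```python
def weight(p, gamma=0.61):
    # Cumulative-prospect-theory probability weighting function:
    # inverse-S shaped for gamma < 1.
    return p ** gamma / ((p ** gamma + (1 - p) ** gamma) ** (1 / gamma))
```

For example, weight(0.01) is noticeably larger than 0.01, which helps explain the appeal of lottery tickets and insurance, while weight(0.9) falls below 0.9, consistent with the certainty effect.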

13,433 citations

Trending Questions (1)
Issue of fake news

The paper discusses the issue of fake news on social media and its potential negative impacts on individuals and society.