Proceedings Article

Separating Facts from Fiction: Linguistic Models to Classify Suspicious and Trusted News Posts on Twitter

TL;DR: This work builds predictive models to classify 130 thousand news posts as suspicious or verified, and predict four sub-types of suspicious news – satire, hoaxes, clickbait and propaganda, and shows that neural network models trained on tweet content and social network interactions outperform lexical models.
Abstract: Pew research polls report 62 percent of U.S. adults get news on social media (Gottfried and Shearer, 2016). In a December poll, 64 percent of U.S. adults said that “made-up news” has caused a “great deal of confusion” about the facts of current events (Barthel et al., 2016). Fabricated stories in social media, ranging from deliberate propaganda to hoaxes and satire, contribute to this confusion in addition to having serious effects on global stability. In this work we build predictive models to classify 130 thousand news posts as suspicious or verified, and predict four sub-types of suspicious news – satire, hoaxes, clickbait and propaganda. We show that neural network models trained on tweet content and social network interactions outperform lexical models. Unlike previous work on deception detection, we find that adding syntax and grammar features to our models does not improve performance. Incorporating linguistic features improves classification results; however, social interaction features are most informative for finer-grained separation between four types of suspicious news posts.


Citations
Posted Content
TL;DR: It is revealed that left-wing and right-wing news share significantly more stylistic similarities than either does with the mainstream, and applications of the results include partisanship detection and pre-screening for semi-automatic fake news detection.
Abstract: This paper reports on a writing style analysis of hyperpartisan (i.e., extremely one-sided) news in connection to fake news. It presents a large corpus of 1,627 articles that were manually fact-checked by professional journalists from BuzzFeed. The articles originated from 9 well-known political publishers, 3 each from the mainstream, the hyperpartisan left-wing, and the hyperpartisan right-wing. In sum, the corpus contains 299 fake news articles, 97% of which originated from hyperpartisan publishers. We propose and demonstrate a new way of assessing style similarity between text categories via Unmasking (a meta-learning approach originally devised for authorship verification), revealing that the styles of left-wing and right-wing news have a lot more in common with each other than either has with the mainstream. Furthermore, we show that hyperpartisan news can be discriminated well by its style from the mainstream (F1=0.78), as can satire from both (F1=0.81). Unsurprisingly, style-based fake news detection is not up to scratch (F1=0.46). Nevertheless, the former results are important to implement pre-screening for fake news detectors.

375 citations

Proceedings Article
01 Jul 2018
TL;DR: The authors report on a comparative style analysis of hyperpartisan (extremely one-sided) news and fake news, showing that 97% of the 299 fake news articles identified are also hyperpartisan.
Abstract: We report on a comparative style analysis of hyperpartisan (extremely one-sided) news and fake news. A corpus of 1,627 articles from 9 political publishers, three each from the mainstream, the hyperpartisan left, and the hyperpartisan right, has been fact-checked by professional journalists at BuzzFeed: 97% of the 299 fake news articles identified are also hyperpartisan. We show how a style analysis can distinguish hyperpartisan news from the mainstream (F1 = 0.78), and satire from both (F1 = 0.81). But stylometry is no silver bullet, as style-based fake news detection does not work (F1 = 0.46). We further reveal that left-wing and right-wing news share significantly more stylistic similarities than either does with the mainstream. This result is robust: it has been confirmed by three different modeling approaches, one of which employs Unmasking in a novel way. Applications of our results include partisanship detection and pre-screening for semi-automatic fake news detection.
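The F1 values quoted in the abstract above are harmonic means of precision and recall. As a minimal sketch of how such a score is computed from raw counts (the helper name `f1_from_counts` and the example counts are hypothetical and not taken from the paper):

```python
def f1_from_counts(tp, fp, fn):
    """F1 from true positives, false positives, and false negatives.

    Hypothetical helper for illustration only; the counts passed in
    below are made up and are not results from the cited paper.
    """
    if tp == 0:
        # No correct positive predictions: precision and recall are 0
        # (or undefined), so F1 is conventionally reported as 0.
        return 0.0
    precision = tp / (tp + fp)   # share of predicted positives that are correct
    recall = tp / (tp + fn)      # share of actual positives that were found
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(f1_from_counts(tp=8, fp=2, fn=2))
```

Because the harmonic mean is dominated by the smaller of the two terms, a classifier cannot reach a high F1 by trading all of its recall for precision or the reverse.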

341 citations

Journal Article
TL;DR: This paper surveys the different approaches to automatic detection of fake news and rumours proposed in the recent literature and provides a comprehensive analysis on the various techniques used to perform rumour and fake news detection.

333 citations

Journal Article
TL;DR: A new set of features is presented and the prediction performance of current approaches and features for automatic detection of fake news are measured, revealing interesting findings on the usefulness and importance of features for detecting false news.
Abstract: A large body of recent work has focused on understanding and detecting fake news stories that are disseminated on social media. To accomplish this goal, these works explore several types of features extracted from news stories, including source and posts from social media. In addition to exploring the main features proposed in the literature for fake news detection, we present a new set of features and measure the prediction performance of current approaches and features for automatic detection of fake news. Our results reveal interesting findings on the usefulness and importance of features for detecting false news. Finally, we discuss how fake news detection approaches can be used in practice, highlighting challenges and opportunities.

325 citations


Cites background or methods from "Separating Facts from Fiction: Ling..."

  • ...io/en/dev/), we compute subjectivity and sentiment scores of a text as explored in previous efforts.(4)...

    [...]

  • ...3) Psycholinguistic Features: Linguistic Inquiry and Word Count (LIWC)(8) is a dictionary-based text mining software whose output has been explored in many classification tasks, including fake news detection.(4) We use its latest version (2015) to extract 44 features that capture additional signals of persuasive and biased language....

    [...]

  • ...Not surprisingly, recent research efforts are devoted not only to better comprehend this phenomenon(1) but also to automatize the detection of fake news.(2,3,4) While a fully automated approach for the fake news problem can be quite Digital Object Identifier 10....

    [...]

Journal Article
TL;DR: This survey describes the modern-day problem of fake news and, in particular, highlights the technical challenges associated with it and comprehensively compile and summarize characteristic features of available datasets.
Abstract: The proliferation of fake news on social media has opened up new directions of research for timely identification and containment of fake news and mitigation of its widespread impact on public opinion. While much of the earlier research was focused on identification of fake news based on its contents or by exploiting users’ engagements with the news on social media, there has been a rising interest in proactive intervention strategies to counter the spread of misinformation and its impact on society. In this survey, we describe the modern-day problem of fake news and, in particular, highlight the technical challenges associated with it. We discuss existing methods and techniques applicable to both identification and mitigation, with a focus on the significant advances in each method and their advantages and limitations. In addition, research has often been limited by the quality of existing datasets and their specific application contexts. To alleviate this problem, we comprehensively compile and summarize characteristic features of available datasets. Furthermore, we outline new directions of research to facilitate future development of effective and interdisciplinary solutions.

280 citations


Cites background from "Separating Facts from Fiction: Ling..."

  • ...A specific variant called Long Short-Term Memory (LSTM) [42], which alleviates some of the training difficulties in RNN, is often used due to its ability to effectively capture long-range dependencies in the text and has been applied to fake news detection, similarly to the use of convolutional neural networks in several works [91, 117]....

    [...]

References
Proceedings Article
01 Jan 2015
TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
Abstract: We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method is straightforward to implement, is computationally efficient, has little memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms, on which Adam was inspired, are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods. Finally, we discuss AdaMax, a variant of Adam based on the infinity norm.
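The "adaptive estimates of lower-order moments" mentioned in the abstract can be illustrated with a minimal sketch of a single-parameter Adam update. The default hyper-parameter values follow those suggested in the Adam paper; the function names and the toy quadratic objective are assumptions made purely for illustration.

```python
import math


def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter (t counts from 1)."""
    m = beta1 * m + (1.0 - beta1) * grad          # running estimate of the gradient mean
    v = beta2 * v + (1.0 - beta2) * grad * grad   # running estimate of the squared gradient
    m_hat = m / (1.0 - beta1 ** t)                # bias correction for the zero initialization
    v_hat = v / (1.0 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


def minimize_quadratic(steps=1000, lr=0.1):
    """Toy problem (an assumption for illustration): minimize f(x) = (x - 3)^2."""
    x, m, v = 0.0, 0.0, 0.0
    for t in range(1, steps + 1):
        grad = 2.0 * (x - 3.0)                    # analytic gradient of the toy objective
        x, m, v = adam_step(x, grad, m, v, t, lr=lr)
    return x


if __name__ == "__main__":
    print(minimize_quadratic())
```

On the very first step the bias-corrected ratio reduces to the sign of the gradient, so the parameter moves by roughly `lr` regardless of the gradient's scale, which is one way to see the invariance to gradient rescaling that the abstract mentions.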

111,197 citations


"Separating Facts from Fiction: Ling..." refers methods in this paper

  • ...We train our models for 10 epochs using the ADAM optimization algorithm, and evaluate them using 10-fold cross-validation (Kingma and Ba, 2014)....

    [...]
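The 10-fold cross-validation mentioned in the excerpt above splits the data into ten parts and holds each part out once for evaluation. A minimal sketch of the index bookkeeping follows; the helper name `k_fold_indices` is hypothetical, the split is deterministic with no shuffling or stratification, and nothing here is taken from the authors' implementation.

```python
def k_fold_indices(n_items, k=10):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Hypothetical helper: items are assigned to folds round-robin, so fold
    sizes differ by at most one. Real experiments would usually shuffle
    (and often stratify) the data first.
    """
    folds = [list(range(i, n_items, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        # Every fold except the held-out one contributes to the training set.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    for train, test in k_fold_indices(25, k=10):
        print(len(train), len(test))
```

Each item appears in exactly one test fold, so averaging a metric over the folds uses every example for evaluation exactly once.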

Proceedings Article
01 Oct 2014
TL;DR: A new global log-bilinear regression model that combines the advantages of the two major model families in the literature: global matrix factorization and local context window methods and produces a vector space with meaningful substructure.
Abstract: Recent methods for learning vector space representations of words have succeeded in capturing fine-grained semantic and syntactic regularities using vector arithmetic, but the origin of these regularities has remained opaque. We analyze and make explicit the model properties needed for such regularities to emerge in word vectors. The result is a new global log-bilinear regression model that combines the advantages of the two major model families in the literature: global matrix factorization and local context window methods. Our model efficiently leverages statistical information by training only on the nonzero elements in a word-word co-occurrence matrix, rather than on the entire sparse matrix or on individual context windows in a large corpus. The model produces a vector space with meaningful substructure, as evidenced by its performance of 75% on a recent word analogy task. It also outperforms related models on similarity tasks and named entity recognition.

30,558 citations


"Separating Facts from Fiction: Ling..." refers methods in this paper

  • ...We initialize our embedding layer with pretrained GloVe embeddings (Pennington et al., 2014)....

    [...]

Proceedings Article
23 Jun 2014
TL;DR: This work studies multiple approaches for extending the connectivity of a CNN in time domain to take advantage of local spatio-temporal information and suggests a multiresolution, foveated architecture as a promising way of speeding up the training.
Abstract: Convolutional Neural Networks (CNNs) have been established as a powerful class of models for image recognition problems. Encouraged by these results, we provide an extensive empirical evaluation of CNNs on large-scale video classification using a new dataset of 1 million YouTube videos belonging to 487 classes. We study multiple approaches for extending the connectivity of a CNN in time domain to take advantage of local spatio-temporal information and suggest a multiresolution, foveated architecture as a promising way of speeding up the training. Our best spatio-temporal networks display significant performance improvements compared to strong feature-based baselines (55.3% to 63.9%), but only a surprisingly modest improvement compared to single-frame models (59.3% to 60.9%). We further study the generalization performance of our best model by retraining the top layers on the UCF-101 Action Recognition dataset and observe significant performance improvements compared to the UCF-101 baseline model (63.3% up from 43.9%).

4,876 citations


"Separating Facts from Fiction: Ling..." refers methods in this paper

  • ...For this we rely on the “late fusion” approach that has been shown to be effective in vision tasks (Karpathy et al., 2014; Park et al., 2016)....

    [...]

Journal Article
TL;DR: Across 4 studies using multiple methods, liberals consistently showed greater endorsement and use of the Harm/care and Fairness/reciprocity foundations compared to the other 3 foundations, whereas conservatives endorsed and used the 5 foundations more equally.
Abstract: How and why do moral judgments vary across the political spectrum? To test moral foundations theory (J. Haidt & J. Graham, 2007; J. Haidt & C. Joseph, 2004), the authors developed several ways to measure people's use of 5 sets of moral intuitions: Harm/care, Fairness/reciprocity, Ingroup/loyalty, Authority/respect, and Purity/sanctity. Across 4 studies using multiple methods, liberals consistently showed greater endorsement and use of the Harm/care and Fairness/reciprocity foundations compared to the other 3 foundations, whereas conservatives endorsed and used the 5 foundations more equally. This difference was observed in abstract assessments of the moral relevance of foundation-related concerns such as violence or loyalty (Study 1), moral judgments of statements and scenarios (Study 2), "sacredness" reactions to taboo trade-offs (Study 3), and use of foundation-related words in the moral texts of religious sermons (Study 4). These findings help to illuminate the nature and intractability of moral disagreements in the American "culture war."

2,990 citations


"Separating Facts from Fiction: Ling..." refers background in this paper

  • ...Moral foundation cues: According to Haidt and Graham (2007); Graham et al. (2009), there is a small number of basic widely supported moral values, and people differ in the way they endorse these values....

    [...]