A Stylometric Inquiry into Hyperpartisan and Fake News
Martin Potthast Johannes Kiesel Kevin Reinartz Janek Bevendorff Benno Stein
Leipzig University
martin.potthast@uni-leipzig.de
Bauhaus-Universität Weimar
<first>.<last>@uni-weimar.de
Abstract
We report on a comparative style analysis of hyperpartisan (extremely one-sided) news and fake news. A corpus of 1,627 articles from 9 political publishers, three each from the mainstream, the hyperpartisan left, and the hyperpartisan right, has been fact-checked by professional journalists at BuzzFeed: 97% of the 299 fake news articles identified are also hyperpartisan. We show how a style analysis can distinguish hyperpartisan news from the mainstream (F1 = 0.78), and satire from both (F1 = 0.81). But stylometry is no silver bullet: style-based fake news detection does not work (F1 = 0.46). We further reveal that left-wing and right-wing news share significantly more stylistic similarities than either does with the mainstream. This result is robust: it has been confirmed by three different modeling approaches, one of which employs Unmasking in a novel way. Applications of our results include partisanship detection and pre-screening for semi-automatic fake news detection.
1 Introduction
The media and the public are currently discussing the recent phenomenon of “fake news” and its potential role in swaying elections, how it may affect society, and what can and should be done about it. Prone to misunderstanding and misuse, the term “fake news” arose from the observation that, in social media, a certain kind of ‘news’ spreads much more successfully than others, and that this kind of ‘news’ is typically extremely one-sided (hyperpartisan), inflammatory, emotional, and often riddled with untruths. Although the traditional yellow press has been spreading ‘news’ of varying degrees of truthfulness since long before the digital revolution, its amplification over real news within social media gives many people pause. The fake news hype has caused widespread disillusionment about social media, and many politicians, news publishers, IT companies, activists, and scientists concur that this is where to draw the line. For all their good intentions, however, the line must be drawn very carefully (if at all), since nothing less than free speech is at stake: a fundamental right of every free society.
Many favor a two-step approach in which fake news items are first detected and then countermeasures are implemented to foreclose rumors and to discourage repetition. While some countermeasures are already being tried in practice, such as displaying warnings and withholding ad revenue, fake news detection is still in its infancy. At any rate, a near-real-time reaction is crucial: once a fake news item begins to spread virally, the damage is done and undoing it becomes arduous. Since knowledge-based and context-based approaches to fake news detection can only be applied after publication, i.e., as news events unfold and as social interactions occur, they may not be fast enough.
We have identified style-based approaches as a viable alternative, allowing for instantaneous reactions, albeit not to fake news but to hyperpartisanship. In this regard we contribute (1) a large news corpus annotated by experts with respect to veracity and hyperpartisanship, (2) extensive experiments on discriminating fake news, hyperpartisan news, and satire based solely on writing style, and (3) validation experiments to verify our finding that the writing styles of the left and the right have more in common with each other than either has with the mainstream, applying Unmasking in a novel way.
After a review of related work, Section 3 details the corpus and its construction, Section 4 introduces our methodology, and Section 5 reports the results of the aforementioned experiments.

2 Related Work
Approaches to fake news detection divide into three categories (Figure 1): they can be knowledge-based (relating claims to known facts), context-based (analyzing how news spreads in social media), and style-based (analyzing writing style).
Knowledge-based fake news detection. Methods from information retrieval have been proposed early on to determine the veracity of web documents. For example, Etzioni et al. (2008) propose to identify inconsistencies by matching claims extracted from the web with those of a document in question. Similarly, Magdy and Wanas (2010) measure the frequency of documents that support a claim. Both approaches face the challenges of web data credibility, namely expertise, trustworthiness, quality, and reliability (Ginsca et al., 2015).

Other approaches rely on knowledge bases, including the semantic web and linked open data. Wu et al. (2014) “perturb” a claim in question to query knowledge bases, using the result variations as an indicator of the support a knowledge base offers for the claim. Ciampaglia et al. (2015) use the shortest path between concepts in a knowledge graph, whereas Shi and Weninger (2016) use a link prediction algorithm. However, these approaches are unsuited for new claims without corresponding entries in a knowledge base, and knowledge bases can themselves be manipulated (Heindorf et al., 2016).
Context-based fake news detection. Here, fake news items are identified via meta information and spread patterns. For example, Long et al. (2017) show that author information can be a useful feature for fake news detection, and Derczynski et al. (2017) attempt to determine the veracity of a claim based on the conversation it sparks on Twitter as one of the RumourEval tasks. The Facebook analysis of Mocanu et al. (2015) shows that unsubstantiated claims spread as widely as well-established ones, and that user groups predisposed to conspiracy theories are more open to sharing the former. Similarly, Acemoglu et al. (2010), Kwon et al. (2013), Ma et al. (2017), and Volkova et al. (2017) model the spread of (mis-)information, while Budak et al. (2011) and Nguyen et al. (2012) propose algorithms to limit its spread. The efficacy of countermeasures like debunking sites is studied by Tambuscio et al. (2015). While achieving good results, context-based approaches suffer from working only a posteriori, requiring large amounts of data, and disregarding the actual news content.
[Figure 1: Taxonomy of paradigms for fake news detection alongside a selection of related work. Knowledge-based detection (also called fact checking) subdivides into information retrieval (Etzioni et al., 2008; Magdy and Wanas, 2010; Ginsca et al., 2015) and semantic web / LOD (Wu et al., 2014; Ciampaglia et al., 2015; Shi and Weninger, 2016). Context-based detection covers social network analysis (Long et al., 2017; Mocanu et al., 2015; Acemoglu et al., 2010; Kwon et al., 2013; Ma et al., 2017; Volkova et al., 2017; Budak et al., 2011; Nguyen et al., 2012; Derczynski et al., 2017; Tambuscio et al., 2015). Style-based detection covers text categorization (Afroz et al., 2012; Badaskar et al., 2008; Rubin et al., 2016; Yang et al., 2017; Rashkin et al., 2017; Horne and Adali, 2017; Pérez-Rosas et al., 2017) and deception detection (Wei et al., 2013; Chen et al., 2015; Rubin et al., 2015; Wang et al., 2017; Bourgonje et al., 2017).]
Style-based fake news detection. Deception detection originates from forensic linguistics and builds on the Undeutsch hypothesis, a result from forensic psychology asserting that memories of real-life, self-experienced events differ in content and quality from imagined events (Undeutsch, 1967). The hypothesis led to the development of forensic tools to assess testimonies at the statement level. Some approaches operationalize deception detection at scale to detect uncertainty in social media posts, for example Wei et al. (2013) and Chen et al. (2015). In this regard, Rubin et al. (2015) use rhetorical structure theory as a measure of story coherence and as an indicator for fake news. Recently, Wang (2017) collected a large dataset consisting of sentence-length statements along with their veracity from the fact-checking site PolitiFact.com, and then used style features to detect false statements. A related task is stance detection, where the goal is to detect the relation between a claim about an article and the article itself (Bourgonje et al., 2017). Most prominently, stance detection was the task of the Fake News Challenge (http://www.fakenewschallenge.org/), which ran in 2017 and received 50 submissions, although hardly any participants published their approaches.

Where deception detection focuses on single statements, style-based text categorization as proposed by Argamon-Engelson et al. (1998) assesses entire texts. Common applications are author profiling (age, gender, etc.) and genre classification. Though such methods are susceptible to authors who deliberately modify their writing style, these obfuscations may be detectable (e.g., Afroz et al. (2012)). As an early precursor to fake news detection, Badaskar et al. (2008) train models to identify news items that were automatically generated. Currently, text categorization methods for fake news detection focus mostly on satire detection (e.g., Rubin et al. (2016), Yang et al. (2017)). Rashkin et al. (2017) perform a statistical analysis of the stylistic differences between real, satire, hoax, and propaganda news. We make use of their results by incorporating the best-performing style features they identified.
Finally, two preprint papers have recently been shared. Horne and Adali (2017) use style features for fake news detection. However, the relatively high accuracies reported must be taken with a grain of salt: their two datasets comprise only 70 news articles each, whose ground truth is based on where an article came from instead of resulting from a per-article expert review as in our case; their final classifier uses only 4 features (number of nouns, type-token ratio, word count, number of quotes), which can be easily manipulated; and, based on their experimental setup, it cannot be ruled out that the classifier simply differentiates news portals rather than fake and real articles. We avoid this problem by testing our classifiers on articles from portals that were not represented in the training data. Similarly, Pérez-Rosas et al. (2017) report on constructing two datasets comprising around 240 and 200 news article excerpts (i.e., the 5-sentence lead) with a balanced distribution of fake vs. real. The former was collected via crowdsourcing, asking workers to write a fake news item based on a real news item; the latter was collected from the web. For style analysis, the former dataset may not be suitable, since the authors note themselves that “workers succeeded in mimicking the reporting style from the original news”. The latter dataset encompasses only celebrity news (i.e., yellow press), which introduces a bias. Their feature selection follows that of Rubin et al. (2016), which is covered by our experiments, but also incorporates topic features, rendering the resulting classifier not generalizable.
3 The BuzzFeed-Webis Fake News Corpus
This section introduces the BuzzFeed-Webis Fake News Corpus 2016, detailing its construction and annotation by professional journalists employed at BuzzFeed, as well as key figures and statistics.²
3.1 Corpus Construction
The corpus encompasses the output of 9 publishers on 7 workdays close to the 2016 US presidential elections, namely September 19 to 23, 26, and 27. Table 1 gives an overview. Among the selected publishers are six prolific hyperpartisan ones (three left-wing and three right-wing) and three mainstream ones. All publishers earned Facebook’s blue checkmark, indicating authenticity and an elevated status within the network. Every post and linked news article has been fact-checked by 4 BuzzFeed journalists, including about 19% of posts forwarded from third parties. Having checked a total of 2,282 posts, 1,145 mainstream, 471 left-wing, and 666 right-wing, Silverman et al. (2016) reported key insights as a data journalism article. The annotations were published alongside the article.³ However, this data only comprises URLs to the original Facebook posts. To construct our corpus, we archived the posts, the linked articles, and attached media as well as relevant metadata to ensure long-term availability. Due to the rapid pace at which the publishers change their websites, we were able to recover only 1,627 articles: 826 mainstream, 256 left-wing, and 545 right-wing.
Manual fact-checking. A binary distinction between fake and real news turned out to be infeasible, since hardly any piece of fake news is entirely false, and pieces of real news may not be flawless. Therefore, posts were rated “mostly true,” “mixture of true and false,” “mostly false,” or, if the post was opinion-driven or otherwise lacked a factual claim, “no factual content.” Four BuzzFeed journalists worked on the manual fact-checks of the news articles: to minimize costs, each article was reviewed only once, and articles were assigned round robin. The ratings “mixture of true and false” and “mostly false” had to be justified, and, when in doubt about a rating, a second opinion was collected, while disagreements were resolved by a third. Finally, all news rated “mostly false” underwent a final check to ensure the rating was justified, lest the respective publishers contest it.
2 Corpus download: https://doi.org/10.5281/zenodo.1239675
3 http://github.com/BuzzFeedNews/2016-10-facebook-fact-check

The journalists were given the following guidance:

Mostly true: The post and any related link or image are based on factual information and portray it accurately. The authors may interpret the event/info in their own way, so long as they do not misrepresent events, numbers, quotes, reactions, etc., or make information up. This rating does not allow for unsupported speculation or claims.

Mixture of true and false (mix, for short): Some elements of the information are factually accurate, but some elements or claims are not. This rating should be used when speculation or unfounded claims are mixed with real events, numbers, quotes, etc., or when the headline of the link being shared makes a false claim but the text of the story is largely accurate. It should also only be used when the unsupported or false information is roughly equal to the accurate information in the post or link. Finally, use this rating for news articles that are based on unconfirmed information.

Mostly false: Most or all of the information in the post or in the link being shared is inaccurate. This should also be used when the central claim being made is false.

No factual content (n/a, for short): This rating is used for posts that are pure opinion, comics, satire, or any other posts that do not make a factual claim. This is also the category to use for posts that are of the “Like this if you think...” variety.
3.2 Limitations
Given the significant workload (i.e., costs) required to carry out the aforementioned annotations, the corpus is restricted to the given temporal period and biased toward the US culture and political landscape, comprising only English news articles from a limited number of publishers. Annotations were recorded at the article level, not at the statement level. For text categorization, this is sufficient. At the time of writing, our corpus is the largest of its kind that has been annotated by professional journalists.
3.3 Corpus Statistics
Table 1 shows the fact-checking results and some key statistics per article. Unsurprisingly, none of the mainstream articles are mostly false, whereas 8 across all three publishers are a mixture of true and false. Disregarding non-factual articles, a little more than a quarter of all hyperpartisan left-wing articles were found faulty: 15 articles mostly false, and 51 a mixture of true and false. Publisher “The Other 98%” sticks out by achieving an almost perfect score. By contrast, almost 45% of the right-wing articles are a mixture of true and false (153) or mostly false (72). Here, publisher “Right Wing News” sticks out by supplying more than half of the mixtures of true and false alone, whereas the mostly false articles are equally distributed.

Publisher             true   mix  false  n/a     Σ   Paras.  Links ext.  Links all  Words quoted  Words all
Mainstream             806     8      0   12   826    20.1         2.2        3.7          18.1      692.0
  ABC News              90     2      0    3    95    21.1         1.0        4.8          21.0      551.9
  CNN                  295     4      0    8   307    19.3         2.4        2.5          15.3      588.3
  Politico             421     2      0    1   424    20.5         2.3        4.3          19.9      798.5
Left-wing              182    51     15    8   256    14.6         4.5        4.9          28.6      423.2
  Addicting Info        95    25      8    7   135    15.9         4.4        4.5          30.5      430.5
  Occupy Democrats      55    23      6    0    91    10.9         4.1        4.7          29.0      421.7
  The Other 98%         32     3      1    1    30    20.2         6.4        7.2          21.2      394.5
Right-wing             276   153     72   44   545    14.1         2.5        3.1          24.6      397.4
  Eagle Rising         107    47     25   36   214    12.9         2.6        2.8          17.3      388.3
  Freedom Daily         48    24     22    4    99    14.6         2.2        2.3          23.5      419.3
  Right Wing News      121    82     25    4   232    15.0         2.5        3.6          33.6      396.6
Σ                     1264   212     87   64  1627    17.2         2.7        3.7          20.6      551.0

Table 1: The BuzzFeed-Webis Fake News Corpus 2016 at a glance (“Paras.” is short for “paragraphs”; “Links ext.” counts external links; “Words quoted” is the average number of quoted words per article).
Regarding key statistics per article, it is interesting that the articles from all mainstream publishers are on average about 20 paragraphs long, with word counts ranging from 550 words on average at ABC News to 800 at Politico. Except for one publisher, left-wing and right-wing articles are shorter on average in terms of paragraphs as well as word count, averaging about 420 words and 400 words, respectively. Left-wing articles quote on average about 10 words more than the mainstream, and right-wing articles 6 words more. When articles comprise links, they are usually external ones, whereas ABC News mostly uses internal links, and only half of the links found in Politico articles are external. Left-wing news articles stick out by containing, across publishers, almost twice as many links as mainstream and right-wing ones.
3.4 Operationalizing Fake News
In our experiments, we operationalize the category of fake news by joining the articles that were rated mostly false with those rated a mixture of true and false. Arguably, the latter may not be exactly what is deemed “fake news” (as in: a complete fabrication); however, practice shows that fake news is hardly ever devoid of truth. More often, true facts are misconstrued or framed badly. In our experiments, we hence call articles rated mostly true real news, call articles rated mostly false plus mixtures of true and false (except for satire) fake news, and disregard all articles rated non-factual. A minimal sketch of this label mapping is given below.
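In this sketch, the rating strings follow the journalists’ guidance above, while the `is_satire` flag is a hypothetical field standing in for however satire is marked in a given corpus reader; it is not part of the corpus schema.

```python
# Hypothetical sketch of the label mapping described above.
from typing import Optional

def to_class(rating: str, is_satire: bool) -> Optional[str]:
    """Map a journalist's veracity rating to the experiment labels."""
    if rating == "no factual content":
        return None          # non-factual articles are disregarded
    if is_satire:
        return None          # satire is excluded from the fake news class
    if rating == "mostly true":
        return "real"
    if rating in ("mixture of true and false", "mostly false"):
        return "fake"
    return None              # unknown rating: leave unlabeled
```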

4 Methodology
This section covers our methodology, including our feature set to capture writing style and a brief recap of Unmasking by Koppel et al. (2007), which we employ for the first time to distinguish genre styles as opposed to author styles. For the sake of reproducibility, all our code has been published.⁴
4.1 Style Features and Feature Selection
Our writing style model incorporates common features as well as ones specific to the news domain. The former are character, stop word, and part-of-speech n-grams with n ∈ [1, 3]. Further, we employ 10 readability scores⁵ and dictionary features, each indicating the frequency of words from a tailor-made dictionary in a document, using the General Inquirer Dictionaries as a basis (Stone et al., 1966). The domain-specific features include ratios of quoted words and external links, the number of paragraphs, and their average length.
In each of our experiments, we carefully select from the aforementioned features the ones worthwhile using: we discard all features that are hardly represented in our corpus, namely word tokens that occur in less than 2.5% of the documents, and n-gram features that occur in less than 10% of the documents. Discarding these features prevents overfitting and improves the chances that our model will generalize.
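As a minimal sketch of these document-frequency cutoffs, the following uses scikit-learn’s CountVectorizer as a stand-in for our published feature extraction pipeline (see footnote 4); the `corpus_texts` list is a placeholder, and only two of the feature groups are shown.

```python
# Sketch only: scikit-learn stands in for the actual published pipeline.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import FeatureUnion

corpus_texts = ["First example article ...", "Second example article ...",
                "Third example article ..."]  # placeholder documents

features = FeatureUnion([
    # character 1- to 3-grams, kept only if present in >= 10% of documents
    ("char_ngrams", CountVectorizer(analyzer="char", ngram_range=(1, 3), min_df=0.10)),
    # word tokens, kept only if present in >= 2.5% of documents
    ("word_tokens", CountVectorizer(analyzer="word", min_df=0.025)),
])
X = features.fit_transform(corpus_texts)  # document-feature matrix
```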
If not stated otherwise, our experiments share a common setup. In order to avoid biases from the respective training sets, we balance them using oversampling. Furthermore, we perform 3-fold cross-validation where each fold comprises one publisher from each orientation, so that the classifier does not learn a publisher’s style. For non-Unmasking experiments we use WEKA’s random forest implementation with default settings.
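The sketch below illustrates this setup with scikit-learn in place of WEKA. The publisher-to-fold assignment and the `oversample` helper are illustrative assumptions, not the paper’s exact split; `X`, `y`, and `publishers` are assumed to be arrays produced by a feature extraction step like the one above.

```python
# Sketch only: X is a dense feature matrix, y a label array, and publishers
# a NumPy array holding each article's publisher name.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.utils import resample

folds = [
    {"ABC News", "Addicting Info", "Eagle Rising"},
    {"CNN", "Occupy Democrats", "Freedom Daily"},
    {"Politico", "The Other 98%", "Right Wing News"},
]

def oversample(X, y):
    """Balance the training set by resampling each class up to the majority size."""
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    X_parts, y_parts = [], []
    for c in classes:
        X_parts.append(resample(X[y == c], n_samples=n_max,
                                replace=True, random_state=0))
        y_parts.append(np.full(n_max, c))
    return np.vstack(X_parts), np.concatenate(y_parts)

for held_out in folds:  # each fold holds out one publisher per orientation
    test = np.isin(publishers, list(held_out))
    X_train, y_train = oversample(X[~test], y[~test])
    clf = RandomForestClassifier()  # default settings, mirroring WEKA's defaults
    clf.fit(X_train, y_train)
    print(clf.score(X[test], y[test]))
```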
4.2 Unmasking Genre Styles
Unmasking, as proposed by Koppel et al. (2007), is a meta learning approach for authorship verification. We study for the first time whether it can be used to assess the similarity of more broadly defined style categories, such as left-wing vs. right-wing vs. mainstream news. This way, we uncover relations between the writing styles that people may involuntarily adopt as per their political orientation.
4 Code download: http://www.github.com/webis-de/ACL-18
5 Automated Readability Index, Coleman-Liau Index, Flesch-Kincaid Grade Level and Reading Ease, Gunning Fog Index, LIX, McAlpine EFLAW Score, RIX, SMOG Grade, Strain Index
Originally, Unmasking takes two documents as input and outputs its confidence whether they have been written by the same author. Three steps are taken to accomplish this: first, each document is chunked into a set of chunks of at least 500 words each; second, classification errors are measured while iteratively removing the most discriminative features of a style model consisting of the 250 most frequent words, separating the two chunk sets with a linear classifier; and third, the resulting classification accuracy curves are analyzed with regard to their slope. A steep decrease is more likely than a shallow decrease if the two documents have been written by the same author, since there are presumably fewer discriminating features between documents written by the same author than between documents written by different authors. Training a classifier on many examples of error curves obtained from same-author document pairs and different-author document pairs yields an effective authorship verifier, at least for long documents that can be split up into a sufficient number of chunks.
It turns out that what applies to the style of authors also applies to genre styles. We adapt Unmasking by skipping its first step and using two sets of documents (e.g., left-wing articles and right-wing articles) as input. When plotting classification error curves for visual inspection, steeper decreases in these plots, too, indicate higher style similarity of the two input document sets, just as with chunk sets of two documents written by the same author. A simplified sketch of the adapted procedure follows.
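This sketch assumes scikit-learn in place of the published implementation; the vectorization, the choice of linear classifier, and the number of features removed per round are simplifications rather than the paper’s exact settings.

```python
# Minimal sketch of the adapted Unmasking loop over two document sets.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

def unmasking_curve(docs_a, docs_b, rounds=10, remove_per_round=2):
    """Iteratively remove the most discriminative features between the two
    document sets and record the cross-validated accuracy per round."""
    vec = CountVectorizer(max_features=250)  # the 250 most frequent words
    X = vec.fit_transform(docs_a + docs_b).toarray().astype(float)
    y = np.array([0] * len(docs_a) + [1] * len(docs_b))
    active = np.ones(X.shape[1], dtype=bool)
    curve = []
    for _ in range(rounds):
        clf = LinearSVC(dual=False)
        curve.append(cross_val_score(clf, X[:, active], y, cv=3).mean())
        clf.fit(X[:, active], y)
        # Drop the features with the largest absolute weights, i.e., the
        # currently most discriminative ones.
        idx = np.flatnonzero(active)
        strongest = np.argsort(np.abs(clf.coef_[0]))[-remove_per_round:]
        active[idx[strongest]] = False
    return curve  # a steep decrease indicates high style similarity
```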
4.3 Baselines
We employ four baseline models: a topic-based bag-of-words model, often used in the literature but less practical since news topics change frequently and drastically; a model using only the domain-specific news style features, to check whether the differences between categories measured as corpus statistics play a significant role; and naive baselines that classify all items into one of the categories in question, relating our results to the class distributions (see the sketch below).
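A sketch of these baselines, assuming feature matrices `X_bow` (bag of words) and `X_style` (domain-specific news style features) and labels `y` from the setup above; scikit-learn again stands in for WEKA.

```python
# Sketch only: X_bow, X_style, and y are assumed from the setup above.
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier

bow_baseline   = RandomForestClassifier().fit(X_bow, y)    # topic-based bag of words
style_baseline = RandomForestClassifier().fit(X_style, y)  # news style features only
# Naive baselines: put every item into a single class.
naive_real = DummyClassifier(strategy="constant", constant="real").fit(X_bow, y)
naive_fake = DummyClassifier(strategy="constant", constant="fake").fit(X_bow, y)
```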
4.4 Performance Measures
Classification performance is measured as accuracy, and as class-wise precision, recall, and F1. We favor these measures over, e.g., areas under the ROC curve or the precision-recall curve for simplicity’s sake. Also, the tasks we are tackling are new, so that little is known to date about user preferences. This is also why we chose the evenly balanced F1.
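For reference, the standard definitions of these measures, computed per class from true positives (TP), false positives (FP), and false negatives (FN):

```latex
\[
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\]
```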

References
Budak et al. (2011). Limiting the Spread of Misinformation in Social Networks. In Proceedings of WWW 2011.
Ciampaglia et al. (2015). Computational Fact Checking from Knowledge Networks. PLoS ONE, 2015.
Etzioni et al. (2008). Open Information Extraction from the Web. Communications of the ACM, 2008.
Kwon et al. (2013). Prominent Features of Rumor Propagation in Online Social Media. In Proceedings of ICDM 2013.
Rashkin et al. (2017). Truth of Varying Shades: Analyzing Language in Fake News and Political Fact-Checking. In Proceedings of EMNLP 2017.