A Stylometric Inquiry into Hyperpartisan and Fake News
Martin Potthast Johannes Kiesel Kevin Reinartz Janek Bevendorff Benno Stein
Leipzig University
martin.potthast@uni-leipzig.de
Bauhaus-Universität Weimar
<first>.<last>@uni-weimar.de
Abstract
We report on a comparative style analysis of hyperpartisan (extremely one-sided) news and fake news. A corpus of 1,627 articles from 9 political publishers, three each from the mainstream, the hyperpartisan left, and the hyperpartisan right, has been fact-checked by professional journalists at BuzzFeed: 97% of the 299 fake news articles identified are also hyperpartisan. We show how a style analysis can distinguish hyperpartisan news from the mainstream (F1 = 0.78), and satire from both (F1 = 0.81). But stylometry is no silver bullet: style-based fake news detection does not work (F1 = 0.46). We further reveal that left-wing and right-wing news share significantly more stylistic similarities than either does with the mainstream. This result is robust: it has been confirmed by three different modeling approaches, one of which employs Unmasking in a novel way. Applications of our results include partisanship detection and pre-screening for semi-automatic fake news detection.
1 Introduction
The media and the public are currently discussing the recent phenomenon of “fake news” and its potential role in swaying elections, how it may affect society, and what can and should be done about it. Prone to misunderstanding and misuse, the term “fake news” arose from the observation that, in social media, a certain kind of ‘news’ spreads much more successfully than others, and that this kind of ‘news’ is typically extremely one-sided (hyperpartisan), inflammatory, emotional, and often riddled with untruths. Although the traditional yellow press has been spreading ‘news’ of varying degrees of truthfulness since long before the digital revolution, its amplification over real news within social media gives many people pause. The fake news hype has caused widespread disillusionment about social media, and many politicians, news publishers, IT companies, activists, and scientists concur that this is where to draw the line. For all their good intentions, however, the line must be drawn very carefully (if at all), since nothing less than free speech is at stake: a fundamental right of every free society.
Many favor a two-step approach in which fake news items are first detected and then countermeasures are implemented to foreclose rumors and to discourage repetition. While some countermeasures are already being tried in practice, such as displaying warnings and withholding ad revenue, fake news detection is still in its infancy. At any rate, a near-real-time reaction is crucial: once a fake news item begins to spread virally, the damage is done and undoing it becomes arduous. Since knowledge-based and context-based approaches to fake news detection can only be applied after publication, i.e., as news events unfold and as social interactions occur, they may not be fast enough.
We have identified style-based approaches as a viable alternative, allowing for instantaneous reactions, albeit not to fake news but to hyperpartisanship. In this regard we contribute (1) a large news corpus annotated by experts with respect to veracity and hyperpartisanship, (2) extensive experiments on discriminating fake news, hyperpartisan news, and satire based solely on writing style, and (3) validation experiments to verify our finding that the writing styles of the left and the right have more in common with each other than either has with the mainstream, applying Unmasking in a novel way.
After a review of related work, Section 3 details the corpus and its construction, Section 4 introduces our methodology, and Section 5 reports the results of the aforementioned experiments.

2 Related Work
Approaches to fake news detection divide into three categories (Figure 1): they can be knowledge-based (relating claims to known facts), context-based (analyzing how news spreads in social media), and style-based (analyzing writing style).
Knowledge-based fake news detection. Methods from information retrieval have been proposed early on to determine the veracity of web documents. For example, Etzioni et al. (2008) propose to identify inconsistencies by matching claims extracted from the web with those of a document in question. Similarly, Magdy and Wanas (2010) measure the frequency of documents that support a claim. Both approaches face the challenges of web data credibility, namely expertise, trustworthiness, quality, and reliability (Ginsca et al., 2015).

Other approaches rely on knowledge bases, including the semantic web and linked open data. Wu et al. (2014) “perturb” a claim in question to query knowledge bases, using the result variations as an indicator of the support a knowledge base offers for the claim. Ciampaglia et al. (2015) use the shortest path between concepts in a knowledge graph, whereas Shi and Weninger (2016) use a link prediction algorithm. However, these approaches are unsuited for new claims without corresponding entries in a knowledge base, and knowledge bases can themselves be manipulated (Heindorf et al., 2016).
Context-based fake news detection. Here, fake news items are identified via meta information and spread patterns. For example, Long et al. (2017) show that author information can be a useful feature for fake news detection, and Derczynski et al. (2017) attempt to determine the veracity of a claim based on the conversation it sparks on Twitter as one of the RumourEval tasks. The Facebook analysis of Mocanu et al. (2015) shows that unsubstantiated claims spread as widely as well-established ones, and that user groups predisposed to conspiracy theories are more open to sharing the former. Similarly, Acemoglu et al. (2010), Kwon et al. (2013), Ma et al. (2017), and Volkova et al. (2017) model the spread of (mis-)information, while Budak et al. (2011) and Nguyen et al. (2012) propose algorithms to limit its spread. The efficacy of countermeasures like debunking sites is studied by Tambuscio et al. (2015). While achieving good results, context-based approaches suffer from working only a posteriori, requiring large amounts of data, and disregarding the actual news content.
[Figure 1: Taxonomy of paradigms for fake news detection alongside a selection of related work. Knowledge-based detection (also called fact checking) subdivides into information retrieval (Etzioni et al., 2008; Magdy and Wanas, 2010; Ginsca et al., 2015) and semantic web / LOD (Wu et al., 2014; Ciampaglia et al., 2015; Shi and Weninger, 2016). Context-based detection covers social network analysis (Long et al., 2017; Mocanu et al., 2015; Acemoglu et al., 2010; Kwon et al., 2013; Ma et al., 2017; Volkova et al., 2017; Budak et al., 2011; Nguyen et al., 2012; Derczynski et al., 2017; Tambuscio et al., 2015). Style-based detection covers text categorization (Afroz et al., 2012; Badaskar et al., 2008; Rubin et al., 2016; Yang et al., 2017; Rashkin et al., 2017; Horne and Adali, 2017; Pérez-Rosas et al., 2017) and deception detection (Wei et al., 2013; Chen et al., 2015; Rubin et al., 2015; Wang et al., 2017; Bourgonje et al., 2017).]
Style-based fake news detection. Deception detection originates from forensic linguistics and builds on the Undeutsch hypothesis, a result from forensic psychology asserting that memories of real-life, self-experienced events differ in content and quality from imagined events (Undeutsch, 1967). The hypothesis led to the development of forensic tools to assess testimonies at the statement level. Some approaches operationalize deception detection at scale to detect uncertainty in social media posts, for example Wei et al. (2013) and Chen et al. (2015). In this regard, Rubin et al. (2015) use rhetorical structure theory as a measure of story coherence and as an indicator for fake news. Recently, Wang (2017) collected a large dataset consisting of sentence-length statements along with their veracity from the fact-checking site PolitiFact.com, and then used style features to detect false statements. A related task is stance detection, where the goal is to detect the relation between a claim about an article and the article itself (Bourgonje et al., 2017). Most prominently, stance detection was the task of the Fake News Challenge (http://www.fakenewschallenge.org/), which ran in 2017 and received 50 submissions, although hardly any participants published their approaches.

Where deception detection focuses on single statements, style-based text categorization as proposed by Argamon-Engelson et al. (1998) assesses entire texts. Common applications are author profiling (age, gender, etc.) and genre classification. Though such methods are susceptible to authors who deliberately modify their writing style, these obfuscations may be detectable (e.g., Afroz et al. (2012)). As an early precursor to fake news detection, Badaskar et al. (2008) train models to identify news items that were automatically generated. Currently, text categorization methods for fake news detection focus mostly on satire detection (e.g., Rubin et al. (2016), Yang et al. (2017)). Rashkin et al. (2017) perform a statistical analysis of the stylistic differences between real, satire, hoax, and propaganda news. We make use of their results by incorporating the best-performing style features they identified.
Finally, two preprint papers have recently been shared. Horne and Adali (2017) use style features for fake news detection. However, the relatively high accuracies reported must be taken with a grain of salt: their two datasets comprise only 70 news articles each, whose ground truth is based on where an article came from instead of resulting from a per-article expert review as in our case; their final classifier uses only 4 features (number of nouns, type-token ratio, word count, number of quotes), which can be easily manipulated; and, based on their experimental setup, it cannot be ruled out that the classifier simply differentiates news portals rather than fake and real articles. We avoid this problem by testing our classifiers on articles from portals that were not represented in the training data. Similarly, Pérez-Rosas et al. (2017) report on constructing two datasets comprising around 240 and 200 news article excerpts (i.e., the 5-sentence lead) with a balanced distribution of fake vs. real. The former was collected via crowdsourcing, asking workers to write a fake news item based on a real news item; the latter was collected from the web. For style analysis, the former dataset may not be suitable, since the authors note themselves that “workers succeeded in mimicking the reporting style from the original news”. The latter dataset encompasses only celebrity news (i.e., yellow press), which introduces a bias. Their feature selection follows that of Rubin et al. (2016), which is covered by our experiments, but also incorporates topic features, rendering the resulting classifier not generalizable.
3 The BuzzFeed-Webis Fake News Corpus
This section introduces the BuzzFeed-Webis Fake News Corpus 2016, detailing its construction and annotation by professional journalists employed at BuzzFeed, as well as key figures and statistics.²
3.1 Corpus Construction
The corpus encompasses the output of 9 publishers on 7 workdays close to the 2016 US presidential elections, namely September 19 to 23, 26, and 27. Table 1 gives an overview. Among the selected publishers are six prolific hyperpartisan ones (three left-wing and three right-wing) and three mainstream ones. All publishers earned Facebook’s blue checkmark, indicating authenticity and an elevated status within the network. Every post and linked news article has been fact-checked by 4 BuzzFeed journalists, including about 19% of posts forwarded from third parties. Having checked a total of 2,282 posts, 1,145 mainstream, 471 left-wing, and 666 right-wing, Silverman et al. (2016) reported key insights as a data journalism article. The annotations were published alongside the article.³ However, this data only comprises URLs to the original Facebook posts. To construct our corpus, we archived the posts, the linked articles, and attached media as well as relevant metadata to ensure long-term availability. Due to the rapid pace at which the publishers change their websites, we were able to recover only 1,627 articles: 826 mainstream, 256 left-wing, and 545 right-wing.
Manual fact-checking. A binary distinction between fake and real news turned out to be infeasible, since hardly any piece of fake news is entirely false, and pieces of real news may not be flawless. Therefore, posts were rated “mostly true,” “mixture of true and false,” “mostly false,” or, if the post was opinion-driven or otherwise lacked a factual claim, “no factual content.” Four BuzzFeed journalists worked on the manual fact-checks of the news articles: to minimize costs, each article was reviewed only once, and articles were assigned round robin. The ratings “mixture of true and false” and “mostly false” had to be justified, and, when in doubt about a rating, a second opinion was collected, while disagreements were resolved by a third. Finally, all news rated “mostly false” underwent a final check to ensure the rating was justified, lest the respective publishers contest it.
2 Corpus download: https://doi.org/10.5281/zenodo.1239675
3 http://github.com/BuzzFeedNews/2016-10-facebook-fact-check

The journalists were given the following guidance:

Mostly true: The post and any related link or image are based on factual information and portray it accurately. The authors may interpret the event/info in their own way, so long as they do not misrepresent events, numbers, quotes, reactions, etc., or make information up. This rating does not allow for unsupported speculation or claims.

Mixture of true and false (mix, for short): Some elements of the information are factually accurate, but some elements or claims are not. This rating should be used when speculation or unfounded claims are mixed with real events, numbers, quotes, etc., or when the headline of the link being shared makes a false claim but the text of the story is largely accurate. It should also only be used when the unsupported or false information is roughly equal to the accurate information in the post or link. Finally, use this rating for news articles that are based on unconfirmed information.

Mostly false: Most or all of the information in the post or in the link being shared is inaccurate. This should also be used when the central claim being made is false.

No factual content (n/a, for short): This rating is used for posts that are pure opinion, comics, satire, or any other posts that do not make a factual claim. This is also the category to use for posts that are of the “Like this if you think...” variety.
3.2 Limitations
Given the significant workload (i.e., costs) required to carry out the aforementioned annotations, the corpus is restricted to the given temporal period and biased toward the US culture and political landscape, comprising only English news articles from a limited number of publishers. Annotations were recorded at the article level, not at the statement level. For text categorization, this is sufficient. At the time of writing, our corpus is the largest of its kind that has been annotated by professional journalists.
3.3 Corpus Statistics
Table 1 shows the fact-checking results and some key statistics per article. Unsurprisingly, none of the mainstream articles are mostly false, whereas 8 across all three publishers are a mixture of true and false. Disregarding non-factual articles, a little more than a quarter of all hyperpartisan left-wing articles were found faulty: 15 articles mostly false, and 51 a mixture of true and false. Publisher “The Other 98%” sticks out by achieving an almost perfect score. By contrast, almost 45% of the right-wing articles are a mixture of true and false (153) or mostly false (72). Here, publisher “Right Wing News” sticks out by supplying more than half of the mixtures of true and false alone, whereas the mostly false articles are equally distributed.

Publisher             true   mix  false  n/a     Σ   Paras.  Links ext.  Links all  Words quoted  Words all
Mainstream             806     8      0   12   826    20.1         2.2        3.7          18.1      692.0
  ABC News              90     2      0    3    95    21.1         1.0        4.8          21.0      551.9
  CNN                  295     4      0    8   307    19.3         2.4        2.5          15.3      588.3
  Politico             421     2      0    1   424    20.5         2.3        4.3          19.9      798.5
Left-wing              182    51     15    8   256    14.6         4.5        4.9          28.6      423.2
  Addicting Info        95    25      8    7   135    15.9         4.4        4.5          30.5      430.5
  Occupy Democrats      55    23      6    0    91    10.9         4.1        4.7          29.0      421.7
  The Other 98%         32     3      1    1    30    20.2         6.4        7.2          21.2      394.5
Right-wing             276   153     72   44   545    14.1         2.5        3.1          24.6      397.4
  Eagle Rising         107    47     25   36   214    12.9         2.6        2.8          17.3      388.3
  Freedom Daily         48    24     22    4    99    14.6         2.2        2.3          23.5      419.3
  Right Wing News      121    82     25    4   232    15.0         2.5        3.6          33.6      396.6
Σ                     1264   212     87   64  1627    17.2         2.7        3.7          20.6      551.0

Table 1: The BuzzFeed-Webis Fake News Corpus 2016 at a glance (“Paras.” is short for “paragraphs”; “Links ext.” counts external links; “Words quoted” is the average number of quoted words per article).
Regarding key statistics per article, it is interesting that the articles from all mainstream publishers are on average about 20 paragraphs long, with word counts ranging from 550 words on average at ABC News to 800 at Politico. Except for one publisher, left-wing and right-wing articles are shorter on average in terms of paragraphs as well as word count, averaging about 420 words and 400 words, respectively. Left-wing articles quote on average about 10 words more than the mainstream, and right-wing articles 6 words more. When articles comprise links, they are usually external ones, whereas ABC News mostly uses internal links, and only half of the links found in Politico articles are external. Left-wing news articles stick out by containing, across publishers, almost twice as many links as mainstream and right-wing ones.
3.4 Operationalizing Fake News
In our experiments, we operationalize the category of fake news by joining the articles that were rated mostly false with those rated a mixture of true and false. Arguably, the latter may not be exactly what is deemed “fake news” (as in: a complete fabrication); however, practice shows that fake news is hardly ever devoid of truth. More often, true facts are misconstrued or framed badly. In our experiments, we hence call articles rated mostly true real news, call articles rated mostly false plus mixtures of true and false (except for satire) fake news, and disregard all articles rated non-factual. A minimal sketch of this label mapping is given below.
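In this sketch, the rating strings follow the journalists’ guidance above, while the `is_satire` flag is a hypothetical field standing in for however satire is marked in a given corpus reader; it is not part of the corpus schema.

```python
# Hypothetical sketch of the label mapping described above.
from typing import Optional

def to_class(rating: str, is_satire: bool) -> Optional[str]:
    """Map a journalist's veracity rating to the experiment labels."""
    if rating == "no factual content":
        return None          # non-factual articles are disregarded
    if is_satire:
        return None          # satire is excluded from the fake news class
    if rating == "mostly true":
        return "real"
    if rating in ("mixture of true and false", "mostly false"):
        return "fake"
    return None              # unknown rating: leave unlabeled
```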

4 Methodology
This section covers our methodology, including our feature set to capture writing style and a brief recap of Unmasking by Koppel et al. (2007), which we employ for the first time to distinguish genre styles as opposed to author styles. For the sake of reproducibility, all our code has been published.⁴
4.1 Style Features and Feature Selection
Our writing style model incorporates common features as well as ones specific to the news domain. The former are character, stop word, and part-of-speech n-grams with n ∈ [1, 3]. Further, we employ 10 readability scores⁵ and dictionary features, each indicating the frequency of words from a tailor-made dictionary in a document, using the General Inquirer Dictionaries as a basis (Stone et al., 1966). The domain-specific features include ratios of quoted words and external links, the number of paragraphs, and their average length.
In each of our experiments, we carefully select from the aforementioned features the ones worthwhile using: we discard all features that are hardly represented in our corpus, namely word tokens that occur in less than 2.5% of the documents, and n-gram features that occur in less than 10% of the documents. Discarding these features prevents overfitting and improves the chances that our model will generalize.
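As a minimal sketch of these document-frequency cutoffs, the following uses scikit-learn’s CountVectorizer as a stand-in for our published feature extraction pipeline (see footnote 4); the `corpus_texts` list is a placeholder, and only two of the feature groups are shown.

```python
# Sketch only: scikit-learn stands in for the actual published pipeline.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import FeatureUnion

corpus_texts = ["First example article ...", "Second example article ...",
                "Third example article ..."]  # placeholder documents

features = FeatureUnion([
    # character 1- to 3-grams, kept only if present in >= 10% of documents
    ("char_ngrams", CountVectorizer(analyzer="char", ngram_range=(1, 3), min_df=0.10)),
    # word tokens, kept only if present in >= 2.5% of documents
    ("word_tokens", CountVectorizer(analyzer="word", min_df=0.025)),
])
X = features.fit_transform(corpus_texts)  # document-feature matrix
```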
If not stated otherwise, our experiments share a common setup. In order to avoid biases from the respective training sets, we balance them using oversampling. Furthermore, we perform 3-fold cross-validation where each fold comprises one publisher from each orientation, so that the classifier does not learn a publisher’s style. For non-Unmasking experiments we use WEKA’s random forest implementation with default settings.
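The sketch below illustrates this setup with scikit-learn in place of WEKA. The publisher-to-fold assignment and the `oversample` helper are illustrative assumptions, not the paper’s exact split; `X`, `y`, and `publishers` are assumed to be arrays produced by a feature extraction step like the one above.

```python
# Sketch only: X is a dense feature matrix, y a label array, and publishers
# a NumPy array holding each article's publisher name.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.utils import resample

folds = [
    {"ABC News", "Addicting Info", "Eagle Rising"},
    {"CNN", "Occupy Democrats", "Freedom Daily"},
    {"Politico", "The Other 98%", "Right Wing News"},
]

def oversample(X, y):
    """Balance the training set by resampling each class up to the majority size."""
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    X_parts, y_parts = [], []
    for c in classes:
        X_parts.append(resample(X[y == c], n_samples=n_max,
                                replace=True, random_state=0))
        y_parts.append(np.full(n_max, c))
    return np.vstack(X_parts), np.concatenate(y_parts)

for held_out in folds:  # each fold holds out one publisher per orientation
    test = np.isin(publishers, list(held_out))
    X_train, y_train = oversample(X[~test], y[~test])
    clf = RandomForestClassifier()  # default settings, mirroring WEKA's defaults
    clf.fit(X_train, y_train)
    print(clf.score(X[test], y[test]))
```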
4.2 Unmasking Genre Styles
Unmasking, as proposed by Koppel et al. (2007), is a meta learning approach for authorship verification. We study for the first time whether it can be used to assess the similarity of more broadly defined style categories, such as left-wing vs. right-wing vs. mainstream news. This way, we uncover relations between the writing styles that people may involuntarily adopt as per their political orientation.
4 Code download: http://www.github.com/webis-de/ACL-18
5 Automated Readability Index, Coleman-Liau Index, Flesch-Kincaid Grade Level and Reading Ease, Gunning Fog Index, LIX, McAlpine EFLAW Score, RIX, SMOG Grade, Strain Index
Originally, Unmasking takes two documents as input and outputs its confidence whether they have been written by the same author. Three steps are taken to accomplish this: first, each document is chunked into a set of chunks of at least 500 words each; second, classification errors are measured while iteratively removing the most discriminative features of a style model consisting of the 250 most frequent words, separating the two chunk sets with a linear classifier; and third, the resulting classification accuracy curves are analyzed with regard to their slope. A steep decrease is more likely than a shallow decrease if the two documents have been written by the same author, since there are presumably fewer discriminating features between documents written by the same author than between documents written by different authors. Training a classifier on many examples of error curves obtained from same-author document pairs and different-author document pairs yields an effective authorship verifier, at least for long documents that can be split up into a sufficient number of chunks.
It turns out that what applies to the style of authors also applies to genre styles. We adapt Unmasking by skipping its first step and using two sets of documents (e.g., left-wing articles and right-wing articles) as input. When plotting classification error curves for visual inspection, steeper decreases in these plots, too, indicate higher style similarity of the two input document sets, just as with chunk sets of two documents written by the same author. A simplified sketch of the adapted procedure follows.
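This sketch assumes scikit-learn in place of the published implementation; the vectorization, the choice of linear classifier, and the number of features removed per round are simplifications rather than the paper’s exact settings.

```python
# Minimal sketch of the adapted Unmasking loop over two document sets.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

def unmasking_curve(docs_a, docs_b, rounds=10, remove_per_round=2):
    """Iteratively remove the most discriminative features between the two
    document sets and record the cross-validated accuracy per round."""
    vec = CountVectorizer(max_features=250)  # the 250 most frequent words
    X = vec.fit_transform(docs_a + docs_b).toarray().astype(float)
    y = np.array([0] * len(docs_a) + [1] * len(docs_b))
    active = np.ones(X.shape[1], dtype=bool)
    curve = []
    for _ in range(rounds):
        clf = LinearSVC(dual=False)
        curve.append(cross_val_score(clf, X[:, active], y, cv=3).mean())
        clf.fit(X[:, active], y)
        # Drop the features with the largest absolute weights, i.e., the
        # currently most discriminative ones.
        idx = np.flatnonzero(active)
        strongest = np.argsort(np.abs(clf.coef_[0]))[-remove_per_round:]
        active[idx[strongest]] = False
    return curve  # a steep decrease indicates high style similarity
```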
4.3 Baselines
We employ four baseline models: a topic-based bag-of-words model, often used in the literature but less practical since news topics change frequently and drastically; a model using only the domain-specific news style features, to check whether the differences between categories measured as corpus statistics play a significant role; and naive baselines that classify all items into one of the categories in question, relating our results to the class distributions (see the sketch below).
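A sketch of these baselines, assuming feature matrices `X_bow` (bag of words) and `X_style` (domain-specific news style features) and labels `y` from the setup above; scikit-learn again stands in for WEKA.

```python
# Sketch only: X_bow, X_style, and y are assumed from the setup above.
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier

bow_baseline   = RandomForestClassifier().fit(X_bow, y)    # topic-based bag of words
style_baseline = RandomForestClassifier().fit(X_style, y)  # news style features only
# Naive baselines: put every item into a single class.
naive_real = DummyClassifier(strategy="constant", constant="real").fit(X_bow, y)
naive_fake = DummyClassifier(strategy="constant", constant="fake").fit(X_bow, y)
```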
4.4 Performance Measures
Classification performance is measured as accuracy, and as class-wise precision, recall, and F1. We favor these measures over, e.g., areas under the ROC curve or the precision-recall curve for simplicity’s sake. Also, the tasks we are tackling are new, so that little is known to date about user preferences. This is also why we chose the evenly balanced F1.
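For reference, the standard definitions of these measures, computed per class from true positives (TP), false positives (FP), and false negatives (FN):

```latex
\[
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\]
```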

References
Budak et al. (2011). Limiting the Spread of Misinformation in Social Networks. In Proceedings of WWW 2011.
Ciampaglia et al. (2015). Computational Fact Checking from Knowledge Networks. PLoS ONE, 2015.
Etzioni et al. (2008). Open Information Extraction from the Web. Communications of the ACM, 2008.
Kwon et al. (2013). Prominent Features of Rumor Propagation in Online Social Media. In Proceedings of ICDM 2013.
Rashkin et al. (2017). Truth of Varying Shades: Analyzing Language in Fake News and Political Fact-Checking. In Proceedings of EMNLP 2017.