Information credibility on twitter

doi:10.1145/1963405.1963500

Home
/
Papers
/
Information credibility on twitter

Proceedings Article•DOI•

Information credibility on twitter

Carlos Castillo¹, Marcelo Mendoza², Barbara Poblete³•Institutions (3)

Yahoo!¹, Federico Santa María Technical University², University of Chile³

28 Mar 2011-pp 675-684

TL;DR: There are measurable differences in the way messages propagate, that can be used to classify them automatically as credible or not credible, with precision and recall in the range of 70% to 80%.

read less

Abstract: We analyze the information credibility of news propagated through Twitter, a popular microblogging service. Previous research has shown that most of the messages posted on Twitter are truthful, but the service is also used to spread misinformation and false rumors, often unintentionally.On this paper we focus on automatic methods for assessing the credibility of a given set of tweets. Specifically, we analyze microblog postings related to "trending" topics, and classify them as credible or not credible, based on features extracted from them. We use features from users' posting and re-posting ("re-tweeting") behavior, from the text of the posts, and from citations to external sources.We evaluate our methods using a significant number of human assessments about the credibility of items on a recent sample of Twitter postings. Our results shows that there are measurable differences in the way messages propagate, that can be used to classify them automatically as credible or not credible, with precision and recall in the range of 70% to 80%.

...read moreread less

Content maybe subject to copyright Report

Citations

PDF

Open Access

More filters

Proceedings Article•DOI•

Automatic detection of rumor on Sina Weibo

[...]

Fan Yang¹, Yang Liu¹, Xiaohui Yu², Min Yang¹•Institutions (2)

Shandong University¹, York University²

12 Aug 2012

TL;DR: This is the first study on rumor analysis and detection on Sina Weibo, China's leading micro-blogging service provider, and examines an extensive set of features that can be extracted from the microblogs, and trains a classifier to automatically detect the rumors from a mixed set of true information and false information.

...read moreread less

Abstract: The problem of gauging information credibility on social networks has received considerable attention in recent years. Most previous work has chosen Twitter, the world's largest micro-blogging platform, as the premise of research. In this work, we shift the premise and study the problem of information credibility on Sina Weibo, China's leading micro-blogging service provider. With eight times more users than Twitter, Sina Weibo is more of a Facebook-Twitter hybrid than a pure Twitter clone, and exhibits several important characteristics that distinguish it from Twitter. We collect an extensive set of microblogs which have been confirmed to be false rumors based on information from the official rumor-busting service provided by Sina Weibo. Unlike previous studies on Twitter where the labeling of rumors is done manually by the participants of the experiments, the official nature of this service ensures the high quality of the dataset. We then examine an extensive set of features that can be extracted from the microblogs, and train a classifier to automatically detect the rumors from a mixed set of true information and false information. The experiments show that some of the new features we propose are indeed effective in the classification, and even the features considered in previous studies have different implications with Sina Weibo than with Twitter. To the best of our knowledge, this is the first study on rumor analysis and detection on Sina Weibo.

...read moreread less

495 citations

Cites background or methods from "Information credibility on twitter"

...As the content-based features, account-based features, and propagation-based features have been studied in the previous works [3] [11], we here just identify the effectiveness of the two new features that we proposed....
[...]
...[3] use four types of features: (1) message-based features, which consider characteristics of the tweet content, which can be categorized as Twitter-independent and Twitter-dependent; (2) user-based features, which consider characteristics of Twitter users, such as registration age, number of followers, number of friends, and number of user posted tweets; (3) topicbased features, which are aggregates computed from messagebased features and user-based features; and (4) propagationbased features, which consider attributes related to the propagation tree that can be built from the retweets of a specific tweet....
[...]
...[3] focus on automatically assessing the credibility of a given set of tweets....
[...]
...[3] use keyword-based query interface provided by Twitter Monitor to collect data....
[...]
...Some of the features have been studied in previous works [3] [11] [9]....
[...]

Journal Article•DOI•

Social media, political polarization, and political disinformation: a review of the scientific literature

[...]

Joshua A. Tucker¹, Andrew M. Guess², Pablo Barberá³, Cristian Vaccari⁴, Cristian Vaccari⁵, Alexandra A. Siegel⁶, Alexandra A. Siegel¹, Sergey Sanovich¹, Denis Stukal¹, Brendan Nyhan⁷ - Show less +6 more•Institutions (7)

New York University¹, Princeton University², London School of Economics and Political Science³, Royal Holloway, University of London⁴, University of Bologna⁵, Stanford University⁶, Ford Motor Company⁷

19 Mar 2018-Social Science Research Network

TL;DR: The authors provide an overview of the current state of the literature on the relationship between social media; political polarization; and political "disinformation", a term used to encompass a wide range of types of information about politics found online.

...read moreread less

Abstract: The following report is intended to provide an overview of the current state of the literature on the relationship between social media; political polarization; and political “disinformation,” a term used to encompass a wide range of types of information about politics found online, including “fake news,” rumors, deliberately factually incorrect information, inadvertently factually incorrect information, politically slanted information, and “hyperpartisan” news. The review of the literature is provided in six separate sections, each of which can be read individually but that cumulatively are intended to provide an overview of what is known — and unknown — about the relationship between social media, political polarization, and disinformation. The report concludes by identifying key gaps in our understanding of these phenomena and the data that are needed to address them.

...read moreread less

494 citations

Cites background from "Information credibility on twitter"

...Examples of real-time systems are Botometer (formerly, BotOrNot, Davis et al. 2016), Hoaxy (Shao et al. 2016), TwitterTrails (Finn et al. 2014), RumorLens (Resnick et al. 2014), TweetCred (Castillo et al. 2011), and Truthy (Ratkiewicz et al. 2011b)....
[...]

Journal Article•DOI•

Tracking Social Media Discourse About the COVID-19 Pandemic: Development of a Public Coronavirus Twitter Data Set.

[...]

Emily Chen¹, Kristina Lerman¹, Emilio Ferrara¹•Institutions (1)

University of Southern California¹

29 May 2020-JMIR public health and surveillance

TL;DR: The COVID-19-TweetIDs GitHub repository as mentioned in this paper provides a multilingual COVID19 Twitter data set that is made available to the research community via a GitHub repository, with over 60% of the tweets in English.

...read moreread less

Abstract: Background: At the time of this writing, the coronavirus disease (COVID-19) pandemic outbreak has already put tremendous strain on many countries' citizens, resources, and economies around the world. Social distancing measures, travel bans, self-quarantines, and business closures are changing the very fabric of societies worldwide. With people forced out of public spaces, much of the conversation about these phenomena now occurs online on social media platforms like Twitter. Objective: In this paper, we describe a multilingual COVID-19 Twitter data set that we are making available to the research community via our COVID-19-TweetIDs GitHub repository. Methods: We started this ongoing data collection on January 28, 2020, leveraging Twitter’s streaming application programming interface (API) and Tweepy to follow certain keywords and accounts that were trending at the time data collection began. We used Twitter’s search API to query for past tweets, resulting in the earliest tweets in our collection dating back to January 21, 2020. Results: Since the inception of our collection, we have actively maintained and updated our GitHub repository on a weekly basis. We have published over 123 million tweets, with over 60% of the tweets in English. This paper also presents basic statistics that show that Twitter activity responds and reacts to COVID-19-related events. Conclusions: It is our hope that our contribution will enable the study of online conversation dynamics in the context of a planetary-scale epidemic outbreak of unprecedented proportions and implications. This data set could also help track COVID-19-related misinformation and unverified rumors or enable the understanding of fear and panic—and undoubtedly more.

...read moreread less

477 citations

Proceedings Article•DOI•

Tweeting is believing?: understanding microblog credibility perceptions

[...]

Meredith Ringel Morris¹, Scott Counts¹, Asta Roseway¹, Aaron Hoff¹, Julia Schwarz² - Show less +1 more•Institutions (2)

Microsoft¹, Carnegie Mellon University²

11 Feb 2012

TL;DR: It is shown that users are poor judges of truthfulness based on content alone, and instead are influenced by heuristics such as user name when making credibility assessments.

...read moreread less

Abstract: Twitter is now used to distribute substantive content such as breaking news, increasing the importance of assessing the credibility of tweets. As users increasingly access tweets through search, they have less information on which to base credibility judgments as compared to consuming content from direct social network connections. We present survey results regarding users' perceptions of tweet credibility. We find a disparity between features users consider relevant to credibility assessment and those currently revealed by search engines. We then conducted two experiments in which we systematically manipulated several features of tweets to assess their impact on credibility ratings. We show that users are poor judges of truthfulness based on content alone, and instead are influenced by heuristics such as user name when making credibility assessments. Based on these findings, we discuss strategies tweet authors can use to enhance their credibility with readers (and strategies astute readers should be aware of!). We propose design improvements for displaying social search results so as to better convey credibility.

...read moreread less

466 citations

Journal Article•DOI•

An overview of online fake news: Characterization, detection, and discussion

[...]

Xichen Zhang¹, Ali A. Ghorbani¹•Institutions (1)

University of New Brunswick¹

01 Mar 2020-Information Processing and Management

TL;DR: A comprehensive overview of the finding to date relating to fake news is presented, characterized the negative impact of online fake news, and the state-of-the-art in detection methods are characterized.

...read moreread less

Abstract: Over the recent years, the growth of online social media has greatly facilitated the way people communicate with each other. Users of online social media share information, connect with other people and stay informed about trending events. However, much recent information appearing on social media is dubious and, in some cases, intended to mislead. Such content is often called fake news. Large amounts of online fake news has the potential to cause serious problems in society. Many point to the 2016 U.S. presidential election campaign as having been influenced by fake news. Subsequent to this election, the term has entered the mainstream vernacular. Moreover it has drawn the attention of industry and academia, seeking to understand its origins, distribution and effects. Of critical interest is the ability to detect when online content is untrue and intended to mislead. This is technically challenging for several reasons. Using social media tools, content is easily generated and quickly spread, leading to a large volume of content to analyse. Online information is very diverse, covering a large number of subjects, which contributes complexity to this task. The truth and intent of any statement often cannot be assessed by computers alone, so efforts must depend on collaboration between humans and technology. For instance, some content that is deemed by experts of being false and intended to mislead are available. While these sources are in limited supply, they can form a basis for such a shared effort. In this survey, we present a comprehensive overview of the finding to date relating to fake news. We characterize the negative impact of online fake news, and the state-of-the-art in detection methods. Many of these rely on identifying features of the users, content, and context that indicate misinformation. We also study existing datasets that have been used for classifying fake news. Finally, we propose promising research directions for online fake news analysis.

...read moreread less

449 citations

…
1
2
3
4
5
6
7
…
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse

References

PDF

Open Access

More filters

Proceedings Article•DOI•

What is Twitter, a social network or a news media?

[...]

Haewoon Kwak¹, Changhyun Lee¹, Hosung Park¹, Sue Moon¹•Institutions (1)

KAIST¹

26 Apr 2010

TL;DR: In this paper, the authors have crawled the entire Twittersphere and found a non-power-law follower distribution, a short effective diameter, and low reciprocity, which all mark a deviation from known characteristics of human social networks.

...read moreread less

Abstract: Twitter, a microblogging service less than three years old, commands more than 41 million users as of July 2009 and is growing fast. Twitter users tweet about any topic within the 140-character limit and follow others to receive their tweets. The goal of this paper is to study the topological characteristics of Twitter and its power as a new medium of information sharing.We have crawled the entire Twitter site and obtained 41.7 million user profiles, 1.47 billion social relations, 4,262 trending topics, and 106 million tweets. In its follower-following topology analysis we have found a non-power-law follower distribution, a short effective diameter, and low reciprocity, which all mark a deviation from known characteristics of human social networks [28]. In order to identify influentials on Twitter, we have ranked users by the number of followers and by PageRank and found two rankings to be similar. Ranking by retweets differs from the previous two rankings, indicating a gap in influence inferred from the number of followers and that from the popularity of one's tweets. We have analyzed the tweets of top trending topics and reported on their temporal behavior and user participation. We have classified the trending topics based on the active period and the tweets and show that the majority (over 85%) of topics are headline news or persistent news in nature. A closer look at retweets reveals that any retweeted tweet is to reach an average of 1,000 users no matter what the number of followers is of the original tweet. Once retweeted, a tweet gets retweeted almost instantly on next hops, signifying fast diffusion of information after the 1st retweet.To the best of our knowledge this work is the first quantitative study on the entire Twittersphere and information diffusion on it.

...read moreread less

6,108 citations

Proceedings Article•DOI•

Earthquake shakes Twitter users: real-time event detection by social sensors

[...]

Takeshi Sakaki¹, Makoto Okazaki¹, Yutaka Matsuo¹•Institutions (1)

University of Tokyo¹

26 Apr 2010

TL;DR: This paper investigates the real-time interaction of events such as earthquakes in Twitter and proposes an algorithm to monitor tweets and to detect a target event and produces a probabilistic spatiotemporal model for the target event that can find the center and the trajectory of the event location.

...read moreread less

Abstract: Twitter, a popular microblogging service, has received much attention recently. An important characteristic of Twitter is its real-time nature. For example, when an earthquake occurs, people make many Twitter posts (tweets) related to the earthquake, which enables detection of earthquake occurrence promptly, simply by observing the tweets. As described in this paper, we investigate the real-time interaction of events such as earthquakes in Twitter and propose an algorithm to monitor tweets and to detect a target event. To detect a target event, we devise a classifier of tweets based on features such as the keywords in a tweet, the number of words, and their context. Subsequently, we produce a probabilistic spatiotemporal model for the target event that can find the center and the trajectory of the event location. We consider each Twitter user as a sensor and apply Kalman filtering and particle filtering, which are widely used for location estimation in ubiquitous/pervasive computing. The particle filter works better than other comparable methods for estimating the centers of earthquakes and the trajectories of typhoons. As an application, we construct an earthquake reporting system in Japan. Because of the numerous earthquakes and the large number of Twitter users throughout the country, we can detect an earthquake with high probability (96% of earthquakes of Japan Meteorological Agency (JMA) seismic intensity scale 3 or more are detected) merely by monitoring tweets. Our system detects earthquakes promptly and sends e-mails to registered users. Notification is delivered much faster than the announcements that are broadcast by the JMA.

...read moreread less

3,976 citations

"Information credibility on twitter" refers background in this paper

...to track epidemics [17], detect news events [28], geolocate such events [27], and find controversial emerging controversial topics [24]....
[...]

Proceedings Article•DOI•

Why we twitter: understanding microblogging usage and communities

[...]

Akshay Java¹, Xiaodan Song, Tim Finin¹, Belle L. Tseng•Institutions (1)

University of Maryland, Baltimore County¹

12 Aug 2007

TL;DR: It is found that people use microblogging to talk about their daily activities and to seek or share information and the user intentions associated at a community level are analyzed to show how users with similar intentions connect with each other.

...read moreread less

Abstract: Microblogging is a new form of communication in which users can describe their current status in short posts distributed by instant messages, mobile phones, email or the Web. Twitter, a popular microblogging tool has seen a lot of growth since it launched in October, 2006. In this paper, we present our observations of the microblogging phenomena by studying the topological and geographical properties of Twitter's social network. We find that people use microblogging to talk about their daily activities and to seek or share information. Finally, we analyze the user intentions associated at a community level and show how users with similar intentions connect with each other.

...read moreread less

3,025 citations

"Information credibility on twitter" refers background in this paper

...In the table we have separated two broad types of topics: news and conversation, following the broad categories found in [13, 22]....
[...]
...While most messages on Twitter are conversation and chatter, people also use it to share relevant information and to report news [13, 22, 21]....
[...]

Proceedings Article•DOI•

Microblogging during two natural hazards events: what twitter may contribute to situational awareness

[...]

Sarah Vieweg¹, Amanda Lee Hughes¹, Kate Starbird¹, Leysia Palen¹•Institutions (1)

University of Colorado Boulder¹

10 Apr 2010

TL;DR: Analysis of microblog posts generated during two recent, concurrent emergency events in North America via Twitter, a popular microblogging service, aims to inform next steps for extracting useful, relevant information during emergencies using information extraction (IE) techniques.

...read moreread less

Abstract: We analyze microblog posts generated during two recent, concurrent emergency events in North America via Twitter, a popular microblogging service. We focus on communications broadcast by people who were "on the ground" during the Oklahoma Grassfires of April 2009 and the Red River Floods that occurred in March and April 2009, and identify information that may contribute to enhancing situational awareness (SA). This work aims to inform next steps for extracting useful, relevant information during emergencies using information extraction (IE) techniques.

...read moreread less

1,479 citations

Additional excerpts

...Twitter has been used widely during emergency situations, such as wildfires [6], hurricanes [12], floods [32, 33, 31] and earthquakes [15, 7]....
[...]

Proceedings Article•DOI•

Finding high-quality content in social media

[...]

Eugene Agichtein¹, Carlos Castillo², Debora Donato², Aristides Gionis², Gilad Mishne² - Show less +1 more•Institutions (2)

Emory University¹, Yahoo!²

11 Feb 2008

TL;DR: This paper introduces a general classification framework for combining the evidence from different sources of information, that can be tuned automatically for a given social media type and quality definition, and shows that its system is able to separate high-quality items from the rest with an accuracy close to that of humans.

...read moreread less

Abstract: The quality of user-generated content varies drastically from excellent to abuse and spam. As the availability of such content increases, the task of identifying high-quality content sites based on user contributions --social media sites -- becomes increasingly important. Social media in general exhibit a rich variety of information sources: in addition to the content itself, there is a wide array of non-content information available, such as links between items and explicit quality ratings from members of the community. In this paper we investigate methods for exploiting such community feedback to automatically identify high quality content. As a test case, we focus on Yahoo! Answers, a large community question/answering portal that is particularly rich in the amount and types of content and social interactions available in it. We introduce a general classification framework for combining the evidence from different sources of information, that can be tuned automatically for a given social media type and quality definition. In particular, for the community question/answering domain, we show that our system is able to separate high-quality items from the rest with an accuracy close to that of humans

...read moreread less

1,300 citations