Proceedings ArticleDOI

Information credibility on twitter

TL;DR: There are measurable differences in the way messages propagate that can be used to classify them automatically as credible or not credible, with precision and recall in the range of 70% to 80%.
Abstract: We analyze the information credibility of news propagated through Twitter, a popular microblogging service. Previous research has shown that most of the messages posted on Twitter are truthful, but the service is also used to spread misinformation and false rumors, often unintentionally. In this paper we focus on automatic methods for assessing the credibility of a given set of tweets. Specifically, we analyze microblog postings related to "trending" topics, and classify them as credible or not credible, based on features extracted from them. We use features from users' posting and re-posting ("re-tweeting") behavior, from the text of the posts, and from citations to external sources. We evaluate our methods using a significant number of human assessments about the credibility of items on a recent sample of Twitter postings. Our results show that there are measurable differences in the way messages propagate, which can be used to classify them automatically as credible or not credible, with precision and recall in the range of 70% to 80%.
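A minimal sketch of the kind of feature-based classifier the abstract describes, in Python. The feature names, synthetic data, and the choice of scikit-learn's DecisionTreeClassifier are illustrative assumptions, not the paper's exact setup:

    # Hedged sketch: topic-level credibility classification from
    # user, text, and propagation features. Data are synthetic.
    import numpy as np
    from sklearn.model_selection import cross_validate
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    # One row per trending topic: e.g. author followers, tweet length,
    # URL/question-mark counts, retweet fraction (hypothetical features).
    X = rng.random((200, 6))
    y = rng.integers(0, 2, 200)  # 1 = credible, 0 = not credible

    clf = DecisionTreeClassifier(max_depth=5, random_state=0)
    scores = cross_validate(clf, X, y, cv=5, scoring=("precision", "recall"))
    print(scores["test_precision"].mean(), scores["test_recall"].mean())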


Citations
Journal ArticleDOI
TL;DR: This survey reviews the state of the art regarding computational methods to process social media messages, highlights both their contributions and shortcomings, and methodically examines a series of key subproblems ranging from the detection of events to the creation of actionable and useful summaries.
Abstract: Social media platforms provide active communication channels during mass convergence and emergency events such as disasters caused by natural hazards. As a result, first responders, decision makers, and the public can use this information to gain insight into the situation as it unfolds. In particular, many social media messages communicated during emergencies convey timely, actionable information. Processing social media messages to obtain such information, however, involves solving multiple challenges including parsing brief and informal messages, handling information overload, and prioritizing different types of information found in messages. These challenges can be mapped to classical information processing operations such as filtering, classifying, ranking, aggregating, extracting, and summarizing. We survey the state of the art regarding computational methods to process social media messages and highlight both their contributions and shortcomings. In addition, we discuss their particularities and methodically examine a series of key subproblems ranging from the detection of events to the creation of actionable and useful summaries. Research thus far has, to a large extent, produced methods to extract situational awareness information from social media. In this survey, we cover these various approaches, and highlight their benefits and shortcomings. We conclude with research challenges that go beyond situational awareness, and begin to look at supporting decision making and coordinating emergency-response actions.

710 citations


Cites background from "Information credibility on twitter"

  • ...e.g. the public and formal response organizations) [Hiltz et al. 2011]. Automatic classification can be used to filter out content that is unlikely to be considered credible [Gupta and Kumaraguru 2012; Castillo et al. 2011]. Additionally, the public itself can be mobilized to confirm or discredit a claim through crowdsourcing [Popoola et al. 2013]....


Journal ArticleDOI
01 Feb 2015
TL;DR: This article surveys techniques for event detection from Twitter streams, aimed at finding real-world occurrences that unfold over space and time, and highlights the need for public benchmarks to evaluate the performance of different detection approaches and various features.
Abstract: Twitter is among the fastest-growing microblogging and online social networking services. Messages posted on Twitter, known as tweets, report everything from daily life stories to the latest local and global news and events. Monitoring and analyzing this rich and continuous user-generated content can yield unprecedentedly valuable information, enabling users and organizations to acquire actionable knowledge. This article provides a survey of techniques for event detection from Twitter streams. These techniques aim at finding real-world occurrences that unfold over space and time. In contrast to conventional media, event detection from Twitter streams poses new challenges. Twitter streams contain large amounts of meaningless messages and polluted content, which negatively affect the detection performance. In addition, traditional text mining techniques are not suitable, because of the short length of tweets, the large number of spelling and grammatical errors, and the frequent use of informal and mixed language. Event detection techniques presented in the literature address these issues by adapting techniques from various fields to the uniqueness of Twitter. This article classifies these techniques according to the event type, detection task, and detection method and discusses commonly used features. Finally, it highlights the need for public benchmarks to evaluate the performance of different detection approaches and various features.

710 citations


Cites background from "Information credibility on twitter"

  • ...However, Twitter streams contain large amounts of meaningless messages (pointless babbles) (Hurlock and Wilson 2011) and rumors (Castillo et al. 2011)....


  • ...In addition, Twitter streams contain large amounts of meaningless messages (Hurlock and Wilson 2011), polluted content (Lee et al. 2011), and rumors (Castillo et al. 2011), which negatively affect the performance of the detection algorithms....


  • ...2011), and rumors (Castillo et al. 2011), which negatively affect the performance of the detection algorithms....


  • ...This assumption is clearly violated in Twitter data streams, where relevant events are buried in large amounts of noisy data (Becker et al. 2011b; Castillo et al. 2011; Hurlock and Wilson 2011; Lee et al. 2011)....


Proceedings ArticleDOI
01 Dec 2013
TL;DR: A new periodic time series model that considers daily and external shock cycles demonstrates that rumors are likely to fluctuate over time; key structural and linguistic differences in the spread of rumors and non-rumors are also identified.
Abstract: The problem of identifying rumors is of practical importance, especially in online social networks, since information can diffuse more rapidly and widely than in its offline counterpart. In this paper, we identify characteristics of rumors by examining the following three aspects of diffusion: temporal, structural, and linguistic. For the temporal characteristics, we propose a new periodic time series model that considers daily and external shock cycles, where the model demonstrates that rumors are likely to fluctuate over time. We also identify key structural and linguistic differences in the spread of rumors and non-rumors. Our selected features classify rumors with high precision and recall in the range of 87% to 92%, which is higher than other state-of-the-art methods for rumor classification.
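The temporal model is the most concrete piece to sketch. Below, a simple periodic-plus-decay curve is fitted to a synthetic tweet-volume series to illustrate daily cycles with an external-shock style decay; this is an assumption-laden stand-in, not the authors' actual model:

    # Illustrative only: fit a daily-cycle curve with decay to a
    # synthetic rumor-volume series (not the paper's exact model).
    import numpy as np
    from scipy.optimize import curve_fit

    def periodic_decay(t, a, b, period, decay):
        # Oscillation with a given period, damped over time.
        return a * np.exp(-t / decay) * (1 + b * np.cos(2 * np.pi * t / period))

    t = np.arange(0, 14, 0.25)                      # 14 days, 6-hour bins
    volume = periodic_decay(t, 100, 0.5, 1.0, 7.0)  # synthetic series
    volume += np.random.default_rng(1).normal(0, 3, t.size)

    params, _ = curve_fit(periodic_decay, t, volume, p0=(80, 0.3, 1.0, 5.0))
    print("fitted period (days):", params[2])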

699 citations


Cites background or methods from "Information credibility on twitter"

  • ...In contrast, our proposed features are drawn from extensive theories in social psychology and are almost non-overlapping with the feature set of [3], hence providing a complementary view to the problem....


  • ...Table VI: The average performance for each classification method: B (baseline with 15 features from [3]), S1 (proposed in this paper with 11 features), and C (combination of baseline and our proposed method with 27 features)...


  • ...In order to test whether the selected features are effective classifiers, we adopted 15 features that were used in [3] as described in Table V....


  • ...Table V: Features for determining credibility of information described in [3], used as baseline....


  • ...[3], which proposed a set of features to assess the credibility of social media content....


Proceedings ArticleDOI
19 Jul 2018
TL;DR: An end-to-end framework named Event Adversarial Neural Network (EANN), which can derive event-invariant features and thus benefit the detection of fake news on newly arrived events, is proposed.
Abstract: As news reading on social media becomes more and more popular, fake news becomes a major issue concerning the public and government. Fake news can take advantage of multimedia content to mislead readers and gain wide dissemination, which can cause negative effects or even manipulate public events. One of the unique challenges for fake news detection on social media is how to identify fake news on newly emerged events. Unfortunately, most of the existing approaches can hardly handle this challenge, since they tend to learn event-specific features that cannot be transferred to unseen events. In order to address this issue, we propose an end-to-end framework named Event Adversarial Neural Network (EANN), which can derive event-invariant features and thus benefit the detection of fake news on newly arrived events. It consists of three main components: the multi-modal feature extractor, the fake news detector, and the event discriminator. The multi-modal feature extractor is responsible for extracting the textual and visual features from posts. It cooperates with the fake news detector to learn the discriminable representation for the detection of fake news. The role of the event discriminator is to remove event-specific features and keep features shared among events. Extensive experiments are conducted on multimedia datasets collected from Weibo and Twitter. The experimental results show that our proposed EANN model can outperform state-of-the-art methods and learn transferable feature representations.
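Adversarial components of this kind are commonly implemented with a gradient-reversal layer. The PyTorch sketch below shows only that mechanism on a single feature vector; the layer sizes are hypothetical, and the paper's multi-modal (text + image) extractor is collapsed into one linear layer for brevity:

    # Sketch of an event discriminator trained adversarially via
    # gradient reversal; a simplification, not the paper's full EANN.
    import torch
    import torch.nn as nn

    class GradReverse(torch.autograd.Function):
        # Identity in the forward pass; negated, scaled gradient backward.
        @staticmethod
        def forward(ctx, x, lamb):
            ctx.lamb = lamb
            return x.view_as(x)

        @staticmethod
        def backward(ctx, grad_output):
            return -ctx.lamb * grad_output, None

    class EANNSketch(nn.Module):
        def __init__(self, in_dim=300, feat_dim=128, n_events=10):
            super().__init__()
            self.extractor = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())
            self.fake_detector = nn.Linear(feat_dim, 2)  # fake vs. real
            self.event_discriminator = nn.Linear(feat_dim, n_events)

        def forward(self, x, lamb=1.0):
            feats = self.extractor(x)
            # Reversed gradients push the extractor toward
            # event-invariant features.
            return (self.fake_detector(feats),
                    self.event_discriminator(GradReverse.apply(feats, lamb)))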

627 citations


Cites background from "Information credibility on twitter"

  • ...Textual features are statistical or semantic features extracted from the text content of posts, which have been explored in much of the fake news detection literature [4, 11, 19, 27]....


Proceedings ArticleDOI
13 May 2013
TL;DR: This paper highlights the role of Twitter during Hurricane Sandy (2012) in spreading fake images of the disaster, and shows that automated techniques can be used to distinguish real images from fake images posted on Twitter.
Abstract: In today's world, online social media plays a vital role during real world events, especially crisis events. Social media coverage of events has both positive and negative effects: it can be used by authorities for effective disaster management or by malicious entities to spread rumors and fake news. The aim of this paper is to highlight the role of Twitter during Hurricane Sandy (2012) in spreading fake images about the disaster. We identified 10,350 unique tweets containing fake images that were circulated on Twitter during Hurricane Sandy. We performed a characterization analysis to understand the temporal, social reputation and influence patterns for the spread of fake images. Eighty-six percent of tweets spreading the fake images were retweets; hence very few were original tweets. Our results showed that the top thirty users out of 10,215 users (0.3%) accounted for 90% of the retweets of fake images; also, network links such as follower relationships of Twitter contributed very little (only 11%) to the spread of these fake photo URLs. Next, we used classification models to distinguish fake images from real images of Hurricane Sandy. The best results were obtained with a Decision Tree classifier, which achieved 97% accuracy in predicting fake images from real ones. Tweet-based features were very effective in distinguishing fake image tweets from real ones, while the performance of user-based features was very poor. Our results showed that automated techniques can be used to identify real images from fake images posted on Twitter.
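The concentration statistics quoted above are straightforward to reproduce on any tweet sample. A toy sketch over hypothetical (user_id, is_retweet) records:

    # Toy characterization analysis: retweet fraction and the share of
    # retweets attributable to the most active users. Data are made up.
    from collections import Counter

    records = [("u1", True), ("u2", True), ("u1", True), ("u3", False)]

    retweets = [user for user, is_rt in records if is_rt]
    print("retweet fraction:", len(retweets) / len(records))

    counts = Counter(retweets).most_common()
    top = counts[: max(1, len(counts) // 100)]  # roughly the top 1% of users
    print("share from top users:", sum(c for _, c in top) / len(retweets))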

586 citations


Cites background from "Information credibility on twitter"

  • ...showed that automated classification techniques can be used to detect news topics from conversational topics and assessed their credibility based on various Twitter features [5]....


References
Proceedings ArticleDOI
26 Apr 2010
TL;DR: In this paper, the authors have crawled the entire Twittersphere and found a non-power-law follower distribution, a short effective diameter, and low reciprocity, which all mark a deviation from known characteristics of human social networks.
Abstract: Twitter, a microblogging service less than three years old, commands more than 41 million users as of July 2009 and is growing fast. Twitter users tweet about any topic within the 140-character limit and follow others to receive their tweets. The goal of this paper is to study the topological characteristics of Twitter and its power as a new medium of information sharing. We have crawled the entire Twitter site and obtained 41.7 million user profiles, 1.47 billion social relations, 4,262 trending topics, and 106 million tweets. In its follower-following topology analysis we have found a non-power-law follower distribution, a short effective diameter, and low reciprocity, which all mark a deviation from known characteristics of human social networks [28]. In order to identify influentials on Twitter, we have ranked users by the number of followers and by PageRank and found the two rankings to be similar. Ranking by retweets differs from the previous two rankings, indicating a gap between influence inferred from the number of followers and that from the popularity of one's tweets. We have analyzed the tweets of top trending topics and reported on their temporal behavior and user participation. We have classified the trending topics based on the active period and the tweets, and show that the majority (over 85%) of topics are headline news or persistent news in nature. A closer look at retweets reveals that any retweeted tweet reaches an average of 1,000 users regardless of the number of followers of the original tweet. Once retweeted, a tweet gets retweeted almost instantly on next hops, signifying fast diffusion of information after the first retweet. To the best of our knowledge this work is the first quantitative study on the entire Twittersphere and information diffusion on it.
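The influence-ranking comparison can be sketched on a toy follow graph; networkx is assumed available, and the edges are invented:

    # Rank users by follower count (in-degree) and by PageRank on a
    # toy follow graph, then compare the two orderings.
    import networkx as nx

    G = nx.DiGraph()  # edge u -> v means "u follows v"
    G.add_edges_from([("a", "b"), ("c", "b"), ("d", "b"), ("b", "e"), ("c", "e")])

    by_followers = sorted(G.nodes, key=G.in_degree, reverse=True)
    pr = nx.pagerank(G)
    by_pagerank = sorted(pr, key=pr.get, reverse=True)
    print(by_followers, by_pagerank)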

6,108 citations

Proceedings ArticleDOI
26 Apr 2010
TL;DR: This paper investigates the real-time interaction of events such as earthquakes on Twitter, proposes an algorithm to monitor tweets and detect a target event, and produces a probabilistic spatiotemporal model for the target event that can find the center and the trajectory of the event location.
Abstract: Twitter, a popular microblogging service, has received much attention recently. An important characteristic of Twitter is its real-time nature. For example, when an earthquake occurs, people make many Twitter posts (tweets) related to the earthquake, which enables detection of earthquake occurrence promptly, simply by observing the tweets. As described in this paper, we investigate the real-time interaction of events such as earthquakes in Twitter and propose an algorithm to monitor tweets and to detect a target event. To detect a target event, we devise a classifier of tweets based on features such as the keywords in a tweet, the number of words, and their context. Subsequently, we produce a probabilistic spatiotemporal model for the target event that can find the center and the trajectory of the event location. We consider each Twitter user as a sensor and apply Kalman filtering and particle filtering, which are widely used for location estimation in ubiquitous/pervasive computing. The particle filter works better than other comparable methods for estimating the centers of earthquakes and the trajectories of typhoons. As an application, we construct an earthquake reporting system in Japan. Because of the numerous earthquakes and the large number of Twitter users throughout the country, we can detect an earthquake with high probability (96% of earthquakes of Japan Meteorological Agency (JMA) seismic intensity scale 3 or more are detected) merely by monitoring tweets. Our system detects earthquakes promptly and sends e-mails to registered users. Notification is delivered much faster than the announcements that are broadcast by the JMA.
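The "each user as a sensor" idea lends itself to a compact particle-filter sketch: estimate an event's center from noisy tweet coordinates. Everything here (coordinates, noise levels, resampling scheme) is an illustrative assumption, far simpler than the paper's spatiotemporal model:

    # Minimal particle filter: estimate an event location from noisy
    # per-tweet coordinates treated as sensor readings. Toy data.
    import numpy as np

    rng = np.random.default_rng(0)
    true_center = np.array([35.0, 139.7])                # hypothetical lat/lon
    tweets = true_center + rng.normal(0, 0.5, (100, 2))  # noisy observations

    particles = rng.uniform([30, 130], [40, 145], (1000, 2))
    for obs in tweets:
        # Weight particles by observation likelihood, then resample.
        d2 = ((particles - obs) ** 2).sum(axis=1)
        w = np.exp(-d2 / (2 * 0.5 ** 2))
        w /= w.sum()
        idx = rng.choice(len(particles), size=len(particles), p=w)
        particles = particles[idx] + rng.normal(0, 0.05, particles.shape)

    print("estimated center:", particles.mean(axis=0))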

3,976 citations


"Information credibility on twitter" refers background in this paper

  • ...to track epidemics [17], detect news events [28], geolocate such events [27], and find controversial emerging topics [24]....


Proceedings ArticleDOI
12 Aug 2007
TL;DR: It is found that people use microblogging to talk about their daily activities and to seek or share information; user intentions are also analyzed at a community level to show how users with similar intentions connect with each other.
Abstract: Microblogging is a new form of communication in which users can describe their current status in short posts distributed by instant messages, mobile phones, email or the Web. Twitter, a popular microblogging tool, has seen a lot of growth since it launched in October 2006. In this paper, we present our observations of the microblogging phenomena by studying the topological and geographical properties of Twitter's social network. We find that people use microblogging to talk about their daily activities and to seek or share information. Finally, we analyze the user intentions associated at a community level and show how users with similar intentions connect with each other.

3,025 citations


"Information credibility on twitter" refers background in this paper

  • ...In the table we have separated two broad types of topics: news and conversation, following the broad categories found in [13, 22]....


  • ...While most messages on Twitter are conversation and chatter, people also use it to share relevant information and to report news [13, 22, 21]....


Proceedings ArticleDOI
10 Apr 2010
TL;DR: Analysis of microblog posts generated during two recent, concurrent emergency events in North America via Twitter, a popular microblogging service, aims to inform next steps for extracting useful, relevant information during emergencies using information extraction (IE) techniques.
Abstract: We analyze microblog posts generated during two recent, concurrent emergency events in North America via Twitter, a popular microblogging service. We focus on communications broadcast by people who were "on the ground" during the Oklahoma Grassfires of April 2009 and the Red River Floods that occurred in March and April 2009, and identify information that may contribute to enhancing situational awareness (SA). This work aims to inform next steps for extracting useful, relevant information during emergencies using information extraction (IE) techniques.

1,479 citations


Additional excerpts

  • ...Twitter has been used widely during emergency situations, such as wildfires [6], hurricanes [12], floods [32, 33, 31] and earthquakes [15, 7]....


Proceedings ArticleDOI
11 Feb 2008
TL;DR: This paper introduces a general classification framework for combining the evidence from different sources of information that can be tuned automatically for a given social media type and quality definition, and shows that the resulting system is able to separate high-quality items from the rest with an accuracy close to that of humans.
Abstract: The quality of user-generated content varies drastically from excellent to abuse and spam. As the availability of such content increases, the task of identifying high-quality content in sites based on user contributions (social media sites) becomes increasingly important. Social media in general exhibit a rich variety of information sources: in addition to the content itself, there is a wide array of non-content information available, such as links between items and explicit quality ratings from members of the community. In this paper we investigate methods for exploiting such community feedback to automatically identify high quality content. As a test case, we focus on Yahoo! Answers, a large community question/answering portal that is particularly rich in the amount and types of content and social interactions available in it. We introduce a general classification framework for combining the evidence from different sources of information, which can be tuned automatically for a given social media type and quality definition. In particular, for the community question/answering domain, we show that our system is able to separate high-quality items from the rest with an accuracy close to that of humans.
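The evidence-combination idea can be sketched by concatenating feature groups from different sources into one input for a single classifier; the feature groups, data, and choice of scikit-learn's GradientBoostingClassifier are assumptions for illustration:

    # Combine content, link, and rating evidence into one feature
    # vector for quality classification. Synthetic data throughout.
    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier

    rng = np.random.default_rng(2)
    content = rng.random((300, 4))  # e.g. length, punctuation, readability
    links = rng.random((300, 2))    # e.g. in-links, click behavior
    ratings = rng.random((300, 2))  # e.g. thumbs-up ratio, rater count

    X = np.hstack([content, links, ratings])
    y = rng.integers(0, 2, 300)     # 1 = high-quality item

    clf = GradientBoostingClassifier().fit(X, y)
    print("training accuracy:", clf.score(X, y))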

1,300 citations


"Information credibility on twitter" refers background in this paper

  • ...Many of the features follow previous works including [1, 2, 12, 26]....
