
Book Chapter

A Comparative Study of Microblogs Features Effectiveness for the Identification of Prominent Microblog Users During Unexpected Disaster

28 Oct 2015, pp. 15-26

TL;DR: Results show that on- and off-topical user activities features are the most representative features for identifying prominent users in a disaster context and that SVM outperforms the ANN learning algorithm for this classification context especially when it is trained with additional spatial features.




1 Introduction

  • Climate change unleashes a multitude of unexpected disaster characteristics and effects that have never been seen on our planet.
  • Examples include heat waves in summer, winters without snow, climate disruption, and floods in some regions of Europe while neighboring regions suffer terrible droughts.
  • People from surrounding areas can provide nearly real-time observations of disaster scenes by posting on microblogs.
  • The authors aim to evaluate the effectiveness of both state-of-the-art features and their own previously proposed features describing microblog users for identifying prominent microblog users in the context of unexpected disasters.
  • Experimental results are presented in Section 4.

3 User Modeling Using Microblogs Features

  • The problem of identifying prominent users in a disaster context can be cast as a binary classification problem.
  • Many supervised learning algorithms can be used to learn the classification model for this purpose.
  • The performance of these algorithms depends on how well the selected features model user behavior in a disaster context.
  • The authors split these features into five broad categories: profile features (PrF), on-topical features (OnAF), off-topical features (OfAF), spatial features (SpF) and social network structure features (SnF).
  • The rest of this section describes these categories of user features in depth.

3.1 Profile Features

  • Profile Features (PrF) characterize the user profile description.
  • These features are extractable from any user profile using Twitter APIs.
  • At first sight, P2 (enabled geolocation) and P1 (certified user) may be descriptive of prominent users during disasters.
  • P7 (followers) and P8 (followees) are generally used to detect celebrities and domain experts.
  • Moreover, P4 and P5, which reflect the user's general activity in the network, are studied to evaluate whether users who are generally active in the network are also prominent during unexpected disasters, and vice versa.

3.2 User Activity Features

  • Microblog users can express what they are seeing, hearing and experiencing during a disaster using different types of tweets: their own tweets, mention tweets and retweets.
  • A user's own tweets are original posts whose simple content includes no retweet or mention symbols.
  • Moreover, to differentiate the activities of prominent users from those of non-prominent ones, the authors analyze both the user's on-topic tweets related to the disaster and the off-topic ones.
  • The on/off-topic categorization was proposed in their prior work [14], under the assumption that users affected by the disaster would be interested only in disaster news and would neglect any other off-topic information diffused in the network.
  • Disaster-related tweets containing non-serious keywords (e.g. advertising or jokes) are also recorded as off-topic, so users who share non-valuable content can be penalized.

3.3 Spatial Features

  • Spatial Features (SpF) characterize microblog users according to their declared location and their geolocation relative to the threatened disaster zone.
  • Such features may be essential to determine which users are located in the disaster zone and may act as sensors reporting what is really happening on the ground.
  • The spatial features described in Table 3 are studied in the context of disasters; for example, S2 measures the proportion of the geo-coordinates of the user's shared tweets that fall within the territory threatened by the disaster.
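The exact S2 formula is not given in this summary. As a minimal sketch, assume S2 is the fraction of a user's geotagged tweets whose (longitude, latitude) falls inside a polygon outlining the threatened zone, tested with the classic ray-casting rule on planar coordinates (points exactly on the boundary are not handled specially). The function names and the polygon input are illustrative assumptions, not the authors' implementation.

```python
from typing import Iterable, Sequence, Tuple

Point = Tuple[float, float]  # (longitude, latitude), treated as planar x, y


def point_in_polygon(pt: Point, polygon: Sequence[Point]) -> bool:
    """Ray-casting test: toggle 'inside' each time a ray cast from pt
    towards +x crosses a polygon edge."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's latitude
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def inclusion_rate(tweet_coords: Iterable[Point], zone: Sequence[Point]) -> float:
    """Hypothetical S2: share of a user's geotagged tweets located inside the zone."""
    coords = list(tweet_coords)
    if not coords:
        return 0.0  # user has no geotagged tweets
    return sum(point_in_polygon(c, zone) for c in coords) / len(coords)
```

For a region the size of the Herault area, a planar test on raw coordinates is a reasonable approximation; a production system would more likely rely on a GIS library and an official outline of the affected territory.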

3.4 Network Structure Features

  • Many works have explored microblog structure features to identify the most influential and popular users.
  • These works mainly rely on time-consuming algorithms that cannot run in real time and are therefore unsuitable for the disaster context.
  • Moreover, profile features such as the number of followers and followees may favor popular users who toggle between several topics and share outdated information, while overlooking truly prominent users with few connections in the network.
  • Thus, for this category of features, the authors consider only the followers and followees who are interacting about the disaster.
  • Table 4 presents the network structure features studied in this paper.
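Table 4 is not reproduced in this summary, so the feature names below are hypothetical. The sketch only illustrates the filtering idea stated above: a follower or followee counts only if it also interacted about the disaster, which reduces to a set intersection per user rather than an iterative graph algorithm, and so stays cheap enough for real-time use.

```python
from typing import Dict, Set


def disaster_network_features(user: str,
                              followers: Dict[str, Set[str]],
                              followees: Dict[str, Set[str]],
                              active_users: Set[str]) -> Dict[str, int]:
    """Hypothetical SnF-style counts restricted to users active about the disaster."""
    fol = followers.get(user, set()) & active_users  # followers who tweeted about it
    fee = followees.get(user, set()) & active_users  # followees who tweeted about it
    return {
        "disaster_followers": len(fol),
        "disaster_followees": len(fee),
        "disaster_mutual": len(fol & fee),  # reciprocal links among them
    }
```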

4.1 Dataset

  • To evaluate performance on real data, the authors collected most of the tweets shared during the floods of 29-30 September 2014 in the Herault area in southern France.
  • The flooded area received a record-shattering 252 mm of rainfall in just three hours, causing damage estimated at between 500 and 600 million euros.
  • Data collection was performed with their multi-agent system MASIR [18].
  • The system then crawls all the on-topic and off-topic tweets shared by the detected users from 29 September at 00:00 to 1 October at 00:00.
  • Volunteers labeled each user as one of two classes: C1 for prominent users or C2 for non-prominent ones.
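The crawl period above can be expressed as a half-open interval on tweet timestamps. This sketch assumes the created_at string format of Twitter API v1.1 tweets and interprets the bounds in UTC; the summary does not state a time zone, and local French time (UTC+2 in late September) would shift both bounds by two hours, which is why the zone is a single constant.

```python
from datetime import datetime, timezone

# Assumption: bounds interpreted in UTC; the source does not give a time zone.
WINDOW_TZ = timezone.utc
WINDOW_START = datetime(2014, 9, 29, 0, 0, tzinfo=WINDOW_TZ)
WINDOW_END = datetime(2014, 10, 1, 0, 0, tzinfo=WINDOW_TZ)  # exclusive


def in_collection_window(created_at: str) -> bool:
    """True if a v1.1-style created_at string ('Mon Sep 29 14:03:00 +0000 2014')
    falls in [WINDOW_START, WINDOW_END)."""
    ts = datetime.strptime(created_at, "%a %b %d %H:%M:%S %z %Y")
    return WINDOW_START <= ts < WINDOW_END
```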

4.2 Evaluation Tools and Metrics

  • The authors describe in this section the methodologies used to evaluate the effectiveness of the different categories of features for identifying prominent microblog users in a disaster context.
  • Support Vector Machine (SVM) [19] and Artificial Neural Networks (ANN) [20] are used for this study.
  • Using these algorithms, the authors tested all combinations of feature categories that may represent prominent microblog users interacting during a disaster.
  • To deal with class imbalance, the authors gave a higher weight to class C1 of prominent users (W1 = 10) than to class C2 of non-prominent users (W2 = 1).
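The class weights can be read as per-class misclassification costs. As a minimal sketch, assuming C1 is encoded as +1 and C2 as -1, the soft-margin SVM primal objective multiplies each sample's hinge loss by W1 = 10 or W2 = 1, so an error on a prominent user costs ten times more than one on a non-prominent user. This mirrors the per-class C weighting offered by common SVM libraries (for example LIBSVM's -wi option); the summary does not say how the weights were applied to the ANN, so that case is not covered.

```python
from typing import List, Sequence

# C1 (prominent) -> +1 with W1 = 10, C2 (non-prominent) -> -1 with W2 = 1
CLASS_WEIGHTS = {+1: 10.0, -1: 1.0}


def weighted_hinge_objective(w: Sequence[float], b: float,
                             X: List[Sequence[float]], y: List[int],
                             C: float = 1.0) -> float:
    """Class-weighted soft-margin SVM primal objective:
    0.5 * ||w||^2 + C * sum_i W[y_i] * max(0, 1 - y_i * (w . x_i + b))
    """
    reg = 0.5 * sum(wj * wj for wj in w)
    loss = 0.0
    for xi, yi in zip(X, y):
        margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
        loss += CLASS_WEIGHTS[yi] * max(0.0, 1.0 - margin)
    return reg + C * loss
```

Minimizing this objective pushes the decision boundary away from the minority class C1, which is the usual motivation for weighting when prominent users are rare.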

4.3 Feature Categories Effectiveness

  • In order to select the most representative feature categories for microblog prominent users in a disaster, the authors evaluate the effectiveness of each category of the state-of-the-art features separately.
  • The remaining categories of features have yielded poor results and failed to identify microblog prominent users.
  • Therefore, the authors study the effectiveness of these categories when they are combined with the OnAF feature category.
  • The authors note that combining the two categories of features OnAF and OfAF improves the identification results using both ANN and SVM.
  • Results for identifying prominent users with these new combinations of feature categories are reported in Table 8.
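With five categories, testing "all combinations" means every non-empty subset, 2^5 - 1 = 31 feature sets, of which 16 include OnAF (OnAF alone plus each of the 15 non-empty subsets of the other four categories). The short sketch below enumerates them; the category labels are those of Section 3, and the enumeration is an illustration of the experimental design rather than the authors' code.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

CATEGORIES = ("PrF", "OnAF", "OfAF", "SpF", "SnF")


def all_combinations(cats: Sequence[str] = CATEGORIES) -> List[Tuple[str, ...]]:
    """Every non-empty subset of the feature categories, smallest first."""
    return [c for k in range(1, len(cats) + 1) for c in combinations(cats, k)]


combos = all_combinations()
with_onaf = [c for c in combos if "OnAF" in c]  # the OnAF-based combinations
print(len(combos), len(with_onaf))  # 31 16
```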

5 Discussion

  • The results of this study lead the authors to conclude that the on-topical and off-topical activity feature categories, together with the spatial feature category, are important for learning an efficient SVM classification model that identifies prominent microblog users during disasters in real time.
  • On- and off-topical features are extremely useful in disaster management scenarios, where prominent users focus mainly on sharing disaster-related information.
  • Off-topical activity features thus penalize users who toggle between different topics.
  • These results were validated on the Herault floods dataset.
  • Evaluating these features on more microblog disaster datasets would require open access to Twitter data, which could not be afforded.




A Comparative Study of Microblogs Features Effectiveness for the Identification of Prominent Microblog Users during Unexpected Disaster
Imen Bizid (1,2), Nibal Nayef (1), Oumayma Naoui (2), Patrice Boursier (1,3), and Sami Faiz (2)
(1) L3i Laboratory, University of La Rochelle, La Rochelle, France
{imen.bizid,patrice.boursier,nibal.nayef}@univ-lr.fr
(2) LTSIRS Laboratory, Tunis, Tunisia
sami.faiz@insat.rnu.tn
(3) IUMW, Kuala Lumpur, Malaysia
patrice@iumw.edu.my
Abstract. This paper presents a learning-based approach for selecting relevant feature categories for information retrieval from microblogs during unexpected disasters. Our information retrieval strategy consists of identifying prominent microblog users who are likely to share relevant and exclusive information during a disaster. To identify these users, we evaluate the effectiveness of state-of-the-art features characterizing microblog users for identifying prominent users in a specific context. We experimented with different sets of feature categories to determine those that discriminate prominent users from non-prominent ones interacting on Twitter during the 2014 Herault floods in France. The results show that on- and off-topical user activity features are the most representative features for identifying prominent users in a disaster context. We also note that SVM outperforms the ANN learning algorithm in this classification context, especially when it is trained with additional spatial features.
Key words: Effectiveness of feature categories, prominent microblog users, disaster management
1 Introduction
Climate change unleashes a multitude of unexpected disaster characteristics and effects that have never been seen on our planet: heat waves in summer, winters without snow, climate disruption, and floods in some regions of Europe while neighboring regions suffer terrible droughts. Climate change manifests itself in diverse unexpected forms, and such phenomena still turn into disasters, causing irreversible damage in many places on our planet.
The challenges of managing such disasters relate especially to situation awareness and real-time information collection. The need for emergency teams to enter the affected zones, risking their lives, in order to collect information

2 Imen Bizid, Nibal Nayef, Oumayma Naoui, Patrice Boursier, and Sami Faiz
about what is taking place diminishes greatly. People from surrounding areas can provide nearly real-time observations of disaster scenes by posting on microblogs. Citizens in the affected zones can share what they are experiencing, seeing or hearing during a disaster. These microblogging platforms are a rich source of information, essential for gaining an accurate picture of what is happening on the ground and for managing unexpected disasters efficiently.
Although microblogs such as Twitter provide many specific signals (e.g. number of favorites, number of retweets of a tweet, etc.) reflecting other users' feedback on the shared information, it is still challenging to retrieve relevant and exclusive information from the huge amount of shared data. These signals remain inaccurate, as they mostly reflect information shared by popular users regardless of the relevance and freshness of its content. Therefore, it is more rational to associate the relevance and quality of the shared information with the user's prominence during the disaster.
By tracking prominent microblog users who share relevant and exclusive information during an unexpected event, emergency first responders can obtain a real-time global view of what is happening in the threatened or affected areas. The identification of such key users has been widely explored for finding influencers and domain experts. However, it has never been explored for identifying prominent users during unexpected disasters.
In the context of this paper, prominent users are microblog users who are likely to share exclusive and relevant information during a given unexpected event. Finding such users generally depends on the effectiveness of the selected categories of features describing microblog users in the specific context.
In this paper, we evaluate the effectiveness of both state-of-the-art features and our previously proposed features describing microblog users for identifying prominent microblog users in the context of unexpected disasters. This study focuses on selecting the most descriptive feature categories for identifying these key microblog users during disasters.
The rest of this paper is organized as follows. Section 2 reviews related work on identifying prominent microblog users. Section 3 describes the different categories of features evaluated in this paper. Experimental results are presented in Section 4 and discussed in Section 5. Finally, we conclude this paper with directions for future work in Section 6.
2 Related Work
Current information retrieval systems in microblogs for disaster management are mostly based on content analysis of microblog posts. The Tweak the Tweet system [1] provided a hashtag-based syntax that makes the huge amount of information shared in microblogs during disasters easier to process with text mining. Imran et al. [2] proposed a classification model for disaster-related information extraction

Prominent microblog users Identification 3
by analyzing the text content of tweets. The MicroFilters system [3] extracted valuable disaster-related images shared in microblogs using image analysis techniques. These systems have yielded promising results for identifying and classifying disaster-related content. However, they are computationally expensive and remain sensitive to redundant and outdated information. Moreover, it is more logical to identify prominent users who may share relevant and exclusive information during the disaster, and to track them in order to access their disaster-related information in real time.
To the best of our knowledge, the identification of prominent users has never been studied in the context of disasters. However, it has been widely explored in other contexts, where key users are defined as influential users in the network or as domain experts who are active and popular in a specific topic or domain [4, 5, 6].
Existing approaches for identifying social media influencers are based on standard centrality measures such as eigenvector centrality and its variants HITS [7] and PageRank [8]. These measures, adapted to microblog specificities (e.g. number of tweets, mentions, retweets ...), are computationally expensive and favor well-connected users (e.g. celebrities, communication channels ...) [6]. Therefore, these approaches cannot be used in real-time scenarios, and they cannot identify users sharing fresh information during unexpected disasters.
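To make the cost argument concrete, here is a minimal textbook PageRank computed by power iteration, assuming a follower graph where an edge u -> v means that u follows v. It is a generic sketch, not the microblog-adapted variants of [7, 8]: every iteration touches every edge, and the loop repeats until the ranks stabilize, which becomes expensive on a network with millions of users and has to be recomputed as the graph changes during an event. The damping factor 0.85 and the convergence threshold are conventional defaults chosen here, not values from the paper.

```python
from typing import Dict, List


def pagerank(graph: Dict[str, List[str]], d: float = 0.85,
             tol: float = 1e-10, max_iter: int = 200) -> Dict[str, float]:
    """Power-iteration PageRank; graph maps node -> nodes it links to (follows).
    Rank held by dangling nodes (no out-links) is spread uniformly."""
    nodes = set(graph) | {v for outs in graph.values() for v in outs}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        new = {v: (1.0 - d) / n + d * dangling / n for v in nodes}
        for u, outs in graph.items():  # one pass over every edge per iteration
            if outs:
                share = d * rank[u] / len(outs)
                for v in outs:
                    new[v] += share
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank
```

The ranks always sum to 1, and a heavily followed account such as a celebrity accumulates rank whether or not it posts anything about the disaster, which is the bias noted above.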
Apart from the above studies, domain expert identification has been explored using supervised and unsupervised learning techniques based on sets of features describing user activities restricted to the particular topics analyzed [9, 10, 11]. IA-Rank [12] ranked users based on features characterizing how the user name is amplified via mentions, replies or retweets by other users. Pal et al. [9] proposed an identification model using a set of features characterizing microblog users according to the nature of their activities and their social position in the network. Xianlei et al. [10] used a gradient boosted decision tree to identify domain experts in Sina Microblog based on profile and tweeting behavior features.
These features have yielded promising results in identifying domain experts. However, they have never been explored for identifying prominent users during disasters. Hence, in this paper, we evaluate the effectiveness of the different categories of both state-of-the-art features and our previously proposed features [13, 14] in a disaster context.
3 User Modeling Using Microblogs Features
The problem of identifying prominent users in a disaster context can be cast as a binary classification problem. Many supervised learning algorithms can be used to learn the classification model for this purpose. The performance of these algorithms depends on how well the selected features model user behavior in a disaster context: the more representative the features are of prominent and non-prominent user behavior during a disaster, the more effective the learned classifiers.
To learn the classification model, we study a large set of state-of-the-art features and some new features proposed in our prior work [14, 13]. These features may reflect the behavior and importance of each user interacting about the disaster. We split them into five broad categories: profile features (PrF), on-topical features (OnAF), off-topical features (OfAF), spatial features (SpF) and social network structure features (SnF). The rest of this section describes these categories of user features in depth.
3.1 Profile Features
Profile Features (PrF) characterize the user's profile description. This description is either entered by the users themselves (e.g. location, domains of interest ...) or generated automatically by the microblogging service to report the user's activity in the network (e.g. number of collected favorites, number of followers ...). Table 1 presents the set of user profile features. These features can be extracted from any user profile using the Twitter APIs.
Table 1. Profile Features (PrF) from the microblogging platform Twitter.
Name  Feature                                Ref.
P1    Certified user                         [10]
P2    Enabled geolocation                    [14]
P3    Protected                              [10]
P4    Number of produced tweets              [10]
P5    Number of collected favorites          [14]
P6    Creation date of the Twitter account   [14]
P7    Number of followers                    [10]
P8    Number of followees                    [10]
PrF give a general representation of each user regardless of their activity during the disaster. At first sight, P2 and P1 may be descriptive of prominent users during disasters. P2 relates to the precision of the user's information during the disaster, when geographic information is important. P1 can serve as a strong indicator of the veracity of the information shared by each user. P7 and P8 are generally used to detect celebrities and domain experts; they are evaluated here to study whether there is a correlation between a user's popularity in the network and their prominence during unexpected disasters. Finally, P4 and P5, which reflect the user's activity in the network, are studied to evaluate whether users who are generally active in the network are prominent during unexpected disasters, and vice versa.
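As a hedged sketch of how Table 1 might be read from the Twitter API, the function below maps a user object to P1-P8. The field names (verified, geo_enabled, protected, statuses_count, favourites_count, created_at, followers_count, friends_count) follow the Twitter REST API v1.1 user object as we understand it; treat the mapping as an assumption. In particular, favourites_count counts tweets the user has liked, which may differ from what the paper means by "collected favorites", and encoding P6 as account age in days at a reference time is our own choice.

```python
from datetime import datetime
from typing import Any, Dict


def profile_features(user: Dict[str, Any], now: datetime) -> Dict[str, float]:
    """Hypothetical P1-P8 extraction from a v1.1-style user dict.
    `now` must be timezone-aware, like the parsed created_at."""
    created = datetime.strptime(user["created_at"], "%a %b %d %H:%M:%S %z %Y")
    return {
        "P1_certified": float(user.get("verified", False)),
        "P2_geo_enabled": float(user.get("geo_enabled", False)),
        "P3_protected": float(user.get("protected", False)),
        "P4_tweets": float(user.get("statuses_count", 0)),
        "P5_favorites": float(user.get("favourites_count", 0)),  # assumption, see above
        "P6_account_age_days": float((now - created).days),      # P6 encoded as age
        "P7_followers": float(user.get("followers_count", 0)),
        "P8_followees": float(user.get("friends_count", 0)),
    }
```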

3.2 User Activity Features
Microblog users can express what they are seeing, hearing and experiencing during a disaster using different types of tweets:
User's own produced tweets are original tweets shared by the profile owner. Their content is simple and includes no retweet or mention symbols.
Mention tweets are addressed to particular users to make them aware of a piece of information. They include the @ symbol followed by the username of each addressee.
Repeated tweets are original tweets posted by someone else and rebroadcast by the user to share them with their followers. These tweets are informally called retweets and can be identified by the "RT @username" label automatically inserted at the beginning of the tweet.
All three types of tweets can carry valuable content that is indispensable for managing unexpected disasters. Hence, we need to analyze every type of tweet shared by users interacting during a disaster in order to identify the prominent ones.
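The three tweet types can be told apart with a simple text heuristic, sketched below under the assumption that only the tweet text is available. The function name and rules are illustrative: a leading "RT @username" marks a retweet, any other @username marks a mention tweet, and everything else is treated as the user's own tweet. When full API tweet objects are available, fields such as retweeted_status and entities.user_mentions would likely be more reliable than parsing text.

```python
import re

RETWEET = re.compile(r"RT\s?@\w+")    # "RT @user" or "RT@user" at the start
MENTION = re.compile(r"(?<!\w)@\w+")  # @user not preceded by a word char (skips e-mails)


def tweet_type(text: str) -> str:
    """Heuristic: 'retweet', 'mention' or 'own', based only on the tweet text."""
    stripped = text.lstrip()
    if RETWEET.match(stripped):  # checked first: retweets also contain @user
        return "retweet"
    if MENTION.search(stripped):
        return "mention"
    return "own"
```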
Moreover, to differentiate the activities of prominent users from those of non-prominent ones, we analyze both the user's on-topic tweets related to the disaster and the off-topic ones. The categorization of on- and off-topic user activities was proposed in our prior work [14] under the assumption that users affected by the disaster would be interested only in disaster news and would neglect any other off-topic information diffused in the network.
Thus, we divide the user activity features during the disaster into two categories, On-topic Activities Features (OnAF) and Off-topic Activities Features (OfAF), measured respectively on the user's on-topic and off-topic activities:
On-topic: an activity is considered on-topic when it contains a subset of a predefined list of keywords and hashtags describing the unexpected disaster under consideration.
Off-topic: an off-topic activity is any activity that was not recorded as on-topic.
Additionally, in this paper we assume that a tweet referring to the disaster but including at least one keyword reflecting non-serious or non-valuable content (e.g. advertising or joke words and symbols such as sale, rent, pub, lol and so on) is directly recorded as off-topic. Thus, users who share non-valuable content can be penalized.
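The on/off-topic rule above can be sketched as follows. We read "contains a subset of a list of keywords and hashtags" as "contains at least one of them", which is an assumption; the keyword lists used in the test are made up for illustration and are not the lists used for the Herault floods. Tokens are matched exactly after lowercasing, so hashtags and plain words must be listed as they appear.

```python
import re
from typing import Iterable

TOKEN = re.compile(r"[#@]?\w+")  # words, hashtags and @mentions


def is_on_topic(text: str, topic_terms: Iterable[str],
                noise_terms: Iterable[str]) -> bool:
    """Hypothetical on-topic test: a disaster term must appear and no noise term."""
    tokens = {t.lower() for t in TOKEN.findall(text)}
    topic = {t.lower() for t in topic_terms}
    noise = {t.lower() for t in noise_terms}
    if not tokens & topic:
        return False  # no disaster keyword or hashtag: off-topic
    if tokens & noise:
        return False  # disaster-related but non-valuable (ads, jokes): off-topic
    return True
```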
Our rationale for separating on-topic and off-topic activities is to penalize users who toggle among several topics and may share outdated information. With this strategy, users are evaluated based on their impact on the analyzed disaster and on the strength of their attachment to it. For example, top news outlets sharing news about several topics are penalized because they do not focus mainly on the analyzed disaster.
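The activity feature table is not reproduced in this extract, so the features below are hypothetical. The sketch counts on- and off-topic tweets per tweet type and adds an on-topic ratio, showing how a user who toggles between topics, such as a general news outlet, ends up with a lower score than a user posting only about the disaster.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def activity_features(tweets: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """tweets: (tweet_type, on_topic) pairs for one user over the collection window,
    with tweet_type in {'own', 'mention', 'retweet'}."""
    counts = Counter(tweets)
    feats: Dict[str, float] = {}
    for ttype in ("own", "mention", "retweet"):
        feats[f"on_{ttype}"] = float(counts[(ttype, True)])
        feats[f"off_{ttype}"] = float(counts[(ttype, False)])
    n_on = sum(v for (_, on), v in counts.items() if on)
    total = sum(counts.values())
    feats["on_topic_ratio"] = n_on / total if total else 0.0  # topic-toggling penalty
    return feats
```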

Citations

Journal Article
Qi Li1, Cong Wei1, Jianning Dang1, Lei Cao2, Li Liu1 
TL;DR: The findings suggest that the public’s feedback on COVID-19 predated official accounts on the microblog platform, and there were clear differences in the trending events that large users (users with many fans and readings) and common users paid attention to during each phase of CO VID-19.
Abstract: Objective: Coronavirus disease 2019 (COVID-19) has caused substantial panic worldwide since its outbreak in December 2019. This study uses social networks to track the evolution of public emotion during COVID-19 in China and analyzes the root causes of these public emotions from an event-driven perspective. Methods: A dataset was constructed using microblogs (n = 125,672) labeled with COVID-19-related super topics (n = 680) from 40,891 users from 1 December 2019 to 17 February 2020. Based on the skeleton and key change points of COVID-19 extracted from microblogging contents, we tracked the public’s emotional evolution modes (accumulated emotions, emotion covariances, and emotion transitions) by time phase and further extracted the details of dominant social events. Results: Public emotions showed different evolution modes during different phases of COVID-19. Events about the development of COVID-19 remained hot, but generally declined, and public attention shifted to other aspects of the epidemic (e.g., encouragement, support, and treatment). Conclusions: These findings suggest that the public’s feedback on COVID-19 predated official accounts on the microblog platform. There were clear differences in the trending events that large users (users with many fans and readings) and common users paid attention to during each phase of COVID-19.

10 citations


Cites background from "A Comparative Study of Microblogs F..."

  • ...In addition, Reference [18] presented a learning-based approach to identify prominent microblog users susceptible to sharing relevant and exclusive information in a disaster case....



Dissertation
13 Dec 2016
TL;DR: The different proposed approaches leading to the prediction of prominent users who are susceptible to share the targeted relevant and exclusive information on one hand and enabling emergency responders to have a real-time access to the required information in all formats on the other hand are detailed.
Abstract: During crisis events such as disasters, the need for real-time information retrieval (IR) from microblogs remains inevitable. However, the huge amount and the variety of the shared information in real time during such events over-complicate this task. Unlike existing IR approaches based on content analysis, we propose to tackle this problem by using user-centric IR approaches with solving the wide spectrum of methodological and technological barriers inherent to: 1) the collection of the evaluated users data, 2) the modeling of user behavior, 3) the analysis of user behavior, and 4) the prediction and tracking of prominent users in real time. In this context, we detail the different proposed approaches in this dissertation leading to the prediction of prominent users who are susceptible to share the targeted relevant and exclusive information on one hand and enabling emergency responders to have a real-time access to the required information in all formats (i.e. text, image, video, links) on the other hand. These approaches focus on three key aspects of prominent users identification. Firstly, we have studied the efficiency of state-of-the-art and new proposed raw features for characterizing user behavior during crisis events. Based on the selected features, we have designed several engineered features qualifying user activities by considering both their on-topic and off-topic shared information. Secondly, we have proposed a phase-aware user modeling approach taking into account the user behavior change according to the event evolution over time. This user modeling approach comprises the following novel aspects: (1) modeling microblog users behavior evolution by considering the different event phases; (2) characterizing users activity over time through a temporal sequence representation; (3) time-series-based selection of the most discriminative features characterizing users at each event phase.
Thirdly, based on this proposed user modeling approach, we train various prediction models to learn to differentiate between prominent and non-prominent users behavior during crisis events. The learning task has been performed using SVM and MoG-HMMs supervised machine learning algorithms. The efficiency and efficacy of these prediction models have been validated thanks to the data collections extracted by our multi-agents system MASIR during two flooding events that occurred in France and the different ground-truths related to these collections.

7 citations


Cites methods from "A Comparative Study of Microblogs F..."

  • ...First, we describe the MASIR extraction module designed for boosting historic Twitter data access....


  • ...Second, we present the MASIR tracking module for real-time...


  • ...3.4.1 MASIR Tracking Principle; 3.4.2 MASIR Tracking Agents Role...


  • ...3.4 MASIR for Real-time Tracking of Key Microblog Users...


  • ...In chapter 3, we present a modular Multi-Agent System for Information extraction and Retrieval (MASIR)....



Journal Article
Abstract: The purpose of this paper is to investigate how government affairs micro-blog (also referred to as GAM) are applied to the disclosure of government emergency information in China, to identify its existing problems and to provide solutions. In this paper, online research, case analysis and other methods were used to analyze the application status of China’s Government micro-blog in emergency information disclosure in recent years. Based on the relevant data and cases, a systematic theoretical research is conducted according to the established research framework. There are some problems in the application of GAM to crisis management, such as insufficient information dissemination, incomplete information disclosure, fragmentation of information and lack of dynamic updating and communication. So, it is necessary to strengthen the organization and management of GAM, establish a perfect emergency management mechanism of GAM, increase the positive influence of GAM on public opinions and establish an evaluation accountability system of administrative micro-blog management. The analysis of the application of GAM to the disclosure of government emergency information and the proposed strategies for improving its performance are all original, and they are both meaningful to more effective usage of GAM and facilitation of government emergency information disclosure.

3 citations


Book Chapter
01 Jan 2017
TL;DR: This chapter proposes a new approach for microblog information retrieval during unexpected disasters by identifying prominent microblog users who are susceptible to share relevant and exclusive information during a specific disaster.
Abstract: This chapter proposes a new approach for microblog information retrieval during unexpected disasters. This approach consists of identifying prominent microblog users who are susceptible to share relevant and exclusive information during a specific disaster. By tracking these users, emergency first responders would benefit from a direct access to the valuable information shared in real time in microblogs. In order to identify such users, we represent each microblog user according to his behavior at each particular disaster phase. Through the proposed users’ representation, different prediction models are learned in order to identify prominent users at an early stage of each disaster phase. We experimented with different user representations, taking into account both the microblog user behavior and disaster context specificities. We also analyzed the importance of the different microblog users’ features categories according to the disaster phase context. The achieved experimental results show the efficiency of our phase-aware-user characterization approach.
Microblogs Information Retrieval for Disaster Management: Identification of Prominent Microblog Users in the Context of Disasters

2 citations


Cites background from "A Comparative Study of Microblogs F..."

  • ...The already collected information related to standard disasters is not anymore sufficient to deal with these unanticipated disasters patterns (Bizid et al., 2015)....



References

Proceedings Article
26 Apr 2010
Abstract: Twitter, a microblogging service less than three years old, commands more than 41 million users as of July 2009 and is growing fast. Twitter users tweet about any topic within the 140-character limit and follow others to receive their tweets. The goal of this paper is to study the topological characteristics of Twitter and its power as a new medium of information sharing. We have crawled the entire Twitter site and obtained 41.7 million user profiles, 1.47 billion social relations, 4,262 trending topics, and 106 million tweets. In its follower-following topology analysis we have found a non-power-law follower distribution, a short effective diameter, and low reciprocity, which all mark a deviation from known characteristics of human social networks [28]. In order to identify influentials on Twitter, we have ranked users by the number of followers and by PageRank and found two rankings to be similar. Ranking by retweets differs from the previous two rankings, indicating a gap in influence inferred from the number of followers and that from the popularity of one's tweets. We have analyzed the tweets of top trending topics and reported on their temporal behavior and user participation. We have classified the trending topics based on the active period and the tweets and show that the majority (over 85%) of topics are headline news or persistent news in nature. A closer look at retweets reveals that any retweeted tweet is to reach an average of 1,000 users no matter what the number of followers is of the original tweet. Once retweeted, a tweet gets retweeted almost instantly on next hops, signifying fast diffusion of information after the 1st retweet. To the best of our knowledge this work is the first quantitative study on the entire Twittersphere and information diffusion on it.

5,761 citations
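Kwak et al. compare user rankings by follower count, by PageRank and by retweets. One common way to quantify how similar two such rankings are is a rank correlation coefficient such as Kendall's tau; the abstract does not say which measure was used, so the sketch below is only an illustration. The function name and the follower and retweet counts are hypothetical.

```python
from itertools import combinations


def kendall_tau(scores_a: dict[str, float], scores_b: dict[str, float]) -> float:
    """Kendall's tau-a between two scorings of the same set of users.

    A pair of users is concordant if both scorings order it the same way,
    discordant if they order it oppositely; tied pairs count as neither.
    """
    if set(scores_a) != set(scores_b):
        raise ValueError("both rankings must cover the same users")
    users = sorted(scores_a)
    n_pairs = len(users) * (len(users) - 1) // 2
    if n_pairs == 0:
        raise ValueError("need at least two users")
    concordant = discordant = 0
    for u, v in combinations(users, 2):
        sign = (scores_a[u] - scores_a[v]) * (scores_b[u] - scores_b[v])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / n_pairs


# Hypothetical counts for four users.
followers = {"a": 900, "b": 500, "c": 120, "d": 40}
retweets = {"a": 3, "b": 60, "c": 10, "d": 25}
print(kendall_tau(followers, followers))  # 1.0
print(kendall_tau(followers, retweets))   # -0.3333333333333333
```

A value near 1 means the two rankings largely agree (as reported for followers vs PageRank); a value near 0 or below means they diverge (as reported for retweets).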


Proceedings Article
04 Feb 2010
TL;DR: Experimental results show that TwitterRank, an extension of PageRank proposed to measure the influence of users in Twitter, outperforms the ranking Twitter currently uses and other related algorithms, including the original PageRank and Topic-sensitive PageRank.
Abstract: This paper focuses on the problem of identifying influential users of micro-blogging services. Twitter, one of the most notable micro-blogging services, employs a social-networking model called "following", in which each user can choose who she wants to "follow" to receive tweets from without requiring the latter to give permission first. In a dataset prepared for this study, it is observed that (1) 72.4% of the users in Twitter follow more than 80% of their followers, and (2) 80.5% of the users have 80% of users they are following follow them back. Our study reveals that the presence of "reciprocity" can be explained by phenomenon of homophily. Based on this finding, TwitterRank, an extension of PageRank algorithm, is proposed to measure the influence of users in Twitter. TwitterRank measures the influence taking both the topical similarity between users and the link structure into account. Experimental results show that TwitterRank outperforms the one Twitter currently uses and other related algorithms, including the original PageRank and Topic-sensitive PageRank.

1,864 citations


"A Comparative Study of Microblogs F..." cites this paper for background

  • ...However, it has been widely explored in the different contexts defining key users as influential users in the network or as domain experts who are active and popular in a specific topic or domain [4, 5, 6]....

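TwitterRank weights PageRank's link-following step by topical similarity. The following is a simplified sketch based on our reading of the TwitterRank formulation, not a reference implementation: for a topic t, a follower i passes influence to a followee j in proportion to j's share of the tweets published by everyone i follows, scaled by a similarity 1 - |DT(i,t) - DT(j,t)| between the two users' interest in t, and random jumps are biased toward users interested in t. The function name, the data layout and gamma = 0.85 are assumptions; in practice the per-user topic distributions would come from a topic model such as LDA.

```python
def twitterrank(
    follows: dict[str, list[str]],
    tweet_counts: dict[str, int],
    topic_dist: dict[str, dict[str, float]],
    topic: str,
    gamma: float = 0.85,
    iters: int = 200,
    tol: float = 1e-12,
) -> dict[str, float]:
    """Topic-specific influence scores (simplified TwitterRank-style sketch).

    follows:      follower -> list of users they follow
    tweet_counts: user -> number of tweets published (every user must appear)
    topic_dist:   user -> {topic: probability}, e.g. from a topic model
    """
    users = sorted(tweet_counts)
    # Teleport vector: users' interest in the topic, normalised to sum to 1.
    interest = {u: topic_dist[u].get(topic, 0.0) for u in users}
    total = sum(interest.values())
    if total == 0:
        raise ValueError(f"no user has any interest in topic {topic!r}")
    teleport = {u: interest[u] / total for u in users}

    # Transition weight for each (follower i, followee j) pair.
    trans: dict[tuple[str, str], float] = {}
    for i, followees in follows.items():
        denom = sum(tweet_counts[j] for j in followees)
        if denom == 0:
            continue  # i follows nobody who has tweeted
        for j in followees:
            sim = 1.0 - abs(topic_dist[i].get(topic, 0.0) - topic_dist[j].get(topic, 0.0))
            trans[(i, j)] = tweet_counts[j] / denom * sim

    # Power iteration: each followee j collects influence from its followers i.
    rank = dict(teleport)
    for _ in range(iters):
        new = {u: (1.0 - gamma) * teleport[u] for u in users}
        for (i, j), p in trans.items():
            new[j] += gamma * p * rank[i]
        delta = sum(abs(new[u] - rank[u]) for u in users)
        rank = new
        if delta < tol:
            break
    return rank


# Hypothetical example: a and b follow c, c follows a.
scores = twitterrank(
    follows={"a": ["c"], "b": ["c"], "c": ["a"]},
    tweet_counts={"a": 10, "b": 10, "c": 10},
    topic_dist={u: {"flood": 0.5} for u in "abc"},
    topic="flood",
)
print(sorted(scores, key=scores.get, reverse=True))  # ['c', 'a', 'b']
```

Because the similarity factor is at most 1, scores need not sum to 1 in general; they are meant to be compared across users for the same topic.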


Proceedings Article
05 Jan 2010
TL;DR: This paper examines the practice of retweeting as a way by which participants can be "in a conversation" and highlights how authorship, attribution, and communicative fidelity are negotiated in diverse ways.
Abstract: Twitter - a microblogging service that enables users to post messages ("tweets") of up to 140 characters - supports a variety of communicative practices; participants use Twitter to converse with individuals, groups, and the public at large, so when conversations emerge, they are often experienced by broader audiences than just the interlocutors. This paper examines the practice of retweeting as a way by which participants can be "in a conversation." While retweeting has become a convention inside Twitter, participants retweet using different styles and for diverse reasons. We highlight how authorship, attribution, and communicative fidelity are negotiated in diverse ways. Using a series of case studies and empirical data, this paper maps out retweeting as a conversational practice.

1,820 citations


"A Comparative Study of Microblogs F..." cites this paper for background

  • ...T8 Number of unique users who retweeted author’s tweets [16, 14] + +...


  • ...T5 Number of retweets of other’s tweets [16, 10, 14] + + T6 Number of unique users retweeted by the user [14] + +...

    [...]


Journal Article
01 Nov 2000
TL;DR: The issues of posterior probability estimation, the link between neural and conventional classifiers, learning and generalization tradeoff in classification, the feature variable selection, as well as the effect of misclassification costs are examined.
Abstract: Classification is one of the most active research and application areas of neural networks. The literature is vast and growing. This paper summarizes some of the most important developments in neural network classification research. Specifically, the issues of posterior probability estimation, the link between neural and conventional classifiers, learning and generalization tradeoff in classification, the feature variable selection, as well as the effect of misclassification costs are examined. Our purpose is to provide a synthesis of the published research in this area and stimulate further research interests and efforts in the identified topics.

1,602 citations


"A Comparative Study of Microblogs F..." cites this paper for methods

  • ...However, training an ANN classification model based on these same categories, decreases the identification performance compared to the previous resulted ANN learned based on OnAF and OfAF categories....

    [...]

  • ...We tested different combinations that may lead to an efficient classification model using two different learning algorithms ANN and SVM....


  • ...We note that combining the two categories of features OnAF and OfAF improves the identification results using both ANN and SVM....

    [...]

  • ...As there is no parameters to tune the class weights using ANN, we have duplicated the dataset of prominent users by 30 in order to balance the two datasets of prominent and non prominent users in the training phase of ANN....

    [...]

  • ...Support Vector Machine (SVM) [19] and Artificial Neural Networks (ANN) [20] are used for this study....

    [...]


Proceedings Article
11 Feb 2008
TL;DR: This paper introduces a general classification framework for combining the evidence from different sources of information, which can be tuned automatically for a given social media type and quality definition, and shows that the system is able to separate high-quality items from the rest with an accuracy close to that of humans.
Abstract: The quality of user-generated content varies drastically from excellent to abuse and spam. As the availability of such content increases, the task of identifying high-quality content in sites based on user contributions (social media sites) becomes increasingly important. Social media in general exhibit a rich variety of information sources: in addition to the content itself, there is a wide array of non-content information available, such as links between items and explicit quality ratings from members of the community. In this paper we investigate methods for exploiting such community feedback to automatically identify high quality content. As a test case, we focus on Yahoo! Answers, a large community question/answering portal that is particularly rich in the amount and types of content and social interactions available in it. We introduce a general classification framework for combining the evidence from different sources of information, that can be tuned automatically for a given social media type and quality definition. In particular, for the community question/answering domain, we show that our system is able to separate high-quality items from the rest with an accuracy close to that of humans.

1,248 citations


"A Comparative Study of Microblogs F..." cites this paper for background

  • ...Existing approaches for the identification of social media influencers are based on standard centrality measures such as eigenvector centrality and its variants HITS [7] and PageRank [8]....

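The centrality measures named in the excerpt, PageRank and the related HITS algorithm, can both be computed by power iteration on the follower graph. A minimal sketch in plain Python, assuming an edge u -> v means "u follows v" (so v receives the endorsement), a damping factor of 0.85, and that the rank of users who follow nobody is spread uniformly; the graph is hypothetical and this is not the implementation used in any of the cited works.

```python
import math

Graph = dict[str, list[str]]  # u -> users that u follows (edge u -> v)


def _nodes(graph: Graph) -> list[str]:
    return sorted(set(graph) | {v for outs in graph.values() for v in outs})


def pagerank(graph: Graph, d: float = 0.85, iters: int = 100) -> dict[str, float]:
    nodes = _nodes(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Rank held by users who follow nobody is spread over all users.
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        new = {v: (1.0 - d) / n + d * dangling / n for v in nodes}
        for u, outs in graph.items():
            if outs:
                share = d * rank[u] / len(outs)
                for v in outs:
                    new[v] += share
        rank = new
    return rank


def hits(graph: Graph, iters: int = 50) -> tuple[dict[str, float], dict[str, float]]:
    nodes = _nodes(graph)
    hub = {v: 1.0 for v in nodes}
    auth = {v: 0.0 for v in nodes}
    for _ in range(iters):
        # Authority: sum of the hub scores of the users pointing to you.
        auth = {v: 0.0 for v in nodes}
        for u, outs in graph.items():
            for v in outs:
                auth[v] += hub[u]
        norm = math.sqrt(sum(a * a for a in auth.values())) or 1.0
        auth = {v: a / norm for v, a in auth.items()}
        # Hub: sum of the authority scores of the users you point to.
        hub = {u: sum(auth[v] for v in graph.get(u, [])) for u in nodes}
        norm = math.sqrt(sum(h * h for h in hub.values())) or 1.0
        hub = {u: h / norm for u, h in hub.items()}
    return hub, auth


g = {"a": ["c"], "b": ["c"], "c": ["a"]}
pr = pagerank(g)
hub, auth = hits(g)
print(max(pr, key=pr.get), max(auth, key=auth.get))  # c c
```

PageRank yields a single influence score per user, while HITS separates users who are followed by good hubs (authorities) from users who follow good authorities (hubs).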


Frequently Asked Questions (2)
Q1. What have the authors contributed in "A comparative study of microblogs features effectiveness for the identification of prominent microblog users during unexpected disaster" ?

This paper presents a learning-based approach for the selection of relevant feature categories in the context of information retrieval from microblogs during unexpected disasters. The authors also note that SVM outperforms the ANN learning algorithm for this classification context especially when it is trained with additional spatial features. 

For future work, the authors aim to analyze the effectiveness of each feature characterizing prominent users independently of their category using different feature selection algorithms. Moreover, the authors wish to propose additional engineered features which are more representative for active microblog users in the context of disaster management.