
Showing papers on "Microblogging" published in 2013


Journal ArticleDOI
TL;DR: It is found that emotionally charged Twitter messages tend to be retweeted more often and more quickly compared to neutral ones, and companies should pay more attention to the analysis of sentiment related to their brands and products in social media communication as well as in designing advertising content that triggers emotions.
Abstract: As a new communication paradigm, social media has promoted information dissemination in social networks. Previous research has identified several content-related features as well as user and network characteristics that may drive information diffusion. However, little research has focused on the relationship between emotions and information diffusion in a social media setting. In this paper, we examine whether sentiment occurring in social media content is associated with a user's information sharing behavior. We carry out our research in the context of political communication on Twitter. Based on two data sets of more than 165,000 tweets in total, we find that emotionally charged Twitter messages tend to be retweeted more often and more quickly compared to neutral ones. As a practical implication, companies should pay more attention to the analysis of sentiment related to their brands and products in social media communication as well as in designing advertising content that triggers emotions.

1,146 citations


Posted Content
TL;DR: Data collected using Twitter's sampled API service is compared with data collected using the full, albeit costly, Firehose stream that includes every single published tweet to help researchers and practitioners understand the implications of using the Streaming API.
Abstract: Twitter is a social media giant famous for the exchange of short, 140-character messages called "tweets". In the scientific community, the microblogging site is known for openness in sharing its data. It provides a glance into its millions of users and billions of tweets through a "Streaming API" which provides a sample of all tweets matching some parameters preset by the API user. The API service has been used by many researchers, companies, and governmental institutions that want to extract knowledge in accordance with a diverse array of questions pertaining to social media. The essential drawback of the Twitter API is the lack of documentation concerning what and how much data users get. This leads researchers to question whether the sampled data is a valid representation of the overall activity on Twitter. In this work we embark on answering this question by comparing data collected using Twitter's sampled API service with data collected using the full, albeit costly, Firehose stream that includes every single published tweet. We compare both datasets using common statistical metrics as well as metrics that allow us to compare topics, networks, and locations of tweets. The results of our work will help researchers and practitioners understand the implications of using the Streaming API.

848 citations
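
The abstract above does not name the metrics used to compare the sampled stream with the Firehose. As a rough illustration only, the sketch below computes two simple comparisons one might start from: the fraction of Firehose tweet IDs that also appear in the sample, and the Jaccard overlap of each stream's most frequent hashtags. The function names `coverage` and `top_k_overlap` and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter


def coverage(sample_ids, full_ids):
    """Fraction of the full stream's tweet IDs that also appear in the sample."""
    full = set(full_ids)
    return len(full & set(sample_ids)) / len(full) if full else 0.0


def top_k_overlap(sample_tags, full_tags, k):
    """Jaccard overlap between the k most frequent hashtags of each stream."""
    top_sample = {tag for tag, _ in Counter(sample_tags).most_common(k)}
    top_full = {tag for tag, _ in Counter(full_tags).most_common(k)}
    union = top_sample | top_full
    return len(top_sample & top_full) / len(union) if union else 1.0


# Toy data (hypothetical): ten tweets in the full stream, three in the sample.
full_ids = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
sample_ids = [2, 5, 9]
full_tags = ["news"] * 5 + ["sports"] * 3 + ["music"] * 2
sample_tags = ["news"] * 2 + ["music"] * 1

sample_coverage = coverage(sample_ids, full_ids)
hashtag_overlap = top_k_overlap(sample_tags, full_tags, k=2)
```

A low top-k overlap at a given coverage level would suggest that the sample distorts which topics look popular, which is the kind of question the paper raises.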


Proceedings ArticleDOI
28 Jul 2013
TL;DR: This paper empirically establishes that a novel method of tweet pooling by hashtags leads to a vast improvement in a variety of measures for topic coherence across three diverse Twitter datasets in comparison to an unmodified LDA baseline and a range of pooling schemes.
Abstract: Twitter, or the world of 140 characters, poses serious challenges to the efficacy of topic models on short, messy text. While topic models such as Latent Dirichlet Allocation (LDA) have a long history of successful application to news articles and academic abstracts, they are often less coherent when applied to microblog content like Twitter. In this paper, we investigate methods to improve topics learned from Twitter content without modifying the basic machinery of LDA; we achieve this through various pooling schemes that aggregate tweets in a data preprocessing step for LDA. We empirically establish that a novel method of tweet pooling by hashtags leads to a vast improvement in a variety of measures for topic coherence across three diverse Twitter datasets in comparison to an unmodified LDA baseline and a variety of pooling schemes. An additional contribution of automatic hashtag labeling further improves on the hashtag pooling results for a subset of metrics. Overall, these two novel schemes lead to significantly improved LDA topic models on Twitter content.

475 citations
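
The hashtag pooling described above is a preprocessing step that runs before LDA, so it can be sketched without any topic-modelling library. The following is a minimal sketch under stated assumptions: hashtags are matched with a simple regular expression and lower-cased, a tweet with several hashtags joins every matching pool, and tweets without hashtags are returned separately. The paper's own handling of these cases is not given in the abstract, and the function name `pool_by_hashtag` is hypothetical.

```python
import re
from collections import defaultdict

# Assumption: a hashtag is "#" followed by word characters.
HASHTAG_RE = re.compile(r"#(\w+)")


def pool_by_hashtag(tweets):
    """Group tweets into one pooled document per hashtag.

    Returns (documents, unpooled): a dict mapping each lower-cased hashtag
    to the concatenation of its tweets, and a list of tweets without hashtags.
    """
    pools = defaultdict(list)
    unpooled = []
    for text in tweets:
        tags = {tag.lower() for tag in HASHTAG_RE.findall(text)}
        if not tags:
            unpooled.append(text)
            continue
        for tag in tags:
            pools[tag].append(text)
    # Each pooled document is the concatenation of its member tweets.
    documents = {tag: " ".join(texts) for tag, texts in pools.items()}
    return documents, unpooled


tweets = [
    "Great keynote today #nlp #lda",
    "Topic models on short text are hard #LDA",
    "Coffee first",
]
documents, unpooled = pool_by_hashtag(tweets)
```

The pooled documents, rather than the individual tweets, would then be passed to an unmodified LDA implementation.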


Proceedings Article
21 Jun 2013
TL;DR: In this paper, the authors compare data collected using Twitter's sampled API service with data collected from the full, albeit costly, Firehose stream that includes every single published tweet, using common statistical metrics as well as metrics that allow them to compare topics, networks, and locations of tweets.
Abstract: Twitter is a social media giant famous for the exchange of short, 140-character messages called "tweets". In the scientific community, the microblogging site is known for openness in sharing its data. It provides a glance into its millions of users and billions of tweets through a "Streaming API" which provides a sample of all tweets matching some parameters preset by the API user. The API service has been used by many researchers, companies, and governmental institutions that want to extract knowledge in accordance with a diverse array of questions pertaining to social media. The essential drawback of the Twitter API is the lack of documentation concerning what and how much data users get. This leads researchers to question whether the sampled data is a valid representation of the overall activity on Twitter. In this work we embark on answering this question by comparing data collected using Twitter's sampled API service with data collected using the full, albeit costly, Firehose stream that includes every single published tweet. We compare both datasets using common statistical metrics as well as metrics that allow us to compare topics, networks, and locations of tweets. The results of our work will help researchers and practitioners understand the implications of using the Streaming API.

469 citations


Proceedings Article
01 Jan 2013
TL;DR: This paper focuses on extracting valuable “information nuggets”, brief, self-contained information items relevant to disaster response, using automatic methods for extracting information from microblog posts that leverage machine learning methods for classifying posts and information extraction.
Abstract: Microblogging sites such as Twitter can play a vital role in spreading information during “natural” or man-made disasters. But the volume and velocity of tweets posted during crises today tend to be extremely high, making it hard for disaster-affected communities and professional emergency responders to process the information in a timely manner. Furthermore, posts tend to vary highly in terms of their subjects and usefulness; from messages that are entirely off-topic or personal in nature, to messages containing critical information that augments situational awareness. Finding actionable information can accelerate disaster response and alleviate both property and human losses. In this paper, we describe automatic methods for extracting information from microblog posts. Specifically, we focus on extracting valuable “information nuggets”, brief, self-contained information items relevant to disaster response. Our methods leverage machine learning methods for classifying posts and information extraction. Our results, validated over one large disaster-related dataset, reveal that a careful design can yield an effective system, paving the way for more sophisticated data analysis and visualization systems.

404 citations


Journal ArticleDOI
TL;DR: This work empirically studies the motivations of users to contribute content to social media in the context of the popular microblogging site Twitter, focusing on noncommercial users who do not benefit financially from their contributions.
Abstract: We empirically study the motivations of users to contribute content to social media in the context of the popular microblogging site Twitter. We focus on noncommercial users who do not benefit financially from their contributions. Previous literature suggests that there are two main types of utility that motivate these users to post content: intrinsic utility and image-related utility. We leverage the fact that these two types of utility give rise to different predictions as to whether users should increase their contributions when their number of followers increases. To address the issue that the number of followers is endogenous, we conducted a field experiment in which we exogenously added followers (or follow requests, in the case of protected accounts) to a set of users over a period of time and compared their posting activities to those of a control group. We estimated each treated user's utility function using a dynamic discrete choice model. Although our results are consistent with both types of utility being at play, our model suggests that image-related utility is larger for most users. We discuss the implications of our findings for the evolution of Twitter and the type of value firms may derive from such platforms in the future.

372 citations


Proceedings ArticleDOI
04 Feb 2013
TL;DR: This work proposes a Sociological Approach to handling Noisy and short Texts (SANT) for sentiment classification and presents a mathematical optimization formulation that incorporates the sentiment consistency and emotional contagion theories into the supervised learning process.
Abstract: Microblogging, like Twitter and Sina Weibo, has become a popular platform of human expressions, through which users can easily produce content on breaking news, public events, or products. The massive amount of microblogging data is a useful and timely source that carries mass sentiment and opinions on various topics. Existing sentiment analysis approaches often assume that texts are independent and identically distributed (i.i.d.), usually focusing on building a sophisticated feature space to handle noisy and short texts, without taking advantage of the fact that the microblogs are networked data. Inspired by the social sciences findings that sentiment consistency and emotional contagion are observed in social networks, we investigate whether social relations can help sentiment analysis by proposing a Sociological Approach to handling Noisy and short Texts (SANT) for sentiment classification. In particular, we present a mathematical optimization formulation that incorporates the sentiment consistency and emotional contagion theories into the supervised learning process; and utilize sparse learning to tackle noisy texts in microblogging. An empirical study of two real-world Twitter datasets shows the superior performance of our framework in handling noisy and short tweets.

361 citations


Journal ArticleDOI
TL;DR: This paper proposes the deployment of original ontology-based techniques towards a more efficient sentiment analysis of Twitter posts, where posts are not simply characterized by a sentiment score, as is the case with machine learning-based classifiers, but instead receive a sentiment grade for each distinct notion in the post.
Abstract: The emergence of Web 2.0 has drastically altered the way users perceive the Internet, by improving information sharing, collaboration and interoperability. Micro-blogging is one of the most popular Web 2.0 applications and related services, like Twitter, have evolved into a practical means for sharing opinions on almost all aspects of everyday life. Consequently, micro-blogging web sites have since become rich data sources for opinion mining and sentiment analysis. Towards this direction, text-based sentiment classifiers often prove inefficient, since tweets typically do not consist of representative and syntactically consistent words, due to the imposed character limit. This paper proposes the deployment of original ontology-based techniques towards a more efficient sentiment analysis of Twitter posts. The novelty of the proposed approach is that posts are not simply characterized by a sentiment score, as is the case with machine learning-based classifiers, but instead receive a sentiment grade for each distinct notion in the post. Overall, our proposed architecture results in a more detailed analysis of post opinions regarding a specific topic.

345 citations


Journal ArticleDOI
TL;DR: The purpose of the research is to establish if an automatic discovery process of relevant and credible news events can be achieved and to focus on the analysis of information credibility on Twitter.
Abstract: Purpose – Twitter is a popular microblogging service which has proven, in recent years, its potential for propagating news and information about developing events. The purpose of this paper is to focus on the analysis of information credibility on Twitter. The purpose of our research is to establish if an automatic discovery process of relevant and credible news events can be achieved. Design/methodology/approach – The paper follows a supervised learning approach for the task of automatic classification of credible news events. A first classifier decides if an information cascade corresponds to a newsworthy event. Then a second classifier decides if this cascade can be considered credible or not. The paper undertakes this effort training over a significant amount of labeled data, obtained using crowdsourcing tools. The paper validates these classifiers under two settings: the first, a sample of automatically detected Twitter “trends” in English, and second, the paper tests how well this model transfers to...

319 citations


Journal ArticleDOI
18 Apr 2013-PLOS ONE
TL;DR: It is shown that available data allow for the study of language geography at scales ranging from country-level aggregation to specific city neighborhoods, and highlights the potential of geolocalized studies of open data sources to improve current analysis and develop indicators for major social phenomena in specific communities.
Abstract: Large scale analysis and statistics of socio-technical systems that just a few short years ago would have required the use of consistent economic and human resources can nowadays be conveniently performed by mining the enormous amount of digital data produced by human activities. Although a characterization of several aspects of our societies is emerging from the data revolution, a number of questions concerning the reliability and the biases inherent to the big data “proxies” of social life are still open. Here, we survey worldwide linguistic indicators and trends through the analysis of a large-scale dataset of microblogging posts. We show that available data allow for the study of language geography at scales ranging from country-level aggregation to specific city neighborhoods. The high resolution and coverage of the data allows us to investigate different indicators such as the linguistic homogeneity of different countries, the touristic seasonal patterns within countries and the geographical distribution of different languages in multilingual regions. This work highlights the potential of geolocalized studies of open data sources to improve current analysis and develop indicators for major social phenomena in specific communities.

293 citations


Journal ArticleDOI
TL;DR: In this paper, the authors provide an empirical test of the "Twitter effect," which postulates that microblogging word of mouth (MWOM) shared through Twitter and similar services affects early product adoption behaviors by immediately disseminating consumers' post-purchase quality evaluations.
Abstract: This research provides an empirical test of the “Twitter effect,” which postulates that microblogging word of mouth (MWOM) shared through Twitter and similar services affects early product adoption behaviors by immediately disseminating consumers’ post-purchase quality evaluations. This is a potentially crucial factor for the success of experiential media products and other products whose distribution strategy relies on a hyped release. Studying the four million MWOM messages sent via Twitter concerning 105 movies on their respective opening weekends, the authors find support for the Twitter effect and report evidence of a negativity bias. In a follow-up incident study of 600 Twitter users who decided not to see a movie based on negative MWOM, the authors shed additional light on the Twitter effect by investigating how consumers use MWOM information in their decision-making processes and describing MWOM’s defining characteristics. They use these insights to position MWOM in the word-of-mouth landscape, to identify future word-of-mouth research opportunities based on this conceptual positioning, and to develop managerial implications.

Journal ArticleDOI
TL;DR: It is found that Twitter opinion leaders have higher motivations of information seeking, mobilization, and public expression than nonleaders, and mobilization and public-expression motivations mediate the association between perceived opinion leadership and Twitter use frequency.

Journal ArticleDOI
TL;DR: This study explores how candidates running for the European Parliament in 2009 used micro-blogging and online social networks – in this case Twitter in the early stage of its adoption – to communicate and connect with citizens.
Abstract: This study explores how candidates running for the European Parliament (EP) in 2009 used micro-blogging and online social networks – in this case Twitter (www.twitter.com) in the early stage of its adoption – to communicate and connect with citizens. Micro-blogging in general, and Twitter in particular, is one of the new and popular Web 2.0 applications, yet there has been little research focusing on the use of Twitter by politicians. After reviewing different types of campaigning strategies and introducing a new and distinct strategy, this descriptive and exploratory study focuses on political candidates' use of micro-blogging and online social networking (i.e. Twitter) from a longitudinal, social network, and ideological perspective. The results clearly show that most candidates in 2009 still used Twitter reluctantly. Those who used Twitter did so predominantly for electoral campaigning and only sparingly for continuous campaigning. Candidates from progressive parties are the most active users of Twitte...

Journal ArticleDOI
TL;DR: This work makes the first empirical study and evaluation of the effect of evasion tactics utilized by Twitter spammers and is a valuable supplement to this line of research.
Abstract: To date, as one of the most popular online social networks (OSNs), Twitter is paying its dues as more and more spammers set their sights on this microblogging site. Twitter spammers can achieve their malicious goals such as sending spam, spreading malware, hosting botnet command and control (C&C) channels, and launching other underground illicit activities. Due to the significance and indispensability of detecting and suspending those spam accounts, many researchers along with the engineers at Twitter Inc. have devoted themselves to keeping Twitter a spam-free online community. Most of the existing studies utilize machine learning techniques to detect Twitter spammers. “While the priest climbs a post, the devil climbs ten.” Twitter spammers are evolving to evade existing detection features. In this paper, we first make a comprehensive and empirical analysis of the evasion tactics utilized by Twitter spammers. We further design several new detection features to detect more Twitter spammers. In addition, to deeply understand the effectiveness and difficulties of using machine learning features to detect spammers, we analyze the robustness of 24 detection features that are commonly utilized in the literature as well as our proposed ones. Through our experiments, we show that our newly designed features are much more effective at detecting (even evasive) Twitter spammers. According to our evaluation, while keeping an even lower false positive rate, the detection rate using our new feature set is also significantly higher than that of existing work. To the best of our knowledge, this work is the first empirical study and evaluation of the effect of evasion tactics utilized by Twitter spammers and is a valuable supplement to this line of research.

Journal ArticleDOI
TL;DR: This paper used a mixed-methods approach, incorporating descriptive statistics, content analysis, and a case study of the author's learning process to examine the existence of informal learning about the Occupy Wall Street movement.
Abstract: Recent events suggest that social media, also called web 2.0, can support mass social change. Although some critics have lamented how social media are eroding people’s ability to communicate, others have argued that social media may allow individuals to leverage their individual voices against authoritarian leaders. This article seeks to understand the ways in which individuals can use a particular social media platform, the microblog Twitter, to learn about the Occupy Wall Street movement. This article uses a mixed-methods approach, incorporating descriptive statistics, content analysis, and a case study of the author’s learning process to examine the existence of informal learning about the Occupy Wall Street movement. Scholars have proposed that informal learning about a social movement is associated with participation in the movement. This study suggests that Twitter supports multiple opportunities for participation in the Occupy movement—from creating, tagging, and sharing content to reading, watchin...

Proceedings ArticleDOI
01 Sep 2013
TL;DR: This paper introduces each stage of the TwitIE pipeline, which is a modification of the GATE ANNIE open-source pipeline for news text, and an evaluation against some state-of-the-art systems is presented.
Abstract: Twitter is the largest source of microblog text, responsible for gigabytes of human discourse every day. Processing microblog text is difficult: the genre is noisy, documents have little context, and utterances are very short. As such, conventional NLP tools fail when faced with tweets and other microblog text. We present TwitIE, an open-source NLP pipeline customised to microblog text at every stage. Additionally, it includes Twitter-specific data import and metadata handling. This paper introduces each stage of the TwitIE pipeline, which is a modification of the GATE ANNIE open-source pipeline for news text. An evaluation against some state-of-the-art systems is also presented.

Proceedings ArticleDOI
13 May 2013
TL;DR: This paper proposes a novel method for unsupervised and content-based hashtag recommendation for tweets that relies on Latent Dirichlet Allocation (LDA) to model the underlying topic assignment of language classified tweets.
Abstract: Since the introduction of microblogging services, there has been a continuous growth of short-text social networking on the Internet. With the generation of large amounts of microposts, there is a need for effective categorization and search of the data. Twitter, one of the largest microblogging sites, allows users to make use of hashtags to categorize their posts. However, the majority of tweets do not contain tags, which hinders the quality of the search results. In this paper, we propose a novel method for unsupervised and content-based hashtag recommendation for tweets. Our approach relies on Latent Dirichlet Allocation (LDA) to model the underlying topic assignment of language classified tweets. The advantage of our approach is the use of a topic distribution to recommend general hashtags.

Journal ArticleDOI
TL;DR: The majority of published work relating to Twitter concentrates on aspects of the messages sent and details of the users, and a variety of methodological approaches is used across a range of identified domains.
Abstract: Purpose – Since its introduction in 2006, messages posted to the microblogging system Twitter have provided a rich dataset for researchers, leading to the publication of over a thousand academic papers. This paper aims to identify this published work and to classify it in order to understand Twitter based research. Design/methodology/approach – Firstly the papers on Twitter were identified. Secondly, following a review of the literature, a classification of the dimensions of microblogging research was established. Thirdly, papers were qualitatively classified using open coded content analysis, based on the paper's title and abstract, in order to analyze method, subject, and approach. Findings – The majority of published work relating to Twitter concentrates on aspects of the messages sent and details of the users. A variety of methodological approaches is used across a range of identified domains. Research limitations/implications – This work reviewed the abstracts of all papers available via database search...

Proceedings ArticleDOI
23 Oct 2013
TL;DR: This paper presents a detailed study of Twitter follower markets, reports in detail on both the static and dynamic properties of customers of these markets, and develops and evaluates multiple techniques for detecting these activities.
Abstract: The users of microblogging services, such as Twitter, use the count of followers of an account as a measure of its reputation or influence. For those unwilling or unable to attract followers naturally, a growing industry of "Twitter follower markets" provides followers for sale. Some markets use fake accounts to boost the follower count of their customers, while others rely on a pyramid scheme to turn non-paying customers into followers for each other, and into followers for paying customers. In this paper, we present a detailed study of Twitter follower markets, report in detail on both the static and dynamic properties of customers of these markets, and develop and evaluate multiple techniques for detecting these activities. We show that our detection system is robust and reliable, and can detect a significant number of customers in the wild.

Proceedings Article
28 Jun 2013
TL;DR: Twitter is found to report the same events as newswire providers, plus a long tail of minor events ignored by mainstream media; for major news events neither stream leads the other, so the value that Twitter can bring in a news setting comes predominantly from increased event coverage, not timeliness of reporting.
Abstract: Twitter is often considered to be a useful source of real-time news, potentially replacing newswire for this purpose. But is this true? In this paper, we examine the extent to which news reporting in newswire and Twitter overlap and whether Twitter often reports news faster than traditional newswire providers. In particular, we analyse 77 days' worth of tweets and newswire articles with respect to both manually identified major news events and larger volumes of automatically identified news events. Our results indicate that Twitter reports the same events as newswire providers, in addition to a long tail of minor events ignored by mainstream media. However, contrary to popular belief, neither stream leads the other when dealing with major news events, indicating that the value that Twitter can bring in a news setting comes predominantly from increased event coverage, not timeliness of reporting.

Journal ArticleDOI
TL;DR: This study explains why some candidates are more likely to adopt Twitter, have larger networks, and show more reciprocation than other candidates, and shows that being an early adopter of these new technologies is more effective than adoption shortly before Election Day.
Abstract: The present study focuses on how candidates in the Dutch general elections of 2010 use Twitter, a popular microblogging and social networking service. Specifically the study focuses on explaining why some candidates are more likely to adopt Twitter, have larger networks, and show more reciprocation than other candidates. The innovation hypothesis, predicting that candidates from less established and smaller parties will use Twitter more extensively, is unsupported. This suggests that normalization of campaign practices is present on Twitter, not changing existing communication practices. The findings do show that being an early adopter of these new technologies is more effective than adoption shortly before Election Day.

Journal ArticleDOI
01 Jan 2013
TL;DR: Methodologies for detecting and identifying trending topics in data from Twitter's streaming API are outlined; relative normalised term frequency analysis identified unigrams, bigrams, and trigrams as trending topics, while term frequency-inverse document frequency analysis identified unigrams.
Abstract: As social media continue to grow, the zeitgeist of society is increasingly found not in the headlines of traditional media institutions, but in the activity of ordinary individuals. The identification of trending topics utilises social media (such as Twitter) to provide an overview of the topics and issues that are currently popular within the online community. In this paper, we outline methodologies of detecting and identifying trending topics from streaming data. Data from Twitter's streaming API was collected and put into documents of equal duration using data collection procedures that allow for analysis over multiple timespans, including those not currently associated with Twitter-identified trending topics. Term frequency-inverse document frequency analysis and relative normalised term frequency analysis were performed on the documents to identify the trending topics. Relative normalised term frequency analysis identified unigrams, bigrams, and trigrams as trending topics, while term frequency-inverse document frequency analysis identified unigrams as trending topics. Application of these methodologies to streaming data resulted in F-measures ranging from 0.1468 to 0.7508.
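
The steps named in the abstract above (equal-duration documents, term frequency-inverse document frequency scoring, and evaluation by F-measure) can be sketched as follows. This is a minimal illustration, not the paper's procedure: it assumes whitespace tokenisation, numeric timestamps in seconds, the textbook weighting tf × log(N / df), and hypothetical function names. The relative normalised term frequency analysis is not defined in the abstract and is therefore not shown.

```python
import math
from collections import Counter


def bucket_by_window(tweets, window_seconds):
    """Group (timestamp, text) pairs into token lists of equal duration."""
    buckets = {}
    for timestamp, text in tweets:
        key = int(timestamp // window_seconds)
        buckets.setdefault(key, []).extend(text.lower().split())
    return [buckets[key] for key in sorted(buckets)]


def tfidf_top_terms(documents, doc_index, k=3):
    """Rank the unigrams of one document by tf * log(N / df)."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))
    counts = Counter(documents[doc_index])
    total = sum(counts.values())
    scores = {
        term: (count / total) * math.log(n_docs / doc_freq[term])
        for term, count in counts.items()
    }
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda term: (-scores[term], term))[:k]


def f_measure(predicted, relevant):
    """Harmonic mean of precision and recall over two sets of topics."""
    predicted, relevant = set(predicted), set(relevant)
    hits = len(predicted & relevant)
    if hits == 0:
        return 0.0
    precision = hits / len(predicted)
    recall = hits / len(relevant)
    return 2 * precision * recall / (precision + recall)


# Toy stream (hypothetical): two 60-second windows.
tweets = [
    (0, "good morning everyone"),
    (30, "morning coffee"),
    (60, "earthquake reported downtown"),
    (90, "earthquake felt this morning"),
]
documents = bucket_by_window(tweets, window_seconds=60)
trending = tfidf_top_terms(documents, doc_index=1, k=1)
score = f_measure(trending, ["earthquake", "flood"])
```

A term such as "morning" that appears in every window receives an IDF of zero, which is why it does not surface as trending in the second window.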

Proceedings ArticleDOI
13 May 2013
TL;DR: An extensive analysis of a wide range of tweet and user features regarding their influence on the spread of tweets is provided and the most impactful features are chosen to build a learning model that predicts viral tweets with high accuracy.
Abstract: Twitter and other microblogging services have become indispensable sources of information in today's web. Understanding the main factors that make certain pieces of information spread quickly in these platforms can be decisive for the analysis of opinion formation and many other opinion mining tasks. This paper addresses important questions concerning the spread of information on Twitter. What makes Twitter users retweet a tweet? Is it possible to predict whether a tweet will become "viral", i.e., will be frequently retweeted? To answer these questions we provide an extensive analysis of a wide range of tweet and user features regarding their influence on the spread of tweets. The most impactful features are chosen to build a learning model that predicts viral tweets with high accuracy. All experiments are performed on a real-world dataset, extracted through a public Twitter API based on user IDs from the TREC 2011 microblog corpus.

Journal ArticleDOI
TL;DR: This article investigated how an online community of teachers engaged in professional development using collaborative Web (Web 2.0) technologies. This community of practice (CoP) consisted of world language (WL) teachers using the microblogging platform Twitter.
Abstract: This study investigated how an online community of teachers engaged in professional development using collaborative Web (Web 2.0) technologies. This community of practice (CoP) consisted of world language (WL) teachers using the microblogging platform Twitter. The study approached teacher learning from a sociocultural perspective. Its central questions were as follows: What are the characteristics of this CoP of WL educators on Twitter? How do those characteristics relate to or reflect teacher learning? With a qualitative, netnographic approach, data sources included over a year of participant observation, nine interviews with community members, and numerous online documents from blogs, wikis, and other sources. Findings demonstrated how the domain, community, and practice characteristics of this online CoP could also be linked to sustained and significant teacher learning. The study concludes with considerations for the future of similar online communities.

Proceedings Article
03 Aug 2013
TL;DR: An optimization formulation is presented that models the social network and content information in a unified framework that can effectively utilize both kinds of information for social spammer detection in microblogging.
Abstract: The availability of microblogging, like Twitter and Sina Weibo, makes it a popular platform for spammers to unfairly overpower normal users with unwanted content via social networks, known as social spamming. The rise of social spamming can significantly hinder the use of microblogging systems for effective information dissemination and sharing. Distinct features of microblogging systems present new challenges for social spammer detection. First, unlike traditional social networks, microblogging allows users to establish some connections between two parties without mutual consent, which makes it easier for spammers to imitate normal users by quickly accumulating a large number of "human" friends. Second, microblogging messages are short, noisy, and unstructured. Traditional social spammer detection methods are not directly applicable to microblogging. In this paper, we investigate how to collectively use network and content information to perform effective social spammer detection in microblogging. In particular, we present an optimization formulation that models the social network and content information in a unified framework. Experiments on a real-world Twitter dataset demonstrate that our proposed method can effectively utilize both kinds of information for social spammer detection.
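
To make the idea of a unified objective concrete, here is a minimal sketch of one way content and network information can be combined in a single loss. It is not the formulation from the paper, which the abstract does not spell out. It assumes a linear scorer over content features, a squared-error content term, and an undirected smoothness penalty that pushes connected users toward similar scores. The names `objective`, `X`, `y`, `w`, `edges`, and `lam` are all hypothetical.

```python
def objective(X, y, w, edges, lam):
    """Illustrative loss: squared content error plus a network smoothness penalty.

    X     -- list of content feature vectors, one per user (assumed representation)
    y     -- list of labels, e.g. 1.0 for spammer and 0.0 for normal user
    w     -- weight vector of a linear scorer (assumed model)
    edges -- list of (i, j) index pairs for connected users (treated as undirected)
    lam   -- trade-off between the content term and the network term
    """
    # Linear score for every user from content features alone.
    preds = [sum(wi * xi for wi, xi in zip(w, row)) for row in X]
    # Content term: how well the scores match the known labels.
    content_loss = sum((p - t) ** 2 for p, t in zip(preds, y))
    # Network term: connected users are encouraged to receive similar scores.
    network_penalty = sum((preds[i] - preds[j]) ** 2 for i, j in edges)
    return content_loss + lam * network_penalty
```

Minimizing such a function over `w` trades off fitting the labels against agreeing with the graph; setting `lam` to zero reduces it to a content-only model.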

Journal ArticleDOI
TL;DR: The study confirms the role of similarity in personality traits between Twitter users and the Twitter brand in engendering trust in Twitter; the salience of different personality traits in the "personality match - Twitter trust" link for different cultures suggests important implications for global marketers.

Journal ArticleDOI
01 Apr 2013
TL;DR: It is found that the consideration of user credibility and opinion subjectivity is essential for aggregating microblog opinions and the proposed mechanism can effectively discover market intelligence (MI) for supporting decision-makers.
Abstract: Given their rapidly growing popularity, microblogs have become great sources of consumer opinions. However, in the face of unique properties and the massive volume of posts on microblogs, this paper proposes a framework that provides a compact numeric summarization of opinions on such platforms. The proposed framework is designed to cope with the following tasks: trendy topics detection, opinion classification, credibility assessment, and numeric summarization. An experiment is carried out on Twitter, the largest microblog website, to prove the effectiveness of the proposed framework. We find that the consideration of user credibility and opinion subjectivity is essential for aggregating microblog opinions. The proposed mechanism can effectively discover market intelligence (MI) for supporting decision-makers.
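
The numeric summarization step can be pictured as a weighted average of opinion scores. The sketch below is an assumption for illustration only: the abstract does not give the aggregation formula, so the weight (credibility times subjectivity), the score ranges, and the names `Opinion` and `summarize` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Opinion:
    polarity: float      # assumed range: -1.0 (negative) to +1.0 (positive)
    credibility: float   # assumed range: 0.0 to 1.0, credibility of the author
    subjectivity: float  # assumed range: 0.0 (factual) to 1.0 (opinionated)


def summarize(opinions):
    """Return the mean polarity, weighting each post by credibility * subjectivity."""
    weights = [o.credibility * o.subjectivity for o in opinions]
    total = sum(weights)
    if total == 0:
        # No usable opinions: report a neutral score instead of dividing by zero.
        return 0.0
    return sum(w * o.polarity for w, o in zip(weights, opinions)) / total
```

Under this weighting, a post from a low-credibility account or a purely factual post contributes little to the summary, which is one simple way to reflect the finding that credibility and subjectivity matter when aggregating opinions.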

Proceedings Article
28 Jun 2013
TL;DR: This work proposes the first multi-indicator method for determining the location where a tweet was created as well as the location of the user's residence, based on various weighted indicators, including the names of places that appear in the text message, dedicated location entries, and additional information from the user profile.
Abstract: Real-time information from microblogs like Twitter is useful for different applications such as market research, opinion mining, and crisis management. For many of those messages, location information is required to derive useful insights. Today, however, only around 1% of all tweets are explicitly geotagged. We propose the first multi-indicator method for determining (1) the location where a tweet was created as well as (2) the location of the user's residence. Our method is based on various weighted indicators, including the names of places that appear in the text message, dedicated location entries, and additional information from the user profile. An evaluation shows that our method is capable of locating 92% of all tweets with a median accuracy of below 30km, as well as predicting the user's residence with a median accuracy of below 5.1km. With that level of accuracy, our approach significantly outperforms existing work.
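
One simple way to combine several weighted indicators is to let each indicator vote for a candidate place and pick the place with the highest total weight. The sketch below illustrates that general idea under stated assumptions; it is not the method evaluated in the paper, and the function name `estimate_location`, the input format, and the example weights are hypothetical.

```python
from collections import defaultdict


def estimate_location(indicators):
    """Pick the candidate place with the highest summed indicator weight.

    indicators -- list of (place, weight) pairs, e.g. one pair per place name
                  found in the message text, the location field, or the profile
                  (assumed input format).
    Returns the winning place, or None when there are no indicators.
    """
    scores = defaultdict(float)
    for place, weight in indicators:
        scores[place] += weight
    if not scores:
        return None
    return max(scores, key=scores.get)
```

For example, `estimate_location([("Berlin", 0.5), ("Paris", 0.6), ("Berlin", 0.3)])` favors Berlin, because two weaker indicators that agree outweigh a single stronger one.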

Proceedings ArticleDOI
13 May 2013
TL;DR: This work investigates the feasibility of applying Named Entity Recognizers to extract locations from microblogs, at the level of both geo-location and point-of-interest, and shows that such tools, once retrained on microblog data, have great potential to detect the where information, even at the granularity of point-of-interest.
Abstract: Location information is critical to understanding the impact of a disaster, including where the damage is, where people need assistance and where help is available. We investigate the feasibility of applying Named Entity Recognizers to extract locations from microblogs, at the level of both geo-location and point-of-interest. Our experimental results show that such tools, once retrained on microblog data, have great potential to detect the where information, even at the granularity of point-of-interest.

Journal ArticleDOI
TL;DR: This special section on Twitter and Microblogging Services, which features five articles on different aspects of microblogging and related topics, proposes a supervised learning method for personalized tweet reordering based on users’ preferences and interests by minimizing the pairwise loss of relevant and irrelevant tweets.
Abstract: Welcome to this special section on Twitter and Microblogging Services, which features five articles on different aspects of microblogging and related topics. We are putting forward this special section because, in recent years, we have witnessed a dramatic increase in the amount of research done on Twitter and other microblogging services, and we believe that a special journal section on this topic is timely and will serve our community well. The special section comes out with high-quality selected articles that were originally presented in various top international conferences. These articles have been expanded and extended with more detailed contents from the authors to ensure a deeper understanding of their respective work. A brief introduction of the five articles follows. A Content-Driven Framework for Geolocating Microblog Users by Zhiyuan Cheng, James Caverlee, and Kyumin Lee investigates the use of a probabilistic framework for estimating a microblogger’s location based on the content of the microblog. The framework has to overcome the geodata sparsity problem and is capable of estimating the user’s location within a radius. The second article is Named Entity Recognition for Tweets by Xiaohua Liu, Furu Wei, Shaodian Zhang, and Ming Zhou. Named Entity Recognition (NER) is an active and challenging research topic in microblogging due to insufficient content and lack of training data. This article proposes a combination of machine learning techniques to tackle this problem with good and effective results. In the third article, Improving Recency Ranking Using Twitter Data, Yi Chang, Anlei Dong, Pranam Kolari, Ruiqiang Zhang, Yoshiyuki Inagaki, Fernando Diaz, Hongyuan Zha, and Yan Liu examine the use of Recency ranking, which incorporates relevancy and freshness in overcoming the lack of in-links and click information issue. Their approach utilizes Twitter TinyURL to detect fresh and high-quality tweets for generating ranking. 
Lexical Normalization for Social Media Text by Bo Han, Paul Cook, and Timothy Baldwin targets out-of-vocabulary words in tweets in order to tackle word noise in brief messages. Based on morphophonemic similarity, their approach detects lexical variants in order to generate the correct candidates for correcting words. The final article is Reorder User’s Tweets by Keyi Shen, Jianmin Wu, Ya Zhang, Yiping Han, Xiaokang Yang, Li Song, and Xiao Gu. Typically, microblogs are displayed in reverse chronological order. This article proposes a supervised learning method for personalized tweet reordering based on users’ preferences and interests by minimizing the pairwise loss of relevant and irrelevant tweets. The guest editors would like to thank all the authors and the reviewers for their contributions to this special section. Special thanks go to Weike Pan and Xiaofeng Yu for their administrative assistance. Finally, we would like to thank ACM TIST and