Author

Beaux Sharifi

Bio: Beaux Sharifi is an academic researcher from the University of Colorado Colorado Springs. The author has contributed to research on topics including Microblogging and Automatic summarization. The author has an h-index of 3 and has co-authored 3 publications receiving 384 citations.

Papers
Proceedings Article
02 Jun 2010
TL;DR: An algorithm is developed that takes a trending phrase or any phrase specified by a user, collects a large number of posts containing the phrase, and provides an automatically created summary of the posts related to the term.
Abstract: In this paper, we focus on a recent Web trend called microblogging, and in particular a site called Twitter. The content of such a site is an extraordinarily large number of small textual messages, posted by millions of users, at random or in response to perceived events or situations. We have developed an algorithm that takes a trending phrase or any phrase specified by a user, collects a large number of posts containing the phrase, and provides an automatically created summary of the posts related to the term. We present examples of summaries we produce along with initial evaluation.
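
As a rough illustration of the collect-then-summarize pipeline described above (not the authors' actual algorithm), the Python sketch below filters hypothetical posts by a phrase and returns the post that best represents the collection; the posts, tokenization, and scoring are all assumptions made for this example.

import re
from collections import Counter

def summarize_phrase(posts, phrase):
    # Illustrative stand-in for a collect-then-summarize pipeline: keep posts
    # containing `phrase`, then return the single post whose words are most
    # typical of the whole collection. Not the algorithm from the paper.
    phrase = phrase.lower()
    matching = [p for p in posts if phrase in p.lower()]
    if not matching:
        return ""

    def tokens(text):
        return re.findall(r"[a-z']+", text.lower())

    # Word frequencies across all matching posts.
    freq = Counter(t for p in matching for t in tokens(p))

    # Score each post by the average collection frequency of its words.
    def score(post):
        toks = tokens(post)
        return sum(freq[t] for t in toks) / max(len(toks), 1)

    return max(matching, key=score)

# Hypothetical usage with made-up posts:
posts = [
    "World Cup final tonight, so excited!",
    "Watching the World Cup final with friends",
    "Traffic is terrible downtown today",
]
print(summarize_phrase(posts, "world cup"))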

203 citations

Proceedings ArticleDOI
20 Aug 2010
TL;DR: The goal is to produce summaries similar to what a human would produce for the same collection of posts on a specific topic; the summaries produced by the summarizing algorithms are evaluated and compared with human-produced summaries, with excellent results.
Abstract: This paper presents algorithms for summarizing microblog posts. In particular, our algorithms process collections of short posts on specific topics on the well-known site called Twitter and create short summaries from these collections of posts on a specific topic. The goal is to produce summaries that are similar to what a human would produce for the same collection of posts on a specific topic. We evaluate the summaries produced by the summarizing algorithms, compare them with human-produced summaries and obtain excellent results.

I. INTRODUCTION. Twitter, the microblogging site started in 2006, has become a social phenomenon, with more than 20 million visitors each month. While the majority of posts are conversational or not very meaningful, about 3.6% of the posts concern topics of mainstream news. At the end of 2009, Twitter had 75 million account holders, of which about 20% are active. There are approximately 2.5 million Twitter posts per day. To help people who read Twitter posts or tweets, Twitter provides a short list of popular topics called …

148 citations

Journal ArticleDOI
TL;DR: This paper presents algorithms that produce single-document summaries, later extends them to produce multiple-document summaries, and evaluates the generated summaries by comparing them to both manually produced summaries and to the summarization results of some of the leading traditional summarization systems.
Abstract: Owing to the sheer volume of text generated by a microblog site like Twitter, it is often difficult to fully understand what is being said about various topics. This paper presents algorithms for summarizing microblog documents. Initially, we present algorithms that produce single-document summaries but later extend them to produce summaries containing multiple documents. We evaluate the generated summaries by comparing them to both manually produced summaries and, for the multiple-post summaries, to the summarization results of some of the leading traditional summarization systems.
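
Because the evaluation compares generated summaries with manually produced ones, a ROUGE-1-style unigram-recall measure is one common way to quantify such agreement. The sketch below is a simplified illustration of that idea, not the exact metric, data, or procedure used in the paper.

from collections import Counter

def rouge1_recall(candidate, reference):
    # Simplified ROUGE-1 recall: fraction of reference unigrams that also
    # appear in the candidate summary, with counts clipped to the candidate.
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(cand[w], ref[w]) for w in ref)
    return overlap / max(sum(ref.values()), 1)

# Hypothetical human and machine summaries:
human = "quake hits chile thousands evacuated from coast"
machine = "strong quake hits chile coast thousands evacuated"
print(round(rouge1_recall(machine, human), 2))  # about 0.86; only "from" is missed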

43 citations


Cited by
Proceedings ArticleDOI
19 Jun 2011
TL;DR: A tagset is developed, data is annotated, features are developed, and results nearing 90% accuracy are reported on the problem of part-of-speech tagging for English data from the popular micro-blogging service Twitter.
Abstract: We address the problem of part-of-speech tagging for English data from the popular micro-blogging service Twitter. We develop a tagset, annotate data, develop features, and report tagging results nearing 90% accuracy. The data and tools have been made available to the research community with the goal of enabling richer text analysis of Twitter and related social media data sets.
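
The reported figure is token-level tagging accuracy against annotated data. The sketch below shows how such an accuracy score can be computed; the tokens and tag labels are invented for illustration and do not reproduce the paper's Twitter-specific tagset.

def tagging_accuracy(predicted, gold):
    # Token-level accuracy: fraction of tokens whose predicted tag matches the
    # gold annotation. Each sentence is a list of (token, tag) pairs.
    correct = total = 0
    for pred_sent, gold_sent in zip(predicted, gold):
        for (_, p_tag), (_, g_tag) in zip(pred_sent, gold_sent):
            correct += p_tag == g_tag
            total += 1
    return correct / max(total, 1)

# Made-up gold vs. predicted tags with coarse, invented labels:
gold = [[("lol", "!"), ("that", "D"), ("game", "N"), ("was", "V"), ("crazy", "A")]]
pred = [[("lol", "!"), ("that", "D"), ("game", "N"), ("was", "V"), ("crazy", "N")]]
print(tagging_accuracy(pred, gold))  # 0.8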

1,053 citations

Journal ArticleDOI
01 Feb 2015
TL;DR: This article surveys techniques for event detection from Twitter streams, aimed at finding real-world occurrences that unfold over space and time, and highlights the need for public benchmarks to evaluate the performance of different detection approaches and various features.
Abstract: Twitter is among the fastest-growing microblogging and online social networking services. Messages posted on Twitter (tweets) have been reporting everything from daily life stories to the latest local and global news and events. Monitoring and analyzing this rich and continuous user-generated content can yield unprecedentedly valuable information, enabling users and organizations to acquire actionable knowledge. This article provides a survey of techniques for event detection from Twitter streams. These techniques aim at finding real-world occurrences that unfold over space and time. In contrast to conventional media, event detection from Twitter streams poses new challenges. Twitter streams contain large amounts of meaningless messages and polluted content, which negatively affect the detection performance. In addition, traditional text mining techniques are not suitable, because of the short length of tweets, the large number of spelling and grammatical errors, and the frequent use of informal and mixed language. Event detection techniques presented in the literature address these issues by adapting techniques from various fields to the uniqueness of Twitter. This article classifies these techniques according to the event type, detection task, and detection method and discusses commonly used features. Finally, it highlights the need for public benchmarks to evaluate the performance of different detection approaches and various features.

710 citations

Proceedings ArticleDOI
07 May 2011
TL;DR: TwitInfo allows users to browse a large collection of tweets using a timeline-based display that highlights peaks of high tweet activity, and can identify 80-100% of manually labeled peaks, facilitating a relatively complete view of each event studied.
Abstract: Microblogs are a tremendous repository of user-generated content about world events. However, for people trying to understand events by querying services like Twitter, a chronological log of posts makes it very difficult to get a detailed understanding of an event. In this paper, we present TwitInfo, a system for visualizing and summarizing events on Twitter. TwitInfo allows users to browse a large collection of tweets using a timeline-based display that highlights peaks of high tweet activity. A novel streaming algorithm automatically discovers these peaks and labels them meaningfully using text from the tweets. Users can drill down to subevents, and explore further via geolocation, sentiment, and popular URLs. We contribute a recall-normalized aggregate sentiment visualization to produce more honest sentiment overviews. An evaluation of the system revealed that users were able to reconstruct meaningful summaries of events in a small amount of time. An interview with a Pulitzer Prize-winning journalist suggested that the system would be especially useful for understanding a long-running event and for identifying eyewitnesses. Quantitatively, our system can identify 80-100% of manually labeled peaks, facilitating a relatively complete view of each event studied.
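
TwitInfo's timeline relies on a streaming algorithm that discovers peaks of tweet activity. The sketch below is a much cruder, offline stand-in for that idea, flagging time bins whose counts rise well above the recent average; the window, threshold, and counts are assumptions, and this is not TwitInfo's actual peak-detection algorithm.

import statistics

def find_activity_peaks(counts, window=5, threshold=2.0):
    # Flag time bins whose tweet count exceeds the mean of the previous
    # `window` bins by more than `threshold` standard deviations.
    peaks = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mean = statistics.mean(history)
        spread = statistics.pstdev(history) or 1.0  # avoid division by zero
        if (counts[i] - mean) / spread > threshold:
            peaks.append(i)
    return peaks

# Hypothetical per-minute tweet counts with a burst at minute 7:
counts = [10, 12, 11, 9, 10, 11, 10, 95, 80, 12, 11]
print(find_activity_peaks(counts))  # [7]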

652 citations

Proceedings Article
01 Jun 2013
TL;DR: A critical review of the NLP community's response to the landscape of bad language is offered, along with a quantitative analysis of the lexical diversity of social media text and its relationship to other corpora.
Abstract: The rise of social media has brought computational linguistics in ever-closer contact with bad language: text that defies our expectations about vocabulary, spelling, and syntax. This paper surveys the landscape of bad language, and offers a critical review of the NLP community’s response, which has largely followed two paths: normalization and domain adaptation. Each approach is evaluated in the context of theoretical and empirical work on computer-mediated communication. In addition, the paper presents a quantitative analysis of the lexical diversity of social media text, and its relationship to other corpora.
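
One elementary measure of the lexical diversity the paper analyzes is the type-token ratio. The toy sketch below illustrates that statistic on invented samples; it does not reproduce the paper's corpora or methodology.

import re

def type_token_ratio(text):
    # Type-token ratio: distinct word forms divided by total word tokens.
    # TTR is length-sensitive, so compared samples should be similar in size.
    tokens = re.findall(r"\w+", text.lower())
    return len(set(tokens)) / max(len(tokens), 1)

# Invented samples of roughly equal length:
tweet_like = "omg unreal goallll keeper nooo wayyy haha insane scenes"
edited_prose = "the committee approved the proposal after a lengthy debate"
print(round(type_token_ratio(tweet_like), 2), round(type_token_ratio(edited_prose), 2))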

383 citations

Proceedings Article
05 Jul 2011
TL;DR: It is argued that for some highly structured and recurring events, such as sports, it is better to use more sophisticated techniques to summarize the relevant tweets, and a solution based on learning the underlying hidden state representation of the event via Hidden Markov Models is given.
Abstract: Twitter has become exceedingly popular, with hundreds of millions of tweets being posted every day on a wide variety of topics. This has helped make real-time search applications possible with leading search engines routinely displaying relevant tweets in response to user queries. Recent research has shown that a considerable fraction of these tweets are about "events," and the detection of novel events in the tweet-stream has attracted a lot of research interest. However, very little research has focused on properly displaying this real-time information about events. For instance, the leading search engines simply display all tweets matching the queries in reverse chronological order. In this paper we argue that for some highly structured and recurring events, such as sports, it is better to use more sophisticated techniques to summarize the relevant tweets. We formalize the problem of summarizing event-tweets and give a solution based on learning the underlying hidden state representation of the event via Hidden Markov Models. In addition, through extensive experiments on real-world data we show that our model significantly outperforms some intuitive and competitive baselines.
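
The paper models a structured event with a Hidden Markov Model. The sketch below illustrates only the decoding side of that idea: a hand-rolled Viterbi pass that assigns hidden "event phase" states to a sequence of coarse tweet-activity symbols. The states, probabilities, and observations are invented for illustration and are not the paper's learned model.

import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    # Standard Viterbi decoding in log space: most likely hidden-state
    # sequence for the observation sequence `obs`.
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, V[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda x: x[1],
            )
            V[t][s] = score + math.log(emit_p[s][obs[t]])
            back[t][s] = prev
    # Trace the best path backwards from the most likely final state.
    path = [max(states, key=lambda s: V[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

# Invented toy model: hidden "phases" of a sports event emit coarse
# tweet-activity symbols ("quiet" vs. "burst").
states = ["lull", "highlight"]
start_p = {"lull": 0.8, "highlight": 0.2}
trans_p = {"lull": {"lull": 0.9, "highlight": 0.1},
           "highlight": {"lull": 0.4, "highlight": 0.6}}
emit_p = {"lull": {"quiet": 0.9, "burst": 0.1},
          "highlight": {"quiet": 0.2, "burst": 0.8}}
obs = ["quiet", "quiet", "burst", "burst", "quiet"]
print(viterbi(obs, states, start_p, trans_p, emit_p))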

331 citations