Proceedings Article

QuickView: NLP-based Tweet Search

10 Jul 2012 · pp. 13–18
TL;DR: QuickView is presented, an NLP-based tweet search platform that exploits a series of natural language processing technologies, such as tweet normalization, named entity recognition, semantic role labeling, sentiment analysis, and tweet classification, to extract useful information from a large volume of tweets.
Abstract: Tweets have become a comprehensive repository for real-time information. However, it is often hard for users to quickly get the information they are interested in from tweets, owing to the sheer volume of tweets as well as their noisy and informal nature. We present QuickView, an NLP-based tweet search platform that tackles this issue. Specifically, it exploits a series of natural language processing technologies, such as tweet normalization, named entity recognition, semantic role labeling, sentiment analysis, and tweet classification, to extract useful information (named entities, events, opinions, etc.) from a large volume of tweets. Non-noisy tweets, together with the mined information, are then indexed, on top of which two new scenarios are enabled: categorized browsing and advanced search, allowing users to effectively access either the tweets or the fine-grained information they are interested in.
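The normalize-extract-index architecture described in the abstract can be sketched as a simple pipeline. Everything below is a hypothetical stand-in: the function names are invented, and the keyword lookups replace the statistical models (translation-table normalization, trained NER and sentiment classifiers) the paper actually uses.

```python
# Hypothetical sketch of an NLP-based tweet search pipeline:
# normalize -> extract (NER, sentiment) -> index non-noisy tweets with mined info.

def normalize(tweet: str) -> str:
    """Expand a few ill-formed tokens (stand-in for a translation-table approach)."""
    table = {"u": "you", "2morrow": "tomorrow", "gr8": "great"}
    return " ".join(table.get(w.lower(), w) for w in tweet.split())

def extract_entities(tweet: str) -> list:
    """Toy NER: treat capitalized tokens as candidate named entities."""
    return [w for w in tweet.split() if w[:1].isupper()]

def sentiment(tweet: str) -> str:
    """Toy lexicon-based sentiment, standing in for a trained classifier."""
    pos, neg = {"great", "love", "good"}, {"bad", "hate", "awful"}
    words = {w.lower().strip(".,!?") for w in tweet.split()}
    if words & pos:
        return "positive"
    if words & neg:
        return "negative"
    return "neutral"

def index_tweets(tweets):
    """Index tweets together with the mined information."""
    index = []
    for t in tweets:
        norm = normalize(t)
        index.append({"text": norm,
                      "entities": extract_entities(norm),
                      "sentiment": sentiment(norm)})
    return index

idx = index_tweets(["Obama gave a gr8 speech", "I hate Mondays"])
```

With such an index in place, "advanced search" amounts to filtering on the mined fields (entity, sentiment, category) rather than on raw text alone.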
Citations
01 Jun 2013
TL;DR: The PageRank-like ranking algorithm from previous work is extended to partition event graphs and thereby detect fine-grained aspects of the event to be summarized; summaries created by this method are more concise and newsworthy than SumBasic according to human judges.
Abstract: Although the ideal length of summaries differs greatly from topic to topic on Twitter, previous work has only generated summaries of a fixed, predetermined length. In this paper, we propose an event-graph-based method using information extraction techniques that is able to create summaries of variable length for different topics. In particular, we extend the PageRank-like ranking algorithm from previous work to partition event graphs and thereby detect fine-grained aspects of the event to be summarized. Our preliminary results show that summaries created by our method are more concise and newsworthy than SumBasic according to human judges. We also provide a brief survey of datasets and evaluation designs used in previous work to highlight the need for a standard evaluation of the automatic tweet summarization task.
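The PageRank-like ranking the abstract refers to can be illustrated with plain power iteration over a small graph. The toy event graph, damping factor, and iteration count below are illustrative choices, not the paper's actual data or algorithm details:

```python
def pagerank(graph, damping=0.85, iters=50):
    """Power-iteration PageRank over a dict {node: [out-neighbors]}."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1 - damping) / n for v in nodes}
        for v, outs in graph.items():
            if outs:
                share = damping * rank[v] / len(outs)
                for u in outs:
                    new[u] += share
            else:  # dangling node: spread its mass uniformly
                for u in nodes:
                    new[u] += damping * rank[v] / n
        rank = new
    return rank

# Toy event graph: edges point from mentions to the events they support.
g = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
r = pagerank(g)
```

Partitioning the graph (e.g., by clustering high-rank nodes with their neighborhoods) is then what exposes the fine-grained aspects of an event; the snippet above covers only the ranking step.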

41 citations


Cites background from "QuickView: NLP-based Tweet Search"

  • ...Proceedings of the Workshop on Language in Social Media (LASM 2013), pages 20–29, Atlanta, Georgia, June 13, 2013. ©2013 Association for Computational Linguistics. Although the ideal length of summaries differs greatly from topic to topic on Twitter, previous work has only generated summaries of a pre-fixed length....

  • ...As a first step towards summarizing popular events discussed on Twitter, we need a way to identify events from Tweets....

  • ...We also consider extending this graph-based approach to disambiguate named entities or resolve event coreference in Twitter data....

  • ...We gathered tweets over a 4-month period spanning November 2012 to February 2013 using the Twitter Streaming API....

  • ...All parameters are set experimentally over a small development dataset consisting of 10 events in Twitter data of September 2012....

Journal Article
TL;DR: SRES is a self-supervised Web relation extraction system that learns powerful extraction patterns from unlabeled text, using short descriptions of the target relations and their attributes.
Abstract: Web extraction systems attempt to use the immense amount of unlabeled text on the Web in order to create large lists of entities and relations. Unlike traditional IE methods, Web extraction systems do not label every mention of the target entity or relation, instead focusing on extracting as many different instances as possible while keeping the precision of the resulting list reasonably high. SRES is a self-supervised Web relation extraction system that learns powerful extraction patterns from unlabeled text, using short descriptions of the target relations and their attributes. SRES automatically generates the training data needed for its pattern-learning component. We also compare the performance of SRES to that of the state-of-the-art KnowItAll system, and to that of its pattern-learning component, which uses a simpler and less powerful pattern language than SRES.

38 citations

Proceedings ArticleDOI
01 Jul 2021
TL;DR: In this paper, a heterogeneous resource management framework for all-in-the-air social airborne sensing (SAS) in disaster response applications is presented, which exploits the complementary strengths of different UAV models to accomplish all stages of sensing tasks (i.e., data capturing, maneuvering, and computation).
Abstract: Social airborne sensing (SAS) is emerging as a new sensing paradigm that leverages the complementary aspects of social sensing and airborne sensing (i.e., UAVs) for reliable information collection. In this paper, we present HeteroSAS, a heterogeneous resource management framework for "all-in-the-air" SAS in disaster response applications. Current SAS approaches use UAVs only to capture data, carrying out computation on ground-based processing nodes that may be unavailable in disaster scenarios; they also consider a single UAV model and only one type of task (i.e., data capture). In this paper, we explore the opportunity to exploit the complementary strengths of different UAV models to accomplish all stages of sensing tasks (i.e., data capturing, maneuvering, and computation) exclusively "in the air". However, several challenges exist in developing such a resource management framework: i) handling the uncertain social signals in the presence of heterogeneous UAVs and tasks; and ii) adapting to constantly changing cyber-physical-social environments. HeteroSAS addresses these challenges by building a novel resource management framework that observes the environment and learns the optimal strategy for each UAV using techniques from multi-agent reinforcement learning, game theory, and ensemble learning. The evaluation with a real-world case study shows that HeteroSAS outperforms the state of the art in terms of detection effectiveness, deadline hit rate, and robustness to heterogeneity.

3 citations

References
Proceedings Article
01 Jan 2002
TL;DR: The functionality of the SRILM toolkit is summarized and its design and implementation is discussed, highlighting ease of rapid prototyping, reusability, and combinability of tools.
Abstract: SRILM is a collection of C++ libraries, executable programs, and helper scripts designed to allow both production of and experimentation with statistical language models for speech recognition and other applications. SRILM is freely available for noncommercial purposes. The toolkit supports creation and evaluation of a variety of language model types based on N-gram statistics, as well as several related tasks, such as statistical tagging and manipulation of N-best lists and word lattices. This paper summarizes the functionality of the toolkit and discusses its design and implementation, highlighting ease of rapid prototyping, reusability, and combinability of tools.
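As a rough illustration of the kind of N-gram model SRILM builds, here is a trigram model with add-one smoothing. This is a sketch only: SRILM itself is a C++ toolkit with far better smoothing schemes (e.g., Kneser-Ney), and the functions below are inventions for illustration.

```python
from collections import Counter

def train_trigram(sentences):
    """Count trigrams and their bigram histories over padded sentences."""
    tri, bi = Counter(), Counter()
    vocab = set()
    for s in sentences:
        toks = ["<s>", "<s>"] + s.split() + ["</s>"]
        vocab.update(toks)
        for i in range(2, len(toks)):
            tri[tuple(toks[i-2:i+1])] += 1
            bi[tuple(toks[i-2:i])] += 1
    return tri, bi, len(vocab)

def prob(tri, bi, v, w1, w2, w3):
    """Add-one smoothed estimate of P(w3 | w1, w2)."""
    return (tri[(w1, w2, w3)] + 1) / (bi[(w1, w2)] + v)

tri, bi, v = train_trigram(["the cat sat", "the cat ran"])
p = prob(tri, bi, v, "the", "cat", "sat")
```

The smoothed conditional distribution still sums to one over the vocabulary for any fixed history, which is the invariant any LM toolkit must preserve.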

4,904 citations


"QuickView: NLP-based Tweet Search" refers to methods in this paper

  • ...The translation table includes manually compiled ill/good form pairs, and the language model is a trigram trained on LDC data using SRILM (Stolcke, 2002)....

Proceedings ArticleDOI
25 Jun 2005
TL;DR: By using simulated annealing in place of Viterbi decoding in sequence models such as HMMs, CMMs, and CRFs, it is possible to incorporate non-local structure while preserving tractable inference.
Abstract: Most current statistical natural language processing models use only local features so as to permit dynamic programming in inference, but this makes them unable to fully account for the long distance structure that is prevalent in language use. We show how to solve this dilemma with Gibbs sampling, a simple Monte Carlo method used to perform approximate inference in factored probabilistic models. By using simulated annealing in place of Viterbi decoding in sequence models such as HMMs, CMMs, and CRFs, it is possible to incorporate non-local structure while preserving tractable inference. We use this technique to augment an existing CRF-based information extraction system with long-distance dependency models, enforcing label consistency and extraction template consistency constraints. This technique results in an error reduction of up to 9% over state-of-the-art systems on two established information extraction tasks.
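A bare-bones version of the idea in the abstract: Gibbs sampling over a label sequence with an annealed temperature, where the score combines a local emission term with a non-local label-consistency bonus (repeated tokens prefer the same label). The scoring function, schedule, and toy data below are invented for illustration, not the paper's actual model:

```python
import math
import random

def gibbs_anneal(tokens, labels, local_score, seed=0, iters=200):
    """Annealed Gibbs sampling: resample one position at a time from the
    conditional distribution, sharpening it as the temperature drops."""
    rng = random.Random(seed)
    state = [rng.choice(labels) for _ in tokens]

    def score(i, lab):
        s = local_score(tokens[i], lab)
        # Non-local consistency: reward matching labels on repeated tokens.
        s += sum(0.5 for j, t in enumerate(tokens)
                 if j != i and t == tokens[i] and state[j] == lab)
        return s

    for it in range(iters):
        temp = max(0.1, 2.0 * (1 - it / iters))  # simple annealing schedule
        i = rng.randrange(len(tokens))
        weights = [math.exp(score(i, lab) / temp) for lab in labels]
        r = rng.random() * sum(weights)
        for lab, w in zip(labels, weights):
            r -= w
            if r <= 0:
                state[i] = lab
                break
    return state

def local_score(tok, lab):  # toy emission score: capitalized tokens look like entities
    return 2.0 if tok.istitle() == (lab == "ENT") else 0.0

tags = gibbs_anneal(["Acme", "shares", "rose", "Acme", "fell"],
                    ["ENT", "O"], local_score)
```

As the temperature approaches zero the sampler behaves like greedy decoding, which is why annealing can stand in for Viterbi while still accommodating the non-local term.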

3,209 citations

Proceedings ArticleDOI
04 Jun 2009
TL;DR: Some of the fundamental design challenges and misconceptions that underlie the development of an efficient and robust NER system are analyzed, and several solutions to these challenges are developed.
Abstract: We analyze some of the fundamental design challenges and misconceptions that underlie the development of an efficient and robust NER system. In particular, we address issues such as the representation of text chunks, the inference approach needed to combine local NER decisions, the sources of prior knowledge and how to use them within an NER system. In the process of comparing several solutions to these challenges we reach some surprising conclusions, as well as develop an NER system that achieves 90.8 F1 score on the CoNLL-2003 NER shared task, the best reported result for this dataset.
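One of the design choices the paper examines is the representation of text chunks (e.g., BIO vs. BILOU tagging). A small decoder from BIO tags to entity spans, written here as an illustrative sketch rather than the paper's code, shows concretely what such a representation encodes:

```python
def bio_to_spans(tags):
    """Convert a BIO tag sequence to (start, end, type) spans, end exclusive."""
    spans, start, etype = [], None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):
            if start is not None:      # close any open entity first
                spans.append((start, i, etype))
            start, etype = i, tag[2:]
        elif tag.startswith("I-") and start is not None and etype == tag[2:]:
            continue                   # extend the current entity
        else:                          # "O" or an inconsistent I- tag
            if start is not None:
                spans.append((start, i, etype))
            start, etype = None, None
    if start is not None:
        spans.append((start, len(tags), etype))
    return spans

spans = bio_to_spans(["B-PER", "I-PER", "O", "B-LOC"])
```

The handling of inconsistent tags (an I- with no matching B-) is exactly the kind of inference detail the paper argues must be made explicit when combining local NER decisions.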

1,539 citations

Proceedings Article
Long Jiang, Mo Yu, Ming Zhou, Xiaohua Liu, Tiejun Zhao
19 Jun 2011
TL;DR: This paper proposes to improve target-dependent Twitter sentiment classification by incorporating target-dependent features and taking related tweets into consideration; according to the experimental results, this approach greatly improves the performance of target-dependent sentiment classification.
Abstract: Sentiment analysis on Twitter data has attracted much attention recently. In this paper, we focus on target-dependent Twitter sentiment classification; namely, given a query, we classify the sentiments of the tweets as positive, negative or neutral according to whether they contain positive, negative or neutral sentiments about that query. Here the query serves as the target of the sentiments. The state-of-the-art approaches for solving this problem always adopt the target-independent strategy, which may assign irrelevant sentiments to the given target. Moreover, the state-of-the-art approaches only take the tweet to be classified into consideration when classifying the sentiment; they ignore its context (i.e., related tweets). However, because tweets are usually short and more ambiguous, sometimes it is not enough to consider only the current tweet for sentiment classification. In this paper, we propose to improve target-dependent Twitter sentiment classification by 1) incorporating target-dependent features; and 2) taking related tweets into consideration. According to the experimental results, our approach greatly improves the performance of target-dependent sentiment classification.
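The target-dependent features described above can be approximated, in spirit, by restricting lexical features to a window around the target mention rather than drawing them from the whole tweet. The window size and feature names below are illustrative; the paper's actual features are syntax-based:

```python
def target_features(tweet, target, window=2):
    """Lexical features drawn only from a window around the target mention."""
    toks = tweet.lower().split()
    feats = []
    for i, t in enumerate(toks):
        if t == target.lower():
            lo, hi = max(0, i - window), min(len(toks), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    feats.append(("near_target", toks[j]))
    return feats

f = target_features("I love my iPhone but hate Windows", "iPhone")
```

Note how "windows" falls outside the window around "iphone": a whole-tweet (target-independent) bag of words would attach "hate" and "windows" to the iPhone target indiscriminately, which is precisely the failure mode the paper identifies.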

911 citations


"QuickView: NLP-based Tweet Search" refers to background in this paper

  • ...The SA component is implemented according to Jiang et al. (2011), which incorporates target-dependent features and considers related tweets by utilizing a graph-based optimization....

Proceedings Article
01 Jul 2004
TL;DR: A critical look at the features used in the semantic role tagging literature is taken and it is shown that the information in the input, generally a syntactic parse tree, has yet to be fully exploited.
Abstract: This paper takes a critical look at the features used in the semantic role tagging literature and shows that the information in the input, generally a syntactic parse tree, has yet to be fully exploited. We propose an additional set of features, and our experiments show that these features lead to fairly significant improvements in the tasks we performed. We further show that different features are needed for different subtasks. Finally, we show that by using a Maximum Entropy classifier and fewer features, we achieved results comparable with the best previously reported results obtained with SVM models. We believe this is a clear indication that developing features that capture the right kind of information is crucial to advancing the state-of-the-art in semantic analysis.
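A classic example of a parse-tree feature of the kind discussed here is the category path from an argument constituent to the predicate. Using nested tuples as a stand-in for a real parse tree, a path feature can be computed as below; the tree format, the helper, and the assumption that the common ancestor is the root are all simplifications for illustration:

```python
def find_path(tree, label, path=()):
    """Return the chain of node categories from the root down to `label`.
    A tree is (category, child, child, ...); leaves are plain strings."""
    if isinstance(tree, str):
        return path if tree == label else None
    cat, *children = tree
    if cat == label:
        return path + (cat,)
    for child in children:
        found = find_path(child, label, path + (cat,))
        if found is not None:
            return found
    return None

# (S (NP (DT the) (NN cat)) (VP (VBD sat)))
tree = ("S", ("NP", ("DT", "the"), ("NN", "cat")), ("VP", ("VBD", "sat")))

arg = find_path(tree, "NP")    # categories from root down to the argument
pred = find_path(tree, "VBD")  # categories from root down to the predicate
# Classic SRL path feature: climb from the argument to the root, then descend.
path = "↑".join(reversed(arg)) + "↓" + "↓".join(pred[1:])
```

Features like this path string are fed to the classifier alongside lexical and positional features; the paper's point is that many more such combinations of parse-tree information remain unexplored.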

347 citations


"QuickView: NLP-based Tweet Search" refers to methods in this paper

  • ..., and conquering them individually (Xue, 2004; Koomen et al., 2005); 2) sequentially labeling based approach (Màrquez et al....