Proceedings Article

QuickView: NLP-based Tweet Search

10 Jul 2012 · pp. 13–18
TL;DR: QuickView is presented, an NLP-based tweet search platform that exploits a series of natural language processing technologies, such as tweet normalization, named entity recognition, semantic role labeling, sentiment analysis, and tweet classification, to extract useful information from a large volume of tweets.
Abstract: Tweets have become a comprehensive repository for real-time information. However, it is often hard for users to quickly get the information they are interested in from tweets, owing to the sheer volume of tweets as well as their noisy and informal nature. We present QuickView, an NLP-based tweet search platform that tackles this issue. Specifically, it exploits a series of natural language processing technologies, such as tweet normalization, named entity recognition, semantic role labeling, sentiment analysis, and tweet classification, to extract useful information (named entities, events, opinions, etc.) from a large volume of tweets. Non-noisy tweets, together with the mined information, are then indexed, on top of which two new scenarios are enabled: categorized browsing and advanced search, allowing users to effectively access either the tweets or the fine-grained information they are interested in.
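The normalize-extract-index architecture described in the abstract can be sketched as a simple pipeline. Everything below is a hypothetical stand-in: the function names are invented, and the keyword lookups replace the statistical models (translation-table normalization, trained NER and sentiment classifiers) the paper actually uses.

```python
# Hypothetical sketch of an NLP-based tweet search pipeline:
# normalize -> extract (NER, sentiment) -> index non-noisy tweets with mined info.

def normalize(tweet: str) -> str:
    """Expand a few ill-formed tokens (stand-in for a translation-table approach)."""
    table = {"u": "you", "2morrow": "tomorrow", "gr8": "great"}
    return " ".join(table.get(w.lower(), w) for w in tweet.split())

def extract_entities(tweet: str) -> list:
    """Toy NER: treat capitalized tokens as candidate named entities."""
    return [w for w in tweet.split() if w[:1].isupper()]

def sentiment(tweet: str) -> str:
    """Toy lexicon-based sentiment, standing in for a trained classifier."""
    pos, neg = {"great", "love", "good"}, {"bad", "hate", "awful"}
    words = {w.lower().strip(".,!?") for w in tweet.split()}
    if words & pos:
        return "positive"
    if words & neg:
        return "negative"
    return "neutral"

def index_tweets(tweets):
    """Index tweets together with the mined information."""
    index = []
    for t in tweets:
        norm = normalize(t)
        index.append({"text": norm,
                      "entities": extract_entities(norm),
                      "sentiment": sentiment(norm)})
    return index

idx = index_tweets(["Obama gave a gr8 speech", "I hate Mondays"])
```

With such an index in place, "advanced search" amounts to filtering on the mined fields (entity, sentiment, category) rather than on raw text alone.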
Citations
01 Jun 2013
TL;DR: The PageRank-like ranking algorithm from previous work is extended to partition event graphs and thereby detect fine-grained aspects of the event to be summarized; summaries created by this method are more concise and newsworthy than SumBasic according to human judges.
Abstract: Although the ideal length of summaries differs greatly from topic to topic on Twitter, previous work has only generated summaries of a fixed, predetermined length. In this paper, we propose an event-graph-based method using information extraction techniques that is able to create summaries of variable length for different topics. In particular, we extend the PageRank-like ranking algorithm from previous work to partition event graphs and thereby detect fine-grained aspects of the event to be summarized. Our preliminary results show that summaries created by our method are more concise and newsworthy than SumBasic according to human judges. We also provide a brief survey of datasets and evaluation designs used in previous work to highlight the need for a standard evaluation of the automatic tweet summarization task.
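The PageRank-like ranking the abstract refers to can be illustrated with plain power iteration over a small graph. The toy event graph, damping factor, and iteration count below are illustrative choices, not the paper's actual data or algorithm details:

```python
def pagerank(graph, damping=0.85, iters=50):
    """Power-iteration PageRank over a dict {node: [out-neighbors]}."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1 - damping) / n for v in nodes}
        for v, outs in graph.items():
            if outs:
                share = damping * rank[v] / len(outs)
                for u in outs:
                    new[u] += share
            else:  # dangling node: spread its mass uniformly
                for u in nodes:
                    new[u] += damping * rank[v] / n
        rank = new
    return rank

# Toy event graph: edges point from mentions to the events they support.
g = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
r = pagerank(g)
```

Partitioning the graph (e.g., by clustering high-rank nodes with their neighborhoods) is then what exposes the fine-grained aspects of an event; the snippet above covers only the ranking step.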

41 citations


Cites background from "QuickView: NLP-based Tweet Search"

  • ...Proceedings of the Workshop on Language in Social Media (LASM 2013), pages 20–29, Atlanta, Georgia, June 13, 2013. ©2013 Association for Computational Linguistics. Although the ideal length of summaries differs greatly from topic to topic on Twitter, previous work has only generated summaries of a pre-fixed length....

  • ...As a first step towards summarizing popular events discussed on Twitter, we need a way to identify events from Tweets....

  • ...We also consider extending this graph-based approach to disambiguate named entities or resolve event coreference in Twitter data....

  • ...We gathered tweets over a 4-month period spanning November 2012 to February 2013 using the Twitter Streaming API....

  • ...All parameters are set experimentally over a small development dataset consisting of 10 events in Twitter data of September 2012....

Journal Article
TL;DR: SRES is a self-supervised Web relation extraction system that learns powerful extraction patterns from unlabeled text, using short descriptions of the target relations and their attributes.
Abstract: Web extraction systems attempt to use the immense amount of unlabeled text on the Web in order to create large lists of entities and relations. Unlike traditional IE methods, Web extraction systems do not label every mention of the target entity or relation, instead focusing on extracting as many different instances as possible while keeping the precision of the resulting list reasonably high. SRES is a self-supervised Web relation extraction system that learns powerful extraction patterns from unlabeled text, using short descriptions of the target relations and their attributes. SRES automatically generates the training data needed for its pattern-learning component. We also compare the performance of SRES to that of the state-of-the-art KnowItAll system, and to that of its pattern-learning component, which uses a simpler and less powerful pattern language than SRES.

38 citations

Proceedings ArticleDOI
01 Jul 2021
TL;DR: In this paper, a heterogeneous resource management framework for all-in-the-air social airborne sensing (SAS) in disaster response applications is presented, which exploits the complementary strengths of different UAV models to accomplish all stages of sensing tasks (i.e., data capturing, maneuvering, and computation).
Abstract: Social airborne sensing (SAS) is emerging as a new sensing paradigm that leverages the complementary aspects of social sensing and airborne sensing (i.e., UAVs) for reliable information collection. In this paper, we present HeteroSAS, a heterogeneous resource management framework for "all-in-the-air" SAS in disaster response applications. Current SAS approaches use UAVs only to capture data, carrying out computation on ground-based processing nodes that may be unavailable in disaster scenarios; they also consider a single UAV model and only one type of task (i.e., data capture). In this paper, we explore the opportunity to exploit the complementary strengths of different UAV models to accomplish all stages of sensing tasks (i.e., data capturing, maneuvering, and computation) exclusively "in the air". However, several challenges exist in developing such a resource management framework: i) handling the uncertain social signals in the presence of heterogeneous UAVs and tasks; and ii) adapting to constantly changing cyber-physical-social environments. HeteroSAS addresses these challenges by building a novel resource management framework that observes the environment and learns the optimal strategy for each UAV using techniques from multi-agent reinforcement learning, game theory, and ensemble learning. The evaluation with a real-world case study shows that HeteroSAS outperforms the state of the art in terms of detection effectiveness, deadline hit rate, and robustness to heterogeneity.

3 citations

References
Proceedings Article
01 Jan 2002
TL;DR: The functionality of the SRILM toolkit is summarized and its design and implementation is discussed, highlighting ease of rapid prototyping, reusability, and combinability of tools.
Abstract: SRILM is a collection of C++ libraries, executable programs, and helper scripts designed to allow both production of and experimentation with statistical language models for speech recognition and other applications. SRILM is freely available for noncommercial purposes. The toolkit supports creation and evaluation of a variety of language model types based on N-gram statistics, as well as several related tasks, such as statistical tagging and manipulation of N-best lists and word lattices. This paper summarizes the functionality of the toolkit and discusses its design and implementation, highlighting ease of rapid prototyping, reusability, and combinability of tools.
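As a rough illustration of the kind of N-gram model SRILM builds, here is a trigram model with add-one smoothing. This is a sketch only: SRILM itself is a C++ toolkit with far better smoothing schemes (e.g., Kneser-Ney), and the functions below are inventions for illustration.

```python
from collections import Counter

def train_trigram(sentences):
    """Count trigrams and their bigram histories over padded sentences."""
    tri, bi = Counter(), Counter()
    vocab = set()
    for s in sentences:
        toks = ["<s>", "<s>"] + s.split() + ["</s>"]
        vocab.update(toks)
        for i in range(2, len(toks)):
            tri[tuple(toks[i-2:i+1])] += 1
            bi[tuple(toks[i-2:i])] += 1
    return tri, bi, len(vocab)

def prob(tri, bi, v, w1, w2, w3):
    """Add-one smoothed estimate of P(w3 | w1, w2)."""
    return (tri[(w1, w2, w3)] + 1) / (bi[(w1, w2)] + v)

tri, bi, v = train_trigram(["the cat sat", "the cat ran"])
p = prob(tri, bi, v, "the", "cat", "sat")
```

The smoothed conditional distribution still sums to one over the vocabulary for any fixed history, which is the invariant any LM toolkit must preserve.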

4,904 citations


"QuickView: NLP-based Tweet Search" refers to methods in this paper

  • ...The translation table includes manually compiled ill/good form pairs, and the language model is a trigram trained on LDC data using SRILM (Stolcke, 2002)....

Proceedings ArticleDOI
25 Jun 2005
TL;DR: By using simulated annealing in place of Viterbi decoding in sequence models such as HMMs, CMMs, and CRFs, it is possible to incorporate non-local structure while preserving tractable inference.
Abstract: Most current statistical natural language processing models use only local features so as to permit dynamic programming in inference, but this makes them unable to fully account for the long distance structure that is prevalent in language use. We show how to solve this dilemma with Gibbs sampling, a simple Monte Carlo method used to perform approximate inference in factored probabilistic models. By using simulated annealing in place of Viterbi decoding in sequence models such as HMMs, CMMs, and CRFs, it is possible to incorporate non-local structure while preserving tractable inference. We use this technique to augment an existing CRF-based information extraction system with long-distance dependency models, enforcing label consistency and extraction template consistency constraints. This technique results in an error reduction of up to 9% over state-of-the-art systems on two established information extraction tasks.
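A bare-bones version of the idea in the abstract: Gibbs sampling over a label sequence with an annealed temperature, where the score combines a local emission term with a non-local label-consistency bonus (repeated tokens prefer the same label). The scoring function, schedule, and toy data below are invented for illustration, not the paper's actual model:

```python
import math
import random

def gibbs_anneal(tokens, labels, local_score, seed=0, iters=200):
    """Annealed Gibbs sampling: resample one position at a time from the
    conditional distribution, sharpening it as the temperature drops."""
    rng = random.Random(seed)
    state = [rng.choice(labels) for _ in tokens]

    def score(i, lab):
        s = local_score(tokens[i], lab)
        # Non-local consistency: reward matching labels on repeated tokens.
        s += sum(0.5 for j, t in enumerate(tokens)
                 if j != i and t == tokens[i] and state[j] == lab)
        return s

    for it in range(iters):
        temp = max(0.1, 2.0 * (1 - it / iters))  # simple annealing schedule
        i = rng.randrange(len(tokens))
        weights = [math.exp(score(i, lab) / temp) for lab in labels]
        r = rng.random() * sum(weights)
        for lab, w in zip(labels, weights):
            r -= w
            if r <= 0:
                state[i] = lab
                break
    return state

def local_score(tok, lab):  # toy emission score: capitalized tokens look like entities
    return 2.0 if tok.istitle() == (lab == "ENT") else 0.0

tags = gibbs_anneal(["Acme", "shares", "rose", "Acme", "fell"],
                    ["ENT", "O"], local_score)
```

As the temperature approaches zero the sampler behaves like greedy decoding, which is why annealing can stand in for Viterbi while still accommodating the non-local term.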

3,209 citations

Proceedings ArticleDOI
04 Jun 2009
TL;DR: Some of the fundamental design challenges and misconceptions that underlie the development of an efficient and robust NER system are analyzed, and several solutions to these challenges are developed.
Abstract: We analyze some of the fundamental design challenges and misconceptions that underlie the development of an efficient and robust NER system. In particular, we address issues such as the representation of text chunks, the inference approach needed to combine local NER decisions, the sources of prior knowledge and how to use them within an NER system. In the process of comparing several solutions to these challenges we reach some surprising conclusions, as well as develop an NER system that achieves 90.8 F1 score on the CoNLL-2003 NER shared task, the best reported result for this dataset.
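One of the design choices the paper examines is the representation of text chunks (e.g., BIO vs. BILOU tagging). A small decoder from BIO tags to entity spans, written here as an illustrative sketch rather than the paper's code, shows concretely what such a representation encodes:

```python
def bio_to_spans(tags):
    """Convert a BIO tag sequence to (start, end, type) spans, end exclusive."""
    spans, start, etype = [], None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):
            if start is not None:      # close any open entity first
                spans.append((start, i, etype))
            start, etype = i, tag[2:]
        elif tag.startswith("I-") and start is not None and etype == tag[2:]:
            continue                   # extend the current entity
        else:                          # "O" or an inconsistent I- tag
            if start is not None:
                spans.append((start, i, etype))
            start, etype = None, None
    if start is not None:
        spans.append((start, len(tags), etype))
    return spans

spans = bio_to_spans(["B-PER", "I-PER", "O", "B-LOC"])
```

The handling of inconsistent tags (an I- with no matching B-) is exactly the kind of inference detail the paper argues must be made explicit when combining local NER decisions.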

1,539 citations

Proceedings Article
Long Jiang, Mo Yu, Ming Zhou, Xiaohua Liu, Tiejun Zhao
19 Jun 2011
TL;DR: This paper proposes to improve target-dependent Twitter sentiment classification by incorporating target-dependent features and taking related tweets into consideration; according to the experimental results, this approach greatly improves the performance of target-dependent sentiment classification.
Abstract: Sentiment analysis on Twitter data has attracted much attention recently. In this paper, we focus on target-dependent Twitter sentiment classification; namely, given a query, we classify the sentiments of the tweets as positive, negative or neutral according to whether they contain positive, negative or neutral sentiments about that query. Here the query serves as the target of the sentiments. The state-of-the-art approaches for solving this problem always adopt the target-independent strategy, which may assign irrelevant sentiments to the given target. Moreover, the state-of-the-art approaches only take the tweet to be classified into consideration when classifying the sentiment; they ignore its context (i.e., related tweets). However, because tweets are usually short and more ambiguous, sometimes it is not enough to consider only the current tweet for sentiment classification. In this paper, we propose to improve target-dependent Twitter sentiment classification by 1) incorporating target-dependent features; and 2) taking related tweets into consideration. According to the experimental results, our approach greatly improves the performance of target-dependent sentiment classification.
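The target-dependent features described above can be approximated, in spirit, by restricting lexical features to a window around the target mention rather than drawing them from the whole tweet. The window size and feature names below are illustrative; the paper's actual features are syntax-based:

```python
def target_features(tweet, target, window=2):
    """Lexical features drawn only from a window around the target mention."""
    toks = tweet.lower().split()
    feats = []
    for i, t in enumerate(toks):
        if t == target.lower():
            lo, hi = max(0, i - window), min(len(toks), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    feats.append(("near_target", toks[j]))
    return feats

f = target_features("I love my iPhone but hate Windows", "iPhone")
```

Note how "windows" falls outside the window around "iphone": a whole-tweet (target-independent) bag of words would attach "hate" and "windows" to the iPhone target indiscriminately, which is precisely the failure mode the paper identifies.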

911 citations


"QuickView: NLP-based Tweet Search" refers to background in this paper

  • ...The SA component is implemented according to Jiang et al. (2011), which incorporates target-dependent features and considers related tweets by utilizing a graph-based optimization....

Proceedings Article
01 Jul 2004
TL;DR: A critical look at the features used in the semantic role tagging literature is taken and it is shown that the information in the input, generally a syntactic parse tree, has yet to be fully exploited.
Abstract: This paper takes a critical look at the features used in the semantic role tagging literature and shows that the information in the input, generally a syntactic parse tree, has yet to be fully exploited. We propose an additional set of features, and our experiments show that these features lead to fairly significant improvements in the tasks we performed. We further show that different features are needed for different subtasks. Finally, we show that by using a Maximum Entropy classifier and fewer features, we achieved results comparable with the best previously reported results obtained with SVM models. We believe this is a clear indication that developing features that capture the right kind of information is crucial to advancing the state-of-the-art in semantic analysis.
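A classic example of a parse-tree feature of the kind discussed here is the category path from an argument constituent to the predicate. Using nested tuples as a stand-in for a real parse tree, a path feature can be computed as below; the tree format, the helper, and the assumption that the common ancestor is the root are all simplifications for illustration:

```python
def find_path(tree, label, path=()):
    """Return the chain of node categories from the root down to `label`.
    A tree is (category, child, child, ...); leaves are plain strings."""
    if isinstance(tree, str):
        return path if tree == label else None
    cat, *children = tree
    if cat == label:
        return path + (cat,)
    for child in children:
        found = find_path(child, label, path + (cat,))
        if found is not None:
            return found
    return None

# (S (NP (DT the) (NN cat)) (VP (VBD sat)))
tree = ("S", ("NP", ("DT", "the"), ("NN", "cat")), ("VP", ("VBD", "sat")))

arg = find_path(tree, "NP")    # categories from root down to the argument
pred = find_path(tree, "VBD")  # categories from root down to the predicate
# Classic SRL path feature: climb from the argument to the root, then descend.
path = "↑".join(reversed(arg)) + "↓" + "↓".join(pred[1:])
```

Features like this path string are fed to the classifier alongside lexical and positional features; the paper's point is that many more such combinations of parse-tree information remain unexplored.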

347 citations


"QuickView: NLP-based Tweet Search" refers to methods in this paper

  • ..., and conquering them individually (Xue, 2004; Koomen et al., 2005); 2) sequentially labeling based approach (Màrquez et al....