Processing and visualizing the data in tweets
Adam Marcus, Michael S. Bernstein, Osama Badar, David R. Karger, Samuel Madden, Robert C. Miller
Vol. 40, Iss. 4, pp. 21-27
TL;DR: This paper presents two systems. The first, TweeQL, provides a streaming SQL-like interface to the Twitter API, making common tweet processing tasks simpler; the second, TwitInfo, shows how end-users can interact with and understand aggregated data from the tweet stream, in addition to showcasing the power of the TweeQL language.
Abstract:
Microblogs such as Twitter provide a valuable stream of diverse user-generated data. While the data extracted from Twitter is generally timely and accurate, the process by which developers extract structured data from the tweet stream is ad-hoc and requires reimplementation of common data manipulation primitives. In this paper, we present two systems for querying and extracting structure from Twitter-embedded data. The first, TweeQL, provides a streaming SQL-like interface to the Twitter API, making common tweet processing tasks simpler. The second, TwitInfo, shows how end-users can interact with and understand aggregated data from the tweet stream, in addition to showcasing the power of the TweeQL language. Together these systems show the richness of content that can be extracted from Twitter.
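As a rough illustration of the "common data manipulation primitives" the abstract refers to, the sketch below filters a stream of tweets by keyword and counts the matches per time window. It is not TweeQL syntax and not the authors' implementation. The tweet record layout (`text`, `created_at`), the function names `where_contains` and `count_per_window`, and the sample data are all assumptions made for this example.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator

# Hypothetical tweet record: {"text": str, "created_at": int (seconds)}.
Tweet = Dict[str, object]


def where_contains(tweets: Iterable[Tweet], keyword: str) -> Iterator[Tweet]:
    """Streaming filter: lazily yield tweets whose text mentions the keyword."""
    needle = keyword.lower()
    for tweet in tweets:
        if needle in str(tweet["text"]).lower():
            yield tweet


def count_per_window(tweets: Iterable[Tweet], window_seconds: int) -> Counter:
    """Bucket tweets into fixed-size time windows and count each bucket.

    This consumes the whole input, so it only suits a finite sample.
    """
    counts: Counter = Counter()
    for tweet in tweets:
        window_start = (int(tweet["created_at"]) // window_seconds) * window_seconds
        counts[window_start] += 1
    return counts


# Made-up sample data standing in for a live tweet stream.
sample_stream = [
    {"text": "Goal! What a match", "created_at": 5},
    {"text": "Traffic is terrible today", "created_at": 20},
    {"text": "Another GOAL in the second half", "created_at": 70},
    {"text": "goal goal goal", "created_at": 95},
]

goal_counts = count_per_window(
    where_contains(sample_stream, "goal"), window_seconds=60
)
print(dict(goal_counts))
```

Hand-written code like this is the kind of boilerplate that a declarative, SQL-like query over the stream is meant to replace.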
Citations
Journal Article
Open challenges for data stream mining research
Georg Krempl, Indre Žliobaite, Dariusz Brzezinski, Eyke Hüllermeier, Vincent Lemaire, Tino Noack, Ammar Shaker, Sonja Sievi, Myra Spiliopoulou, Jerzy Stefanowski
TL;DR: This article presents a discussion on eight open challenges for data stream mining, which cover the full cycle of knowledge discovery and involve such problems as protecting data privacy, dealing with legacy systems, handling incomplete and delayed information, analysis of complex data, and evaluation of stream mining algorithms.
Proceedings Article
STREAMCUBE: Hierarchical spatio-temporal hashtag clustering for event exploration over the Twitter stream
TL;DR: This paper focuses on hierarchical spatio-temporal hashtag clustering techniques and proposes a data structure called STREAMCUBE, which is an extension of the data cube structure from the database community with spatial and temporal hierarchy.
Journal Article
An algorithm for local geoparsing of microtext
Judith Gelernter, Shilpa Balaji
TL;DR: This paper presents a method to geo-parse the short, informal messages known as microtext, using natural language processing to identify references to streets and addresses, buildings and urban spaces, toponyms, and place acronyms and abbreviations.
Journal Article
Geosocial gauge: a system prototype for knowledge discovery from social media
TL;DR: This article presents a system prototype for harvesting, processing, modeling, and integrating heterogeneous social media feeds towards the generation of geosocial knowledge, and addresses primarily two key components of this system prototype: a novel data model for heterogeneous social media feeds and a corresponding general system architecture.
Proceedings Article
Mercury: A memory-constrained spatio-temporal real-time search on microblogs
TL;DR: Mercury is a system for real-time support of top-k spatio-temporal queries on microblogs, where users are able to browse recent microblogs near their locations, and employs a scalable dynamic in-memory index structure that is capable of digesting all incoming microblogs.
References
Proceedings Article
Pig latin: a not-so-foreign language for data processing
TL;DR: A new language called Pig Latin is described, designed to fit in a sweet spot between the declarative style of SQL and the low-level, procedural style of map-reduce; its implementation, Pig, is an open-source Apache incubator project available for general use.
Journal Article
Aurora: a new model and architecture for data stream management
Daniel J. Abadi, Don Carney, Uğur Çetintemel, Mitch Cherniack, Christian Convey, Sangdon Lee, Michael Stonebraker, Nesime Tatbul, Stan Zdonik
TL;DR: The basic processing model and architecture of Aurora, a new system to manage data streams for monitoring applications, are described, along with a stream-oriented set of operators.
Journal Article
The CQL continuous query language: semantic foundations and query execution
TL;DR: This paper presents the structure of CQL's query execution plans as well as details of the most important components: operators, interoperator queues, synopses, and sharing of components among multiple operators and queries.
Journal Article
Eddies: continuously adaptive query processing
Ron Avnur, Joseph M. Hellerstein
TL;DR: This paper introduces a query processing mechanism called an eddy, which continuously reorders operators in a query plan as it runs, and describes the moments of symmetry during which pipelined joins can be easily reordered, and the synchronization barriers that require inputs from different sources to be coordinated.