scispace - formally typeset
Search or ask a question
Topic

Data stream

About: Data stream is a research topic. Over the lifetime, 15863 publications have been published within this topic receiving 231672 citations.


Papers
More filters
Proceedings ArticleDOI
03 Jun 2002
TL;DR: The need for and research issues arising from a new model of data processing, where data does not take the form of persistent relations, but rather arrives in multiple, continuous, rapid, time-varying data streams are motivated.
Abstract: In this overview paper we motivate the need for and research issues arising from a new model of data processing. In this model, data does not take the form of persistent relations, but rather arrives in multiple, continuous, rapid, time-varying data streams. In addition to reviewing past work relevant to data stream systems and current projects in the area, the paper explores topics in stream query languages, new requirements and challenges in query processing, and algorithmic issues.

2,933 citations

Patent
06 Jun 1995
TL;DR: An adaptive interface for a programmable system, for predicting a desired user function, based on user history, as well as machine internal status and context, is presented for confirmation by the user, and the predictive mechanism is updated based on this feedback as mentioned in this paper.
Abstract: An adaptive interface for a programmable system, for predicting a desired user function, based on user history, as well as machine internal status and context. The apparatus receives an input from the user and other data. A predicted input is presented for confirmation by the user, and the predictive mechanism is updated based on this feedback. Also provided is a pattern recognition system for a multimedia device, wherein a user input is matched to a video stream on a conceptual basis, allowing inexact programming of a multimedia device. The system analyzes a data stream for correspondence with a data pattern for processing and storage. The data stream is subjected to adaptive pattern recognition to extract features of interest to provide a highly compressed representation which may be efficiently processed to determine correspondence. Applications of the interface and system include a VCR, medical device, vehicle control system, audio device, environmental control system, securities trading terminal, and smart house. The system optionally includes an actuator for effecting the environment of operation, allowing closed-loop feedback operation and automated learning.

1,976 citations

Journal ArticleDOI
TL;DR: In this paper, the authors introduce a sublinear space data structure called the countmin sketch for summarizing data streams, which allows fundamental queries in data stream summarization such as point, range, and inner product queries to be approximately answered very quickly; in addition it can be applied to solve several important problems in data streams such as finding quantiles, frequent items, etc.

1,939 citations

Book ChapterDOI
09 Sep 2003
TL;DR: A fundamentally different philosophy for data stream clustering is discussed which is guided by application-centered requirements and uses the concepts of a pyramidal time frame in conjunction with a microclustering approach.
Abstract: The clustering problem is a difficult problem for the data stream domain. This is because the large volumes of data arriving in a stream renders most traditional algorithms too inefficient. In recent years, a few one-pass clustering algorithms have been developed for the data stream problem. Although such methods address the scalability issues of the clustering problem, they are generally blind to the evolution of the data and do not address the following issues: (1) The quality of the clusters is poor when the data evolves considerably over time. (2) A data stream clustering algorithm requires much greater functionality in discovering and exploring clusters over different portions of the stream. The widely used practice of viewing data stream clustering algorithms as a class of one-pass clustering algorithms is not very useful from an application point of view. For example, a simple one-pass clustering algorithm over an entire data stream of a few years is dominated by the outdated history of the stream. The exploration of the stream over different time windows can provide the users with a much deeper understanding of the evolving behavior of the clusters. At the same time, it is not possible to simultaneously perform dynamic clustering over all possible time horizons for a data stream of even moderately large volume. This paper discusses a fundamentally different philosophy for data stream clustering which is guided by application-centered requirements. The idea is divide the clustering process into an online component which periodically stores detailed summary statistics and an offine component which uses only this summary statistics. The offine component is utilized by the analyst who can use a wide variety of inputs (such as time horizon or number of clusters) in order to provide a quick understanding of the broad clusters in the data stream. The problems of efficient choice, storage, and use of this statistical data for a fast data stream turns out to be quite tricky. For this purpose, we use the concepts of a pyramidal time frame in conjunction with a microclustering approach. Our performance experiments over a number of real and synthetic data sets illustrate the effectiveness, efficiency, and insights provided by our approach.

1,836 citations

Journal ArticleDOI
TL;DR: Data Streams: Algorithms and Applications surveys the emerging area of algorithms for processing data streams and associated applications, which rely on metric embeddings, pseudo-random computations, sparse approximation theory and communication complexity.
Abstract: In the data stream scenario, input arrives very rapidly and there is limited memory to store the input. Algorithms have to work with one or few passes over the data, space less than linear in the input size or time significantly less than the input size. In the past few years, a new theory has emerged for reasoning about algorithms that work within these constraints on space, time, and number of passes. Some of the methods rely on metric embeddings, pseudo-random computations, sparse approximation theory and communication complexity. The applications for this scenario include IP network traffic analysis, mining text message streams and processing massive data sets in general. Researchers in Theoretical Computer Science, Databases, IP Networking and Computer Systems are working on the data stream challenges. This article is an overview and survey of data stream algorithmics and is an updated version of [1].

1,598 citations


Network Information
Related Topics (5)
Cluster analysis
146.5K papers, 2.9M citations
87% related
Network packet
159.7K papers, 2.2M citations
84% related
Wireless sensor network
142K papers, 2.4M citations
84% related
Artificial neural network
207K papers, 4.5M citations
83% related
Wireless network
122.5K papers, 2.1M citations
82% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
202396
2022243
2021337
2020695
2019843
2018759