Journal Article

Advances in data stream mining

TL;DR: This paper reviews key milestones and the state of the art in the data stream mining area, and presents insights into future directions.
Abstract
Mining data streams has been a focal point of research interest over the past decade. Hardware and software advances have contributed to the significance of this area of research by enabling faster-than-ever data generation. Such rapidly generated data is termed a data stream. Credit card transactions, Google searches, and phone calls in a city are typical data streams, among many others. In many important applications, this streaming data must be analyzed in real time. Traditional data mining techniques have fallen short in addressing the needs of data stream mining. Randomization, approximation, and adaptation have been used extensively to develop new techniques or to adapt existing ones so that they can operate in a streaming environment. This paper reviews key milestones and the state of the art in the data stream mining area. Future insights are also presented.
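As one concrete illustration of the randomization mentioned in the abstract, the sketch below implements classic reservoir sampling (Vitter's Algorithm R), which keeps a uniform random sample of k items from a stream of unknown length in a single pass and O(k) memory. The abstract does not name this technique; it is offered only as a representative example, and the function name `reservoir_sample` is hypothetical.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Return a uniform random sample of up to k items from a one-pass stream."""
    if k <= 0:
        raise ValueError("k must be positive")
    if rng is None:
        rng = random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Pick a slot uniformly from 0..i (inclusive); the new item is kept
            # only if the slot falls inside the reservoir, i.e. with probability
            # k / (i + 1), which keeps every item seen so far equally likely.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir
```

For example, `reservoir_sample(range(1_000_000), k=10, rng=random.Random(42))` returns 10 values drawn uniformly from the whole stream without ever storing more than 10 of them. If the stream is shorter than k, every item is returned.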


Citations
Journal Article

Smart Electricity Meter Data Intelligence for Future Energy Systems: A Survey

TL;DR: A comprehensive survey of smart electricity meters and their utilization is presented, focusing on key aspects of the metering process, different stakeholder interests, and the technologies used to satisfy those interests.
Journal Article

A survey on data preprocessing for data stream mining

TL;DR: This survey summarizes, categorizes and analyzes the contributions on data preprocessing that cope with streaming data, taking into account the existing relationships between the different families of methods (feature and instance selection, and discretization).
Journal Article

Fifty years of pulsar candidate selection: from simple filters to a new principled real-time classification approach

TL;DR: In this paper, the Gaussian Hellinger Very Fast Decision Tree (GH-VFDT), a purpose-built tree-based machine learning classifier, is used to select promising pulsar candidates.
Journal Article

Data mining tools

TL;DR: This paper attempts to support the decision-making process by discussing the historical development and presenting a range of existing state-of-the-art data mining and related tools, and proposes criteria for categorizing the tools based on different user groups, data structures, data mining tasks and methods.
Journal Article

Kappa Updated Ensemble for drifting data stream mining

TL;DR: KUE is a combination of online and block-based ensemble approaches that uses Kappa statistic for dynamic weighting and selection of base classifiers and is capable of outperforming state-of-the-art ensembles on standard and imbalanced drifting data streams while having a low computational complexity.
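The KUE entry above weights and selects base classifiers using the Kappa statistic. The sketch below shows only the statistic itself, Cohen's kappa, κ = (p_o − p_e) / (1 − p_e), where p_o is observed agreement (accuracy) and p_e is the agreement expected by chance from the class frequencies. How KUE converts kappa values into weights and selections is not shown and not assumed here; the function name `cohen_kappa` and the convention for the degenerate single-class case are hypothetical choices for this illustration.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Cohen's kappa: agreement between labels and predictions, corrected for chance."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    n = len(y_true)
    # Observed agreement p_o: plain accuracy on this window of examples.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Expected agreement p_e: probability that two independent labelers with
    # these class frequencies agree by chance (missing classes count as 0).
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        # Only one class occurs in both sequences, so kappa is undefined.
        # This sketch returns 0.0 by convention (an assumption, not a standard).
        return 0.0
    return (p_o - p_e) / (1.0 - p_e)
```

In a drifting stream one would typically evaluate this over a sliding window of recent examples, e.g. `cohen_kappa(labels[-1000:], preds[-1000:])`. Because it discounts agreement that class frequencies alone would produce, kappa is more informative than raw accuracy on imbalanced streams.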
References
Book Chapter

Probability Inequalities for Sums of Bounded Random Variables

TL;DR: Upper bounds are derived for the probability that the sum S of n independent random variables exceeds its mean ES by a positive number nt; analogous inequalities are then obtained for certain sums of dependent random variables, such as U statistics.
Proceedings Article

BIRCH: an efficient data clustering method for very large databases

TL;DR: Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) is a data clustering method that is especially suitable for very large databases.
Proceedings Article

Models and issues in data stream systems

TL;DR: This paper motivates the need for, and the research issues arising from, a new model of data processing in which data does not take the form of persistent relations but instead arrives in multiple, continuous, rapid, time-varying data streams.
Proceedings Article

Mining high-speed data streams

TL;DR: This paper describes and evaluates VFDT, an anytime system that builds decision trees using constant memory and constant time per example, and applies it to mining the continuous stream of Web access data from the whole University of Washington main campus.
Proceedings Article

A symbolic representation of time series, with implications for streaming algorithms

TL;DR: A new symbolic representation of time series is introduced that is unique in that it allows dimensionality/numerosity reduction, and it also allows distance measures to be defined on the symbolic representation that lower-bound the corresponding distance measures defined on the original series.
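The Hoeffding inequality listed above is what allows VFDT ("Mining high-speed data streams") to commit to a split after seeing only a finite sample of the stream. For n independent observations of a variable with range R, with probability at least 1 − δ the true mean is at least the observed mean minus ε = √(R² ln(1/δ) / (2n)). The sketch below computes ε and applies a VFDT-style split test using information gain (range R = log₂ of the number of classes). The function names and the default values `delta=1e-7` and `tie_threshold=0.05` are illustrative assumptions, not taken from the source.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Hoeffding epsilon for n independent observations of a variable with range R.

    With probability at least 1 - delta, the true mean is at least the observed
    mean minus epsilon, where epsilon = sqrt(R^2 * ln(1/delta) / (2n)).
    """
    if value_range <= 0 or not 0.0 < delta < 1.0 or n <= 0:
        raise ValueError("need value_range > 0, 0 < delta < 1 and n > 0")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float, n_classes: int, n: int,
                 delta: float = 1e-7, tie_threshold: float = 0.05) -> bool:
    """VFDT-style split test on a leaf that has seen n examples.

    best_gain and second_gain are the information gains of the two best attributes.
    """
    if n_classes < 2:
        raise ValueError("information gain needs at least two classes")
    # Information gain lies in [0, log2(n_classes)], so that is the range R.
    value_range = math.log2(n_classes)
    eps = hoeffding_bound(value_range, delta, n)
    # Split if the best attribute leads by more than eps, or if eps has shrunk
    # below the tie threshold (the top two attributes are effectively tied).
    return (best_gain - second_gain) > eps or eps < tie_threshold
```

Because ε shrinks as 1/√n, a leaf that has seen few examples waits until the gap between its two best attributes is statistically reliable; this is how the tree can be grown with constant time and memory per example.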