# Adaptive Drift Detection Mechanism for Non-Stationary Data Stream

VIT University

12 Mar 2021 - Journal of Information & Knowledge Management (World Scientific Publishing Company) - Vol. 20, Iss. 01, pp 2150008

TL;DR: The proposed Adaptive Drift Detection Method (ADDM) uses a new parameter to detect the gradual drift in order to reduce the detection delay and false-positive rate (FPR) while preserving high classification accuracy.

Abstract: Mining non-stationary data streams is a challenging and important task. It is used in the financial sector, web log analysis, sensor networks, network traffic management, etc. In this environment,...
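The abstract is truncated and does not state ADDM's update rule or its new parameter, so the code below is not ADDM. As a point of reference only, here is a minimal sketch of the classic error-rate-based Drift Detection Method (DDM), which signals a warning or a drift when the running error rate rises two or three standard deviations above its recorded minimum. The class name `SimpleDDM`, the default thresholds, and the synthetic error stream are assumptions made for illustration.

```python
import math


class SimpleDDM:
    """Sketch of the classic DDM rule (NOT ADDM): monitor the running error rate."""

    def __init__(self, min_samples=30, warn_level=2.0, drift_level=3.0):
        self.min_samples = min_samples    # assumed warm-up length
        self.warn_level = warn_level      # warning at p_min + 2 * s_min
        self.drift_level = drift_level    # drift at p_min + 3 * s_min
        self.reset()

    def reset(self):
        self.n = 0              # samples seen since the last reset
        self.p = 0.0            # running error rate
        self.p_min = math.inf   # error rate at the best point so far
        self.s_min = math.inf   # its standard deviation

    def update(self, error):
        """error is 1 if the classifier misclassified the sample, else 0."""
        self.n += 1
        self.p += (error - self.p) / self.n
        s = math.sqrt(max(self.p * (1.0 - self.p), 0.0) / self.n)
        if self.n < self.min_samples:
            return "stable"
        if self.p + s <= self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, s
        if self.p + s > self.p_min + self.drift_level * self.s_min:
            self.reset()
            return "drift"
        if self.p + s > self.p_min + self.warn_level * self.s_min:
            return "warning"
        return "stable"


# Synthetic stream (hypothetical): 10% errors for 200 samples, then 50% errors.
stream = [1 if i % 10 == 0 else 0 for i in range(200)]
stream += [1 if i % 2 == 0 else 0 for i in range(200, 400)]

detector = SimpleDDM()
drift_points = [i for i, e in enumerate(stream) if detector.update(e) == "drift"]
```

The gap between the true change point (index 200) and the first entry of `drift_points` is the detection delay, one of the quantities the TL;DR says ADDM is designed to reduce.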

##### Citations

TL;DR: This study proposes a SNN-RODE based LapRLS heterogeneous network data classification algorithm to achieve deep embedding of structure and semantics among nodes by constructing a multitask SNN and selecting dead song datasets to perform mining tasks to train the neural network.

Abstract: Data classification is one of the main tasks in the current data mining field, and the existing network data triage algorithms have problems such as too small a proportion of labeled samples, a large amount of noise, and redundant data, which lead to low classification accuracy of data stream implementation. Network embedding can effectively improve these problems, but the network embedding itself has problems such as capturing relational honor and ambiguity. This study proposes a SNN-RODE based LapRLS heterogeneous network data classification algorithm to achieve deep embedding of structure and semantics among nodes by constructing a multitask SNN and selecting dead song datasets to perform mining tasks to train the neural network. Then a semi-supervised learning classifier based on the Laplacian regularized least squares (LapRLS) regression model is designed, which uses the relative support difference function as the decision method and optimizes that function. The simulation results show that the SNN-RODE-LapRLS algorithm improves performance by 14%-51% over the mainstream classification algorithms, and that its time consumption meets the demand of real-time classification.

##### References

TL;DR: A set of simple, yet safe and robust non-parametric tests for statistical comparisons of classifiers is recommended: the Wilcoxon signed ranks test for comparison of two classifiers and the Friedman test with the corresponding post-hoc tests for comparisons of more classifiers over multiple data sets.

Abstract: While methods for comparing two learning algorithms on a single data set have been scrutinized for quite some time already, the issue of statistical tests for comparisons of more algorithms on multiple data sets, which is even more essential to typical machine learning studies, has been all but ignored. This article reviews the current practice and then theoretically and empirically examines several suitable tests. Based on that, we recommend a set of simple, yet safe and robust non-parametric tests for statistical comparisons of classifiers: the Wilcoxon signed ranks test for comparison of two classifiers and the Friedman test with the corresponding post-hoc tests for comparison of more classifiers over multiple data sets. Results of the latter can also be neatly presented with the newly introduced CD (critical difference) diagrams.

10,306 citations
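To make the Friedman test from the entry above concrete, here is a minimal sketch that ranks classifiers on each data set and computes the standard Friedman chi-square statistic from the average ranks. The function names and the accuracy table are hypothetical; the abstract itself gives no formulas, so the statistic is the textbook form $\chi^2_F = \frac{12N}{k(k+1)}\left[\sum_j R_j^2 - \frac{k(k+1)^2}{4}\right]$ for $k$ classifiers on $N$ data sets.

```python
def average_ranks(scores):
    """scores[i][j] = score of classifier j on data set i (higher is better).

    Returns the mean rank of each classifier (rank 1 = best); ties share
    the average of the positions they occupy.
    """
    n, k = len(scores), len(scores[0])
    totals = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda j: -row[j])
        start = 0
        while start < k:
            end = start
            while end + 1 < k and row[order[end + 1]] == row[order[start]]:
                end += 1
            shared = (start + end) / 2 + 1  # mean of 1-based positions
            for pos in range(start, end + 1):
                totals[order[pos]] += shared
            start = end + 1
    return [t / n for t in totals]


def friedman_statistic(avg_ranks, n_datasets):
    """Friedman chi-square statistic from average ranks."""
    k = len(avg_ranks)
    sum_sq = sum(r * r for r in avg_ranks)
    return 12.0 * n_datasets / (k * (k + 1)) * (sum_sq - k * (k + 1) ** 2 / 4.0)


# Hypothetical accuracies: 4 data sets (rows) x 3 classifiers (columns).
accuracies = [
    [0.90, 0.80, 0.70],
    [0.85, 0.80, 0.60],
    [0.70, 0.75, 0.60],
    [0.95, 0.90, 0.80],
]
ranks = average_ranks(accuracies)
chi2_f = friedman_statistic(ranks, len(accuracies))
```

The statistic is compared against a chi-square distribution with $k-1$ degrees of freedom; with this few data sets the approximation is rough, so the numbers serve only to show the mechanics.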

TL;DR: This article reviews five approximate statistical tests for determining whether one learning algorithm outperforms another on a particular learning task and measures the power (ability to detect algorithm differences when they do exist) of these tests.

Abstract: This article reviews five approximate statistical tests for determining whether one learning algorithm outperforms another on a particular learning task. These tests are compared experimentally to determine their probability of incorrectly detecting a difference when no difference exists (type I error). Two widely used statistical tests are shown to have high probability of type I error in certain situations and should never be used: a test for the difference of two proportions and a paired-differences t test based on taking several random train-test splits. A third test, a paired-differences t test based on 10-fold cross-validation, exhibits somewhat elevated probability of type I error. A fourth test, McNemar's test, is shown to have low type I error. The fifth test is a new test, 5 × 2 cv, based on five iterations of twofold cross-validation. Experiments show that this test also has acceptable type I error. The article also measures the power (ability to detect algorithm differences when they do exist)...

3,356 citations
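McNemar's test, mentioned in the entry above, can be sketched in a few lines. The version below uses the common continuity-corrected statistic $(|n_{01} - n_{10}| - 1)^2 / (n_{01} + n_{10})$, where $n_{01}$ and $n_{10}$ count the test examples that exactly one of the two classifiers gets wrong, and compares it with a chi-square distribution with one degree of freedom. The function name and the toy predictions are assumptions for illustration.

```python
import math


def mcnemar_test(y_true, pred_a, pred_b):
    """Return (statistic, p_value) for McNemar's test with continuity correction."""
    # n01: A wrong, B right.  n10: A right, B wrong.
    n01 = sum(1 for t, a, b in zip(y_true, pred_a, pred_b) if a != t and b == t)
    n10 = sum(1 for t, a, b in zip(y_true, pred_a, pred_b) if a == t and b != t)
    if n01 + n10 == 0:
        return 0.0, 1.0  # the classifiers never disagree in correctness
    stat = (abs(n01 - n10) - 1) ** 2 / (n01 + n10)
    # Upper tail of chi-square with 1 degree of freedom: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Toy data (hypothetical): 20 examples whose true label is 0.
y_true = [0] * 20
pred_a = [1] * 10 + [0] * 10          # A is wrong on the first 10 examples
pred_b = [0] * 10 + [1] * 2 + [0] * 8 # B is wrong on examples 10 and 11
stat, p_value = mcnemar_test(y_true, pred_a, pred_b)
```

Only the discordant examples enter the statistic, so the test needs a single train-test split rather than repeated resampling.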

TL;DR: The survey covers the different facets of concept drift in an integrated way to reflect on the existing scattered state of the art and aims at providing a comprehensive introduction to the concept drift adaptation for researchers, industry analysts, and practitioners.

Abstract: Concept drift primarily refers to an online supervised learning scenario when the relation between the input data and the target variable changes over time. Assuming a general knowledge of supervised learning in this article, we characterize adaptive learning processes; categorize existing strategies for handling concept drift; overview the most representative, distinct, and popular techniques and algorithms; discuss evaluation methodology of adaptive algorithms; and present a set of illustrative applications. The survey covers the different facets of concept drift in an integrated way to reflect on the existing scattered state of the art. Thus, it aims at providing a comprehensive introduction to the concept drift adaptation for researchers, industry analysts, and practitioners.

2,374 citations

01 Aug 2000

TL;DR: This paper describes and evaluates VFDT, an anytime system that builds decision trees using constant memory and constant time per example, and applies it to mining the continuous stream of Web access data from the whole University of Washington main campus.

Abstract: Many organizations today have more than very large databases; they have databases that grow without limit at a rate of several million records per day. Mining these continuous data streams brings unique opportunities, but also new challenges. This paper describes and evaluates VFDT, an anytime system that builds decision trees using constant memory and constant time per example. VFDT can incorporate tens of thousands of examples per second using off-the-shelf hardware. It uses Hoeffding bounds to guarantee that its output is asymptotically nearly identical to that of a conventional learner. We study VFDT's properties and demonstrate its utility through an extensive set of experiments on synthetic data. We apply VFDT to mining the continuous stream of Web access data from the whole University of Washington main campus.

2,171 citations
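The Hoeffding bound that the entry above refers to can be illustrated numerically. For a variable with range $R$ observed $n$ times, the true mean lies within $\epsilon = \sqrt{R^2 \ln(1/\delta) / (2n)}$ of the sample mean with probability $1 - \delta$. The sketch below computes $\epsilon$ and applies a split check of the kind a Hoeffding tree uses (split when the gap between the two best attributes exceeds $\epsilon$). The symbols, function names, and example values are assumptions, since the abstract does not spell them out.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Epsilon such that the true mean is within epsilon of the sample mean
    with probability 1 - delta after n independent observations."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_gain, value_range, delta, n):
    """Split when the observed gap between the two best attributes exceeds epsilon."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


# Hypothetical values: gains of 0.30 and 0.20, range 1.0, delta = 1e-7.
eps_100 = hoeffding_bound(1.0, 1e-7, 100)
eps_1000 = hoeffding_bound(1.0, 1e-7, 1000)
split_early = should_split(0.30, 0.20, 1.0, 1e-7, 100)
split_later = should_split(0.30, 0.20, 1.0, 1e-7, 1000)
```

Because $\epsilon$ shrinks as $1/\sqrt{n}$, a leaf that cannot yet commit to a split simply waits for more examples, which is what lets the learner process each example once in constant time.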

TL;DR: This work presents a comprehensive survey of the advances with ABC and its applications, and it is hoped that this survey will be very beneficial for researchers studying SI, particularly the ABC algorithm.

Abstract: Swarm intelligence (SI) is briefly defined as the collective behaviour of decentralized and self-organized swarms. Well-known examples of such swarms are bird flocks, fish schools, and colonies of social insects such as termites, ants, and bees. In the 1990s, two approaches in particular, one based on ant colonies and one on fish schooling/bird flocking, attracted great interest from researchers. Although the self-organization features required by SI are strongly and clearly seen in honey bee colonies, researchers only began to take an interest in the behaviour of these swarm systems as a basis for new intelligent approaches at the beginning of the 2000s. Over a decade, several algorithms have been developed based on different intelligent behaviours of honey bee swarms. Among those, artificial bee colony (ABC) is the one that has been most widely studied and applied to real-world problems so far. The number of researchers interested in the ABC algorithm is increasing rapidly. This work presents a comprehensive survey of the advances with ABC and its applications. It is hoped that this survey will be very beneficial for researchers studying SI, particularly the ABC algorithm.

1,645 citations