Author

Amineh Amini

Bio: Amineh Amini is an academic researcher from Information Technology University. The author has contributed to research in topics: Cluster analysis & Data stream clustering. The author has an h-index of 12 and has co-authored 23 publications receiving 640 citations. Previous affiliations of Amineh Amini include Islamic Azad University & University of Malaya.

Papers
Journal ArticleDOI
TL;DR: This paper summarizes the main density-based clustering algorithms on data streams, discusses their uniqueness and limitations, explains how they address the challenges in clustering data streams, and investigates the evaluation metrics used in validating cluster quality and measuring algorithms’ performance.
Abstract: Clustering data streams has drawn lots of attention in the last few years due to their ever-growing presence. Data streams put additional challenges on clustering such as limited time and memory and one pass clustering. Furthermore, discovering clusters with arbitrary shapes is very important in data stream applications. Data streams are infinite and evolving over time, and we do not have any knowledge about the number of clusters. In a data stream environment due to various factors, some noise appears occasionally. Density-based method is a remarkable class in clustering data streams, which has the ability to discover arbitrary shape clusters and to detect noise. Furthermore, it does not need the number of clusters in advance. Due to data stream characteristics, the traditional density-based clustering is not applicable. Recently, a lot of density-based clustering algorithms are extended for data streams. The main idea in these algorithms is using density-based methods in the clustering process and at the same time overcoming the constraints, which are put out by data stream’s nature. The purpose of this paper is to shed light on some algorithms in the literature on density-based clustering over data streams. We not only summarize the main density-based clustering algorithms on data streams, discuss their uniqueness and limitations, but also explain how they address the challenges in clustering data streams. Moreover, we investigate the evaluation metrics used in validating cluster quality and measuring algorithms’ performance. It is hoped that this survey will serve as a steppingstone for researchers studying data streams clustering, particularly density-based algorithms.

183 citations

Journal ArticleDOI
01 Apr 2014-Energy
TL;DR: In this paper, the polynomial and radial basis function (RBF) kernels are applied as the kernel functions of Support Vector Regression (SVR) to predict wind turbine reaction torque.

113 citations

Journal ArticleDOI
TL;DR: The imperialist competitive algorithm (ICA) is modified with a density-based algorithm and fuzzy logic for optimum clustering in WSNs, and achieves higher detection accuracy (87%) and clustering quality (0.99) than existing approaches.

106 citations

Proceedings ArticleDOI
26 Jul 2011
TL;DR: This paper reviews the grid-based clustering algorithms that use density-based algorithms or the density concept for clustering, and discusses how well the algorithms address the challenging issues in clustering data streams.
Abstract: Clustering data streams attracted many researchers since the applications that generate data streams have become more popular. Several clustering algorithms have been introduced for data streams based on distance which are incompetent to find clusters of arbitrary shapes and cannot handle the outliers. Density-based clustering algorithms are remarkable not only to find arbitrarily shaped clusters but also to deal with noise in data. In density-based clustering algorithms, dense areas of objects in the data space are considered as clusters which are segregated by low-density area. Another group of the clustering methods for data streams is grid-based clustering where the data space is quantized into finite number of cells which form the grid structure and perform clustering on the grids. Grid-based clustering maps the infinite number of data records in data streams to finite numbers of grids. In this paper we review the grid based clustering algorithms that use density-based algorithms or density concept for the clustering. We called them density-grid clustering algorithms. We explore the algorithms in details and the merits and limitations of them. The algorithms are also summarized in a table based on the important features. Besides that, we discuss about how well the algorithms address the challenging issues in the clustering data streams.

79 citations
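
The density-grid idea described in the abstract above (quantize the data space into cells, then cluster on the cells rather than on the raw records) can be illustrated with a minimal batch sketch. This is not one of the algorithms the paper reviews: the function name `density_grid_clusters`, the parameters `cell_size` and `min_points`, and the rule of merging dense cells that touch (including diagonally) are all assumptions made for illustration, and the sketch ignores stream-specific concerns such as decay and one-pass updates.

```python
from collections import Counter, deque


def density_grid_clusters(points, cell_size=1.0, min_points=3):
    """Label dense grid cells; adjacent dense cells share a cluster label.

    Hypothetical helper: a cell is 'dense' when it holds at least
    `min_points` points. Returns a dict mapping cell -> cluster label.
    """
    # Map every 2-D point to its grid cell and count points per cell.
    counts = Counter(
        (int(x // cell_size), int(y // cell_size)) for x, y in points
    )
    dense = {cell for cell, n in counts.items() if n >= min_points}

    labels = {}
    next_label = 0
    for start in sorted(dense):
        if start in labels:
            continue
        # Breadth-first search over neighbouring dense cells.
        labels[start] = next_label
        queue = deque([start])
        while queue:
            cx, cy = queue.popleft()
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (cx + dx, cy + dy)
                    if nb in dense and nb not in labels:
                        labels[nb] = next_label
                        queue.append(nb)
        next_label += 1
    return labels
```

Cells that never reach `min_points` receive no label, which is how this sketch treats sparse regions as noise.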

Journal ArticleDOI
TL;DR: The proposed MuDi-Stream algorithm improves clustering quality in multi-density environments. It is evaluated on various synthetic and real-world datasets using different quality metrics, and its scalability results are compared.

64 citations


Cited by
01 Jan 2002

9,314 citations

Journal ArticleDOI
TL;DR: This review presents four main components of time-series clustering, gives an updated account of the trend of improvements in the efficiency, quality, and complexity of time-series clustering approaches over the last decade, and points to new paths for future work.

1,235 citations

Journal ArticleDOI
TL;DR: A survey of data stream clustering algorithms is presented, providing a thorough discussion of the main design components of state-of-the-art algorithms and an overview of the usually employed experimental methodologies.
Abstract: Data stream mining is an active research area that has recently emerged to discover knowledge from large amounts of continuously generated data. In this context, several data stream clustering algorithms have been proposed to perform unsupervised learning. Nevertheless, data stream clustering imposes several challenges to be addressed, such as dealing with nonstationary, unbounded data that arrive in an online fashion. The intrinsic nature of stream data requires the development of algorithms capable of performing fast and incremental processing of data objects, suitably addressing time and memory limitations. In this article, we present a survey of data stream clustering algorithms, providing a thorough discussion of the main design components of state-of-the-art algorithms. In addition, this work addresses the temporal aspects involved in data stream clustering, and presents an overview of the usually employed experimental methodologies. A number of references are provided that describe applications of data stream clustering in different domains, such as network intrusion detection, sensor networks, and stock market analysis. Information regarding software packages and data repositories are also available for helping researchers and practitioners. Finally, some important issues and open questions that can be subject of future research are discussed.

479 citations

Journal ArticleDOI
TL;DR: This work presents the adaptive random forest (ARF) algorithm, which includes an effective resampling method and adaptive operators that can cope with different types of concept drifts without complex optimizations for different data sets.
Abstract: Random forests is currently one of the most used machine learning algorithms in the non-streaming (batch) setting. This preference is attributable to its high learning performance and low demands with respect to input preparation and hyper-parameter tuning. However, in the challenging context of evolving data streams, there is no random forests algorithm that can be considered state-of-the-art in comparison to bagging and boosting based algorithms. In this work, we present the adaptive random forest (ARF) algorithm for classification of evolving data streams. In contrast to previous attempts of replicating random forests for data stream learning, ARF includes an effective resampling method and adaptive operators that can cope with different types of concept drifts without complex optimizations for different data sets. We present experiments with a parallel implementation of ARF which has no degradation in terms of classification performance in comparison to a serial implementation, since trees and adaptive operators are independent from one another. Finally, we compare ARF with state-of-the-art algorithms in a traditional test-then-train evaluation and a novel delayed labelling evaluation, and show that ARF is accurate and uses a feasible amount of resources.

442 citations

Journal Article
TL;DR: In this article, the authors propose a measure of local outliers based on a symmetric neighborhood relationship, which considers both the neighbors and the reverse neighbors of an object when estimating its density distribution.
Abstract: Mining outliers in database is to find exceptional objects that deviate from the rest of the data set. Besides classical outlier analysis algorithms, recent studies have focused on mining local outliers, i.e., the outliers that have density distribution significantly different from their neighborhood. The estimation of density distribution at the location of an object has so far been based on the density distribution of its k-nearest neighbors [2,11]. However, when outliers are in the location where the density distributions in the neighborhood are significantly different, for example, in the case of objects from a sparse cluster close to a denser cluster, this may result in wrong estimation. To avoid this problem, here we propose a simple but effective measure on local outliers based on a symmetric neighborhood relationship. The proposed measure considers both neighbors and reverse neighbors of an object when estimating its density distribution. As a result, outliers so discovered are more meaningful. To compute such local outliers efficiently, several mining algorithms are developed that detects top-n outliers based on our definition. A comprehensive performance evaluation and analysis shows that our methods are not only efficient in the computation but also more effective in ranking outliers.

321 citations
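
The symmetric neighborhood idea in the abstract above (use both the neighbors and the reverse neighbors of an object when judging its density) can be sketched as follows. This is an illustrative assumption rather than the paper's exact definition: density is taken here as the inverse of the distance to the k-th nearest neighbor, and the score is the average density over the combined neighborhood divided by the object's own density. The names `knn` and `symmetric_outlier_scores` are hypothetical, and the sketch assumes distinct points and more than `k` points in total.

```python
import math


def knn(points, i, k):
    """Indices of the k nearest neighbours of points[i], excluding itself."""
    others = (j for j in range(len(points)) if j != i)
    return sorted(others, key=lambda j: math.dist(points[i], points[j]))[:k]


def symmetric_outlier_scores(points, k=2):
    """Score each point; larger values suggest a stronger local outlier."""
    n = len(points)
    neighbors = [knn(points, i, k) for i in range(n)]
    # Assumed density estimate: inverse distance to the k-th nearest neighbour.
    density = [
        1.0 / math.dist(points[i], points[neighbors[i][-1]]) for i in range(n)
    ]
    scores = []
    for i in range(n):
        # Reverse neighbours: points that count i among their own k nearest.
        reverse = [j for j in range(n) if i in neighbors[j]]
        influence = set(neighbors[i]) | set(reverse)
        avg = sum(density[j] for j in influence) / len(influence)
        scores.append(avg / density[i])
    return scores
```

A point in a sparse region next to a dense cluster has few or no reverse neighbors, so its score is driven by the much higher density of the points it selects as neighbors.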