Showing papers by "James C. Bezdek" published in 2017

Proceedings Article

Fast Memory Efficient Local Outlier Detection in Data Streams (Extended Abstract)

Mahsa Salehi¹, Christopher Leckie², James C. Bezdek², Tharshan Vaithianathan², Xuyun Zhang³•Institutions (3)

IBM¹, University of Melbourne², University of Auckland³

01 Apr 2017

TL;DR: The authors propose a memory efficient incremental local outlier detection algorithm (MiLOF) for data streams and a more flexible version (MiLOF_F). Both achieve accuracy close to iLOF within a fixed memory bound.

Abstract: Outlier detection is an important task in data mining. With the growing need to analyze high-speed data streams, the task of outlier detection becomes even more challenging, as traditional outlier detection techniques can no longer assume that all the data can be stored for processing. While the well-known Local Outlier Factor (LOF) algorithm has an incremental version (called iLOF), it assumes unbounded memory to keep all previous data points. In this paper, we propose a memory efficient incremental local outlier (MiLOF) detection algorithm for data streams, and a more flexible version (MiLOF_F). Both have an accuracy close to iLOF but within a fixed memory bound. In addition, MiLOF_F is robust to changes in the number of data points, underlying clusters and dimensions in the data stream.
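As background for the abstract above, here is a minimal sketch of the batch LOF score that iLOF and MiLOF build on. It is not the MiLOF algorithm: it keeps every point in memory, which is exactly the assumption the paper sets out to remove. The function names, the toy data and the choice k = 2 are illustrative assumptions, and ties in neighbour distance are broken by index rather than handled as in the full LOF definition.

```python
import math

def knn(points, i, k):
    """Return (distance, index) pairs for the k nearest neighbours of point i."""
    dists = sorted((math.dist(points[i], q), j)
                   for j, q in enumerate(points) if j != i)
    return dists[:k]

def lof_scores(points, k):
    """Batch Local Outlier Factor for every point (stores all data)."""
    n = len(points)
    neigh = [knn(points, i, k) for i in range(n)]
    k_dist = [nb[-1][0] for nb in neigh]          # distance to the k-th neighbour
    lrd = []                                      # local reachability density
    for i in range(n):
        # reachability distance of i from neighbour j: max(k_dist(j), d(i, j))
        reach = [max(k_dist[j], d) for d, j in neigh[i]]
        lrd.append(len(reach) / sum(reach))
    # LOF = mean density of the neighbours divided by the point's own density
    return [sum(lrd[j] for _, j in neigh[i]) / (len(neigh[i]) * lrd[i])
            for i in range(n)]

# Toy data (hypothetical): five clustered points and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (8, 8)]
scores = lof_scores(points, k=2)
print(max(range(len(points)), key=scores.__getitem__))  # index of the top outlier
```

Scores near 1 indicate a point about as dense as its neighbours; values well above 1 indicate a local outlier.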

55 citations

Journal Article

A visual-numeric approach to clustering and anomaly detection for trajectory data

Dheeraj Kumar¹, James C. Bezdek¹, Sutharshan Rajasegarar¹, Christopher Leckie¹, Marimuthu Palaniswami¹•Institutions (1)

University of Melbourne¹

01 Mar 2017-The Visual Computer

TL;DR: It is shown that the novel two-stage clusiVAT approach can produce natural and informative trajectory clusters on this real-life dataset while finding representative anomalies.

Abstract: This paper proposes a novel application of Visual Assessment of Tendency (VAT)-based hierarchical clustering algorithms (VAT, iVAT, and clusiVAT) for trajectory analysis. We introduce a new clustering-based anomaly detection framework named iVAT+ and clusiVAT+ and use it for trajectory anomaly detection. This approach is based on partitioning the VAT-generated Minimum Spanning Tree based on an efficient thresholding scheme. The trajectories are classified as normal or anomalous based on the number of paths in the clusters. On synthetic datasets with fixed and variable numbers of clusters and anomalies, we achieve 98% classification accuracy. Our two-stage clusiVAT method is applied to 26,039 trajectories of vehicles and pedestrians from a parking lot scene from the real-life MIT trajectories dataset. The first stage clusters the trajectories ignoring directionality. The second stage divides the clusters obtained from the first stage by considering trajectory direction. We show that our novel two-stage clusiVAT approach can produce natural and informative trajectory clusters on this real-life dataset while finding representative anomalies.
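The idea of cutting a minimum spanning tree and labelling small clusters as anomalous can be sketched as follows. This is a generic illustration, not the iVAT+/clusiVAT+ procedure: the abstract does not specify the thresholding scheme or the trajectory distance, so the fixed `threshold`, the `min_size` cutoff, and the use of 2-D points with Euclidean distance in place of trajectories are all assumptions.

```python
import math

def prim_mst(dist):
    """Return MST edges (weight, u, v) of a complete graph given a distance matrix."""
    n = len(dist)
    in_tree = [False] * n
    in_tree[0] = True
    best = [(dist[0][j], 0) for j in range(n)]    # cheapest link from the tree to j
    edges = []
    for _ in range(n - 1):
        j = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v][0])
        w, u = best[j]
        edges.append((w, u, j))
        in_tree[j] = True
        for v in range(n):
            if not in_tree[v] and dist[j][v] < best[v][0]:
                best[v] = (dist[j][v], j)
    return edges

def cut_and_label(dist, threshold, min_size):
    """Drop MST edges longer than threshold; flag members of small components."""
    n = len(dist)
    parent = list(range(n))

    def find(x):                                  # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for w, u, v in prim_mst(dist):
        if w <= threshold:                        # keep only short edges
            parent[find(u)] = find(v)
    roots = [find(i) for i in range(n)]
    sizes = {r: roots.count(r) for r in set(roots)}
    return roots, [sizes[r] < min_size for r in roots]

# Toy stand-in for trajectories (hypothetical): two groups and one stray point.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5), (20, 20)]
dist = [[math.dist(p, q) for q in points] for p in points]
roots, flags = cut_and_label(dist, threshold=2.0, min_size=2)
print(flags)
```

Each `True` in `flags` marks an item whose cluster, after cutting, has fewer than `min_size` members.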

54 citations

Journal Article

Extending Information-Theoretic Validity Indices for Fuzzy Clustering

Yang Lei¹, James C. Bezdek¹, Jeffrey Chan¹, Nguyen Xuan Vinh¹, Simone Romano¹, James Bailey¹•Institutions (1)

University of Melbourne¹

01 Aug 2017-IEEE Transactions on Fuzzy Systems

TL;DR: A normalized version of the soft mutual information cluster validity index (NMIsM) is advocated as the best overall choice, as it outperforms the other seven indices for both FCM and EM according to tests on synthetic and real data.

Abstract: Previously, eight popular information-theoretic-based cluster validity indices have been generalized and tested for probabilistic partitions built by the expectation-maximization (EM) algorithm for the Gaussian mixture model. However, the analysis was limited to probabilistic clusters, and there were limited explanations for differences in the performance of the indices. In this paper, we extend the tests to partitions found by fuzzy c-means (FCM) and provide further explanations and insights about the performance of these indices. Of the eight generalized indices, we advocate a normalized version of the soft mutual information cluster validity index (NMIsM) as the best overall choice, as it outperforms the other seven indices for both FCM and EM according to our tests on synthetic and real data. The superiority of NMIsM is most pronounced for datasets with overlapped and/or varying-sized clusters. Finally, we provide a theoretical analysis, which helps explain the superior performance of NMIsM compared with the other three normalizations of soft mutual information.
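A normalized soft mutual information between two soft partitions might be computed along the following lines. This is a sketch under stated assumptions, not the paper's definition: the abstract does not give the formula, so the soft contingency matrix N = U Vᵀ and the normalization by the larger of the two marginal entropies are assumptions, as are all function names.

```python
import math

def soft_contingency(U, V):
    """N = U V^T for soft partitions given as lists of membership rows (one row per cluster)."""
    return [[sum(u * v for u, v in zip(urow, vrow)) for vrow in V] for urow in U]

def entropy(probs):
    """Shannon entropy in nats, skipping zero-probability terms."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def soft_nmi_max(U, V):
    """Mutual information of the soft contingency table, divided by max(H(U), H(V))."""
    N = soft_contingency(U, V)
    total = sum(map(sum, N))
    P = [[x / total for x in row] for row in N]   # joint distribution
    pu = [sum(row) for row in P]                  # row marginals
    pv = [sum(col) for col in zip(*P)]            # column marginals
    mi = sum(p * math.log(p / (pu[i] * pv[j]))
             for i, row in enumerate(P) for j, p in enumerate(row) if p > 0)
    denom = max(entropy(pu), entropy(pv))
    return mi / denom if denom > 0 else 0.0

# Hypothetical partitions of four objects into two clusters.
crisp = [[1, 1, 0, 0], [0, 0, 1, 1]]
fuzzy = [[0.9, 0.8, 0.2, 0.1], [0.1, 0.2, 0.8, 0.9]]
unrelated = [[1, 0, 1, 0], [0, 1, 0, 1]]
print(soft_nmi_max(crisp, crisp), soft_nmi_max(crisp, fuzzy), soft_nmi_max(crisp, unrelated))
```

With these inputs the index is highest for identical partitions, lowest for the unrelated one, and in between for the fuzzy approximation of the crisp partition.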

35 citations

Journal Article

Ground truth bias in external cluster validity indices

Yang Lei¹, James C. Bezdek¹, Simone Romano¹, Nguyen Xuan Vinh¹, Jeffrey Chan², James Bailey¹•Institutions (2)

University of Melbourne¹, RMIT University²

01 May 2017-Pattern Recognition

TL;DR: This paper identifies a new type of bias in external cluster validity indices, arising from the distribution of the ground truth (reference) partition against which candidate partitions are compared, and provides theoretical explanations of why and when this ground truth (GT) bias happens.

31 citations

Journal Article

Exponentially Weighted Ellipsoidal Model for Anomaly Detection

Masud Moshtaghi¹, Sarah M. Erfani¹, Christopher Leckie¹, James C. Bezdek¹•Institutions (1)

University of Melbourne¹

01 Sep 2017-International Journal of Intelligent Systems

TL;DR: This paper generalizes an online efficient anomaly detection technique called iterative data capture anomaly detection to adapt to changes in the data stream by exponentially weighting past observations and illustrates the efficiency and accuracy of the approach compared to existing methods.

Abstract: Efficient localized data modeling techniques in Internet of Things (IoT) applications enable the nodes to change their behavior upon observing events of interest. Additionally, battery-powered IoT nodes can conserve their energy resources by limiting their data communications to specific events. Despite the real-time nature of the data collected in the IoT and limited memory and computational resources, most of the current data modeling approaches for the IoT involve batch training. Recently, an online efficient anomaly detection technique called iterative data capture anomaly detection has been proposed for environmental sensing and monitoring applications. However, this approach cannot handle changing environments. So far, efforts in extending this algorithm to adapt to changes in the environment have met with limited success. In this paper, we generalize this algorithm to adapt to changes in the data stream by exponentially weighting past observations. We illustrate the proposed algorithm with numerical results on both real-life and simulated data sets, which demonstrate the efficiency and accuracy of our approach compared to existing methods.
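To illustrate what exponentially weighting past observations can look like in an ellipsoidal anomaly detector, the sketch below keeps an exponentially weighted mean and covariance in two dimensions and scores points by squared Mahalanobis distance. These are generic textbook updates, not the update equations of the paper, which the abstract does not give. The class name, the forgetting factor `lam`, the identity initial covariance and the 9.21 cutoff (the 0.99 quantile of a chi-square distribution with two degrees of freedom) are assumptions.

```python
import random

class EWEllipsoid:
    """Exponentially weighted 2-D mean/covariance model (illustrative only)."""

    def __init__(self, mean, lam=0.95):
        self.lam = lam                       # forgetting factor, 0 < lam < 1
        self.mean = list(mean)
        self.cov = [[1.0, 0.0], [0.0, 1.0]]  # assumed initial covariance

    def mahalanobis_sq(self, x):
        dx, dy = x[0] - self.mean[0], x[1] - self.mean[1]
        (a, b), (c, d) = self.cov
        det = a * d - b * c
        # d^T S^{-1} d using the closed-form inverse of a 2x2 matrix
        return (dx * (d * dx - b * dy) + dy * (-c * dx + a * dy)) / det

    def update(self, x):
        lam = self.lam
        dx, dy = x[0] - self.mean[0], x[1] - self.mean[1]
        outer = [[dx * dx, dx * dy], [dy * dx, dy * dy]]
        # Old statistics decay by lam; the new observation gets weight 1 - lam.
        self.cov = [[lam * self.cov[i][j] + (1 - lam) * outer[i][j]
                     for j in range(2)] for i in range(2)]
        self.mean = [lam * m + (1 - lam) * xi for m, xi in zip(self.mean, x)]

random.seed(0)
model = EWEllipsoid(mean=(0.0, 0.0))
# The stream is centred at (10, 10), far from the initial mean; the model adapts.
for _ in range(500):
    model.update((random.gauss(10, 1), random.gauss(10, 1)))

THRESHOLD = 9.21  # assumed cutoff on the squared Mahalanobis distance
print(model.mahalanobis_sq((30.0, -10.0)) > THRESHOLD)
```

A smaller `lam` forgets old data faster and tracks change more quickly, at the cost of noisier estimates.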

7 citations

Proceedings Article

Fuzzy c-Shape: A new algorithm for clustering finite time series waveforms

Fateme Fahiman¹, James C. Bezdek¹, Sarah M. Erfani¹, Marimuthu Palaniswami¹, Christopher Leckie¹•Institutions (1)

University of Melbourne¹

01 Jul 2017

TL;DR: Two new fuzzy c-means derivatives for clustering time series data are proposed: Fuzzy c-Shapes plus (FCS+) and Fuzzy c-Shapes double plus (FCS++).

Abstract: The existence of large volumes of time series data in many applications has motivated data miners to investigate specialized methods for mining time series data. Clustering is a popular data mining method due to its powerful exploratory nature and its usefulness as a preprocessing step for other data mining techniques. This article develops two novel clustering algorithms for time series data that are extensions of a crisp c-shapes algorithm. The two new algorithms are heuristic derivatives of fuzzy c-means (FCM). Fuzzy c-Shapes plus (FCS+) replaces the inner product norm in the FCM model with a shape-based distance function. Fuzzy c-Shapes double plus (FCS++) uses the shape-based distance, and also replaces the FCM cluster centers with shape-extracted prototypes. Numerical experiments on 48 real time series data sets show that the two new algorithms outperform state-of-the-art shape-based clustering algorithms in terms of accuracy and efficiency. Four external cluster validity indices (the Rand index, Adjusted Rand Index, Variation of Information, and Normalized Mutual Information) are used to match candidate partitions generated by each of the studied algorithms. All four indices agree that for these finite waveform data sets, FCS++ gives a small improvement over FCS+, and in turn, FCS+ is better than the original crisp c-shapes method. Finally, we apply two tests of statistical significance to the three algorithms. The Wilcoxon and Friedman statistics both rank the three algorithms in exactly the same way as the four cluster validity indices.
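The abstract does not define its shape-based distance. As a hedged illustration, the sketch below implements the cross-correlation-based shape-based distance (SBD) used by the k-Shape algorithm, which is an assumption about the kind of distance meant here. It handles equal-length sequences only, and the function name and example waveforms are hypothetical.

```python
import math

def sbd(x, y):
    """Shape-based distance: 1 - max over shifts of normalized cross-correlation."""
    n = len(x)
    norm = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    if norm == 0:
        return 1.0                     # convention chosen here for all-zero input
    best = max(
        # cross-correlation at shift s, summing over the overlapping indices only
        sum(x[i] * y[i - s] for i in range(max(0, s), min(n, n + s)))
        for s in range(-(n - 1), n)
    )
    return 1.0 - best / norm

bump = [0, 0, 1, 2, 1, 0, 0, 0]
shifted = [0, 0, 0, 0, 1, 2, 1, 0]     # same shape, two samples later
zigzag = [1, -1, 1, -1, 1, -1, 1, -1]  # a different shape
print(sbd(bump, shifted), sbd(bump, zigzag))
```

Unlike Euclidean distance, this measure treats the shifted copy of the bump as essentially identical to the original, which is the property that makes it attractive for waveform clustering.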

5 citations

Posted Content

An ensemble-based online learning algorithm for streaming data.

Tien Thanh Nguyen, Thi Thu Thuy Nguyen, Xuan Cuong Pham, Alan Wee-Chung Liew, James C. Bezdek

26 Apr 2017-arXiv: Learning

TL;DR: An ensemble of base classifiers in this approach is obtained by learning Naive Bayes classifiers on different training sets which are generated by projecting the original training set to lower dimensional space.

Abstract: In this study, we introduce an ensemble-based approach for online machine learning. The ensemble of base classifiers in our approach is obtained by learning Naive Bayes classifiers on different training sets, which are generated by projecting the original training set to lower-dimensional spaces. We propose a mechanism to learn sequences of data using a data-chunk paradigm. The experiments conducted on a number of UCI datasets and one synthetic dataset demonstrate that the proposed approach performs significantly better than some well-known online learning algorithms.

4 citations