Showing papers by "Anthony J. Bagnall" published in 2005


Book ChapterDOI
18 May 2005
TL;DR: This work introduces a new technique based on a bit level approximation of the data that allows raw data to be directly compared to the reduced representation, while still guaranteeing lower bounds to Euclidean distance.
Abstract: Because time series are a ubiquitous and increasingly prevalent type of data, there has been much research effort devoted to time series data mining recently. As with all data mining problems, the key to effective and scalable algorithms is choosing the right representation of the data. Many high level representations of time series have been proposed for data mining. In this work, we introduce a new technique based on a bit level approximation of the data. The representation has several important advantages over existing techniques. One unique advantage is that it allows raw data to be directly compared to the reduced representation, while still guaranteeing lower bounds to Euclidean distance. This fact can be exploited to produce faster exact algorithms for similarity search. In addition, we demonstrate that our new representation allows time series clustering to scale to much larger datasets.

124 citations
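The core trick here, reducing each series to one bit per point while still retaining a lower bound on Euclidean distance, can be sketched in a few lines. The sketch below assumes z-normalised series clipped at zero and uses a simple threshold-gap bound; the exact representation and bound in the paper may differ, and the function names are illustrative only.

```python
import numpy as np

def clip(series, threshold=0.0):
    """Clip a series to bits: 1 where the value is above the threshold
    (the mean of a z-normalised series), 0 otherwise."""
    return (np.asarray(series, dtype=float) > threshold).astype(np.uint8)

def lower_bound_euclid(query, clipped_bits, threshold=0.0):
    """Lower bound on the Euclidean distance between a raw query and a series
    known only through its clipped form. Where the bit says the hidden value
    lies above the threshold but the query value lies below it (or vice
    versa), the pointwise difference is at least the gap between the query
    value and the threshold; elsewhere it may be zero."""
    q = np.asarray(query, dtype=float)
    above = clipped_bits.astype(bool)
    gap = np.zeros_like(q)
    gap[above] = np.maximum(threshold - q[above], 0.0)
    gap[~above] = np.maximum(q[~above] - threshold, 0.0)
    return float(np.sqrt(np.sum(gap ** 2)))

# The bound never exceeds the true Euclidean distance.
rng = np.random.default_rng(0)
x, y = rng.standard_normal(128), rng.standard_normal(128)
assert lower_bound_euclid(x, clip(y)) <= np.linalg.norm(x - y)
```

Because the bound is computed against the bit-level form only, candidates whose bound already exceeds the best distance found so far can be discarded without touching the raw data, which is where the speed-up in exact similarity search comes from.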


Journal ArticleDOI
TL;DR: An agent-based computational economics approach for studying the effect of alternative structures and mechanisms on behavior in electricity markets is described, and the potential benefit of an evolutionary economics approach to market modeling is demonstrated.
Abstract: The deregulation of electricity markets has continued apace around the globe. The best structure for deregulated markets is a subject of much debate, and the consequences of poor structural choices can be dramatic. Understanding the effect of structure on behavior is essential, but the traditional economics approaches of field studies and experimental studies are particularly hard to conduct in relation to electricity markets. This paper describes an agent based computational economics approach for studying the effect of alternative structures and mechanisms on behavior in electricity markets. Autonomous adaptive agents, using hierarchical learning classifier systems, learn through competition in a simulated model of the UK market in electricity generation. The complex agent structure was developed through a sequence of experimentation to test whether it was capable of meeting the following requirements: first, that the agents are able to learn optimal strategies when competing against nonadaptive agents; second, that the agents are able to learn strategies observable in the real world when competing against other adaptive agents; and third, that cooperation without explicit communication can evolve in certain market situations. The potential benefit of an evolutionary economics approach to market modeling is demonstrated by examining the effects of alternative payment mechanisms on the behavior of agents.

98 citations
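The "alternative payment mechanisms" examined in the paper can be illustrated with a toy one-shot generation auction that settles the same bids under uniform (system marginal price) and pay-as-bid pricing. This is purely illustrative: the generators, bids and clearing rule below are hypothetical and are not the paper's UK market simulator or its learning-classifier-system agents.

```python
# Illustrative only: a toy one-shot generation auction showing how two common
# payment mechanisms settle the same set of bids.

def clear_market(bids, demand):
    """bids: list of (generator, price, capacity); demand in MWh.
    Dispatch the cheapest bids first and return the accepted quantities
    plus the marginal (last accepted) bid price."""
    accepted, remaining, marginal_price = [], demand, 0.0
    for generator, price, capacity in sorted(bids, key=lambda b: b[1]):
        if remaining <= 0:
            break
        take = min(capacity, remaining)
        accepted.append((generator, price, take))
        marginal_price = price
        remaining -= take
    return accepted, marginal_price

def settle(accepted, marginal_price, mechanism):
    """Uniform pricing pays every accepted bid the marginal price;
    pay-as-bid pays each generator its own bid."""
    if mechanism == "uniform":
        return {g: marginal_price * q for g, _, q in accepted}
    return {g: p * q for g, p, q in accepted}

bids = [("A", 18.0, 400), ("B", 22.0, 300), ("C", 35.0, 500)]
accepted, smp = clear_market(bids, demand=600)
print(settle(accepted, smp, "uniform"))     # all accepted bids paid 22.0/MWh
print(settle(accepted, smp, "pay_as_bid"))  # each generator paid its own bid
```

Adaptive agents that learn bidding strategies face different incentives under the two settlement rules, which is the kind of effect the paper studies with its learning classifier system agents.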


Journal ArticleDOI
TL;DR: It is shown that the simple procedure of clipping the time series (discretising to above or below the median) reduces memory requirements and significantly speeds up clustering without decreasing clustering accuracy.
Abstract: Clustering time series is a problem that has applications in a wide variety of fields, and has recently attracted a large amount of research. Time series data are often large and may contain outliers. We show that the simple procedure of clipping the time series (discretising to above or below the median) reduces memory requirements and significantly speeds up clustering without decreasing clustering accuracy. We also demonstrate that clipping increases clustering accuracy when there are outliers in the data, thus serving as a means of outlier detection and a method of identifying model misspecification. We consider simulated data from polynomial, autoregressive moving average and hidden Markov models and show that the estimated parameters of the clipped data used in clustering tend, asymptotically, to those of the unclipped data. We also demonstrate experimentally that, if the series are long enough, the accuracy on clipped data is not significantly less than the accuracy on unclipped data, and if the series contain outliers then clipping results in significantly better clusterings. We then illustrate how using clipped series can be of practical benefit in detecting model misspecification and outliers on two real world data sets: an electricity generation bid data set and an ECG data set.

77 citations
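A minimal sketch of the clipping idea: discretise each series to one bit per point at its median, pack the bits to show the memory saving, and cluster the clipped vectors directly. The data generator and clustering choice below are illustrative only; the paper's experiments use polynomial, ARMA and hidden Markov model data and report accuracy comparisons that this sketch does not reproduce.

```python
import numpy as np
from sklearn.cluster import KMeans

def clip_to_bits(series):
    """Discretise a series to 1 where it is above its median, 0 otherwise."""
    series = np.asarray(series, dtype=float)
    return (series > np.median(series)).astype(np.uint8)

# Toy data: two groups of noisy sinusoids at different frequencies.
rng = np.random.default_rng(1)
t = np.linspace(0, 4 * np.pi, 256)
group_a = [np.sin(t) + rng.standard_normal(t.size) for _ in range(20)]
group_b = [np.sin(3 * t) + rng.standard_normal(t.size) for _ in range(20)]
raw = np.array(group_a + group_b)

clipped = np.array([clip_to_bits(s) for s in raw])

# The clipped series can be stored one bit per point instead of 8 bytes.
packed = np.packbits(clipped, axis=1)
print(raw.nbytes, "bytes raw vs", packed.nbytes, "bytes packed")

# Cluster the clipped series directly (k-means on the 0/1 vectors).
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(clipped)
print(labels)
```

Clipping also blunts the influence of extreme values, since an outlier contributes the same single bit as any other point above the median, which is why it doubles as a form of robustness to outliers.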


Book ChapterDOI
01 Jan 2005
TL;DR: A maze is a grid-like, two-dimensional area of any size, usually rectangular, in which the goal is to learn a policy to reach food as fast as possible from any square.
Abstract: A maze is a grid-like two-dimensional area of any size, usually rectangular. A maze consists of cells. A cell is an elementary maze item, a formally bounded space, interpreted as a single site. The maze may contain different obstacles in any quantity. Some may be significant for learning purposes, like virtual food. The agent is randomly placed in the maze on an empty cell. The agent is allowed to move in all directions, but only through empty space. The task is to learn a policy to reach food as fast as possible from any square. Once the food is reached, the agent's position is reset to a random one and the task is repeated.

42 citations
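A minimal grid-maze environment of the kind described above is easy to sketch: obstacles, empty cells, a food cell, random agent placement, movement in all directions through empty space, and a reset once food is reached. The layout and conventions below are hypothetical, not taken from the chapter.

```python
import random

# '#' obstacle, '.' empty cell, 'F' food. The border is solid so moves
# never leave the grid. (Illustrative layout only.)
GRID = [
    "#######",
    "#..#..#",
    "#..#.F#",
    "#.....#",
    "#######",
]
MOVES = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]

def empty_cells(grid):
    return [(r, c) for r, row in enumerate(grid)
            for c, ch in enumerate(row) if ch == "."]

def step(grid, pos, move):
    """Apply a move; moves into obstacles leave the agent in place.
    Returns (new_position, reached_food)."""
    r, c = pos[0] + move[0], pos[1] + move[1]
    if grid[r][c] == "#":
        return pos, False
    return (r, c), grid[r][c] == "F"

# One episode under a random policy: wander until the food cell is reached.
pos = random.choice(empty_cells(GRID))
steps, reached = 0, False
while not reached:
    pos, reached = step(GRID, pos, random.choice(MOVES))
    steps += 1
print("food reached after", steps, "random moves")
```

A learning system such as a classifier system would replace the random move choice with a learned policy, aiming to minimise the number of steps to food from any starting cell.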


Book ChapterDOI
18 May 2005
TL;DR: This paper describes an alternative distance measure based on the likelihood ratio statistic to test the hypothesis of difference between series, and compares the new distance measure to Euclidean distance on five types of data with varying levels of compression.
Abstract: Fast Fourier Transforms (FFTs) have been a popular transformation and compression technique in time series data mining since first being proposed for use in this context in [1]. The Euclidean distance between coefficients has been the most commonly used distance metric with FFTs. However, on many problems it is not the best measure of similarity available. In this paper we describe an alternative distance measure based on the likelihood ratio statistic to test the hypothesis of difference between series. We compare the new distance measure to Euclidean distance on five types of data with varying levels of compression. We show that the likelihood ratio measure is better at discriminating between series from different models and grouping series from the same model.

32 citations
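One way to contrast the two measures is sketched below: Euclidean distance between truncated FFT coefficients versus a likelihood-ratio statistic for the hypothesis that two truncated periodograms share the same spectrum, treating each periodogram ordinate as approximately exponentially distributed. The exact statistic and compression scheme used in the paper may differ; this is an assumption-laden illustration, not the published measure.

```python
import numpy as np

def truncated_periodograms(x, y, n_coeffs=16):
    """Periodogram ordinates from the first few FFT coefficients
    (excluding the DC term)."""
    fx = np.fft.rfft(np.asarray(x, dtype=float))[1:n_coeffs + 1]
    fy = np.fft.rfft(np.asarray(y, dtype=float))[1:n_coeffs + 1]
    return np.abs(fx) ** 2, np.abs(fy) ** 2

def euclidean_fft_distance(x, y, n_coeffs=16):
    fx = np.fft.rfft(np.asarray(x, dtype=float))[1:n_coeffs + 1]
    fy = np.fft.rfft(np.asarray(y, dtype=float))[1:n_coeffs + 1]
    return float(np.sqrt(np.sum(np.abs(fx - fy) ** 2)))

def likelihood_ratio_distance(x, y, n_coeffs=16, eps=1e-12):
    """-2 log(likelihood ratio) for the hypothesis that both truncated
    periodograms come from the same spectrum, with each ordinate treated
    as approximately exponential (a standard approximation; the statistic
    in the paper may differ in detail)."""
    ix, iy = truncated_periodograms(x, y, n_coeffs)
    ix, iy = ix + eps, iy + eps
    pooled = (ix + iy) / 2.0
    return float(np.sum(2.0 * np.log(pooled ** 2 / (ix * iy))))

rng = np.random.default_rng(2)
t = np.arange(256)
a = np.sin(0.2 * t) + 0.5 * rng.standard_normal(t.size)
b = np.sin(0.2 * t) + 0.5 * rng.standard_normal(t.size)   # same model as a
c = np.sin(0.35 * t) + 0.5 * rng.standard_normal(t.size)  # different model
print(likelihood_ratio_distance(a, b), likelihood_ratio_distance(a, c))
print(euclidean_fft_distance(a, b), euclidean_fft_distance(a, c))
```

The likelihood-ratio statistic is zero only when the two truncated periodograms agree exactly and grows as their per-frequency power diverges, which is what makes it a model-discriminating distance rather than a pointwise one.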


Proceedings ArticleDOI
12 Dec 2005
TL;DR: The results show that AgentP often outperforms (and always at least matches) the performance of other techniques and, on the large majority of mazes used, learns optimal or near optimal solutions with fewer trials and a smaller classifier population.
Abstract: Learning classifier systems belong to the class of algorithms based on the principle of self-organization and evolution and have frequently been applied to mazes, an important type of reinforcement learning problem. Mazes may contain aliasing cells, i.e. squares in different locations that look identical to an agent with limited perceptive power. Mazes with aliasing squares present a particularly difficult learning problem. As a possible approach to the problem, AgentP, a learning classifier system with associative perception, was recently introduced. AgentP is based on the psychological model of associative perception learning and operates on explicitly imprinted images of the environment states. Two types of learning mode are described: the first, self-adjusting AgentP, is more flexible and adapts rapidly to changing information; the second, gradual AgentP, is more conservative in drawing conclusions and rigid when it comes to revising its strategy. The performance of both systems is tested on existing and new aliasing environments. The results show that AgentP often outperforms (and always at least matches) the performance of other techniques and, on the large majority of mazes used, learns optimal or near optimal solutions with fewer trials and a smaller classifier population.

10 citations
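The difficulty the abstract turns on, aliasing cells, is easy to demonstrate: distinct maze positions whose local perception (here, the eight surrounding cells) is identical, so a purely reactive policy cannot tell them apart. The sketch below only illustrates that problem on a made-up maze; it is not an implementation of AgentP.

```python
# Two distinct positions are "aliasing" if the agent's limited perception
# (the eight surrounding cells) is identical for both. Hypothetical maze.
MAZE = [
    "#########",
    "#...#...#",
    "#.#...#.#",
    "#...#...#",
    "#########",
]

def perception(maze, r, c):
    """The eight neighbouring cells, read clockwise from the top-left."""
    return "".join(maze[r + dr][c + dc]
                   for dr, dc in [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                                  (1, 1), (1, 0), (1, -1), (0, -1)])

cells = [(r, c) for r, row in enumerate(MAZE)
         for c, ch in enumerate(row) if ch == "."]
seen = {}
for cell in cells:
    seen.setdefault(perception(MAZE, *cell), []).append(cell)
aliasing = {p: locs for p, locs in seen.items() if len(locs) > 1}
print(aliasing)  # groups of distinct positions that look identical
```

An agent that conditions only on the current perception must act identically in every cell of such a group, even when the optimal moves differ, which is why aliasing mazes require some form of memory or imprinted context.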


23 Apr 2005
TL;DR: Two refinements of FASBIR are proposed and evaluated on several very large data sets and shown to be feasible and effective.
Abstract: Filtered Attribute Subspace based Bagging with Injected Randomness (FASBIR) is a recently proposed algorithm for ensembles of k-nn classifiers [28]. FASBIR works by first performing a global filtering of attributes using information gain, then randomising the bagged ensemble with random subsets of the remaining attributes and random distance metrics. In this paper we propose two refinements of FASBIR and evaluate them on several very large data sets.

1 citation
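Read from the abstract, the FASBIR pipeline can be sketched as: filter attributes globally by information gain, then build a bagged ensemble of k-NN classifiers, each trained on a bootstrap sample, a random subset of the retained attributes and a randomly chosen distance metric. The code below is that reading, not the authors' implementation; it uses mutual information as a stand-in for information gain and omits the two refinements the paper proposes.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.neighbors import KNeighborsClassifier

class FasbirLikeEnsemble:
    """Sketch of a FASBIR-style ensemble: global attribute filtering,
    then bagging with injected randomness (attribute subsets and
    Minkowski metrics). Hypothetical class, for illustration only."""

    def __init__(self, n_members=25, keep_fraction=0.5, k=5, random_state=0):
        self.n_members = n_members
        self.keep_fraction = keep_fraction
        self.k = k
        self.rng = np.random.default_rng(random_state)

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        # Global filtering: keep the most informative attributes.
        scores = mutual_info_classif(X, y, random_state=0)
        n_keep = max(1, int(self.keep_fraction * X.shape[1]))
        self.kept_ = np.argsort(scores)[::-1][:n_keep]
        self.members_ = []
        for _ in range(self.n_members):
            rows = self.rng.integers(0, len(y), len(y))           # bootstrap sample
            cols = self.rng.choice(self.kept_, size=max(1, n_keep // 2),
                                   replace=False)                 # random attribute subset
            p = self.rng.choice([1, 2, 3])                        # random Minkowski metric
            knn = KNeighborsClassifier(n_neighbors=self.k, p=p)
            knn.fit(X[rows][:, cols], y[rows])
            self.members_.append((cols, knn))
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        votes = np.array([knn.predict(X[:, cols]) for cols, knn in self.members_])
        # Majority vote across ensemble members.
        return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)

# Usage example on a small public dataset.
from sklearn.datasets import load_iris
X, y = load_iris(return_X_y=True)
model = FasbirLikeEnsemble().fit(X[::2], y[::2])
print((model.predict(X[1::2]) == y[1::2]).mean())  # hold-out accuracy
```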