
Showing papers by James Bailey published in 2009


Proceedings ArticleDOI
14 Jun 2009
TL;DR: This paper derives the analytical formula for the expected mutual information value between a pair of clusterings, and proposes the adjusted version for several popular information theoretic based measures.
Abstract: Information theoretic based measures form a fundamental class of similarity measures for comparing clusterings, besides the classes of pair-counting based and set-matching based measures. In this paper, we discuss the necessity of correction for chance for information theoretic based measures for clustering comparison. We observe that the baseline for such measures, i.e. the average value between random partitions of a data set, does not take on a constant value, and tends to have larger variation when the ratio between the number of data points and the number of clusters is small. This effect is similar in some other non-information theoretic based measures such as the well-known Rand Index. Assuming a hypergeometric model of randomness, we derive the analytical formula for the expected mutual information value between a pair of clusterings, and then propose the adjusted version for several popular information theoretic based measures. Some examples are given to demonstrate the need and usefulness of the adjusted measures.
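As an editorial illustration (not taken from the paper itself): scikit-learn's adjusted_mutual_info_score implements this style of chance-corrected comparison, subtracting the expected mutual information under a hypergeometric null model before normalising. A minimal sketch, assuming scikit-learn is available:

```python
# Minimal sketch: compare two labelings with and without correction for chance.
from sklearn.metrics import mutual_info_score, adjusted_mutual_info_score

labels_a = [0, 0, 1, 1, 2, 2]   # hypothetical clustering of six points
labels_b = [0, 0, 1, 2, 2, 2]   # a second clustering of the same points

print("MI :", mutual_info_score(labels_a, labels_b))
print("AMI:", adjusted_mutual_info_score(labels_a, labels_b))
# AMI is close to 0 for independent random partitions and 1 for identical
# partitions, so scores stay comparable across different numbers of clusters.
```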

748 citations


Journal ArticleDOI
TL;DR: The challenges and opportunities of feedback control using nonnegative and compartmental system theory for the specific problem of closed-loop control of intensive care unit sedation are discussed.

33 citations


Proceedings Article
01 Dec 2009
TL;DR: This paper addresses the novel problem of querying evolving graphs using spatio-temporal patterns by answering selection queries, which can discover evolving subgraphs that satisfy both a temporal and a spatial predicate.
Abstract: An evolving graph is a graph that can change over time. Such graphs can be used to model a wide range of real-world phenomena, like computer networks, social networks and protein interaction networks. This paper addresses the novel problem of querying evolving graphs using spatio-temporal patterns. In particular, we focus on answering selection queries, which can discover evolving subgraphs that satisfy both a temporal and a spatial predicate. We investigate the efficient implementation of such queries and experimentally evaluate our techniques using real-world evolving graph datasets: Internet connectivity logs and the Enron email corpus. We show that it is possible to use queries to discover meaningful events hidden in this data and demonstrate that our implementation is scalable for very large evolving graphs.
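For illustration only (the paper's query engine is not shown here), a naive selection query over an evolving graph stored as time-stamped snapshots might combine a temporal window with a degree-based structural predicate; the function and threshold names below are hypothetical:

```python
# Naive sketch of a selection query over an evolving graph, assuming the graph
# is given as a dict of time -> networkx snapshot. Not the paper's method.
import networkx as nx

def selection_query(snapshots, t_start, t_end, min_degree):
    """Return (time, node) pairs satisfying both predicates."""
    hits = []
    for t, g in snapshots.items():
        if t_start <= t <= t_end:                 # temporal predicate
            for node, deg in g.degree():
                if deg >= min_degree:             # spatial (structural) predicate
                    hits.append((t, node))
    return hits

snapshots = {0: nx.path_graph(4), 1: nx.star_graph(4), 2: nx.cycle_graph(4)}
print(selection_query(snapshots, 0, 1, min_degree=3))   # [(1, 0)]
```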

16 citations


Journal ArticleDOI
TL;DR: It is shown that it is possible to obtain superior classification accuracy with this approach and obtain a compact gene set that is also biologically relevant and has good coverage of different biological processes.
Abstract: Microarray gene expression profiling has provided extensive datasets that can describe characteristics of cancer patients. An important challenge for this type of data is the discovery of gene sets which can be used as the basis of developing a clinical predictor for cancer. It is desirable that such gene sets be compact, give accurate predictions across many classifiers, be biologically relevant and have good biological process coverage. By using a new type of multiple classifier voting approach, we have identified gene sets that can predict breast cancer prognosis accurately, for a range of classification algorithms. Unlike a wrapper approach, our method is not specialised towards a single classification technique. Experimental analysis demonstrates higher prediction accuracies for our sets of genes compared to previous work in the area. Moreover, our sets of genes are generally more compact than those previously proposed. From a biological viewpoint, most of the genes in our sets are known from the literature to be strongly related to cancer. We show that it is possible to obtain superior classification accuracy with our approach and obtain a compact gene set that is also biologically relevant and has good coverage of different biological processes.
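As a rough editorial sketch of the general idea (not the authors' exact procedure): genes that several different selection criteria agree on can be combined into one compact set, so the result is not tailored to a single classifier. The criteria and thresholds below are illustrative only.

```python
# Hedged sketch: "vote" for genes across several selection criteria and keep
# genes supported by a majority of them.
import numpy as np
from sklearn.feature_selection import f_classif, mutual_info_classif
from sklearn.linear_model import LogisticRegression

def voted_gene_set(X, y, top_k=20):
    votes = np.zeros(X.shape[1])
    votes[np.argsort(f_classif(X, y)[0])[-top_k:]] += 1          # ANOVA F-score
    votes[np.argsort(mutual_info_classif(X, y))[-top_k:]] += 1   # mutual information
    w = np.abs(LogisticRegression(max_iter=1000).fit(X, y).coef_).ravel()
    votes[np.argsort(w)[-top_k:]] += 1                           # linear-model weights
    return np.where(votes >= 2)[0]    # indices of genes backed by at least 2 criteria
```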

15 citations


Proceedings Article
01 Dec 2009
TL;DR: This paper investigates the suitability of using a new feature weighting scheme for SVM kernel functions, based on receiver operating characteristics (ROC), and demonstrates that it can significantly and substantially boost classification performance, across a range of datasets.
Abstract: Support Vector Machines (SVMs) are a leading tool in classification and pattern recognition and the kernel function is one of its most important components. This function is used to map the input space into a high dimensional feature space. However, it can perform rather poorly when there are too many dimensions (e.g. for gene expression data) or when there is a lot of noise. In this paper, we investigate the suitability of using a new feature weighting scheme for SVM kernel functions, based on receiver operating characteristics (ROC). This strategy is clean, simple and surprisingly effective. We experimentally demonstrate that it can significantly and substantially boost classification performance, across a range of datasets.
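A minimal sketch of the idea, assuming two-class data and scikit-learn (the weighting rule below is an editorial simplification, not necessarily the paper's exact scheme): weight each feature by its stand-alone ROC performance, then train an RBF-kernel SVM on the rescaled features.

```python
# Hedged sketch: per-feature ROC weighting before an RBF-kernel SVM.
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.svm import SVC

def roc_weights(X, y):
    # |AUC - 0.5| is 0 for an uninformative feature, 0.5 for a perfect one
    return np.array([abs(roc_auc_score(y, X[:, j]) - 0.5) for j in range(X.shape[1])])

def fit_roc_weighted_svm(X, y):
    w = roc_weights(X, y)
    clf = SVC(kernel="rbf", gamma="scale").fit(X * w, y)
    return clf, w

# At prediction time the same weights must be reused: clf.predict(X_new * w)
```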

13 citations


Book ChapterDOI
06 Oct 2009
TL;DR: This paper describes an approach to improving the robustness of an agent system by augmenting its failure-handling capabilities, based on the concept of semantic compensation: "cleaning up" failed or canceled tasks can help agents behave more robustly and predictably at both an individual and system level.
Abstract: This paper describes an approach to improving the robustness of an agent system by augmenting its failure-handling capabilities. The approach is based on the concept of semantic compensation: "cleaning up" failed or canceled tasks can help agents behave more robustly and predictably at both an individual and system level. Our approach is goal-based, both with respect to defining failure-handling knowledge, and in specifying a failure-handling model that makes use of this knowledge. Because failure handling is abstracted above the level of specific actions or task implementations, the approach is not tied to specific agent architectures or task plans and is more widely applicable. The failure-handling knowledge is employed via a failure-handling support component associated with each agent through a goal-based interface. The use of this component decouples the specification and use of failure-handling information from the specification of the agent's domain problem-solving knowledge, and reduces the failure-handling information that an agent developer needs to provide.
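Purely as an illustration of the pairing between a goal and its semantic compensation (architecture-agnostic, not the paper's support component); all names here are hypothetical:

```python
# Illustrative sketch: each goal carries a compensation that "cleans up"
# whenever the goal fails or is canceled.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Goal:
    name: str
    achieve: Callable[[], bool]       # domain problem-solving; True on success
    compensate: Callable[[], None]    # declarative clean-up for this goal

def pursue(goal: Goal) -> bool:
    try:
        if goal.achieve():
            return True
    except Exception:
        pass
    goal.compensate()                 # failure or cancellation: restore a sane state
    return False
```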

13 citations


Journal ArticleDOI
TL;DR: In this paper, a direct adaptive disturbance rejection control framework for compartmental dynamical systems with exogenous bounded disturbances is proposed, which guarantees partial asymptotic stability with respect to part of the closed-loop system states associated with the plant dynamics.
Abstract: Compartmental system models involve dynamic states whose values are nonnegative. These models are widespread in biological and physiological sciences and play a key role in understanding these processes. In this paper, we develop a direct adaptive disturbance rejection control framework for compartmental dynamical systems with exogenous bounded disturbances. The proposed framework is Lyapunov based and guarantees partial asymptotic stability of the closed-loop system, that is, asymptotic stability with respect to part of the closed-loop system states associated with the plant dynamics. The remainder of the states associated with the adaptive controller gains are shown to be Lyapunov stable. In the case of bounded energy ℒ2 disturbances, the proposed approach guarantees a nonexpansivity constraint on the closed-loop input–output map between the plant disturbances and performance variables. Finally, a numerical example involving the infusion of the anesthetic drug propofol for maintaining a desired constant level of depth of anesthesia for surgery in the face of continuing hemorrhage and hemodilution is provided.
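A toy simulation sketch, for orientation only: a two-compartment model driven by a nonnegative infusion and a bounded sinusoidal disturbance, with a crude adaptive gain. The rates, update law and set-point are invented for illustration and are not the Lyapunov-based design of the paper.

```python
# Toy sketch (hypothetical parameters, not the paper's controller).
import numpy as np

def simulate(T=200.0, dt=0.01, x_ref=1.0):
    k12, k21, k10 = 0.3, 0.2, 0.1        # invented transfer/elimination rates
    x = np.zeros(2)                       # nonnegative compartment amounts
    k_hat, gamma = 0.5, 0.05              # adaptive gain and adaptation rate
    for step in range(int(T / dt)):
        t = step * dt
        d = 0.1 * np.sin(0.5 * t)         # exogenous bounded disturbance
        e = x[0] - x_ref
        u = max(0.0, -k_hat * e)          # infusion rate must stay nonnegative
        k_hat += gamma * e * e * dt       # crude gradient-like gain update
        dx = np.array([-(k12 + k10) * x[0] + k21 * x[1] + u + d,
                       k12 * x[0] - k21 * x[1]])
        x = np.maximum(x + dt * dx, 0.0)  # keep the state in the nonnegative orthant
    return x, k_hat
```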

13 citations


Book ChapterDOI
19 Apr 2009
TL;DR: This paper investigates whether expressive contrasts are beneficial for classification by adopting a statistical methodology for eliminating noisy patterns and identifying circumstances where expressive patterns can improve over previous contrast pattern based classifiers.
Abstract: Classification is an important task in data mining. Contrast patterns, such as emerging patterns, have been shown to be powerful for building classifiers, but they rarely exist in sparse data. Recently proposed disjunctive emerging patterns are highly expressive, and can potentially overcome this limitation. Simple contrast patterns allow only conjunctions, whereas disjunctive patterns additionally allow expressions of disjunctions. This paper investigates whether expressive contrasts are beneficial for classification. We adopt a statistical methodology for eliminating noisy patterns. Our experiments identify circumstances where expressive patterns can improve over previous contrast pattern based classifiers. We also present guidelines on i) when to use expressive patterns, based on the nature of the given data, and ii) how to choose between the different types of contrast patterns for building a classifier.
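For readers unfamiliar with the terminology, a small sketch of the support computations behind conjunctive versus disjunctive contrast patterns (the definitions follow common usage in the emerging-pattern literature, not this paper's specific algorithm):

```python
# Illustrative support functions for conjunctive and disjunctive patterns.
def supp_conjunction(pattern, dataset):
    """Fraction of transactions containing every item of the pattern."""
    return sum(pattern <= t for t in dataset) / len(dataset)

def supp_disjunction(groups, dataset):
    """Fraction of transactions containing at least one item from every group."""
    return sum(all(g & t for g in groups) for t in dataset) / len(dataset)

pos = [{"a", "b"}, {"a", "c"}, {"b", "c"}]
neg = [{"c"}, {"c", "d"}, {"d"}]
# A pattern's growth rate = support in positives / support in negatives.
print(supp_conjunction({"a"}, pos), supp_conjunction({"a"}, neg))                 # 0.67, 0.0
print(supp_disjunction([{"a", "b"}], pos), supp_disjunction([{"a", "b"}], neg))   # 1.0, 0.0
```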

12 citations


Book ChapterDOI
01 Jan 2009
TL;DR: A wide range of query languages for the Semantic Web exist, ranging from pure “selection languages” with only limited expressivity, to fully-fledged reasoning languages, to general purpose languages that support multiple data representation formats and allow simultaneous querying of data on both the standard andSemantic Web.
Abstract: DEFINITION. A number of formalisms have been proposed for representing data and metadata on the Semantic Web. In particular, RDF, Topic Maps and OWL allow one to describe relationships between data items, such as concept hierarchies and relations between the concepts. A key requirement for the Semantic Web is integrated access to data represented in any of these formalisms, as well as the ability to access data in the formalisms of the "standard Web", such as (X)HTML and XML. This data access is the objective of Semantic Web query languages. A wide range of query languages for the Semantic Web exist, ranging i) from pure "selection languages" with only limited expressivity to fully-fledged reasoning languages, and ii) from query languages restricted to a certain data representation format, such as XML or RDF, to general-purpose languages that support multiple data representation formats and allow simultaneous querying of data on both the standard Web and the Semantic Web.
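As a small, editor-supplied example of the "selection language" end of that spectrum, here is a SPARQL query over a tiny in-memory RDF graph using rdflib; the vocabulary URIs are made up for illustration.

```python
# Hedged example: a SPARQL selection query over an in-memory RDF graph.
from rdflib import Graph, Namespace, Literal

EX = Namespace("http://example.org/")
g = Graph()
g.add((EX.dataMining, EX.subConceptOf, EX.computerScience))
g.add((EX.dataMining, EX.label, Literal("Data Mining")))

results = g.query("""
    SELECT ?concept ?label WHERE {
        ?concept <http://example.org/subConceptOf> <http://example.org/computerScience> .
        ?concept <http://example.org/label> ?label .
    }
""")
for concept, label in results:
    print(concept, label)
```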

11 citations


Journal ArticleDOI
01 Oct 2009
TL;DR: This work investigates a new type of dynamic graph analysis, finding regions of a graph that are evolving in a similar manner and are topologically similar over a period of time, and proposes a new algorithm called regHunter, which treats the region discovery problem as a multi-objective optimisation problem and uses a multi-level graph partitioning algorithm to discover the regions of correlated change.
Abstract: There is growing interest in studying dynamic graphs, or graphs that evolve with time. In this work, we investigate a new type of dynamic graph analysis: finding regions of a graph that are evolving in a similar manner and are topologically similar over a period of time. For example, these regions can be used to group a set of changes having a common cause in event detection and fault diagnosis. Prior work [6] has proposed a greedy framework called cSTAG to find these regions. It was accurate in datasets where the regions are temporally and spatially well separated. However, in cases where the regions are not well separated, cSTAG produces incorrect groupings. In this paper, we propose a new algorithm called regHunter. It treats the region discovery problem as a multi-objective optimisation problem, and it uses a multi-level graph partitioning algorithm to discover the regions of correlated change. In addition, we propose an external clustering validation technique, and use several existing internal measures to evaluate the accuracy of regHunter. Using synthetic datasets, we found that regHunter is significantly more accurate than cSTAG in dynamic graphs that have regions with small separation. Using two real datasets, the access graph of the 1998 World Cup website and the BGP connectivity graph during the landfall of Hurricane Katrina, we found that regHunter obtained more accurate results than cSTAG. Furthermore, regHunter was able to discover two interesting regions for the World Cup access graph that cSTAG was not able to find.
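Not regHunter itself, but a naive illustration of the trade-off its multi-objective formulation balances: grouping change events by a weighted combination of temporal distance and topological (shortest-path) distance. The names and the clustering choice below are hypothetical.

```python
# Naive sketch: cluster (time, node) change events on a reference graph using
# a precomputed mix of temporal and hop distances. Not the paper's algorithm.
import itertools
import networkx as nx
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def group_changes(graph, changes, alpha=0.5, n_regions=2):
    n = len(changes)
    dist = np.zeros((n, n))
    for i, j in itertools.combinations(range(n), 2):
        (ti, ui), (tj, uj) = changes[i], changes[j]
        hops = nx.shortest_path_length(graph, ui, uj)
        dist[i, j] = dist[j, i] = alpha * abs(ti - tj) + (1 - alpha) * hops
    model = AgglomerativeClustering(n_clusters=n_regions,
                                    metric="precomputed", linkage="average")
    return model.fit_predict(dist)   # one region label per change event
```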

7 citations


Journal ArticleDOI
TL;DR: This work claims that execution logging is essential to support agent system robustness, and that agents should have architectural-level support for logging and recovery methods, and describes an infrastructure-level, default methodology for agent problem-handling, based on logging, and supported by declaratively encoding domain-specific knowledge related to changes in goal status and semantic compensations.
Abstract: In an agent system, the ability to handle problems and recover from them is important in sustaining stability and providing robustness. We claim that execution logging is essential to support agent system robustness, and that agents should have architectural-level support for logging and recovery methods. We describe an infrastructure-level, default methodology for agent problem-handling, based on logging, and supported by declaratively encoding domain-specific knowledge related to changes in goal status and semantic compensations. Via logging, the approach allows repair of already-completed as well as current goals. We define a language, APLR, to support and constrain incremental specification of problem-handling information, with the agents' problem-handling behaviour increasing in sophistication as more knowledge is added to the system. The approach is implemented by mapping the methodology and domain knowledge to 3APL-like plan rules extended to support logging.
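Purely illustrative (not APLR or the paper's 3APL-style rules): the core data structure is an append-only log of goal-status changes, which is what makes already-completed goals visible for later repair. The class and method names below are invented.

```python
# Hypothetical sketch of an execution log supporting later repair of goals.
import time

class GoalLog:
    def __init__(self):
        self.events = []                      # append-only (timestamp, goal, status)

    def record(self, goal, status):
        self.events.append((time.time(), goal, status))

    def achieved_goals(self):
        """Goals whose most recent status is 'achieved'; candidates for
        compensation or repair if a later problem invalidates their effects."""
        latest = {}
        for _, goal, status in self.events:
            latest[goal] = status
        return [g for g, s in latest.items() if s == "achieved"]
```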