
Showing papers by "James Bailey published in 2008"


Journal ArticleDOI
TL;DR: To discover regions of correlated spatio-temporal change in graphs, an algorithm called cSTAG is proposed, which addresses the problem of finding clusters that optimise both temporal and spatial distance measures simultaneously.
Abstract: Graphs provide powerful abstractions of relational data, and are widely used in fields such as network management, web page analysis and sociology. While many graph representations of data describe dynamic and time evolving relationships, most graph mining work treats graphs as static entities. Our focus in this paper is to discover regions of a graph that are evolving in a similar manner. To discover regions of correlated spatio-temporal change in graphs, we propose an algorithm called cSTAG. Whereas most clustering techniques are designed to find clusters that optimise a single distance measure, cSTAG addresses the problem of finding clusters that optimise both temporal and spatial distance measures simultaneously. We show the effectiveness of cSTAG using a quantitative analysis of accuracy on synthetic data sets, as well as demonstrating its utility on two large, real-life data sets, where one is the routing topology of the Internet, and the other is the dynamic graph of files accessed together on the 1998 World Cup official website.
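For intuition, the core idea of clustering under two distance measures at once can be sketched as below. The convex combination of a spatial and a temporal distance matrix, the alpha weight, and the use of average-linkage clustering are all illustrative assumptions; they are not cSTAG's actual objective or algorithm.

```python
# Minimal sketch: cluster graph regions under a combined spatial and
# temporal distance. The convex combination and average-linkage step are
# illustrative assumptions, not cSTAG's actual formulation.
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def combined_clusters(d_spatial, d_temporal, alpha=0.5, n_clusters=3):
    """d_spatial, d_temporal: symmetric (n x n) numpy distance matrices."""
    d = alpha * d_spatial + (1.0 - alpha) * d_temporal   # trade off the two criteria
    z = linkage(squareform(d, checks=False), method="average")
    return fcluster(z, t=n_clusters, criterion="maxclust")
```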

59 citations


Proceedings Article
01 Oct 2008
TL;DR: This work introduces a new technique for building decision trees that is better suited to gene expression data, based on consideration of the area under the Receiver Operating Characteristics (ROC) curve, to help determine decision tree characteristics, such as node selection and stopping criteria.
Abstract: Gene expression information from microarray experiments is a primary form of data for biological analysis and can offer insights into disease processes and cellular behaviour. Such datasets are particularly challenging to build classifiers for, due to their very high dimensional nature and small sample size. Decision trees are a seemingly attractive technique for this domain, due to their easily interpretable white box nature and noise resistance. However, existing decision tree methods tend to perform rather poorly for classifying gene expression data. To address this gap, we introduce a new technique for building decision trees that is better suited to this scenario. Our method is based on consideration of the area under the Receiver Operating Characteristics (ROC) curve, to help determine decision tree characteristics, such as node selection and stopping criteria. We experimentally compare our algorithm, called ROC-tree, against other well known decision tree techniques, on a number of gene expression datasets. The experimental results clearly demonstrate that ROC-tree can deliver better classification accuracy in a range of challenging situations.
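A minimal sketch of the general idea of AUC-guided attribute selection at a tree node is given below; the single-feature AUC ranking shown is an assumption for illustration and is far simpler than ROC-tree's actual node-selection and stopping criteria.

```python
# Illustrative sketch only: rank candidate split attributes at a node by
# how well their raw values separate the two classes (AUC). ROC-tree's
# real node selection and stopping criteria are more involved.
from sklearn.metrics import roc_auc_score

def best_split_feature(X, y):
    """X: (n_samples, n_features) numpy expression matrix, y: binary labels."""
    best_feat, best_auc = None, 0.5
    for j in range(X.shape[1]):
        auc = roc_auc_score(y, X[:, j])
        auc = max(auc, 1.0 - auc)      # make the score direction-invariant
        if auc > best_auc:
            best_feat, best_auc = j, auc
    return best_feat, best_auc
```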

25 citations


Book ChapterDOI
15 Sep 2008
TL;DR: Experiments show that the proposed method, based on consideration of the area under the Receiver Operating Characteristics (ROC) curve, can substantially boost the classification performance of the k-NN algorithm and is even able to deliver better accuracy than state-of-the-art non-k-NN classifiers, such as support vector machines.
Abstract: The k-nearest neighbour (k-NN) technique, due to its interpretable nature, is a simple and very intuitively appealing method to address classification problems. However, choosing an appropriate distance function for k-NN can be challenging and an inferior choice can make the classifier highly vulnerable to noise in the data. In this paper, we propose a new method for determining a good distance function for k-NN. Our method is based on consideration of the area under the Receiver Operating Characteristics (ROC) curve, which is a well known method to measure the quality of binary classifiers. It computes weights for the distance function, based on ROC properties within an appropriate neighbourhood for the instances whose distance is being computed. We experimentally compare the effect of our scheme with a number of other well-known k-NN distance metrics, as well as with a range of different classifiers. Experiments show that our method can substantially boost the classification performance of the k-NN algorithm. Furthermore, in a number of cases our technique is even able to deliver better accuracy than state-of-the-art non k-NN classifiers, such as support vector machines.
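The flavour of the approach can be sketched as follows; note that the paper derives weights from ROC properties within a neighbourhood of the instances being compared, whereas this simplified sketch assumes one global AUC-based weight per feature.

```python
# Simplified sketch: weight each feature by a global AUC-derived score and
# use a weighted Euclidean distance in k-NN. The paper instead computes
# ROC-based weights locally, within a neighbourhood of the two instances.
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.neighbors import KNeighborsClassifier

def auc_feature_weights(X, y):
    aucs = np.array([roc_auc_score(y, X[:, j]) for j in range(X.shape[1])])
    return np.abs(aucs - 0.5) * 2.0    # 0 = uninformative, 1 = perfectly ranking

def fit_weighted_knn(X, y, k=5):
    w = auc_feature_weights(X, y)
    # Scaling features by sqrt(w) makes plain Euclidean distance equal to
    # the w-weighted Euclidean distance.
    knn = KNeighborsClassifier(n_neighbors=k).fit(X * np.sqrt(w), y)
    return knn, w
```

At prediction time the same scaling must be applied, for example knn.predict(X_test * np.sqrt(w)).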

22 citations


Proceedings ArticleDOI
26 Oct 2008
TL;DR: Experimental results demonstrate that the proposed technique can efficiently identify and explain contrast behaviour which would be difficult or impossible to isolate using standard techniques.
Abstract: Contrast data mining is a key tool for finding differences between sets of objects, or classes, and contrast patterns are a popular method for discrimination between two classes. However, such patterns can be limited in two primary ways: i) They do not readily allow second order differentiation - i.e. discovering contrasts of contrasts, ii) Mining contrast patterns often results in an overwhelming volume of output for the user. To address these limitations, this paper proposes a method which can identify contrast behaviour across both classes and also groups of classes. Furthermore, to increase interpretability for the user, it presents a new technique for finding the attributes which represent the key underlying factors behind the contrast behaviour. The associated mining task is computationally challenging and we describe an efficient algorithm to handle it, based on binary decision diagrams. Experimental results demonstrate that our technique can efficiently identify and explain contrast behaviour which would be difficult or impossible to isolate using standard techniques.
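To make the notion of a contrast concrete, the toy sketch below flags single items whose relative support differs sharply between two classes; it illustrates only first-order contrasts and none of the paper's BDD-based machinery for itemsets, groups of classes, or contrasts of contrasts.

```python
# Toy illustration of first-order contrast behaviour: single items whose
# relative support differs sharply between two classes. The paper handles
# itemsets, groups of classes and second-order contrasts via BDDs.
from collections import Counter

def item_contrasts(class_a, class_b, min_ratio=3.0):
    """class_a, class_b: lists of transactions, each transaction a set of items."""
    supp_a = Counter(item for t in class_a for item in t)
    supp_b = Counter(item for t in class_b for item in t)
    contrasts = {}
    for item, count in supp_a.items():
        rate_a = count / len(class_a)
        rate_b = supp_b.get(item, 0) / len(class_b)
        ratio = rate_a / rate_b if rate_b > 0 else float("inf")
        if ratio >= min_ratio:
            contrasts[item] = ratio
    return contrasts
```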

15 citations


Book ChapterDOI
26 Apr 2008
TL;DR: This paper presents a new and extensible approach to discovering acronym patterns, along with a new approach for both ranking the patterns and utilizing them within search queries.
Abstract: Techniques for being able to automatically identify acronym patterns are very important for enhancing a multitude of applications that rely upon search. This task is challenging, due to the many ways that acronyms and their expansions can be embedded in text. Methods for ranking and exploiting acronym patterns are another related, yet mostly untouched area. In this paper we present a new and extensible approach to discover acronym patterns. Furthermore, we present a new approach for both ranking the patterns and utilizing them within search queries. In our pattern discovery system, we are able to achieve a clear separation between higher and lower level functionalities. This enables great flexibility and allows users to easily configure and tune the system for different target domains. We evaluate our system and show how it is able to offer new capabilities, compared to existing work in the area.
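As a concrete (and deliberately simplistic) illustration of one acronym pattern, the sketch below matches the common "long form (ABC)" shape by comparing the acronym's letters to the initials of the preceding words; the paper's system supports far more general, configurable patterns as well as ranking.

```python
# Toy example of one acronym pattern: "long form (ABC)", matched by
# checking the acronym letters against the initials of preceding words.
# The paper's discovery system is far more general and configurable.
import re

def find_acronym_pairs(text):
    pairs = []
    for m in re.finditer(r"\(([A-Z]{2,6})\)", text):
        acronym = m.group(1)
        words = text[:m.start()].split()[-len(acronym):]
        if len(words) == len(acronym) and all(
            w[0].upper() == c for w, c in zip(words, acronym)
        ):
            pairs.append((" ".join(words), acronym))
    return pairs

print(find_acronym_pairs("the Receiver Operating Characteristics (ROC) curve"))
# [('Receiver Operating Characteristics', 'ROC')]
```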

9 citations



Book ChapterDOI
15 Oct 2008
TL;DR: The g-MARS (gapped Markov Chain with Support Vector Machine) protein classifier is presented, which models the structure of a protein sequence by measuring the transition probabilities between pairs of amino acids and can be generalized to incorporate gaps in the Markov chain.
Abstract: Classifying protein sequences has important applications in areas such as disease diagnosis, treatment development and drug design. In this paper we present a highly accurate classifier called the g-MARS (gapped Markov Chain with Support Vector Machine) protein classifier. It models the structure of a protein sequence by measuring the transition probabilities between pairs of amino acids. This results in a Markov chain style model for each protein sequence. Then, to capture the similarity among non-exactly matching protein sequences, we show that this model can be generalized to incorporate gaps in the Markov chain. We perform a thorough experimental study and compare g-MARS to several other state-of-the-art protein classifiers. Overall, we demonstrate that g-MARS has superior accuracy and operates efficiently on a diverse range of protein families.
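The kind of gapped transition statistics the model is built on can be sketched roughly as below; the per-gap row normalisation and the fixed-length feature vector are assumptions made for illustration, and the real g-MARS model construction differs in its details.

```python
# Rough sketch of gapped amino-acid transition statistics: for each gap g,
# count how often residue a is followed by residue b exactly g+1 positions
# later, then row-normalise. g-MARS's actual model differs in detail.
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
IDX = {a: i for i, a in enumerate(AMINO_ACIDS)}

def gapped_transition_features(seq, max_gap=2):
    feats = []
    for g in range(max_gap + 1):
        counts = np.zeros((20, 20))
        for i in range(len(seq) - g - 1):
            a, b = seq[i], seq[i + g + 1]
            if a in IDX and b in IDX:
                counts[IDX[a], IDX[b]] += 1
        row_sums = counts.sum(axis=1, keepdims=True)
        probs = np.divide(counts, row_sums,
                          out=np.zeros_like(counts), where=row_sums > 0)
        feats.append(probs.ravel())
    return np.concatenate(feats)   # one fixed-length vector per protein
```

These fixed-length vectors could then be fed to any standard SVM implementation, for example sklearn.svm.SVC.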

3 citations


Proceedings ArticleDOI
01 Dec 2008
TL;DR: An adaptive control framework for a multi-compartmental model of a pressure-limited respirator and lung mechanics system where the plant and reference model involve switching and time-varying dynamics is developed.
Abstract: In this paper, we develop an adaptive control framework for a multi-compartmental model of a pressure-limited respirator and lung mechanics system. Specifically, we develop a model reference direct adaptive controller framework where the plant and reference model involve switching and time-varying dynamics. We then apply the proposed adaptive feedback controller framework to stabilize a given limit cycle corresponding to a clinically plausible respiratory pattern.
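For readers unfamiliar with model reference adaptive control, a textbook, non-switching version of the idea is sketched below (state feedback, single input, known input matrix B); this is background intuition only and not the paper's switching, time-varying framework.

```latex
% Textbook MRAC sketch (no switching or time variation), for intuition only.
\begin{align*}
  \dot{x}(t)   &= A x(t) + B u(t)                 && \text{plant, $A$ uncertain}\\
  \dot{x}_m(t) &= A_m x_m(t) + B_m r(t)           && \text{reference model}\\
  u(t)         &= \hat{K}_x^{\mathsf T}(t)\, x(t) + \hat{k}_r(t)\, r(t)
                                                  && \text{adaptive control law}\\
  e(t)         &= x(t) - x_m(t)                   && \text{tracking error}\\
  \dot{\hat{K}}_x(t) &= -\Gamma_x\, x(t)\, e^{\mathsf T}(t) P B, \qquad
  \dot{\hat{k}}_r(t)  = -\Gamma_r\, r(t)\, e^{\mathsf T}(t) P B
\end{align*}
```

where $P$ solves the Lyapunov equation $A_m^{\mathsf T} P + P A_m = -Q$ for some $Q \succ 0$.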

3 citations


Book ChapterDOI
15 Oct 2008
TL;DR: A novel clustering algorithm is developed which incorporates functional gene information from the Gene Ontology into the clustering process, resulting in more biologically meaningful clusters, and the potential of such methods for exploring cancer etiology is shown.
Abstract: Gene expression profiling provides insight into the functions of genes at a molecular level. Clustering of gene expression profiles can facilitate the identification of the underlying biological program driving genes' co-expression. Standard clustering methods, which group genes based on similar expression values, fail to capture weak expression correlations, potentially causing genes in the same biological process to be grouped separately. We have developed a novel clustering algorithm which incorporates functional gene information from the Gene Ontology into the clustering process, resulting in more biologically meaningful clusters. We have validated our method using a multi-cancer microarray dataset. In addition, we show the potential of such methods for the exploration of cancer etiology.
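One simple way to let Gene Ontology annotations influence a clustering, shown purely for illustration, is to blend an expression-based distance with a GO-term dissimilarity; the Jaccard measure and the fixed blending weight below are assumptions, not the paper's algorithm.

```python
# Illustrative only: blend an expression-based distance with a Gene
# Ontology dissimilarity (1 - Jaccard overlap of GO term sets). The
# blending weight and similarity measure are assumptions, not the
# paper's actual method.
import numpy as np
from scipy.spatial.distance import pdist, squareform

def go_informed_distance(expr, genes, go_terms, alpha=0.7):
    """expr: (n_genes, n_samples) matrix; go_terms: dict gene -> set of GO ids."""
    d_expr = squareform(pdist(expr, metric="correlation"))
    n = len(genes)
    d_go = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            a = go_terms.get(genes[i], set())
            b = go_terms.get(genes[j], set())
            sim = len(a & b) / len(a | b) if (a or b) else 0.0
            d_go[i, j] = d_go[j, i] = 1.0 - sim
    return alpha * d_expr + (1.0 - alpha) * d_go   # feed to any distance-based clusterer
```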

2 citations


Posted Content
TL;DR: In this article, the problem of deciding satisfiability of first-order logic queries over views is studied, with the aim being to delimit the boundary between the decidable and the undecidable fragments of this language.
Abstract: We study the problem of deciding satisfiability of first-order logic queries over views, our aim being to delimit the boundary between the decidable and the undecidable fragments of this language. Views currently occupy a central place in database research, due to their role in applications such as information integration and data warehousing. Our main result is the identification of a decidable class of first-order queries over unary conjunctive views that generalises the decidability of the classical class of first-order sentences over unary relations, known as the Löwenheim class. We then demonstrate how various extensions of this class lead to undecidability and also provide some expressivity results. Besides its theoretical interest, our new decidable class is potentially interesting for use in applications such as deciding implication of complex dependencies, analysis of a restricted class of active database rules, and ontology reasoning.
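As a small illustrative example (not taken from the paper) of what a first-order query over unary conjunctive views looks like: given a binary relation $R$, define two unary views and ask whether some database makes a sentence over them true.

```latex
% Illustrative example of unary conjunctive views and a first-order
% sentence over them; not drawn from the paper itself.
\begin{align*}
  V_1(x) &\;\leftarrow\; \exists y\, R(x, y) && \text{(elements with an outgoing $R$-edge)}\\
  V_2(x) &\;\leftarrow\; \exists y\, R(y, x) && \text{(elements with an incoming $R$-edge)}\\
  \varphi &\;=\; \exists x\, \bigl(V_1(x) \wedge \neg V_2(x)\bigr)
          && \text{satisfiable: take } R = \{(a, b)\}
\end{align*}
```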

1 citation


Book
01 Jan 2008
TL;DR: Proceedings volume with sessions on reasoning, applications, and querying for the Semantic Web, including several contributions on the Xcerpt web query language.
Abstract: Table of contents:
- Session 1. Invited Talk: The RuleML Family of Web Rule Languages
- Session 2. Reasoning I: Automated Reasoning Support for First-Order Ontologies; Combining Safe Rules and Ontologies by Interfacing of Reasoners
- Session 3. Applications: Realizing Business Processes with ECA Rules: Benefits, Challenges, Limits; Interaction Protocols and Capabilities: A Preliminary Report; Semantic Web Reasoning for Analyzing Gene Expression Profiles
- Session 4. Querying: Data Model and Query Constructs for Versatile Web Query Languages: State-of-the-Art and Challenges for Xcerpt; AMaχoS - Abstract Machine for Xcerpt: Architecture and Principles; Towards More Precise Typing Rules for Xcerpt
- Session 5. Reasoning II: Extending an OWL Web Node with Reactive Behavior; Supporting Open and Closed World Reasoning on the Web; Reasoning with Temporal Constraints in RDF
- Session 6. Reasoning III: Bidirectional Mapping Between OWL DL and Attempto Controlled English; XML Querying Using Ontological Information; Semantic Web Reasoning Using a Blackboard System
- Systems Session: Effective and Efficient Data Access in the Versatile Web Query Language Xcerpt; Web Queries with Style: Rendering Xcerpt Programs with CSSNG; Information Gathering in a Dynamic World; Practice of Inductive Reasoning on the Semantic Web: A System for Semantic Web Mining; Fuzzy Time Intervals: System Description of the FuTI-Library; A Prototype of a Descriptive Type System for Xcerpt

Proceedings Article
25 Sep 2008
TL;DR: This book constitutes the proceedings of the 9th International Conference on Web Information Systems Engineering, WISE 2008, held in Auckland, New Zealand, in September 2008; it contains 17 revised full papers and 14 revised short papers presented together with two keynote talks.
Abstract: This book constitutes the proceedings of the 9th International Conference on Web Information Systems Engineering, WISE 2008, held in Auckland, New Zealand, in September 2008. The 17 revised full papers and 14 revised short papers presented together with two keynote talks were carefully reviewed and selected from around 110 submissions. The papers are organized in topical sections on grid computing and peer-to-peer systems; Web mining; rich Web user interfaces; semantic Web; Web information retrieval; Web data integration; queries and peer-to-peer systems; and Web services.

Proceedings Article
05 Dec 2008
TL;DR: This work investigates several forms of contextual evidence outside the retrieved passage, such as query/fulltext similarity, query/citation sentence similarity, query/title similarity, and query/abstract similarity, and the experimental results suggest that document context provides the strongest contextual evidence for this task.
Abstract: Query Expansion is a widely used technique that augments a query with synonymous and related terms in order to address a common issue in ad hoc retrieval: the vocabulary mismatch problem, where relevant documents contain query terms that are semantically similar, but lexically distinct. Standard query expansion techniques include pseudo relevance feedback and ontology-based expansion. In this paper, we explore the use of contextual information as a means of expanding the context surrounding the unit of retrieval, rather than the query, which in this case is a document passage. The ad hoc retrieval task that we focus on in this paper was investigated at the TREC 2006 Genomics track, where systems were required to retrieve relevant answer passages. The most commonly reported indexing strategy was passage indexing. Although this simplifies post-retrieval processing, retrieval performance can be hurt as valuable contextual information in the containing document is lost. The focus of this paper is to investigate various forms of contextual evidence of similarity outside of the passage, such as query/fulltext similarity, query/citation sentence similarity, query/title similarity, and query/abstract similarity. These similarity scores are then used to boost the rank of passages that exhibit high contextual evidence of query similarity. Our experimental results suggest that document context provides the strongest evidence of contextual information for this task.
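A minimal sketch of the boosting step is shown below; the linear combination and the equal default weights are assumptions made for illustration, whereas the paper evaluates each source of contextual evidence in its own right.

```python
# Minimal sketch: boost a passage's retrieval score with document-level
# contextual similarity scores (fulltext, citation sentence, title,
# abstract). The linear form and equal default weights are assumptions.
def boosted_score(passage_score, context_scores, weights=None):
    """context_scores: dict, e.g. {'fulltext': 0.8, 'title': 0.4, ...}."""
    if weights is None:
        weights = {k: 0.25 for k in context_scores}   # assumed equal weighting
    boost = sum(weights.get(k, 0.0) * v for k, v in context_scores.items())
    return passage_score + boost

def rerank(passages, weights=None):
    """passages: list of dicts with 'score' and 'context' keys."""
    return sorted(passages,
                  key=lambda p: boosted_score(p["score"], p["context"], weights),
                  reverse=True)
```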