Showing papers by Stephen Muggleton published in 1998

Open Access

Journal Article

Pharmacophore Discovery Using the Inductive Logic Programming System PROGOL

Paul W. Finn¹, Stephen Muggleton², David C. Page³, Ashwin Srinivasan⁴

Pfizer¹, University of York², University of Louisville³, University of Oxford⁴

01 Feb 1998, Machine Learning

TL;DR: This paper presents a case study of a machine-aided knowledge discovery process within the general area of drug design, in which the Inductive Logic Programming (ILP) system Progol is applied to the problem of identifying potential pharmacophores for ACE inhibition.

Abstract: This paper presents a case study of a machine-aided knowledge discovery process within the general area of drug design. Within drug design, the particular problem of pharmacophore discovery is isolated, and the Inductive Logic Programming (ILP) system Progol is applied to the problem of identifying potential pharmacophores for ACE inhibition. The case study reported in this paper supports four general lessons for machine learning and knowledge discovery, as well as more specific lessons for pharmacophore discovery, for Inductive Logic Programming, and for ACE inhibition. The general lessons for machine learning and knowledge discovery are as follows:
1. An initial rediscovery step is a useful tool when approaching a new application domain.
2. General machine learning heuristics may fail to match the details of an application domain, but it may be possible to successfully apply a heuristic-based algorithm in spite of the mismatch.
3. A complete search for all plausible hypotheses can provide useful information to a user, although experimentation may be required to choose between competing hypotheses.
4. A declarative knowledge representation facilitates the development and debugging of background knowledge in collaboration with a domain expert, as well as the communication of final results.
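
To make the notion of a pharmacophore concrete, one simple view treats a candidate pharmacophore as a set of required chemical feature types plus target pairwise distances, and asks whether a molecule's 3D feature points contain a matching arrangement. The sketch below is illustrative only: the feature names, coordinates and the 1.0 Å tolerance are hypothetical, and this brute-force matcher is not the representation Progol used in the paper.

```python
import itertools
import math

# Hypothetical representation: a molecule is a list of (feature_type, (x, y, z))
# pairs, with coordinates in angstroms.

def matches_pharmacophore(features, types, distances, tol=1.0):
    """Return True if some ordered choice of distinct features has the required
    types and every required pairwise distance is within tol of its target.

    types: required feature types, e.g. ("acceptor", "donor", "zinc").
    distances: maps index pairs (i, j) into `types` to target distances.
    """
    for combo in itertools.permutations(features, len(types)):
        # Each chosen feature must have the type required at its position.
        if any(ft != wanted for (ft, _), wanted in zip(combo, types)):
            continue
        # Every constrained pair must lie within tol of its target distance.
        if all(abs(math.dist(combo[i][1], combo[j][1]) - target) <= tol
               for (i, j), target in distances.items()):
            return True
    return False
```

For example, an acceptor at the origin, a donor at (3, 0, 0) and a zinc site at (0, 4, 0) match target distances of 3, 4 and 5 Å, but not a target of 8 Å between the donor and the zinc site. Exhaustive permutation is exponential in the number of required features; it is used here only because it is short and obviously correct.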

138 citations

Book Chapter

Biochemical Knowledge Discovery Using Inductive Logic Programming

Stephen Muggleton¹, Ashwin Srinivasan², Ross D. King³, Michael J.E. Sternberg

University of York¹, University of Oxford², Aberystwyth University³

14 Dec 1998

TL;DR: According to the combined accuracy/explanation criterion provided in this paper, comparative trials on both subsets show that Progol's structurally-oriented hypotheses are preferable to those of other machine learning algorithms.

Abstract: Machine Learning algorithms are being increasingly used for knowledge discovery tasks. Approaches can be broadly divided by distinguishing discovery of procedural from that of declarative knowledge. Client requirements determine which of these is appropriate. This paper discusses an experimental application of machine learning in an area related to drug design. The bottleneck here is in finding appropriate constraints to reduce the large number of candidate molecules to be synthesised and tested. Such constraints can be viewed as declarative specifications of the structural elements necessary for high medicinal activity and low toxicity. The first-order representation used within Inductive Logic Programming (ILP) provides an appropriate description language for such constraints. Within this application area, knowledge accreditation requires not only a demonstration of predictive accuracy but also, crucially, a certification of novel insight into the structural chemistry. This paper describes an experiment in which the ILP system Progol was used to obtain structural constraints associated with mutagenicity of molecules. In doing so, Progol found a new indicator of mutagenicity within a subset of previously published data. This subset was already known not to be amenable to statistical regression, though its complement was adequately explained by a linear model. According to the combined accuracy/explanation criterion provided in this paper, comparative trials on both subsets show that Progol's structurally-oriented hypotheses are preferable to those of other machine learning algorithms.

42 citations

Book Chapter

Application of Inductive Logic Programming to Discover Rules Governing the Three-Dimensional Topology of Protein Structure

Marcel Turcotte, Stephen Muggleton¹, Michael J.E. Sternberg

University of York¹

22 Jul 1998

TL;DR: The application of ILP to fold recognition represents a novel and promising approach to this problem and makes use of a new feature of Progol 4.4 for numeric parameter estimation.

Abstract: Inductive Logic Programming (ILP) has been applied to discover rules governing the three-dimensional topology of protein structure. The data-set unifies two sources of information: SCOP and PROMOTIF. Cross-validation results for experiments using two background knowledge sets, global (attribute-valued) and constitutional (relational), are presented. The application makes use of a new feature of Progol 4.4 for numeric parameter estimation. At this early stage of development, the rules produced can only be applied to proteins for which the secondary structure is known. However, since the rules are insightful, they should prove helpful in assisting the development of taxonomic schemes. The application of ILP to fold recognition represents a novel and promising approach to this problem.

34 citations

Book Chapter

Detecting Traffic Problems with ILP

Saso Dzeroski, Nico Jacobs¹, Martin Molina², Carlos Moure², Stephen Muggleton³, Wim Van Laer¹

Katholieke Universiteit Leuven¹, Technical University of Madrid², University of York³

22 Jul 1998

TL;DR: Three state-of-the-art ILP systems are applied to learn how to detect traffic problems, and their performance is compared to that of a propositional learning system on the same problem.

Abstract: Expert systems for decision support have recently been successfully introduced in road transport management. These systems include knowledge on traffic problem detection and alleviation. The paper describes experiments in automated acquisition of knowledge on traffic problem detection. The task is to detect, from sensor data, road sections where a problem has occurred (critical sections). It is necessary to use inductive logic programming (ILP) for this purpose, as relational background knowledge on the road network is essential. In this paper, we apply three state-of-the-art ILP systems to learn how to detect traffic problems and compare their performance to the performance of a propositional learning system on the same problem.

29 citations

Book Chapter

Repeat Learning Using Predicate Invention

Khalid Khan¹,², Stephen Muggleton², Rupert Parson¹

University of Oxford¹, University of York²

22 Jul 1998

TL;DR: A new predicate invention mechanism implemented in Progol 4.4 is used in repeat learning experiments within a chess domain, and the results indicate that significant performance increases can be achieved.

Abstract: Most work in machine learning is concerned with learning a single concept from a sequence of examples. In repeat learning, the teacher chooses a series of related concepts randomly and independently from a distribution D. A finite sequence of examples is provided for each concept in the series. The learner does not initially know D, but progressively updates a posterior estimate of D as the series progresses. This paper considers predicate invention within Inductive Logic Programming as a mechanism for updating the learner's estimate of D. A new predicate invention mechanism implemented in Progol 4.4 is used in repeat learning experiments within a chess domain. The results indicate that significant performance increases can be achieved. The paper develops a Bayesian framework and demonstrates initial theoretical results for repeat learning.
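
As a simplified illustration of a learner progressively updating its estimate of D, the sketch below assumes D is a categorical distribution over a small, fixed set of concepts and places a symmetric Dirichlet prior on it, so the posterior-mean estimate is a smoothed frequency count. This is a swapped-in textbook technique, not the paper's predicate-invention mechanism; the concept names and the `alpha` parameter are hypothetical.

```python
from collections import Counter

def posterior_mean_estimate(observed_targets, concept_space, alpha=1.0):
    """Posterior-mean estimate of a categorical distribution D over a finite
    concept_space, under a symmetric Dirichlet(alpha) prior.

    observed_targets: the target concept identified in each completed session.
    """
    counts = Counter(observed_targets)
    total = len(observed_targets) + alpha * len(concept_space)
    # Smoothed frequency: concepts not yet seen keep some probability mass.
    return {c: (counts[c] + alpha) / total for c in concept_space}
```

After three sessions whose targets were `fork`, `fork` and `pin`, over the hypothetical concept space `fork`, `pin`, `skewer`, the estimate is 1/2 for `fork`, 1/3 for `pin` and 1/6 for `skewer`. Each further session shifts the estimate towards the concepts the teacher actually draws.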

26 citations

Book Chapter

Completing Inverse Entailment

Stephen Muggleton¹

University of York¹

22 Jul 1998

TL;DR: By enlarging the bottom set used within IE, it is possible to make a revised version of IE complete with respect to entailment for Horn theories.

Abstract: Yamamoto has shown that the Inverse Entailment (IE) mechanism described previously by the author is complete for Plotkin's relative subsumption but incomplete for entailment. That is to say, a hypothesised clause $H$ can be derived from an example $E$ under a background theory $B$ using IE if and only if $H$ subsumes $E$ relative to $B$ in Plotkin's sense. Yamamoto gives examples of $H$ for which $B \cup H \models E$ but $H$ cannot be constructed using IE from $B$ and $E$. The main result of the present paper is a theorem showing that, by enlarging the bottom set used within IE, it is possible to make a revised version of IE complete with respect to entailment for Horn theories. Furthermore, it is shown for function-free definite clauses that, given a bound $k$ on the arity of predicates used in $B$ and $E$, the cardinality of the enlarged bottom set is bounded above by the polynomial function $p(c+1)^k$, where $p$ is the number of predicates in $B$ and $E$, and $c$ is the number of constants in $B \cup \overline{E}$.
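
A worked instance of the bound shows why it is described as polynomial: for a fixed arity bound $k$ it grows as a degree-$k$ polynomial in the number of constants $c$, not exponentially. The values of $p$, $c$ and $k$ below are hypothetical.

```latex
\begin{align*}
\text{bound} &= p\,(c+1)^k \\
p = 3,\ c = 4,\ k = 2: &\quad 3 \cdot (4+1)^2 = 3 \cdot 25 = 75 \\
p = 3,\ c = 40,\ k = 2: &\quad 3 \cdot (40+1)^2 = 3 \cdot 1681 = 5043
\end{align*}
```

With $k$ fixed at 2, the bound grows quadratically in $c$: raising $c$ from 4 to 40 raises it from 75 to 5043.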

22 citations

Proceedings Article

Inductive Logic Programming: Issues, Results and the LLL Challenge (abstract).

Stephen Muggleton

01 Jan 1998

TL;DR: Follow-up studies have shown that ILP approaches to natural language problems extend with relative ease to languages other than English, and the area of Learning Language in Logic (LLL) is producing a number of challenges to existing ILP theory and implementation.

Abstract: Inductive Logic Programming (ILP) [9, 11] is the area of AI which deals with the induction of hypothesised predicate definitions from examples and background knowledge. Logic programs are used as a single representation for examples, background knowledge and hypotheses. ILP is differentiated from most other forms of Machine Learning (ML) both by its use of an expressive representation language and by its ability to make use of logically encoded background knowledge. This has allowed successful applications of ILP [1] in areas such as molecular biology [12, 10, 6, 5] and natural language [7, 3, 2], which both have rich sources of background knowledge and both benefit from the use of expressive concept representation languages. For instance, the ILP system Progol has recently been used to generate comprehensible descriptions of the 23 most populated fold classes of proteins [14], where no such descriptions had previously been formulated manually. In the natural language area, ILP has not only been shown to achieve higher accuracies than various other ML approaches in learning the past tense of English [8], but has also been shown to be capable of learning accurate grammars which translate sentences into deductive database queries [15]. In both cases, follow-up studies [13, 4] have shown that these ILP approaches to natural language problems extend with relative ease to languages other than English. The area of Learning Language in Logic (LLL) is producing a number of challenges to existing ILP theory and implementations. In particular, language applications of ILP require revision and extension of a hierarchically defined set of predicates in which the examples are typically only provided for predicates at the top of the hierarchy. New predicates often need to be invented, and complex recursion is usually involved. Similarly, the term structure of semantic objects is far more complex than in other applications of ILP.
Advances in ILP theory and implementation related to the challenges of LLL are already producing beneficial advances in other sequence-oriented applications of ILP. In addition, LLL is starting to develop its own character as a sub-discipline of AI involving the confluence of computational linguistics, machine learning and logic programming.
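
The core evaluation step behind inducing a predicate definition from examples and background knowledge is a coverage test: a candidate clause is scored by how many positive and negative examples it entails, given the background facts. The sketch below hard-codes one candidate clause over hypothetical family data. Real ILP systems such as Progol generate and search candidate clauses automatically, which this example does not attempt.

```python
# Hypothetical background knowledge: ground parent/2 facts.
PARENT = {("ann", "bob"), ("bob", "cat"), ("bob", "dan"), ("eve", "fay")}

def covers_grandparent(x, z, parent=PARENT):
    """Candidate clause  grandparent(X, Z) :- parent(X, Y), parent(Y, Z).
    Returns True if some Y makes both body literals true in the background."""
    middle_candidates = {y for (_, y) in parent}
    return any((x, y) in parent and (y, z) in parent for y in middle_candidates)

def evaluate(clause, positives, negatives):
    """Return (positives covered, negatives covered) for a candidate clause."""
    covered_pos = sum(1 for e in positives if clause(*e))
    covered_neg = sum(1 for e in negatives if clause(*e))
    return covered_pos, covered_neg
```

With positives `grandparent(ann, cat)` and `grandparent(ann, dan)` and negatives `grandparent(bob, cat)` and `grandparent(eve, fay)`, the clause covers both positives and neither negative, so it is consistent with the data.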

15 citations

Book Chapter

A Comparison of ILP and Propositional Systems on Propositional Traffic Data

Sam G. B. Roberts¹, Wim Van Laer², Nico Jacobs², Stephen Muggleton³, Jeremy Broughton⁴

University of Oxford¹, Katholieke Universiteit Leuven², University of York³, Transport Research Laboratory⁴

22 Jul 1998

TL;DR: The conclusion drawn from the experimental results is that, on such a propositional dataset, ILP algorithms perform competitively with propositional systems in terms of predictive accuracy, but are significantly outperformed in terms of time taken for learning.

Abstract: This paper presents an experimental comparison of two Inductive Logic Programming algorithms, PROGOL and TILDE, with C4.5, a propositional learning algorithm, on a propositional dataset of road traffic accidents. Rebalancing methods are described for handling the skewed distribution of positive and negative examples in this dataset, and the relative cost of errors of commission and omission in this domain. It is noted that, before these methods are applied, all algorithms perform worse than the majority-class baseline. After rebalancing, all did significantly better. The conclusion drawn from the experimental results is that, on such a propositional dataset, ILP algorithms perform competitively with propositional systems in terms of predictive accuracy, but are significantly outperformed in terms of time taken for learning.
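
One common way to handle a skewed class distribution and unequal error costs is to reweight examples so that each class carries equal total weight, optionally scaling the positive class by a cost ratio. The abstract does not specify the paper's rebalancing methods, so the sketch below is an illustration under that assumption; `cost_ratio` and the label values are hypothetical. It also shows why the majority-class baseline is hard to beat on skewed data.

```python
from collections import Counter

def majority_class_accuracy(labels):
    """Accuracy of the trivial classifier that always predicts the most frequent label."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)

def rebalancing_weights(labels, positive="pos", cost_ratio=1.0):
    """Per-example weights giving every class the same total weight
    (len(labels) / number_of_classes). Positive examples are additionally
    multiplied by cost_ratio, a hypothetical ratio of the cost of an error
    of omission (missed positive) to an error of commission (false alarm)."""
    counts = Counter(labels)
    n, n_classes = len(labels), len(counts)
    weights = []
    for y in labels:
        w = n / (n_classes * counts[y])
        if y == positive:
            w *= cost_ratio
        weights.append(w)
    return weights
```

With 5 positives among 100 examples, always predicting the negative class already scores 0.95 accuracy. After reweighting, each positive example has weight 10.0 and each class carries a total weight of 50, so a learner can no longer do well simply by ignoring the minority class.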

11 citations

Book Chapter

Knowledge Discovery in Biological and Chemical Domains

Stephen Muggleton¹

University of York¹

14 Dec 1998

TL;DR: Inductive Logic Programming (ILP) provides a knowledge discovery approach that generates logical formulae from data, which makes it suitable for knowledge discovery within the pharmaceutical industry.

Abstract: The pharmaceutical industry is increasingly overwhelmed by large volumes of data. These are generated both internally, as a side-effect of screening tests and combinatorial chemistry, and externally, from sources such as the human genome project. The industry is predominantly knowledge-driven. For instance, knowledge is required within computational chemistry for pharmacophore identification, as well as for determining biological function using sequence analysis. From a computer science point of view, the knowledge requirements within the industry give higher emphasis to “knowing that” (declarative or descriptive knowledge) rather than “knowing how” (procedural or prescriptive knowledge). Mathematical logic has always been the preferred representation for declarative knowledge, and thus knowledge discovery techniques are required which generate logical formulae from data. Inductive Logic Programming (ILP) [6,1] provides such an approach.

2 citations

Proceedings Article

Advances in ILP Theory and Implementations (Abstract)

Stephen Muggleton

22 Jul 1998

1 citation

Book Chapter

Advances in ILP theory and implementations

Stephen Muggleton¹

University of York¹

22 Jul 1998

TL;DR: The development of Bayesian approaches to ILP supported the development of U-learnability, which allows classes of distributions over the hypotheses, and it was shown that for any exponential-decay distribution the class of time-bounded logic programs is polynomially U-learnable.

Abstract: A strong linkage exists between advances in applications, implementations and theory within Inductive Logic Programming (ILP). Early ILP systems, such as FOIL, Golem and LINUS, learned single predicate definitions from positive and negative examples and extensional background knowledge. They also employed strong learning biases such as ij-determinacy. Although these systems found a number of applications, they had problems in areas such as molecular biology and natural language learning. General mechanisms for inverting entailment have now been developed which support the use of non-ground background knowledge and the revision of multiple inter-related predicates. ILP theory results concerning complete refinement graph operators now allow efficient admissible searches. The absolute requirement for negative examples (rare within natural language domains) has been eased by Bayesian analysis of learning from positive-only examples. Bayesian approaches have also supported sample complexity analysis of predicate invention within the framework of repeat learning. In this framework it is assumed that the learner's prior is not equivalent to the distribution from which the teacher is sampling targets. By providing a series of sessions, the learner is able to update the initial prior by adding and deleting background predicates. Within the Bayesian framework, stochastic logic program representations have been used to estimate the distribution of examples over the instance space. Stochastic logic programs are a generalisation of hidden Markov models and stochastic grammars. Apart from a few special cases, PAC-learning results have been largely negative for ILP. This is in large part due to the fact that testing satisfiability is intractable for most interesting subsets of first-order Horn logic. The development of Bayesian approaches to ILP supported the development of U-learnability, which allows classes of distributions over the hypotheses. Here it was shown that for any exponential-decay distribution the class of time-bounded logic programs is polynomially U-learnable. The use of such bounds on proof depth is common within ILP systems. Although logically impure, this approach allows general-purpose, flexible representations while maintaining termination guarantees.
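
To illustrate how a stochastic logic program can define a distribution over an instance space, the sketch below uses a hypothetical two-clause SLP for `nat/1`, in which the labels on clauses for the same predicate sum to 1. Each atom here has exactly one derivation, so its probability is the product of the clause labels used along it, and the result is a geometric (exponential-decay) distribution. The labels 2/5 and 3/5 are illustrative choices, not taken from the paper; exact fractions avoid rounding.

```python
from fractions import Fraction

# Hypothetical SLP; labels on clauses for the same predicate sum to 1.
#   2/5 : nat(0).
#   3/5 : nat(s(N)) :- nat(N).
P_BASE = Fraction(2, 5)  # label of the base clause
P_STEP = Fraction(3, 5)  # label of the recursive clause

def derivation_probability(n: int) -> Fraction:
    """Probability of the atom nat(s^n(0)): its only derivation uses the
    recursive clause n times and the base clause once."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return P_STEP ** n * P_BASE

def term(n: int) -> str:
    """Render the term s^n(0) as a string, e.g. term(2) == 's(s(0))'."""
    return "s(" * n + "0" + ")" * n
```

For example, `nat(s(s(0)))` has probability (3/5)² · 2/5 = 18/125 = 0.144, and the probabilities of the first N atoms sum to 1 − (3/5)^N, approaching 1 as N grows.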

Book Chapter

Recent Developments in Applying Machine Learning to Drug Design

Ross D. King¹, Michael J.E. Sternberg¹, Stephen Muggleton, Ashwin Srinivasan

Lincoln's Inn¹

01 Jan 1998

TL;DR: A new and general approach to forming Structure Activity Relationships (SARs) is described, based on representing chemical structure by atoms and their bond connectivities in combination with the Inductive Logic Programming (ILP) algorithm Progol.

Abstract: A new and general approach to forming Structure Activity Relationships (SARs) is described. This is based on representing chemical structure by atoms and their bond connectivities in combination with the Inductive Logic Programming (ILP) algorithm Progol. Existing SAR methods describe chemical structure using attributes, which are general properties of an object. It is not possible to map chemical structure directly to attribute-based descriptions, as such descriptions have no internal organisation. A more natural and general way to describe chemical structure is to use a relational description, where the internal construction of the description maps that of the object described. Our representation of atoms and bond connectivities is a relational description. ILP algorithms can form SARs with relational descriptions. We have tested the relational approach by investigating the SAR of 230 aromatic and heteroaromatic nitro compounds. These compounds had previously been split into two subsets: 188 compounds that were amenable to regression, and 42 that were not. For the 188 compounds, a SAR was found that was as accurate as the best statistical or neural network generated SARs. The Progol SAR has the advantages that it did not require any indicator variables hand-crafted by an expert, and that the generated rules were easily comprehensible. For the 42 compounds, Progol formed a SAR that was significantly (P < 0.025) more accurate than linear regression, quadratic regression, and back-propagation. This SAR is based on a new automatically generated structural alert for mutagenicity.
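
The difference between attribute-based and relational descriptions can be sketched in a few lines. Below, a molecule is stored as atom and bond relations, and a query ranges over each atom's bonded neighbours, so it works for molecules of any size without hand-crafted indicator variables. The nitro-group pattern (a nitrogen bonded to at least two oxygens) is a hypothetical example chosen because the dataset consists of nitro compounds; it is not the structural alert Progol discovered, and bond orders and charges are ignored for simplicity.

```python
from collections import defaultdict

def has_nitro_group(atoms, bonds):
    """Relational query over a molecule.

    atoms: dict mapping atom id -> element symbol (hypothetical atom relation).
    bonds: iterable of (atom_id, atom_id) pairs (hypothetical bond relation).
    Returns True if some nitrogen is bonded to at least two oxygens.
    """
    neighbours = defaultdict(set)
    for a, b in bonds:  # bonds are undirected
        neighbours[a].add(b)
        neighbours[b].add(a)
    return any(
        element == "N"
        and sum(atoms[n] == "O" for n in neighbours[atom_id]) >= 2
        for atom_id, element in atoms.items()
    )
```

A fixed-length attribute vector would need a pre-defined column for this pattern; the relational query instead expresses it directly in terms of atoms and the bonds between them, which is the property ILP exploits when searching for structural rules.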
