
Showing papers by "Stephen Muggleton" published in 1998


Journal ArticleDOI
TL;DR: This paper presents a case study of a machine-aided knowledge discovery process within the general area of drug design, and the Inductive Logic Programming (ILP) system progol is applied to the problem of identifying potential pharmacophores for ACE inhibition.
Abstract: This paper presents a case study of a machine-aided knowledge discovery process within the general area of drug design. Within drug design, the particular problem of pharmacophore discovery is isolated, and the Inductive Logic Programming (ILP) system progol is applied to the problem of identifying potential pharmacophores for ACE inhibition. The case study reported in this paper supports four general lessons for machine learning and knowledge discovery, as well as more specific lessons for pharmacophore discovery, for Inductive Logic Programming, and for ACE inhibition. The general lessons for machine learning and knowledge discovery are as follows. 1. An initial rediscovery step is a useful tool when approaching a new application domain. 2. General machine learning heuristics may fail to match the details of an application domain, but it may be possible to successfully apply a heuristic-based algorithm in spite of the mismatch. 3. A complete search for all plausible hypotheses can provide useful information to a user, although experimentation may be required to choose between competing hypotheses. 4. A declarative knowledge representation facilitates the development and debugging of background knowledge in collaboration with a domain expert, as well as the communication of final results.

138 citations


Book ChapterDOI
14 Dec 1998
TL;DR: According to the combined accuracy/explanation criterion provided in this paper, on both subsets comparative trials show that Progol's structurally-oriented hypotheses are preferable to those of other machine learning algorithms.
Abstract: Machine Learning algorithms are being increasingly used for knowledge discovery tasks. Approaches can be broadly divided by distinguishing discovery of procedural from that of declarative knowledge. Client requirements determine which of these is appropriate. This paper discusses an experimental application of machine learning in an area related to drug design. The bottleneck here is in finding appropriate constraints to reduce the large number of candidate molecules to be synthesised and tested. Such constraints can be viewed as declarative specifications of the structural elements necessary for high medicinal activity and low toxicity. The first-order representation used within Inductive Logic Programming (ILP) provides an appropriate description language for such constraints. Within this application area knowledge accreditation requires not only a demonstration of predictive accuracy but also, and crucially, a certification of novel insight into the structural chemistry. This paper describes an experiment in which the ILP system Progol was used to obtain structural constraints associated with mutagenicity of molecules. In doing so Progol found a new indicator of mutagenicity within a subset of previously published data. This subset was already known not to be amenable to statistical regression, though its complement was adequately explained by a linear model. According to the combined accuracy/explanation criterion provided in this paper, on both subsets comparative trials show that Progol's structurally-oriented hypotheses are preferable to those of other machine learning algorithms.

42 citations


Book ChapterDOI
22 Jul 1998
TL;DR: The application of ILP to fold recognition represents a novel and promising approach to this problem and makes use of a new feature of Progol4.4 for numeric parameter estimation.
Abstract: Inductive Logic Programming (ILP) has been applied to discover rules governing the three-dimensional topology of protein structure. The data-set unifies two sources of information: SCOP and PROMOTIF. Cross-validation results for experiments using two background knowledge sets, global (attribute-valued) and constitutional (relational), are presented. The application makes use of a new feature of Progol4.4 for numeric parameter estimation. At this early stage of development, the rules produced can only be applied to proteins for which the secondary structure is known. However, since the rules are insightful, they should prove to be helpful in assisting the development of taxonomic schemes. The application of ILP to fold recognition represents a novel and promising approach to this problem.

34 citations


Book ChapterDOI
22 Jul 1998
TL;DR: Three state-of-the-art ILP systems are applied to learn how to detect traffic problems and compare their performance to the performance of a propositional learning system on the same problem.
Abstract: Expert systems for decision support have recently been successfully introduced in road transport management. These systems include knowledge on traffic problem detection and alleviation. The paper describes experiments in automated acquisition of knowledge on traffic problem detection. The task is to detect road sections where a problem has occurred (critical sections) from sensor data. It is necessary to use inductive logic programming (ILP) for this purpose as relational background knowledge on the road network is essential. In this paper, we apply three state-of-the-art ILP systems to learn how to detect traffic problems and compare their performance to the performance of a propositional learning system on the same problem.

29 citations


Book ChapterDOI
22 Jul 1998
TL;DR: A new predicate invention mechanism implemented in Progol4.4 is used in repeat learning experiments within a chess domain and the results indicate that significant performance increases can be achieved.
Abstract: Most of machine learning is concerned with learning a single concept from a sequence of examples. In repeat learning the teacher chooses a series of related concepts randomly and independently from a distribution D. A finite sequence of examples is provided for each concept in the series. The learner does not initially know D, but progressively updates a posterior estimation of D as the series progresses. This paper considers predicate invention within Inductive Logic Programming as a mechanism for updating the learner's estimation of D. A new predicate invention mechanism implemented in Progol4.4 is used in repeat learning experiments within a chess domain. The results indicate that significant performance increases can be achieved. The paper develops a Bayesian framework and demonstrates initial theoretical results for repeat learning.

26 citations


Book ChapterDOI
22 Jul 1998
TL;DR: By enlarging the bottom set used within IE, it is possible to make a revised version of IE complete with respect to entailment for Horn theories.
Abstract: Yamamoto has shown that the Inverse Entailment (IE) mechanism described previously by the author is complete for Plotkin's relative subsumption but incomplete for entailment. That is to say, an hypothesised clause H can be derived from an example E under a background theory B using IE if and only if H subsumes E relative to B in Plotkin's sense. Yamamoto gives examples of H for which B ∪ H ⊨ E but H cannot be constructed using IE from B and E. The main result of the present paper is a theorem to show that by enlarging the bottom set used within IE, it is possible to make a revised version of IE complete with respect to entailment for Horn theories. Furthermore, it is shown for function-free definite clauses that given a bound k on the arity of predicates used in B and E, the cardinality of the enlarged bottom set is bounded above by the polynomial function p(c + 1)^k, where p is the number of predicates in B, E and c is the number of constants in B ⊔ Ē.
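As a rough illustration of how the stated bound grows, the sketch below evaluates it for sample sizes, taking the bound to be p multiplied by (c + 1) raised to the power k. The function name `bottom_set_bound` and the sample values are illustrative assumptions and do not come from the paper.

```python
def bottom_set_bound(p: int, c: int, k: int) -> int:
    """Evaluate the upper bound p * (c + 1) ** k on the enlarged bottom set.

    p: number of predicates, c: number of constants, k: bound on predicate arity.
    (Hypothetical helper for illustration only.)
    """
    if min(p, c, k) < 0:
        raise ValueError("p, c and k must be non-negative")
    return p * (c + 1) ** k


# Hypothetical sizes: 5 predicates, 10 constants, arity at most 3.
print(bottom_set_bound(5, 10, 3))  # 5 * 11**3 = 6655
```

For a fixed arity bound k the expression is polynomial in p and c, which is the sense in which the abstract calls it a polynomial function.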

22 citations


Proceedings Article
01 Jan 1998
TL;DR: This work has shown that ILP approaches to natural language problems extend with relative ease to various languages other than English and the area of Learning Language in Logic (LLL) is producing a number of challenges to existing ILP theory and implementation.
Abstract: Inductive Logic Programming (ILP) [9, 11] is the area of AI which deals with the induction of hypothesised predicate definitions from examples and background knowledge. Logic programs are used as a single representation for examples, background knowledge and hypotheses. ILP is differentiated from most other forms of Machine Learning (ML) both by its use of an expressive representation language and its ability to make use of logically encoded background knowledge. This has allowed successful applications of ILP [1] in areas such as molecular biology [12, 10, 6, 5] and natural language [7, 3, 2] which both have rich sources of background knowledge and both benefit from the use of expressive concept representation languages. For instance, the ILP system Progol has recently been used to generate comprehensible descriptions of the 23 most populated fold classes of proteins [14], where no such descriptions had previously been formulated manually. In the natural language area ILP has not only been shown to have higher accuracies than various other ML approaches in learning the past tense of English [8] but also shown to be capable of learning accurate grammars which translate sentences into deductive database queries [15]. In both cases, follow up studies [13, 4] have shown that these ILP approaches to natural language problems extend with relative ease to various languages other than English. The area of Learning Language in Logic (LLL) is producing a number of challenges to existing ILP theory and implementations. In particular, language applications of ILP require revision and extension of a hierarchically defined set of predicates in which the examples are typically only provided for predicates at the top of the hierarchy. New predicates often need to be invented, and complex recursion is usually involved. Similarly the term structure of semantic objects is far more complex than in other applications of ILP. Advances in ILP theory and implementation related to the challenges of LLL are already producing beneficial advances in other sequence-oriented applications of ILP. In addition LLL is starting to develop its own character as a sub-discipline of AI involving the confluence of computational linguistics, machine learning and logic programming.

15 citations


Book ChapterDOI
22 Jul 1998
TL;DR: The conclusion drawn from the experimental results is that on such a propositional dataset ILP algorithms perform competitively in terms of predictive accuracy with propositional systems, but are significantly outperformed in terms of time taken for learning.
Abstract: This paper presents an experimental comparison of two Inductive Logic Programming algorithms, PROGOL and TILDE, with C4.5, a propositional learning algorithm, on a propositional dataset of road traffic accidents. Rebalancing methods are described for handling the skewed distribution of positive and negative examples in this dataset, and the relative cost of errors of commission and omission in this domain. It is noted that before the use of these methods all algorithms perform worse than majority class. On rebalancing, all did significantly better. The conclusion drawn from the experimental results is that on such a propositional dataset ILP algorithms perform competitively in terms of predictive accuracy with propositional systems, but are significantly outperformed in terms of time taken for learning.

11 citations


Book ChapterDOI
14 Dec 1998
TL;DR: Inductive Logic Programming (ILP) provides an approach to knowledge discovery techniques which generate logical formulae from data which are suitable for knowledge discovery within the pharmaceutical industry.
Abstract: The pharmaceutical industry is increasingly overwhelmed by large-volume-data. This is generated both internally as a side-effect of screening tests and combinatorial chemistry, as well as externally from sources such as the human genome project. The industry is predominantly knowledge-driven. For instance, knowledge is required within computational chemistry for pharmacophore identification, as well as for determining biological function using sequence analysis. From a computer science point of view, the knowledge requirements within the industry give higher emphasis to “knowing that” (declarative or descriptive knowledge) rather than “knowing how” (procedural or prescriptive knowledge). Mathematical logic has always been the preferred representation for declarative knowledge and thus knowledge discovery techniques are required which generate logical formulae from data. Inductive Logic Programming (ILP) [6,1] provides such an approach.

2 citations


Proceedings Article
22 Jul 1998

1 citation


Book ChapterDOI
22 Jul 1998
TL;DR: The development of Bayesian approaches to ILP supported the development of U-learnability, which allows classes of distributions over the hypotheses, and it was shown that for any exponential-decay distribution the class of time-bounded logic-programs is polynomially U-learnable.
Abstract: A strong linkage exists between advances in applications, implementations and theory within Inductive Logic Programming (ILP). Early ILP systems, such as FOIL, Golem and LINUS, learned single predicate definitions from positive and negative examples and extensional background knowledge. They also employed strong learning biases such as ij-determinacy. Although these systems found a number of applications, they had problems in areas such as molecular biology and natural language learning. General mechanisms for inverting entailment have now been developed which support the use of non-ground background knowledge, and the revision of multiple inter-related predicates. ILP theory results concerning complete refinement graph operators now allow efficient admissible searches. The absolute requirement for negative examples (rare within natural language domains) has been eased by Bayesian analysis of learning from positive-only examples. Bayesian approaches have also supported sample complexity analysis of predicate invention within the framework of repeat learning. In this framework it is assumed that the learner's prior is not equivalent to the distribution from which the teacher is sampling targets. By providing a series of sessions the learner is able to update the initial prior by adding and deleting background predicates. Within the Bayesian framework stochastic logic program representations have been used to estimate the distribution of examples over the instance space. Stochastic logic programs are a generalisation of hidden Markov models and stochastic grammars. Apart from a few special cases PAC-learning results have been largely negative for ILP. This is in large part due to the fact that testing satisfiability is intractable for most interesting subsets of first-order Horn logic. The development of Bayesian approaches to ILP supported the development of U-learnability, which allows classes of distributions over the hypotheses. Here it was shown that for any exponential-decay distribution the class of time-bounded logic-programs is polynomially U-learnable. The use of such bounds on proof depth is common within ILP systems. Although logically impure, this approach allows general-purpose flexible representations, while maintaining termination guarantees.

Book ChapterDOI
01 Jan 1998
TL;DR: A new and general approach to forming Structure Activity Relationships (SARs) is described, based on representing chemical structure by atoms and their bond connectivities in combination with the Inductive Logic Programming (ILP) algorithm Progol.
Abstract: A new and general approach to forming Structure Activity Relationships (SARs) is described. This is based on representing chemical structure by atoms and their bond connectivities in combination with the Inductive Logic Programming (ILP) algorithm Progol. Existing SAR methods describe chemical structure using attributes which are general properties of an object. It is not possible to map directly chemical structure to attribute-based descriptions, as such descriptions have no internal organisation. A more natural and general way to describe chemical structure is to use a relational description, where the internal construction of the description maps that of the object described. Our atom and bond connectivities representation is a relational description. ILP algorithms can form SARs with relational descriptions. We have tested the relational approach by investigating the SAR of 230 aromatic and heteroaromatic nitro compounds. These compounds had been split previously into two sub-sets, 188 compounds that were amenable to regression, and 42 that were not. For the 188 compounds, a SAR was found that was as accurate as the best statistical or neural network generated SARs. The Progol SAR has the advantages that it did not need the use of any indicator variables hand-crafted by an expert, and the generated rules were easily comprehensible. For the 42 compounds, Progol formed a SAR that was significantly (P < 0.025) more accurate than linear regression, quadratic regression, and back-propagation. This SAR is based on a new automatically generated structural alert for mutagenicity.
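The contrast drawn above between attribute-based and relational descriptions can be sketched in a few lines. This is a toy illustration only: the `Atom` and `Bond` types, the four-atom fragment, and the query function are hypothetical and are not taken from the paper, its dataset, or Progol's input format.

```python
from typing import List, NamedTuple, Tuple


class Atom(NamedTuple):
    atom_id: str
    element: str


class Bond(NamedTuple):
    a: str
    b: str
    order: int  # 1 = single, 2 = double


# Hypothetical toy fragment: a carbon attached to a nitrogen that carries two oxygens.
atoms = [Atom("a1", "c"), Atom("a2", "n"), Atom("a3", "o"), Atom("a4", "o")]
bonds = [Bond("a1", "a2", 1), Bond("a2", "a3", 2), Bond("a2", "a4", 1)]

# Attribute-based view: global counts only, so connectivity is lost.
attribute_view = {
    "n_atoms": len(atoms),
    "n_oxygen": sum(a.element == "o" for a in atoms),
}


def neighbours(atom_id: str, bonds: List[Bond]) -> List[Tuple[str, int]]:
    """Return (neighbour id, bond order) pairs for one atom."""
    out = []
    for bond in bonds:
        if bond.a == atom_id:
            out.append((bond.b, bond.order))
        elif bond.b == atom_id:
            out.append((bond.a, bond.order))
    return out


def has_nitrogen_with_two_oxygens(atoms: List[Atom], bonds: List[Bond]) -> bool:
    """Relational query: is some nitrogen bonded to at least two oxygens?"""
    element = {a.atom_id: a.element for a in atoms}
    for atom in atoms:
        if atom.element != "n":
            continue
        oxygens = [n for n, _ in neighbours(atom.atom_id, bonds) if element[n] == "o"]
        if len(oxygens) >= 2:
            return True
    return False


print(attribute_view)
print(has_nitrogen_with_two_oxygens(atoms, bonds))
```

The attribute view records only that two oxygens are present, while the relational view can answer which atoms they are attached to, which is the kind of internal organisation the abstract says attribute-based descriptions lack.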