Proceedings Article · Open Access

Finding frequent substructures in chemical compounds

TLDR
This paper presents a knowledge discovery method for structured data in which patterns reflect the one-to-many and many-to-many relationships among several tables. Unlike in most data mining settings for the discovery of frequent patterns, background knowledge, represented uniformly in some of the tables, plays an essential role.
Abstract
The discovery of relationships between chemical structure and biological function is central to biological science and medicine. In this paper we apply data mining to the problem of predicting chemical carcinogenicity, a toxicology application launched at IJCAI'97 as a research challenge for artificial intelligence. Our approach is descriptive rather than classification-based: the goal is to find substructures and properties common to chemical compounds, and in this way to contribute to scientific insight. This contrasts with previous machine learning research on the problem, which has concentrated mainly on predicting the toxicity of unknown chemicals. Our contribution to data mining is the ability to discover useful frequent patterns that are more complex than association rules or their known variants. This capability is vital here, because the problem requires patterns that cannot be reached by simple transformations to frequent itemsets. We present a knowledge discovery method for structured data in which patterns reflect the one-to-many and many-to-many relationships among several tables. Background knowledge, represented uniformly in some of the tables, plays an essential role, unlike in most data mining settings for the discovery of frequent patterns.


Citations

Data Mining: Concepts and Techniques (2nd edition)

TL;DR: There have been many data mining books published in recent years, including Predictive Data Mining by Weiss and Indurkhya [WI98], Data Mining Solutions: Methods and Tools for Solving Real-World Problems by Westphal and Blaxton [WB98], and Mastering Data Mining: The Art and Science of Customer Relationship Management by Berry and Linoff [BL99].
Book Chapter

An Apriori-Based Algorithm for Mining Frequent Substructures from Graph Data

TL;DR: A novel approach named AGM is proposed that efficiently mines association rules among frequently appearing substructures in a given graph data set, using an extension of the basket-analysis algorithm.
Proceedings Article

Efficiently mining frequent trees in a forest

TL;DR: This work presents TreeMiner, a novel algorithm that discovers all frequent subtrees in a forest using a new data structure called the scope-list; experiments show that TreeMiner outperforms the pattern matching approach by a factor of 4 to 20 and has good scale-up properties.
Book

Relational Data Mining

TL;DR: This coherently written multi-author monograph provides a thorough introduction to, and systematic overview of, the area and will become a valuable reference for R&D professionals active in relational data mining.
Proceedings Article

A quickstart in frequent structure mining can make a difference

TL;DR: The GrAph/Sequence/Tree extractiON (Gaston) algorithm is introduced; it implements the "quickstart principle", which exploits the fact that sequences, trees and graphs are contained in one another, so that structure mining algorithms can split the search into steps of increasing complexity.