Finding frequent substructures in chemical compounds

Open Access, Proceedings Article

- pp 30-36

TLDR

This paper presents a knowledge discovery method for structured data in which patterns reflect the one-to-many and many-to-many relationships among several tables. Background knowledge, represented in a uniform manner in some of the tables, plays an essential role, unlike in most data mining settings for the discovery of frequent patterns.

Abstract:

The discovery of the relationships between chemical structure and biological function is central to biological science and medicine. In this paper we apply data mining to the problem of predicting chemical carcinogenicity. This toxicology application was launched at IJCAI'97 as a research challenge for artificial intelligence. Our approach to the problem is descriptive rather than based on classification; the goal being to find common substructures and properties in chemical compounds, and in this way to contribute to scientific insight. This approach contrasts with previous machine learning research on this problem, which has mainly concentrated on predicting the toxicity of unknown chemicals. Our contribution to the field of data mining is the ability to discover useful frequent patterns that are beyond the complexity of association rules or their known variants. This is vital to the problem, which requires the discovery of patterns that are out of the reach of simple transformations to frequent itemsets. We present a knowledge discovery method for structured data, where patterns reflect the one-to-many and many-to-many relationships of several tables. Background knowledge, represented in a uniform manner in some of the tables, has an essential role here, unlike in most data mining settings for the discovery of frequent patterns.
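To make the idea of a frequent pattern spanning several related tables concrete, here is a minimal Python sketch. It is not the method of the paper: the two toy relations (`ATOMS`, `BONDS`), their contents, and the function names are hypothetical, and the sketch assumes that a pattern's frequency is the fraction of compounds in which it matches at least once. The only pattern shape considered is "an atom of one element bonded to an atom of another", which already requires joining the one-to-many compound-to-atom and compound-to-bond relations.

```python
# Hypothetical toy relations, invented for illustration (not data from the paper).
# ATOMS rows: (compound, atom_id, element)
# BONDS rows: (compound, atom_id_1, atom_id_2, bond_type)
ATOMS = [
    ("c1", "a1", "c"), ("c1", "a2", "cl"), ("c1", "a3", "c"),
    ("c2", "a1", "c"), ("c2", "a2", "cl"),
    ("c3", "a1", "c"), ("c3", "a2", "o"),
]
BONDS = [
    ("c1", "a1", "a2", 1), ("c1", "a1", "a3", 2),
    ("c2", "a1", "a2", 1),
    ("c3", "a1", "a2", 2),
]


def compounds_matching(elem_a, elem_b, atoms=ATOMS, bonds=BONDS):
    """Return the compounds that contain an elem_a atom bonded to an elem_b atom."""
    # Join key (compound, atom_id) -> element, built from the atom relation.
    element = {(c, a): e for c, a, e in atoms}
    hits = set()
    for c, x, y, _bond_type in bonds:
        pair = (element[(c, x)], element[(c, y)])
        # Bonds are undirected, so accept either orientation.
        if pair in ((elem_a, elem_b), (elem_b, elem_a)):
            hits.add(c)
    return hits


def frequency(elem_a, elem_b, atoms=ATOMS, bonds=BONDS):
    """Fraction of compounds in which the pattern matches at least once (assumed measure)."""
    compounds = {c for c, _, _ in atoms}
    return len(compounds_matching(elem_a, elem_b, atoms, bonds)) / len(compounds)


def frequent_bond_patterns(min_freq, atoms=ATOMS, bonds=BONDS):
    """Enumerate unordered element pairs and keep those meeting the frequency threshold."""
    elements = sorted({e for _, _, e in atoms})
    result = {}
    for i, a in enumerate(elements):
        for b in elements[i:]:
            f = frequency(a, b, atoms, bonds)
            if f >= min_freq:
                result[(a, b)] = f
    return result


if __name__ == "__main__":
    print(frequent_bond_patterns(0.5))
```

Counting each compound once, however many times the pattern matches inside it, is what distinguishes this from counting rows of a single flattened table: a compound with many carbon-chlorine bonds contributes no more than a compound with one.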
