
Showing papers on "Chemical database" published in 2000


Journal ArticleDOI
TL;DR: The electrotopological state (E-state) is presented as a representation of molecular structure useful for definition of a space for chemical structures that provides the basis for chemical database management.
Abstract: The electrotopological state (E-state) is presented as a representation of molecular structure useful for definition of a space for chemical structures. This E-state representation provides the basis for chemical database management. The E-state formalism is presented along with its extension to the atom-type E-state. An approach to database organization, using polychlorobiphenyls (PCBs) as examples, reveals the descriptive power of the E-state paradigm. A well-organized chemical database, as described here, may be searched to find structures similar to a target structure with the expectation that such structures may exhibit properties similar to the target. Searches using the atom-type E-state indices are demonstrated with two example drug molecules.

78 citations


Journal ArticleDOI
TL;DR: This article reviews measures for evaluating the effectiveness of similarity searches in chemical databases and concludes that the cumulative recall and G-H score measures are the most useful of those tested.
Abstract: This article reviews measures for evaluating the effectiveness of similarity searches in chemical databases, drawing principally upon the many measures that have been described previously for evaluating the performance of text search engines. The use of the various measures is exemplified by fragment-based 2D similarity searches on several databases for which both structural and bioactivity data are available. It is concluded that the cumulative recall and G-H score measures are the most useful of those tested.

65 citations
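The evaluation idea in the entry above can be illustrated with a minimal sketch. It assumes that fragment-based similarity is scored with the Tanimoto coefficient on sets of fragment codes and that cumulative recall means the fraction of all known actives retrieved within the top k positions of the ranking; neither definition is stated in the abstract, and the compound identifiers, fragment codes, and activity labels below are hypothetical toy data. The G-H score is not implemented here.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient between two fragment sets (assumed similarity measure)."""
    if not a and not b:
        return 0.0
    common = len(a & b)
    return common / (len(a) + len(b) - common)


def cumulative_recall(ranked_ids, active_ids, cutoffs):
    """For each cutoff k, the fraction of all actives found in the top k of the ranking."""
    actives = set(active_ids)
    result = {}
    for k in cutoffs:
        found = sum(1 for cid in ranked_ids[:k] if cid in actives)
        result[k] = found / len(actives)
    return result


# Hypothetical toy database: compound id -> set of fragment codes.
database = {
    "c1": frozenset({"C=O", "c1ccccc1", "N"}),
    "c2": frozenset({"C=O", "c1ccccc1", "O"}),
    "c3": frozenset({"Cl", "S"}),
    "c4": frozenset({"C=O", "N"}),
}
actives = {"c1", "c4"}  # hypothetical bioactivity labels
query = frozenset({"C=O", "c1ccccc1", "N"})

# Rank the database by decreasing similarity to the query.
ranking = sorted(database, key=lambda cid: tanimoto(query, database[cid]), reverse=True)
recall = cumulative_recall(ranking, actives, cutoffs=[1, 2, 4])
```

Reporting recall at several cutoffs rather than one shows how quickly the actives accumulate as more of the ranked list is inspected.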


Book ChapterDOI
TL;DR: Ivanciuc and Balaban define molecular graphs, introduce the most important molecular graph matrices, polynomials, and spectra with examples, and review the topological indices obtained by applying mathematical operations to molecular graphs.
Abstract: O. Ivanciuc and A.T. Balaban, University “Politehnica” of Bucharest, Department of Organic Chemistry, Faculty of Industrial Chemistry, Oficiul 12 CP 243, 78100 Bucharest, Romania. Usually, the chemical structure of organic compounds is represented by molecular graphs. Graph algorithms are currently used in isomer generation, coding of chemical compounds and reactions, chemical database search, similarity and diversity assessment. An important graph theory application is the numerical characterization of chemical structures with graph invariants, that can be polynomials, spectra, atomic or molecular topological indices. After a brief presentation of graph theory and chemical graphs, the molecular graphs are defined. The most important molecular graph matrices, polynomials, and spectra are introduced, and some examples are presented. A large variety of mathematical operations are applied to molecular graphs giving atomic and molecular structural descriptors; the most important topological indices are reviewed, and their applications in structure-property models are briefly presented.

53 citations
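As one concrete instance of a topological index derived from a molecular graph matrix, the sketch below computes the Wiener index, the sum of shortest-path distances over all pairs of atoms in the hydrogen-suppressed graph. The choice of this particular index is an assumption made for illustration, since the abstract does not name the indices it reviews. The adjacency-list encoding and function names are hypothetical, and the graph is assumed to be connected.

```python
from collections import deque


def distance_matrix(adjacency):
    """All-pairs topological distances, found by breadth-first search from each atom."""
    n = len(adjacency)
    dist = [[0] * n for _ in range(n)]
    for start in range(n):
        seen = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        for v, d in seen.items():
            dist[start][v] = d
    return dist


def wiener_index(adjacency):
    """Sum of distances over all unordered atom pairs (connected graph assumed)."""
    dist = distance_matrix(adjacency)
    n = len(dist)
    return sum(dist[i][j] for i in range(n) for j in range(i + 1, n))


# Carbon skeletons as adjacency lists, atoms numbered from 0.
n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}

w_linear = wiener_index(n_butane)     # 10
w_branched = wiener_index(isobutane)  # 9
```

The branched isomer gets the smaller value because its atoms are closer together on average, which is the kind of structural information such indices encode.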


Journal ArticleDOI
TL;DR: The procedures applied for MMDDI and IMDDI calculations allow one to automatically compile lists of compounds, which can simplify molecular diversity analyses and database searching.
Abstract: A new mutual molecular dataset diversity index (MMDDI), individual molecular dataset diversity index (IMDDI), and volume ratio (VR) are proposed to assess molecular dataset diversity. MMDDI and IMDDI can serve as valuable instruments for selecting monomer pools for combinatorial synthesis and in decision making about acquiring new databases. MMDDI can also be used as one of the criteria to estimate the quality of quantitative structure−activity relationship (QSAR) models aimed at the prediction of biological activities. The indices can be calculated directly from molecular descriptor values. The procedures applied for MMDDI and IMDDI calculations allow one to automatically compile lists of compounds, which can simplify molecular diversity analyses and database searching. The information can also be used for forming training and test sets in QSAR analysis.

35 citations


Journal ArticleDOI
TL;DR: A rapid algorithm for visualizing large chemical databases in a low-dimensional space (2D or 3D) is presented as a first step in database analysis and design applications and it is shown that the SVD/TNPACK duo is efficient for minimizing the distance error objective function.
Abstract: A rapid algorithm for visualizing large chemical databases in a low-dimensional space (2D or 3D) is presented as a first step in database analysis and design applications. The projection mapping of the compound database (described as vectors in the high-dimensional space of chemical descriptors) is based on the singular value decomposition (SVD) combined with a minimization procedure implemented with the efficient truncated-Newton program package (TNPACK). Numerical experiments on four chemical datasets with real-valued descriptors (ranging from 58 to 27 255 compounds) show that the SVD/TNPACK projection duo achieves a reasonable accuracy in 2D, varying from 30% to about 100% of pairwise distance segments that lie within 10% of the original distances. The lowest percentages, corresponding to scaled datasets, can be made close to 100% with projections onto a 10-dimensional space. We also show that the SVD/TNPACK duo is efficient for minimizing the distance error objective function (especially for scaled datasets), and that TNPACK is much more efficient than a current popular approach of steepest descent minimization in this application context. Applications of our projection technique to similarity and diversity sampling in drug design can be envisioned.

23 citations
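The accuracy figure quoted in the entry above (the share of pairwise distances that stay within 10% of their original values after projection) can be sketched as follows. This covers only the evaluation step: the SVD projection and the TNPACK minimization are not implemented, and a naive projection that drops the third coordinate stands in for them. The function name and the sample points are hypothetical.

```python
import math
from itertools import combinations


def fraction_preserved(original, projected, tolerance=0.10):
    """Share of point pairs whose projected distance is within `tolerance`
    (relative) of the original distance. Pairs with zero original distance
    are counted as not preserved."""
    pairs = list(combinations(range(len(original)), 2))
    kept = 0
    for i, j in pairs:
        d_high = math.dist(original[i], original[j])
        d_low = math.dist(projected[i], projected[j])
        if d_high > 0 and abs(d_low - d_high) <= tolerance * d_high:
            kept += 1
    return kept / len(pairs)


# Hypothetical descriptor vectors in 3D.
points = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 0.0, 0.1), (0.0, 4.0, 5.0)]

# Stand-in projection (an assumption, not SVD/TNPACK): keep the first two coordinates.
projected = [p[:2] for p in points]

score = fraction_preserved(points, projected)  # 0.5: three of six pairs are preserved
```

Here the three pairs involving the last point are distorted, because most of its separation from the others lies along the dropped coordinate.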



Journal ArticleDOI
TL;DR: Kohonen neural networks, also known as self-organizing maps (SOMs), offer a useful 2D representation of the compound distribution inside a large chemical database, and fuzzy techniques based on the "concept of partial truth" also prove to be a valuable tool for the direct exploitation of chemical databases or SOMs.
Abstract: Kohonen neural networks, also known as self-organizing maps (SOMs), offer a useful 2D representation of the compound distribution inside a large chemical database. This distribution results from the compound organization in a molecular diversity hyperspace derived from a large set of molecular descriptors. Fuzzy techniques based on the "concept of partial truth" also prove to be a valuable tool for the direct exploitation of chemical databases or SOMs. In such cases a fuzzy clustering algorithm is used. In this paper, a complete hybrid system, combining SOM and fuzzy clustering, is applied. As an example, a series of olfactory compounds was selected. The complexity of such information lies in the fact that the same compound may exhibit different odors. It is shown how fuzzy logic helps to give a better understanding of the organization of the compounds. These hybrid systems, using SOM and fuzzy clustering simultaneously, are foreseen as powerful tools for "virtual pre-screening".

15 citations
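The "partial truth" idea in the entry above, where one compound can belong to several odor classes at once, can be sketched with graded cluster memberships. The example assumes a fuzzy c-means style membership formula with fuzzifier m; the abstract does not say which fuzzy clustering algorithm was used, and the cluster centers and the query point are hypothetical. The SOM stage is not shown.

```python
import math


def fuzzy_memberships(point, centers, m=2.0):
    """Degree of membership of `point` in each cluster, summing to 1.
    Uses the fuzzy c-means membership formula (an assumed choice) with fuzzifier m > 1."""
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on one or more centers belongs fully to those centers.
    if any(d == 0 for d in dists):
        hits = [1.0 if d == 0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_k / d_j) ** exponent for d_j in dists) for d_k in dists]


# Two hypothetical cluster centers in a 2D map and one compound placed between them.
centers = [(0.0, 0.0), (4.0, 0.0)]
memberships = fuzzy_memberships((1.0, 0.0), centers)  # approximately [0.9, 0.1]
```

Unlike a hard assignment to the nearest center, the compound keeps a small but nonzero membership in the second cluster.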


Journal ArticleDOI
TL;DR: A new mutual molecular dataset diversity index (MMDDI), an individual molecular dataset diversity index (IMDDI), and a volume ratio (VR) are proposed to assess molecular dataset diversity; MMDDI and IMDDI can serve as valuable instruments for selecting monomer pools for combinatorial synthesis and in decision making about acquiring new databases.
Abstract: A new mutual molecular dataset diversity index (MMDDI), individual molecular dataset diversity index (IMDDI), and volume ratio (VR) are proposed to assess molecular dataset diversity. MMDDI and IMDDI can serve as valuable instruments for selecting monomer pools for combinatorial synthesis and in decision making about acquiring new databases. MMDDI can also be used as one of the criteria to estimate the quality of quantitative structure−activity relationship (QSAR) models aimed at the prediction of biological activities. The indices can be calculated directly from molecular descriptor values. The procedures applied for MMDDI and IMDDI calculations allow one to automatically compile lists of compounds, which can simplify molecular diversity analyses and database searching. The information can also be used for forming training and test sets in QSAR analysis.

5 citations


Journal ArticleDOI
TL;DR: The electrotopological state (E-state) is presented as a representation of molecular structure useful for defining a space for chemical structures, which provides the basis for chemical database management.
Abstract: The electrotopological state (E-state) is presented as a representation of molecular structure useful for definition of a space for chemical structures. This E-state representation provides the basis for chemical database management. The E-state formalism is presented along with its extension to the atom-type E-state. An approach to database organization, using polychlorobiphenyls (PCBs) as examples, reveals the descriptive power of the E-state paradigm. A well-organized chemical database, as described here, may be searched to find structures similar to a target structure with the expectation that such structures may exhibit properties similar to the target. Searches using the atom-type E-state indices are demonstrated with two example drug molecules.

2 citations


Journal ArticleDOI
TL;DR: For the conversion of nonstructural chemical databases to structure databases, a series of algorithms is described for finding the closest match between existing names and names in a reference database.
Abstract: For the conversion of nonstructural chemical databases to structure databases, a series of algorithms to find the closest match between existing names and names in a reference database are described...

1 citation