Showing papers on "Chemical database" published in 2000
••
TL;DR: The electrotopological state (E-state) is presented as a representation of molecular structure useful for definition of a space for chemical structures that provides the basis for chemical database management.
Abstract: The electrotopological state (E-state) is presented as a representation of molecular structure useful for definition of a space for chemical structures. This E-state representation provides the basis for chemical database management. The E-state formalism is presented along with its extension to the atom-type E-state. An approach to database organization, using polychlorobiphenyls (PCBs) as examples, reveals the descriptive power of the E-state paradigm. A well-organized chemical database, as described here, may be searched to find structures similar to a target structure with the expectation that such structures may exhibit properties similar to the target. Searches using the atom-type E-state indices are demonstrated with two example drug molecules.
78 citations
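The similarity search described in this abstract can be sketched as a nearest-neighbour ranking in a descriptor space. The sketch below assumes each structure is already encoded as a fixed-length vector of atom-type E-state indices and that Euclidean distance is the similarity measure. The function name `rank_by_similarity`, the distance choice, and the toy vectors are illustrative assumptions, not details taken from the paper.

```python
import math


def rank_by_similarity(target, database):
    """Return database keys sorted from most to least similar to target.

    target: sequence of descriptor values (assumed atom-type E-state indices).
    database: dict mapping a structure name to a descriptor vector of equal length.
    """
    def distance(a, b):
        # Euclidean distance is an assumption; other metrics would also work.
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    return sorted(database, key=lambda name: distance(target, database[name]))


# Placeholder vectors for illustration only; these are not real E-state values.
toy_db = {"A": [0.0, 0.0], "B": [1.0, 0.0], "C": [5.0, 5.0]}
print(rank_by_similarity([0.9, 0.1], toy_db))
```

Structures at the front of the returned list are the ones expected, under the similar-property assumption stated in the abstract, to behave most like the target.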
••
TL;DR: This article reviews measures for evaluating the effectiveness of similarity searches in chemical databases and concludes that the cumulative recall and G-H score measures are the most useful of those tested.
Abstract: This article reviews measures for evaluating the effectiveness of similarity searches in chemical databases, drawing principally upon the many measures that have been described previously for evaluating the performance of text search engines. The use of the various measures is exemplified by fragment-based 2D similarity searches on several databases for which both structural and bioactivity data are available. It is concluded that the cumulative recall and G-H score measures are the most useful of those tested.
65 citations
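To make the two recommended measures concrete, here is a minimal sketch of how they might be computed from a ranked hit list. Cumulative recall is the fraction of all known actives retrieved up to each rank. The abstract does not give a formula for the G-H score; the sketch assumes it is a weighted mean of precision and recall with equal default weights, so `gh_score` and its `alpha`/`beta` parameters should be read as assumptions rather than the paper's definition.

```python
def cumulative_recall(ranked_is_active, total_actives):
    """Recall after each position of a ranked hit list.

    ranked_is_active: list of booleans, best-ranked compound first.
    total_actives: number of actives in the whole database.
    """
    found = 0
    curve = []
    for is_active in ranked_is_active:
        if is_active:
            found += 1
        curve.append(found / total_actives)
    return curve


def precision_recall_at(ranked_is_active, total_actives, cutoff):
    """Precision and recall for the top `cutoff` compounds (cutoff >= 1)."""
    top = ranked_is_active[:cutoff]
    hits = sum(1 for flag in top if flag)
    return hits / len(top), hits / total_actives


def gh_score(precision, recall, alpha=0.5, beta=0.5):
    # ASSUMPTION: weighted mean of precision and recall; not taken from the abstract.
    return alpha * precision + beta * recall


ranking = [True, False, True, False, False]  # toy ranking, two actives in total
print(cumulative_recall(ranking, total_actives=2))
p, r = precision_recall_at(ranking, total_actives=2, cutoff=3)
print(gh_score(p, r))
```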
••
TL;DR: This paper by Ivanciuc and Balaban defines molecular graphs, introduces the most important molecular graph matrices, polynomials, and spectra with examples, and reviews the atomic and molecular structural descriptors obtained by applying mathematical operations to molecular graphs.
Abstract: O. Ivanciuc and A.T. Balaban, University “Politehnica” of Bucharest, Department of Organic Chemistry, Faculty of Industrial Chemistry, Oficiul 12 CP 243, 78100 Bucharest, Romania.
Usually, the chemical structure of organic compounds is represented by molecular graphs. Graph algorithms are currently used in isomer generation, coding of chemical compounds and reactions, chemical database search, and similarity and diversity assessment. An important graph theory application is the numerical characterization of chemical structures with graph invariants, which can be polynomials, spectra, or atomic or molecular topological indices. After a brief presentation of graph theory and chemical graphs, the molecular graphs are defined. The most important molecular graph matrices, polynomials, and spectra are introduced, and some examples are presented. A large variety of mathematical operations are applied to molecular graphs giving atomic and molecular structural descriptors; the most important topological indices are reviewed, and their applications in structure-property models are briefly presented.
53 citations
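As an illustration of a topological index derived from a molecular graph, the sketch below computes the Wiener index: the sum of shortest-path distances over all pairs of atoms in the hydrogen-suppressed graph. The Wiener index is chosen here only as a familiar example; the abstract does not name the specific indices it reviews, and the adjacency-dict representation is an assumption.

```python
from collections import deque


def wiener_index(adjacency):
    """Sum of shortest-path distances over all unordered vertex pairs.

    adjacency: dict mapping each vertex to a list of its neighbours
    (a connected, hydrogen-suppressed molecular graph is assumed).
    """
    total = 0
    for source in adjacency:
        # Breadth-first search gives topological distances from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in adjacency[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(dist.values())
    # Each pair was counted twice, once from each endpoint.
    return total // 2


n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}      # C-C-C-C chain
isobutane = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}     # central carbon with three methyls
print(wiener_index(n_butane), wiener_index(isobutane))  # 10 9
```

The branched isomer gets the smaller value, which is the kind of structural discrimination that makes such invariants usable in structure-property models.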
••
TL;DR: The procedures applied for MMDDI and IMDDI calculations allow one to automatically compile lists of compounds, which can simplify molecular diversity analyses and database searching.
Abstract: A new mutual molecular dataset diversity index (MMDDI), individual molecular dataset diversity index (IMDDI), and volume ratio (VR) are proposed to assess molecular dataset diversity. MMDDI and IMDDI can serve as valuable instruments for selecting monomer pools for combinatorial synthesis and in decision making about acquiring new databases. MMDDI can also be used as one of the criteria to estimate the quality of quantitative structure−activity relationship (QSAR) models aimed at the prediction of biological activities. The indices can be calculated directly from molecular descriptor values. The procedures applied for MMDDI and IMDDI calculations allow one to automatically compile lists of compounds, which can simplify molecular diversity analyses and database searching. The information can also be used for forming training and test sets in QSAR analysis.
35 citations
••
TL;DR: A rapid algorithm for visualizing large chemical databases in a low-dimensional space (2D or 3D) is presented as a first step in database analysis and design applications and it is shown that the SVD/TNPACK duo is efficient for minimizing the distance error objective function.
Abstract: A rapid algorithm for visualizing large chemical databases in a low-dimensional space (2D or 3D) is presented as a first step in database analysis and design applications. The projection mapping of the compound database (described as vectors in the high-dimensional space of chemical descriptors) is based on the singular value decomposition (SVD) combined with a minimization procedure implemented with the efficient truncated-Newton program package (TNPACK). Numerical experiments on four chemical datasets with real-valued descriptors (ranging from 58 to 27 255 compounds) show that the SVD/TNPACK projection duo achieves a reasonable accuracy in 2D, varying from 30% to about 100% of pairwise distance segments that lie within 10% of the original distances. The lowest percentages, corresponding to scaled datasets, can be made close to 100% with projections onto a 10-dimensional space. We also show that the SVD/TNPACK duo is efficient for minimizing the distance error objective function (especially for scaled datasets), and that TNPACK is much more efficient than a current popular approach of steepest descent minimization in this application context. Applications of our projection technique to similarity and diversity sampling in drug design can be envisioned.
23 citations
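The projection step and the accuracy measure quoted in this abstract can be sketched with NumPy. The code covers only the SVD projection and the fraction of pairwise distances that fall within 10% of their original values; the TNPACK refinement is not reproduced. The function names and the centring of the data before the SVD are assumptions made for this sketch.

```python
import numpy as np


def svd_project(X, k=2):
    """Project the rows of X onto the top-k right singular vectors of the centred data."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T


def pairwise_distances(X):
    """Euclidean distances for all unordered pairs of rows, as a flat array."""
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    return D[np.triu_indices(len(X), k=1)]


def fraction_within(X, Y, tol=0.10):
    """Fraction of pairwise distances in Y within `tol` (relative) of those in X."""
    d_orig = pairwise_distances(X)
    d_proj = pairwise_distances(Y)
    mask = d_orig > 0  # skip coincident points to avoid division by zero
    rel_err = np.abs(d_proj[mask] - d_orig[mask]) / d_orig[mask]
    return float((rel_err <= tol).mean())


# Synthetic descriptors for illustration only: 50 points in 8 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))
Y = svd_project(X, k=2)
print(fraction_within(X, Y))
```

For data that genuinely fills many dimensions the SVD-only fraction will typically be low, which is the gap the paper's minimization step is meant to narrow.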
••
TL;DR: Kohonen neural networks, also known as Self-Organizing Maps (SOM), offer a useful 2D representation of the compound distribution inside a large chemical database, and fuzzy techniques based on the "concept of partial truth" also prove to be valuable tools for the direct exploitation of chemical databases or SOM.
Abstract: Kohonen neural networks, also known as Self-Organizing Maps (SOM), offer a useful 2D representation of the compound distribution inside a large chemical database. This distribution results from the compound organization in a molecular diversity hyperspace derived from a large set of molecular descriptors. Fuzzy techniques based on the "concept of partial truth" also prove to be valuable tools for the direct exploitation of chemical databases or SOM. In such cases a fuzzy clustering algorithm is used. In this paper, a complete hybrid system, combining SOM and fuzzy clustering, is applied. As an example, a series of olfactory compounds was selected. The complexity of such information is that the same compound may exhibit different odors. It is shown how fuzzy logic helps to give a better understanding of the organization of the compounds. These hybrid systems, using SOM and fuzzy clustering simultaneously, are foreseen as powerful tools for "virtual pre-screening".
15 citations
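The idea that one compound can belong partly to several odor classes can be illustrated with graded cluster memberships. The abstract does not say which fuzzy clustering algorithm was used; the sketch below assumes the standard fuzzy c-means membership formula with fuzzifier `m`, and computes memberships for a single point given fixed cluster centers. The function name and the toy coordinates are hypothetical.

```python
import math


def fuzzy_memberships(point, centers, m=2.0):
    """Fuzzy c-means style memberships of `point` in each cluster (they sum to 1).

    ASSUMPTION: u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)), with m > 1.
    """
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on one or more centers belongs fully to them.
    on_center = [i for i, d in enumerate(dists) if d == 0.0]
    if on_center:
        return [1.0 / len(on_center) if i in on_center else 0.0
                for i in range(len(centers))]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]


# Toy 2D descriptor space with two cluster centers (illustrative values only).
centers = [[0.0, 0.0], [2.0, 0.0]]
print(fuzzy_memberships([0.5, 0.0], centers))
print(fuzzy_memberships([1.0, 0.0], centers))
```

A compound midway between two centers receives equal membership in both, rather than being forced into a single class as in hard clustering.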
••
TL;DR: In this paper, a new mutual molecular dataset diversity index (MMDDI), individual molecular dataset diversity index (IMDDI), and volume ratio (VR) are proposed to assess molecular dataset diversity; MMDDI and IMDDI can serve as valuable instruments for selecting monomer pools for combinatorial synthesis and in decision making about acquiring new databases.
Abstract: A new mutual molecular dataset diversity index (MMDDI), individual molecular dataset diversity index (IMDDI), and volume ratio (VR) are proposed to assess molecular dataset diversity. MMDDI and IMDDI can serve as valuable instruments for selecting monomer pools for combinatorial synthesis and in decision making about acquiring new databases. MMDDI can also be used as one of the criteria to estimate the quality of quantitative structure−activity relationship (QSAR) models aimed at the prediction of biological activities. The indices can be calculated directly from molecular descriptor values. The procedures applied for MMDDI and IMDDI calculations allow one to automatically compile lists of compounds, which can simplify molecular diversity analyses and database searching. The information can also be used for forming training and test sets in QSAR analysis.
5 citations
••
TL;DR: The electrotopological state (E-state) is presented as a representation of molecular structure that defines a space for chemical structures and provides the basis for chemical database management.
Abstract: The electrotopological state (E-state) is presented as a representation of molecular structure useful for definition of a space for chemical structures. This E-state representation provides the basis for chemical database management. The E-state formalism is presented along with its extension to the atom-type E-state. An approach to database organization, using polychlorobiphenyls (PCBs) as examples, reveals the descriptive power of the E-state paradigm. A well-organized chemical database, as described here, may be searched to find structures similar to a target structure with the expectation that such structures may exhibit properties similar to the target. Searches using the atom-type E-state indices are demonstrated with two example drug molecules.
2 citations
••
TL;DR: For the conversion of nonstructural chemical databases to structure databases, a series of algorithms to find the closest match between existing names and names in a reference database are described.
Abstract: For the conversion of nonstructural chemical databases to structure databases, a series of algorithms to find the closest match between existing names and names in a reference database are described...
1 citation
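Closest-match lookup of a chemical name against a reference list can be sketched with Python's standard `difflib` module. This is a generic string-similarity stand-in, not the algorithms of the paper, which the truncated abstract does not describe. The normalization step, the `cutoff` default, and the function names are assumptions.

```python
import difflib


def normalize(name):
    """Lower-case and collapse whitespace so trivial differences do not block a match."""
    return " ".join(name.lower().split())


def closest_name(query, reference_names, cutoff=0.6):
    """Return the reference name most similar to `query`, or None if none passes `cutoff`."""
    lookup = {normalize(n): n for n in reference_names}
    matches = difflib.get_close_matches(normalize(query), list(lookup), n=1, cutoff=cutoff)
    return lookup[matches[0]] if matches else None


reference = ["Acetylsalicylic acid", "Acetic acid", "Salicylic acid"]
print(closest_name("Acetylsalicilic  Acid", reference))
```

A misspelled or inconsistently formatted name is mapped to its nearest reference entry, whose structure record could then be attached to the nonstructural database row.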