Showing papers in "SAR and QSAR in Environmental Research" in 2000


Journal ArticleDOI
TL;DR: Several QSPR and QSAR applications are reviewed, including the study of physical properties of organic compounds, diamagnetic susceptibilities, and biological properties, as well as applications of the TOSS-MODE approach to the discrimination of active/inactive compounds.
Abstract: A recently introduced graph-theoretical approach to the study of structure-property-activity relationships is presented. The theoretical approach and the computational strategy for the use of the TOSS-MODE approach are given in detail. Several QSPR and QSAR applications are reviewed, including the study of physical properties of organic compounds, diamagnetic susceptibilities, and biological properties. The applications of the TOSS-MODE approach to the discrimination of active/inactive compounds, the virtual screening of compounds with a desired property from databases of chemical structures, the identification of active/inactive fragments and their relationships with 2D/3D pharmacophores, and the design of novel compounds with desired biological activities are also reviewed.

66 citations


Journal ArticleDOI
TL;DR: Partial least squares (PLS) is introduced as a novel robust method to replace classical methods such as multiple linear regression (MLR) and advantages of PLS compared to MLR are illustrated with typical applications.
Abstract: Quantitative structure-activity relationship (QSAR) studies based on chemometric techniques are reviewed. Partial least squares (PLS) is introduced as a novel robust method to replace classical methods such as multiple linear regression (MLR). Advantages of PLS compared to MLR are illustrated with typical applications. The genetic algorithm (GA) is a novel optimization technique which can be used as a search engine in variable selection. A novel hybrid approach comprising GA and PLS for variable selection developed in our group (GAPLS) is described. A more advanced method for comparative molecular field analysis (CoMFA) modeling, called GA-based region selection (GARGS), is described as well. Applications of GAPLS and GARGS to QSAR and 3D-QSAR problems are shown with some representative examples. GA can be hybridized with nonlinear modeling methods such as artificial neural networks (ANN) to provide useful tools in chemometrics and QSAR.

45 citations


Journal ArticleDOI
TL;DR: QSARs based upon the logarithm of the octanol-water partition coefficient, logP, and the energy of the lowest unoccupied molecular orbital, ELUMO, were developed to model the toxicity of aliphatic compounds to the marine bacterium Vibrio fischeri; differences between the models suggest that different electrophilic mechanisms occur between classes, as well as within the diones and haloesters.
Abstract: QSARs based upon the logarithm of the octanol-water partition coefficient, logP, and the energy of the lowest unoccupied molecular orbital, ELUMO, were developed to model the toxicity of aliphatic compounds to the marine bacterium Vibrio fischeri. Statistically robust, hydrophobic-dependent QSARs were found for chloroalcohols and haloacetonitriles. Modelling of the toxicity of the haloesters and the diones required the use of terms to describe both hydrophobicity and electrophilicity. The differences in intercepts, slopes, and fit of these models suggest different electrophilic mechanisms occur between classes, as well as within the diones and haloesters. In order to model globally the toxicity of aliphatic compounds to V. fischeri, all the data determined in this study were combined with those determined previously for alkanones, alkanals, and alkenals. A highly predictive two-parameter QSAR [pT15 = 0.760(log P) − 0.625(ELUMO) − 0.466; n = 63, s = 0.462, r² = 0.846, F = 171, Pr > F = 0.0001] was deve...

41 citations
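
The two-parameter QSAR quoted in the abstract above can be evaluated directly. The following is a minimal Python sketch that uses only the coefficients given in the abstract. The function name `predict_pt15` and the example descriptor values are hypothetical, and the abstract does not state the units of ELUMO, so none are assumed here.

```python
def predict_pt15(log_p: float, e_lumo: float) -> float:
    """Evaluate pT15 = 0.760*logP - 0.625*ELUMO - 0.466.

    The three coefficients are taken from the abstract; everything else
    (names, argument order) is an illustrative assumption.
    """
    return 0.760 * log_p - 0.625 * e_lumo - 0.466


# Hypothetical descriptor values, not data from the paper.
example = predict_pt15(log_p=2.0, e_lumo=-1.0)
print(round(example, 3))
```

In this form a larger logP or a lower (more negative) ELUMO both raise the predicted pT15, which is how the signs of the two coefficients read.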


Journal ArticleDOI
TL;DR: A three-layer feedforward neural network trained by the back-propagation algorithm was used as the statistical engine for deriving a powerful QSAR model accounting for the weight of the fish, time of exposure, temperature, pH, and hardness.
Abstract: A Quantitative Structure-Activity Relationship (QSAR) model was derived for estimating the acute toxicity of pesticides against Oncorhynchus mykiss under varying experimental conditions. Chemicals were described by means of autocorrelation descriptors encoding the lipophilicity (H0 to H5), the H-bonding acceptor ability (HBA0) and the H-bonding donor ability (HBD0) of the pesticides. A three-layer feedforward neural network trained by the back-propagation algorithm was used as the statistical engine for deriving a powerful QSAR model accounting for the weight of the fish, time of exposure, temperature, pH, and hardness.

34 citations


Journal ArticleDOI
TL;DR: A self-organising multilayered iterative algorithm is presented that provides linear and non-linear polynomial regression models, allowing the user to control the number and the power of the terms in the models.
Abstract: This article presents a self-organising multilayered iterative algorithm that provides linear and non-linear polynomial regression models, thus allowing the user to control the number and the power of the terms in the models. The accuracy of the algorithm is compared to that of the partial least squares (PLS) algorithm using fourteen data sets in quantitative structure-activity relationship studies. The calculated data show that the proposed method is able to select simple models characterized by a high prediction ability and is thus of considerable interest for quantitative structure-activity relationship studies. The software is developed using a client-server protocol (Java and C++ languages) and is available for world-wide users on the Web site of the authors.

31 citations


Journal ArticleDOI
TL;DR: The construction of optimal molecular descriptors for multiple regression analysis of several properties of alcohols is considered, and an optimal variable weight is found for each property.
Abstract: We consider the construction of optimal molecular descriptors to be used for multiple regression analysis of several properties of alcohols. The descriptors are obtained by considering shorter paths with a variable weight x for the carbon-oxygen bond in alcohols. In particular, we consider paths of length 1, 2 and 3 as molecular descriptors. The multiple regression analysis of the following molecular properties was examined: -log S (S = solubility), CSA (cavity surface area), log P (P = octanol/water partition), and log gamma (gamma = infinite solution activity coefficient). By minimizing the standard error of the regression for each property, we found the optimal variable weight.

24 citations


Journal ArticleDOI
TL;DR: The use of kinematic, asynchronous, stochastic cellular automata to model liquid properties, solution phenomena and kinetic phenomena encountered in complex biological systems is described.
Abstract: This paper describes the use of kinematic, asynchronous, stochastic cellular automata to model liquid properties, solution phenomena and kinetic phenomena encountered in complex biological systems. Cellular automata models of dynamic phenomena represent in silico experiments designed to assess the effects of competing factors on the physical and chemical properties of solutions and other complex systems. Specific applications include solution behavior, separation of immiscible liquids, micelle formation, diffusion, membrane passage, first- and second-order chemical kinetics, enzyme activity and acid dissociation. Cellular automata are thus considered to provide an exploratory method for the analysis of dynamic phenomena and the discovery and understanding of new, unexpected phenomena.

16 citations
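
To make the phrase "asynchronous, stochastic cellular automata" in the abstract above concrete, here is a minimal Python sketch of free diffusion on a small wrap-around grid. It is a simplified illustration only, not the authors' model: the single `move_prob` parameter, the grid size and the function names are all assumptions, and the interaction rules between neighbouring cells that such models normally include are left out.

```python
import random


def make_grid(n, n_particles, rng):
    """Place n_particles at distinct random cells of an n x n grid (1 = occupied)."""
    cells = rng.sample([(r, c) for r in range(n) for c in range(n)], n_particles)
    grid = [[0] * n for _ in range(n)]
    for r, c in cells:
        grid[r][c] = 1
    return grid


def step(grid, move_prob, rng):
    """One asynchronous sweep over the occupied cells.

    Cells are visited in random order and every move is applied immediately,
    so later moves in the same sweep see the updated grid.
    """
    n = len(grid)
    occupied = [(r, c) for r in range(n) for c in range(n) if grid[r][c]]
    rng.shuffle(occupied)
    for r, c in occupied:
        if rng.random() >= move_prob:
            continue  # stochastic rule: this particle stays put
        dr, dc = rng.choice([(-1, 0), (1, 0), (0, -1), (0, 1)])
        nr, nc = (r + dr) % n, (c + dc) % n  # periodic (torus) boundary
        if not grid[nr][nc]:
            grid[r][c], grid[nr][nc] = 0, 1


rng = random.Random(0)
grid = make_grid(10, 30, rng)
for _ in range(50):
    step(grid, move_prob=0.5, rng=rng)
print(sum(map(sum, grid)))  # particle count is conserved by construction
```

Because a particle only ever moves into an empty cell, the number of occupied cells never changes, which is a convenient sanity check when richer rules are added.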


Journal ArticleDOI
TL;DR: Kohonen neural networks, also known as Self-Organizing Maps (SOM), offer a useful 2D representation of the compound distribution inside a large chemical database, and fuzzy techniques based on the "concept of partial truth" also prove to be a valuable tool for the direct exploitation of chemical databases or SOM.
Abstract: Kohonen neural networks, also known as Self-Organizing Maps (SOM), offer a useful 2D representation of the compound distribution inside a large chemical database. This distribution results from the compound organization in a molecular diversity hyperspace derived from a large set of molecular descriptors. Fuzzy techniques based on the "concept of partial truth" also prove to be a valuable tool for the direct exploitation of chemical databases or SOM. In such cases a fuzzy clustering algorithm is used. In this paper, a complete hybrid system, combining SOM and fuzzy clustering, is applied. As an example, a series of olfactory compounds was selected. The complexity of such data lies in the fact that the same compound may exhibit different odors. It is shown how fuzzy logic helps to give a better understanding of the organization of the compounds. These hybrid systems, using SOM and fuzzy clustering simultaneously, are foreseen as powerful tools for "virtual pre-screening".

15 citations
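
The idea of "partial truth" in the abstract above, where one compound can belong to several odor classes at once, can be illustrated with graded cluster memberships. The abstract does not say which fuzzy clustering algorithm was used; the Python sketch below assumes the membership formula of fuzzy c-means purely as an illustration. The function name, the fuzzifier `m = 2` and the example coordinates are hypothetical.

```python
import math


def fuzzy_memberships(point, centers, m=2.0):
    """Fuzzy c-means style memberships of one point in each cluster.

    u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)), where d_i is the distance
    from the point to center i. The memberships always sum to 1.
    """
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on one or more centers belongs fully to them.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        total = sum(hits)
        return [h / total for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists) for d_i in dists]


# Hypothetical 2D map coordinates: one compound between two odor clusters.
print(fuzzy_memberships((1.0, 0.0), [(0.0, 0.0), (4.0, 0.0)]))
```

A compound placed midway between two cluster centers receives a membership of 0.5 in each, which is the kind of shared assignment a crisp clustering cannot express.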


Journal ArticleDOI
TL;DR: Although the 3-D QSAR model for the colchicinoid series is far less predictive, it allows for a discussion of the relative influence of the structural motifs of these compounds.
Abstract: A novel method for modeling 3D QSAR has been developed. The method involves multiple training of a series of self-organizing networks (SOM). The obtained networks have been used for processing the data of one reference molecule. A scheme for the analysis of such data with PLS analysis has been proposed and tested using the steroids data with corticosteroid binding globulin (CBG) affinity. The predictivity of the CBG models measured with the SDEP parameter is among the best reported. Although the 3-D QSAR model for the colchicinoid series is far less predictive, it allows for a discussion of the relative influence of the structural motifs of these compounds.

12 citations


Journal ArticleDOI
TL;DR: Eighty-six compounds from the NTP carcinogenic potency database were used to derive neural network models; the predicted carcinogenic classes and the neighbors in the neural network influencing the predictions are discussed.
Abstract: Eighty-six compounds from the NTP carcinogenic potency database were used to derive neural network models. Compounds were described with topological indices. Carcinogenicity was treated as a binary quantity: a compound is either carcinogenic or non-carcinogenic. Several models were tested with a recognition ability test and with the leave-one-out cross-validation method. For the best model the ratio between correct and wrong classifications was 70/30. Furthermore, the model was used to classify 17 compounds not used for building the models. The predicted carcinogenic classes and the neighbors in the neural network influencing the predictions are discussed.

9 citations
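
The leave-one-out cross-validation mentioned in the abstract above follows a simple loop: hold out one compound, fit on the rest, predict the held-out compound, and repeat for every compound. The Python sketch below shows that loop under stated assumptions. A 1-nearest-neighbour classifier is swapped in for the paper's neural network to keep the example short, and the one-dimensional descriptors and binary labels are made up.

```python
def nearest_label(x, train):
    """Label of the training item closest to x (squared Euclidean distance)."""
    closest = min(train, key=lambda item: sum((a - b) ** 2 for a, b in zip(x, item[0])))
    return closest[1]


def leave_one_out_accuracy(data):
    """Fraction of items classified correctly when each is held out in turn."""
    correct = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]  # every item except the held-out one
        if nearest_label(x, train) == y:
            correct += 1
    return correct / len(data)


# Hypothetical (descriptor vector, class) pairs; 1 = carcinogenic, 0 = not.
toy = [((0.0,), 0), ((0.1,), 0), ((0.2,), 0), ((1.0,), 1), ((1.1,), 1), ((1.2,), 1)]
print(leave_one_out_accuracy(toy))
```

A reported ratio such as 70/30 corresponds to this function returning 0.70 for the real data set and model.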


Journal ArticleDOI
TL;DR: A general-case neural network model for 13C NMR spectrum prediction (estimation) was built from more than 8,300 carbon atoms having various environments, and the advantages, disadvantages and peculiarities of neural network-based data modelling are described.
Abstract: A general-case neural network model for 13C NMR spectrum prediction (estimation) was built from more than 8,300 carbon atoms having various environments. Building the model from the data set required a few weeks' work using commercial software. Average deviation on test data is ca. 4 ppm. There is no limit on molecule complexity. Estimation error does not depend on molecule size or complexity. The emphasis is on the data, the method and the results, not on the processes that take place inside the modelling software. Advantages, disadvantages and peculiarities of neural network-based data modelling ("data mining") are described at length. The differences in data handling between the data mining approach and traditional statistical modelling techniques are discussed and illustrated in detail. The spectrum predictor is available from PMSI at no charge.

Journal ArticleDOI
TL;DR: This paper presents an introduction to classification theory and shows how artificial neural networks can be used for classification and maps out a bootstrapped procedure for interval estimation of posterior probabilities.
Abstract: Classification problems are often encountered in medical diagnosis. This paper presents an introduction to classification theory and shows how artificial neural networks can be used for classification. We also map out a bootstrapped procedure for interval estimation of posterior probabilities. The entire procedure is illustrated using the diabetes mellitus data in Pima Indians.
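
The bootstrapped interval estimation named in the abstract above rests on resampling the data with replacement and reading an interval off the spread of the recomputed statistic. The Python sketch below shows a percentile bootstrap under simplifying assumptions: the statistic is a plain class proportion rather than the posterior probability produced by a trained network, and the function name, the 0/1 outcomes and the default settings are hypothetical.

```python
import random


def bootstrap_interval(values, stat, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for stat(values).

    Resamples `values` with replacement n_boot times, recomputes the
    statistic each time, and returns the empirical alpha/2 and
    1 - alpha/2 quantiles of those replicates.
    """
    rng = random.Random(seed)
    n = len(values)
    replicates = sorted(
        stat([values[rng.randrange(n)] for _ in range(n)]) for _ in range(n_boot)
    )
    lower = replicates[int((alpha / 2) * n_boot)]
    upper = replicates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def proportion(xs):
    """Share of positive (1) outcomes."""
    return sum(xs) / len(xs)


# Hypothetical binary outcomes: 30 positive cases out of 100.
outcomes = [1] * 30 + [0] * 70
print(bootstrap_interval(outcomes, proportion))
```

To apply the same idea to a classifier, `stat` would refit the model on each resample and return its predicted probability for a fixed case, which is considerably more expensive than this toy statistic.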

Journal ArticleDOI
TL;DR: The application of artificial feedforward neural networks to some fundamental problems tied to the folding process and the structure-function relationship in proteins is discussed.
Abstract: In the genomic era DNA sequencing is increasing our knowledge of the molecular structure of genetic codes from bacteria to man at a hyperbolic rate. Billions of nucleotides and millions of amino acids are already filling the electronic files of the databases presently available, which contain a tremendous amount of information on the most biologically relevant macromolecules, such as DNA, RNA and proteins. The most urgent problem originates from the need to single out the relevant information amidst a wealth of general features. Intelligent tools are therefore needed to optimise the search. Data mining for sequence analysis in biotechnology has been substantially aided by the development of new powerful methods borrowed from the machine learning approach. In this paper we discuss the application of artificial feedforward neural networks to some fundamental problems tied to the folding process and the structure-function relationship in proteins.

Journal ArticleDOI
TL;DR: In this paper, the authors used ANNs with the Extended Delta-Bar-Delta (EDBD) back-propagation learning algorithm to predict the standard enthalpy and entropy of 87 acyclic alkanes.
Abstract: Artificial Neural Networks (ANNs) with the Extended Delta-Bar-Delta (EDBD) back-propagation learning algorithm have been developed to predict the standard enthalpy and entropy of 87 acyclic alkanes. Molecular weight, boiling point and density of the compounds were used as input parameters. The network's architecture and parameters were optimized to give maximum performance. The best network was a 3-6-2 ANN, and the optimum learning epoch was about 1320. The results show that the maximum relative errors of enthalpy and entropy are less than 3%. They reveal that the performance of ANNs for predicting the enthalpy and entropy of alkanes is satisfactory.
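
The "3-6-2" label in the abstract above denotes three inputs (molecular weight, boiling point, density), six hidden units and two outputs (enthalpy, entropy). The Python sketch below shows only the shape of such a network and one forward pass. It is not the authors' model: the weights are random, the EDBD training procedure is not implemented, the sigmoid hidden activation and linear output are assumptions, and the input values are made-up scaled numbers.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Random weights for one dense layer; the last entry of each row is the bias."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward_layer(weights, inputs, activation):
    """Weighted sum plus bias for each unit, followed by the activation."""
    extended = list(inputs) + [1.0]  # constant 1.0 feeds the bias weight
    return [activation(sum(w * x for w, x in zip(row, extended))) for row in weights]


def forward(network, inputs):
    """Forward pass through a two-layer (hidden + output) network."""
    hidden = forward_layer(network[0], inputs, sigmoid)
    return forward_layer(network[1], hidden, lambda z: z)  # linear output units


rng = random.Random(0)
network = [init_layer(3, 6, rng), init_layer(6, 2, rng)]  # 3-6-2 architecture

# Hypothetical inputs scaled to [0, 1]: molecular weight, boiling point, density.
outputs = forward(network, [0.4, 0.6, 0.5])
print(len(outputs))  # two outputs: enthalpy and entropy (untrained, so not meaningful)
```

With biases included, this architecture has 6 × 4 + 2 × 7 = 38 adjustable weights, which gives a sense of how small the model is relative to the 87 compounds.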

Journal ArticleDOI
TL;DR: Artificial neural networks can successfully and conveniently solve the problem of predicting programmed-temperature retention times, and provide useful data for the analysis of naphthas in the petrochemical industry.
Abstract: A method is proposed, for the first time, for predicting the programmed-temperature retention times of the components of naphthas in capillary gas chromatography using artificial neural networks. Programmed-temperature retention times are usually predicted using formulas such as the integral formula, which requires four parameters to be determined by calculation or experiment. However, the results obtained with the formula are not good enough to meet the demands of industry. In order to predict retention times accurately and conveniently, artificial neural networks using five-fold cross-validation and leave-20%-out methods have been applied. Only two parameters, density and isothermal retention index, were used as input vectors. The average RMS error for the predicted values of five different networks was 0.18, whereas the RMS error of predictions by the integral formula was 0.69. Obviously, the predictions by neural networks were much better than predictions by the formula, and neural netw...

Journal ArticleDOI
TL;DR: For OA agonists, the more similar the structure of the test compound to the reference compound NC (24), the higher the activity, whereas this was not the case for OA antagonists.
Abstract: The quantitative structure-activity relationship of 39 octopamine (OA) agonists and 12 antagonists against the thoracic nerve cord of the migratory locust, Locusta migratoria L., was analyzed using the atom-based rigid fit method or the flexible fitting offered by PowerFit 1.0 from MicroSimulation. For OA agonists, the more similar the structure of the test compound to the reference compound NC (24), the higher the activity, whereas this was not the case for OA antagonists. Antagonists may not interact with the same part of the membrane with which the agonists interact. Taking the part of the membrane with which the agonist interacts as the true receptor, the antagonist may well interact with an area surrounding the receptor, including the ionophore.