
Showing papers in "QSAR & Combinatorial Science" in 2003


Journal ArticleDOI
TL;DR: A set of simple guidelines for developing validated and predictive QSPR models is presented, highlighting the need to establish the domain of model applicability in the chemical space to flag molecules for which predictions may be unreliable, and some algorithms that can be used for this purpose.
Abstract: This paper emphasizes the importance of rigorous validation as a crucial, integral component of Quantitative Structure Property Relationship (QSPR) model development. We consider some examples of published QSPR models, which in spite of their high fitted accuracy for the training sets and apparent mechanistic appeal, fail rigorous validation tests, and, thus, may lack practical utility as reliable screening tools. We present a set of simple guidelines for developing validated and predictive QSPR models. To this end, we discuss several validation strategies including (1) randomization of the modelled property, also called Y-scrambling, (2) multiple leave-many-out cross-validations, and (3) external validation using rational division of a dataset into training and test sets. We also highlight the need to establish the domain of model applicability in the chemical space to flag molecules for which predictions may be unreliable, and discuss some algorithms that can be used for this purpose. We advocate the broad use of these guidelines in the development of predictive QSPR models.

1,838 citations
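
The Y-scrambling check named in the abstract above can be illustrated with a minimal sketch. It is not the authors' procedure: the one-descriptor linear model, the toy data, and the function names `r_squared` and `y_scramble` are all assumptions made for illustration. The idea is that a model fitted to randomly permuted responses should fit much worse than the model fitted to the real responses.

```python
import random

def r_squared(x, y):
    """Squared Pearson correlation, i.e. R^2 of a one-descriptor linear fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)

def y_scramble(x, y, n_trials=200, seed=0):
    """Return (R^2 for the real y, list of R^2 values for shuffled copies of y)."""
    rng = random.Random(seed)
    real = r_squared(x, y)
    scrambled = []
    for _ in range(n_trials):
        y_perm = list(y)
        rng.shuffle(y_perm)  # break the descriptor/property pairing
        scrambled.append(r_squared(x, y_perm))
    return real, scrambled

# Hypothetical toy data: one descriptor and one modelled property.
x = [0.5, 1.1, 1.9, 2.4, 3.0, 3.8, 4.1, 5.2, 5.9, 6.5]
y = [1.0, 1.9, 3.2, 3.9, 5.1, 6.0, 6.8, 8.4, 9.1, 10.2]

real, scrambled = y_scramble(x, y)
mean_scrambled = sum(scrambled) / len(scrambled)
print(f"R^2 (real y): {real:.3f}")
print(f"mean R^2 (scrambled y): {mean_scrambled:.3f}")
```

If the scrambled fits were routinely as good as the real one, the apparent quality of the model would be suspect. A real study would apply the same test to the full multi-descriptor model and to cross-validated statistics.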


Journal ArticleDOI
TL;DR: The review analyses potential pitfalls of descriptor-based similarity analysis: loss of information in the representations of molecular structures, and the relevance of a particular representation and chosen similarity measure to the activity.
Abstract: Although the concept of similarity is convenient for humans, a formal definition of similarity between chemical compounds is needed to enable automatic decision-making. The objective of similarity measures in toxicology and drug design is to allow assessment of chemical activities. The ideal similarity measure should be relevant to the activity of interest. The relevance could be established by exploiting the knowledge about fundamental chemical and biological processes responsible for the activity. Unfortunately, this knowledge is rarely available and therefore different approximations have been developed based on similarity between structures or descriptor values. Various methods are reviewed, ranging from two-dimensional, three-dimensional and field approaches to recent methods based on “Atoms in Molecules” theory. All these methods attempt to describe chemical compounds by a set of numerical values and define some means for comparison between them. The review analyses potential pitfalls of this methodology: loss of information in the representations of molecular structures, and the relevance of a particular representation and chosen similarity measure to the activity. A brief review of known methods for descriptor selection is also provided. The popular “neighborhood behavior” principle is criticized, since proximity with respect to descriptors does not necessarily mean proximity with respect to activity. Structural similarity should also be used with care, as it does not always imply similar activity, as shown by examples. We remind the reader that similarity measures and classification techniques based on distances rely on certain data distribution assumptions. If these assumptions are not satisfied for a given dataset, the results could be misleading. A discussion on similarity in descriptor space in the context of applicability domain assessment of QSAR models is also provided. Finally, it is shown that descriptor-based similarity analysis is prone to errors if the relationship between the activity and the descriptors has not been previously established. A justification for the usage of a particular similarity measure should be provided for every specific activity by expert knowledge or derived by data modeling techniques.

365 citations


Journal ArticleDOI
TL;DR: The BAF-QSAR can be applied to categorize organic chemical substances on their bioaccumulation potential and used in the derivation of water quality guidelines and total maximum daily loadings by relating internal concentrations of organic chemicals in upper trophic fish species to corresponding concentrations in the water.
Abstract: This study presents the development of a quantitative structure-activity relationship (QSAR) for assessing the bioaccumulation potential of organic chemicals in aquatic food webs. The QSAR is derived by parameterization and calibration of a mechanistic food web bioaccumulation model. Calibration of the QSAR is based on the derivation of a large database of bioconcentration and bioaccumulation factors, which is evaluated for data quality. The QSAR provides estimates of the bioaccumulation potential of organic chemicals in higher trophic level fish species of aquatic food webs. The QSAR can be adapted to include the effect of metabolic transformation and trophic dilution on the BAF. The BAF-QSAR can be applied to categorize organic chemical substances on their bioaccumulation potential. It identifies chemicals with a log KOW between 4.0 and 12.2 as exhibiting BAFs greater than 5 000 in the absence of significant metabolic transformation rates. The BAF-QSAR can also be used in the derivation of water quality guidelines and total maximum daily loadings by relating internal concentrations of organic chemicals in upper trophic fish species to corresponding concentrations in the water.

198 citations


Journal ArticleDOI
TL;DR: Synthetic advances for the construction of Biginelli libraries via solution phase and solid-phase strategies that are amenable to a high-throughput or combinatorial format are detailed.
Abstract: With the emergence of high-throughput screening in the pharmaceutical industry over a decade ago, synthetic chemists were faced with the challenge of preparing large collections of molecules to satisfy the demand for new screening compounds. The unique exploratory power of multicomponent reactions such as the Ugi four-component reaction was soon recognized to be extremely valuable for producing compound libraries in a time- and cost-effective manner. The present review article summarizes strategies for the construction of libraries through another multicomponent reaction, the Biginelli dihydropyrimidine synthesis. In this three-component condensation dating back to 1893, CH-acidic carbonyl compounds, aldehydes and urea-type building blocks combine to assemble a multifunctionalized dihydropyrimidine scaffold. Due to the interesting pharmacological properties associated with the privileged DHPM structures, the Biginelli reaction and related procedures have received increasing attention in recent years. This review details synthetic advances for the construction of Biginelli libraries via solution phase and solid-phase strategies that are amenable to a high-throughput or combinatorial format.

143 citations


Journal ArticleDOI
TL;DR: The SPARC solute-solute physical process models have been developed and tested for vapor pressure (at any temperature), heat of vaporization (at 25 °C and the boiling point), diffusion coefficient (at 25 °C) and boiling point (at any pressure) for a relatively large number of organic molecules.
Abstract: The prototype computer program SPARC has been under development for several years to estimate physical properties and chemical reactivity parameters of organic compounds strictly from molecular structure. SPARC solute-solute physical process models have been developed and tested for vapor pressure (at any temperature), heat of vaporization (at 25 °C and the boiling point), diffusion coefficient (at 25 °C) and boiling point (at any pressure) for a relatively large number of organic molecules. The RMS deviation errors of the predicted vapor pressures, heats of vaporization (at any temperature) and boiling points (at any pressure) were close to the intralaboratory experimental errors.

136 citations


Journal ArticleDOI
TL;DR: A method for visualization of molecule distributions in a pharmacophore space by self-organizing maps (SOM) was developed which can be used for virtual library design and potential applications of this "pharmacophore road map" approach are discussed.
Abstract: A collection of reference molecules for ligand-based library design was compiled from the recent scientific literature. Selected properties of this set of 4,236 drugs and drug candidates were analyzed and compared to previous studies. Lipophilicity and molecular weight distributions revealed a trend toward larger, more lipophilic molecules of newer drug candidates. Using the compound collection as a source of reference data, a method for visualization of molecule distributions in a pharmacophore space by self-organizing maps (SOM) was developed which can be used for virtual library design. Potential applications of this "pharmacophore road map" approach are discussed.

130 citations


Journal ArticleDOI
TL;DR: In this article, the log BCF of 238 non-ionic organic compounds was modelled by multiple linear regression models, using theoretical structural descriptors of different kinds (1D-, 2D- and 3D-) selected by the Genetic Algorithm procedure.
Abstract: Bioconcentration factor (BCF) is an important ecotoxicological parameter describing the tendency of chemicals to concentrate in organisms, mainly aquatic. Log BCF of 238 non-ionic organic compounds was modelled by multiple linear regression models, using theoretical structural descriptors of different kinds (1D-, 2D- and 3D-) selected by the Genetic Algorithm procedure. The models, validated for their predictivity using internal ($Q^2$, $Q^2_{LMO}$) and external ($Q^2_{EXT}$) validations, can be applied for the prediction of unavailable data, and also for compounds not yet synthesized. Comparison of the proposed models and log Kow- or molecular connectivity (MCIs)-based models reveals that our approach is more predictive than the log Kow-based model and also simpler than the MCIs-based model.

90 citations


Journal ArticleDOI
TL;DR: This paper provides an overview of the regulatory assessment of pharmaceuticals and personal care products (PPCPs), expands on topics for future research, and highlights the role that QSAR scientists can play in this research.
Abstract: The potential effect of human and veterinary medicines and other personal care products on the environment has become an important topic over the past few years. Whilst an assessment of the potential environmental risks posed by new and existing pharmaceuticals has been required in the United States (U.S.) for a number of decades, in the European Union (EU) and Canada assessments have only been required in the last 5-10 years. In the U.S., guidance has been available since the early 1980s on the assessment of veterinary medicines, whereas only recently has detailed guidance become available on how to perform the risk assessment in other areas. For example, in Canada, new pharmaceuticals (and other substances including novel foods, food additives, human biologics and genetic therapies, medical devices, natural health products, veterinary drugs, cosmetics) have been required to be notified for an environmental assessment under the Canadian Environmental Protection Act (CEPA 1999) since 2001. The European Medicines Evaluation Authority (EMEA) has published guidelines for assessment of veterinary medicines in use in Europe. For veterinary medicines attempts are currently being made by the Veterinary International Co-operation on Harmonisation (VICH) to harmonise these approaches. Generally, the current assessment approaches are tiered and initially involve a comparison of environmental concentrations with set trigger values. If the trigger values are exceeded then a formal assessment has to be performed requiring data on environmental fate and ecotoxicity. Concerns have been raised over the current approaches used in each of the assessment processes and there are a number of areas that warrant further research. 
This paper will provide an overview of the regulatory assessment of pharmaceuticals and personal care products (PPCPs) and will expand on many of the topics for research in the future, and the role that QSAR scientists can play in this research will be highlighted.

71 citations


Journal ArticleDOI
TL;DR: The relevance of these findings is that current regulations and protocols may misidentify low KOW but high KOA chemicals as having no bioaccumulation potential, as well as very hydrophobic substances which appear not to biomagnify in aquatic organisms but have the potential to biomagnify in terrestrial food-chains.
Abstract: KOW-based QSARs are used to assess the bioaccumulation potential of thousands of commercial chemicals in Canada and internationally. The QSARs, which are based on information from aquatic organisms, identify chemicals with a log KOW ≥ 5 to have a potential to biomagnify in food-chains. This study investigates whether KOW-based QSARs are also effective in identifying biomagnifying chemicals in terrestrial food-chains. First, a terrestrial bioaccumulation model is developed and used to hypothesize the general relationship between the chemical's octanol-air and octanol-water partition coefficients and its biomagnification potential. Secondly, field observations of the bioaccumulation of persistent organic pollutants in wolves are used to test the hypothesis and explore the fundamental differences between QSARs for bioaccumulation in aquatic and terrestrial food-chains. The results indicate that (i) QSARs for bioaccumulation in terrestrial food-chains should include both octanol-air (KOA) and octanol-water (KOW) partition coefficients; (ii) chemicals with a log KOA ≥ approximately 5 can biomagnify in terrestrial food-chains if log KOW ≥ 2 and the rate of chemical transformation or metabolism is low; (iii) biomagnification factors in terrestrial food-chains are much greater than those in aquatic food-chains; (iv) biomagnification factors of very hydrophobic substances (log KOW > 7) in terrestrial biota do not drop off with increasing KOW as has been observed in aquatic biota. The relevance of these findings is that current regulations and protocols may misidentify (i) low KOW but high KOA chemicals as having no bioaccumulation potential and (ii) very hydrophobic substances (log KOW > 8.5) which appear not to biomagnify in aquatic organisms but have the potential to biomagnify in terrestrial food-chains. Considering that 67.9% of the approximately 12 000 organic chemicals on Canada's Domestic Substances List exhibit high KOA but low KOW, this represents a major gap in our methods for screening bioaccumulative substances.

65 citations


Journal ArticleDOI
TL;DR: This paper proposes that there is a real mechanistic difference between general and polar narcosis, which manifests itself by significantly different QSARs, even when based on log $K_{mw}$, and a mathematical model is derived which can explain why this difference is based on a difference in physical chemistry.
Abstract: There are currently two schools of thought surrounding the existence of polar and non-polar narcosis mechanisms in aquatic toxicity. Some authors argue that there is a real distinction between the two modes of action but recently support has grown for the suggestion that there may be no real difference and that the apparent distinction disappears when membrane/water partition coefficients $K_{mw}$ are used, and has arisen purely because the octanol/water partition coefficient P is in fact an inadequate descriptor of partitioning into a lipid membrane. In this paper the evidence is analysed and it is concluded that although practically useful QSARs covering both general and polar narcotics can be developed based on log $K_{mw}$, there is nevertheless a real mechanistic difference between general and polar narcosis, which manifests itself by: significantly different QSARs, even when based on log $K_{mw}$, for general and polar narcotics treated separately; differences in FATS; non-additivity between general narcotics and polar narcotics in mixture toxicity studies. We propose that this difference is based on a difference in physical chemistry, general narcotics acting via 3-D partition (able to move in all directions in the hydrocarbon-like interior of the membrane) and polar narcotics acting via 2-D partition (involving binding between a functional group on the narcotic and the polar phosphatidyl choline head groups at the membrane surface). Based on this hypothesis a mathematical model is derived which can explain: why log P based QSARs covering diverse general narcotics are of statistically better quality than log P based QSARs covering diverse polar narcotics; why the slopes of log P based QSARs are larger for general narcosis than for polar narcosis; why the intercepts of log P based QSARs are larger for polar narcotics than for general narcotics; why some chemicals show additive toxicity both with general and polar narcotics.

64 citations


Journal ArticleDOI
TL;DR: In this article, a QSAR based on reactivity and hydrophobicity parameters has been developed for these aldehydes, which can also be extended to include 1,2-diketones, which react by Schiff base formation.
Abstract: Although not all aldehydes are skin sensitizers, many of them, covering a diverse range of structures, show varying degrees of sensitization potential. Based on consideration of their reaction chemistry, it is possible to identify structural features associated with sensitization potential or the lack of it. Many aldehydes, including several fragrance allergens, can sensitize by Schiff base formation. A QSAR based on reactivity and hydrophobicity parameters has been developed for these aldehydes. The QSAR can be extended to include 1,2-diketones, which can also react by Schiff base formation. The findings indicate that for skin sensitization, as for several other areas of toxicology, chemicals are better classified in terms of their reaction chemistry rather than in terms of their functional groups, i.e., based on mechanisms of action as opposed to chemical class.

Journal ArticleDOI
TL;DR: In this article, the mechanism of aquatic toxicity of cationic organic compounds and the relationship of toxicity with log P (P = octanol/water partition coefficient) are considered, and the solvation chemistry underlying the log P calculation is reinterpreted.
Abstract: This paper considers the mechanism of aquatic toxicity of cationic organic compounds and the relationship of toxicity with log P (P = octanol/water partition coefficient). The rather complex log P calculation method given by Leo and Hansch for cationics is reconsidered and the underlying solvation chemistry is reinterpreted. This reinterpretation is tested by application to micellisation data for cationic surfactants. It is found that in its micellisation potential, expressed as pCMC, a cationic surfactant is similar to the anionic surfactant with the same hydrophobe. Our modifications, derived from this finding, to the log P calculation for cationics, lead to a cationics fish toxicity QSAR whose slope and intercept are close to the range commonly found for polar narcosis QSARs. The compounds included in this QSAR include 2 non-surfactants, indicating that for cationics, as already observed for anionics and nonionics, whether a compound is a surfactant or not is irrelevant to its toxicity. That the cationics act by a polar narcosis mechanism is supported by the results of mixture toxicity studies, indicating additive joint action for mixtures of a cationic with known polar narcotics and independent joint action for a mixture of a cationic with a known general narcotic.

Journal ArticleDOI
TL;DR: In this paper, the effect of steric molecular attributes on predicting bioconcentration factors was studied using 694 chemicals with available experimental BCF and $K_{ow}$ values, and it was found that maximum cross-sectional diameters and conformational flexibility of chemicals significantly affect bioconcentration and could be used to explain the identification of certain highly hydrophobic chemicals in humans and fish.
Abstract: The bioaccumulation potential of chemicals is used to indicate when chemicals are likely to contaminate fish, birds and other wildlife, and humans. Together with knowledge of the persistence of chemicals, the bioaccumulation potential is useful in setting priorities for hazard identification as well as environmental monitoring. Because the measurement of the bioaccumulation potential is costly, developing reliable estimates of this important indicator directly from chemical structure has long been a goal of Quantitative Structure Activity Relationship (QSAR) practitioners. Many previous models for predicting bioconcentration factors (BCF) for organic chemicals have been based on linear and bilinear relationships between log(BCF) and the octanol-water partition coefficient (log($K_{ow}$)), some of which also included other structural parameters such as structural correction factors or molecular connectivity indices, Fujita's characters, etc. Most of these BCF models have been derived for predicting passive diffusion of chemicals with log octanol-water partition coefficients log($K_{ow}$) < 7. Most previous models showed large discrepancies for a large number of chemicals (predominantly highly lipophilic) found in humans and fish. The effect of steric molecular attributes on predicting BCF was studied using 694 chemicals with available experimental BCF and $K_{ow}$ values. It was found that maximum cross-sectional diameters and conformational flexibility of chemicals significantly affect bioconcentration and could be used to explain the identification of certain highly hydrophobic chemicals in humans and fish.

Journal ArticleDOI
TL;DR: A combinatorial protocol is introduced and interfaced with multiple linear regression (MLR) for variable selection, and it is demonstrated that the proposed method should be able to offer solutions to data sets with 50 to 60 descriptors in a reasonable time frame.
Abstract: A combinatorial protocol (CP) is introduced here and interfaced with multiple linear regression (MLR) for variable selection. The efficiency of CP-MLR is primarily based on restricting the entry of correlated variables to the model development stage. It has been used for the analysis of the Selwood et al. data set [16], and the obtained models are compared with those reported from the GFA [8] and MUSEUM [9] approaches. For this data set CP-MLR could identify three highly independent models (27, 28 and 31) with $Q^2$ values in the range of 0.632 to 0.518. Also, these models are divergent and unique. Even though the present study does not share any models with the GFA [8] and MUSEUM [9] results, there are several descriptors common to all these studies, including the present one. Also, a simulation is carried out on the same data set to explain the model formation in CP-MLR. The results demonstrate that the proposed method should be able to offer solutions to data sets with 50 to 60 descriptors in a reasonable time frame. By carefully selecting the interparameter correlation cutoff values in CP-MLR one can identify divergent models and handle data sets larger than the present one without involving excessive computer time.
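
The correlation-based restriction described in the abstract above can be sketched as a simple pre-filter. This is an illustrative assumption, not the published CP-MLR algorithm: the helper names `pearson` and `low_correlation_subsets`, the descriptor names, the toy values, and the cutoff of 0.5 are all hypothetical. The sketch enumerates descriptor subsets and keeps only those whose pairwise correlations stay below the cutoff, so that only these would be passed on to regression.

```python
from itertools import combinations
from math import sqrt

def pearson(u, v):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / sqrt(suu * svv)

def low_correlation_subsets(descriptors, size, cutoff):
    """Return descriptor-name tuples of the given size in which every
    pair of descriptors has |r| below the cutoff."""
    names = sorted(descriptors)
    kept = []
    for combo in combinations(names, size):
        pairs = combinations(combo, 2)
        if all(abs(pearson(descriptors[a], descriptors[b])) < cutoff
               for a, b in pairs):
            kept.append(combo)
    return kept

# Hypothetical descriptor table: d1 and d2 are nearly collinear, d3 is not.
descriptors = {
    "d1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "d2": [2.0, 4.0, 6.0, 8.0, 10.5],
    "d3": [2.0, 5.0, 1.0, 5.0, 2.0],
}

candidates = low_correlation_subsets(descriptors, size=2, cutoff=0.5)
print(candidates)
```

Only the surviving subsets would then be fitted by MLR. Lowering the cutoff shrinks the candidate list, which is one plausible reading of how the cutoff keeps computing time manageable for larger descriptor pools.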

Journal ArticleDOI
TL;DR: This essay will point out often-underestimated fundamental differences between drug discovery and materials science, which are faced in software-assisted library design for high-throughput approaches, and new methodologies have to be developed which allow the design of efficient libraries.
Abstract: High throughput experimentation (HTE) in catalysis research and materials science has, in spite of its relatively short history, already reached an impressive level of sophistication with respect to synthetic methods [1], reactor technology [2], and fast analytical assays [3], and several review papers are available which cover these developments [4]. In order to fully exploit the advantages associated with the success in the above mentioned areas, equally sophisticated methods are required to manage the flow of data and to extract useful information from these data. However, suitable solutions to these problems, often even partial solutions, are still lacking. Fully integrated and adapted informatics tools to capture, store and treat the high throughput workflow of data for heterogeneous catalysis and materials are yet to be developed. Equally important, efficient software based methods for library design, for which developments on the fundamental level are still necessary, are urgently needed to make full use of the novel experimental tools. Some of the problems are similar to those in high throughput drug discovery, where advanced software support solutions followed the experimental developments. However, the high complexity of solids and heterogeneously catalyzed processes creates novel challenges going beyond those faced in drug discovery. These challenges are often not yet acknowledged in the community. In this essay we will point out often-underestimated fundamental differences between drug discovery and materials science, which are faced in software assisted library design for high throughput approaches. Even if HTE increases the screening power by orders of magnitude, the number of potential experiments to be carried out is infinite (see, for example, the considerations of Jansen concerning the number of possible solid compounds [5]). Therefore new methodologies have to be developed which allow the design of efficient libraries. Novel concepts and strategies for screening with specialized software components are proposed to enhance discovery and optimization rates. HTE in heterogeneous catalysis and materials science relies on the iterative preparation and testing of large libraries of solids, either in a parallel mode or sequentially. The process starts with the design of an initial set of catalysts, which can be done either randomly, or following certain rules, or be based on the experience and intuition of the chemist who designs the library. Such an initial library is then prepared and tested. After analyzing the results and based on the analysis, a new set of experiments is designed. This methodology is not fundamentally different from the one used in the past in the search for novel catalysts and processes. However, the role of the chemist drastically changes because the numbers of experiments to be conducted and the amount of data to be collected and treated are orders of magnitude higher. Without an efficient informatics environment, it is impossible to plan and design such vast numbers of experiments [6]. In the beginning of a high throughput discovery program two possible starting situations can be identified: (i) screening is based on prior information and catalytic systems are available which show some activity for the desired reaction, or (ii) there is essentially no precedence of a catalyst, or the systems previously investigated do not seem to have the potential for further improvement. The first situation is often described by the term "optimization program", the second situation subsumed under the label "discovery program".

Journal ArticleDOI
TL;DR: In this paper, QICAR modeling is extended to predict metal toxicity using data from the US EPA ECOTOX database and for binary metal mixtures; using the Microtox® bioassay, the interactions of binary mixtures of metals (Co, Cu, Mn, Ni, and Zn) were quantified using a linear model with an interaction term.
Abstract: Environmental toxicologists readily adopted QSARs from pharmacology to predict organic contaminant toxicity. In contrast, models relating metal ion characteristics to their bioactivity remain poorly explored and underutilized. Quantitative Ion Character-Activity Relationships (QICARs) have recently been developed to predict metal toxicity. The QICAR approach, based on metal-ligand binding tendencies, has been applied successfully to a wide range of effects, species, and media on a single metal basis. In previous single metal studies, a softness parameter and the log KOH were among the ion qualities with the highest predictive value for toxicity. Here, QICAR modeling is extended to predict toxicity using data from the US EPA ECOTOX database and for binary metal mixtures. Using the US EPA ECOTOX database, predictive single metal models were produced for four fish species (bluegill, carp, fathead minnow, and mummichog). Using the Microtox® bioassay, the interactions of binary mixtures of metals (Co, Cu, Mn, Ni, and Zn) were quantified using a linear model with an interaction term. A predictive relationship was developed for metal interaction between metal pairs and the difference in softness. This study supports the hypothesis that general prediction of metal toxicity and interactions from ion characteristics is feasible. It is important that additional work with metals of different valences and sizes be done to further enhance the general accuracy of metal interaction predictions.

Journal ArticleDOI
TL;DR: In this paper, two quantitative models for the prediction of aqueous solubility of 1293 organic compounds were generated by a Multilinear Regression (MLR) analysis, and a Backpropagation (BPG) neural network.
Abstract: Two quantitative models for the prediction of aqueous solubility of 1293 organic compounds were generated by a Multilinear Regression (MLR) analysis, and a Backpropagation (BPG) neural network. The molecules were represented by 18 topological descriptors. The physicochemical relationship between solubility and the descriptors for different individual classes of monofunctional group compounds such as hydrocarbons, ethers, halocarbons, alcohols, aldehydes and ketones, acids, esters, and amines was investigated. The 1293 compounds were divided into a training set of 741 compounds and a test set of 552 compounds based on a Kohonen self-organizing neural network map. The models obtained show good predictive power: for the test set, a correlation coefficient of 0.97 and a standard deviation of 0.52 were achieved by the backpropagation neural network approach.

Journal ArticleDOI
Kunal Roy
TL;DR: The study shows that substituents on the appended 2-phenyl ring and 4-amino or 4-keto substitution on the pyrazolo[3,4-c]quinoline nucleus modulate the selectivity pattern and negative charge on the quinoline nitrogen and volume and lipophilicity of the whole molecules are important contributors to the selectivities.
Abstract: Considering the potential of selective adenosine receptor subtype ligands in the development of prospective drug candidates, A1 and A3 receptor binding affinity data of 2-arylpyrazolo[3,4-c]quinoline derivatives have been subjected to QSAR analyses to explore the physicochemical requirements for selective binding. The study has been carried out with Wang-Ford charges of the common atoms of the molecules calculated from their energy-minimized conformations using the AM1 technique. Apart from the charge parameters, physicochemical variables like partition coefficient and molar refractivity of the whole molecules have been used along with suitable indicator variables. The study shows that substituents on the appended 2-phenyl ring and 4-amino or 4-keto substitution on the pyrazolo[3,4-c]quinoline nucleus modulate the selectivity pattern. Further, negative charge on the quinoline nitrogen and volume and lipophilicity of the whole molecules are important contributors to the selectivity.

Journal ArticleDOI
TL;DR: The General Solubility Equation (GSE) as discussed by the authors is a simple method of estimating the molar aqueous solubility of an organic non-electrolyte in water (S w ) as a function of its celsius melting point (MP) and octanol-water partition coefficient (K o w ): log S w = -0.01(MP-25) - log K o w + 0.5
Abstract: The General Solubility Equation (GSE) provides a simple method of estimating the molar aqueous solubility of an organic non-electrolyte in water ($S_w$) as a function of its Celsius melting point (MP) and octanol-water partition coefficient ($K_{ow}$): $\log S_w = -0.01(\mathrm{MP} - 25) - \log K_{ow} + 0.5$. The melting term of the GSE is based upon the Clausius-Clapeyron equation and Walden's rule. The aqueous activity coefficient is assumed to be the reciprocal of the octanol-water partition coefficient. The constant is based upon the molarity of pure octanol. There are no fitted parameters in the GSE. Extension of the GSE to weak electrolytes in buffered aqueous solutions is straightforward. The concentration of the ionized species, $S_i$, is accounted for by incorporating one additional term, which contains the $\mathrm{p}K_a$ of the solute and the pH of the solution. For a weak acid, $S_{total} = S_w + S_i = S_w \left[ 1 + 10^{(\mathrm{pH} - \mathrm{p}K_a)} \right]$. The solubility of a weak electrolyte in unbuffered water requires further consideration because the solute will determine the pH of the solution. It is shown that in unbuffered media $S_{total} = S_w + S_i = S_w + (S_w K_a)^{1/2}$. Thus, it is not necessary to explicitly know the pH of the saturated solution to estimate the solubility of a weak electrolyte in water. The GSE is validated on a data set of over a thousand compounds, covering a wide range of structural categories. The GSE is compared to a number of other solubility estimation techniques using the criteria of accuracy of fit, applicability, parsimony, convenience, and elegance.
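
The three relations quoted in this abstract can be evaluated directly. The following is a minimal Python sketch, not the authors' software: the function names and the example solute (melting point 125 °C, log K_ow 2.0, pK_a 4.0) are hypothetical, and treating solutes that melt below 25 °C as having a zero melting term is an assumption of the sketch that the abstract does not state.

```python
import math


def gse_log_sw(mp_celsius, log_kow):
    """log10 of intrinsic molar solubility: -0.01(MP - 25) - log Kow + 0.5."""
    # Assumption of this sketch: solutes melting below 25 °C get a zero melting term.
    melting_term = max(mp_celsius - 25.0, 0.0)
    return -0.01 * melting_term - log_kow + 0.5


def total_solubility_buffered_acid(s_w, ph, pka):
    """Weak acid in a buffer: S_total = S_w * (1 + 10^(pH - pKa))."""
    return s_w * (1.0 + 10.0 ** (ph - pka))


def total_solubility_unbuffered_acid(s_w, pka):
    """Weak acid in unbuffered water: S_total = S_w + sqrt(S_w * Ka)."""
    ka = 10.0 ** (-pka)
    return s_w + math.sqrt(s_w * ka)


# Hypothetical solute, used only to exercise the functions.
log_sw = gse_log_sw(mp_celsius=125.0, log_kow=2.0)
s_w = 10.0 ** log_sw
print("log S_w:", log_sw)
print("buffered at pH 7:", total_solubility_buffered_acid(s_w, ph=7.0, pka=4.0))
print("unbuffered:", total_solubility_unbuffered_acid(s_w, pka=4.0))
```

Because the equation has no fitted parameters, the only inputs are the measured (or estimated) melting point, log K_ow and, for weak acids, pK_a.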

Journal ArticleDOI
TL;DR: In this article, the validity of log P calculations is checked for the substructure methods CLOGP, KOWWIN, and AB/logP and the whole-molecule method SciLogP via experimental log P for 174 molecules, comprising 90 simple organics and 84 more complex drugs.
Abstract: The validity of log P calculations is checked for the substructure methods CLOGP, KOWWIN, and AB/logP and the whole-molecule method SciLogP via experimental log P for 174 molecules, comprising 90 simple organics and 84 more complex drugs. Averaged absolute residual sums (AARS) give the following ranking for the entire set: CLOGP > KOWWIN AB/logP > SciLogP. Separate analysis of simple organics yields: CLOGP > KOWWIN > AB/logP > SciLogP. For the drugs we find: CLOGP KOWWIN AB/LogP > SciLogP. In a second step, we compared the validity of the calculation programs focussing on structural factors with a critical impact on log P such as resonance and H-bonding interactions. AARS values show that CLOGP and KOWWIN scored slightly better than AB/LogP and SciLogP; this agrees with the good performance of CLOGP and KOWWIN when dealing with simple compounds. AB/LogP averaged correction factors obtained from both simple and complex compounds, so it produced a slightly lower accuracy. α-Effects, representing strong interactions between conjugated n-electrons within polar functional groups, were identified from compounds lacking "isolating carbons", which break α-effects. All compounds in this data set are difficult to deal with for the substructure methods, but should be easy to deal with for the whole-molecule approach. In practice, however, SciLogP performed worse than the substructure methods. The best performance was shown by CLOGP, followed by KOWWIN and AB/LogP. Taken together, all substructure methods produced better results than the whole-molecule method. The possible explanation may be that substructure methods automatically account for unknown effects by splitting compounds into fragments and/or conducting class-specific analyses. Whole-molecule approaches cannot account for unknown effects, as long as they neglect class-specific analyses. Among the substructure approaches, our results correlate with the methodology of algorithm development. 
CLOGP and KOWWIN were developed in a long iterative process, using simple organics for increment derivation and complex drugs for algorithm refinement. AB/LogP was developed in a fast two-step procedure that did not discriminate between simple and complex compounds. So it produced slightly lower accuracy for simple organics, but not lower accuracy for the complex drugs.

Journal ArticleDOI
TL;DR: The importance of model domain, uncertainty, validity and predictability assessment in promoting the regulatory acceptance of QSARs is discussed.
Abstract: For Quantitative Structure Activity Relationships (QSARs) to be accepted by the regulated and regulatory communities, their scope for use needs to be agreed upon by government and industry. This paper discusses the importance of model domain, uncertainty, validity and predictability assessment in promoting the regulatory acceptance of QSARs.

Journal ArticleDOI
TL;DR: In this article, the rate constant for the tropospheric degradation of 125 organic compounds by reaction with ozone, the least widely and successfully modelled degradation process, is predicted by MLR-QSAR modeling based on a variety of theoretical molecular descriptors, selected by the GA-VSS procedure.
Abstract: The lifetime of organic chemicals in the atmosphere can be calculated from a knowledge of the rate constant of their reaction with free radicals ($\mathrm{OH^{\bullet}}$, $\mathrm{NO_3^{\bullet}}$) and $\mathrm{O_3}$. The rate constant for the tropospheric degradation of 125 organic compounds by reaction with ozone, the least widely and successfully modelled degradation process, is predicted here by MLR-QSAR modelling based on a variety of theoretical molecular descriptors, selected by the Genetic Algorithm-Variable Subset Selection (GA-VSS) procedure. The proposed models, checked for their reliability and robustness, have good predictivity, verified by internal ($Q^2_{\mathrm{LMO}(50\%)}$ = 82-88%) and also external validation ($Q^2_{\mathrm{EXT}}$ = 90%). The best splitting of the original data set into representative training and test sets has been obtained by the Experimental Design approach. The model applicability domain was always verified by the leverage approach. The average root-mean square error (RMS) for the prediction of $\log k_{\mathrm{O_3}}$ was 0.73, similar to the typical range of experimental error.
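
The external-validation statistics reported here can be illustrated with a short Python sketch. The abstract does not give its formula for the external Q², so the commonly used definition, one minus the ratio of the test-set prediction error sum of squares to the test-set deviations from the training-set mean, is an assumption; the function names and the numbers are hypothetical.

```python
import math


def q2_ext(y_test, y_pred, y_train_mean):
    """External Q2 = 1 - PRESS / SS (definition assumed, see the lead-in)."""
    press = sum((y - p) ** 2 for y, p in zip(y_test, y_pred))
    ss = sum((y - y_train_mean) ** 2 for y in y_test)
    if ss == 0.0:
        raise ValueError("test-set responses do not vary around the training mean")
    return 1.0 - press / ss


def rms_error(y_obs, y_pred):
    """Root-mean-square error between observed and predicted responses."""
    squared = [(y - p) ** 2 for y, p in zip(y_obs, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical observed and predicted values for a four-compound test set.
observed = [-17.2, -16.1, -18.4, -15.9]
predicted = [-17.0, -16.5, -18.0, -16.2]
print("Q2_ext:", q2_ext(observed, predicted, y_train_mean=-17.0))
print("RMS:", rms_error(observed, predicted))
```

Several definitions of the external Q² are in use, so a value computed this way is comparable with a published figure only if the same definition was used.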

Journal ArticleDOI
TL;DR: A series of quantitative structure-activity relationship models are presented to describe the in vitro hormone activity (estrogen receptor binding, reporter gene induction, and cell proliferation) of bisphenol A and 24 of its analogs, suggesting that it may be possible to use such structure activities to develop bisphenols that are useful monomers with reduced hormone activity.
Abstract: Bisphenol A is a monomer constituent of epoxy and polycarbonate resins used in consumer products. Many studies have shown that bisphenol A is a weak estrogen receptor agonist with endocrine disrupting potential in exposed organisms. Presented here is a series of quantitative structure-activity relationship models to describe the in vitro hormone activity (estrogen receptor binding, reporter gene induction, and cell proliferation) of bisphenol A and 24 of its analogs. The hormone activity ranged over four orders of magnitude, with bisphenol A displaying intermediate activity. Comparative molecular field analysis, comparative molecular similarity indices, and hologram quantitative structure activity models were generated using SYBYL 6.8. Bisphenols with optimal estrogen activity contained two unencumbered phenolic groups in the para orientation, and multiple alkyl substituents extending from the carbon linking phenolic rings. Bisphenols with methyl group hydrogens replaced by halogens also produced strong estrogenic analogs. These studies suggest that it may be possible to use such structure activities to develop bisphenols that are useful monomers with reduced hormone activity.

Journal ArticleDOI
TL;DR: The SVM is shown to be competitive with techniques representing a state-of-the-art on three challenging pharmaceutical classification tasks and demonstrates good potential for further use in this area of drug discovery.
Abstract: This work describes the application of a recent addition to the data analysis toolbox, Support Vector Machines (SVMs) [1], to a classification task involved in contemporary drug discovery. A brief introduction to the SVM method, which relates the theoretical background of the algorithm to the familiar concepts of the perceptron algorithm and basis functions, is followed by a discussion of the relative advantages and disadvantages of using SVMs for supervised machine learning. A complex binary classification scenario presented by pharmaceutical modelling is described. References are provided throughout, to direct the reader towards more detailed descriptions of the work discussed. The remainder of the work describes an experimental methodology for the comparison of an SVM and several other supervised machine learning techniques when applied to real data provided by GlaxoSmithKline. The SVM is shown to be competitive with techniques representing a state-of-the-art on three challenging pharmaceutical classification tasks and demonstrates good potential for further use in this area of drug discovery.

Journal ArticleDOI
TL;DR: ILP has advantages over many other widely used methods as it can reason with relations and hence discover chemical substructures and 3D features without these aspects having been explicitly encoded prior to learning.
Abstract: The application of Inductive Logic Programming (ILP), a form of machine learning, to derive structure activity relationships (SAR) and to discover pharmacophores is reported. The ILP approach was initially applied to model 1D SARs in terms of the attributes of the molecules. Subsequently 2D ILP SARs were developed describing chemical connectivity. Finally ILP has been used to model 3D SARs in which the conformation of the pharmacophore can be described. ILP has advantages over many other widely used methods as it can reason with relations and hence discover chemical substructures and 3D features without these aspects having been explicitly encoded prior to learning. In particular, there is no requirement for a structural superposition. Additionally, the results of ILP provide chemical descriptions that can readily be understood by a medicinal chemist. In several trials, ILP-based SARs have been shown to be significantly more accurate than widely-used methods.

Journal ArticleDOI
TL;DR: In this paper, the SPARC chemical reactivity models were extended to calculate hydrolysis rate constants for carboxylic acid esters from molecular structure, and the energy differences between the initial state and the transition state for a molecule of interest were factored into internal and external mechanistic perturbation components.
Abstract: SPARC chemical reactivity models were extended to calculate hydrolysis rate constants for carboxylic acid esters from molecular structure. The energy differences between the initial state and the transition state for a molecule of interest are factored into internal and external mechanistic perturbation components. The internal perturbations quantify the interactions of the appended perturber (P) with the reaction center (C). These internal perturbations are factored into SPARC's mechanistic components of electrostatic and resonance effects. External perturbations quantify the solute-solvent interactions and are factored into H-bonding, field stabilization and steric effects. These models have been tested using 1471 measured hydrolysis rate constants in water and mixed-solvent systems at different temperatures. The aggregate RMS deviation of the calculated versus observed values was 0.374 $\mathrm{M^{-1}\,s^{-1}}$, close to the intralaboratory experimental error.

Journal ArticleDOI
TL;DR: This work focuses on the utilization of a fuzzy pharmacophore description of molecular similarity and specifically on the influence of fuzzy pharmacophore pattern matching on the neighborhood behavior (NB) of the similarity scoring scheme.
Abstract: The similarity principle, stating that molecules of similar structure behave similarly, is an important concept in medicinal chemistry. A properly characterized and well-understood neighborhood behavior of the structural space versus the activity space is fundamental for the application of the similarity principle in computational chemistry. In this work we focus on the utilization of a fuzzy pharmacophore description of molecular similarity and specifically on the influence of fuzzy pharmacophore pattern matching on the neighborhood behavior (NB) of the similarity scoring scheme. NB is defined as a structure-activity relationship between the intermolecular distances/dissimilarities in the pharmacophore fingerprint structure space and the corresponding activity differences, formally seen as intermolecular distances in the activity spaces. The latter are defined on the basis of a wide variety of datasets on pharmacological and physico-chemical properties and property profiles. We also investigate the clustering behavior (CB), where the structure-activity relationship is described in terms of distance-derived associations of compounds into clusters via classical hierarchical clustering procedures. The neighborhood behavior and the clustering behavior provide alternative and complementary criteria for evaluating the pertinence of a molecular similarity metric.

Journal ArticleDOI
TL;DR: The SMILIB offers the possibility to construct very large combinatorial libraries using the flexible and portable SMILES format, which allows for creation of easily customized libraries using linkers of different size and chemical nature.
Abstract: A software tool was developed for fast combinatorial library enumeration (SMILIB). Its particular features are its ease of use, high flexibility in constructing combinatorial libraries and high speed of library construction. SMILIB offers the possibility to construct very large combinatorial libraries using the flexible and portable SMILES format. Libraries are generated at rates of approximately 30,000 molecules per minute. Combinatorial building blocks are attached to scaffolds by means of linkers rather than being concatenated directly. This allows for creation of easily customized libraries using linkers of different size and chemical nature. A web interface for a limited web-based version of the software is available at URL: www.modlab.de. An unlimited binary version of SMILIB for command line execution on Linux systems is available from this URL.
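
The scaffold, linker and building-block scheme described here can be sketched as plain string substitution on SMILES. This Python sketch is not SMILIB and does not reproduce its input format: the `[R1]` attachment marker, the function name and the example fragments are all hypothetical, and each scaffold is assumed to carry a single attachment site.

```python
from itertools import product

# Hypothetical attachment marker; SMILIB's own input syntax may differ.
ATTACH = "[R1]"


def enumerate_library(scaffolds, linkers, blocks):
    """Yield one SMILES string per scaffold/linker/building-block combination."""
    for scaffold, linker, block in product(scaffolds, linkers, blocks):
        # The linker is written first, so it sits between scaffold and block.
        yield scaffold.replace(ATTACH, linker + block)


# Acyclic fragments only: ring-closure digits in a fragment could collide
# with digits still open in the scaffold, which this naive sketch ignores.
scaffolds = ["c1ccc([R1])cc1"]
linkers = ["C", "CC"]
blocks = ["O", "N", "Cl"]

library = list(enumerate_library(scaffolds, linkers, blocks))
for smiles in library:
    print(smiles)
```

The library size is the product of the three list lengths, which is why enumeration speed matters once the lists grow to thousands of entries.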

Journal ArticleDOI
TL;DR: In this article, a detailed QSPR investigation of the water solubility (logS) of 1063 solid neutral chemicals, agrochemicals, drugs and prodrugs has been carried out.
Abstract: A detailed QSPR investigation of the water solubility (logS) of 1063 solid neutral chemicals, agrochemicals, drugs and prodrugs has been carried out. The application of the "General Solubility Equation" of Yalkowsky et al. resulted in a correlation between experimental and calculated solubility with rather modest statistical criteria. It was found that 191 compounds (18%) have calculated logS with deviations above one logarithmic unit. A comparison of experimental values with those calculated by an equation previously derived for liquid chemicals demonstrates that a part of the total set of compounds containing certain substructures such as chloroalkyls, phenyls, biphenyls, nitrogen (acyclic and in cycles), benzanthracenes, phenols, and chemicals with ether, ester, and N,N-disubstituted carboxamide groups obey the equation for liquids. Not unexpectedly, the other part of compounds containing both strong H-bond acceptor and donor groups is essentially less soluble than calculated for compounds in the liquid state. It is proposed that this low solubility is connected with a specific H-bond association of these compounds in their crystal lattice. Two approaches were tested for the derivation of quantitative models for prediction of the solubility of solid chemicals and drugs. The first approach is based on the QSPR for liquids extended by several indicator variables for different functional groups. The second approach is based on the combination of chemical similarity and traditional QSAR techniques. In the framework of this approach different numbers of structural and physicochemical neighbors of a compound-of-interest were considered together with the corresponding HYBOT descriptor data. This method makes it possible to calculate the solubility of a compound-of-interest by using the solubility of nearest neighbor compounds and the difference between descriptor data of those neighbors and the corresponding data of the compound-of-interest. 
It was found that the use of nearest neighbor compounds with high Tanimoto index value (i.e. good structural similarity) and close H-bond acceptor and donor factors (good physicochemical similarity) ensures good prediction of water solubility with average absolute errors on the level of the error of experimental logS determination.
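
Selecting structural neighbors by Tanimoto index, the first step of the second approach, can be sketched in Python as follows. The fingerprints are represented as sets of "on" bit positions, and the compound names, bit sets and logS values are hypothetical; the descriptor-difference correction that the abstract applies after neighbor selection is deliberately left out, since its coefficients are not given here.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto index of two fingerprints given as sets of 'on' bit positions."""
    common = len(fp_a & fp_b)
    union = len(fp_a) + len(fp_b) - common
    return common / union if union else 0.0


def nearest_neighbors(query_fp, reference, k=1):
    """Return the k reference entries most similar to the query fingerprint.

    reference is a list of (name, fingerprint, logS) tuples.
    """
    ranked = sorted(reference, key=lambda entry: tanimoto(query_fp, entry[1]),
                    reverse=True)
    return ranked[:k]


# Hypothetical reference compounds: name, fingerprint bits, experimental logS.
reference = [
    ("ref_a", {1, 2, 3, 4}, -2.1),
    ("ref_b", {1, 2, 7}, -3.4),
    ("ref_c", {8, 9}, -0.5),
]
query = {1, 2, 3, 5}

for name, fp, log_s in nearest_neighbors(query, reference, k=2):
    print(name, round(tanimoto(query, fp), 2), log_s)
```

In the scheme the abstract describes, the neighbor's logS would then be adjusted by the difference in H-bond descriptor values between neighbor and query.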

Journal ArticleDOI
TL;DR: In this article, partial order ranking of QSAR-based estimates of biodegradation, bioaccumulation and aquatic toxicity is proposed as a decision support tool to facilitate subsequent pollution prevention activities by regulated and regulatory communities.
Abstract: The selection of Persistent, Bioaccumulative and Toxic (PBT) substances constitutes an important task in the possible regulation of chemicals. Due to the shortage of experimental data, Quantitative Structure Activity Relationship (QSAR) estimates for these endpoints appear as an attractive alternative. In the present study, biodegradation, bioaccumulation and aquatic toxicity estimates were obtained using the BioWin, BCFWin and ECOSAR modules of the EPI suite. Partial order theory was used to rank QSAR-based PBT substances. The proposed approach is suggested as a decision support tool to facilitate subsequent pollution prevention activities by regulated and regulatory communities.
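
Partial order ranking compares substances on all criteria at once instead of collapsing them into a single score. The Python sketch below shows the underlying dominance relation; the substance labels and scores are hypothetical, and it assumes each criterion has been oriented so that a larger value means greater concern.

```python
def dominates(x, y):
    """True if x is at least as high as y on every criterion and higher on one."""
    return all(a >= b for a, b in zip(x, y)) and any(a > b for a, b in zip(x, y))


def maximal_elements(scores):
    """Names of substances that no other substance dominates."""
    return sorted(
        name
        for name, s in scores.items()
        if not any(dominates(t, s) for other, t in scores.items() if other != name)
    )


# Hypothetical (persistence, bioaccumulation, toxicity) scores.
scores = {
    "A": (3, 3, 3),
    "B": (2, 3, 1),
    "C": (3, 1, 2),
    "D": (1, 4, 1),
}

print("top-level substances:", maximal_elements(scores))
```

Substances such as A and D in this example are incomparable: neither dominates the other, so a partial order places both at the top level rather than forcing a ranking between them.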