Journal Article

Variable Selection in QSAR Studies. II. A Highly Efficient Combination of Systematic Search and Evolution

Hugo Kubinyi
- 01 Jan 1994
- Vol. 13, Iss. 4, pp. 393-401
TLDR
It is demonstrated that systematic search is the best strategy for regression models with two or three X variables, and nearly all relevant regression models are found by this combination of systematic search with the mutation/selection algorithm MUSEUM.
Abstract
Recently, two evolutionary strategies for the derivation of regression models have been described: genetic function approximation and the mutation/selection algorithm MUSEUM. The MUSEUM (Mutation and Selection Uncover Models) algorithm starts from a model containing randomly chosen variables. Random mutation produces new models, which are evaluated by an appropriate fitness function. The mutations first add or eliminate only one or very few variables. Later they make simultaneous random additions, eliminations and/or exchanges of several variables at a time. Only the “fittest” model is stored and used for further mutation and selection, leading to progressively better models. However, the fitness of all models with up to three X variables can be determined much faster. This is done by calculating the multiple correlation coefficients $r_{y.ij}$ and $r_{y.ijk}$ from the simple and partial correlation coefficients $r_{yi}$, $r_{ij}$, $r_{yj.i}$, $r_{jk.i}$ and $r_{yk.ij}$. Using the Selwood data set (n = 31 compounds, k = 53 variables), it is demonstrated that systematic search is the best strategy for regression models with two or three X variables. The variables contained in the best three-variable models can then be selected for further investigation with the evolutionary approach. This combination of systematic search with the mutation/selection algorithm MUSEUM finds nearly all relevant regression models, with the exception of complex models containing six or more variables. The results are obtained in considerably shorter time than when all variables are included in the calculations. In addition, systematic search is a valuable tool for variable selection prior to stepwise regression and PLS analyses.
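The shortcut for two- and three-variable models can be sketched with the standard recursion for partial correlations. The paper's own formulation is assumed, not confirmed, to follow these textbook identities:

$$r^2_{y.ij} = r^2_{yi} + (1 - r^2_{yi})\, r^2_{yj.i}, \qquad r^2_{y.ijk} = r^2_{y.ij} + (1 - r^2_{y.ij})\, r^2_{yk.ij},$$

$$r_{yj.i} = \frac{r_{yj} - r_{yi}\, r_{ij}}{\sqrt{(1 - r^2_{yi})(1 - r^2_{ij})}}, \qquad r_{yk.ij} = \frac{r_{yk.i} - r_{yj.i}\, r_{jk.i}}{\sqrt{(1 - r^2_{yj.i})(1 - r^2_{jk.i})}}.$$

With these identities, every pair and triple of variables can be scored from one correlation matrix, without fitting a regression per subset. The Python sketch below illustrates this. The function names `partial_r` and `best_models` are hypothetical. The demo uses synthetic random data with the Selwood dimensions (31 × 53), not the Selwood data set itself.

```python
import itertools

import numpy as np


def partial_r(r_ab, r_ac, r_bc):
    """Partial correlation r_{ab.c} from r_{ab}, r_{ac}, r_{bc}.

    The same formula raises the order by one when all three inputs are
    already partial correlations on a common set of variables.
    Assumes |r_ac| < 1 and |r_bc| < 1 (no perfectly collinear variables).
    """
    return (r_ab - r_ac * r_bc) / np.sqrt((1.0 - r_ac**2) * (1.0 - r_bc**2))


def best_models(X, y, size, top=5):
    """Systematic search: rank every 2- or 3-variable subset of X by r^2.

    Only the correlation matrix of [y, X] is computed; no per-subset regression.
    Returns a list of (r2, column_indices) sorted by decreasing r2.
    """
    if size not in (2, 3):
        raise ValueError("size must be 2 or 3")
    R = np.corrcoef(np.column_stack([y, X]), rowvar=False)
    ry, rx = R[0, 1:], R[1:, 1:]  # r_{yi} and r_{ij}
    p = X.shape[1]
    results = []
    for i, j in itertools.combinations(range(p), 2):
        ryj_i = partial_r(ry[j], ry[i], rx[i, j])       # r_{yj.i}
        r2_ij = ry[i]**2 + (1.0 - ry[i]**2) * ryj_i**2  # r^2_{y.ij}
        if size == 2:
            results.append((float(r2_ij), (i, j)))
            continue
        # r^2_{y.ij} and r_{yj.i} are reused for every third variable k > j
        for k in range(j + 1, p):
            ryk_i = partial_r(ry[k], ry[i], rx[i, k])        # r_{yk.i}
            rjk_i = partial_r(rx[j, k], rx[i, j], rx[i, k])  # r_{jk.i}
            ryk_ij = partial_r(ryk_i, ryj_i, rjk_i)          # r_{yk.ij}
            r2 = r2_ij + (1.0 - r2_ij) * ryk_ij**2           # r^2_{y.ijk}
            results.append((float(r2), (i, j, k)))
    results.sort(key=lambda t: t[0], reverse=True)
    return results[:top]


if __name__ == "__main__":
    # Synthetic data with the Selwood dimensions (n = 31, k = 53); NOT the real data set
    rng = np.random.default_rng(0)
    X = rng.normal(size=(31, 53))
    y = X[:, 3] - 0.5 * X[:, 10] + 0.3 * X[:, 20] + rng.normal(scale=0.5, size=31)
    for r2, cols in best_models(X, y, size=3, top=3):
        print(f"{cols}: r^2 = {r2:.3f}")
```

The inner loop reuses $r^2_{y.ij}$ and $r_{yj.i}$ for every third variable. Each candidate triple therefore costs only a few scalar operations, which is the source of the speed advantage over fitting each model.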
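The mutation/selection loop described in the abstract can be sketched as a simple hill climber. This is not Kubinyi's MUSEUM program. Several details are hypothetical choices made for illustration:

- the fitness function (adjusted r² here);
- the switch, halfway through the run, from single add/drop steps to up to three simultaneous additions, eliminations or exchanges;
- the three-variable starting model;
- the names `fitness`, `mutate` and `evolve`.

The `pool` argument is where a combination with systematic search could enter. Passing only the variables that occur in the best three-variable models shrinks the search space.

```python
import numpy as np


def r2(X, y, cols):
    """Ordinary least-squares r^2 of y on the columns `cols` (with intercept)."""
    A = np.column_stack([np.ones(len(y)), X[:, sorted(cols)]])
    coef, _, _, _ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)


def fitness(X, y, cols):
    """Hypothetical fitness: adjusted r^2, which penalizes extra variables."""
    n, m = len(y), len(cols)
    if m == 0 or m >= n - 1:
        return -np.inf
    return 1.0 - (1.0 - r2(X, y, cols)) * (n - 1) / (n - m - 1)


def mutate(cols, pool, rng, max_changes, ops):
    """Apply 1..max_changes random additions, eliminations and/or exchanges."""
    new = set(cols)
    for _ in range(rng.integers(1, max_changes + 1)):
        op = rng.choice(ops)
        outside = [v for v in pool if v not in new]
        if op == "add" and outside:
            new.add(int(rng.choice(outside)))
        elif op == "drop" and len(new) > 1:
            new.remove(int(rng.choice(sorted(new))))
        elif op == "swap" and outside:
            new.remove(int(rng.choice(sorted(new))))
            new.add(int(rng.choice(outside)))
    return frozenset(new)


def evolve(X, y, pool, generations=2000, seed=0):
    """Store only the fittest model; replace it whenever a mutant is fitter."""
    rng = np.random.default_rng(seed)
    pool = [int(v) for v in pool]
    best = frozenset(rng.choice(pool, size=min(3, len(pool)), replace=False).tolist())
    best_fit = fitness(X, y, best)
    for g in range(generations):
        if g < generations // 2:
            # Early phase: add or eliminate a single variable
            child = mutate(best, pool, rng, 1, ["add", "drop"])
        else:
            # Later phase: several simultaneous additions, eliminations, exchanges
            child = mutate(best, pool, rng, 3, ["add", "drop", "swap"])
        f = fitness(X, y, child)
        if f > best_fit:
            best, best_fit = child, f
    return sorted(best), best_fit


if __name__ == "__main__":
    # Synthetic data (NOT the Selwood set): y depends on columns 0, 3 and 5
    rng = np.random.default_rng(0)
    X = rng.normal(size=(31, 12))
    y = X[:, 0] - 0.5 * X[:, 3] + 0.8 * X[:, 5] + rng.normal(scale=0.3, size=31)
    cols, fit = evolve(X, y, pool=range(X.shape[1]))
    print(cols, round(fit, 3))
```

Only one model is carried from generation to generation, so memory use is trivial. The cost is dominated by one least-squares fit per mutant. This is why pre-selecting a smaller pool pays off.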


References
Journal Article

Application of Genetic Function Approximation to Quantitative Structure-Activity Relationships and Quantitative Structure-Property Relationships

TL;DR: The genetic function approximation (GFA) algorithm is applied to three published data sets to demonstrate that it is an effective tool for both QSAR and QSPR.
Journal Article

Chance factors in studies of quantitative structure-activity relationships.

TL;DR: Simulated QSAR studies employing random numbers were run with a modified Fortran stepwise multiple-regression program. A substantial incidence of correlations with high r² values was found, although the overall degree of chance correlation was less than that reported in a previous study.
Journal Article

Variable Selection in QSAR Studies. I. An Evolutionary Algorithm

TL;DR: A comparison of the results for the Selwood data set with those obtained by other groups shows that more relevant models are derived by the evolutionary approach than by other methods.
Journal Article

Structure-activity relationships of antifilarial antimycin analogues: a multivariate pattern recognition study.

TL;DR: Analysis of the structure-activity relationships of a series of novel antifilarial antimycin A1 analogues indicated that membrane or lipid solubility is an important determinant of biological activity. This agrees with the proposed primary mode of action of the compounds as disrupters of cuticular glucose uptake.