Machine learning methods in chemoinformatics
TL;DR
This discussion is methods-based and focused on some algorithms that chemoinformatics researchers frequently use. Particularly relevant approaches include Artificial Neural Networks, Random Forest, Support Vector Machine, k-Nearest Neighbors and naïve Bayes classifiers.
Abstract:
Machine learning algorithms are generally developed in computer science or adjacent disciplines and find their way into chemical modeling by a process of diffusion. Though particular machine learning methods are popular in chemoinformatics and quantitative structure–activity relationships (QSAR), many others exist in the technical literature. This discussion is methods-based and focused on some algorithms that chemoinformatics researchers frequently use. It makes no claim to be exhaustive. We concentrate on methods for supervised learning, predicting the unknown property values of a test set of instances, usually molecules, based on the known values for a training set. Particularly relevant approaches include Artificial Neural Networks, Random Forest, Support Vector Machine, k-Nearest Neighbors and naive Bayes classifiers. WIREs Comput Mol Sci 2014, 4:468–481.
How to cite this article: WIREs Comput Mol Sci 2014, 4:468–481. doi:10.1002/wcms.1183
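The supervised-learning setting described in the abstract (predicting unknown property values for test molecules from a training set with known values) can be illustrated with a minimal k-Nearest Neighbors sketch. This is not code from the article. It assumes molecules are represented as sets of "on" fingerprint bits and compared by Tanimoto similarity, a common but here assumed choice. The function names `tanimoto` and `knn_predict`, the toy fingerprints, and the "active"/"inactive" labels are all hypothetical.

```python
from collections import Counter


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k most similar training items.

    `train` is a list of (fingerprint, label) pairs; fingerprints are sets.
    """
    ranked = sorted(train, key=lambda item: tanimoto(item[0], query), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy training set: fingerprints with known labels.
train = [
    (frozenset({1, 2, 3, 4}), "active"),
    (frozenset({1, 2, 3, 5}), "active"),
    (frozenset({2, 3, 4, 6}), "active"),
    (frozenset({7, 8, 9, 10}), "inactive"),
    (frozenset({7, 8, 9, 11}), "inactive"),
    (frozenset({8, 9, 10, 12}), "inactive"),
]

# Hypothetical test set: fingerprints whose labels are treated as unknown.
test = [frozenset({1, 2, 3, 6}), frozenset({7, 8, 10, 11})]

predictions = [knn_predict(train, fp, k=3) for fp in test]
print(predictions)  # ['active', 'inactive']
```

In practice the fingerprints would come from a chemoinformatics toolkit and `k` would be chosen by validation on held-out data. The sketch only shows the train-then-predict pattern shared by the methods listed in the abstract.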