Machine learning methods in chemoinformatics

doi:10.1002/WCMS.1183

Open Access Journal Article

John B. O. Mitchell

- 01 Sep 2014 -

Wiley Interdisciplinary Reviews: Computa...

- Vol. 4, Iss: 5, pp 468-481

TLDR

This discussion is methods-based and focused on some algorithms that chemoinformatics researchers frequently use. Particularly relevant approaches include Artificial Neural Networks, Random Forest, Support Vector Machine, k-Nearest Neighbors and naïve Bayes classifiers.

Abstract:

Machine learning algorithms are generally developed in computer science or adjacent disciplines and find their way into chemical modeling by a process of diffusion. Though particular machine learning methods are popular in chemoinformatics and quantitative structure–activity relationships (QSAR), many others exist in the technical literature. This discussion is methods-based and focused on some algorithms that chemoinformatics researchers frequently use. It makes no claim to be exhaustive. We concentrate on methods for supervised learning, that is, predicting the unknown property values of a test set of instances, usually molecules, based on the known values for a training set. Particularly relevant approaches include Artificial Neural Networks, Random Forest, Support Vector Machine, k-Nearest Neighbors and naïve Bayes classifiers. How to cite this article: WIREs Comput Mol Sci 2014, 4:468–481. doi:10.1002/wcms.1183
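The supervised-learning setup described in the abstract (known values for a training set, predicted values for a test set) can be illustrated with a minimal sketch of one of the named methods, k-Nearest Neighbors. The descriptor values, the class labels, the function name `knn_predict`, and the choice of Euclidean distance are all assumptions made for illustration; none of them comes from the article.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def knn_predict(train_X, train_y, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest training instances."""
    # Rank training instances by Euclidean distance to the query point.
    ranked = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], query))
    # Count the labels of the k closest instances and return the most common one.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training set: each molecule is a pair of made-up descriptor values.
train_X = [(1.0, 0.5), (1.2, 0.4), (0.9, 0.7), (3.1, 2.8), (2.9, 3.2), (3.3, 3.0)]
train_y = ["inactive", "inactive", "inactive", "active", "active", "active"]

# Hypothetical test set whose labels are treated as unknown.
test_X = [(1.1, 0.6), (3.0, 3.0)]
predictions = [knn_predict(train_X, train_y, x, k=3) for x in test_X]
print(predictions)
```

In practice the descriptors would be computed from molecular structures and the distance measure chosen to suit them; an odd `k` avoids tied votes in a two-class problem like this one.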

Citations

Journal Article

SwissADME: A free web tool to evaluate pharmacokinetics, drug-likeness and medicinal chemistry friendliness of small molecules

Antoine Daina, +4 more

- 03 Mar 2017 -

Scientific Reports

TL;DR: The SwissADME web tool gives free access to a pool of fast yet robust predictive models for physicochemical properties, pharmacokinetics, drug-likeness and medicinal chemistry friendliness, including in-house methods such as the BOILED-Egg, iLOGP and Bioavailability Radar.

Journal Article

MoleculeNet: a benchmark for molecular machine learning

Zhenqin Wu, +7 more

- 03 Jan 2018 -

Chemical Science

TL;DR: MoleculeNet is a large-scale benchmark for molecular machine learning, consisting of multiple public datasets, metrics, featurizations and learning algorithms.

Journal Article

Generating Focused Molecule Libraries for Drug Discovery with Recurrent Neural Networks

Marwin H. S. Segler, +3 more

- 24 Jan 2018 -

ACS Central Science

TL;DR: This work shows that recurrent neural networks can be trained as generative models for molecular structures, similar to statistical language models in natural language processing, and demonstrates that the properties of the generated molecules correlate very well with those of the molecules used to train the model.

Journal Article

Machine-learning approaches in drug discovery: methods and applications.

Antonio Lavecchia

- 01 Mar 2015 -

Drug Discovery Today

TL;DR: This work focuses on machine-learning techniques within the context of ligand-based virtual screening (LBVS), providing a detailed view of the current state of the art in this field and highlighting not only the problematic issues, but also the successes and opportunities for further advances.

Journal Article

Machine Learning in Computer-Aided Synthesis Planning

Connor W. Coley, +2 more

- 01 May 2018 -

Accounts of Chemical Research

TL;DR: This work focuses on two critical aspects of computer-aided synthesis planning (CASP) and on recent machine learning approaches to both: the problem of retrosynthetic planning, and anticipating the products of chemical reactions, which can be used to validate proposed reactions in a computer-generated synthesis plan.

References

Journal Article

R: A language and environment for statistical computing.

R Core Team

- 01 Jan 2014 -

MSOR connections

TL;DR: Copyright (©) 1999–2012 R Foundation for Statistical Computing; permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and permission notice are preserved on all copies.

Journal Article

Random Forests

Leo Breiman

TL;DR: Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the forest, and are also applicable to regression.

Posted Content

Improving neural networks by preventing co-adaptation of feature detectors

Geoffrey E. Hinton, +4 more

- 03 Jul 2012 -

arXiv: Neural and Evolutionary Computing

TL;DR: The authors randomly omit half of the feature detectors on each training case to prevent complex co-adaptations in which a feature detector is only helpful in the context of several other specific feature detectors.

Journal Article

A comparison of methods for multiclass support vector machines

Hsu Chih-Wei, +1 more

- 01 Mar 2002 -

IEEE Transactions on Neural Networks

TL;DR: Decomposition implementations for two "all-together" multiclass SVM methods are given, and it is shown that, for large problems, methods that consider all data at once generally need fewer support vectors.

Journal Article

A systematic analysis of performance measures for classification tasks

Marina Sokolova, +1 more

- 01 Jul 2009 -

Information Processing and Management

TL;DR: This paper presents a systematic analysis of twenty-four performance measures used in the complete spectrum of machine learning classification tasks, i.e., binary, multi-class, multi-labelled, and hierarchical, to produce a measure invariance taxonomy with respect to all relevant label distribution changes in a classification problem.
