Fragment Database FDB-17.

TLDR
A much smaller subset of GDB-17 is selected, called the fragment database FDB-17, which contains 10 million fragmentlike molecules evenly covering a broad value range for molecular size, polarity, and stereochemical complexity.
Abstract
To better understand chemical space, we recently enumerated the database GDB-17, which contains 166.4 billion possible molecules of up to 17 atoms of C, N, O, S, and halogen, following simple rules of chemical stability and synthetic feasibility. However, because of the combinatorial explosion caused by systematic enumeration, GDB-17 is strongly biased toward the largest and most functionally and stereochemically complex molecules, and it is far too large for most virtual screening tools. Herein we selected a much smaller subset of GDB-17, called the fragment database FDB-17, which contains 10 million fragmentlike molecules evenly covering a broad value range for molecular size, polarity, and stereochemical complexity. The database is available at www.gdb.unibe.ch for download and free use, together with an interactive visualization application and a Web-based nearest neighbor search tool to facilitate the selection of new fragment-sized molecules for chemical synthesis.
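The abstract does not spell out how the even coverage is obtained. As a rough illustration only, one way to flatten a size-biased collection is to bin molecules by a descriptor triplet and cap the number kept per bin. The function name `sample_evenly`, the integer descriptors, and the per-bin cap below are all assumptions made for this sketch, not the authors' procedure.

```python
import random
from collections import defaultdict


def sample_evenly(molecules, per_bin, seed=0):
    """Keep at most `per_bin` molecules from each descriptor bin.

    `molecules` is an iterable of (smiles, size, polarity, stereo) tuples.
    The three descriptors are hypothetical integer values standing in for
    molecular size, polarity, and stereochemical complexity.
    """
    rng = random.Random(seed)  # fixed seed so the selection is reproducible

    # Group molecules by their (size, polarity, stereo) triplet.
    bins = defaultdict(list)
    for smiles, size, polarity, stereo in molecules:
        bins[(size, polarity, stereo)].append(smiles)

    # Down-sample crowded bins; keep sparse bins in full.
    selected = []
    for key in sorted(bins):
        members = bins[key]
        if len(members) > per_bin:
            members = rng.sample(members, per_bin)
        selected.extend(members)
    return selected


# Toy input: one crowded bin of large molecules, one sparse bin of small ones.
toy = [(f"big_{i}", 17, 5, 4) for i in range(100)]
toy += [("small_0", 5, 1, 0), ("small_1", 5, 1, 0)]

subset = sample_evenly(toy, per_bin=3)
```

With this toy input the crowded bin is cut from 100 entries to 3 while the sparse bin keeps both of its entries, so small molecules are no longer swamped by large ones.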

Citations

Concepts of Artificial Intelligence for Computer-Assisted Drug Discovery

TL;DR: The current state of the art of AI-assisted pharmaceutical discovery is discussed, including applications in structure- and ligand-based virtual screening, de novo drug design, physicochemical and pharmacokinetic property prediction, drug repurposing, and related aspects.

Transfer Learning for Drug Discovery

TL;DR: This perspective aims to provide an overview of transfer learning and related applications in drug discovery and to offer an outlook on the future development and application of transfer learning for drug discovery.

Visualization of very large high-dimensional data sets as minimum spanning trees.

TL;DR: This paper applies a new data visualization method, TMAP, capable of representing data sets of up to millions of data points and arbitrary high dimensionality as a two-dimensional tree, to the most used chemistry data sets including databases of molecules such as ChEMBL, FDB17, the Natural Products Atlas, DSSTox, as well as to the MoleculeNet benchmark collection of data sets.

Machine-learning structural and electronic properties of metal halide perovskites using a hierarchical convolutional neural network

TL;DR: It is shown that a well-designed hierarchical ML approach predicts the properties of the MHPs with higher fidelity than straightforward methods, which underscores the importance of careful network design and a hierarchical approach to alleviate issues associated with imbalanced dataset distributions.

The Alexandria library, a quantum-chemical database of molecular properties for force field development.

TL;DR: The Alexandria library is presented as an open and freely accessible database of optimized molecular geometries, frequencies, electrostatic moments up to the hexadecupole, electrostatic potential, polarizabilities, and thermochemistry, obtained from quantum chemistry calculations for 2704 compounds.