Journal Article

The Cambridge Structural Database

01 Apr 2016 - Acta Crystallographica Section B Structural Crystallography and Crystal Chemistry (International Union of Crystallography) - Vol. 72, Iss. 2, pp. 171-179
TL;DR: The creation, maintenance, information content and availability of the Cambridge Structural Database (CSD), the world’s repository of small molecule crystal structures, are described.
Abstract: The Cambridge Structural Database (CSD) contains a complete record of all published organic and metal–organic small-molecule crystal structures. The database has been in operation for over 50 years and continues to be the primary means of sharing structural chemistry data and knowledge across disciplines. As well as structures that are made public to support scientific articles, it includes many structures published directly as CSD Communications. All structures are processed both computationally and by expert structural chemistry editors prior to entering the database. A key component of this processing is the reliable association of the chemical identity of the structure studied with the experimental data. This important step helps ensure that data is widely discoverable and readily reusable. Content is further enriched through selective inclusion of additional experimental data. Entries are available to anyone through free CSD community web services. Linking services developed and maintained by the CCDC, combined with the use of standard identifiers, facilitate discovery from other resources. Data can also be accessed through CCDC and third-party software applications and through an application programming interface.
Citations
Journal Article
TL;DR: An overview of Mercury 4.0, an analysis, design and prediction platform that acts as a hub for the entire Cambridge Structural Database software suite, is presented.
Abstract: The program Mercury, developed at the Cambridge Crystallographic Data Centre, was originally designed primarily as a crystal structure visualization tool. Over the years the fields and scientific communities of chemical crystallography and crystal engineering have developed to require more advanced structural analysis software. Mercury has evolved alongside these scientific communities and is now a powerful analysis, design and prediction platform which goes a lot further than simple structure visualization.

2,075 citations


Cites methods from "The Cambridge Structural Database"

  • ...The Mercury interface also acts as a hub for wider capabilities of the software suite built around the Cambridge Structural Database (CSD) (Allen, 2002; Groom et al., 2016)....

Journal Article
08 Aug 2019
TL;DR: A comprehensive overview and analysis of the most recent research in machine learning principles, algorithms, descriptors, and databases in materials science, and proposes solutions and future research paths for various challenges in computational materials science.
Abstract: One of the most exciting tools that have entered the materials science toolbox in recent years is machine learning. This collection of statistical methods has already proved to be capable of considerably speeding up both fundamental and applied research. At present, we are witnessing an explosion of works that develop and apply machine learning to solid-state systems. We provide a comprehensive overview and analysis of the most recent research in this topic. As a starting point, we introduce machine learning principles, algorithms, descriptors, and databases in materials science. We continue with the description of different machine learning approaches for the discovery of stable materials and the prediction of their crystal structure. Then we discuss research in numerous quantitative structure–property relationships and various approaches for the replacement of first-principles methods by machine learning. We review how active learning and surrogate-based optimization can be applied to improve the rational design process and related examples of applications. Two major questions are always the interpretability of and the physical understanding gained from machine learning models. We consider therefore the different facets of interpretability and their importance in materials science. Finally, we propose solutions and future research paths for various challenges in computational materials science.

1,301 citations


Cites methods from "The Cambridge Structural Database"

  • ...considered a larger training set of around 14,000 materials from the SuperCon database.(82) Superconductors were first classified into groups with Tc below and above 10 K, resulting in an accuracy and F1 score of about 92%....

Journal Article
TL;DR: A large-scale benchmark for molecular machine learning consisting of multiple public datasets, metrics, featurizations and learning algorithms.
Abstract: Molecular machine learning has been maturing rapidly over the last few years. Improved methods and the presence of larger datasets have enabled machine learning algorithms to make increasingly accurate predictions about molecular properties. However, algorithmic progress has been limited due to the lack of a standard benchmark to compare the efficacy of proposed methods; most new algorithms are benchmarked on different datasets, making it challenging to gauge the quality of proposed methods. This work introduces MoleculeNet, a large-scale benchmark for molecular machine learning. MoleculeNet curates multiple public datasets, establishes metrics for evaluation, and offers high-quality open-source implementations of multiple previously proposed molecular featurization and learning algorithms (released as part of the DeepChem open-source library). MoleculeNet benchmarks demonstrate that learnable representations are powerful tools for molecular machine learning and broadly offer the best performance. However, this result comes with caveats. Learnable representations still struggle to deal with complex tasks under data scarcity and highly imbalanced classification. For quantum mechanical and biophysical datasets, the use of physics-aware featurizations can be more important than the choice of a particular learning algorithm.

1,277 citations

Journal Article
01 Sep 2017 - IUCrJ
TL;DR: The accurate and efficient CE-B3LYP and CE-HF model energies for intermolecular interactions in molecular crystals are extended to a broad range of crystals by calibration against density functional results for molecule/ion pairs extracted from 171 crystal structures.

704 citations


Cites background from "The Cambridge Structural Database"

  • ...HF/3-21G monomer calculations also failed to converge for some open-shell molecules/ions [Cambridge Structural Database (CSD; Groom et al., 2016) refcodes ACACCR07, ACACVO04, CPNDYV07, IGACEC, JIYKEH, AFATAE and AGEFEX], and those structures were not included in the determination of scale factors…...

Journal Article
TL;DR: An efficient scheme for the in silico sampling for parts of the molecular chemical space by semiempirical tight-binding methods combined with a meta-dynamics driven search algorithm is proposed and discussed, opening many possible applications in modern computational chemistry and drug discovery.
Abstract: We propose and discuss an efficient scheme for the in silico sampling for parts of the molecular chemical space by semiempirical tight-binding methods combined with a meta-dynamics driven search algorithm. The focus of this work is set on the generation of proper thermodynamic ensembles at a quantum chemical level for conformers, but similar procedures for protonation states, tautomerism and non-covalent complex geometries are also discussed. The conformational ensembles consisting of all significantly populated minimum energy structures normally form the basis of further, mostly DFT computational work, such as the calculation of spectra or macroscopic properties. By using basic quantum chemical methods, electronic effects or possible bond breaking/formation are accounted for and a very reasonable initial energetic ranking of the candidate structures is obtained. Due to the huge computational speedup gained by the fast low-cost quantum chemical methods, overall short computation times even for systems with hundreds of atoms (typically drug-sized molecules) are achieved. Furthermore, specialized applications, such as sampling with implicit solvation models or constrained conformational sampling for transition-states, metal-, surface-, or noncovalently bound complexes are discussed, opening many possible applications in modern computational chemistry and drug discovery. The procedures have been implemented in a freely available computer code called CREST, that makes use of the fast and reliable GFNn-xTB methods.

671 citations

References
Journal Article
TL;DR: The goals of the PDB, the systems in place for data deposition and access, how to obtain further information, and plans for the future development of the resource are described.
Abstract: The Protein Data Bank (PDB; http://www.rcsb.org/pdb/) is the single worldwide archive of structural data of biological macromolecules. This paper describes the goals of the PDB, the systems in place for data deposition and access, how to obtain further information, and near-term plans for the future development of the resource.

34,239 citations


"The Cambridge Structural Database" refers to background or methods in this paper

  • ...…to 24 residues and mono-, di- and tri-nucleotides are included in the CSD, higher oligomers are covered by the Nucleic Acids Database (Coimbatore Narayanan et al., 2014) with the Protein Data Bank (PDB; Berman et al., 2000) curating and sharing structural data of larger biological macromolecules....

  • ...…compounds that could be reliably identified using InChIs as being in common between these resources and the CSD. InChIs have also been used to identify correspondences between CSD entries and ligands bound to macromolecules in structures archived in the Protein Data Bank (PDB; Berman et al., 2000)....

Journal Article
TL;DR: New features added to the refinement program SHELXL since 2008 are described and explained.
Abstract: The improvements in the crystal structure refinement program SHELXL have been closely coupled with the development and increasing importance of the CIF (Crystallographic Information Framework) format for validating and archiving crystal structures. An important simplification is that now only one file in CIF format (for convenience, referred to simply as 'a CIF') containing embedded reflection data and SHELXL instructions is needed for a complete structure archive; the program SHREDCIF can be used to extract the .hkl and .ins files required for further refinement with SHELXL. Recent developments in SHELXL facilitate refinement against neutron diffraction data, the treatment of H atoms, the determination of absolute structure, the input of partial structure factors and the refinement of twinned and disordered structures. SHELXL is available free to academics for the Windows, Linux and Mac OS X operating systems, and is particularly suitable for multiple-core processors.

28,425 citations


"The Cambridge Structural Database" refers to methods in this paper

  • ...The embedding of reflection data into the CIF by structure refinement programs such as SHELXL (Sheldrick, 2015) greatly simplifies this process....

Journal Article
TL;DR: OLEX2 seamlessly links all aspects of the structure solution, refinement and publication process and presents them in a single workflow-driven package, with the ultimate goal of producing an application which will be useful to both chemists and crystallographers.
Abstract: New software, OLEX2, has been developed for the determination, visualization and analysis of molecular crystal structures. The software has a portable mouse-driven workflow-oriented and fully comprehensive graphical user interface for structure solution, refinement and report generation, as well as novel tools for structure analysis. OLEX2 seamlessly links all aspects of the structure solution, refinement and publication process and presents them in a single workflow-driven package, with the ultimate goal of producing an application which will be useful to both chemists and crystallographers.

19,990 citations


"The Cambridge Structural Database" refers to methods in this paper

  • ...Automatic links in software used during structure determination (Dolomanov et al., 2009), the ease with which structures can be deposited, attribution of credit in the form of a DOI and continued demonstration of the value to science of depositing crystal structures (Berman et al., 2015) may help…...

Journal Article
TL;DR: This paper reports on the current status of structure validation in chemical crystallography and on possible future extensions.
Abstract: Automated structure validation was introduced in chemical crystallography about 12 years ago as a tool to assist practitioners with the exponential growth in crystal structure analyses. Validation has since evolved into an easy-to-use checkCIF/PLATON web-based IUCr service. The result of a crystal structure determination has to be supplied as a CIF-formatted computer-readable file. The checking software tests the data in the CIF for completeness, quality and consistency. In addition, the reported structure is checked for incomplete analysis, errors in the analysis and relevant issues to be verified. A validation report is generated in the form of a list of ALERTS on the issues to be corrected, checked or commented on. Structure validation has largely eliminated obvious problems with structure reports published in IUCr journals, such as refinement in a space group of too low symmetry. This paper reports on the current status of structure validation and possible future extensions.

13,163 citations


"The Cambridge Structural Database" refers to methods in this paper

  • ...In 2015 the checkCIF/PLATON service (Spek, 2009) was integrated into the CCDC’s deposition process (Fig....

Journal Article
TL;DR: The Cambridge Structural Database now contains data for more than a quarter of a million small-molecule crystal structures, and projections concerning future accession rates indicate that the CSD will contain at least 500,000 crystal structures by the year 2010.
Abstract: The Cambridge Structural Database (CSD) now contains data for more than a quarter of a million small-molecule crystal structures. The information content of the CSD, together with methods for data acquisition, processing and validation, are summarized, with particular emphasis on the chemical information added by CSD editors. Nearly 80% of new structural data arrives electronically, mostly in CIF format, and the CCDC acts as the official crystal structure data depository for 51 major journals. The CCDC now maintains both a CIF archive (more than 73000 CIFs dating from 1996), as well as the distributed binary CSD archive; the availability of data in both archives is discussed. A statistical survey of the CSD is also presented and projections concerning future accession rates indicate that the CSD will contain at least 500000 crystal structures by the year 2010.

9,865 citations


"The Cambridge Structural Database" refers to background in this paper

  • ...Perhaps the most quantitative assessment that can be made of the value of the resource is that the previous published description of the CSD (Allen, 2002), which this article supersedes, has received over 10 000 citations....
