Author

Matthew P. Lightfoot

Bio: Matthew P. Lightfoot is an academic researcher from the University of Cambridge. The author has contributed to research in topics: Database design & Identifier. The author has an h-index of 1 and has co-authored 1 publication receiving 4,784 citations.

Papers
Journal Article
TL;DR: The creation, maintenance, information content and availability of the Cambridge Structural Database (CSD), the world’s repository of small molecule crystal structures, are described.
Abstract: The Cambridge Structural Database (CSD) contains a complete record of all published organic and metal–organic small-molecule crystal structures. The database has been in operation for over 50 years and continues to be the primary means of sharing structural chemistry data and knowledge across disciplines. As well as structures that are made public to support scientific articles, it includes many structures published directly as CSD Communications. All structures are processed both computationally and by expert structural chemistry editors prior to entering the database. A key component of this processing is the reliable association of the chemical identity of the structure studied with the experimental data. This important step helps ensure that data is widely discoverable and readily reusable. Content is further enriched through selective inclusion of additional experimental data. Entries are available to anyone through free CSD community web services. Linking services developed and maintained by the CCDC, combined with the use of standard identifiers, facilitate discovery from other resources. Data can also be accessed through CCDC and third-party software applications and through an application programming interface.

6,313 citations

Journal Article
01 Jan 2023-IUCrJ
TL;DR: In this article, the mechanisms and trends in CSD structures are discussed, along with the benefits and challenges of sharing crystallographic data that are unlikely to be published in a scientific article.

1 citation


Cited by
Journal Article
TL;DR: An overview of Mercury 4.0, an analysis, design and prediction platform that acts as a hub for the entire Cambridge Structural Database software suite, is presented.
Abstract: The program Mercury, developed at the Cambridge Crystallographic Data Centre, was originally designed primarily as a crystal structure visualization tool. Over the years the fields and scientific communities of chemical crystallography and crystal engineering have developed to require more advanced structural analysis software. Mercury has evolved alongside these scientific communities and is now a powerful analysis, design and prediction platform which goes a lot further than simple structure visualization.

2,075 citations

Journal Article
08 Aug 2019
TL;DR: A comprehensive overview and analysis of the most recent research in machine learning principles, algorithms, descriptors, and databases in materials science, and proposes solutions and future research paths for various challenges in computational materials science.
Abstract: One of the most exciting tools that have entered the materials science toolbox in recent years is machine learning. This collection of statistical methods has already proved to be capable of considerably speeding up both fundamental and applied research. At present, we are witnessing an explosion of works that develop and apply machine learning to solid-state systems. We provide a comprehensive overview and analysis of the most recent research in this topic. As a starting point, we introduce machine learning principles, algorithms, descriptors, and databases in materials science. We continue with the description of different machine learning approaches for the discovery of stable materials and the prediction of their crystal structure. Then we discuss research in numerous quantitative structure–property relationships and various approaches for the replacement of first-principles methods by machine learning. We review how active learning and surrogate-based optimization can be applied to improve the rational design process and related examples of applications. Two major questions are always the interpretability of machine learning models and the physical understanding gained from them. We consider therefore the different facets of interpretability and their importance in materials science. Finally, we propose solutions and future research paths for various challenges in computational materials science.

1,301 citations

Journal Article
TL;DR: A large scale benchmark for molecular machine learning consisting of multiple public datasets, metrics, featurizations and learning algorithms.
Abstract: Molecular machine learning has been maturing rapidly over the last few years. Improved methods and the presence of larger datasets have enabled machine learning algorithms to make increasingly accurate predictions about molecular properties. However, algorithmic progress has been limited due to the lack of a standard benchmark to compare the efficacy of proposed methods; most new algorithms are benchmarked on different datasets making it challenging to gauge the quality of proposed methods. This work introduces MoleculeNet, a large scale benchmark for molecular machine learning. MoleculeNet curates multiple public datasets, establishes metrics for evaluation, and offers high quality open-source implementations of multiple previously proposed molecular featurization and learning algorithms (released as part of the DeepChem open source library). MoleculeNet benchmarks demonstrate that learnable representations are powerful tools for molecular machine learning and broadly offer the best performance. However, this result comes with caveats. Learnable representations still struggle to deal with complex tasks under data scarcity and highly imbalanced classification. For quantum mechanical and biophysical datasets, the use of physics-aware featurizations can be more important than choice of particular learning algorithm.

1,277 citations

Journal Article
01 Sep 2017-IUCrJ
TL;DR: The accurate and efficient CE-B3LYP and CE-HF model energies for intermolecular interactions in molecular crystals are extended to a broad range of crystals by calibration against density functional results for molecule/ion pairs extracted from 171 crystal structures.

704 citations

Journal Article
TL;DR: An efficient scheme for the in silico sampling of parts of the molecular chemical space by semiempirical tight-binding methods combined with a meta-dynamics driven search algorithm is proposed and discussed, opening many possible applications in modern computational chemistry and drug discovery.
Abstract: We propose and discuss an efficient scheme for the in silico sampling of parts of the molecular chemical space by semiempirical tight-binding methods combined with a meta-dynamics driven search algorithm. The focus of this work is set on the generation of proper thermodynamic ensembles at a quantum chemical level for conformers, but similar procedures for protonation states, tautomerism and non-covalent complex geometries are also discussed. The conformational ensembles consisting of all significantly populated minimum energy structures normally form the basis of further, mostly DFT computational work, such as the calculation of spectra or macroscopic properties. By using basic quantum chemical methods, electronic effects or possible bond breaking/formation are accounted for and a very reasonable initial energetic ranking of the candidate structures is obtained. Due to the huge computational speedup gained by the fast low-cost quantum chemical methods, overall short computation times even for systems with hundreds of atoms (typically drug-sized molecules) are achieved. Furthermore, specialized applications, such as sampling with implicit solvation models or constrained conformational sampling for transition-states, metal-, surface-, or noncovalently bound complexes are discussed, opening many possible applications in modern computational chemistry and drug discovery. The procedures have been implemented in a freely available computer code called CREST, which makes use of the fast and reliable GFNn-xTB methods.

671 citations