Author

Chenru Duan

Bio: Chenru Duan is an academic researcher at the Massachusetts Institute of Technology. The author has contributed to research in the topics Computer science and Medicine, has an h-index of 14, and has co-authored 33 publications receiving 627 citations. Previous affiliations of Chenru Duan include the Singapore–MIT Alliance and Zhejiang University.

Papers
Journal ArticleDOI
TL;DR: In this paper, the authors introduce the distance to available data in the latent space of a neural network ML model as a low-cost, quantitative uncertainty metric that works for both inorganic and organic chemistry.
Abstract: Machine learning (ML) models, such as artificial neural networks, have emerged as a complement to high-throughput screening, enabling characterization of new compounds in seconds instead of hours. The promise of ML models to enable large-scale chemical space exploration can only be realized if it is straightforward to identify when molecules and materials are outside the model's domain of applicability. Established uncertainty metrics for neural network models are either costly to obtain (e.g., ensemble models) or rely on feature engineering (e.g., feature space distances), and each has limitations in estimating prediction errors for chemical space exploration. We introduce the distance to available data in the latent space of a neural network ML model as a low-cost, quantitative uncertainty metric that works for both inorganic and organic chemistry. The calibrated performance of this approach exceeds widely used uncertainty metrics and is readily applied to models of increasing complexity at no additional cost. Tightening latent distance cutoffs systematically drives down predicted model errors below training errors, thus enabling predictive error control in chemical discovery or identification of useful data points for active learning.

146 citations
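
The latent-distance idea in the abstract above can be illustrated with a minimal sketch. The abstract does not give the exact definition, so this assumes the metric is the mean Euclidean distance from a query point to its k nearest training points in the latent space. The function names `latent_distance` and `within_cutoff`, the parameter `k`, and the toy data are hypothetical and are not taken from the paper.

```python
import numpy as np


def latent_distance(train_latent, query_latent, k=5):
    """Mean Euclidean distance from each query point to its k nearest
    training points in latent space (assumed definition, see lead-in)."""
    train = np.asarray(train_latent, dtype=float)
    query = np.asarray(query_latent, dtype=float)
    # Pairwise distances with shape (n_query, n_train).
    diffs = query[:, None, :] - train[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    # Never ask for more neighbours than there are training points.
    k = min(k, train.shape[0])
    nearest = np.sort(dists, axis=1)[:, :k]
    return nearest.mean(axis=1)


def within_cutoff(distances, cutoff):
    """Boolean mask of predictions whose latent distance is at or below
    the cutoff; a tighter cutoff keeps fewer, closer points."""
    return np.asarray(distances) <= cutoff


# Toy latent vectors standing in for a model's last hidden layer output.
train = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
query = [[0.0, 0.0], [10.0, 10.0]]
d = latent_distance(train, query, k=1)
mask = within_cutoff(d, cutoff=1.0)
```

In this toy case the first query coincides with a training point and is kept, while the second lies far from all training data and is flagged as outside the cutoff. In practice the latent vectors would come from a trained network, and the cutoff would be calibrated against held-out errors.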

Journal ArticleDOI
TL;DR: The ANN-driven EI approach achieves at least 500-fold acceleration over random search, identifying a Pareto-optimal design in around 5 weeks instead of 50 years, and shows that a multitask ANN with latent-distance-based UQ surpasses the generalization performance of a GP in this space.
Abstract: The accelerated discovery of materials for real world applications requires the achievement of multiple design objectives. The multidimensional nature of the search necessitates exploration of mult...

104 citations

Journal ArticleDOI
TL;DR: In this paper, the authors compare the performance of LASSO, kernel ridge regression (KRR), and artificial neural network (ANN) models using heuristic, topological revised autocorrelation (RAC) descriptors.
Abstract: Machine learning the electronic structure of open shell transition metal complexes presents unique challenges, including robust and automated data set generation. Here, we introduce tools that simplify data acquisition from density functional theory (DFT) and validation of trained machine learning models using the molSimplify automatic design (mAD) workflow. We demonstrate this workflow by training and comparing the performance of LASSO, kernel ridge regression (KRR), and artificial neural network (ANN) models using heuristic, topological revised autocorrelation (RAC) descriptors we have recently introduced for machine learning inorganic chemistry. On a series of open shell transition metal complexes, we evaluate set aside test errors of these models for predicting the HOMO level and HOMO–LUMO gap. The best performing models are ANNs, which show 0.15 and 0.25 eV test set mean absolute errors on the HOMO level and HOMO–LUMO gap, respectively. Poor performing KRR models using the full 153-feature RAC set ar...

99 citations

Journal ArticleDOI
TL;DR: In this article, the hierarchy equation of motion (HEOM) is extended to the zero-temperature sub-Ohmic spin-boson model, providing a numerically accurate prediction of quantum dynamics.
Abstract: With a decomposition scheme for the bath correlation function, the hierarchy equation of motion (HEOM) is extended to the zero-temperature sub-Ohmic spin-boson model, providing a numerically accurate prediction of quantum dynamics. As a dynamic approach, the extended HEOM determines the delocalized-localized (DL) phase transition from the extracted rate kernel and the coherent-incoherent dynamic transition from the short-time oscillation. As the bosonic bath approaches from the strong to weak sub-Ohmic regimes, a crossover behavior is identified for the critical Kondo parameter of the DL transition, accompanied by the transition from the coherent to incoherent dynamics in the localization.

81 citations

Journal ArticleDOI
TL;DR: Five key mandates for realizing computationally driven accelerated discovery in inorganic chemistry are outlined, including fully automated simulation of new compounds, knowledge of prediction sensitivity or accuracy, faster-than-fast property prediction methods, and maps for rapid chemical space traversal.
Abstract: Recent transformative advances in computing power and algorithms have made computational chemistry central to the discovery and design of new molecules and materials. First-principles simulations are increasingly accurate and applicable to large systems with the speed needed for high-throughput computational screening. Despite these strides, the combinatorial challenges associated with the vastness of chemical space mean that more than just fast and accurate computational tools are needed for accelerated chemical discovery. In transition-metal chemistry and catalysis, unique challenges arise. The variable spin, oxidation state, and coordination environments favored by elements with well-localized d or f electrons provide great opportunity for tailoring properties in catalytic or functional (e.g., magnetic) materials but also add layers of uncertainty to any design strategy. We outline five key mandates for realizing computationally driven accelerated discovery in inorganic chemistry: (i) fully automated simulation of new compounds, (ii) knowledge of prediction sensitivity or accuracy, (iii) faster-than-fast property prediction methods, (iv) maps for rapid chemical space traversal, and (v) a means to reveal design rules on the kilocompound scale. Through case studies in open-shell transition-metal chemistry, we describe how advances in methodology and software in each of these areas bring about new chemical insights. We conclude with our outlook on the next steps in this process toward realizing fully autonomous discovery in inorganic chemistry using computational chemistry.

78 citations


Cited by
01 Feb 1995
TL;DR: In this paper, the unpolarized absorption and circular dichroism spectra of the fundamental vibrational transitions of the chiral molecule, 4-methyl-2-oxetanone, are calculated ab initio using DFT, MP2, and SCF methodologies and a 5S4P2D/3S2P (TZ2P) basis set.
Abstract: The unpolarized absorption and circular dichroism spectra of the fundamental vibrational transitions of the chiral molecule, 4-methyl-2-oxetanone, are calculated ab initio. Harmonic force fields are obtained using Density Functional Theory (DFT), MP2, and SCF methodologies and a 5S4P2D/3S2P (TZ2P) basis set. DFT calculations use the Local Spin Density Approximation (LSDA), BLYP, and Becke3LYP (B3LYP) density functionals. Mid-IR spectra predicted using LSDA, BLYP, and B3LYP force fields are of significantly different quality, the B3LYP force field yielding spectra in clearly superior, and overall excellent, agreement with experiment. The MP2 force field yields spectra in slightly worse agreement with experiment than the B3LYP force field. The SCF force field yields spectra in poor agreement with experiment. The basis set dependence of B3LYP force fields is also explored: the 6-31G* and TZ2P basis sets give very similar results while the 3-21G basis set yields spectra in substantially worse agreement with experiment.

1,652 citations

Journal Article
TL;DR: A trustful prediction of new promising materials, identification of anomalies, and scientific advancement are doubtful when the scientific connection between the descriptor and the actuating mechanisms is unclear.
Abstract: Statistical learning of materials properties or functions so far starts with a largely silent, nonchallenged step: the choice of the set of descriptive parameters (termed descriptor). However, when the scientific connection between the descriptor and the actuating mechanisms is unclear, the causality of the learned descriptor-property relation is uncertain. Thus, a trustful prediction of new promising materials, identification of anomalies, and scientific advancement are doubtful. We analyze this issue and define requirements for a suitable descriptor. For a classic example, the energy difference of zinc blende or wurtzite and rocksalt semiconductors, we demonstrate how a meaningful descriptor can be found systematically.

455 citations

Journal ArticleDOI
TL;DR: The discovery and development of catalysts and catalytic processes are essential components to maintaining an ecological balance in the future, and recent revolutions made in data science could have a...
Abstract: The discovery and development of catalysts and catalytic processes are essential components to maintaining an ecological balance in the future. Recent revolutions made in data science could have a ...

272 citations

Journal ArticleDOI
TL;DR: This article reviews the most prominent algorithmic concepts of explainable artificial intelligence and forecasts future opportunities, potential applications, and several remaining challenges, with a focus on deep learning for drug discovery.
Abstract: Deep learning bears promise for drug discovery, including advanced image analysis, prediction of molecular structure and function, and automated generation of innovative chemical entities with bespoke properties. Despite the growing number of successful prospective applications, the underlying mathematical models often remain elusive to interpretation by the human mind. There is a demand for ‘explainable’ deep learning methods to address the need for a new narrative of the machine language of the molecular sciences. This Review summarizes the most prominent algorithmic concepts of explainable artificial intelligence, and forecasts future opportunities, potential applications as well as several remaining challenges. We also hope it encourages additional efforts towards the development and acceptance of explainable artificial intelligence techniques. Drug discovery has recently profited greatly from the use of deep learning models. However, these models can be notoriously hard to interpret. In this Review, Jimenez-Luna and colleagues summarize recent approaches to use explainable artificial intelligence techniques in drug discovery.

270 citations