scispace - formally typeset

Showing papers by "Chenru Duan" published in 2022


Journal Article
27 Apr 2022, JACS Au
TL;DR: In this article, the authors exploit active learning to simultaneously optimize methane activation and methanol release calculated with machine learning-accelerated density functional theory in a space of 16 M candidate catalysts including novel macrocycles.
Abstract: Despite decades of effort, no earth-abundant homogeneous catalysts have been discovered that can selectively oxidize methane to methanol. We exploit active learning to simultaneously optimize methane activation and methanol release calculated with machine learning-accelerated density functional theory in a space of 16 M candidate catalysts including novel macrocycles. By constructing macrocycles from fragments inspired by synthesized compounds, we ensure synthetic realism in our computational search. Our large-scale search reveals that low-spin Fe(II) compounds paired with strong-field (e.g., P or S-coordinating) ligands have among the best energetic tradeoffs between hydrogen atom transfer (HAT) and methanol release. This observation contrasts with prior efforts that have focused on high-spin Fe(II) with weak-field ligands. By decoupling equatorial and axial ligand effects, we determine that negatively charged axial ligands are critical for more rapid release of methanol and that higher-valency metals [i.e., M(III) vs M(II)] are likely to be rate-limited by slow methanol release. With full characterization of barrier heights, we confirm that optimizing for HAT does not lead to large oxo formation barriers. Energetic span analysis reveals designs for an intermediate-spin Mn(II) catalyst and a low-spin Fe(II) catalyst that are predicted to have good turnover frequencies. Our active learning approach to optimize two distinct reaction energies with efficient global optimization is expected to be beneficial for the search of large catalyst spaces where no prior designs have been identified and where linear scaling relationships between reaction energies or barriers may be limited or unknown.

13 citations
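
The energetic span analysis mentioned in this abstract relates the free energy profile of a catalytic cycle to its turnover frequency (TOF). The sketch below implements the max-based energetic span approximation of Kozuch and Shaik, assuming free energies in kcal/mol relative to the first intermediate and a cycle written as alternating intermediates and transition states. The function names and the example profile are hypothetical and are not taken from the paper.

```python
import math

KB_OVER_H = 2.083661912e10  # Boltzmann constant / Planck constant, 1/(s*K)
R_KCAL = 1.987204e-3        # gas constant, kcal/(mol*K)


def energetic_span(intermediates, transition_states, reaction_energy):
    """Energetic span (kcal/mol) of a catalytic cycle (hypothetical helper).

    intermediates[j] is the free energy of intermediate I_j, and
    transition_states[i] is the free energy of T_i, the transition state
    leading from I_i to the next intermediate. All values are relative to
    I_0 = 0; reaction_energy is the overall free energy change (negative
    for an exergonic cycle).
    """
    spans = []
    for i, t in enumerate(transition_states):
        for j, inter in enumerate(intermediates):
            # A TS that precedes the intermediate is only reached in the next
            # turnover, so its energy is shifted by the reaction energy.
            shift = 0.0 if i >= j else reaction_energy
            spans.append(t - inter + shift)
    # The largest TS-minus-intermediate gap is the energetic span.
    return max(spans)


def turnover_frequency(span, temperature=298.15):
    """Eyring-type TOF estimate (1/s) from an energetic span in kcal/mol."""
    return KB_OVER_H * temperature * math.exp(-span / (R_KCAL * temperature))


# Hypothetical two-step cycle: I_0 -> T_0 -> I_1 -> T_1 -> I_0 (+ products)
span = energetic_span([0.0, -20.0], [10.0, 5.0], reaction_energy=-30.0)
print(span)  # 25.0: set by the deep intermediate I_1 and the following T_1
print(f"{turnover_frequency(span):.2e} 1/s")
```

In this toy profile the exergonic reaction energy lowers the span for the transition state that comes before the deep intermediate, so the span is set by I_1 and T_1. The full model sums exponential terms over all transition state and intermediate pairs; keeping only the largest term is the usual approximation when one pair dominates.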


Journal Article
TL;DR: In this article, the authors report a workflow and the output of a natural language processing (NLP)-based procedure to mine the extant metal-organic framework (MOF) literature describing structurally characterized MOFs and their solvent removal and thermal stabilities.
Abstract: We report a workflow and the output of a natural language processing (NLP)-based procedure to mine the extant metal-organic framework (MOF) literature describing structurally characterized MOFs and their solvent removal and thermal stabilities. We obtain over 2,000 solvent removal stability measures from text mining and 3,000 thermal decomposition temperatures from thermogravimetric analysis data. We assess the validity of our NLP methods and the accuracy of our extracted data by comparing to a hand-labeled subset. Machine learning (ML, i.e. artificial neural network) models trained on this data using graph- and pore-geometry-based representations enable prediction of stability on new MOFs with quantified uncertainty. Our web interface, MOFSimplify, provides users access to our curated data and enables them to harness that data for predictions on new MOFs. MOFSimplify also encourages community feedback on existing data and on ML model predictions for community-based active learning for improved MOF stability models.

8 citations


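Journal Article
TL;DR: The authors evaluate multireference (MR) diagnostics for over 10,000 transition-metal complexes, show that differences in MR character matter more than its cumulative degree in predicting the MR effect on chemical properties, and build transfer learning models that predict CCSD(T)-level adiabatic spin splitting and ionization potential from lower levels of theory.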
Abstract: Appropriately identifying and treating molecules and materials with significant multi-reference (MR) character is crucial for achieving high data fidelity in virtual high-throughput screening (VHTS). Despite development of numerous MR diagnostics, the extent to which a single value of such a diagnostic indicates the MR effect on a chemical property prediction is not well established. We evaluate MR diagnostics for over 10 000 transition-metal complexes (TMCs) and compare to those for organic molecules. We observe that only some MR diagnostics are transferable from one chemical space to another. By studying the influence of MR character on chemical properties (i.e., MR effect) that involve multiple potential energy surfaces (i.e., adiabatic spin splitting, ΔE_H–L, and ionization potential, IP), we show that differences in MR character are more important than the cumulative degree of MR character in predicting the magnitude of an MR effect. Motivated by this observation, we build transfer learning models to predict CCSD(T)-level adiabatic ΔE_H–L and IP from lower levels of theory. By combining these models with uncertainty quantification and multi-level modeling, we introduce a multi-pronged strategy that accelerates data acquisition by at least a factor of three while achieving coupled cluster accuracy (i.e., to within 1 kcal mol−1 MAE) for robust VHTS.

7 citations
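
The multi-level strategy in this abstract pairs ML predictions with uncertainty quantification so that expensive reference calculations are spent only where the model is unsure. A minimal sketch of that routing idea, assuming each candidate already has a predicted value and an uncertainty estimate; the `Prediction` class, the threshold, and the example complexes are hypothetical and do not reproduce the authors' models.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    value: float        # ML estimate, e.g. an adiabatic spin splitting (kcal/mol)
    uncertainty: float  # uncertainty estimate in the same units


def route_candidates(predictions, threshold):
    """Split candidates into trusted ML estimates and ones that still need an
    explicit high-level (e.g., CCSD(T)) calculation."""
    trusted, to_compute = [], []
    for name, pred in predictions.items():
        (trusted if pred.uncertainty <= threshold else to_compute).append(name)
    return trusted, to_compute


def acceleration_factor(n_total, n_computed):
    """Speedup relative to computing every candidate at the high level."""
    if n_computed == 0:
        return float("inf")  # nothing left to compute explicitly
    return n_total / n_computed


preds = {
    "complex_1": Prediction(3.2, 0.4),
    "complex_2": Prediction(-7.9, 2.5),
    "complex_3": Prediction(12.1, 0.8),
}
trusted, to_compute = route_candidates(preds, threshold=1.0)
print(trusted, to_compute)  # ['complex_1', 'complex_3'] ['complex_2']
print(acceleration_factor(len(preds), len(to_compute)))  # 3.0
```

This simple ratio ignores the cost of the lower-level calculations and of model training, so it is an upper bound on the real saving; tightening the threshold trades speed for accuracy.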


Journal Article
TL;DR: Machine learning has become a part of the fabric of high-throughput screening and computational discovery of materials, and it can either outperform physics-based models, be used to accelerate such models, or be integrated with them to improve their performance.
Abstract: Machine learning (ML) has become a part of the fabric of high-throughput screening and computational discovery of materials. Despite its increasingly central role, challenges remain in fully realizing the promise of ML. This is especially true for the practical acceleration of the engineering of robust materials and the development of design strategies that surpass trial and error or high-throughput screening alone. Depending on the quantity being predicted and the experimental data available, ML can either outperform physics-based models, be used to accelerate such models, or be integrated with them to improve their performance. We cover recent advances in algorithms and in their application that are starting to make inroads toward (a) the discovery of new materials through large-scale enumerative screening, (b) the design of materials through identification of rules and principles that govern materials properties, and (c) the engineering of practical materials by satisfying multiple objectives. We conclude with opportunities for further advancement to realize ML as a widespread tool for practical computational materials design. Expected final online publication date for the Annual Review of Chemical and Biomolecular Engineering, Volume 13 is October 2022. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.

7 citations


Journal Article
TL;DR: In this article, an electron density fitting and Δ-learning model is used to select the density functional approximation with the lowest expected error with respect to coupled cluster theory in a system-specific manner.
Abstract: Approximate density functional theory has become indispensable owing to its balanced cost–accuracy trade-off, including in large-scale screening. To date, however, no density functional approximation (DFA) with universal accuracy has been identified, leading to uncertainty in the quality of data generated from density functional theory. With electron density fitting and Δ-learning, we build a DFA recommender that selects the DFA with the lowest expected error with respect to the gold standard (but cost-prohibitive) coupled cluster theory in a system-specific manner. We demonstrate this recommender approach on the evaluation of vertical spin splitting energies of transition metal complexes. Our recommender predicts top-performing DFAs and yields excellent accuracy (about 2 kcal mol−1) for chemical discovery, outperforming both individual Δ-learning models and the best conventional single-functional approach from a set of 48 DFAs. By demonstrating transferability to diverse synthesized compounds, our recommender potentially addresses the accuracy versus scope dilemma broadly encountered in computational chemistry. A density functional recommender enables chemical space exploration by selecting the best exchange–correlation functional for each system, outperforming the use of a single functional for all systems or transfer learning models.

5 citations
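
The recommender idea can be illustrated with a small sketch: if a Δ-learning model predicts, for each system, the error of every candidate density functional approximation (DFA) relative to coupled cluster, the recommended DFA is the one with the smallest predicted absolute error. The data layout, functional names, and numbers below are hypothetical; the paper's models use electron density fitting features, which this sketch does not reproduce.

```python
from statistics import mean


def recommend_dfa(predicted_errors):
    """Pick the DFA with the smallest predicted |error| for one system.

    predicted_errors maps DFA name -> predicted (DFA - coupled cluster) error.
    """
    return min(predicted_errors, key=lambda dfa: abs(predicted_errors[dfa]))


def recommender_mae(true_errors, predicted_errors):
    """MAE when each system is evaluated with its own recommended DFA."""
    return mean(
        abs(true_errors[system][recommend_dfa(predicted_errors[system])])
        for system in true_errors
    )


def best_single_dfa_mae(true_errors):
    """MAE of the best single DFA applied to every system (conventional baseline)."""
    dfas = next(iter(true_errors.values())).keys()
    return min(mean(abs(true_errors[s][dfa]) for s in true_errors) for dfa in dfas)


# Hypothetical errors (kcal/mol) for two complexes and two functionals
true_errors = {"tmc_a": {"dfa_1": 1.0, "dfa_2": 5.0},
               "tmc_b": {"dfa_1": 6.0, "dfa_2": 0.5}}
predicted = {"tmc_a": {"dfa_1": 1.4, "dfa_2": 4.2},
             "tmc_b": {"dfa_1": 5.1, "dfa_2": -0.9}}
print(recommender_mae(true_errors, predicted))  # 0.75
print(best_single_dfa_mae(true_errors))         # 2.75
```

The recommender only helps when the Δ-learning models rank the functionals correctly for each system; in this toy example the ranking is right for both complexes, so the per-system choice beats any single functional.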


Journal Article
TL;DR: In this article, the authors carried out high-level multi-reference configuration interaction theory and coupled cluster quantum chemical calculations on HfO and HfB ground and low-lying electronic states, and computed full potential energy curves, excitation energies, ionization energies, electronic configurations, and spectroscopic parameters with large quadruple-ζ and quintuple-ζ quality correlation consistent basis sets.
Abstract: Knowledge of the chemical bonding of HfO and HfB ground and low-lying electronic states provides essential insights into a range of catalysts and materials that contain Hf-O or Hf-B moieties. Here, we carry out high-level multi-reference configuration interaction theory and coupled cluster quantum chemical calculations on these systems. We compute full potential energy curves, excitation energies, ionization energies, electronic configurations, and spectroscopic parameters with large quadruple-ζ and quintuple-ζ quality correlation consistent basis sets. We also investigate equilibrium chemical bonding patterns and effects of correlating core electrons on property predictions. Differences in the ground state electron configuration of HfB (X⁴Σ⁻) and HfO (X¹Σ⁺) lead to a significantly stronger bond in HfO than HfB, as judged by both dissociation energies and equilibrium bond distances. We extend our analysis to the chemical bonding patterns of the isovalent HfX (X = O, S, Se, Te, and Po) series and observe similar trends. We also note a linear trend between the decreasing value of the dissociation energy (De) from HfO to HfPo and the singlet-triplet energy gap (ΔES-T) of the molecule. Finally, we compare these benchmark results to those obtained using density functional theory (DFT) with 23 exchange-correlation functionals spanning multiple rungs of "Jacob's ladder." When comparing DFT errors to coupled cluster reference values on dissociation energies, excitation energies, and ionization energies of HfB and HfO, we observe that semi-local generalized gradient approximations significantly outperform more complex and high-cost functionals.

3 citations


Journal Article
TL;DR: A ligand-derived machine learning representation is developed to train neural networks to predict the MR character of TMCs from properties of the constituent ligands, which yields models with excellent performance and superior transferability to unseen ligand chemistry and compositions.
Abstract: Accurate virtual high-throughput screening (VHTS) of transition metal complexes (TMCs) remains challenging due to the possibility of high multireference (MR) character that complicates property evaluation. We compute MR diagnostics for over 5,000 ligands present in previously synthesized octahedral mononuclear transition metal complexes in the Cambridge Structural Database (CSD). To accomplish this task, we introduce an iterative approach for consistent ligand charge assignment for ligands in the CSD. Across this set, we observe that the MR character correlates linearly with the inverse value of the averaged bond order over all bonds in the molecule. We then demonstrate that ligand additivity of the MR character holds in TMCs, which suggests that the TMC MR character can be inferred from the sum of the MR character of the ligands. Encouraged by this observation, we leverage ligand additivity and develop a ligand-derived machine learning representation to train neural networks to predict the MR character of TMCs from properties of the constituent ligands. This approach yields models with excellent performance and superior transferability to unseen ligand chemistry and compositions.

2 citations
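
Ligand additivity, as stated in this abstract, means that the MR character of a complex can be inferred from the sum over its ligands. A minimal sketch of how that hypothesis could be checked, assuming per-ligand diagnostic values are already available: sum them for each complex and fit a straight line against diagnostics computed for the full complexes. The ligand names and values are hypothetical, and the linear fit is a simplified stand-in for the neural networks the authors train.

```python
def ligand_sum(ligand_diagnostics, ligands):
    """Sum per-ligand MR diagnostics over every ligand bound to the metal."""
    return sum(ligand_diagnostics[name] for name in ligands)


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical per-ligand diagnostics and the ligand sets of three complexes
ligand_diag = {"lig_a": 0.25, "lig_b": 0.5, "lig_c": 1.0}
complexes = [["lig_a"] * 6, ["lig_a"] * 4 + ["lig_b"] * 2, ["lig_b"] * 4 + ["lig_c"] * 2]
sums = [ligand_sum(ligand_diag, ligs) for ligs in complexes]  # [1.5, 2.0, 4.0]
tmc_diag = [3.5, 4.5, 8.5]  # hypothetical diagnostics of the full complexes
print(fit_line(sums, tmc_diag))  # (2.0, 0.5): slope and intercept
```

A tight linear relationship across many complexes is one quick signal that additivity holds for a given diagnostic before investing in a learned model.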


Journal Article
TL;DR: In this paper, a large-scale analysis of mononuclear octahedral transition metal complexes deposited in an experimental database confirms an underrepresentation of lower-symmetry complexes.
Abstract: To accelerate the exploration of chemical space, it is necessary to identify the compounds that will provide the most additional information or value. A large-scale analysis of mononuclear octahedral transition metal complexes deposited in an experimental database confirms an under-representation of lower-symmetry complexes. From a set of around 1000 previously studied Fe(II) complexes, we show that the theoretical space of synthetically accessible complexes formed from the relatively small number of unique ligands is significantly larger (∼816k complexes). For the properties of these complexes, we validate the concept of ligand additivity by inferring heteroleptic properties from a stoichiometric combination of homoleptic complexes. An improved interpolation scheme that incorporates information about cis and trans isomer effects predicts the adiabatic spin-splitting energy to around 2 kcal/mol and the HOMO level to less than 0.2 eV. We demonstrate a multi-stage strategy to discover leads from the 816k Fe(II) complexes within a targeted property region. We carry out a coarse interpolation from homoleptic complexes that we refine over a subspace of ligands based on the likelihood of generating complexes with targeted properties. We validate our approach on nine new binary and ternary complexes predicted to be in a targeted zone of discovery, suggesting opportunities for efficient transition metal complex discovery.

2 citations
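
The ligand-additivity interpolation described here estimates a property of a heteroleptic complex as a stoichiometric combination of the corresponding homoleptic complexes. A minimal sketch of the simplest version, assuming each ligand is weighted by the number of the six octahedral coordination sites it occupies; the improved scheme in the abstract also accounts for cis and trans isomer effects, which this sketch omits, and all names and values are hypothetical.

```python
def interpolate_heteroleptic(homoleptic_values, site_counts):
    """Weight homoleptic property values by each ligand's share of the six
    octahedral coordination sites (no cis/trans correction)."""
    if sum(site_counts.values()) != 6:
        raise ValueError("an octahedral complex has exactly six coordination sites")
    return sum(homoleptic_values[lig] * n / 6 for lig, n in site_counts.items())


# Hypothetical adiabatic spin-splitting energies (kcal/mol) of two homoleptic Fe(II) complexes
homoleptic = {"ligand_x": 10.0, "ligand_y": -2.0}
estimate = interpolate_heteroleptic(homoleptic, {"ligand_x": 4, "ligand_y": 2})
print(round(estimate, 6))  # 6.0
```

For a 4:2 ligand ratio the estimate is two-thirds of the first homoleptic value plus one-third of the second; the isomer-aware scheme would add corrections that distinguish cis from trans arrangements of the same ligand set.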


Journal Article
TL;DR: In this article, a convolutional neural network is used to monitor geometry optimizations on the fly, and its good performance and transferability are exploited to identify geometry optimization failures for catalyst design.
Abstract: Virtual high-throughput screening (VHTS) and machine learning (ML) have greatly accelerated the design of single-site transition-metal catalysts. VHTS of catalysts, however, is often accompanied by a high calculation failure rate and wasted computational resources due to the difficulty of simultaneously converging all mechanistically relevant reactive intermediates to expected geometries and electronic states. We demonstrate a dynamic classifier approach, i.e., a convolutional neural network that monitors geometry optimizations on the fly, and exploit its good performance and transferability in identifying geometry optimization failures for catalyst design. We show that the dynamic classifier performs well on all reactive intermediates in the representative catalytic cycle of the radical rebound mechanism for the conversion of methane to methanol despite being trained on only one reactive intermediate. The dynamic classifier also generalizes to chemically distinct intermediates and metal centers absent from the training data without loss of accuracy or model confidence. We rationalize this superior model transferability as arising from the use of electronic structure and geometric information generated on-the-fly from density functional theory calculations and the convolutional layer in the dynamic classifier. When used in combination with uncertainty quantification, the dynamic classifier saves more than half of the computational resources that would have been wasted on unsuccessful calculations for all reactive intermediates being considered.

2 citations


Journal Article
Abstract: High-throughput screening of large hypothetical databases of metal-organic frameworks (MOFs) can uncover new materials, but their stability in real-world applications is often unknown. We leverage community knowledge and machine learning (ML) models to identify MOFs that are thermally stable and stable upon activation. We separate these MOFs into their building blocks and recombine them to make a new hypothetical MOF database of over 50,000 structures that samples orders of magnitude more connectivity nets and inorganic building blocks than prior databases. This database shows an order of magnitude enrichment of ultrastable MOF structures that are stable upon activation and more than one standard deviation more thermally stable than the average experimentally characterized MOF. For the nearly 10,000 ultrastable MOFs, we compute bulk elastic moduli to confirm these materials have good mechanical stability, and we report methane deliverable capacities. Our work identifies privileged metal nodes in ultrastable MOFs that optimize gas storage and mechanical stability simultaneously.

1 citation


Journal Article
TL;DR: In this paper, the authors report the first application of low-cost MR diagnostics based on the fractional occupation number calculated with finite-temperature DFT to solid-state systems.
Abstract: When a many-body wave function of a system cannot be captured by a single determinant, high-level multireference (MR) methods are required to properly explain its electronic structure. MR diagnostics to estimate the magnitude of such static correlation have been primarily developed for molecular systems and range from low in computational cost to as costly as the full MR calculation itself. We report the first application of low-cost MR diagnostics based on the fractional occupation number calculated with finite-temperature DFT to solid-state systems. To compare the behavior of the diagnostics on solids and molecules, we select metal-organic frameworks (MOFs) as model materials because their reticular nature provides an intuitive way to identify molecular derivatives. On a series of closed-shell MOFs, we demonstrate that the DFT-based MR diagnostics are equally applicable to solids as to their molecular derivatives. The magnitude and spatial distribution of the MR character of a MOF are found to have a good correlation with those of its molecular derivatives, which can be calculated much more affordably in comparison to those of the full MOF. The additivity of MR character discussed here suggests the set of molecular derivatives to be a good representation of a MOF for both MR detection and ultimately for MR corrections, facilitating accurate and efficient high-throughput screening of MOFs and other porous solids.

Journal Article
TL;DR: Differences in MR character are more important than the total degree of MR character in predicting the MR effect on property prediction, and transfer learning models are built to directly predict CCSD(T)-level adiabatic ΔE_H–L and IP from lower levels of theory.
Abstract: Appropriately identifying and treating molecules and materials with significant multi-reference (MR) character is crucial for achieving high data fidelity in virtual high-throughput screening (VHTS). Nevertheless, most VHTS is carried out with approximate density functional theory (DFT) using a single functional. Despite development of numerous MR diagnostics, the extent to which a single value of such a diagnostic indicates MR effect on chemical property prediction is not well established. We evaluate MR diagnostics of over 10,000 transition metal complexes (TMCs) and compare to those in organic molecules. We reveal that only some MR diagnostics are transferable across these materials spaces. By studying the influence of MR character on chemical properties (i.e., MR effect) that involve multiple potential energy surfaces (i.e., adiabatic spin splitting, ΔE_H–L, and ionization potential, IP), we observe that cancellation in MR effect outweighs accumulation. Differences in MR character are more important than the total degree of MR character in predicting MR effect in property prediction. Motivated by this observation, we build transfer learning models to directly predict CCSD(T)-level adiabatic ΔE_H–L and IP from lower levels of theory. By combining these models with uncertainty quantification and multi-level modeling, we introduce a multi-pronged strategy that accelerates data acquisition by at least a factor of three while achieving chemical accuracy (i.e., 1 kcal/mol) for robust VHTS.

Journal Article
TL;DR: In this paper, low-cost machine learning (ML) models were used to predict the excited state properties of photoactive iridium complexes, including mean emission energy of phosphorescence, excited state lifetime, and emission spectral integral.
Abstract: Photoactive iridium complexes are of broad interest due to their applications ranging from lighting to photocatalysis. However, the excited state property prediction of these complexes challenges ab initio methods such as time-dependent density functional theory (TDDFT) both from an accuracy and a computational cost perspective, complicating high throughput virtual screening (HTVS). We instead leverage low-cost machine learning (ML) models to predict the excited state properties of photoactive iridium complexes. We use experimental data of 1,380 iridium complexes to train and evaluate the ML models and identify the best-performing and most transferable models to be those trained on electronic structure features from low-cost density functional theory tight binding calculations. Using these models, we predict the three excited state properties considered, mean emission energy of phosphorescence, excited state lifetime, and emission spectral integral, with accuracy competitive with or superseding TDDFT. We conduct feature importance analysis to identify which iridium complex attributes govern excited state properties and we validate these trends with explicit examples. As a demonstration of how our ML models can be used for HTVS and the acceleration of chemical discovery, we curate a set of novel hypothetical iridium complexes and identify promising ligands for the design of new phosphors.

Journal Article
TL;DR: In this paper, the authors propose a design space of 32.5M transition metal complexes (TMCs) for machine learning-based chemical discovery, and they use efficient global optimization to sample candidate low-spin chromophores that simultaneously have low absorption energies and low static correlation.
Abstract: Two outstanding challenges for machine learning (ML)-accelerated chemical discovery are the synthesizability of candidate molecules or materials and the fidelity of the data used in ML model training. To address the first challenge, we construct a hypothetical design space of 32.5M transition metal complexes (TMCs), in which all of the constituent fragments (i.e., metals and ligands) and ligand symmetries are synthetically accessible. To address the second challenge, we search for consensus in predictions among 23 density functional approximations across multiple rungs of “Jacob’s ladder”. To accelerate the screening of these 32.5M TMCs, we use efficient global optimization to sample candidate low-spin chromophores that simultaneously have low absorption energies and low static correlation. Despite the scarcity (i.e., < 0.01%) of potential chromophores in this large chemical space, we identify transition metal chromophores with high likelihood (i.e., > 10%) as the ML models improve during active learning. This represents a 1,000 fold acceleration in discovery corresponding to discoveries in days instead of years. Analyses of candidate chromophores reveal a preference for Co(III) and large, strong-field ligands with more bond saturation. We compute the absorption spectra of promising chromophores on the Pareto front by time-dependent density functional theory calculations and verify that 2/3 of them have desired excited state properties. Although these complexes have never been experimentally explored, their constituent ligands demonstrated interesting optical properties in literature, exemplifying the effectiveness of our construction of realistic TMC design space and active learning approach.