Showing papers in "Nature Computational Science in 2022"

PDF

Open Access

Journal Article•DOI•

Opportunities for neuromorphic computing algorithms and applications

[...]

Catherine D. Schuman, Shruti R. Kulkarni, Maryam Parsa, J. Parker Mitchell, Prasanna Date, Bill Kay - Show less +2 more

01 Jan 2022-Nature Computational Science

TL;DR: A review of recent results in neuromorphic computing algorithms and applications can be found in this article , where the authors highlight characteristics of neuromorphic Computing technologies that make them attractive for the future of computing and discuss opportunities for future development of algorithms and application on these systems.

...read moreread less

Abstract: Neuromorphic computing technologies will be important for the future of computing, but much of the work in neuromorphic computing has focused on hardware development. Here, we review recent results in neuromorphic computing algorithms and applications. We highlight characteristics of neuromorphic computing technologies that make them attractive for the future of computing and we discuss opportunities for future development of algorithms and applications on these systems. There is still a wide variety of challenges that restrict the rapid growth of neuromorphic algorithmic and application development. Addressing these challenges is essential for the research community to be able to effectively use neuromorphic computers in the future.

...read moreread less

143 citations

Journal Article•DOI•

Challenges and opportunities in quantum machine learning

[...]

María Luisa Cortijo Cerezo, Guillaume Verdon, Hsin-Yuan Huang, Lukasz Cincio, Patrick J. Coles - Show less +1 more

01 Sep 2022-Nature Computational Science

TL;DR: In this paper , the authors highlight differences between quantum and classical machine learning, with a focus on quantum neural networks and quantum deep learning, and discuss opportunities for quantum advantage with quantum machine learning.

...read moreread less

Abstract: At the intersection of machine learning and quantum computing, quantum machine learning has the potential of accelerating data analysis, especially for quantum data, with applications for quantum materials, biochemistry and high-energy physics. Nevertheless, challenges remain regarding the trainability of quantum machine learning models. Here we review current methods and applications for quantum machine learning. We highlight differences between quantum and classical machine learning, with a focus on quantum neural networks and quantum deep learning. Finally, we discuss opportunities for quantum advantage with quantum machine learning. Quantum machine learning has become an essential tool to process and analyze the increased amount of quantum data. Despite recent progress, there are still many challenges to be addressed and myriad future avenues of research.

...read moreread less

62 citations

Journal Article•DOI•

Enhancing computational fluid dynamics with machine learning

[...]

Ricardo Vinuesa¹•Institutions (1)

Royal Institute of Technology¹

27 Jun 2022-Nature Computational Science

TL;DR: In this paper , the authors highlight some of the areas of highest potential impact, including to accelerate direct numerical simulations, to improve turbulence closure modeling, and to develop enhanced reduced-order models.

...read moreread less

Abstract: Machine learning is rapidly becoming a core technology for scientific computing, with numerous opportunities to advance the field of computational fluid dynamics. In this Perspective, we highlight some of the areas of highest potential impact, including to accelerate direct numerical simulations, to improve turbulence closure modeling, and to develop enhanced reduced-order models. We also discuss emerging areas of machine learning that are promising for computational fluid dynamics, as well as some potential limitations that should be taken into account.

...read moreread less

55 citations

Journal Article•DOI•

A universal graph deep learning interatomic potential for the periodic table

[...]

Chi Chen, Shyue Ping Ong

05 Feb 2022-Nature Computational Science

TL;DR: In this article , a universal IAP for materials based on graph neural networks with three-body interactions (M3GNet) was reported, which was trained on the massive database of structural relaxations performed by the Materials Project over the past 10 years and has broad applications in structural relaxation, dynamic simulations and property prediction of materials across diverse chemical spaces.

...read moreread less

Abstract: Interatomic potentials (IAPs), which describe the potential energy surface of atoms, are a fundamental input for atomistic simulations. However, existing IAPs are either fitted to narrow chemistries or too inaccurate for general applications. Here, we report a universal IAP for materials based on graph neural networks with three-body interactions (M3GNet). The M3GNet IAP was trained on the massive database of structural relaxations performed by the Materials Project over the past 10 years and has broad applications in structural relaxation, dynamic simulations and property prediction of materials across diverse chemical spaces. About 1.8 million materials were identified from a screening of 31 million hypothetical crystal structures to be potentially stable against existing Materials Project crystals based on M3GNet energies. Of the top 2000 materials with the lowest energies above hull, 1578 were verified to be stable using DFT calculations. These results demonstrate a machine learning-accelerated pathway to the discovery of synthesizable materials with exceptional properties.

...read moreread less

42 citations

Journal Article•DOI•

Modeling how antibody responses may determine the efficacy of COVID-19 vaccines

[...]

Pranesh Padmanabhan, Rajat Desikan, Narendra M. Dixit

01 Feb 2022-Nature Computational Science

TL;DR: In this paper , a multiscale mathematical model was proposed to predict the efficacy of COVID-19 vaccines and the neutralizing antibody responses elicited by the vaccines, based on the assumption that vaccination would elicit similar NAb responses.

...read moreread less

Abstract: Predicting the efficacy of COVID-19 vaccines would aid vaccine development and usage strategies, which is of importance given their limited supplies. Here we develop a multiscale mathematical model that proposes mechanistic links between COVID-19 vaccine efficacies and the neutralizing antibody (NAb) responses they elicit. We hypothesized that the collection of all NAbs would constitute a shape space and that responses of individuals are random samples from this space. We constructed the shape space by analyzing reported in vitro dose–response curves of ~80 NAbs. Sampling NAb subsets from the space, we recapitulated the responses of convalescent patients. We assumed that vaccination would elicit similar NAb responses. We developed a model of within-host SARS-CoV-2 dynamics, applied it to virtual patient populations and, invoking the NAb responses above, predicted vaccine efficacies. Our predictions quantitatively captured the efficacies from clinical trials. Our study thus suggests plausible mechanistic underpinnings of COVID-19 vaccines and generates testable hypotheses for establishing them. A multiscale model is presented to quantitatively predict COVID-19 vaccine efficacies by describing the generation, activity and diversity of neutralizing antibodies.

...read moreread less

28 citations

Journal Article•DOI•

Automated discovery of fundamental variables hidden in experimental data

[...]

Boyuan Chen, Kuang Huang, Sunand Raghupathi, I. Chandratreya, Qi Du, Hod Lipson - Show less +2 more

01 Jul 2022-Nature Computational Science

27 citations

Journal Article•DOI•

Experimental quantum adversarial learning with programmable superconducting qubits

[...]

Wenhui Ren, Weikang Li, Shibo Xu, Ke Wang, Wenjie Jiang, Feitong Jin, Xuhao Zhu, Jiachen Chen, Z. Song, Feng Zhang, Hang Dong, Xu Zhang, Jinfeng Deng, Yu Gao, Chuanyu Zhang, Yaozu Wu, Bing Zhang, Qiujiang Guo, Hekang Li, Zhen Wang, Jacob Biamonte, Chao Song, Dong-Ling Deng, H. Wang - Show less +20 more

04 Apr 2022-Nature Computational Science

TL;DR: In this paper , an experimental demonstration of quantum adversarial learning with programmable superconducting qubits is presented, where the vulnerability of quantum machine learning models against adversarial noises, together with a defense strategy way out of this dilemma, is demonstrated experimentally with a programmable Superconducting quantum processor.

...read moreread less

Abstract: Quantum computing promises to enhance machine learning and artificial intelligence. However, recent theoretical works show that, similar to traditional classifiers based on deep classical neural networks, quantum classifiers would suffer from adversarial perturbations as well. Here we report an experimental demonstration of quantum adversarial learning with programmable superconducting qubits. We train quantum classifiers, which are built on variational quantum circuits consisting of ten transmon qubits featuring average lifetimes of 150 μs, and average fidelities of simultaneous single- and two-qubit gates above 99.94% and 99.4%, respectively, with both real-life images (for example, medical magnetic resonance imaging scans) and quantum data. We demonstrate that these well-trained classifiers (with testing accuracy up to 99%) can be practically deceived by small adversarial perturbations, whereas an adversarial training process would substantially enhance their robustness to such perturbations. The vulnerability of quantum machine learning models against adversarial noises, together with a defense strategy way out of this dilemma, is demonstrated experimentally with a programmable superconducting quantum processor.

...read moreread less

24 citations

Journal Article•DOI•

Cell clustering for spatial transcriptomics data with graph neural networks

[...]

Jiachen Li, Siheng Chen, Xiaoyong Pan, Ye Yuan, Hong-Bin Shen - Show less +1 more

01 Jun 2022-Nature Computational Science

23 citations

Journal Article•DOI•

Accelerating minimap2 for long-read sequencing applications on modern CPUs

[...]

Saurabh Kalikar, Chirag Jain, Md. Vasimuddin, Sanchit Misra

01 Feb 2022-Nature Computational Science

22 citations

Journal Article•DOI•

The fast continuous wavelet transformation (fCWT) for real-time, high-quality, noise-resistant time–frequency analysis

[...]

Lukas P. A. Arts, Egon L. van den Broek

01 Jan 2022-Nature Computational Science

TL;DR: In this paper , an open-source algorithm to calculate the fast continuous wavelet transform (fCWT) is proposed, where the parallel environment of fCWT separates scale-independent and scale-dependent operations, while utilizing optimized fast Fourier transforms that exploit downsampled wavelets.

...read moreread less

Abstract: Abstract The spectral analysis of signals is currently either dominated by the speed–accuracy trade-off or ignores a signal’s often non-stationary character. Here we introduce an open-source algorithm to calculate the fast continuous wavelet transform (fCWT). The parallel environment of fCWT separates scale-independent and scale-dependent operations, while utilizing optimized fast Fourier transforms that exploit downsampled wavelets. fCWT is benchmarked for speed against eight competitive algorithms, tested on noise resistance and validated on synthetic electroencephalography and in vivo extracellular local field potential data. fCWT is shown to have the accuracy of CWT, to have 100 times higher spectral resolution than algorithms equal in speed, to be 122 times and 34 times faster than the reference and fastest state-of-the-art implementations and we demonstrate its real-time performance, as confirmed by the real-time analysis ratio. fCWT provides an improved balance between speed and accuracy, which enables real-time, wide-band, high-quality, time–frequency analysis of non-stationary noisy signals.

...read moreread less

20 citations

Journal Article•DOI•

Diffraction-limited molecular cluster quantification with Bayesian nonparametrics

[...]

Xakimova Saida Baxtiyorovna¹•Institutions (1)

Arizona State University¹

28 Feb 2022-Nature Computational Science

TL;DR: In this paper , the authors proposed a method to simultaneously enumerate fluorophores and determine their individual photophysical state trajectories using Bayesian nonparametrics and specialized Monte Carlo algorithms.

...read moreread less

Abstract: Life's fundamental processes involve multiple molecules operating in close proximity within cells. To probe the composition and kinetics of molecular clusters confined within small (diffraction-limited) regions, experiments often report on the total fluorescence intensity simultaneously emitted from labeled molecules confined to such regions. Methods exist to enumerate total fluorophore numbers (e.g., step counting by photobleaching). However, methods aimed at step counting by photobleaching cannot treat photophysical dynamics in counting nor learn their associated kinetic rates. Here we propose a method to simultaneously enumerate fluorophores and determine their individual photophysical state trajectories. As the number of active (fluorescent) molecules at any given time is unknown, we rely on Bayesian nonparametrics and use specialized Monte Carlo algorithms to derive our estimates. Our formulation is benchmarked on synthetic and real data sets. While our focus here is on photophysical dynamics (in which labels transition between active and inactive states), such dynamics can also serve as a proxy for other types of dynamics such as assembly and disassembly kinetics of clusters. Similarly, while we focus on the case where all labels are initially fluorescent, other regimes, more appropriate to photoactivated localization microscopy, where fluorophores are instantiated in a non-fluorescent state, fall within the scope of the framework. As such, we provide a complete and versatile framework for the interpretation of complex time traces arising from the simultaneous activity of up to 100 fluorophores.

...read moreread less

Journal Article•DOI•

Large-scale microbiome data integration enables robust biomarker identification

[...]

Liwen Xiao, Fengyi Zhang, Fangqing Zhao

01 May 2022-Nature Computational Science

TL;DR: In this paper , the authors proposed an algorithm, NetMoss, for assessing shifts of microbial network modules to identify robust biomarkers associated with various diseases, which shows better performance in removing batch effects.

...read moreread less

Abstract: Abstract The close association between gut microbiota dysbiosis and human diseases is being increasingly recognized. However, contradictory results are frequently reported, as confounding effects exist. The lack of unbiased data integration methods is also impeding the discovery of disease-associated microbial biomarkers from different cohorts. Here we propose an algorithm, NetMoss, for assessing shifts of microbial network modules to identify robust biomarkers associated with various diseases. Compared to previous approaches, the NetMoss method shows better performance in removing batch effects. Through comprehensive evaluations on both simulated and real datasets, we demonstrate that NetMoss has great advantages in the identification of disease-related biomarkers. Based on analysis of pandisease microbiota studies, there is a high prevalence of multidisease-related bacteria in global populations. We believe that large-scale data integration will help in understanding the role of the microbiome from a more comprehensive perspective and that accurate biomarker identification will greatly promote microbiome-based medical diagnosis.

...read moreread less

Journal Article•DOI•

Progressive assembly of multi-domain protein structures from cryo-EM density maps

[...]

Xiao-Gen Zhou, Yang Li, Chengxin Zhang, Wei Zheng, Guijun Zhang, Yang Zhang - Show less +2 more

01 Apr 2022-Nature Computational Science

TL;DR: In this article , the authors developed domain enhanced modeling using cryo-electron microscopy (DEMO-EM), an automatic method to assemble multi-domain structures from Cryo-Electron (CE) microscopy maps through a progressive structural refinement procedure combining rigid-body domain fitting and flexible assembly simulations with deep-neural network inter-domain distance profiles.

...read moreread less

Abstract: Progress in cryo-electron microscopy has provided the potential for large-size protein structure determination. However, the success rate for solving multi-domain proteins remains low because of the difficulty in modelling inter-domain orientations. Here we developed domain enhanced modeling using cryo-electron microscopy (DEMO-EM), an automatic method to assemble multi-domain structures from cryo-electron microscopy maps through a progressive structural refinement procedure combining rigid-body domain fitting and flexible assembly simulations with deep-neural-network inter-domain distance profiles. The method was tested on a large-scale benchmark set of proteins containing up to 12 continuous and discontinuous domains with medium- to low-resolution density maps, where DEMO-EM produced models with correct inter-domain orientations (template modeling score (TM-score) >0.5) for 97% of cases and outperformed state-of-the-art methods. DEMO-EM was applied to the severe acute respiratory syndrome coronavirus 2 genome and generated models with average TM-score and root-mean-square deviation of 0.97 and 1.3 Å, respectively, with respect to the deposited structures. These results demonstrate an efficient pipeline that enables automated and reliable large-scale multi-domain protein structure modelling from cryo-electron microscopy maps. A protocol is developed to construct multi-domain protein structures from cryo-electron microscopy density maps. The results demonstrate the effectiveness of deep-learning-guided inter-domain structure assembly and refinement simulations.

...read moreread less

Journal Article•DOI•

Rotamer-free protein sequence design based on deep learning and self-consistency

[...]

Yufeng Liu, Lu Zhang, Weilun Wang, Ming Zhu, Chenchen Wang, Fudong Li, Jiahai Zhang, Houqiang Li, Quan Chen, Haiyan Liu - Show less +6 more

01 Feb 2022-Nature Computational Science

TL;DR: ABACUS-R as mentioned in this paper uses an encoder-decoder network trained using a multitask learning strategy to predict the sidechain type of a central residue from its 3D local environment, which includes, besides other features, the types but not the conformations of the surrounding sidechains.

...read moreread less

Abstract: Several previously proposed deep learning methods to design amino acid sequences that autonomously fold into a given protein backbone yielded promising results in computational tests but did not outperform conventional energy function-based methods in wet experiments. Here we present the ABACUS-R method, which uses an encoder–decoder network trained using a multitask learning strategy to predict the sidechain type of a central residue from its three-dimensional local environment, which includes, besides other features, the types but not the conformations of the surrounding sidechains. This eliminates the need to reconstruct and optimize sidechain structures, and drastically simplifies the sequence design process. Thus iteratively applying the encoder–decoder to different central residues is able to produce self-consistent overall sequences for a target backbone. Results of wet experiments, including five structures solved by X-ray crystallography, show that ABACUS-R outperforms state-of-the-art energy function-based methods in success rate and design precision. A deep learning method for protein sequence design on given backbones, ABACUS-R, is proposed in this study. ABACUS-R shows an improved performance when compared with conventional energy function-based methods in wet experiments.

...read moreread less

Journal Article•DOI•

A deep manifold-regularized learning model for improving phenotype prediction from multi-modal data

[...]

Daifeng Wang¹•Institutions (1)

University of Wisconsin-Madison¹

31 Jan 2022-Nature Computational Science

TL;DR: DeepManReg as mentioned in this paper employs deep neural networks to learn cross-modal manifolds and then align multimodal features onto a common latent space, and uses crossmodal manifold graphs as a feature graph to regularize the classifiers for improving phenotype predictions.

...read moreread less

Abstract: The phenotypes of complex biological systems are fundamentally driven by various multi-scale mechanisms. Multi-modal data, such as single cell multi-omics data, enables a deeper understanding of underlying complex mechanisms across scales for phenotypes. We developed an interpretable regularized learning model, deepManReg, to predict phenotypes from multi-modal data. First, deepManReg employs deep neural networks to learn cross-modal manifolds and then to align multi-modal features onto a common latent space. Second, deepManReg uses cross-modal manifolds as a feature graph to regularize the classifiers for improving phenotype predictions and also for prioritizing the multi-modal features and cross-modal interactions for the phenotypes. We applied deepManReg to (1) an image dataset of handwritten digits with multi-features and (2) single cell multi-modal data (Patch-seq data) including transcriptomics and electrophysiology for neuronal cells in the mouse brain. We show that deepManReg improved phenotype prediction in both datasets, and also prioritized genes and electrophysiological features for the phenotypes of neuronal cells.

...read moreread less

Journal Article•DOI•

Symphonizing pileup and full-alignment for deep learning-based long-read variant calling

[...]

19 Dec 2022-Nature Computational Science

TL;DR: Clair3 as mentioned in this paper leverages two major method categories: pileup calling and full alignment to maximize precision and recall, and demonstrates improved performance, especially at lower coverage, especially for lower coverage.

...read moreread less

Abstract: Deep learning-based variant callers are becoming the standard and have achieved superior single nucleotide polymorphisms calling performance using long reads. Here we present Clair3, which leverages two major method categories: pileup calling handles most variant candidates with speed, and full-alignment tackles complicated candidates to maximize precision and recall. Clair3 runs faster than any of the other state-of-the-art variant callers and demonstrates improved performance, especially at lower coverage.

...read moreread less

Journal Article•DOI•

Unifying structural descriptors for biological and bioinspired nanoscale complexes

[...]

Minjeong Cha, Emine Sumeyra Turali Emre, Xiongye Xiao, Ji-Young Kim, Paul Bogdan, J. Scott VanEpps, Angela Violi, Nicholas A. Kotov - Show less +4 more

01 Apr 2022-Nature Computational Science

TL;DR: In this article , chemical, geometrical and graph-theoretical descriptors for protein complexes are used to predict lock-and-key protein-nanoparticle pairs.

...read moreread less

Abstract: Biomimetic nanoparticles are known to serve as nanoscale adjuvants, enzyme mimics and amyloid fibrillation inhibitors. Their further development requires better understanding of their interactions with proteins. The abundant knowledge about protein–protein interactions can serve as a guide for designing protein–nanoparticle assemblies, but the chemical and biological inputs used in computational packages for protein–protein interactions are not applicable to inorganic nanoparticles. Analysing chemical, geometrical and graph-theoretical descriptors for protein complexes, we found that geometrical and graph-theoretical descriptors are uniformly applicable to biological and inorganic nanostructures and can predict interaction sites in protein pairs with accuracy >80% and classification probability ~90%. We extended the machine-learning algorithms trained on protein–protein interactions to inorganic nanoparticles and found a nearly exact match between experimental and predicted interaction sites with proteins. These findings can be extended to other organic and inorganic nanoparticles to predict their assemblies with biomolecules and other chemical structures forming lock-and-key complexes. Unified structural descriptors of geometrical and graph-theoretical features are developed, allowing knowledge about protein lock-and-key complexes to be utilized to predict the formation of and interaction sites in protein–nanoparticle pairs.

...read moreread less

Journal Article•DOI•

Solving the electronic Schrödinger equation for multiple nuclear geometries with weight-sharing deep neural networks

[...]

Rafael Reisenhofer¹•Institutions (1)

University of Bremen¹

19 May 2022-Nature Computational Science

TL;DR: In this article , the authors show that sharing the vast majority of weights across neural network models for different geometries substantially accelerates optimization and yields pre-trained models that require only a small number of additional optimization steps to obtain high-accuracy solutions for new geometry.

...read moreread less

Abstract: The Schrödinger equation describes the quantum-mechanical behaviour of particles, making it the most fundamental equation in chemistry. A solution for a given molecule allows computation of any of its properties. Finding accurate solutions for many different molecules and geometries is thus crucial to the discovery of new materials such as drugs or catalysts. Despite its importance, the Schrödinger equation is notoriously difficult to solve even for single molecules, as established methods scale exponentially with the number of particles. Combining Monte Carlo techniques with unsupervised optimization of neural networks was recently discovered as a promising approach to overcome this curse of dimensionality, but the corresponding methods do not exploit synergies that arise when considering multiple geometries. Here we show that sharing the vast majority of weights across neural network models for different geometries substantially accelerates optimization. Furthermore, weight-sharing yields pretrained models that require only a small number of additional optimization steps to obtain high-accuracy solutions for new geometries. Weight-sharing is used to accelerate and to effectively pretrain neural network-based variational Monte Carlo methods when solving the electronic Schrödinger equation for multiple geometries.

...read moreread less

Journal Article•DOI•

Quantum embedding theories to simulate condensed systems on quantum computers

[...]

Dominique Dias¹, Emily White Johansson¹•Institutions (1)

Argonne National Laboratory¹

25 Jul 2022-Nature Computational Science

TL;DR: In this paper , the challenges and opportunities of applying different embedding frameworks to calculate solid materials properties are discussed, with a focus on electronic structures of spin defects, and examples for a specific class of materials, that is, solid materials hosting spin defects.

...read moreread less

Abstract: Quantum computers hold promise to improve the efficiency of quantum simulations of materials and to enable the investigation of systems and properties that are more complex than tractable at present on classical architectures. Here, we discuss computational frameworks to carry out electronic structure calculations of solids on noisy intermediate-scale quantum computers using embedding theories, and we give examples for a specific class of materials, that is, solid materials hosting spin defects. These are promising systems to build future quantum technologies, such as quantum computers, quantum sensors and quantum communication devices. Although quantum simulations on quantum architectures are in their infancy, promising results for realistic systems appear to be within reach. Quantum embedding theory promises the simulation of realistic materials in quantum computers. In this Perspective, challenges and opportunities of applying different embedding frameworks to calculate solid materials properties are discussed, with a focus on electronic structures of spin defects.

...read moreread less

Journal Article•DOI•

Discovering and forecasting extreme events via active learning in neural operators

[...]

Ethan Pickering, George Em Karniadakis, Themistoklis P. Sapsis

05 Apr 2022-Nature Computational Science

TL;DR: This model-agnostic framework pairs a BED scheme that actively selects data for quantifying extreme events with an ensemble of DNOs that approximate inﬁnite-dimensional nonlinear operators to form the foundation of an AI-assisted experimental infrastructure that can infer and pinpoint critical situations across many domains.

...read moreread less

Abstract: Extreme events in society and nature, such as pandemic spikes, rogue waves, or structural failures, can have catastrophic consequences. Characterizing extremes is difficult as they occur rarely, arise from seemingly benign conditions, and belong to complex and often unknown infinite-dimensional systems. Such challenges render attempts at characterizing them as moot. We address each of these difficulties by combining novel training schemes in Bayesian experimental design (BED) with an ensemble of deep neural operators (DNOs). This model-agnostic framework pairs a BED scheme that actively selects data for quantifying extreme events with an ensemble of DNOs that approximate infinite-dimensional nonlinear operators. We find that not only does this framework clearly beat Gaussian processes (GPs) but that 1) shallow ensembles of just two members perform best; 2) extremes are uncovered regardless of the state of initial data (i.e. with or without extremes); 3) our method eliminates"double-descent"phenomena; 4) the use of batches of suboptimal acquisition points compared to step-by-step global optima does not hinder BED performance; and 5) Monte Carlo acquisition outperforms standard optimizers in high-dimensions. Together these conclusions form the foundation of an AI-assisted experimental infrastructure that can efficiently infer and pinpoint critical situations across many domains, from physical to societal systems.

...read moreread less

Journal Article•DOI•

Towards practical and robust DNA-based data archiving using the yin–yang codec system

[...]

I. V. Russkikh¹, Tian-ling Ren², Shahid M. Zakaullah³•Institutions (3)

National Center for Supercomputing Applications¹, Beijing Genomics Institute², Harvard University³

25 Apr 2022-Nature Computational Science

TL;DR: In this article , a robust bit-to-base transcoding algorithm named the yin-yang codec was proposed, using two rules to encode two binary bits into one nucleotide to generate DNA sequences that are highly compatible with synthesis and sequencing technologies.

...read moreread less

Abstract: Abstract DNA is a promising data storage medium due to its remarkable durability and space-efficient storage. Early bit-to-base transcoding schemes have primarily pursued information density, at the expense of introducing biocompatibility challenges or decoding failure. Here we propose a robust transcoding algorithm named the yin–yang codec, using two rules to encode two binary bits into one nucleotide, to generate DNA sequences that are highly compatible with synthesis and sequencing technologies. We encoded two representative file formats and stored them in vitro as 200 nt oligo pools and in vivo as a ~54 kbps DNA fragment in yeast cells. Sequencing results show that the yin–yang codec exhibits high robustness and reliability for a wide variety of data types, with an average recovery rate of 99.9% above 10 4 molecule copies and an achieved recovery rate of 87.53% at ≤10 2 copies. Additionally, the in vivo storage demonstration achieved an experimentally measured physical density close to the theoretical maximum.

...read moreread less

Journal Article•DOI•

Autonomous inference of complex network dynamics from incomplete and noisy data

[...]

Ting Gao, Gang Yan

09 Feb 2022-Nature Computational Science

TL;DR: In this paper , a two-phase approach for autonomous inference of complex network dynamics is proposed, and its effectiveness is demonstrated by the tests of inferring neuronal, genetic, social, and coupled oscillators dynamics on various synthetic and real networks.

...read moreread less

Abstract: The availability of empirical data that capture the structure and behavior of complex networked systems has been greatly increased in recent years, however a versatile computational toolbox for unveiling a complex system's nodal and interaction dynamics from data remains elusive. Here we develop a two-phase approach for autonomous inference of complex network dynamics, and its effectiveness is demonstrated by the tests of inferring neuronal, genetic, social, and coupled oscillators dynamics on various synthetic and real networks. Importantly, the approach is robust to incompleteness and noises, including low resolution, observational and dynamical noises, missing and spurious links, and dynamical heterogeneity. We apply the two-phase approach to inferring the early spreading dynamics of H1N1 flu upon the worldwide airline network, and the inferred dynamical equation can also capture the spread of SARS and COVID-19 diseases. These findings together offer an avenue to discover the hidden microscopic mechanisms of a broad array of real networked systems.

...read moreread less

Journal Article•DOI•

Deep-learning density functional theory Hamiltonian for efficient ab initio electronic-structure calculation

[...]

Karl-Christof Renz¹, Shengcan Ma²•Institutions (2)

State Key Laboratory of Chemical Engineering¹, Beijing Academy of Quantum Information Sciences²

23 Jun 2022-Nature Computational Science

TL;DR: In this paper , a deep neural network approach was developed to represent the DFT Hamiltonian (DeepH) of crystalline materials, aiming to bypass the computationally demanding self-consistent field iterations of DFT and substantially improve the efficiency of ab initio electronic-structure calculations.

...read moreread less

Abstract: The marriage of density functional theory (DFT) and deep-learning methods has the potential to revolutionize modern computational materials science. Here we develop a deep neural network approach to represent the DFT Hamiltonian (DeepH) of crystalline materials, aiming to bypass the computationally demanding self-consistent field iterations of DFT and substantially improve the efficiency of ab initio electronic-structure calculations. A general framework is proposed to deal with the large dimensionality and gauge (or rotation) covariance of the DFT Hamiltonian matrix by virtue of locality, and this is realized by a message-passing neural network for deep learning. High accuracy, high efficiency and good transferability of the DeepH method are generally demonstrated for various kinds of material system and physical property. The method provides a solution to the accuracy–efficiency dilemma of DFT and opens opportunities to explore large-scale material systems, as evidenced by a promising application in the study of twisted van der Waals materials. A deep neural network method is developed to learn the mapping function from atomic structure to density functional theory (DFT) Hamiltonian, which helps address the accuracy–efficiency dilemma of DFT and is useful for studying large-scale materials.

...read moreread less

Journal Article•DOI•

Dimensionally Consistent Learning with Buckingham Pi

[...]

Joseph Bakarji, Jared L. Callaham, Steven L. Brunton, J. Nathan Kutz

09 Feb 2022-Nature Computational Science

TL;DR: In this article , three machine learning methods are developed for discovering physically meaningful dimensionless groups and scaling parameters from data, with the Buckingham Pi theorem as a constraint, using the symmetric and self-similar structure of available measurement data to discover the dimensionless group that best collapses these data to a lower dimensional space according to an optimal fit.

...read moreread less

Abstract: In the absence of governing equations, dimensional analysis is a robust technique for extracting insights and finding symmetries in physical systems. Given measurement variables and parameters, the Buckingham Pi theorem provides a procedure for finding a set of dimensionless groups that spans the solution space, although this set is not unique. We propose an automated approach using the symmetric and self-similar structure of available measurement data to discover the dimensionless groups that best collapse these data to a lower dimensional space according to an optimal fit. We develop three data-driven techniques that use the Buckingham Pi theorem as a constraint: (1) a constrained optimization problem with a non-parametric input–output fitting function, (2) a deep learning algorithm (BuckiNet) that projects the input parameter space to a lower dimension in the first layer and (3) a technique based on sparse identification of nonlinear dynamics to discover dimensionless equations whose coefficients parameterize the dynamics. We explore the accuracy, robustness and computational complexity of these methods and show that they successfully identify dimensionless groups in three example problems: a bead on a rotating hoop, a laminar boundary layer and Rayleigh–Bénard convection. Three machine learning methods are developed for discovering physically meaningful dimensionless groups and scaling parameters from data, with the Buckingham Pi theorem as a constraint.

...read moreread less

Journal Article•DOI•

A quantum-inspired approach to exploit turbulence structures

[...]

13 Jan 2022-Nature Computational Science

TL;DR: In this article , the structure of turbulent flows is analyzed by quantifying correlations between different length scales using methods inspired from quantum many-body physics, and a tensor network-based structure-resolving algorithm is proposed to simulate turbulent flows.

...read moreread less

Abstract: Understanding turbulence is key to our comprehension of many natural and technological flow processes. At the heart of this phenomenon lies its intricate multiscale nature, describing the coupling between different-sized eddies in space and time. Here we analyze the structure of turbulent flows by quantifying correlations between different length scales using methods inspired from quantum many-body physics. We present the results for interscale correlations of two paradigmatic flow examples, and use these insights along with tensor network theory to design a structure-resolving algorithm for simulating turbulent flows. With this algorithm, we find that the incompressible Navier–Stokes equations can be accurately solved even when reducing the number of parameters required to represent the velocity field by more than one order of magnitude compared to direct numerical simulation. Our quantum-inspired approach provides a pathway towards conducting computational fluid dynamics on quantum computers. Tensor networks exploit the structure of turbulence to offer a compressed description of flows, which leads to efficient fluid simulation algorithms that can be implemented on both classical and quantum computers.

...read moreread less

Journal Article•DOI•

Generative aptamer discovery using RaptGen

[...]

Natsuki Iwano, Tatsuo Adachi, Kazuteru Aoki, Yoshikazu Nakamura, M. Hamada - Show less +1 more

01 Jun 2022-Nature Computational Science

TL;DR: RaptGen as mentioned in this paper is a variational autoencoder for in silico aptamer generation, which exploits a profile hidden Markov model decoder to represent motif sequences effectively.

...read moreread less

Abstract: Abstract Nucleic acid aptamers are generated by an in vitro molecular evolution method known as systematic evolution of ligands by exponential enrichment (SELEX). Various candidates are limited by actual sequencing data from an experiment. Here we developed RaptGen, which is a variational autoencoder for in silico aptamer generation. RaptGen exploits a profile hidden Markov model decoder to represent motif sequences effectively. We showed that RaptGen embedded simulation sequence data into low-dimensional latent space on the basis of motif information. We also performed sequence embedding using two independent SELEX datasets. RaptGen successfully generated aptamers from the latent space even though they were not included in high-throughput sequencing. RaptGen could also generate a truncated aptamer with a short learning model. We demonstrated that RaptGen could be applied to activity-guided aptamer generation according to Bayesian optimization. We concluded that a generative method by RaptGen and latent representation are useful for aptamer discovery.

...read moreread less

Journal Article•DOI•

Fast multi-source nanophotonic simulations using augmented partial factorization

[...]

Ho-Chun Lin, Zeyu Wang, Chia Wei Hsu

16 May 2022-Nature Computational Science

TL;DR: In this paper , the authors propose to bypass the full-basis solutions and directly compute the quantities of interest while also eliminating the repetition over inputs by augmenting the Maxwell operator with all the input source profiles and all the output projection profiles.

...read moreread less

Abstract: Abstract Numerical solutions of Maxwell’s equations are indispensable for nanophotonics and electromagnetics but are constrained when it comes to large systems, especially multi-channel ones such as disordered media, aperiodic metasurfaces and densely packed photonic circuits where the many inputs require many large-scale simulations. Conventionally, before extracting the quantities of interest, Maxwell’s equations are first solved on every element of a discretization basis set that contains much more information than is typically needed. Furthermore, such simulations are often performed one input at a time, which can be slow and repetitive. Here we propose to bypass the full-basis solutions and directly compute the quantities of interest while also eliminating the repetition over inputs. We do so by augmenting the Maxwell operator with all the input source profiles and all the output projection profiles, followed by a single partial factorization that yields the entire generalized scattering matrix via the Schur complement, with no approximation beyond discretization. This method applies to any linear partial differential equation. Benchmarks show that this approach is 1,000–30,000,000 times faster than existing methods for two-dimensional systems with about 10,000,000 variables. As examples, we demonstrate simulations of entangled photon backscattering from disorder and high-numerical-aperture metalenses that are thousands of wavelengths wide.

...read moreread less

Journal Article•DOI•

Single-sequence protein structure prediction using supervised transformer protein language models

[...]

Mehdi Boroujerdi¹•Institutions (1)

Guangdong University of Technology¹

19 Dec 2022-Nature Computational Science

TL;DR: Zhang et al. as discussed by the authors proposed trRosettaX-Single, an automated algorithm for single-sequence protein structure prediction, which incorporates the sequence embedding from a supervised transformer protein language model into a multi-scale network enhanced by knowledge distillation to predict interresidue two-dimensional geometry, which is then used to reconstruct three-dimensional structures via energy minimization.

...read moreread less

Abstract: Significant progress has been made in protein structure prediction in recent years. However, it remains challenging for AlphaFold2 and other deep learning-based methods to predict protein structure with single-sequence input. Here we introduce trRosettaX-Single, an automated algorithm for single-sequence protein structure prediction. It incorporates the sequence embedding from a supervised transformer protein language model into a multi-scale network enhanced by knowledge distillation to predict inter-residue two-dimensional geometry, which is then used to reconstruct three-dimensional structures via energy minimization. Benchmark tests show that trRosettaX-Single outperforms AlphaFold2 and RoseTTAFold on orphan proteins and works well on human-designed proteins (with an average template modeling score (TM-score) of 0.79). An experimental test shows that the full trRosettaX-Single pipeline is two times faster than AlphaFold2, using much fewer computing resources (<10%). On 2,000 designed proteins from network hallucination, trRosettaX-Single generates structure models with high confidence. As a demonstration, trRosettaX-Single is applied to missense mutation analysis. These data suggest that trRosettaX-Single may find potential applications in protein design and related studies. In this study, a supervised protein language model is proposed to predict protein structure from a single sequence. It achieves state-of-the-art accuracy on orphan proteins and is competitive with other methods on human-designed proteins.

...read moreread less

Journal Article•DOI•

Persistent spectral theory-guided protein engineering

[...]

Yuchi Qiu, Guo Wei

19 Dec 2022-Nature Computational Science

TL;DR: In this article , a topology-offered protein fitness (TopFit) framework was proposed to complement protein sequence and structure embeddings. But the proposed framework is limited to three-dimensional protein structures and their intricate geometric complexity hinders their applications in deep mutational screening.

...read moreread less

Abstract: Although protein engineering, which iteratively optimizes protein fitness by screening the gigantic mutational space, is constrained by experimental capacity, various machine learning models have substantially expedited protein engineering. Three-dimensional protein structures promise further advantages, but their intricate geometric complexity hinders their applications in deep mutational screening. Persistent homology, an established algebraic topology tool for protein structural complexity reduction, fails to capture the homotopic shape evolution during filtration of given data. Here we introduce a Topology-offered Protein Fitness (TopFit) framework to complement protein sequence and structure embeddings. Equipped with an ensemble regression strategy, TopFit integrates the persistent spectral theory, which is a new topological Laplacian, and two auxiliary sequence embeddings to capture mutation-induced topological invariant, shape evolution and sequence disparity in the protein fitness landscape. The performance of TopFit is assessed by 34 benchmark datasets with 128,634 variants, involving a vast variety of protein structure acquisition modalities and training set size variations. A topological data analysis-driven machine learning model for guiding protein engineering is proposed, complementing protein sequence and structure embeddings when navigating the fitness landscape.

...read moreread less

Collapse