Open access · Journal Article · DOI: 10.1021/ACS.JCTC.0C00981

Discovering Collective Variables of Molecular Transitions via Genetic Algorithms and Neural Networks

04 Mar 2021 · Journal of Chemical Theory and Computation (American Chemical Society (ACS)) · Vol. 17, Iss. 4, pp 2294-2306
Abstract: With the continual improvement of computing hardware and algorithms, simulations have become a powerful tool for understanding all sorts of (bio)molecular processes. To handle the large simulation data sets and to accelerate slow, activated transitions, a condensed set of descriptors, or collective variables (CVs), is needed to discern the relevant dynamics that describes the molecular process of interest. However, proposing an adequate set of CVs that can capture the intrinsic reaction coordinate of the molecular transition is often extremely difficult. Here, we present a framework to find an optimal set of CVs from a pool of candidates using a combination of artificial neural networks and genetic algorithms. The approach effectively replaces the encoder of an autoencoder network with genes to represent the latent space, i.e., the CVs. Given a selection of CVs as input, the network is trained to recover the atom coordinates underlying the CV values at points along the transition. The network performance is used as an estimator of the fitness of the input CVs. Two genetic algorithms optimize the CV selection and the neural network architecture. The successful retrieval of optimal CVs by this framework is illustrated by means of two case studies: the well-known conformational change in the alanine dipeptide molecule and the more intricate transition of a base pair in B-DNA from the classic Watson-Crick pairing to the alternative Hoogsteen pairing. Key advantages of our framework include the following: optimal interpretable CVs, avoiding costly calculation of committor or time-correlation functions, and automatic hyperparameter optimization. In addition, we show that applying a time-delay between the network input and output allows for enhanced selection of slow variables. Moreover, the network can also be used to generate molecular configurations of unexplored microstates, for example, for augmentation of the simulation data.
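
The selection loop described in the abstract can be illustrated with a deliberately small sketch. It is not the authors' implementation: the neural-network decoder is replaced here by a nearest-neighbour lookup, the genetic algorithm uses mutation and truncation selection only (no crossover), and the toy data, `make_toy_frames`, `fitness`, `mutate`, and `run_ga` are all hypothetical names introduced for illustration. A gene is a tuple of candidate-CV indices, and its fitness is the negative error made when reconstructing coordinates from the selected CVs.

```python
import math
import random

def make_toy_frames(n_frames=50, n_noise=4, seed=0):
    """Toy 'transition': a point moving along a half circle, plus candidate CVs."""
    rng = random.Random(seed)
    frames = []
    for i in range(n_frames):
        t = i / (n_frames - 1)
        coords = (math.cos(math.pi * t), math.sin(math.pi * t))
        # Candidates 0 and 1 track the transition; the rest are pure noise.
        cvs = [t, coords[0]] + [rng.random() for _ in range(n_noise)]
        frames.append((cvs, coords))
    return frames

def fitness(gene, frames):
    """Negative reconstruction error of a nearest-neighbour stand-in decoder."""
    train, test = frames[0::2], frames[1::2]
    total = 0.0
    for cvs, coords in test:
        # Decode: copy the coordinates of the training frame closest in CV space.
        nearest = min(train, key=lambda fr: sum((fr[0][g] - cvs[g]) ** 2 for g in gene))
        total += sum((a - b) ** 2 for a, b in zip(nearest[1], coords))
    return -total / len(test)

def mutate(gene, n_candidates, rng):
    """Replace one selected CV by a randomly chosen unused candidate."""
    gene = list(gene)
    unused = [c for c in range(n_candidates) if c not in gene]
    gene[rng.randrange(len(gene))] = rng.choice(unused)
    return tuple(sorted(gene))

def run_ga(frames, gene_size=2, pop_size=6, generations=15, seed=1):
    """Evolve CV selections; returns the best gene and the best fitness per generation."""
    rng = random.Random(seed)
    n_candidates = len(frames[0][0])
    population = [tuple(sorted(rng.sample(range(n_candidates), gene_size)))
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(population, key=lambda g: fitness(g, frames), reverse=True)
        history.append(fitness(ranked[0], frames))
        parents = ranked[: pop_size // 2]  # truncation selection, parents survive
        children = [mutate(rng.choice(parents), n_candidates, rng)
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    best = max(population, key=lambda g: fitness(g, frames))
    return best, history

frames = make_toy_frames()
best, history = run_ga(frames)
print("selected CV indices:", best, "fitness:", fitness(best, frames))
```

Because the parents are carried over unchanged, the best fitness cannot decrease between generations. Under the same assumptions, the time-delay idea from the abstract would amount to pairing the CVs of frame i with the coordinates of a later frame when building `train` and `test`.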


Citations

6 results found


Open access · Journal Article · DOI: 10.1021/ACS.JCTC.1C00458
Abstract: We propose to analyze molecular dynamics (MD) output via a supervised machine learning (ML) algorithm, the decision tree. The approach aims to identify the predominant geometric features which correlate with trajectories that transition between two arbitrarily defined states. The data-driven algorithm aims to identify these features without the bias of human "chemical intuition". We demonstrate the method by analyzing the proton exchange reactions in formic acid solvated in small water clusters. The simulations were performed with ab initio MD combined with a method to efficiently sample the rare event, path sampling. Our ML analysis identified relevant geometric variables involved in the proton transfer reaction and how they may change as the number of solvating water molecules changes.
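
The core step of a decision tree, choosing the geometric feature and threshold that best separate reactive from non-reactive trajectories, can be sketched as a single split (a depth-one tree). This is a minimal illustration, not the cited authors' code: the feature values, the labels, and the function names `gini` and `best_split` are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split(samples, labels):
    """Return (feature_index, threshold, weighted_gini) of the best single split."""
    best = None
    n = len(samples)
    for f in range(len(samples[0])):
        values = sorted(set(row[f] for row in samples))
        # Candidate thresholds lie midway between consecutive observed values.
        for lo, hi in zip(values, values[1:]):
            thr = 0.5 * (lo + hi)
            left = [y for row, y in zip(samples, labels) if row[f] <= thr]
            right = [y for row, y in zip(samples, labels) if row[f] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (f, thr, score)
    return best

# Hypothetical features per trajectory: [O-H distance, O-O distance] in angstrom.
samples = [[1.00, 2.90], [1.05, 2.50], [1.02, 2.80],
           [0.98, 2.40], [1.03, 2.45], [0.99, 2.85]]
labels = [0, 1, 0, 1, 1, 0]  # 1 = trajectory transitions between the two states

feature, threshold, impurity = best_split(samples, labels)
print(feature, threshold, impurity)
```

In this made-up data set the second feature separates the two classes perfectly, so the split lands on it with zero impurity. A full tree would apply the same search recursively to each side of the split.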


1 Citation


Open access · Posted Content
Abstract: We propose a supervised machine learning algorithm, decision trees, to analyze molecular dynamics output. The approach aims to identify the predominant geometric features which correlate with trajectories that transition between two arbitrarily defined states. The data-based algorithm aims to identify such features in an approach which is unbiased by human "chemical intuition". We demonstrate the method by analyzing proton exchange reactions in formic acid (FA) solvated in small water clusters. The simulations were performed with ab initio molecular dynamics combined with a method for generating rare events, specifically path sampling. Our machine learning analysis identified mechanistic descriptions of the proton transfer reaction for the different water clusters.



Journal Article · DOI: 10.1016/J.SBI.2021.08.007
Ali Rana Atilgan, Canan Atilgan
Abstract: Protein function is constrained by the three-dimensional structure but is delineated by its dynamics. This framework must satisfy specificity of function along with adaptability to changing environments and evolvability under external constraints. The accessibility of the available regions of the energy landscape for a set of conditions and shifts in the populations upon their modulation have effects propagating across scales, from biomolecular interactions, to organisms, to populations. Developing the ability to detect and juggle protein conformations supplemented by a physics-based understanding has implications for not only in vivo problems but also for resistance impeding drug discovery and bionano-sensor design.


Topics: Evolvability (52%), Energy landscape (50%)

Open access · Journal Article · DOI: 10.1140/EPJB/S10051-021-00233-5
Jutta Rogal
Abstract: In molecular simulations, the identification of suitable reaction coordinates is central to both the analysis and sampling of transitions between metastable states in complex systems. If sufficient simulation data are available, a number of methods have been developed to reduce the vast amount of high-dimensional data to a small number of essential degrees of freedom representing the reaction coordinate. Likewise, if the reaction coordinate is known, a variety of approaches have been proposed to enhance the sampling along the important degrees of freedom. Often, however, neither one nor the other is available. One of the key questions is therefore how to construct reaction coordinates and evaluate their validity. Another challenge arises from the physical interpretation of reaction coordinates, which is often addressed by correlating physically meaningful parameters with conceptually well-defined but abstract reaction coordinates. Furthermore, machine learning based methods are becoming more and more applicable also to the reaction coordinate problem. This perspective highlights central aspects in the identification and evaluation of reaction coordinates and discusses recent ideas regarding automated computational frameworks to combine the optimization of reaction coordinates and enhanced sampling.


Topics: Degrees of freedom (58%), Reaction coordinate (56%)

Journal Article · DOI: 10.1021/ACS.JCTC.1C00497
Elena Kolodzeiski, Saeed Amirjalayer
Abstract: Mechanically interlocked molecules have gained significant attention because of their unique ability to perform well-defined motions originating from their entanglement, which is important for the design of artificial molecular machines. Atomistic simulations based on force fields (FFs) provide detailed insights into such architectures at the molecular level enabling one to predict the resulting functionalities. However, the development of reliable FFs is still challenging and time-consuming, in particular for highly dynamic and interlocked structures such as rotaxanes, which exhibit a large number of different conformers. In the present work, we present an on-the-fly training (OTFT) algorithm. By a guided and nonguided phase space sampling, relevant reference data are automatically and continuously generated and included for the on-the-fly parametrization of the FF based on a population swapping genetic algorithm (psGA). The OTFT approach provides a fast and automated FF parametrization scheme and tackles problems caused by missing phase space information or the need for big data. We demonstrate the high accuracy of the developed FF for flexible molecules with respect to equilibrium and out-of-equilibrium properties. Finally, by applying the ab initio parametrized FF, molecular dynamic simulations were performed up to experimentally relevant time scales (ca. 1 μs) enabling capture in detail of the structural evaluation and mapping out of the free-energy topology. The on-the-fly training approach thus provides a strong foundation toward automated FF developments and large-scale investigations of phenomena in and out of thermal equilibrium.


Topics: Population (51%), Molecular machine (51%)

References

64 results found


Journal Article · DOI: 10.1063/1.470117
Ulrich Essmann, Lalith Perera, Max L. Berkowitz, Tom Darden, +2 more
Abstract: The previously developed particle mesh Ewald method is reformulated in terms of efficient B-spline interpolation of the structure factors. This reformulation allows a natural extension of the method to potentials of the form 1/r^p with p ≥ 1. Furthermore, efficient calculation of the virial tensor follows. Use of B-splines in place of Lagrange interpolation leads to analytic gradients as well as a significant improvement in the accuracy. We demonstrate that arbitrary accuracy can be achieved, independent of system size N, at a cost that scales as N log(N). For biomolecular systems with many thousands of atoms this method permits the use of Ewald summation at a computational cost comparable to that of a simple truncation method of 10 Å or less.
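
The B-spline interpolation mentioned in this abstract rests on cardinal B-splines, which can be evaluated with a short recursion. The sketch below shows only that building block in one dimension, not a particle mesh Ewald implementation; `cardinal_bspline` and `spread_weights` are hypothetical names, and `u` is assumed to be a particle position expressed in units of the grid spacing.

```python
import math

def cardinal_bspline(u, order):
    """Cardinal B-spline M_order(u), which is nonzero only for 0 < u < order."""
    if order < 2:
        raise ValueError("order must be at least 2")
    if u <= 0.0 or u >= order:
        return 0.0
    if order == 2:
        return 1.0 - abs(u - 1.0)  # linear 'hat' function on [0, 2]
    # Standard recursion that raises the spline order by one.
    return (u * cardinal_bspline(u, order - 1)
            + (order - u) * cardinal_bspline(u - 1.0, order - 1)) / (order - 1)

def spread_weights(u, order):
    """Weights with which a particle at scaled position u contributes to grid points k."""
    k_max = math.floor(u)
    return {k: cardinal_bspline(u - k, order)
            for k in range(k_max - order + 1, k_max + 1)}

weights = spread_weights(7.3, order=4)
print(weights, sum(weights.values()))
```

The weights sum to one for any position, so spreading a charge onto the grid conserves it. The splines are also smooth piecewise polynomials, which is what makes the analytic gradients mentioned in the abstract possible.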


Topics: P3M (68%), Ewald summation (66%), Particle Mesh (56%)

15,288 Citations


Journal Article · DOI: 10.1037/H0042519
Abstract: The first of these questions is in the province of sensory physiology, and is the only one for which appreciable understanding has been achieved. This article will be concerned primarily with the second and third questions, which are still subject to a vast amount of speculation, and where the few relevant facts currently supplied by neurophysiology have not yet been integrated into an acceptable theory. With regard to the second question, two alternative positions have been maintained. The first suggests that storage of sensory information is in the form of coded representations or images, with some sort of one-to-one mapping between the sensory stimulus


Topics: Perceptron (59%), Artificial neural network (57%), Kernel perceptron (57%)

7,401 Citations


Journal Article · DOI: 10.1016/0021-9991(77)90121-8
G.M. Torrie, John P. Valleau
Abstract: The free energy difference between a model system and some reference system can easily be written as an ensemble average, but the conventional Monte Carlo methods of obtaining such averages are inadequate for the free-energy case. That is because the Boltzmann-weighted sampling distribution ordinarily used is extremely inefficient for the purpose. This paper describes the use of arbitrary sampling distributions chosen to facilitate such estimates. The methods have been tested successfully on the Lennard-Jones system over a wide range of temperature and density, including the gas-liquid coexistence region, and are found to be extremely powerful and economical.
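
The idea of sampling from a non-Boltzmann distribution and correcting afterwards can be sketched with a one-dimensional toy problem. If configurations are drawn with an extra weight w(x) on top of the Boltzmann factor, an unbiased average is recovered as <A>_0 = <A/w>_w / <1/w>_w. The example below is an illustration under stated assumptions, not the paper's Lennard-Jones calculation: the target is a standard normal distribution, the biased sampler is a normal distribution centred at 1, and `reweighted_average` is a hypothetical helper name.

```python
import math
import random

def reweighted_average(samples, observable, log_weight):
    """Estimate <A>_0 from samples drawn with extra weight w(x): <A/w>_w / <1/w>_w."""
    inv_w = [math.exp(-log_weight(x)) for x in samples]
    return sum(observable(x) * iw for x, iw in zip(samples, inv_w)) / sum(inv_w)

# Target (unbiased) distribution: p0(x) proportional to exp(-x^2 / 2).
# Sampling from a normal centred at 1 corresponds to w(x) proportional to exp(x - 1/2),
# because exp(-(x - 1)^2 / 2) = exp(-x^2 / 2) * exp(x - 1/2).
rng = random.Random(0)
samples = [rng.gauss(1.0, 1.0) for _ in range(50000)]

estimate = reweighted_average(samples, lambda x: x * x, lambda x: x - 0.5)
print(estimate)  # should be close to the exact unbiased value <x^2>_0 = 1
```

Any normalisation constant in w cancels between numerator and denominator, so only the functional form of the weight has to be known.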


Topics: Importance sampling (64%), Slice sampling (63%), Monte Carlo integration (62%)

4,439 Citations


Open access · Journal Article · DOI: 10.1073/PNAS.202427399
Abstract: We introduce a powerful method for exploring the properties of the multidimensional free energy surfaces (FESs) of complex many-body systems by means of coarse-grained non-Markovian dynamics in the space defined by a few collective coordinates. A characteristic feature of these dynamics is the presence of a history-dependent potential term that, in time, fills the minima in the FES, allowing the efficient exploration and accurate determination of the FES as a function of the collective coordinates. We demonstrate the usefulness of this approach in the case of the dissociation of a NaCl molecule in water and in the study of the conformational changes of a dialanine in solution.
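
The history-dependent potential can be pictured as a sum of Gaussians deposited along the trajectory of the collective coordinate. The following sketch runs overdamped Langevin dynamics on a one-dimensional double well and adds such a bias. It is a toy model, not the scheme used in the paper: the potential U(s) = (s^2 - 1)^2, the integrator, all parameter values, and the names `bias`, `bias_force`, and `run_metadynamics` are assumptions made for illustration.

```python
import math
import random

def bias(s, centers, height, width):
    """History-dependent potential: sum of Gaussians deposited at earlier positions."""
    return sum(height * math.exp(-((s - c) ** 2) / (2.0 * width ** 2)) for c in centers)

def bias_force(s, centers, height, width):
    """Force from the bias, -dV_bias/ds; it pushes the walker away from visited places."""
    return sum(height * (s - c) / width ** 2
               * math.exp(-((s - c) ** 2) / (2.0 * width ** 2)) for c in centers)

def run_metadynamics(n_steps=4000, stride=50, height=0.2, width=0.2,
                     kT=0.1, dt=0.005, seed=0):
    """Overdamped Langevin dynamics on U(s) = (s^2 - 1)^2 with a growing bias."""
    rng = random.Random(seed)
    s, centers = -1.0, []
    for step in range(1, n_steps + 1):
        force = -4.0 * s * (s * s - 1.0) + bias_force(s, centers, height, width)
        s += force * dt + math.sqrt(2.0 * kT * dt) * rng.gauss(0.0, 1.0)
        if step % stride == 0:
            centers.append(s)  # deposit a new Gaussian at the current position
    return centers

centers = run_metadynamics()
# The negative of the accumulated bias serves as a rough free-energy estimate.
for s in (-1.0, 0.0, 1.0):
    print(s, -bias(s, centers, height=0.2, width=0.2))
```

As Gaussians pile up in the starting well, the walker is pushed towards the barrier, which is the filling mechanism the abstract describes.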


Topics: Metadynamics (53%), Maxima and minima (51%)

3,998 Citations


Open access · Journal Article · DOI: 10.1103/PHYSREVLETT.78.2690
Christopher Jarzynski
Abstract: An expression is derived for the equilibrium free energy difference between two configurations of a system, in terms of an ensemble of finite-time measurements of the work performed in parametrically switching from one configuration to the other. Two well-known identities emerge as limiting cases of this result.
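
The expression in question is commonly written as ΔF = -kT ln <exp(-W/kT)>, where the average runs over repeated finite-time switching measurements of the work W. A minimal estimator could look like the sketch below; the function name `jarzynski_free_energy` and the sample work values are hypothetical, and the log-sum-exp shift is added only for numerical stability.

```python
import math

def jarzynski_free_energy(works, kT=1.0):
    """Estimate dF = -kT * ln( mean( exp(-W / kT) ) ) from a list of work values."""
    scaled = [-w / kT for w in works]
    shift = max(scaled)  # log-sum-exp shift avoids overflow in the exponentials
    log_mean = shift + math.log(sum(math.exp(x - shift) for x in scaled) / len(scaled))
    return -kT * log_mean

works = [1.0, 3.0]  # made-up work values in units of kT
delta_f = jarzynski_free_energy(works)
print(delta_f, sum(works) / len(works))
```

The estimate never exceeds the mean work, and it equals the work exactly when every measurement returns the same value.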


Topics: Jarzynski equality (56%), Bennett acceptance ratio (52%), Work (thermodynamics) (51%)

3,989 Citations