Journal Article

Fuzzy spectral clustering by PCCA+: application to Markov state models and data classification

01 Jun 2013 - Advances in Data Analysis and Classification (Springer-Verlag) - Vol. 7, Iss. 2, pp. 147-179
TL;DR: It is demonstrated in this paper that PCCA+ always delivers an optimal fuzzy clustering for nearly uncoupled, not necessarily reversible, Markov chains with transition states.
Abstract: Given a row-stochastic matrix describing pairwise similarities between data objects, spectral clustering uses the eigenvectors of this matrix to reduce dimensionality, so that clustering can be performed in fewer dimensions. One example from this class of algorithms is the Robust Perron Cluster Analysis (PCCA+), which delivers a fuzzy clustering. Originally developed for clustering the state space of Markov chains, the method became popular as a versatile tool for general data classification problems. The robustness of PCCA+, however, cannot be explained by previous perturbation results, because the matrices in typical applications do not comply with the two main requirements: reversibility and near decomposability. We therefore demonstrate in this paper that PCCA+ always delivers an optimal fuzzy clustering for nearly uncoupled, not necessarily reversible, Markov chains with transition states.
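The idea in the abstract can be illustrated with a minimal sketch, assuming NumPy and a hypothetical 7-state chain made of two weakly coupled blocks plus one transition state. The function names, the matrix, and the coupling strength `eps` are illustrative assumptions and do not come from the paper. The membership step is a simplified two-cluster stand-in (a linear rescaling of the second eigenvector), not the PCCA+ optimization itself.

```python
import numpy as np

def dominant_right_eigenvectors(P, k):
    """Return the k eigenvalues of P with largest real part and their right eigenvectors."""
    evals, evecs = np.linalg.eig(P)
    order = np.argsort(-evals.real)[:k]
    return evals[order].real, evecs[:, order].real

def two_cluster_memberships(v2):
    """Rescale the second eigenvector to [0, 1].

    Simplified k=2 stand-in for a fuzzy clustering; this is NOT PCCA+ itself.
    """
    chi_a = (v2 - v2.min()) / (v2.max() - v2.min())
    return np.column_stack([chi_a, 1.0 - chi_a])

# Hypothetical chain: states 0-2 form block A, states 3-5 form block B,
# and state 6 is a transition state reachable from both blocks.
eps = 0.01                        # assumed weak coupling strength
P = np.zeros((7, 7))
P[:3, :3] = (1.0 - eps) / 3.0     # stay inside block A
P[3:6, 3:6] = (1.0 - eps) / 3.0   # stay inside block B
P[:6, 6] = eps                    # rare jump into the transition state
P[6, :6] = 1.0 / 6.0              # transition state leaves to A or B equally

evals, evecs = dominant_right_eigenvectors(P, k=2)
chi = two_cluster_memberships(evecs[:, 1])

print(np.round(evals, 4))   # two eigenvalues close to 1 signal two metastable sets
print(np.round(chi, 3))     # rows: states, columns: fuzzy cluster memberships
```

In this symmetric toy chain the second eigenvector is constant on each block and zero on the transition state, so the block states receive memberships of 0 or 1 and the transition state receives about 0.5 in each cluster. Which block maps to which column depends on the sign the eigensolver happens to return.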
Citations
Journal Article
TL;DR: The open-source Python package PyEMMA is presented, which offers a systematic and accurate way to coarse-grain MSMs to a few states and to illustrate the structures of the metastable states of the system.
Abstract: Markov (state) models (MSMs) and related models of molecular kinetics have recently received a surge of interest as they can systematically reconcile simulation data from either a few long or many short simulations and allow us to analyze the essential metastable structures, thermodynamics, and kinetics of the molecular system under investigation. However, the estimation, validation, and analysis of such models is far from trivial and involves sophisticated and often numerically sensitive methods. In this work we present the open-source Python package PyEMMA (http://pyemma.org) that provides accurate and efficient algorithms for kinetic model construction. PyEMMA can read all common molecular dynamics data formats, helps in the selection of input features, provides easy access to dimension reduction algorithms such as principal component analysis (PCA) and time-lagged independent component analysis (TICA) and clustering algorithms such as k-means, and contains estimators for MSMs, hidden Markov models, an...

809 citations

Journal Article
TL;DR: A deep learning framework that automates the construction of Markov state models from MD simulation data is introduced; it performs as well as or better than state-of-the-art Markov modeling methods and provides easily interpretable few-state kinetic models.
Abstract: There is an increasing demand for computing the relevant structures, equilibria, and long-timescale kinetics of biomolecular processes, such as protein-drug binding, from high-throughput molecular dynamics simulations. Current methods employ transformation of simulated coordinates into structural features, dimension reduction, clustering the dimension-reduced data, and estimation of a Markov state model or related model of the interconversion rates between molecular structures. This handcrafted approach demands a substantial amount of modeling expertise, as poor decisions at any step will lead to large modeling errors. Here we employ the variational approach for Markov processes (VAMP) to develop a deep learning framework for molecular kinetics using neural networks, dubbed VAMPnets. A VAMPnet encodes the entire mapping from molecular coordinates to Markov states, thus combining the whole data processing pipeline in a single end-to-end framework. Our method performs equally or better than state-of-the-art Markov modeling methods and provides easily interpretable few-state kinetic models.

474 citations


Cites background from "Fuzzy spectral clustering by PCCA+:..."

  • ...The present results do not depend on enforcing reversibility, as classical analyses such as PCCA+ [63] are avoided based on the VAMPnet structure itself....


Journal Article
TL;DR: Recent ML methods for molecular simulation are reviewed, with particular focus on (deep) neural networks for the prediction of quantum-mechanical energies and forces, on coarse-grained molecular dynamics, on the extraction of free energy surfaces and kinetics, and on generative network approaches to sample molecular equilibrium structures and compute thermodynamics.
Abstract: Machine learning (ML) is transforming all areas of science. The complex and time-consuming calculations in molecular simulations are particularly suitable for an ML revolution and have already been profoundly affected by the application of existing ML methods. Here we review recent ML methods for molecular simulation, with particular focus on (deep) neural networks for the prediction of quantum-mechanical energies and forces, on coarse-grained molecular dynamics, on the extraction of free energy surfaces and kinetics, and on generative network approaches to sample molecular equilibrium structures and compute thermodynamics. To explain these methods and illustrate open methodological problems, we review some important principles of molecular physics and describe how they can be incorporated into ML structures. Finally, we identify and describe a list of open challenges for the interface between ML and molecular simulation.

379 citations


Cites methods from "Fuzzy spectral clustering by PCCA+:..."

  • ...In order to obtain a propagator that can be interpreted as a Markov state model, [21] chose to use a SoftMax layer as an output layer, thus transforming the spectral representation to a soft indicator function similar to spectral clustering methods such as PCCA+ [123, 124]....

Journal Article
TL;DR: A comprehensive overview of MD simulations of RNA molecules, beginning with an in-depth, evaluative coverage of the most fundamental methodological challenges, in particular the current developments and inherent physical limitations of atomistic force fields and recent advances in a broad spectrum of enhanced sampling methods.
Abstract: With both catalytic and genetic functions, ribonucleic acid (RNA) is perhaps the most pluripotent chemical species in molecular biology, and its functions are intimately linked to its structure and dynamics. Computer simulations, and in particular atomistic molecular dynamics (MD), allow structural dynamics of biomolecular systems to be investigated with unprecedented temporal and spatial resolution. We here provide a comprehensive overview of the fast-developing field of MD simulations of RNA molecules. We begin with an in-depth, evaluatory coverage of the most fundamental methodological challenges that set the basis for the future development of the field, in particular, the current developments and inherent physical limitations of the atomistic force fields and the recent advances in a broad spectrum of enhanced sampling methods. We also survey the closely related field of coarse-grained modeling of RNA systems. After dealing with the methodological aspects, we provide an exhaustive overview of the ava...

375 citations

Journal Article
TL;DR: These wild-type simulations explore a space of conformations that can be individually stabilized by adding ligands or making suitable changes in protein sequence, and provide direct evidence of conformational plasticity in receptors.
Abstract: Conformational plasticity influences several aspects of protein function. Here the authors combine extensive MD simulations with Markov state models, using trypsin as a model, to reveal new mechanistic details of how conformational plasticity influences ligand-receptor interactions.

352 citations
