Showing papers by "Guo-Wei Wei" published in 2018


Journal Article
TL;DR: The Adaptive Poisson-Boltzmann Solver (APBS) software solves the equations of continuum electrostatics for large biomolecular assemblages and has had an impact on the study of a broad range of chemical, biological, and biomedical applications.
Abstract: The Adaptive Poisson-Boltzmann Solver (APBS) software was developed to solve the equations of continuum electrostatics for large biomolecular assemblages that have provided impact in the study of a broad range of chemical, biological, and biomedical applications. APBS addresses the three key technology challenges for understanding solvation and electrostatics in biomedical applications: accurate and efficient models for biomolecular solvation and electrostatics, robust and scalable software for applying those theories to biomolecular systems, and mechanisms for sharing and analyzing biomolecular electrostatics data in the scientific community. To address new research applications and advancing computational capabilities, we have continually updated APBS and its suite of accompanying software since its release in 2001. In this article, we discuss the models and capabilities that have recently been implemented within the APBS software package including a Poisson-Boltzmann analytical and a semi-analytical solver, an optimized boundary element solver, a geometry-based geometric flow solvation model, a graph theory-based algorithm for determining pKa values, and an improved web-based visualization tool for viewing electrostatics.

541 citations


Journal Article
TL;DR: A number of algebraic topology approaches, including multi-component persistent homology, multi-level persistent homology, and electrostatic persistence, are introduced for the representation, characterization, and description of small molecules and biomolecular complexes.
Abstract: This work introduces a number of algebraic topology approaches, including multi-component persistent homology, multi-level persistent homology, and electrostatic persistence for the representation, characterization, and description of small molecules and biomolecular complexes. In contrast to the conventional persistent homology, multi-component persistent homology retains critical chemical and biological information during the topological simplification of biomolecular geometric complexity. Multi-level persistent homology enables a tailored topological description of inter- and/or intra-molecular interactions of interest. Electrostatic persistence incorporates partial charge information into topological invariants. These topological methods are paired with Wasserstein distance to characterize similarities between molecules and are further integrated with a variety of machine learning algorithms, including k-nearest neighbors, ensemble of trees, and deep convolutional neural networks, to manifest their descriptive and predictive powers for protein-ligand binding analysis and virtual screening of small molecules. Extensive numerical experiments involving 4,414 protein-ligand complexes from the PDBBind database and 128,374 ligand-target and decoy-target pairs in the DUD database are performed to test respectively the scoring power and the discriminatory power of the proposed topological learning strategies. It is demonstrated that the present topological learning outperforms other existing methods in protein-ligand binding affinity prediction and ligand-decoy discrimination.
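The pairing of topological descriptors with the Wasserstein distance can be illustrated with a minimal sketch. It assumes each molecule is already summarized as a small list of (birth, death) bars, uses the L-infinity ground metric, and finds the optimal matching by brute force; these choices and all function names are illustrative assumptions, not the authors' implementation.

```python
import itertools


def diagonal_cost(point):
    """L-infinity distance from a (birth, death) point to the diagonal."""
    birth, death = point
    return (death - birth) / 2.0


def pair_cost(a, b):
    """L-infinity distance between two (birth, death) points."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def wasserstein_distance(diagram_a, diagram_b, p=2):
    """p-Wasserstein distance between two small persistence diagrams.

    Each diagram is padded with diagonal slots so that any bar may be
    matched either to a bar of the other diagram or to the diagonal.
    Brute force over permutations: only suitable for a handful of bars.
    """
    n, m = len(diagram_a), len(diagram_b)
    size = n + m
    if size == 0:
        return 0.0

    def cost(i, j):
        if i < n and j < m:
            return pair_cost(diagram_a[i], diagram_b[j])
        if i < n:   # bar of A sent to the diagonal
            return diagonal_cost(diagram_a[i])
        if j < m:   # bar of B sent to the diagonal
            return diagonal_cost(diagram_b[j])
        return 0.0  # diagonal matched to diagonal

    best = min(
        sum(cost(i, perm[i]) ** p for i in range(size))
        for perm in itertools.permutations(range(size))
    )
    return best ** (1.0 / p)


if __name__ == "__main__":
    print(wasserstein_distance([(0.0, 2.0)], [(0.0, 2.5)]))
```

For realistic barcodes the factorial search would be replaced by an assignment solver; the sketch only shows what is being minimized.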

166 citations


Journal Article
TL;DR: The present approach reveals that protein‐ligand hydrophobic interactions are extended to 40Å away from the binding site, which has a significant ramification to drug and protein design.
Abstract: Protein-ligand binding is a fundamental biological process that is paramount to many other biological processes, such as signal transduction, metabolic pathways, enzyme construction, cell secretion, gene expression, etc. Accurate prediction of protein-ligand binding affinities is vital to rational drug design and the understanding of protein-ligand binding and binding induced function. Existing binding affinity prediction methods are inundated with geometric detail and involve excessively high dimensions, which undermines their predictive power for massive binding data. Topology provides the ultimate level of abstraction and thus incurs too much reduction in geometric information. Persistent homology embeds geometric information into topological invariants and bridges the gap between complex geometry and abstract topology. However, it oversimplifies biological information. This work introduces element specific persistent homology (ESPH) or multicomponent persistent homology to retain crucial biological information during topological simplification. The combination of ESPH and machine learning gives rise to a powerful paradigm for macromolecular analysis. Tests on two large data sets indicate that the proposed topology based machine learning paradigm outperforms other existing methods in protein-ligand binding affinity predictions. ESPH reveals protein-ligand binding mechanisms that cannot be attained from other conventional techniques. The present approach reveals that protein-ligand hydrophobic interactions are extended to 40 Å away from the binding site, which has significant ramifications for drug and protein design.

133 citations


Journal Article
TL;DR: This work introduces element specific persistent homology (ESPH), an algebraic topology approach, for quantitative toxicity prediction, and a topology based multitask strategy to take the advantage of the availability of large data sets while dealing with small data sets.
Abstract: The understanding of toxicity is of paramount importance to human health and environmental protection. Quantitative toxicity analysis has become a new standard in the field. This work introduces element specific persistent homology (ESPH), an algebraic topology approach, for quantitative toxicity prediction. ESPH retains crucial chemical information during the topological abstraction of geometric complexity and provides a representation of small molecules that cannot be obtained by any other method. To investigate the representability and predictive power of ESPH for small molecules, ancillary descriptors have also been developed based on physical models. Topological and physical descriptors are paired with advanced machine learning algorithms, such as the deep neural network (DNN), random forest (RF), and gradient boosting decision tree (GBDT), to facilitate their applications to quantitative toxicity predictions. A topology based multitask strategy is proposed to take the advantage of the availability o...

108 citations


Journal Article
TL;DR: In this article, an algebraic topology-based method, called element-specific persistent homology (ESPH), is introduced to describe molecular properties in terms of multiscale and multicomponent topological invariants.
Abstract: Aqueous solubility and partition coefficient are important physical properties of small molecules. Accurate theoretical prediction of aqueous solubility and partition coefficient plays an important role in drug design and discovery. The prediction accuracy depends crucially on molecular descriptors which are typically derived from a theoretical understanding of the chemistry and physics of small molecules. This work introduces an algebraic topology-based method, called element-specific persistent homology (ESPH), as a new representation of small molecules that is entirely different from conventional chemical and/or physical representations. ESPH describes molecular properties in terms of multiscale and multicomponent topological invariants. Such topological representation is systematical, comprehensive, and scalable with respect to molecular size and composition variations. However, it cannot be literally translated into a physical interpretation. Fortunately, it is readily suitable for machine learning methods, rendering topological learning algorithms. Due to the inherent correlation between solubility and partition coefficient, a uniform ESPH representation is developed for both properties, which facilitates multi-task deep neural networks for their simultaneous predictions. This strategy leads to a more accurate prediction of relatively small datasets. A total of six datasets is considered in this work to validate the proposed topological and multitask deep learning approaches. It is demonstrated that the proposed approaches achieve some of the most accurate predictions of aqueous solubility and partition coefficient. Our software is available online at http://weilab.math.msu.edu/TopP-S/. © 2018 Wiley Periodicals, Inc.

57 citations


Journal Article
TL;DR: A paradigm-shifting geometric graph model, multiscale weighted colored graph (MWCG), is introduced to provide a new generation of computational algorithms to significantly change the current status of protein structural fluctuation analysis and provides perhaps the first reliable method for estimating protein flexibility and B-factors.
Abstract: Protein structural fluctuation, measured by Debye-Waller factors or B-factors, is known to correlate to protein flexibility and function. A variety of methods has been developed for protein Debye-Waller factor prediction and related applications to domain separation, docking pose ranking, entropy calculation, hinge detection, stability analysis, etc. Nevertheless, none of the current methodologies are able to deliver an accuracy of 0.7 in terms of the Pearson correlation coefficients averaged over a large set of proteins. In this work, we introduce a paradigm-shifting geometric graph model, multiscale weighted colored graph (MWCG), to provide a new generation of computational algorithms to significantly change the current status of protein structural fluctuation analysis. Our MWCG model divides a protein graph into multiple subgraphs based on interaction types between graph nodes and represents the protein rigidity by generalized centralities of subgraphs. MWCGs not only predict the B-factors of protein r

37 citations


Journal Article
TL;DR: The proposed FFT model has been extensively validated via a large dataset of 668 molecules, gives an optimal root-mean-square error (RMSE) of 1.05 kcal/mol in the leave-one-out test, and was carefully compared with a classic solvation model based on weighted solvent accessible surface area.
Abstract: Implicit solvent models divide solvation free energies into polar and nonpolar additive contributions, whereas polar and nonpolar interactions are inseparable and nonadditive. We present a feature functional theory (FFT) framework to break this ad hoc division. The essential ideas of FFT are as follows: (i) representability assumption: there exists a microscopic feature vector that can uniquely characterize and distinguish one molecule from another; (ii) feature-function relationship assumption: the macroscopic features, including solvation free energy, of a molecule is a functional of microscopic feature vectors; and (iii) similarity assumption: molecules with similar microscopic features have similar macroscopic properties, such as solvation free energies. Based on these assumptions, solvation free energy prediction is carried out in the following protocol. First, we construct a molecular microscopic feature vector that is efficient in characterizing the solvation process using quantum mechanics and Poisson-Boltzmann theory. Microscopic feature vectors are combined with macroscopic features, that is, physical observable, to form extended feature vectors. Additionally, we partition a solvation dataset into queries according to molecular compositions. Moreover, for each target molecule, we adopt a machine learning algorithm for its nearest neighbor search, based on the selected microscopic feature vectors. Finally, from the extended feature vectors of obtained nearest neighbors, we construct a functional of solvation free energy, which is employed to predict the solvation free energy of the target molecule. The proposed FFT model has been extensively validated via a large dataset of 668 molecules. The leave-one-out test gives an optimal root-mean-square error (RMSE) of 1.05 kcal/mol. FFT predictions of SAMPL0, SAMPL1, SAMPL2, SAMPL3, and SAMPL4 challenge sets deliver the RMSEs of 0.61, 1.86, 1.64, 0.86, and 1.14 kcal/mol, respectively. 
Using a test set of 94 molecules and its associated training set, the present approach was carefully compared with a classic solvation model based on weighted solvent accessible surface area. © 2017 Wiley Periodicals, Inc.
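The nearest-neighbor step of the protocol above can be sketched as follows. This is a minimal illustration under stated assumptions: feature vectors are plain lists of floats, similarity is Euclidean distance, and the "functional" built from the neighbors is replaced here by a simple inverse-distance weighted average. The abstract does not specify these details, and the function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predict_solvation_energy(target, training, k=3):
    """Predict a target's solvation free energy from its k nearest neighbors.

    training: list of (feature_vector, solvation_free_energy) pairs.
    Assumption: an inverse-distance weighted average stands in for the
    functional of extended feature vectors described in the abstract.
    """
    ranked = sorted(training, key=lambda item: euclidean(target, item[0]))
    neighbors = ranked[:k]
    # A small offset avoids division by zero for exact matches.
    weights = [1.0 / (euclidean(target, feats) + 1e-9) for feats, _ in neighbors]
    total = sum(weights)
    return sum(w * energy for w, (_, energy) in zip(weights, neighbors)) / total


def leave_one_out_rmse(dataset, k=3):
    """Leave-one-out RMSE: predict each molecule from all the others."""
    squared_errors = []
    for i, (feats, energy) in enumerate(dataset):
        rest = dataset[:i] + dataset[i + 1:]
        squared_errors.append((predict_solvation_energy(feats, rest, k) - energy) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


if __name__ == "__main__":
    toy = [([0.0, 0.0], -1.0), ([1.0, 0.0], -2.0), ([10.0, 10.0], -50.0)]
    print(predict_solvation_energy([0.5, 0.0], toy, k=2))
```

The toy data are made up for illustration; the similarity assumption (iii) is what justifies restricting the fit to nearby molecules.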

29 citations


Journal Article
TL;DR: Extensive numerical tests indicate that the proposed method not only provides a unique description of pocket‐sub‐pocket relations, but also offers efficient estimations of pocket surface area, pocket volume and pocket depth.
Abstract: Motivation: Protein pocket information is invaluable for drug target identification, agonist design, virtual screening and receptor-ligand binding analysis. A recent study indicates that about half of holoproteins can simultaneously bind multiple interacting ligands in a large pocket containing structured sub-pockets. Although this hierarchical pocket and sub-pocket structure has a significant impact on multi-ligand synergistic interactions in the protein binding site, there is no method available for this analysis. This work introduces a computational tool based on differential geometry, algebraic topology and physics-based simulation to address this pressing issue. Results: We propose to detect protein pockets by evolving the convex hull surface inwards until it touches the protein surface everywhere. The governing partial differential equations (PDEs) include the mean curvature flow combined with the eikonal equation commonly used in the fast marching algorithm in the Eulerian representation. The surface evolution induced Morse function and Reeb graph are utilized to characterize the hierarchical pocket and sub-pocket structure in controllable detail. The proposed method is validated on PDBbind refined sets of 4414 protein-ligand complexes. Extensive numerical tests indicate that the proposed method not only provides a unique description of pocket-sub-pocket relations, but also offers efficient estimations of pocket surface area, pocket volume and pocket depth. Availability and implementation: Source code available at https://github.com/rdzhao/ProteinPocketDetection. Webserver available at http://weilab.math.msu.edu/PPD/.

28 citations


Journal Article
TL;DR: Current perspectives in the biomechanics community for the sharing of computational models and related resources indicate that the community recognizes the necessity and usefulness of model sharing.
Abstract: The role of computational modeling for biomechanics research and related clinical care will be increasingly prominent. The biomechanics community has been developing computational models routinely for exploration of the mechanics and mechanobiology of diverse biological structures. As a result, a large array of models, data, and discipline-specific simulation software has emerged to support endeavors in computational biomechanics. Sharing computational models and related data and simulation software has first become a utilitarian interest, and now, it is a necessity. Exchange of models, in support of knowledge exchange provided by scholarly publishing, has important implications. Specifically, model sharing can facilitate assessment of reproducibility in computational biomechanics and can provide an opportunity for repurposing and reuse, and a venue for medical training. The community's desire to investigate biological and biomechanical phenomena crossing multiple systems, scales, and physical domains, also motivates sharing of modeling resources as blending of models developed by domain experts will be a required step for comprehensive simulation studies as well as the enhancement of their rigor and reproducibility. The goal of this paper is to understand current perspectives in the biomechanics community for the sharing of computational models and related resources. Opinions on opportunities, challenges, and pathways to model sharing, particularly as part of the scholarly publishing workflow, were sought. A group of journal editors and a handful of investigators active in computational biomechanics were approached to collect short opinion pieces as a part of a larger effort of the IEEE EMBS Computational Biology and the Physiome Technical Committee to address model reproducibility through publications. A synthesis of these opinion pieces indicates that the community recognizes the necessity and usefulness of model sharing. 
There is a strong will to facilitate model sharing, and there are corresponding initiatives by the scientific journals. Outside the publishing enterprise, infrastructure to facilitate model sharing in biomechanics exists, and simulation software developers are interested in accommodating the community's needs for sharing of modeling resources. Encouragement for the use of standardized markups, concerns related to quality assurance, acknowledgement of increased burden, and importance of stewardship of resources are noted. In the short term, it is advisable that the community builds upon recent strategies and experiments with new pathways for continued demonstration of model sharing, its promotion, and its utility. Nonetheless, the need for a long-term strategy to unify approaches in sharing computational models and related resources is acknowledged. Development of a sustainable platform supported by a culture of open model sharing will likely evolve through continued and inclusive discussions bringing all stakeholders to the table, e.g., by possibly establishing a consortium.

17 citations


Journal Article
TL;DR: In this article, multiscale weighted colored graphs (MWCGs) are used to predict the Debye-Waller factor of unknown proteins from protein X-ray crystallography.
Abstract: The Debye-Waller factor, a measure of X-ray attenuation, can be experimentally observed in protein X-ray crystallography. Previous theoretical models have made strong inroads in the analysis of B-factors by linearly fitting protein B-factors from experimental data. However, the blind prediction of B-factors for unknown proteins is an unsolved problem. This work integrates machine learning and advanced graph theory, namely, multiscale weighted colored graphs (MWCGs), to blindly predict B-factors of unknown proteins. MWCGs are local features that measure the intrinsic flexibility due to a protein structure. Global features that connect the B-factors of different proteins, e.g., the resolution of X-ray crystallography, are introduced to enable the cross-protein B-factor predictions. Several machine learning approaches, including ensemble methods and deep learning, are considered in the present work. The proposed method is validated with hundreds of thousands of experimental B-factors. Extensive numerical results indicate that the blind B-factor predictions obtained from the present method are more accurate than the least squares fittings using traditional methods.
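A rough sketch of how MWCG-style local features might be assembled for one atom is given below. It assumes atoms are (element, coordinates) pairs, that each "color" is an unordered element pair, and that edge weights follow a generalized exponential kernel exp(-(r/eta)^kappa) evaluated at several scales eta. The kernel form, the scales, and the function name are assumptions made for illustration; the abstract does not state them.

```python
import math


def mwcg_features(atoms, i, element_pairs, scales, kappa=1.0):
    """Per-atom features: one weighted centrality per (element pair, scale).

    atoms: list of (element_symbol, (x, y, z)) tuples.
    For atom i, each feature sums kernel weights over the atoms j that
    form the requested element pair with atom i (one colored subgraph).
    """
    elem_i, pos_i = atoms[i]
    features = []
    for pair in element_pairs:
        for eta in scales:
            total = 0.0
            for j, (elem_j, pos_j) in enumerate(atoms):
                if j == i:
                    continue
                # Keep only edges whose endpoint elements match this color.
                if sorted((elem_i, elem_j)) != sorted(pair):
                    continue
                r = math.dist(pos_i, pos_j)
                total += math.exp(-((r / eta) ** kappa))
            features.append(total)
    return features


if __name__ == "__main__":
    atoms = [("C", (0.0, 0.0, 0.0)), ("N", (1.0, 0.0, 0.0)), ("C", (2.0, 0.0, 0.0))]
    print(mwcg_features(atoms, 0, [("C", "C"), ("C", "N")], scales=[1.0]))
```

In a blind-prediction setting, such local features would be concatenated with global ones (for example, crystallographic resolution) and passed to a regressor.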

16 citations


Journal Article
01 Jan 2018
TL;DR: Mutational analysis of the bridge helix indicates that 778-GARKGL-783 (Escherichia coli numbering) is a homeostatic hinge that undergoes multiple bends to compensate for complex conformational dynamics during phosphodiester bond formation and translocation.
Abstract: Based on molecular dynamics simulations and functional studies, a conformational mechanism is posited for forward translocation by RNA polymerase (RNAP). In a simulation of a ternary elongation complex, the clamp and downstream cleft were observed to close. Hinges within the bridge helix and trigger loop supported generation of translocation force against the RNA-DNA hybrid resulting in opening of the furthest upstream i-8 RNA-DNA bp, establishing conditions for RNAP sliding. The β flap tip helix and the most N-terminal β' Zn finger engage the RNA, indicating a path of RNA threading out of the exit channel. Because the β flap tip connects to the RNAP active site through the β subunit double-Ψ-β-barrel and the associated sandwich barrel hybrid motif (also called the flap domain), the RNAP active site is coupled to the RNA exit channel and to the translocation of RNA-DNA. Using an exonuclease III assay to monitor translocation of RNAP elongation complexes, we show that K+ and Mg2+ and also an RNA 3'-OH or a 3'-H2 affect RNAP sliding. Because RNAP grip to template suggests a sticky translocation mechanism, and because grip is enhanced by increasing K+ and Mg2+ concentration, biochemical assays are consistent with a conformational change that drives forward translocation as observed in simulations. Mutational analysis of the bridge helix indicates that 778-GARKGL-783 (Escherichia coli numbering) is a homeostatic hinge that undergoes multiple bends to compensate for complex conformational dynamics during phosphodiester bond formation and translocation.

Journal Article
TL;DR: Several machine learning approaches, including ensemble methods and deep learning, are considered, and the proposed method is validated with hundreds of thousands of experimental B-factors, indicating that the blind B-factor predictions obtained are more accurate than the least squares fittings using traditional methods.
Abstract: Debye-Waller factor, a measure of X-ray attenuation, can be experimentally observed in protein X-ray crystallography. Previous theoretical models have made strong inroads in the analysis of B-factors by linearly fitting protein B-factors from experimental data. However, the blind prediction of B-factors for unknown proteins is an unsolved problem. This work integrates machine learning and advanced graph theory, namely, multiscale weighted colored graphs (MWCGs), to blindly predict B-factors of unknown proteins. MWCGs are local features that measure the intrinsic flexibility due to a protein structure. Global features that connect the B-factors of different proteins, e.g., the resolution of X-ray crystallography, are introduced to enable the cross-protein B-factor predictions. Several machine learning approaches, including ensemble methods and deep learning, are considered in the present work. The proposed method is validated with hundreds of thousands of experimental B-factors. Extensive numerical results indicate that the blind B-factor predictions obtained from the present method are more accurate than the least squares fittings using traditional methods.

Posted Content
TL;DR: In this paper, a mathematical framework based on persistent cohomology is proposed for characterizing the topology of a data set at various geometric scales, which can capture the multiscale geometric features and reveal certain interaction patterns in terms of topological invariants.
Abstract: Persistent homology is a powerful tool for characterizing the topology of a data set at various geometric scales. When applied to the description of molecular structures, persistent homology can capture the multiscale geometric features and reveal certain interaction patterns in terms of topological invariants. However, in addition to the geometric information, there is a wide variety of non-geometric information of molecular structures, such as element types, atomic partial charges, atomic pairwise interactions, and electrostatic potential function, that is not described by persistent homology. Although element specific homology and electrostatic persistent homology can encode some non-geometric information into geometry based topological invariants, it is desirable to have a mathematical framework to systematically embed both geometric and non-geometric information, i.e., multicomponent heterogeneous information, into unified topological descriptions. To this end, we propose a mathematical framework based on persistent cohomology. In our framework, non-geometric information can be either distributed globally or resided locally on the datasets in the geometric sense and can be properly defined on topological spaces, i.e., simplicial complexes. Using the proposed persistent cohomology based framework, enriched barcodes are extracted from datasets to represent heterogeneous information. We consider a variety of datasets to validate the present formulation and illustrate the usefulness of the proposed persistent cohomology. It is found that the proposed framework using cohomology boosts the performance of persistent homology based methods in the protein-ligand binding affinity prediction on massive biomolecular datasets.

Journal Article
TL;DR: In this paper, an out-of-core and parallel algorithm is proposed to improve the spatial and temporal efficiency of the Eulerian solvent excluded surface (ESES) software, which is necessary for simulating many biomolecular electrostatic and ion channel models.
Abstract: Motivation: Surface generation and visualization are some of the most important tasks in biomolecular modeling and computation. Eulerian solvent excluded surface (ESES) software provides analytical solvent excluded surface (SES) in the Cartesian grid, which is necessary for simulating many biomolecular electrostatic and ion channel models. However, large biomolecules and/or fine grid resolutions give rise to excessively large memory requirements in ESES construction. We introduce an out-of-core and parallel algorithm to improve the ESES software. Results: The present approach drastically improves the spatial and temporal efficiency of ESES. The memory footprint and time complexity are analyzed and empirically verified through extensive tests with a large collection of biomolecule examples. Our results show that our algorithm can successfully reduce memory footprint through a straightforward divide-and-conquer strategy to perform the calculation of arbitrarily large proteins on a typical commodity personal computer. On multi-core computers or clusters, our algorithm can reduce the execution time by parallelizing most of the calculation as disjoint subproblems. Various comparisons with the state-of-the-art Cartesian grid based SES calculation were done to validate the present method and show the improved efficiency. This approach makes ESES a robust software for the construction of analytical solvent excluded surfaces. Availability and implementation: http://weilab.math.msu.edu/ESES.

Posted Content
TL;DR: A new filtration function is proposed for persistent homology which takes as input the adjacent oscillator trajectories of a dynamical system, and is applied to protein residue networks for protein thermal fluctuation analysis, rendering the most accurate B-factor prediction of a set of 364 proteins.
Abstract: Time dependence is a universal phenomenon in nature, and a variety of mathematical models in terms of dynamical systems have been developed to understand the time-dependent behavior of real-world problems. Originally constructed to analyze the topological persistence over spatial scales, persistent homology has rarely been devised for time evolution. We propose the use of a new filtration function for persistent homology which takes as input the adjacent oscillator trajectories of a dynamical system. We also regulate the dynamical system by a weighted graph Laplacian matrix derived from the network of interest, which embeds the topological connectivity of the network into the dynamical system. The resulting topological signatures, which we call evolutionary homology (EH) barcodes, reveal the topology-function relationship of the network and thus give rise to the quantitative analysis of nodal properties. The proposed EH is applied to protein residue networks for protein thermal fluctuation analysis, rendering the most accurate B-factor prediction of a set of 364 proteins. This work extends the utility of dynamical systems to the quantitative modeling and analysis of realistic physical systems.

Posted Content
TL;DR: Numerical results indicate that the proposed AGL method outperforms the other state-of-the-art methods in the binding affinity predictions of the protein-ligand complexes.
Abstract: Although algebraic graph theory based models have been widely applied in physical modeling and molecular studies, they are typically incompetent in the analysis and prediction of biomolecular properties when compared with other quantitative approaches. There is a need to explore the capability and limitation of algebraic graph theory for molecular and biomolecular modeling, analysis, and prediction. In this work, we propose novel algebraic graph learning (AGL) models that encode high-dimensional physical and biological information into intrinsically low-dimensional representations. The proposed AGL model introduces multiscale weighted colored subgraphs to describe crucial molecular and biomolecular interactions via graph invariants associated with the graph Laplacian, its pseudo-inverse, and adjacency matrix. Additionally, the AGL models are incorporated with an advanced machine learning algorithm to connect the low-dimensional graph representation of biomolecular structures with their macroscopic properties. Three popular protein-ligand binding affinity benchmarks, namely CASF-2007, CASF-2013, and CASF-2016, are employed to validate the accuracy, robustness, and reliability of the present AGL model. Numerical results indicate that the proposed AGL method outperforms the other state-of-the-art methods in the binding affinity predictions of the protein-ligand complexes.
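As an illustration of graph invariants derived from a weighted colored subgraph Laplacian, the sketch below builds the Laplacian for one element pair and summarizes its spectrum without an eigensolver, using two exact identities for symmetric matrices: the trace equals the sum of the eigenvalues, and the squared Frobenius norm equals the sum of the squared eigenvalues. The exponential kernel, its parameters, and the choice of mean and variance as descriptors are assumptions for this sketch, not details taken from the abstract.

```python
import math


def subgraph_laplacian(atoms, pair, eta=2.0, kappa=2.0):
    """Weighted Laplacian of the subgraph linking the two elements in `pair`.

    atoms: list of (element_symbol, (x, y, z)) tuples.
    Edge weight (assumed kernel): exp(-(r / eta) ** kappa).
    """
    nodes = [a for a in atoms if a[0] in pair]
    n = len(nodes)
    lap = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            elem_a, pos_a = nodes[a]
            elem_b, pos_b = nodes[b]
            # Only connect atoms whose elements form the requested pair.
            if sorted((elem_a, elem_b)) != sorted(pair):
                continue
            w = math.exp(-((math.dist(pos_a, pos_b) / eta) ** kappa))
            lap[a][b] -= w
            lap[b][a] -= w
            lap[a][a] += w
            lap[b][b] += w
    return lap


def eigenvalue_moments(lap):
    """Mean and variance of the Laplacian eigenvalues, via trace identities."""
    n = len(lap)
    if n == 0:
        return 0.0, 0.0
    trace = sum(lap[k][k] for k in range(n))              # sum of eigenvalues
    frobenius_sq = sum(v * v for row in lap for v in row)  # sum of squared eigenvalues
    mean = trace / n
    return mean, frobenius_sq / n - mean ** 2


if __name__ == "__main__":
    atoms = [("C", (0.0, 0.0, 0.0)), ("N", (2.0, 0.0, 0.0))]
    print(eigenvalue_moments(subgraph_laplacian(atoms, ("C", "N"))))
```

Descriptors such as the smallest nonzero eigenvalue or pseudo-inverse statistics would need a proper eigendecomposition; the moments above are only the part that can be obtained exactly from matrix entries.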

Posted Content
Duc Duy Nguyen, Zixuan Cang, Kedi Wu, Menglun Wang, Yin Cao, Guo-Wei Wei
TL;DR: D3R Grand Challenge 3 (GC3), considered the most difficult challenge so far, has five subchallenges involving Cathepsin S and five other kinase targets, namely VEGFR2, JAK2, p38-$\alpha$, TIE2, and ABL1.
Abstract: Advanced mathematics, such as multiscale weighted colored graph and element specific persistent homology, and machine learning including deep neural networks were integrated to construct mathematical deep learning models for pose and binding affinity prediction and ranking in the last two D3R grand challenges in computer-aided drug design and discovery. D3R Grand Challenge 2 (GC2) focused on the pose prediction and binding affinity ranking and free energy prediction for Farnesoid X receptor ligands. Our models obtained the top place in absolute free energy prediction for free energy Set 1 in Stage 2. The latest competition, D3R Grand Challenge 3 (GC3), is considered as the most difficult challenge so far. It has 5 subchallenges involving Cathepsin S and five other kinase targets, namely VEGFR2, JAK2, p38-$\alpha$, TIE2, and ABL1. There is a total of 26 official competitive tasks for GC3. Our predictions were ranked 1st in 10 out of 26 official competitive tasks.

Posted Content
TL;DR: This work introduces an out-of-core and parallel algorithm to improve the ESES software and shows that the algorithm can successfully reduce memory footprint through a straightforward divide-and-conquer strategy to perform the calculation of arbitrarily large proteins on a typical commodity personal computer.
Abstract: Motivation: Surface generation and visualization are some of the most important tasks in biomolecular modeling and computation. Eulerian solvent excluded surface (ESES) software provides analytical solvent excluded surface (SES) in the Cartesian grid, which is necessary for simulating many biomolecular electrostatic and ion channel models. However, large biomolecules and/or fine grid resolutions give rise to excessively large memory requirements in ESES construction. We introduce an out-of-core and parallel algorithm to improve the ESES software. Results: The present approach drastically improves the spatial and temporal efficiency of ESES. The memory footprint and time complexity are analyzed and empirically verified through extensive tests with a large collection of biomolecule examples. Our results show that our algorithm can successfully reduce memory footprint through a straightforward divide-and-conquer strategy to perform the calculation of arbitrarily large proteins on a typical commodity personal computer. On multi-core computers or clusters, our algorithm can reduce the execution time by parallelizing most of the calculation as disjoint subproblems. Various comparisons with the state-of-the-art Cartesian grid based SES calculation were done to validate the present method and show the improved efficiency. This approach makes ESES a robust software for the construction of analytical solvent excluded surfaces. Availability and implementation: http://weilab.math.msu.edu/ESES.

Posted Content
TL;DR: This work hypothesizes that the intrinsic physics of 3D molecular structures lies on a family of low-dimensional manifolds embedded in a high-dimensional data space, and encodes crucial chemical, physical and biological information into 2D element interactive manifolds extracted from a high-dimensional structural data space via a multiscale discrete-to-continuum mapping using differentiable density estimators.
Abstract: Motivation: Despite its great success in various physical modeling, differential geometry (DG) has rarely been devised as a versatile tool for analyzing large, diverse and complex molecular and biomolecular datasets due to the limited understanding of its potential power in dimensionality reduction and its ability to encode essential chemical and biological information in differentiable manifolds. Results: We put forward a differential geometry based geometric learning (DG-GL) hypothesis that the intrinsic physics of three-dimensional (3D) molecular structures lies on a family of low-dimensional manifolds embedded in a high-dimensional data space. We encode crucial chemical, physical and biological information into 2D element interactive manifolds, extracted from a high-dimensional structural data space via a multiscale discrete-to-continuum mapping using differentiable density estimators. Differential geometry apparatuses are utilized to construct element interactive curvatures (Gaussian curvature, mean curvature, minimum curvature and maximum curvature) in analytical forms for certain analytically differentiable density estimators. These low-dimensional differential geometry representations are paired with a robust machine learning algorithm to showcase their descriptive and predictive powers for large, diverse and complex molecular and biomolecular datasets. Extensive numerical experiments are carried out to demonstrate that the proposed DG-GL strategy outperforms other advanced methods in the predictions of drug discovery related protein-ligand binding affinity, drug toxicity, and molecular solvation free energy.