Showing papers by "Gianluca Pollastri" published in 2019


Journal Article
TL;DR: Protein Secondary Structure (SS) prediction has been a central topic of research in Bioinformatics for decades, but even the most sophisticated ab initio SS predictors do not reach the theoretical limit of three-state prediction accuracy, and only a few predict more than the 3 traditional Helix, Strand and Coil classes.
Abstract: Protein Secondary Structure (SS) prediction has been a central topic of research in Bioinformatics for decades. In spite of this, even the most sophisticated ab initio SS predictors are not able to reach the theoretical limit of three-state prediction accuracy (88-90%), while only a few predict more than the 3 traditional Helix, Strand and Coil classes. In this study we present tests on different models trained on both single-sequence and evolutionary profile-based inputs, and develop a new state-of-the-art system, Porter 5. Porter 5 is composed of ensembles of cascaded Bidirectional Recurrent Neural Networks and Convolutional Neural Networks, incorporates new input encoding techniques and is trained on a large set of protein structures. Porter 5 achieves 84% accuracy (81% SOV) when tested on 3 classes and 73% accuracy (70% SOV) on 8 classes on a large independent set. In our tests Porter 5 is 2% more accurate than its previous version and outperforms or matches the most recent predictors of secondary structure we tested. When Porter 5 is retrained on SCOPe-based sets that eliminate homology between training and testing samples, we obtain similar results. Porter is available as a web server and standalone program at http://distilldeep.ucd.ie/porter/ alongside all the datasets and alignments.

68 citations
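
The three-state and eight-state accuracies quoted in the Porter 5 abstract are per-residue measures: the fraction of residues whose predicted class matches the observed one. The following is a minimal sketch of that calculation, not the authors' evaluation code. It assumes the commonly used reduction from eight DSSP classes to three (H, G, I to Helix; E, B to Strand; everything else to Coil); the names `EIGHT_TO_THREE`, `to_three_state` and `per_residue_accuracy` are hypothetical and introduced only for illustration.

```python
# Assumed 8-to-3 reduction (a common convention, not confirmed by the abstract):
# H, G, I -> H (helix); E, B -> E (strand); any other symbol -> C (coil).
EIGHT_TO_THREE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def to_three_state(ss8: str) -> str:
    """Map a string of eight-class labels to three-class labels."""
    return "".join(EIGHT_TO_THREE.get(label, "C") for label in ss8)


def per_residue_accuracy(predicted: str, observed: str) -> float:
    """Fraction of positions where the predicted label equals the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)
```

Applied to three-state strings, `per_residue_accuracy` gives the quantity usually called Q3; applied to eight-state strings, it gives the eight-class counterpart. The SOV score reported alongside it is a segment-based measure and is not covered by this sketch.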


Journal Article
TL;DR: This article presents a deep neural network architecture, composed of stacks of bidirectional recurrent neural networks and convolutional layers, that mines information from long-range interactions within a protein sequence and applies it to the prediction of protein RSA using a novel encoding method called “clipped”.
Abstract: Predicting the three-dimensional structure of proteins is a long-standing challenge of computational biology, as the structure (or lack of a rigid structure) is well known to determine a protein’s function. Predicting the relative solvent accessibility (RSA) of amino acids within a protein is a significant step towards resolving the protein structure prediction challenge, especially in cases in which structural information about a protein is not available by homology transfer. Today, the core of the most powerful methods for predicting RSA and other structural features of proteins is arguably some form of deep learning, and all the state-of-the-art protein structure prediction tools rely on some machine learning algorithm. In this article we present a deep neural network architecture, composed of stacks of bidirectional recurrent neural networks and convolutional layers, which is capable of mining information from long-range interactions within a protein sequence and applying it to the prediction of protein RSA using a novel encoding method that we shall call “clipped”. The final system we present, PaleAle 5.0, which is available as a public server, predicts RSA in two, three and four classes, with an accuracy exceeding 80% in two classes, surpassing the performance of all the other predictors we have benchmarked.

25 citations
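
The RSA classes mentioned in the PaleAle 5.0 abstract come from normalising a residue's solvent-accessible surface area by a per-residue maximum and binning the result. The sketch below illustrates that binning only; it does not reproduce the paper's "clipped" encoding or its network. The threshold values (25% for two classes; 4%, 25% and 50% for four classes) and the function names are assumptions made for illustration and are not taken from the abstract.

```python
from bisect import bisect_right


def relative_solvent_accessibility(asa: float, max_asa: float) -> float:
    """Normalise an absolute accessible surface area by the residue's maximum."""
    if max_asa <= 0:
        raise ValueError("max_asa must be positive")
    return asa / max_asa


def rsa_class(rsa: float, thresholds=(0.25,)) -> int:
    """Return the index of the RSA bin.

    `thresholds` must be sorted in ascending order. With n thresholds there
    are n + 1 classes; a value equal to a threshold falls in the upper class.
    The default (0.25,) is an assumed two-class buried/exposed split.
    """
    return bisect_right(thresholds, rsa)
```

For example, under these assumed thresholds `rsa_class(0.10)` falls in class 0 (buried) and `rsa_class(0.30, (0.04, 0.25, 0.50))` falls in class 2 of four.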


Book Chapter
28 Mar 2019
TL;DR: This chapter presents the key types of protein structure annotation and the methods and algorithms for predicting them, with the aim to give both a historical perspective on their development and a snapshot of their current state of the art.
Abstract: This chapter introduces the specifics of protein structure annotations and their fundamental position in structural bioinformatics and in bioinformatics in general. Proteins are profoundly characterised by their structure in every aspect of their functioning and, while over the last decades there has been close to exponential growth in the number of known protein sequences, the growth in known protein structures has been closer to linear because of the high complexity and cost of determining them. Thus, protein structure predictors are among the most thoroughly assessed tools in bioinformatics (in venues such as CASP or CAMEO) because they allow the structural study of proteins on a large scale. This chapter presents the key types of protein structure annotation and the methods and algorithms for predicting them, with the aim of giving both a historical perspective on their development and a snapshot of their current state of the art. From one-dimensional protein annotations (i.e. secondary structure, solvent accessibility and torsion angles) to more complex and informative two-dimensional protein abstractions (i.e. contact maps), both mature and currently developing methods for protein structure annotation are introduced. The aim of this overview is to facilitate the adoption and development of state-of-the-art protein structural predictors. Particular attention is given to some of the best performing and freely available web servers and standalone programmes for predicting protein structure annotations.

8 citations


Book Chapter
01 Jan 2019
TL;DR: Protein tertiary structure prediction is a research field which aims to create models and software tools able to predict the three-dimensional shape of protein molecules by describing the spatial disposition of each of their atoms, starting from the sequence of their amino acids.
Abstract: Proteins are involved in many cell activities (e.g., molecular transport, mechanical functions, message exchange); thus, knowing their 3D structure is crucial to understanding their function. Protein tertiary structure prediction is a research field which aims to create models and software tools able to predict the three-dimensional shape of protein molecules by describing the spatial disposition of each of their atoms, starting from the sequence of their amino acids. There exist exact methods to resolve the molecular structure with high precision, but they are both time- and resource-consuming. Computational techniques can predict the tertiary structure of a protein with acceptable precision for many applications and with high efficiency, allowing for genome-wide investigations that would otherwise not be feasible. These tools use various intermediate steps, evolutionary considerations and chemical functionals to improve the predicted structure. Nevertheless, due to the high dimensionality of the problem, some of the available computational techniques, e.g., Density Functional Theory, are not efficient enough to be used in practical application scenarios.

1 citation