
Showing papers by "Gianluca Pollastri" published in 2020


Journal ArticleDOI
TL;DR: This review discusses the evolution of predictive methods for one-dimensional and two-dimensional Protein Structure Annotations, from the simple statistical methods of the early days to the computationally intensive highly-sophisticated Deep Learning algorithms of the last decade.
Abstract: Protein Structure Prediction is a central topic in Structural Bioinformatics. Since the '60s, statistical methods, followed by increasingly complex Machine Learning and recently Deep Learning methods, have been employed to predict protein structural information at various levels of detail. In this review, we briefly introduce the problem of protein structure prediction and essential elements of Deep Learning (such as Convolutional Neural Networks, Recurrent Neural Networks and the basic feed-forward Neural Networks they are founded on), after which we discuss the evolution of predictive methods for one-dimensional and two-dimensional Protein Structure Annotations, from the simple statistical methods of the early days to the computationally intensive, highly sophisticated Deep Learning algorithms of the last decade. In the process, we review the growth of the databases these algorithms are based on, and how this has impacted our ability to leverage knowledge about evolution and co-evolution to achieve improved predictions. We conclude this review by outlining the current role of Deep Learning techniques within the wider pipelines to predict protein structures and trying to anticipate what challenges and opportunities may arise next.

137 citations


Posted Content
TL;DR: A set of community-wide recommendations aiming to help establish standards of supervised machine learning validation in biology are presented, including a structured methods description for machine learning based on data, optimization, model, evaluation (DOME).
Abstract: Modern biology frequently relies on machine learning to provide predictions and improve decision processes. There have been recent calls for more scrutiny on machine learning performance and possible limitations. Here we present a set of community-wide recommendations aiming to help establish standards of supervised machine learning validation in biology. Adopting a structured methods description for machine learning based on data, optimization, model, evaluation (DOME) will aim to help both reviewers and readers to better understand and assess the performance and limitations of a method or outcome. The recommendations are formulated as questions to anyone wishing to pursue implementation of a machine learning algorithm. Answers to these questions can be easily included in the supplementary material of published papers.
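As a rough illustration of what a structured DOME description could look like in practice, the sketch below stores one free-text answer per DOME heading and reports which headings are still empty. Only the four headings (data, optimization, model, evaluation) come from the abstract; the class name `DomeDescription` and the method `missing_sections` are hypothetical and not part of the published recommendations.

```python
from dataclasses import dataclass, fields


@dataclass
class DomeDescription:
    """Hypothetical container: one free-text answer per DOME heading."""
    data: str = ""
    optimization: str = ""
    model: str = ""
    evaluation: str = ""

    def missing_sections(self):
        # Return the headings that have no answer yet, in DOME order.
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


description = DomeDescription(
    data="Training and test sets, and how redundancy between them was handled.",
    model="Architecture and number of free parameters.",
)
print(description.missing_sections())
```

A check like this could be run before submission to confirm that every heading has been addressed in the supplementary material.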

62 citations


Journal ArticleDOI
TL;DR: SCLpred-EMS is a subcellular localization predictor powered by an ensemble of Deep N-to-1 Convolutional Neural Networks that outperforms the other state-of-the-art web servers the authors tested.
Abstract: Motivation: The subcellular location of a protein can provide useful information for protein function prediction and drug design. Experimentally determining the subcellular location of a protein is an expensive and time-consuming task. Therefore, various computer-based tools have been developed, mostly using machine learning algorithms, to predict the subcellular location of proteins. Results: Here, we present a neural network-based algorithm for protein subcellular location prediction. We introduce SCLpred-EMS, a subcellular localization predictor powered by an ensemble of Deep N-to-1 Convolutional Neural Networks. SCLpred-EMS assigns a protein to one of two classes, the endomembrane system and secretory pathway versus all others, with a Matthews correlation coefficient of 0.75-0.86, outperforming the other state-of-the-art web servers we tested. Availability and implementation: SCLpred-EMS is freely available for academic users at http://distilldeep.ucd.ie/SCLpred2/. Contact: catherine.mooney@ucd.ie.
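For readers unfamiliar with the metric reported above, the following minimal sketch computes the Matthews correlation coefficient for a two-class problem from confusion-matrix counts. The function name and the example counts are illustrative assumptions and are not taken from the SCLpred-EMS paper.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC from binary confusion-matrix counts; returns 0.0 when undefined."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is set to 0 when any marginal total is zero.
    return numerator / denominator if denominator else 0.0


# Made-up counts for a two-class task (class of interest versus all others).
print(round(matthews_corrcoef(tp=90, tn=85, fp=15, fn=10), 3))
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction), and it remains informative when the two classes are unbalanced.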

18 citations


Journal ArticleDOI
TL;DR: PPIIPRED is useful in large-scale studies, such as evolutionary analyses of PPIIH, or computationally reducing large datasets of candidate binding peptides for further experimental validation.
Abstract: Background: The polyproline II helix (PPIIH) is an extended protein left-handed secondary structure that usually but not necessarily involves prolines. Short PPIIHs are frequently, but not exclusiv...

12 citations


Journal ArticleDOI
TL;DR: In this paper, the authors use evolutionary profiles to predict protein secondary structure, as well as other protein structural features, and show that using profiles in the input of such predictors, in place of or in addition to the sequence itself, leads to significantly more accurate predictions.
Abstract: The use of evolutionary profiles to predict protein secondary structure, as well as other protein structural features, has been standard practice since the 1990s. Using profiles in the input of such predictors, in place of or in addition to the sequence itself, leads to significantly more accurate predictions. While profiles can enhance structural signals, their role remains somewhat surprising as proteins do not use profiles when folding in vivo. Furthermore, the same sequence-based redundancy reduction protocols initially derived to train and evaluate sequence-based predictors have been applied to train and evaluate profile-based predictors. This can lead to unfair comparisons since profiles may facilitate the bleeding of information between training and test sets. Here we use the extensively studied problem of secondary structure prediction to better evaluate the role of profiles and show that: (1) high levels of profile similarity between training and test proteins are observed when using standard sequence-based redundancy protocols; (2) the gain in accuracy for profile-based predictors, over sequence-based predictors, strongly relies on these high levels of profile similarity between training and test proteins; and (3) the overall accuracy of a profile-based predictor on a given protein dataset provides a biased measure when trying to estimate the actual accuracy of the predictor, or when comparing it to other predictors. We show, however, that this bias can be mitigated by implementing a new protocol (EVALpro) which evaluates the accuracy of profile-based predictors as a function of the profile similarity between training and test proteins. Such a protocol not only allows for a fair comparison of the predictors on equally hard or easy examples, but also reduces the impact of choosing a given similarity cutoff when selecting test proteins.
The EVALpro program is available in the SCRATCH suite (www.scratch.proteomics.ics.uci.edu) and can be downloaded at: www.download.igb.uci.edu/#evalpro.
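The idea of reporting accuracy as a function of profile similarity, rather than as a single number, can be sketched as follows. This is not the EVALpro implementation: the function name, the record layout, the assumption that similarity is a score in [0, 1], and the equal-width binning are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def accuracy_by_similarity(records, n_bins=10):
    """Per-residue accuracy of test proteins, grouped by similarity bin.

    records: iterable of (similarity, n_correct, n_residues), where
    similarity is an assumed score in [0, 1] between a test protein's
    profile and the most similar training profile.
    Returns {bin_index: accuracy} for the non-empty bins only.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for similarity, n_correct, n_residues in records:
        # Clamp so that similarity == 1.0 falls into the last bin.
        idx = min(int(similarity * n_bins), n_bins - 1)
        correct[idx] += n_correct
        total[idx] += n_residues
    return {idx: correct[idx] / total[idx] for idx in sorted(total)}


# Made-up per-protein results: (similarity, correct residues, residues).
results = [(0.15, 70, 100), (0.18, 80, 100), (0.85, 90, 100), (1.0, 45, 50)]
for idx, acc in accuracy_by_similarity(results).items():
    print(f"bin {idx}: accuracy {acc:.2f}")
```

Comparing two predictors bin by bin, instead of on the pooled test set, puts them side by side on examples of similar difficulty.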

7 citations


Journal ArticleDOI
TL;DR: The proposed Brewery is a suite of ab initio predictors of 1D Protein Structural Annotations that uses multiple sources of evolutionary information to achieve state-of-the-art predictions of Secondary Structure, Structural Motifs, Relative Solvent Accessibility and Contact Density.
Abstract: Motivation: Protein structural annotations (PSAs) are essential abstractions to deal with the prediction of protein structures. Many increasingly sophisticated PSAs have been devised in the last few decades. However, the need for annotations that are easy to compute, process and predict has not diminished. This is especially true for protein structures that are hardest to predict, such as novel folds. Results: We propose Brewery, a suite of ab initio predictors of 1D PSAs. Brewery uses multiple sources of evolutionary information to achieve state-of-the-art predictions of secondary structure, structural motifs, relative solvent accessibility and contact density. Availability and implementation: The web server, standalone program, Docker image and training sets of Brewery are available at http://distilldeep.ucd.ie/brewery/. Contact: gianluca.pollastri@ucd.ie.

5 citations


Posted ContentDOI
14 Jun 2020-bioRxiv
TL;DR: A new protocol is implemented (EVALpro) which evaluates the accuracy of profile-based predictors as a function of the profile similarity between training and test proteins, which completely removes the need for selecting arbitrary similarity cutoffs when selecting test proteins.
Abstract: The use of evolutionary profiles to predict protein secondary structure, as well as other protein structural features, has been standard practice since the 1990s. Using profiles in the input of such predictors, in place of or in addition to the sequence itself, leads to significantly more accurate predictors. While profiles can enhance structural signals, their role remains somewhat surprising as proteins do not use profiles when folding in vivo. Furthermore, the same sequence-based redundancy reduction protocols initially derived to train and evaluate sequence-based predictors have been applied to train and evaluate profile-based predictors. This can lead to unfair comparisons since profiles may facilitate the bleeding of information between training and test sets. Here we use the extensively studied problem of secondary structure prediction to better evaluate the role of profiles and show that: (1) high levels of profile similarity between training and test proteins are observed when using standard sequence-based redundancy protocols; (2) the gain in accuracy for profile-based predictors, over sequence-based predictors, strongly relies on these high levels of profile similarity between training and test proteins; and (3) the overall accuracy of a profile-based predictor on a given protein dataset provides a biased measure when trying to estimate the actual accuracy of the predictor, or when comparing it to other predictors. We show, however, that this bias can be avoided by implementing a new protocol (EVALpro) which evaluates the accuracy of profile-based predictors as a function of the profile similarity between training and test proteins. Such a protocol not only allows for a fair comparison of the predictors on equally hard or easy examples, but also completely removes the need for selecting arbitrary similarity cutoffs when selecting test proteins.
The EVALpro program is available for download from the SCRATCH suite (http://scratch.proteomics.ics.uci.edu).