Showing papers by "Gianluca Pollastri" published in 2020

Open Access

Journal Article

Deep learning methods in protein structure prediction.

Mirko Torrisi¹, Gianluca Pollastri¹, Quan Le¹•Institutions (1)

22 Jan 2020-Computational and structural biotechnology journal

TL;DR: This review discusses the evolution of predictive methods for one-dimensional and two-dimensional Protein Structure Annotations, from the simple statistical methods of the early days to the computationally intensive highly-sophisticated Deep Learning algorithms of the last decade.

Abstract: Protein Structure Prediction is a central topic in Structural Bioinformatics. Since the '60s statistical methods, followed by increasingly complex Machine Learning and recently Deep Learning methods, have been employed to predict protein structural information at various levels of detail. In this review, we briefly introduce the problem of protein structure prediction and essential elements of Deep Learning (such as Convolutional Neural Networks, Recurrent Neural Networks and basic feed-forward Neural Networks they are founded on), after which we discuss the evolution of predictive methods for one-dimensional and two-dimensional Protein Structure Annotations, from the simple statistical methods of the early days, to the computationally intensive highly-sophisticated Deep Learning algorithms of the last decade. In the process, we review the growth of the databases these algorithms are based on, and how this has impacted our ability to leverage knowledge about evolution and co-evolution to achieve improved predictions. We conclude this review outlining the current role of Deep Learning techniques within the wider pipelines to predict protein structures and trying to anticipate what challenges and opportunities may arise next.

137 citations

Posted Content

DOME: Recommendations for supervised machine learning validation in biology

Ian Walsh¹, Dmytro Fishman², Dario Garcia-Gasulla³, Tiina Titma⁴, Gianluca Pollastri⁵, Jennifer Harrow, Fotis Psomopoulos, Silvio C. E. Tosatto⁶•Institutions (6)

Agency for Science, Technology and Research¹, University of Tartu², Barcelona Supercomputing Center³, Tallinn University of Technology⁴, University College Dublin⁵, University of Padua⁶

25 Jun 2020-arXiv: Other Quantitative Biology

TL;DR: A set of community-wide recommendations aiming to help establish standards of supervised machine learning validation in biology is presented, including a structured methods description for machine learning based on data, optimization, model, evaluation (DOME).

Abstract: Modern biology frequently relies on machine learning to provide predictions and improve decision processes. There have been recent calls for more scrutiny on machine learning performance and possible limitations. Here we present a set of community-wide recommendations aiming to help establish standards of supervised machine learning validation in biology. Adopting a structured methods description for machine learning based on data, optimization, model, evaluation (DOME) will aim to help both reviewers and readers to better understand and assess the performance and limitations of a method or outcome. The recommendations are formulated as questions to anyone wishing to pursue implementation of a machine learning algorithm. Answers to these questions can be easily included in the supplementary material of published papers.

62 citations

Journal Article

SCLpred-EMS: subcellular localization prediction of endomembrane system and secretory pathway proteins by Deep N-to-1 Convolutional Neural Networks.

Manaz Kaleel¹, Yandan Zheng², Jialiang Chen², Xuanming Feng², Jeremy C. Simpson¹, Gianluca Pollastri¹, Catherine Mooney¹•Institutions (2)

University College Dublin¹, Beijing University of Technology²

01 Jun 2020-Bioinformatics

TL;DR: SCLpred-EMS is a subcellular localization predictor powered by an ensemble of Deep N-to-1 Convolutional Neural Networks that outperforms the other state-of-the-art web servers the authors tested.

Abstract: Motivation: The subcellular location of a protein can provide useful information for protein function prediction and drug design. Experimentally determining the subcellular location of a protein is an expensive and time-consuming task. Therefore, various computer-based tools have been developed, mostly using machine learning algorithms, to predict the subcellular location of proteins. Results: Here, we present a neural network-based algorithm for protein subcellular location prediction. We introduce SCLpred-EMS, a subcellular localization predictor powered by an ensemble of Deep N-to-1 Convolutional Neural Networks. SCLpred-EMS assigns the subcellular location of a protein to one of two classes, the endomembrane system and secretory pathway versus all others, with a Matthews correlation coefficient of 0.75-0.86, outperforming the other state-of-the-art web servers we tested. Availability and implementation: SCLpred-EMS is freely available for academic users at http://distilldeep.ucd.ie/SCLpred2/. Contact: catherine.mooney@ucd.ie.
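
The Matthews correlation coefficient quoted in this abstract condenses a binary confusion matrix into a single value between -1 and 1. The snippet below is a minimal sketch of that standard formula, not code from SCLpred-EMS; the function name and the counts in the usage line are hypothetical and are not results from the paper.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix.

    tp/tn/fp/fn are counts of true positives, true negatives,
    false positives and false negatives. Returns 0.0 when the
    denominator is zero (a common convention for the undefined case).
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


if __name__ == "__main__":
    # Hypothetical counts for a two-class split
    # (endomembrane/secretory versus all others).
    print(round(matthews_corrcoef(tp=90, tn=85, fp=15, fn=10), 3))
```

Unlike plain accuracy, the coefficient stays informative when the two classes are of very different sizes, which is one reason it is often reported for two-class predictors.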

18 citations

Journal Article

Prediction of polyproline II secondary structure propensity in proteins.

Kevin T. O’Brien¹, Catherine Mooney¹, Cyril Lopez¹, Gianluca Pollastri¹, Denis C. Shields¹•Institutions (1)

University College Dublin¹

15 Jan 2020-Royal Society Open Science

TL;DR: PPIIPRED is useful in large-scale studies, such as evolutionary analyses of PPIIH, or computationally reducing large datasets of candidate binding peptides for further experimental validation.

Abstract: Background: The polyproline II helix (PPIIH) is an extended protein left-handed secondary structure that usually but not necessarily involves prolines. Short PPIIHs are frequently, but not exclusiv...

12 citations

Journal Article

Protein profiles: Biases and protocols.

Gregor Urban¹, Mirko Torrisi², Christophe Magnan¹, Gianluca Pollastri², Pierre Baldi¹•Institutions (2)

University of California, Irvine¹, University College Dublin²

01 Jan 2020-Computational and structural biotechnology journal

TL;DR: In this paper, the authors use evolutionary profiles to predict protein secondary structure, as well as other protein structural features, and show that using profiles in the input of such predictors, in place of or in addition to the sequence itself, leads to significantly more accurate predictions.

Abstract: The use of evolutionary profiles to predict protein secondary structure, as well as other protein structural features, has been standard practice since the 1990s. Using profiles in the input of such predictors, in place of or in addition to the sequence itself, leads to significantly more accurate predictions. While profiles can enhance structural signals, their role remains somewhat surprising as proteins do not use profiles when folding in vivo. Furthermore, the same sequence-based redundancy reduction protocols initially derived to train and evaluate sequence-based predictors have been applied to train and evaluate profile-based predictors. This can lead to unfair comparisons since profiles may facilitate the bleeding of information between training and test sets. Here we use the extensively studied problem of secondary structure prediction to better evaluate the role of profiles and show that: (1) high levels of profile similarity between training and test proteins are observed when using standard sequence-based redundancy protocols; (2) the gain in accuracy for profile-based predictors, over sequence-based predictors, strongly relies on these high levels of profile similarity between training and test proteins; and (3) the overall accuracy of a profile-based predictor on a given protein dataset provides a biased measure when trying to estimate the actual accuracy of the predictor, or when comparing it to other predictors. We show, however, that this bias can be mitigated by implementing a new protocol (EVALpro) which evaluates the accuracy of profile-based predictors as a function of the profile similarity between training and test proteins. Such a protocol not only allows for a fair comparison of the predictors on equally hard or easy examples, but also reduces the impact of choosing a given similarity cutoff when selecting test proteins.
The EVALpro program is available in the SCRATCH suite (www.scratch.proteomics.ics.uci.edu) and can be downloaded at: www.download.igb.uci.edu/#evalpro.
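
The idea of reporting accuracy as a function of train/test profile similarity, rather than as one overall number, can be illustrated with a short sketch. This is not the EVALpro implementation: the function name, the assumption that similarity and accuracy are already available per test protein as values in [0, 1], and the choice of five equal-width bins are all hypothetical.

```python
from statistics import mean


def accuracy_by_similarity(records, n_bins=5):
    """Average per-protein accuracy within bins of profile similarity.

    records: iterable of (similarity, accuracy) pairs, one per test
    protein, where similarity is that protein's (assumed precomputed)
    profile similarity to the training set, in [0, 1].
    Returns a dict mapping (bin_low, bin_high) to the mean accuracy of
    the proteins in that bin, ordered from least to most similar.
    Empty bins are omitted.
    """
    bins = {}
    for similarity, accuracy in records:
        if not 0.0 <= similarity <= 1.0:
            raise ValueError(f"similarity out of range: {similarity}")
        # A similarity of exactly 1.0 is placed in the last bin.
        index = min(int(similarity * n_bins), n_bins - 1)
        bins.setdefault(index, []).append(accuracy)
    return {
        (i / n_bins, (i + 1) / n_bins): mean(values)
        for i, values in sorted(bins.items())
    }


if __name__ == "__main__":
    # Hypothetical per-protein values, for illustration only.
    demo = [(0.10, 0.70), (0.15, 0.74), (0.90, 0.88), (1.00, 0.90)]
    for (low, high), acc in accuracy_by_similarity(demo).items():
        print(f"similarity {low:.1f}-{high:.1f}: mean accuracy {acc:.2f}")
```

Comparing two predictors bin by bin puts them side by side on equally hard or easy examples, which is the kind of comparison the abstract argues for.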

7 citations

Journal Article

Brewery: deep learning and deeper profiles for the prediction of 1D protein structure annotations.

Mirko Torrisi¹, Gianluca Pollastri¹•Institutions (1)

University College Dublin¹

01 Jun 2020-Bioinformatics

TL;DR: Brewery is a suite of ab initio predictors of 1D Protein Structural Annotations that uses multiple sources of evolutionary information to achieve state-of-the-art predictions of Secondary Structure, Structural Motifs, Relative Solvent Accessibility and Contact Density.

Abstract: Motivation: Protein structural annotations (PSAs) are essential abstractions to deal with the prediction of protein structures. Many increasingly sophisticated PSAs have been devised in the last few decades. However, the need for annotations that are easy to compute, process and predict has not diminished. This is especially true for protein structures that are hardest to predict, such as novel folds. Results: We propose Brewery, a suite of ab initio predictors of 1D PSAs. Brewery uses multiple sources of evolutionary information to achieve state-of-the-art predictions of secondary structure, structural motifs, relative solvent accessibility and contact density. Availability and implementation: The web server, standalone program, Docker image and training sets of Brewery are available at http://distilldeep.ucd.ie/brewery/. Contact: gianluca.pollastri@ucd.ie.

5 citations

Posted Content

Protein Profiles: Biases and Protocols

Gregor Urban¹, Mirko Torrisi², Christophe Magnan¹, Gianluca Pollastri², Pierre Baldi¹•Institutions (2)

University of California, Irvine¹, University College Dublin²

14 Jun 2020-bioRxiv

TL;DR: A new protocol is implemented (EVALpro) which evaluates the accuracy of profile-based predictors as a function of the profile similarity between training and test proteins, which completely removes the need for selecting arbitrary similarity cutoffs when selecting test proteins.

Abstract: The use of evolutionary profiles to predict protein secondary structure, as well as other protein structural features, has been standard practice since the 1990s. Using profiles in the input of such predictors, in place of or in addition to the sequence itself, leads to significantly more accurate predictors. While profiles can enhance structural signals, their role remains somewhat surprising as proteins do not use profiles when folding in vivo. Furthermore, the same sequence-based redundancy reduction protocols initially derived to train and evaluate sequence-based predictors have been applied to train and evaluate profile-based predictors. This can lead to unfair comparisons since profiles may facilitate the bleeding of information between training and test sets. Here we use the extensively studied problem of secondary structure prediction to better evaluate the role of profiles and show that: (1) high levels of profile similarity between training and test proteins are observed when using standard sequence-based redundancy protocols; (2) the gain in accuracy for profile-based predictors, over sequence-based predictors, strongly relies on these high levels of profile similarity between training and test proteins; and (3) the overall accuracy of a profile-based predictor on a given protein dataset provides a biased measure when trying to estimate the actual accuracy of the predictor, or when comparing it to other predictors. We show, however, that this bias can be avoided by implementing a new protocol (EVALpro) which evaluates the accuracy of profile-based predictors as a function of the profile similarity between training and test proteins. Such a protocol not only allows for a fair comparison of the predictors on equally hard or easy examples, but also completely removes the need for selecting arbitrary similarity cutoffs when selecting test proteins.
The EVALpro program is available for download from the SCRATCH suite (http://scratch.proteomics.ics.uci.edu).
