
Showing papers by "Gianluca Pollastri" published in 2011


Journal Article
TL;DR: CSpritz is a web server for the prediction of intrinsic protein disorder that combines the earlier Spritz predictor with two novel orthogonal systems developed by the group (Punch and ESpritz).
Abstract: CSpritz is a web server for the prediction of intrinsic protein disorder. It combines the previous Spritz with two novel orthogonal systems developed by our group (Punch and ESpritz). Punch is based on sequence and structural templates trained with support vector machines. ESpritz is an efficient single-sequence method based on bidirectional recursive neural networks. Spritz was extended to filter predictions based on structural homologues. After extensive testing, predictions are combined by averaging their probabilities. The CSpritz website can process single or multiple predictions for either short or long disorder. The server provides a global output page, available for download, with simultaneous statistics for all predictions. Links are provided to each individual protein, where the amino acid sequence and disorder prediction are displayed along with statistics for that protein. As a novel feature, CSpritz provides information about structural homologues as well as secondary structure and short functional linear motifs in each disordered segment. Benchmarking was performed on the very recent CASP9 data, where CSpritz would have ranked consistently well with an Sw measure of 49.27 and an AUC of 0.828. The server, together with help and methods pages including examples, is freely available at URL: http://protein.bio.unipd.it/cspritz/.
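The consensus step described in the abstract (averaging the per-residue disorder probabilities of Spritz, Punch and ESpritz) can be sketched as follows. This is a minimal illustration, not the CSpritz code: the function names, the unweighted mean, the 0.5 decision threshold and the segment extraction are assumptions, and the scores in the usage block are made-up numbers rather than real predictor output.

```python
from typing import Dict, List, Optional, Tuple


def combine_predictions(per_method: Dict[str, List[float]]) -> List[float]:
    """Unweighted per-residue mean of disorder probabilities across predictors (assumed scheme)."""
    lengths = {len(p) for p in per_method.values()}
    if len(lengths) != 1:
        raise ValueError("all predictors must score the same number of residues")
    n_methods = len(per_method)
    n_res = lengths.pop()
    return [sum(p[i] for p in per_method.values()) / n_methods for i in range(n_res)]


def disordered_segments(probs: List[float], threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Return 0-based inclusive (start, end) runs where the averaged probability >= threshold."""
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, p in enumerate(probs):
        if p >= threshold and start is None:
            start = i  # a disordered run begins
        elif p < threshold and start is not None:
            segments.append((start, i - 1))  # the run ended on the previous residue
            start = None
    if start is not None:
        segments.append((start, len(probs) - 1))  # run reaches the C-terminus
    return segments


if __name__ == "__main__":
    # Hypothetical per-residue scores for a five-residue toy sequence.
    scores = {
        "Spritz": [0.9, 0.8, 0.2, 0.1, 0.7],
        "Punch": [0.7, 0.6, 0.3, 0.2, 0.6],
        "ESpritz": [0.8, 0.7, 0.1, 0.3, 0.5],
    }
    averaged = combine_predictions(scores)
    print([round(x, 2) for x in averaged])
    print(disordered_segments(averaged))
```

A plain mean treats the three predictors as equally reliable; a weighted mean or a per-method calibration would be an equally plausible design, and the abstract does not say which thresholds the server applies for short versus long disorder.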

86 citations


Journal Article
TL;DR: SCLpred is a subcellular localization predictor that assigns a protein to one of four classes for animals and fungi, or one of five classes for plants, using a novel neural network (N-to-1 Neural Network) trained on large non-redundant sets of protein sequences.
Abstract: Summary: Knowledge of the subcellular location of a protein provides valuable information about its function and possible interaction with other proteins. In the post-genomic era, fast and accurate predictors of subcellular location are required if the resulting abundance of sequence data is to be fully exploited. We have developed a subcellular localization predictor (SCLpred), which predicts the location of a protein into four classes for animals and fungi and five classes for plants (secreted, cytoplasm, nucleus, mitochondrion and chloroplast) using machine learning models trained on large non-redundant sets of protein sequences. The algorithm powering SCLpred is a novel Neural Network (N-to-1 Neural Network, or N1-NN) we have developed, which is capable of mapping whole sequences into single properties (a functional class, in this work) without resorting to predefined transformations, but rather by adaptively compressing the sequence into a hidden feature vector. We benchmark SCLpred against other publicly available predictors using two benchmarks, including a new subset of Swiss-Prot Release 2010_06. We show that SCLpred surpasses the state of the art. The N1-NN algorithm is fully general and may be applied to a host of problems of similar shape, that is, in which a whole sequence needs to be mapped into a fixed-size array of properties, and the adaptive compression it operates may shed light on the space of protein sequences. Availability: The predictive systems described in this article are publicly available as a web server at http://distill.ucd.ie/distill/. Contact: gianluca.pollastri@ucd.ie.
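The N-to-1 idea, compressing a variable-length sequence into a fixed-size hidden vector and then mapping that vector to a single class, can be illustrated with a small untrained forward pass. This is a hypothetical sketch rather than SCLpred itself: it assumes one-hot inputs over a short residue window (the real system may use richer inputs such as sequence profiles), a single shared tanh layer whose outputs are summed over all positions, random weights, and the four animal/fungal classes as labels. The names `NToOneSketch`, `one_hot_window`, `half_window` and `n_hidden` are illustrative only.

```python
import math
import random
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Four-class animal/fungal label set named in the abstract.
CLASSES = ["secreted", "cytoplasm", "nucleus", "mitochondrion"]


def one_hot_window(seq: str, centre: int, half: int) -> List[float]:
    """One-hot encode residues centre-half..centre+half, zero-padding past either end."""
    vec: List[float] = []
    for j in range(centre - half, centre + half + 1):
        block = [0.0] * len(AMINO_ACIDS)
        if 0 <= j < len(seq) and seq[j] in AMINO_ACIDS:
            block[AMINO_ACIDS.index(seq[j])] = 1.0
        vec.extend(block)
    return vec


class NToOneSketch:
    """Hypothetical, untrained N-to-1 network: shared window layer, sum pooling, softmax."""

    def __init__(self, half_window: int = 2, n_hidden: int = 8, seed: int = 0) -> None:
        rng = random.Random(seed)
        n_in = (2 * half_window + 1) * len(AMINO_ACIDS)
        self.half = half_window
        # Shared weights, applied identically at every sequence position.
        self.w_in = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
        # Output layer mapping the pooled feature vector to one score per class.
        self.w_out = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)] for _ in CLASSES]

    def feature_vector(self, seq: str) -> List[float]:
        """Compress a sequence of any length into n_hidden values by summing over positions."""
        hidden = [0.0] * len(self.w_in)
        for i in range(len(seq)):
            x = one_hot_window(seq, i, self.half)
            for k, row in enumerate(self.w_in):
                hidden[k] += math.tanh(sum(w * xi for w, xi in zip(row, x)))
        return hidden

    def predict(self, seq: str) -> Dict[str, float]:
        """Return softmax class probabilities for the whole sequence."""
        h = self.feature_vector(seq)
        logits = [sum(w * hk for w, hk in zip(row, h)) for row in self.w_out]
        top = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(z - top) for z in logits]
        total = sum(exps)
        return {label: e / total for label, e in zip(CLASSES, exps)}


if __name__ == "__main__":
    model = NToOneSketch()
    print(model.predict("MKTAYIAKQRQISFVKSHFSRQ"))
```

Because the per-position outputs are summed, sequences of any length map to a vector of `n_hidden` values, which is what lets one output layer serve every protein; training (not shown) would fit `w_in` and `w_out` so that this compression becomes informative about location.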

58 citations


Book Chapter
TL;DR: This chapter collects some of the most common and useful tools available for protein motif discovery and secondary and tertiary structure prediction from a primary amino acid sequence and provides pointers to many other tools.
Abstract: A wealth of in silico tools is available for protein motif discovery and structural analysis. The aim of this chapter is to collect some of the most common and useful tools and to guide the biologist in their use. A detailed explanation is provided for the use of Distill, a suite of web servers for the prediction of protein structural features and the prediction of full-atom 3D models from a protein sequence. Besides this, we also provide pointers to many other tools available for motif discovery and secondary and tertiary structure prediction from a primary amino acid sequence. The prediction of protein intrinsic disorder and the prediction of functional sites and SLiMs are also briefly discussed. Given that user queries vary greatly in size, scope and character, the trade-offs in speed, accuracy and scale need to be considered when choosing which methods to adopt.

3 citations


Journal Article
TL;DR: A new knowledge-based MQAP that evaluates single protein structure models, using a tree representation of the Cα trace to train a novel Neural Network Pairwise Interaction Field (NN-PIF) that predicts the global quality of a model.
Abstract: In order to use a predicted protein structure, one needs to know how good it is, as the utility of a model depends on its quality. To this aim, many Model Quality Assessment Programs (MQAPs) have been developed over the last decade, with MQAPs also being assessed at the CASP competition. We present a new knowledge-based MQAP which evaluates single protein structure models. We use a tree representation of the Cα trace to train a novel Neural Network Pairwise Interaction Field (NN-PIF) to predict the global quality of a model. NN-PIF allows fast evaluation of multiple structure models for a single sequence. In our tests on a large set of structures, our networks outperform most other methods based on different and more complex protein structure representations in global model quality prediction. Moreover, given that NN-PIF can evaluate protein conformations very fast, we train a separate version of the model to gauge its ability to fold protein structures ab initio. We show that the resulting system, which relies only on basic information about the sequence and the Cα trace of a conformation, generally improves the quality of the structures it is presented with and may yield promising predictions in the absence of structural templates, although more research is required to harness the full potential of the model.
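To make the notion of a pairwise interaction field over a Cα trace concrete, the sketch below scores a model as the mean output of a tiny neural network over residue pairs that are close in space but separated in sequence. Everything here is an assumption for illustration and is not NN-PIF: a flat list of pairs stands in for the paper's tree representation, and the features (a bias, scaled distance and two symmetric hydrophobicity terms), the 12 Å cutoff, the minimum sequence separation of 3, the hydrophobic residue set and the weights are all hypothetical; nothing is trained.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]
# Rough hydrophobic residue set; an illustrative choice, not taken from the paper.
HYDROPHOBIC = set("AVILMFWC")


def pair_features(aa_i: str, aa_j: str, d: float) -> List[float]:
    """Symmetric pair features: bias, scaled distance, hydrophobic count and product."""
    h_i = float(aa_i in HYDROPHOBIC)
    h_j = float(aa_j in HYDROPHOBIC)
    return [1.0, d / 10.0, h_i + h_j, h_i * h_j]


def pair_net(features: Sequence[float], w_hidden: Sequence[Sequence[float]],
             w_out: Sequence[float]) -> float:
    """One tanh hidden layer and a linear output: the per-pair interaction term."""
    hidden = [math.tanh(sum(w * f for w, f in zip(row, features))) for row in w_hidden]
    return sum(w * h for w, h in zip(w_out, hidden))


def global_quality(seq: str, ca: Sequence[Coord], w_hidden: Sequence[Sequence[float]],
                   w_out: Sequence[float], cutoff: float = 12.0, min_sep: int = 3) -> float:
    """Average pair_net output over spatially close, sequence-separated Cα pairs."""
    if len(seq) != len(ca):
        raise ValueError("need one Cα coordinate per residue")
    total, n_pairs = 0.0, 0
    for i in range(len(ca)):
        for j in range(i + min_sep, len(ca)):
            d = math.dist(ca[i], ca[j])  # Euclidean distance (Python 3.8+)
            if d <= cutoff:
                total += pair_net(pair_features(seq[i], seq[j], d), w_hidden, w_out)
                n_pairs += 1
    return total / n_pairs if n_pairs else 0.0


if __name__ == "__main__":
    # Toy straight-line trace with 3.8 Å spacing and made-up weights.
    trace = [(3.8 * i, 0.0, 0.0) for i in range(6)]
    w_hidden = [[0.1, -0.5, 0.3, 0.2], [0.0, 0.4, -0.2, 0.1]]
    w_out = [1.0, -0.5]
    print(global_quality("MAVLKG", trace, w_hidden, w_out))
```

Each candidate model costs one pass over its residue pairs and needs only the sequence and Cα coordinates, which is consistent with the abstract's point that this kind of scoring can rank many models for one sequence quickly; a learned score of this form could in principle also guide a search over conformations, as the ab initio experiment describes.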

3 citations