Book Chapter

Sequence Analysis and Discrimination of Amyloid and Non-amyloid Peptides

TL;DR: This work systematically analyzes 139 amyloid and 168 non-amyloid hexapeptides, reveals the residue preferences at the six positions, and devises a statistical method for discriminating amyloids and non-amyloids, which identifies them with accuracies of 89% and 54%, respectively.
Abstract: The main cause of several neurodegenerative diseases such as Alzheimer's, Parkinson's and spongiform encephalopathies is the formation of amyloid fibrils in proteins. The systematic analysis of amyloid and non-amyloid sequences provides deep insights into the preference of amino acid residues at different locations of amyloid and non-amyloid peptides. In this work, we have systematically analyzed 139 amyloid and 168 non-amyloid hexapeptides and revealed the preference of residues at six different positions. We observed that Glu, Ile, Ser, Thr and Val are dominant at positions six, five, one, two and three, respectively, in amyloid peptides. In non-amyloids, a similar trend is noticed for a few residues, whereas residues such as Ala at position 2, Asn at position 4 and Gly at position 6 showed trends different from those in amyloids. Utilizing the positional preference of the 20 amino acid residues, we devised a statistical method for discriminating amyloids and non-amyloids, which showed an accuracy of 89% and 54%, respectively. Further, we examined various machine learning techniques; a method based on Random Forest, using the positional preference of amyloids and non-amyloids along with three selected amino acid properties, showed an overall accuracy of 99.7% in self-consistency testing and 83% in 10-fold cross-validation.
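The idea of scoring a hexapeptide from position-wise residue preferences can be sketched as follows. This is a minimal illustration, not the authors' method: the log-odds form, the pseudocount of 1, the function names, and the toy peptide lists are all assumptions made for the example (the strings are placeholders, not experimental data).

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def positional_frequencies(peptides, pseudocount=1.0):
    """Smoothed frequency of each residue at each position (one dict per position)."""
    length = len(peptides[0])
    total = len(peptides) + pseudocount * len(AMINO_ACIDS)
    tables = []
    for pos in range(length):
        counts = Counter(p[pos] for p in peptides)
        tables.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return tables

def log_odds_score(peptide, amyloid_freqs, non_amyloid_freqs):
    """Sum over positions of ln(f_amyloid / f_non_amyloid); > 0 favours the amyloid set."""
    return sum(
        math.log(amyloid_freqs[i][aa] / non_amyloid_freqs[i][aa])
        for i, aa in enumerate(peptide)
    )

# Hypothetical placeholder sets, for illustration only (not the 139/168 peptides).
amyloid_set = ["STVIIE", "STVIVE", "SSVIIE", "STLIIE"]
non_amyloid_set = ["SAVNPG", "KAKNKG", "SAVNIG", "PAPNPG"]

amy_f = positional_frequencies(amyloid_set)
non_f = positional_frequencies(non_amyloid_set)

score_a = log_odds_score("STVIIE", amy_f, non_f)
score_n = log_odds_score("SAVNPG", amy_f, non_f)
print(f"STVIIE: {score_a:+.2f}, SAVNPG: {score_n:+.2f}")
```

A peptide would then be called amyloid-like when its score exceeds a chosen cutoff (0 here by assumption); in practice the cutoff trades accuracy on amyloids against accuracy on non-amyloids.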
Citations
Journal Article
TL;DR: Amyloid-fibril forming hexa-peptides show position-specific sequence features that differ from those of peptides which may form amorphous β-aggregates, and these positional preferences are found to be important features for discriminating amyloid-fibril forming peptides from their homologues that don't form amyloid-fibrils.
Abstract: Comparison of short peptides which form amyloid-fibrils with their homologues that may form amorphous β-aggregates but not fibrils, can aid development of novel amyloid-containing nanomaterials with well defined morphologies and characteristics. The knowledge gained from the comparative analysis could also be applied towards identifying potential aggregation prone regions in proteins, which are important for biotechnology applications or have been implicated in neurodegenerative diseases. In this work we have systematically analyzed a set of 139 amyloid-fibril hexa-peptides along with a highly homologous set of 168 hexa-peptides that do not form amyloid fibrils for their position-wise as well as overall amino acid compositions and averages of 49 selected amino acid properties. Amyloid-fibril forming peptides show distinct preferences and avoidances for amino acid residues to occur at each of the six positions. As expected, the amyloid fibril peptides are also more hydrophobic than non-amyloid peptides. We have used the results of this analysis to develop statistical potential energy values for the 20 amino acid residues to occur at each of the six different positions in the hexa-peptides. The distribution of the potential energy values in 139 amyloid and 168 non-amyloid fibrils are distinct and the amyloid-fibril peptides tend to be more stable (lower total potential energy values) than non-amyloid peptides. The average frequency of occurrence of these peptides with lower than specific cutoff energies at different positions is 72% and 50%, respectively. The potential energy values were used to devise a statistical discriminator to distinguish between amyloid-fibril and non-amyloid peptides. Our method could identify the amyloid-fibril forming hexa-peptides to an accuracy of 89%. On the other hand, the accuracy of identifying non-amyloid peptides was only 54%. Further attempts were made to improve the prediction accuracy via machine learning. This resulted in an overall accuracy of 82.7% with the sensitivity and specificity of 81.3% and 83.9%, respectively, in 10-fold cross-validation method. Amyloid-fibril forming hexa-peptides show position specific sequence features that are different from those which may form amorphous β-aggregates. These positional preferences are found to be important features for discriminating amyloid-fibril forming peptides from their homologues that don't form amyloid-fibrils.
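The relationship between the reported sensitivity, specificity and overall accuracy can be checked with a short calculation. The confusion-matrix counts below are not given in the abstract; they are back-calculated here as an assumption, chosen as the whole numbers that reproduce the reported percentages for 139 amyloid and 168 non-amyloid peptides.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # fraction of amyloids correctly identified
    specificity = tn / (tn + fp)          # fraction of non-amyloids correctly identified
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# Assumed counts (back-calculated, not reported in the source).
tp, fn = 113, 26   # 139 amyloid peptides
tn, fp = 141, 27   # 168 non-amyloid peptides

sens, spec, acc = classification_metrics(tp, fn, tn, fp)
print(f"sensitivity={sens:.1%} specificity={spec:.1%} accuracy={acc:.1%}")
# prints: sensitivity=81.3% specificity=83.9% accuracy=82.7%
```

Because the two classes are of similar size, the overall accuracy lies between the sensitivity and the specificity, weighted slightly towards the larger non-amyloid set.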

15 citations

Book Chapter
06 Apr 2017
TL;DR: Here it is illustrated how the present understanding of the physico-chemical and structural basis of protein aggregation has crystalized in the development of algorithms able to forecast the aggregation properties of proteins both from their primary and tertiary structures.
Abstract: Protein aggregation accounts for the onset of more than 40 human disorders, including neurodegenerative diseases like Alzheimer’s and Parkinson’s but also non-neuropathic pathologies like Diabetes type II or some types of cancers. In all these diseases, the toxic effect is associated with the self-assembly of proteins into insoluble amyloid fibrils displaying a common regular cross-β structure. Surprisingly, cells also exploit the amyloid fold for important physiological processes, from structure scaffolding to heritable information transmission. In addition, protein aggregation often occurs during the recombinant production and downstream processing of therapeutic proteins, becoming the main bottleneck in the marketing of these drugs. In this context, approaches aiming to predict the aggregation and amyloid formation propensities of proteins are receiving increasing interest, both because they can lead us to the development of novel therapeutic strategies and because they are providing us with a global understanding of the role of protein aggregation in physiological and pathological processes. Here we illustrate how our present understanding of the physico-chemical and structural basis of protein aggregation has crystalized in the development of algorithms able to forecast the aggregation properties of proteins both from their primary and tertiary structures. A detailed description of these computational approaches and their application is provided.

7 citations

References
Book
25 Oct 1999
TL;DR: This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining.
Abstract: Data Mining: Practical Machine Learning Tools and Techniques offers a thorough grounding in machine learning concepts as well as practical advice on applying machine learning tools and techniques in real-world data mining situations. This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining. Thorough updates reflect the technical changes and modernizations that have taken place in the field since the last edition, including new material on Data Transformations, Ensemble Learning, Massive Data Sets, Multi-instance Learning, plus a new version of the popular Weka machine learning software developed by the authors. Witten, Frank, and Hall include both tried-and-true techniques of today as well as methods at the leading edge of contemporary research.
* Provides a thorough grounding in machine learning concepts as well as practical advice on applying the tools and techniques to your data mining projects
* Offers concrete tips and techniques for performance improvement that work by transforming the input or output in machine learning methods
* Includes downloadable Weka software toolkit, a collection of machine learning algorithms for data mining tasks, in an updated, interactive interface. Algorithms in toolkit cover: data pre-processing, classification, regression, clustering, association rules, visualization

20,196 citations

Journal Article
TL;DR: The results confirm the model of intermolecular β-sheet formation as a widespread underlying mechanism of protein aggregation and opens the door to a fully automated, sequence-based design strategy to improve the aggregation properties of proteins of scientific or industrial interest.
Abstract: We have developed a statistical mechanics algorithm, TANGO, to predict protein aggregation. TANGO is based on the physico-chemical principles of beta-sheet formation, extended by the assumption that the core regions of an aggregate are fully buried. Our algorithm accurately predicts the aggregation of a data set of 179 peptides compiled from the literature as well as of a new set of 71 peptides derived from human disease-related proteins, including prion protein, lysozyme and beta2-microglobulin. TANGO also correctly predicts pathogenic as well as protective mutations of the Alzheimer beta-peptide, human lysozyme and transthyretin, and discriminates between beta-sheet propensity and aggregation. Our results confirm the model of intermolecular beta-sheet formation as a widespread underlying mechanism of protein aggregation. Furthermore, the algorithm opens the door to a fully automated, sequence-based design strategy to improve the aggregation properties of proteins of scientific or industrial interest.

1,446 citations

Journal Article
TL;DR: The algorithm is shown to identify a series of protein fragments involved in the aggregation of disease-related proteins and to predict the effect of genetic mutations on their deposition propensities, which shall facilitate the identification of possible therapeutic targets for anti-depositional strategies in conformational diseases.
Abstract: Protein aggregation correlates with the development of several debilitating human disorders of growing incidence, such as Alzheimer's and Parkinson's diseases. On the biotechnological side, protein production is often hampered by the accumulation of recombinant proteins into aggregates. Thus, the development of methods to anticipate the aggregation properties of polypeptides is receiving increasing attention. AGGRESCAN is a web-based software for the prediction of aggregation-prone segments in protein sequences, the analysis of the effect of mutations on protein aggregation propensities and the comparison of the aggregation properties of different proteins or protein sets. AGGRESCAN is based on an aggregation-propensity scale for natural amino acids derived from in vivo experiments and on the assumption that short and specific sequence stretches modulate protein aggregation. The algorithm is shown to identify a series of protein fragments involved in the aggregation of disease-related proteins and to predict the effect of genetic mutations on their deposition propensities. It also provides new insights into the differential aggregation properties displayed by globular proteins, natively unfolded polypeptides, amyloidogenic proteins and proteins found in bacterial inclusion bodies. By identifying aggregation-prone segments in proteins, AGGRESCAN http://bioinf.uab.es/aggrescan/ shall facilitate (i) the identification of possible therapeutic targets for anti-depositional strategies in conformational diseases and (ii) the anticipation of aggregation phenomena during storage or recombinant production of bioactive polypeptides or polypeptide sets.
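The general idea of locating aggregation-prone stretches from a per-residue propensity scale can be sketched with a sliding-window average. This is only an illustration of the principle: the three-level scale, the window size of 5, the threshold of 0 and the minimum segment length are hypothetical choices made for the example, not AGGRESCAN's actual scale or parameters.

```python
HYDROPHOBIC = set("AVILMFWYC")
CHARGED = set("DEKR")

# Hypothetical toy scale: +1 hydrophobic, -1 charged, 0 otherwise.
TOY_SCALE = {
    aa: 1.0 if aa in HYDROPHOBIC else -1.0 if aa in CHARGED else 0.0
    for aa in "ACDEFGHIKLMNPQRSTVWY"
}

def window_averages(sequence, scale, window=5):
    """Average the per-residue values over a window centred on each residue."""
    half = window // 2
    values = [scale[aa] for aa in sequence]
    averages = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        averages.append(sum(values[lo:hi]) / (hi - lo))
    return averages

def prone_segments(averages, threshold=0.0, min_length=5):
    """Return (start, end) index pairs of runs whose average stays above threshold."""
    segments, start = [], None
    for i, value in enumerate(averages + [float("-inf")]):  # sentinel closes a final run
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            if i - start >= min_length:
                segments.append((start, i))
            start = None
    return segments

sequence = "KKKDEVIVIFLLKKDE"  # made-up sequence for illustration
segments = prone_segments(window_averages(sequence, TOY_SCALE))
print([(s, e, sequence[s:e]) for s, e in segments])
```

Smoothing over a window reflects the assumption stated in the abstract that short, specific sequence stretches, rather than single residues, modulate aggregation.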

844 citations

Journal Article
TL;DR: Waltz is a web-based tool that uses a position-specific scoring matrix to determine amyloid-forming sequences, allowing users to identify and better distinguish between amyloid sequences and amorphous beta-sheet aggregates.
Abstract: Protein aggregation results in beta-sheet-like assemblies that adopt either a variety of amorphous morphologies or ordered amyloid-like structures. These differences in structure also reflect biological differences; amyloid and amorphous beta-sheet aggregates have different chaperone affinities, accumulate in different cellular locations and are degraded by different mechanisms. Further, amyloid function depends entirely on a high intrinsic degree of order. Here we experimentally explored the sequence space of amyloid hexapeptides and used the derived data to build Waltz, a web-based tool that uses a position-specific scoring matrix to determine amyloid-forming sequences. Waltz allows users to identify and better distinguish between amyloid sequences and amorphous beta-sheet aggregates and allowed us to identify amyloid-forming regions in functional amyloids.

536 citations

Journal Article
TL;DR: This study provides the potential for a proteome-wide scanning to detect fibril-forming regions in proteins, from which molecules can be designed to prevent and/or disrupt this process.
Abstract: The establishment of rules that link sequence and amyloid feature is critical for our understanding of misfolding diseases. To this end, we have performed a saturation mutagenesis analysis on the de novo-designed amyloid peptide STVIIE (1). The positional scanning mutagenesis has revealed that there is a position dependence on mutation of amyloid fibril formation and that both very tolerant and restrictive positions to mutation can be found within an amyloid sequence. In this system, mutations that accelerate β-sheet polymerization do not always lead to an increase of amyloid products. On the contrary, abundant fibrils are typically found for mutants that polymerize slowly. From these experiments, we have extracted a sequence pattern to identify amyloidogenic stretches in proteins. The pattern has been validated experimentally. In silico sequence scanning of amyloid proteins also supports the pattern. Analysis of protein databases has shown that highly amyloidogenic sequences matching the pattern are less frequent in proteins than innocuous amino acid combinations and that, if present, they are surrounded by amino acids that disrupt their aggregating capability (amyloid breakers). This study provides the potential for a proteome-wide scanning to detect fibril-forming regions in proteins, from which molecules can be designed to prevent and/or disrupt this process.

367 citations