Author

Gregory E. Sims

Bio: Gregory E. Sims is an academic researcher. The author has contributed to research on the topics Protein sequencing and UniProt. The author has an h-index of 1 and has co-authored 1 publication receiving 2097 citations.

Topics: Protein sequencing, UniProt, Sequence (medicine), Sequence alignment

Papers

Journal Article

Predicting the Functional Effect of Amino Acid Substitutions and Indels

Yongwook Choi¹, Gregory E. Sims, Sean M. Murphy, Jason R. Miller, Agnes P. Chan¹

J. Craig Venter Institute¹

08 Oct 2012, PLOS ONE

TL;DR: A new algorithm, PROVEAN (Protein Variation Effect Analyzer), is developed, which provides a generalized approach to predict the functional effects of protein sequence variations including single or multiple amino acid substitutions, and in-frame insertions and deletions.

Abstract: As next-generation sequencing projects generate massive genome-wide sequence variation data, bioinformatics tools are being developed to provide computational predictions on the functional effects of sequence variations and narrow down the search for causal variants for disease phenotypes. Different classes of sequence variations at the nucleotide level are involved in human diseases, including substitutions, insertions, deletions, frameshifts, and non-sense mutations. Frameshifts and non-sense mutations are likely to cause a negative effect on protein function. Existing prediction tools primarily focus on studying the deleterious effects of single amino acid substitutions through examining amino acid conservation at the position of interest among related sequences, an approach that is not directly applicable to insertions or deletions. Here, we introduce a versatile alignment-based score as a new metric to predict the damaging effects of variations not limited to single amino acid substitutions but also in-frame insertions, deletions, and multiple amino acid substitutions. This alignment-based score measures the change in sequence similarity of a query sequence to a protein sequence homolog before and after the introduction of an amino acid variation to the query sequence. Our results showed that the scoring scheme performs well in separating disease-associated variants (n = 21,662) from common polymorphisms (n = 37,022) for UniProt human protein variations, and also in separating deleterious variants (n = 15,179) from neutral variants (n = 17,891) for UniProt non-human protein variations. In our approach, the area under the receiver operating characteristic curve (AUC) for the human and non-human protein variation datasets is ∼0.85. We also observed that the alignment-based score correlates with the deleteriousness of a sequence variation.
In summary, we have developed a new algorithm, PROVEAN (Protein Variation Effect Analyzer), which provides a generalized approach to predict the functional effects of protein sequence variations including single or multiple amino acid substitutions, and in-frame insertions and deletions. The PROVEAN tool is available online at http://provean.jcvi.org.
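
The idea of an alignment-based score, as described in the abstract, can be illustrated with a minimal Python sketch: align the query and the variant to the same homolog and take the difference of the two alignment scores. This is not the PROVEAN implementation. The function names, the Needleman-Wunsch global alignment, the toy match/mismatch/gap values, and the example sequences are all assumptions chosen for illustration.

```python
def global_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    The scoring values are illustrative assumptions, not the parameters
    of the published tool.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def delta_score(query, variant, homolog):
    """Change in similarity to a homolog after introducing a variation.

    A negative value means the variant is less similar to the homolog
    than the original query was.
    """
    return (global_alignment_score(variant, homolog)
            - global_alignment_score(query, homolog))


# Hypothetical sequences, for illustration only.
query = "MKTAYIAKQR"
homolog = "MKTAYIAKQR"
substitution = "MKTWYIAKQR"   # single amino acid substitution (A -> W)
deletion = "MKTIAKQR"         # in-frame deletion of "AY"

print(delta_score(query, substitution, homolog))  # -3
print(delta_score(query, deletion, homolog))      # -8
```

Because the score is computed from whole-sequence alignments rather than from conservation at a single position, the same function handles substitutions, insertions, and deletions without special cases, which is the generality the abstract emphasizes.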

2,533 citations
