Topic

Protein sequencing

About: Protein sequencing is a research topic. Over its lifetime, 2,234 publications have been published within this topic, receiving 99,686 citations. The topic is also known as amino acid sequencing and peptide sequencing.


Papers
Journal Article
TL;DR: This protocol describes the use of the 'Sorting Tolerant From Intolerant' (SIFT) algorithm in predicting whether an AAS affects protein function.
Abstract: The effect of genetic mutation on phenotype is of significant interest in genetics. The type of genetic mutation that causes a single amino acid substitution (AAS) in a protein sequence is called a non-synonymous single nucleotide polymorphism (nsSNP). An nsSNP could potentially affect the function of the protein, subsequently altering the carrier's phenotype. This protocol describes the use of the 'Sorting Tolerant From Intolerant' (SIFT) algorithm in predicting whether an AAS affects protein function. To assess the effect of a substitution, SIFT assumes that important positions in a protein sequence have been conserved throughout evolution and therefore substitutions at these positions may affect protein function. Thus, by using sequence homology, SIFT predicts the effects of all possible substitutions at each position in the protein sequence. The protocol typically takes 5–20 min, depending on the input. SIFT is available as an online tool ( http://sift-dna.org ).
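
The conservation idea described in this abstract can be illustrated with a minimal Python sketch. It is not the SIFT algorithm itself: it only scores a substitution by how often the new residue appears in the same column of a set of aligned homologs. The function names, the toy alignment, and the 0.05 cutoff are assumptions made for illustration.

```python
from collections import Counter

def column_frequencies(alignment, pos):
    """Return the observed residue frequencies in one alignment column."""
    # Ignore gap characters so they do not count as residues.
    column = [seq[pos] for seq in alignment if seq[pos] != "-"]
    if not column:
        return {}
    counts = Counter(column)
    return {aa: n / len(column) for aa, n in counts.items()}

def predict_substitution(alignment, pos, new_aa, cutoff=0.05):
    """Score a substitution by the frequency of new_aa at this position.

    The cutoff is an illustrative assumption, not a value taken from
    the abstract above.
    """
    score = column_frequencies(alignment, pos).get(new_aa, 0.0)
    label = "tolerated" if score >= cutoff else "not tolerated"
    return score, label

# Hypothetical aligned homologs (equal length, '-' would mark a gap).
aligned = ["MKVLA", "MKILA", "MKVLS", "MRVLA"]

# Position 0 is fully conserved (M), so M -> A is flagged.
conserved_result = predict_substitution(aligned, 0, "A")
# Position 2 varies between V and I, so V -> I is accepted.
variable_result = predict_substitution(aligned, 2, "I")
```

A real predictor would also weight sequences by diversity and use pseudocounts, so that residues unseen in a small alignment are not automatically scored as zero.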

6,154 citations

Journal Article
08 Oct 2012-PLOS ONE
TL;DR: A new algorithm, PROVEAN (Protein Variation Effect Analyzer), is developed, which provides a generalized approach to predict the functional effects of protein sequence variations including single or multiple amino acid substitutions, and in-frame insertions and deletions.
Abstract: As next-generation sequencing projects generate massive genome-wide sequence variation data, bioinformatics tools are being developed to provide computational predictions on the functional effects of sequence variations and narrow down the search of causal variants for disease phenotypes. Different classes of sequence variations at the nucleotide level are involved in human diseases, including substitutions, insertions, deletions, frameshifts, and non-sense mutations. Frameshifts and non-sense mutations are likely to cause a negative effect on protein function. Existing prediction tools primarily focus on studying the deleterious effects of single amino acid substitutions through examining amino acid conservation at the position of interest among related sequences, an approach that is not directly applicable to insertions or deletions. Here, we introduce a versatile alignment-based score as a new metric to predict the damaging effects of variations not limited to single amino acid substitutions but also in-frame insertions, deletions, and multiple amino acid substitutions. This alignment-based score measures the change in sequence similarity of a query sequence to a protein sequence homolog before and after the introduction of an amino acid variation to the query sequence. Our results showed that the scoring scheme performs well in separating disease-associated variants (n = 21,662) from common polymorphisms (n = 37,022) for UniProt human protein variations, and also in separating deleterious variants (n = 15,179) from neutral variants (n = 17,891) for UniProt non-human protein variations. In our approach, the area under the receiver operating characteristic curve (AUC) for the human and non-human protein variation datasets is ∼0.85. We also observed that the alignment-based score correlates with the deleteriousness of a sequence variation.
In summary, we have developed a new algorithm, PROVEAN (Protein Variation Effect Analyzer), which provides a generalized approach to predict the functional effects of protein sequence variations including single or multiple amino acid substitutions, and in-frame insertions and deletions. The PROVEAN tool is available online at http://provean.jcvi.org.
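
The "change in sequence similarity before and after the variation" described in this abstract can be sketched as a difference of two alignment scores. The Python below is a toy illustration under stated assumptions, not the PROVEAN implementation: it uses a plain Needleman-Wunsch global alignment with made-up match, mismatch, and gap values, and compares against a single hypothetical homolog. The names `nw_score` and `delta_score` are illustrative.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score (Needleman-Wunsch) with a linear gap penalty.

    The scoring values are illustrative assumptions; a real tool would
    use a substitution matrix and tuned gap penalties.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cur.append(max(prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                           prev[j] + gap,       # gap in b
                           cur[j - 1] + gap))   # gap in a
        prev = cur
    return prev[-1]

def delta_score(query, variant, homolog):
    """Similarity to the homolog after the variation minus before it."""
    return nw_score(variant, homolog) - nw_score(query, homolog)

# Hypothetical sequences for illustration only.
query = "MKTAYIAK"
homolog = "MKTAYIAK"
substitution = "MKTGYIAK"   # A4G, single amino acid substitution
deletion = "MKTYIAK"        # in-frame deletion of A4

sub_delta = delta_score(query, substitution, homolog)
del_delta = delta_score(query, deletion, homolog)
```

Because the score is computed from whole-sequence alignments, the same function handles substitutions, insertions, and deletions, which is the property the abstract highlights. A negative delta means the variant is less similar to the homolog than the original query.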

2,533 citations

Journal Article
01 Feb 1996-Nature
TL;DR: A simple and robust technique is reported for sequencing proteins isolated by polyacrylamide gel electrophoresis using nano-electrospray tandem mass spectrometry; multiple-sequence stretches of up to 16 amino acids are obtained.
Abstract: Molecular analysis of complex biological structures and processes increasingly requires sensitive methods for protein sequencing. Electrospray mass spectrometry has been applied to the high-sensitivity sequencing of short peptides, but technical difficulties have prevented similar success with gel-isolated proteins. Here we report a simple and robust technique for the sequencing of proteins isolated by polyacrylamide gel electrophoresis, using nano-electrospray tandem mass spectrometry. As little as 5 ng protein starting material on Coomassie- or silver-stained gels can be sequenced. Multiple-sequence stretches of up to 16 amino acids are obtained, which identify the protein unambiguously if already present in databases or provide information to clone the corresponding gene. We have applied this method to the sequencing and cloning of a protein which inhibits the proliferation of capillary endothelial cells in vitro and thus may have potential antiangiogenic effects on solid tumours.

1,695 citations

Journal Article
TL;DR: A rapid method is described for identifying known proteins separated by two-dimensional gel electrophoresis, in which molecular masses of peptide fragments are used to search a protein sequence database. In the reported test, each protein was uniquely identified from over 91,000 protein sequences.
Abstract: A rapid method for the identification of known proteins separated by two-dimensional gel electrophoresis is described in which molecular masses of peptide fragments are used to search a protein sequence database. The peptides are generated by in situ reduction, alkylation, and tryptic digestion of proteins electroblotted from two-dimensional gels. Masses are determined at the subpicomole level by matrix-assisted laser desorption/ionization mass spectrometry of the unfractionated digest. A computer program has been developed that searches the protein sequence database for multiple peptides of individual proteins that match the measured masses. To ensure that the most recent database updates are included, a theoretical digest of the entire database is generated each time the program is executed. This method facilitates simultaneous processing of a large number of two-dimensional gel spots. The method was applied to a two-dimensional gel of a crude Escherichia coli extract that was electroblotted onto poly(vinylidene difluoride) membrane. Ten randomly chosen spots were analyzed. With as few as three peptide masses, each protein was uniquely identified from over 91,000 protein sequences. All identifications were verified by concurrent N-terminal sequencing of identical spots from a second blot. One of the spots contained an N-terminally blocked protein that required enzymatic cleavage, peptide separation, and Edman degradation for confirmation of its identity.
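
The database search described in this abstract (theoretical digest of each entry, then matching against measured masses) can be sketched in Python as follows. This is a simplified illustration, not the authors' program: it assumes trypsin cleaves after K or R except before P, uses neutral monoisotopic masses without the cysteine alkylation mentioned in the abstract, and applies an arbitrary 0.5 Da tolerance. The function names and the two-entry database are hypothetical.

```python
# Monoisotopic residue masses in daltons (standard values).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini

def tryptic_digest(seq):
    """Split a sequence after K or R, unless the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        nxt = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])
    return peptides

def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER

def count_matches(measured, seq, tol=0.5):
    """Count measured masses that match a theoretical tryptic peptide."""
    theoretical = [peptide_mass(p) for p in tryptic_digest(seq)]
    return sum(any(abs(m - t) <= tol for t in theoretical) for m in measured)

def rank_database(measured, database, tol=0.5):
    """Rank database entries by number of matched peptide masses."""
    scores = {name: count_matches(measured, seq, tol)
              for name, seq in database.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical two-entry database and simulated measurements.
database = {"protA": "GAKSVR", "protB": "LLKWWR"}
measured = [peptide_mass("GAK"), peptide_mass("SVR")]
ranking = rank_database(measured, database)
```

Digesting every entry at search time, as in `rank_database`, mirrors the abstract's point that a fresh theoretical digest keeps the search in step with database updates. A practical version would also account for the proton added during ionization and for missed cleavages.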

1,290 citations

Journal Article
03 Oct 1997-Science
TL;DR: The first fully automated design and experimental validation of a novel sequence for an entire protein is described, and a BLAST search shows that the designed sequence, full sequence design 1 (FSD-1), has very low identity to any known protein sequence.
Abstract: The first fully automated design and experimental validation of a novel sequence for an entire protein is described. A computational design algorithm based on physical chemical potential functions and stereochemical constraints was used to screen a combinatorial library of 1.9 × 10^27 possible amino acid sequences for compatibility with the design target, a ββα protein motif based on the polypeptide backbone structure of a zinc finger domain. A BLAST search shows that the designed sequence, full sequence design 1 (FSD-1), has very low identity to any known protein sequence. The solution structure of FSD-1 was solved by nuclear magnetic resonance spectroscopy and indicates that FSD-1 forms a compact well-ordered structure, which is in excellent agreement with the design target structure. This result demonstrates that computational methods can perform the immense combinatorial search required for protein design, and it suggests that an unbiased and quantitative algorithm can be used in various structural contexts.

1,208 citations


Network Information
Related Topics (5)
Peptide sequence: 84.1K papers, 4.3M citations, 88% related
RNA: 111.6K papers, 5.4M citations, 85% related
Gene: 211.7K papers, 10.3M citations, 84% related
Regulation of gene expression: 85.4K papers, 5.8M citations, 84% related
Gene expression: 113.3K papers, 5.5M citations, 83% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    37
2022    69
2021    54
2020    44
2019    63
2018    67