Journal ArticleDOI

Predicting subcellular localization of proteins based on their N-terminal amino acid sequence.

TL;DR: A neural network-based tool, TargetP, for large-scale subcellular location prediction of newly identified proteins has been developed and it is estimated that 10% of all plant proteins are mitochondrial and 14% chloroplastic, and that the abundance of secretory proteins, in both Arabidopsis and Homo, is around 10%.
About: This article was published in the Journal of Molecular Biology on 2000-07-21. It has received 4268 citations to date. The article focuses on the topics: Chloroplast localization & Signal peptide.
Citations
Journal ArticleDOI
14 Dec 2000-Nature
TL;DR: This is the first complete genome sequence of a plant and provides the foundations for more comprehensive comparison of conserved processes in all eukaryotes, identifying a wide range of plant-specific gene functions and establishing rapid systematic ways to identify genes for crop improvement.
Abstract: The flowering plant Arabidopsis thaliana is an important model system for identifying genes and determining their functions. Here we report the analysis of the genomic sequence of Arabidopsis. The sequenced regions cover 115.4 megabases of the 125-megabase genome and extend into centromeric regions. The evolution of Arabidopsis involved a whole-genome duplication, followed by subsequent gene loss and extensive local gene duplications, giving rise to a dynamic genome enriched by lateral gene transfer from a cyanobacterial-like ancestor of the plastid. The genome contains 25,498 genes encoding proteins from 11,000 families, similar to the functional diversity of Drosophila and Caenorhabditis elegans--the other sequenced multicellular eukaryotes. Arabidopsis has many families of new proteins but also lacks several common protein families, indicating that the sets of common proteins have undergone differential expansion and contraction in the three multicellular eukaryotes. This is the first complete genome sequence of a plant and provides the foundations for more comprehensive comparison of conserved processes in all eukaryotes, identifying a wide range of plant-specific gene functions and establishing rapid systematic ways to identify genes for crop improvement.

8,742 citations

Journal ArticleDOI
03 Oct 2002-Nature
TL;DR: The genome sequence of P. falciparum clone 3D7 is reported, which is the most (A + T)-rich genome sequenced to date and is being exploited in the search for new drugs and vaccines to fight malaria.
Abstract: The parasite Plasmodium falciparum is responsible for hundreds of millions of cases of malaria, and kills more than one million African children annually. Here we report an analysis of the genome sequence of P. falciparum clone 3D7. The 23-megabase nuclear genome consists of 14 chromosomes, encodes about 5,300 genes, and is the most (A + T)-rich genome sequenced to date. Genes involved in antigenic variation are concentrated in the subtelomeric regions of the chromosomes. Compared to the genomes of free-living eukaryotic microbes, the genome of this intracellular parasite encodes fewer enzymes and transporters, but a large proportion of genes are devoted to immune evasion and host-parasite interactions. Many nuclear-encoded proteins are targeted to the apicoplast, an organelle involved in fatty-acid and isoprenoid metabolism. The genome sequence provides the foundation for future studies of this organism, and is being exploited in the search for new drugs and vaccines to fight malaria.

4,312 citations

Journal ArticleDOI
TL;DR: The properties of three well-known N-terminal sequence motifs directing proteins to the secretory pathway, mitochondria and chloroplasts are described, and a brief history of methods to predict subcellular localization based on these sorting signals and other sequence properties is sketched.
Abstract: Determining the subcellular localization of a protein is an important first step toward understanding its function. Here, we describe the properties of three well-known N-terminal sequence motifs directing proteins to the secretory pathway, mitochondria and chloroplasts, and sketch a brief history of methods to predict subcellular localization based on these sorting signals and other sequence properties. We then outline how to use a number of internet-accessible tools to arrive at a reliable subcellular localization prediction for eukaryotic and prokaryotic proteins. In particular, we provide detailed step-by-step instructions for the coupled use of the amino-acid sequence-based predictors TargetP, SignalP, ChloroP and TMHMM, which are all hosted at the Center for Biological Sequence Analysis, Technical University of Denmark. In addition, we describe and provide web references to other useful subcellular localization predictors. Finally, we discuss predictive performance measures in general and the performance of TargetP and SignalP in particular.

3,235 citations


Cites background from "Predicting subcellular localization..."

  • ...A successor of ChloroP is TargetP, which provides prediction of cTPs, mTPs and secretory SP...

    [...]

Journal ArticleDOI
11 Jul 2008-Cell
TL;DR: This work predicts 19 proteins to be important for the function of complex I (CI) of the electron transport chain and validates a subset of these predictions using RNAi, including C8orf38, which is shown to have an inherited mutation in a lethal, infantile CI deficiency.

1,836 citations


Cites background or methods from "Predicting subcellular localization..."

  • ...1 confidence score (Emanuelsson et al., 2000)....

    [...]

  • ...We can assess accuracy at each score by using a corrected false discovery rate statistic (cFDR), which accounts for the sizes of our training sets (see the Experimental Procedures)....

    [...]

  • ...…Kislinger et al., 2006; Mootha et al., 2003a; Taylor et al., 2003) and yeast (Reinders et al., 2006; Sickmann et al., 2003), epitope tagging combined with microscopy in yeast (Huh et al., 2003; Kumar et al., 2002), and computation (Calvo et al., 2006; Emanuelsson et al., 2000; Guda et al., 2004)....

    [...]

Journal ArticleDOI
TL;DR: A series of fluorescent organelle markers, based on well-established targeting sequences, was generated for use in co-localization studies in Arabidopsis.
Abstract: Genome sequencing has resulted in the identification of a large number of uncharacterized genes with unknown functions. It is widely recognized that determination of the intracellular localization of the encoded proteins may aid in identifying their functions. To facilitate these localization experiments, we have generated a series of fluorescent organelle markers based on well-established targeting sequences that can be used for co-localization studies. In particular, this organelle marker set contains indicators for the endoplasmic reticulum, the Golgi apparatus, the tonoplast, peroxisomes, mitochondria, plastids and the plasma membrane. All markers were generated with four different fluorescent proteins (FP) (green, cyan, yellow or red FPs) in two different binary plasmids for kanamycin or glufosinate selection, respectively, to allow for flexible combinations. The labeled organelles displayed characteristic morphologies consistent with previous descriptions that could be used for their positive identification. Determination of the intracellular distribution of three previously uncharacterized proteins demonstrated the usefulness of the markers in testing predicted subcellular localizations. This organelle marker set should be a valuable resource for the plant community for such co-localization studies. In addition, the Arabidopsis organelle marker lines can also be employed in plant cell biology teaching labs to demonstrate the distribution and dynamics of these organelles.

1,782 citations

References
Book ChapterDOI
01 Jan 1988
TL;DR: This chapter contains sections titled: The Problem, The Generalized Delta Rule, Simulation Results, Some Further Generalizations, Conclusion.
Abstract: This chapter contains sections titled: The Problem, The Generalized Delta Rule, Simulation Results, Some Further Generalizations, Conclusion

17,604 citations


"Predicting subcellular localization..." refers methods in this paper

  • ...All neural networks in the predictor are of the feed-forward type with sigmoidal neurons (Minsky & Papert, 1968) and zero or one layer of hidden neurons, trained using error backpropagation (Rumelhart et al., 1986) but the implementations and chosen parameter values differ somewhat....

    [...]
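
The excerpt above mentions feed-forward networks with sigmoidal neurons and zero or one hidden layer, trained by error backpropagation. As a rough illustration only, the zero-hidden-layer case can be sketched in plain Python as below. The function names, the cross-entropy loss, the learning rate, the epoch count and the toy data are all assumptions made for this sketch; none of them are taken from the TargetP implementation.

```python
import math


def sigmoid(z):
    """Logistic activation used by a sigmoidal neuron."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Output of a single sigmoidal unit for one input vector."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train(samples, targets, epochs=5000, lr=0.5):
    """Full-batch gradient descent for a network with no hidden layer.

    Assumption: cross-entropy loss, so the output error term is (y - t).
    With no hidden layer, backpropagation reduces to this single step.
    """
    n_in = len(samples[0])
    weights = [0.0] * n_in
    bias = 0.0
    m = len(samples)
    for _ in range(epochs):
        grad_w = [0.0] * n_in
        grad_b = 0.0
        for x, t in zip(samples, targets):
            delta = predict(weights, bias, x) - t  # output error term
            for i, xi in enumerate(x):
                grad_w[i] += delta * xi
            grad_b += delta
        # Move against the mean gradient.
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / m
    return weights, bias
```

Calling `train` on a small linearly separable set, such as the logical OR of two binary inputs, yields weights whose rounded predictions reproduce the targets. A network with one hidden layer would additionally propagate the error term back through the hidden units.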

Journal ArticleDOI
TL;DR: This letter extends the heuristic homology algorithm of Needleman & Wunsch (1970) to find a pair of segments, one from each of two long sequences, such that there is no other pair of segments with greater similarity (homology).

10,262 citations


"Predicting subcellular localization..." refers methods in this paper

  • ...Pairwise alignment was performed using the full Smith-Waterman algorithm and the PAM250 scoring matrix, as implemented in the search program of the FASTA package (Smith & Waterman, 1981; Pearson, 1990)....

    [...]
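
The Smith-Waterman local alignment mentioned in the excerpt above can be illustrated with a minimal sketch. Note the assumptions: the paper used the PAM250 matrix via the FASTA package, whereas this sketch substitutes a simple match/mismatch score and a linear gap penalty, and the function name and default parameter values are hypothetical.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best score, aligned segment of a, aligned segment of b).

    Assumption: match/mismatch scoring with a linear gap penalty,
    not the PAM250 matrix used in the paper.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                    # start a new local alignment
                H[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                H[i - 1][j] + gap,    # gap in b
                H[i][j - 1] + gap,    # gap in a
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

Resetting negative scores to zero is what makes the alignment local: the best-scoring pair of segments is reported regardless of how poorly the flanking regions match.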

Journal ArticleDOI
TL;DR: In this paper, the authors describe a self-organizing system in which the signal representations are automatically mapped onto a set of output responses in such a way that the responses acquire the same topological order as that of the primary events.
Abstract: This work contains a theoretical study and computer simulations of a new self-organizing process. The principal discovery is that in a simple network of adaptive physical elements which receives signals from a primary event space, the signal representations are automatically mapped onto a set of output responses in such a way that the responses acquire the same topological order as that of the primary events. In other words, a principle has been discovered which facilitates the automatic formation of topologically correct maps of features of observable events. The basic self-organizing system is a one- or two-dimensional array of processing units resembling a network of threshold-logic units, and characterized by short-range lateral feedback between neighbouring units. Several types of computer simulations are used to demonstrate the ordering process as well as the conditions under which it fails.

8,247 citations

Journal ArticleDOI
TL;DR: A new method for the identification of signal peptides and their cleavage sites based on neural networks trained on separate sets of prokaryotic and eukaryotic sequence that performs significantly better than previous prediction schemes and can easily be applied on genome-wide data sets.
Abstract: We have developed a new method for the identification of signal peptides and their cleavage sites based on neural networks trained on separate sets of prokaryotic and eukaryotic sequence. The method performs significantly better than previous prediction schemes and can easily be applied on genome-wide data sets. Discrimination between cleaved signal peptides and uncleaved N-terminal signal-anchor sequences is also possible, though with lower precision. Predictions can be made on a publicly available WWW server.

5,480 citations


"Predicting subcellular localization..." refers background or methods in this paper

  • ...To this end, we used an experimental hidden Markov model-based version of SignalP (SignalP-HMM) (Nielsen & Krogh, 1998) which offers a better discrimination between cleaved signal peptides and uncleaved signal anchors than does the original neural network-based SignalP....

    [...]

  • ...SP cleavage is determined by processing the sequences through SignalP (Nielsen et al., 1997)....

    [...]

  • ...The full-size sets were also tested on PSORT (Nakai & Kanehisa, 1992; Horton & Nakai, 1997) and MitoProt (Claros, 1995; Claros & Vincens, 1996) as well as on TargetP's predecessors SignalP (Nielsen et al., 1997) and ChloroP (Emanuelsson et al., 1999)....

    [...]

  • ...From this set, entries with cleavage sites (CS), predicted by the SP predictor SignalP (prokaryotic, gram-negative networks) (Nielsen et al., 1997) to lie within 5 residues from annotated CS were removed since it could not be excluded that these cleavage sites resulted from the second cleavage of a…...

    [...]

  • ...We have reported subcellular localization predictors designed to identify either SPs (SignalP) (Nielsen et al., 1997) or cTPs (ChloroP) (Emanuelsson et al., 1999) in a protein sequence....

    [...]

01 Jan 1997
TL;DR: In this paper, a new method for the identification of signal peptides and their cleavage sites, based on neural networks trained on separate sets of prokaryotic and eukaryotic sequence, is described.
Abstract: We have developed a new method for the identification of signal peptides and their cleavage sites based on neural networks trained on separate sets of prokaryotic and eukaryotic sequence. The method performs significantly better than previous prediction schemes and can easily be applied on genome-wide data sets. Discrimination between cleaved signal peptides and uncleaved N-terminal signal-anchor sequences is also possible, though with lower precision. Predictions can be made on a publicly available WWW server.

5,191 citations