Journal ArticleDOI

A draft map of the human proteome

29 May 2014-Nature (Nature Publishing Group)-Vol. 509, Iss: 7502, pp 575-581
TL;DR: A draft map of the human proteome is presented, built with high-resolution Fourier-transform mass spectrometry and used to discover a number of novel protein-coding regions, including translated pseudogenes, non-coding RNAs and upstream open reading frames.
Abstract: The availability of human genome sequence has transformed biomedical research over the past decade. However, an equivalent map for the human proteome with direct measurements of proteins and peptides does not exist yet. Here we present a draft map of the human proteome using high-resolution Fourier-transform mass spectrometry. In-depth proteomic profiling of 30 histologically normal human samples, including 17 adult tissues, 7 fetal tissues and 6 purified primary haematopoietic cells, resulted in identification of proteins encoded by 17,294 genes accounting for approximately 84% of the total annotated protein-coding genes in humans. A unique and comprehensive strategy for proteogenomic analysis enabled us to discover a number of novel protein-coding regions, which includes translated pseudogenes, non-coding RNAs and upstream open reading frames. This large human proteome catalogue (available as an interactive web-based resource at http://www.humanproteomemap.org) will complement available human genome and transcriptome data to accelerate biomedical research in health and disease.


Citations
Journal ArticleDOI
23 Jan 2015-Science
TL;DR: A map of the human tissue proteome is presented, based on an integrated omics approach that combines quantitative transcriptomics at the tissue and organ level with tissue microarray-based immunohistochemistry to achieve spatial localization of proteins down to the single-cell level.
Abstract: Resolving the molecular details of proteome variation in the different tissues and organs of the human body will greatly increase our knowledge of human biology and disease. Here, we present a map of the human tissue proteome based on an integrated omics approach that involves quantitative transcriptomics at the tissue and organ level, combined with tissue microarray-based immunohistochemistry, to achieve spatial localization of proteins down to the single-cell level. Our tissue-based analysis detected more than 90% of the putative protein-coding genes. We used this approach to explore the human secretome, the membrane proteome, the druggable proteome, the cancer proteome, and the metabolic functions in 32 different tissues and organs. All the data are integrated in an interactive Web-based database that allows exploration of individual proteins, as well as navigation of global expression patterns, in all major tissues and organs in the human body.

9,745 citations

Journal ArticleDOI
TL;DR: A significant update to one of the tools in this domain called Enrichr, a comprehensive resource for curated gene sets and a search engine that accumulates biological knowledge for further biological discoveries is presented.
Abstract: Enrichment analysis is a popular method for analyzing gene sets generated by genome-wide experiments. Here we present a significant update to one of the tools in this domain called Enrichr. Enrichr currently contains a large collection of diverse gene set libraries available for analysis and download. In total, Enrichr currently contains 180,184 annotated gene sets from 102 gene set libraries. New features have been added to Enrichr including the ability to submit fuzzy sets, upload BED files, improved application programming interface and visualization of the results as clustergrams. Overall, Enrichr is a comprehensive resource for curated gene sets and a search engine that accumulates biological knowledge for further biological discoveries. Enrichr is freely available at: http://amp.pharm.mssm.edu/Enrichr.

6,201 citations


Cites methods from "A draft map of the human proteome"

  • ...metabolic pathway resource stored in BioPAX format (16); gene and small-molecule perturbations from the LINCS L1000 data set; NCI-Nature pathways (17); protein complexes from the NURSA project (18); pathways from the PANTHER resource (19); targets of phosphatases from DEPOD (20); human phenotypes from the Human Phenotype Ontology (HPO) (21); genes associated with grants using NIH RePORTER and GeneRIF (22); transcription factor targets computed from the ChIP-seq data from the ENCODE project (23); differentially expressed genes from the Allen Brain Atlas (24); tissue expression extracted from the Genotype-Tissue Expression (GTEx) project (25); protein expression in tissues and cell types from ProteomicsDB (26) and the Human Proteome Map (HPM) (27); genes associated with cell survival from the Achilles Project (28); and more....

    [...]

Journal ArticleDOI
TL;DR: The developments in PRIDE resources and related tools are summarized, and a brief update is given on two resources under development, 'PRIDE Cluster' and 'PRIDE Proteomes', which provide a complementary view and quality-scored information of the peptide and protein identification data available in PRIDE Archive.
Abstract: The PRoteomics IDEntifications (PRIDE) database is one of the world-leading data repositories of mass spectrometry (MS)-based proteomics data. Since the beginning of 2014, PRIDE Archive (http://www.ebi.ac.uk/pride/archive/) is the new PRIDE archival system, replacing the original PRIDE database. Here we summarize the developments in PRIDE resources and related tools since the previous update manuscript in the Database Issue in 2013. PRIDE Archive constitutes a complete redevelopment of the original PRIDE, comprising a new storage backend, data submission system and web interface, among other components. PRIDE Archive supports the most-widely used PSI (Proteomics Standards Initiative) data standard formats (mzML and mzIdentML) and implements the data requirements and guidelines of the ProteomeXchange Consortium. The wide adoption of ProteomeXchange within the community has triggered an unprecedented increase in the number of submitted data sets (around 150 data sets per month). We outline some statistics on the current PRIDE Archive data contents. We also report on the status of the PRIDE related stand-alone tools: PRIDE Inspector, PRIDE Converter 2 and the ProteomeXchange submission tool. Finally, we will give a brief update on the resources under development 'PRIDE Cluster' and 'PRIDE Proteomes', which provide a complementary view and quality-scored information of the peptide and protein identification data available in PRIDE Archive.

3,375 citations


Cites background from "A draft map of the human proteome"

  • ...Two of the most highly accessed data sets are those coming from the two drafts of the human proteome published in Nature in 2014 (11,12) (PXD000561 and PXD000865, respectively)....

    [...]

  • ...In addition to the PX resources, there are other valuable proteomics databases and resources available, providing protein expression information derived from MS proteomics data, most notably the Global Proteome Machine Database (GPMDB) (10), ProteomicsDB (11), the Human Proteome Map (12), MaxQB (13), Chorus, PaxDb (14) and MOPED (Multi-Omics Profiling Expression Database) (15), among others....

    [...]

Journal ArticleDOI
TL;DR: The lncRNA landscape characterized here may shed light on normal biology and cancer pathogenesis and may be valuable for future biomarker development.
Abstract: Long noncoding RNAs (lncRNAs) are emerging as important regulators of tissue physiology and disease processes including cancer. To delineate genome-wide lncRNA expression, we curated 7,256 RNA sequencing (RNA-seq) libraries from tumors, normal tissues and cell lines comprising over 43 Tb of sequence from 25 independent studies. We applied ab initio assembly methodology to this data set, yielding a consensus human transcriptome of 91,013 expressed genes. Over 68% (58,648) of genes were classified as lncRNAs, of which 79% were previously unannotated. About 1% (597) of the lncRNAs harbored ultraconserved elements, and 7% (3,900) overlapped disease-associated SNPs. To prioritize lineage-specific, disease-associated lncRNA expression, we employed non-parametric differential expression testing and nominated 7,942 lineage- or cancer-associated lncRNA genes. The lncRNA landscape characterized here may shed light on normal biology and cancer pathogenesis and may be valuable for future biomarker development.

2,209 citations

Journal ArticleDOI
TL;DR: The evidence for and against the ceRNA hypothesis are critically evaluated to assess the impact of endogenous miRNA-sponge interactions and to propose an alternative function for messenger RNAs.
Abstract: The competitive endogenous RNA (ceRNA) hypothesis proposes that transcripts with shared microRNA (miRNA) binding sites compete for post-transcriptional control. This hypothesis has gained substantial attention as a unifying function for long non-coding RNAs, pseudogene transcripts and circular RNAs, as well as an alternative function for messenger RNAs. Empirical evidence supporting the hypothesis is accumulating but not without attracting scepticism. Recent studies that model transcriptome-wide binding-site abundance suggest that physiological changes in expression of most individual transcripts will not compromise miRNA activity. In this Review, we critically evaluate the evidence for and against the ceRNA hypothesis to assess the impact of endogenous miRNA-sponge interactions.

1,463 citations

References
Journal ArticleDOI
06 Sep 2012-Nature
TL;DR: The Encyclopedia of DNA Elements project provides new insights into the organization and regulation of the authors' genes and genome, and is an expansive resource of functional annotations for biomedical research.
Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

13,548 citations

Journal ArticleDOI
TL;DR: A new computer program, Mascot, is presented, which integrates all three types of search for protein identification by searching a sequence database using mass spectrometry data, and the scoring algorithm is probability based.
Abstract: Several algorithms have been described in the literature for protein identification by searching a sequence database using mass spectrometry data. In some approaches, the experimental data are peptide molecular weights from the digestion of a protein by an enzyme. Other approaches use tandem mass spectrometry (MS/MS) data from one or more peptides. Still others combine mass data with amino acid sequence data. We present results from a new computer program, Mascot, which integrates all three types of search. The scoring algorithm is probability based, which has a number of advantages: (i) A simple rule can be used to judge whether a result is significant or not. This is particularly useful in guarding against false positives. (ii) Scores can be compared with those from other types of search, such as sequence homology. (iii) Search parameters can be readily optimised by iteration. The strengths and limitations of probability-based scoring are discussed, particularly in the context of high throughput, fully automated protein identification.

8,195 citations

Journal ArticleDOI
01 Nov 2012-Nature
TL;DR: It is shown that evolutionary conservation and coding consequence are key determinants of the strength of purifying selection, that rare-variant load varies substantially across biological pathways, and that each individual contains hundreds of rare non-coding variants at conserved sites, such as motif-disrupting changes in transcription-factor-binding sites.
Abstract: By characterizing the geographic and functional spectrum of human genetic variation, the 1000 Genomes Project aims to build a resource to help to understand the genetic contribution to disease. Here we describe the genomes of 1,092 individuals from 14 populations, constructed using a combination of low-coverage whole-genome and exome sequencing. By developing methods to integrate information across several algorithms and diverse data sources, we provide a validated haplotype map of 38 million single nucleotide polymorphisms, 1.4 million short insertions and deletions, and more than 14,000 larger deletions. We show that individuals from different populations carry different profiles of rare and common variants, and that low-frequency variants show substantial geographic differentiation, which is further increased by the action of purifying selection. We show that evolutionary conservation and coding consequence are key determinants of the strength of purifying selection, that rare-variant load varies substantially across biological pathways, and that each individual contains hundreds of rare non-coding variants at conserved sites, such as motif-disrupting changes in transcription-factor-binding sites. This resource, which captures up to 98% of accessible single nucleotide polymorphisms at a frequency of 1% in related populations, enables analysis of common and low-frequency variants in individuals from diverse, including admixed, populations.

7,710 citations

Journal ArticleDOI
13 Mar 2003-Nature
TL;DR: The ability of mass spectrometry to identify and, increasingly, to precisely quantify thousands of proteins from complex samples can be expected to impact broadly on biology and medicine.
Abstract: Recent successes illustrate the role of mass spectrometry-based proteomics as an indispensable tool for molecular and cellular biology and for the emerging field of systems biology. These include the study of protein-protein interactions via affinity-based isolations on a small and proteome-wide scale, the mapping of numerous organelles, the concurrent description of the malaria parasite genome and proteome, and the generation of quantitative protein profiles from diverse species. The ability of mass spectrometry to identify and, increasingly, to precisely quantify thousands of proteins from complex samples can be expected to impact broadly on biology and medicine.

6,597 citations


"A draft map of the human proteome" refers background or methods in this paper

  • ...The peptides identified were categorized as (1) mapping to intergenic regions, (2) overlapping annotated genes, (3) mapping to the intronic regions of existing gene models, or (4) overlapping annotated genes but translated in an alternate reading frame....

    [...]

  • ...The databases used were: (1) a six-frame-translated human genome database, (2) three-frame-translated RefSeq mRNA sequences, (3) a three-frame-translated pseudogene database with sequences derived from NCBI and Gerstein’s pseudogene database, (4) three-frame-translated non-coding RNAs from NONCODE, (5) an N-terminal peptide database derived from RefSeq mRNA sequences from NCBI, and (6) a signal peptide database from SignalP and HPRD....

    [...]
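
The "six-frame-translated" and "three-frame-translated" databases named in the excerpt above rest on a simple idea: a nucleotide sequence is read in every possible codon offset (three on the forward strand, plus three on the reverse complement for genomic DNA) so that peptides can be matched without relying on existing gene annotation. The following is a minimal Python sketch of that idea only, not the authors' pipeline; the function names and frame labels ("+1" to "-3") are illustrative assumptions, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons; unknown codons (e.g. containing N) become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_translate(seq):
    """Translate a DNA sequence in all six reading frames.

    Frame labels are hypothetical: '+1'..'+3' for the forward strand,
    '-1'..'-3' for the reverse complement.
    """
    seq = seq.upper()
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(s[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frame_translate("ATGGCCTAA").items():
        print(label, protein)
```

A three-frame translation, as used for the mRNA, pseudogene and non-coding RNA databases, would keep only the forward-strand frames, since transcript sequences already have a defined orientation. A real search database would additionally split each frame at stop codons and filter short fragments, which this sketch omits.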
