Gene Ontology: tool for the unification of biology

doi:10.1038/75556

Journal Article

M Ashburner¹, Catherine A. Ball, Judith A. Blake, David Botstein, Heather Butler, J. M. Cherry, Allan Peter Davis, Kara Dolinski, Selina S. Dwight, J.T. Eppig, Midori A. Harris, David P. Hill, Laurie Issel-Tarver, Andrew Kasarskis, Suzanna E. Lewis, John C. Matese, Joel E. Richardson, M. Ringwald, Gerald M. Rubin, Gavin Sherlock

Stanford University¹

01 May 2000-Nature Genetics (NIH Public Access)-Vol. 25, Iss: 1, pp 25-29

TL;DR: The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing.

Abstract: Genomic sequencing has made it clear that a large fraction of the genes specifying the core biological functions are shared by all eukaryotes. Knowledge of the biological role of such shared proteins in one organism can often be transferred to other organisms. The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing. To this end, three independent ontologies accessible on the World-Wide Web (http://www.geneontology.org) are being constructed: biological process, molecular function and cellular component.

Citations

Journal Article

STRING v10: protein–protein interaction networks, integrated over the tree of life

Damian Szklarczyk¹, Andrea Franceschini¹, Stefan Wyder¹, Kristoffer Forslund, Davide Heller¹, Jaime Huerta-Cepas, Milan Simonovic¹, Alexander Roth¹, Alberto Santos², Kalliopi Tsafou², Michael Kuhn³, Peer Bork, Lars Juhl Jensen², Christian von Mering¹

Swiss Institute of Bioinformatics¹, University of Copenhagen², Dresden University of Technology³

28 Jan 2015-Nucleic Acids Research

TL;DR: Hierarchical and self-consistent orthology annotations are introduced for all interacting proteins, grouping the proteins into families at various levels of phylogenetic resolution in the STRING database.

Abstract: The many functional partnerships and interactions that occur between proteins are at the core of cellular processing and their systematic characterization helps to provide context in molecular systems biology. However, known and predicted interactions are scattered over multiple resources, and the available data exhibit notable differences in terms of quality and completeness. The STRING database (http://string-db.org) aims to provide a critical assessment and integration of protein-protein interactions, including direct (physical) as well as indirect (functional) associations. The new version 10.0 of STRING covers more than 2000 organisms, which has necessitated novel, scalable algorithms for transferring interaction information between organisms. For this purpose, we have introduced hierarchical and self-consistent orthology annotations for all interacting proteins, grouping the proteins into families at various levels of phylogenetic resolution. Further improvements in version 10.0 include a completely redesigned prediction pipeline for inferring protein-protein associations from co-expression data, an API interface for the R computing environment and improved statistical analysis for enrichment tests in user-provided networks.

8,224 citations

Book Chapter

Protein identification and analysis tools in the ExPASy server

Marc R. Wilkins¹, Elisabeth Gasteiger², Amos Marc Bairoch², Jean Emmanuel Sanchez³, Keith L. Williams¹, Ron D. Appel³, Denis Hochstrasser³

Macquarie University¹, University of Geneva², Geneva College³

01 Jan 1999-Methods of Molecular Biology

TL;DR: Details are given about protein identification and analysis software that is available through the ExPASy World Wide Web server and the extensive annotation available in the Swiss-Prot database is used.

Abstract: Protein identification and analysis software performs a central role in the investigation of proteins from two-dimensional (2-D) gels and mass spectrometry. For protein identification, the user matches certain empirically acquired information against a protein database to define a protein as already known or as novel. For protein analysis, information in protein databases can be used to predict certain properties about a protein, which can be useful for its empirical investigation. The two processes are thus complementary. Although there are numerous programs available for those applications, we have developed a set of original tools with a few main goals in mind. Specifically, these are:

1. To utilize the extensive annotation available in the Swiss-Prot database wherever possible, in particular the position-specific annotation in the Swiss-Prot feature tables to take into account posttranslational modifications and protein processing.
2. To develop tools specifically, but not exclusively, applicable to proteins prepared by two-dimensional gel electrophoresis and peptide mass fingerprinting experiments.
3. To make all tools available on the World-Wide Web (WWW), and freely usable by the scientific community.

In this chapter we give details about protein identification and analysis software that is available through the ExPASy World Wide Web server.

8,007 citations

Journal Article

UniProt: the Universal Protein knowledgebase

Rolf Apweiler¹, Amos Marc Bairoch, Cathy H. Wu, Winona C. Barker, Brigitte Boeckmann, Serenella Ferro, Elisabeth Gasteiger, Hongzhan Huang, Rodrigo Lopez, Michele Magrane, Maria Jesus Martin, Darren A. Natale, Claire O'Donovan, Nicole Redaschi, Lai-Su L. Yeh

European Bioinformatics Institute¹

01 Jan 2004-Nucleic Acids Research

TL;DR: The Swiss-Prot, TrEMBL and PIR protein database activities have united to form the Universal Protein Knowledgebase (UniProt), which is to provide a comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase, with extensive cross-references and query interfaces.

Abstract: To provide the scientific community with a single, centralized, authoritative resource for protein sequences and functional information, the Swiss-Prot, TrEMBL and PIR protein database activities have united to form the Universal Protein Knowledgebase (UniProt) consortium. Our mission is to provide a comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase, with extensive cross-references and query interfaces. The central database will have two sections, corresponding to the familiar Swiss-Prot (fully manually curated entries) and TrEMBL (enriched with automated classification, annotation and extensive cross-references). For convenient sequence searches, UniProt also provides several non-redundant sequence databases. The UniProt NREF (UniRef) databases provide representative subsets of the knowledgebase suitable for efficient searching. The comprehensive UniProt Archive (UniParc) is updated daily from many public source databases. The UniProt databases can be accessed online (http://www.uniprot.org) or downloaded in several formats (ftp://ftp.uniprot.org/pub). The scientific community is encouraged to submit data for inclusion in UniProt.

7,298 citations

Cites background from "Gene Ontology: tool for the unifica..."

...This enables them to contribute to the work of the gene ontology (GO) consortium (9) by assigning GO terms during the annotation process as they extract information related to each of the GO ontologies, i....

Journal Article

Metascape provides a biologist-oriented resource for the analysis of systems-level datasets.

Yingyao Zhou¹, Bin Zhou¹, Lars Pache², Max W. Chang³, Alireza Hadj Khodabakhshi¹, Olga Tanaseichuk¹, Christopher Benner³, Sumit K. Chanda²

Genomics Institute of the Novartis Research Foundation¹, Discovery Institute², University of California, San Diego³

03 Apr 2019-Nature Communications

TL;DR: A biologist-oriented portal that provides a gene list annotation, enrichment and interactome resource and enables integrated analysis of multi-OMICs datasets, Metascape is an effective and efficient tool for experimental biologists to comprehensively analyze and interpret OMICs-based studies in the big data era.

Abstract: A critical component in the interpretation of systems-level studies is the inference of enriched biological pathways and protein complexes contained within OMICs datasets. Successful analysis requires the integration of a broad set of current biological databases and the application of a robust analytical pipeline to produce readily interpretable results. Metascape is a web-based portal designed to provide a comprehensive gene list annotation and analysis resource for experimental biologists. In terms of design features, Metascape combines functional enrichment, interactome analysis, gene annotation, and membership search to leverage over 40 independent knowledgebases within one integrated portal. Additionally, it facilitates comparative analyses of datasets across multiple independent and orthogonal experiments. Metascape provides a significantly simplified user experience through a one-click Express Analysis interface to generate interpretable outputs. Taken together, Metascape is an effective and efficient tool for experimental biologists to comprehensively analyze and interpret OMICs-based studies in the big data era.

6,282 citations

Journal Article

Enrichr: a comprehensive gene set enrichment analysis web server 2016 update

Maxim V. Kuleshov¹, Matthew R. Jones¹, Andrew D. Rouillard¹, Nicolas F. Fernandez¹, Qiaonan Duan¹, Zichen Wang¹, Simon Koplev¹, Sherry L. Jenkins¹, Kathleen M. Jagodnik², Alexander Lachmann¹, Michael G. McDermott¹, Caroline D. Monteiro¹, Gregory W. Gundersen¹, Avi Ma'ayan¹

Icahn School of Medicine at Mount Sinai¹, Glenn Research Center²

08 Jul 2016-Nucleic Acids Research

TL;DR: A significant update to one of the tools in this domain called Enrichr, a comprehensive resource for curated gene sets and a search engine that accumulates biological knowledge for further biological discoveries is presented.

Abstract: Enrichment analysis is a popular method for analyzing gene sets generated by genome-wide experiments. Here we present a significant update to one of the tools in this domain called Enrichr. Enrichr currently contains a large collection of diverse gene set libraries available for analysis and download. In total, Enrichr currently contains 180,184 annotated gene sets from 102 gene set libraries. New features have been added to Enrichr including the ability to submit fuzzy sets, upload BED files, improved application programming interface and visualization of the results as clustergrams. Overall, Enrichr is a comprehensive resource for curated gene sets and a search engine that accumulates biological knowledge for further biological discoveries. Enrichr is freely available at: http://amp.pharm.mssm.edu/Enrichr.

6,201 citations

Cites background or methods from "Gene Ontology: tool for the unifica..."

...To extract gene sets from gene expression data deposited in the GEO (29), we established a crowdsourcing microtask project that asks participants to extract gene sets from GEO for the following categories: (1) single-gene perturbations in mammalian cells; (2) comparison of diseased versus normal tissues; (3) single-drug perturbations in mammalian cells; (4) perturbations applied to MCF7 cells; (5) comparison between young and old mammalian tissues; (6) endogenous ligand perturbations of mammalian cells; and (7) comparison of before and after pathogen infection of human cells....
...The Gene Ontology (GO), which was first published in the year 2000 (1), introduced the concept of associating a collection of genes with a functional biological term in a systematic way....

References

Journal Article

Cluster analysis and display of genome-wide expression patterns

Michael B. Eisen¹, Paul T. Spellman¹, Patrick O. Brown¹, David Botstein¹

Stanford University¹

08 Dec 1998-Proceedings of the National Academy of Sciences of the United States of America

TL;DR: A system of cluster analysis for genome-wide expression data from DNA microarray hybridization is described that uses standard statistical algorithms to arrange genes according to similarity in pattern of gene expression, finding in the budding yeast Saccharomyces cerevisiae that clustering gene expression data groups together efficiently genes of known similar function.

Abstract: A system of cluster analysis for genome-wide expression data from DNA microarray hybridization is described that uses standard statistical algorithms to arrange genes according to similarity in pattern of gene expression. The output is displayed graphically, conveying the clustering and the underlying expression data simultaneously in a form intuitive for biologists. We have found in the budding yeast Saccharomyces cerevisiae that clustering gene expression data groups together efficiently genes of known similar function, and we find a similar tendency in human data. Thus patterns seen in genome-wide expression experiments can be interpreted as indications of the status of cellular processes. Also, coexpression of genes of known function with poorly characterized or novel genes may provide a simple means of gaining leads to the functions of many genes for which information is not available currently.

16,371 citations

Journal Article

The Pfam protein families database

Marco Punta¹, Penny Coggill¹, Ruth Y. Eberhardt¹, Jaina Mistry¹, John Tate¹, Chris Boursnell¹, Ningze Pang¹, Kristoffer Forslund¹, Goran Ceric¹, Jody Clements¹, Andreas Heger¹, Liisa Holm¹, Erik L. L. Sonnhammer¹, Sean R. Eddy¹, Alex Bateman¹, Robert D. Finn¹

Wellcome Trust Sanger Institute¹

01 Jan 2000-Nucleic Acids Research

TL;DR: The definition and use of family-specific, manually curated gathering thresholds are explained and some of the features of domains of unknown function (also known as DUFs) are discussed, which constitute a rapidly growing class of families within Pfam.

Abstract: Pfam is a widely used database of protein families and domains. This article describes a set of major updates that we have implemented in the latest release (version 24.0). The most important change is that we now use HMMER3, the latest version of the popular profile hidden Markov model package. This software is approximately 100 times faster than HMMER2 and is more sensitive due to the routine use of the forward algorithm. The move to HMMER3 has necessitated numerous changes to Pfam that are described in detail. Pfam release 24.0 contains 11,912 families, of which a large number have been significantly updated during the past two years. Pfam is available via servers in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/).

14,075 citations

Journal Article

The genome sequence of Drosophila melanogaster

Mark Raymond Adams¹, Susan E. Celniker², Robert A. Holt¹, Cheryl A. Evans¹ +191 more

24 Mar 2000-Science

TL;DR: The nucleotide sequence of nearly all of the approximately 120-megabase euchromatic portion of the Drosophila genome is determined using a whole-genome shotgun sequencing strategy supported by extensive clone-based sequence and a high-quality bacterial artificial chromosome physical map.

Abstract: The fly Drosophila melanogaster is one of the most intensively studied organisms in biology and serves as a model system for the investigation of many developmental and cellular processes common to higher eukaryotes, including humans. We have determined the nucleotide sequence of nearly all of the approximately 120-megabase euchromatic portion of the Drosophila genome using a whole-genome shotgun sequencing strategy supported by extensive clone-based sequence and a high-quality bacterial artificial chromosome physical map. Efforts are under way to close the remaining gaps; however, the sequence is of sufficient accuracy and contiguity to be declared substantially complete and to support an initial analysis of genome structure and preliminary gene annotation and interpretation. The genome encodes approximately 13,600 genes, somewhat fewer than the smaller Caenorhabditis elegans genome, but with comparable functional diversity.

6,180 citations

"Gene Ontology: tool for the unifica..." refers background in this paper

...2); and the fruitfly Drosophila melanogaster, completed earlier this yea...

Journal Article

Comprehensive Identification of Cell Cycle–regulated Genes of the Yeast Saccharomyces cerevisiae by Microarray Hybridization

Paul T. Spellman¹, Gavin Sherlock¹,², Michael Q. Zhang², Vishwanath R. Iyer¹, Kirk R. Anders¹, Michael B. Eisen¹, Patrick O. Brown¹,³, David Botstein¹, Bruce Futcher²

Stanford University¹, Cold Spring Harbor Laboratory², Howard Hughes Medical Institute³

01 Dec 1998-Molecular Biology of the Cell

TL;DR: A comprehensive catalog of yeast genes whose transcript levels vary periodically within the cell cycle is created, and it is found that the mRNA levels of more than half of these 800 genes respond to one or both of these cyclins.

Abstract: We sought to create a comprehensive catalog of yeast genes whose transcript levels vary periodically within the cell cycle. To this end, we used DNA microarrays and samples from yeast cultures sync...

5,176 citations

"Gene Ontology: tool for the unifica..." refers background in this paper

...Another use for GO ontologies that is gaining rapid adherence is the annotation of gene-expression data, especially after these have been clustered by similarities in pattern of gene expressio...

Journal Article

Life with 6000 Genes

André Goffeau¹, Bart Barrell, Howard Bussey², Ronald W. Davis³, Bernard Dujon⁴, Horst Feldmann⁵, Francis Galibert⁶, J D Hoheisel, Claude Jacq⁷, Mark Johnston⁸, Edward J. Louis⁹, Hans-Werner Mewes¹⁰, Yasufumi Murakami, Peter Philippsen¹¹, Hervé Tettelin¹, Stephen G. Oliver¹²

Université catholique de Louvain¹, McGill University², Stanford University³, Pierre-and-Marie-Curie University⁴, Ludwig Maximilian University of Munich⁵, Centre national de la recherche scientifique⁶, École Normale Supérieure⁷, Washington University in St. Louis⁸, John Radcliffe Hospital⁹, Max Planck Society¹⁰, University of Basel¹¹, University of Manchester¹²

25 Oct 1996-Science

TL;DR: The genome of the yeast Saccharomyces cerevisiae has been completely sequenced through a worldwide collaboration and provides information about the higher order organization of yeast's 16 chromosomes and allows some insight into their evolutionary history.

Abstract: The genome of the yeast Saccharomyces cerevisiae has been completely sequenced through a worldwide collaboration. The sequence of 12,068 kilobases defines 5885 potential protein-encoding genes, approximately 140 genes specifying ribosomal RNA, 40 genes for small nuclear RNA molecules, and 275 transfer RNA genes. In addition, the complete sequence provides information about the higher order organization of yeast's 16 chromosomes and allows some insight into their evolutionary history. The genome shows a considerable amount of apparent genetic redundancy, and one of the major problems to be tackled during the next stage of the yeast genome project is to elucidate the biological functions of all of these genes.

4,254 citations

"Gene Ontology: tool for the unifica..." refers background in this paper

...Functional conservation requires a common language for annotation. Nowhere is the impact of the grand biological unification more evident than in the eukaryotes, where the genomic sequences of three model systems are already available (budding yeast, Saccharomyces cerevisiae, completed in 1996 (ref...