Gene Ontology: tool for the unification of biology

doi:10.1038/75556

Home
/
Papers
/
Gene Ontology: tool for the unification of biology

Journal Article•DOI•

Gene Ontology: tool for the unification of biology

M Ashburner¹, Catherine A. Ball, Judith A. Blake, David Botstein, Heather Butler, J. M. Cherry, Allan Peter Davis, Kara Dolinski, Selina S. Dwight, J.T. Eppig, Midori A. Harris, David P. Hill, Laurie Issel-Tarver, Andrew Kasarskis, Suzanna E. Lewis, John C. Matese, Joel E. Richardson, M. Ringwald, Gerald M. Rubin, Gavin Sherlock - Show less +16 more•Institutions (1)

Stanford University¹

01 May 2000-Nature Genetics (NIH Public Access)-Vol. 25, Iss: 1, pp 25-29

TL;DR: The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing.

read less

Abstract: Genomic sequencing has made it clear that a large fraction of the genes specifying the core biological functions are shared by all eukaryotes. Knowledge of the biological role of such shared proteins in one organism can often be transferred to other organisms. The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing. To this end, three independent ontologies accessible on the World-Wide Web (http://www.geneontology.org) are being constructed: biological process, molecular function and cellular component.

...read moreread less

Content maybe subject to copyright Report

Citations

PDF

Open Access

More filters

Journal Article•DOI•

Cytoscape: A Software Environment for Integrated Models of Biomolecular Interaction Networks

[...]

Paul Shannon¹, Andrew Markiel, Owen Ozier, Nitin S. Baliga, Jonathan T. Wang, Daniel Ramage, Nada Amin, Benno Schwikowski, Trey Ideker - Show less +5 more•Institutions (1)

Institute for Systems Biology¹

01 Nov 2003-Genome Research

TL;DR: Several case studies of Cytoscape plug-ins are surveyed, including a search for interaction pathways correlating with changes in gene expression, a study of protein complexes involved in cellular recovery to DNA damage, inference of a combined physical/functional interaction network for Halobacterium, and an interface to detailed stochastic/kinetic gene regulatory models.

...read moreread less

Abstract: Cytoscape is an open source software project for integrating biomolecular interaction networks with high-throughput expression data and other molecular states into a unified conceptual framework. Although applicable to any system of molecular components and interactions, Cytoscape is most powerful when used in conjunction with large databases of protein-protein, protein-DNA, and genetic interactions that are increasingly available for humans and model organisms. Cytoscape's software Core provides basic functionality to layout and query the network; to visually integrate the network with expression profiles, phenotypes, and other molecular states; and to link the network to databases of functional annotations. The Core is extensible through a straightforward plug-in architecture, allowing rapid development of additional computational analyses and features. Several case studies of Cytoscape plug-ins are surveyed, including a search for interaction pathways correlating with changes in gene expression, a study of protein complexes involved in cellular recovery to DNA damage, inference of a combined physical/functional interaction network for Halobacterium, and an interface to detailed stochastic/kinetic gene regulatory models.

...read moreread less

32,980 citations

Cites background from "Gene Ontology: tool for the unifica..."

...Annotations typically correspond to an existing repository of knowledge, such as the Gene Ontology database (2)....
[...]

Journal Article•DOI•

limma powers differential expression analyses for RNA-sequencing and microarray studies

[...]

Matthew E. Ritchie¹, Belinda Phipson², Di Wu³, Yifang Hu¹, Charity W. Law⁴, Wei Shi¹, Gordon K. Smyth¹, Gordon K. Smyth⁵ - Show less +4 more•Institutions (5)

Walter and Eliza Hall Institute of Medical Research¹, Royal Children's Hospital², Harvard University³, University of Zurich⁴, University of Melbourne⁵

20 Apr 2015-Nucleic Acids Research

TL;DR: The philosophy and design of the limma package is reviewed, summarizing both new and historical features, with an emphasis on recent enhancements and features that have not been previously described.

...read moreread less

Abstract: limma is an R/Bioconductor software package that provides an integrated solution for analysing data from gene expression experiments. It contains rich features for handling complex experimental designs and for information borrowing to overcome the problem of small sample sizes. Over the past decade, limma has been a popular choice for gene discovery through differential expression analyses of microarray and high-throughput PCR data. The package contains particularly strong facilities for reading, normalizing and exploring such data. Recently, the capabilities of limma have been significantly expanded in two important directions. First, the package can now perform both differential expression and differential splicing analyses of RNA sequencing (RNA-seq) data. All the downstream analysis tools previously restricted to microarray data are now available for RNA-seq as well. These capabilities allow users to analyse both RNA-seq and microarray data with very similar pipelines. Second, the package is now able to go past the traditional gene-wise expression analyses in a variety of ways, analysing expression profiles in terms of co-regulated sets of genes or in terms of higher-order expression signatures. This provides enhanced possibilities for biological interpretation of gene expression differences. This article reviews the philosophy and design of the limma package, summarizing both new and historical features, with an emphasis on recent enhancements and features that have not been previously described.

...read moreread less

22,147 citations

Journal Article•DOI•

clusterProfiler: an R Package for Comparing Biological Themes Among Gene Clusters

[...]

Guangchuang Yu¹, Li Gen Wang, Yanyan Han, Qing-Yu He•Institutions (1)

Jinan University¹

03 May 2012-Omics A Journal of Integrative Biology

TL;DR: An R package, clusterProfiler that automates the process of biological-term classification and the enrichment analysis of gene clusters and can be easily extended to other species and ontologies is presented.

...read moreread less

Abstract: Increasing quantitative data generated from transcriptomics and proteomics require integrative strategies for analysis Here, we present an R package, clusterProfiler that automates the process of biological-term classification and the enrichment analysis of gene clusters The analysis module and visualization module were combined into a reusable workflow Currently, clusterProfiler supports three species, including humans, mice, and yeast Methods provided in this package can be easily extended to other species and ontologies The clusterProfiler package is released under Artistic-20 License within Bioconductor project The source code and vignette are freely available at http://bioconductororg/packages/release/bioc/html/clusterProfilerhtml

...read moreread less

16,644 citations

Cites background from "Gene Ontology: tool for the unifica..."

...For instance, Gene Ontology (GO) (Ashburner et al., 2000) annotates genes to biological processes, molecular functions, and cellular components in a directed acyclic graph structure, Kyoto Encyclopedia of Genes and Genomes (KEGG) (Kanehisa et al., 2010) annotates genes to pathways, and Disease…...
[...]

Journal Article•DOI•

WGCNA: an R package for weighted correlation network analysis.

[...]

Peter Langfelder¹, Steve Horvath¹•Institutions (1)

University of California, Los Angeles¹

29 Dec 2008-BMC Bioinformatics

TL;DR: The WGCNA R software package is a comprehensive collection of R functions for performing various aspects of weighted correlation network analysis that includes functions for network construction, module detection, gene selection, calculations of topological properties, data simulation, visualization, and interfacing with external software.

...read moreread less

Abstract: Correlation networks are increasingly being used in bioinformatics applications For example, weighted gene co-expression network analysis is a systems biology method for describing the correlation patterns among genes across microarray samples Weighted correlation network analysis (WGCNA) can be used for finding clusters (modules) of highly correlated genes, for summarizing such clusters using the module eigengene or an intramodular hub gene, for relating modules to one another and to external sample traits (using eigengene network methodology), and for calculating module membership measures Correlation networks facilitate network based gene screening methods that can be used to identify candidate biomarkers or therapeutic targets These methods have been successfully applied in various biological contexts, eg cancer, mouse genetics, yeast genetics, and analysis of brain imaging data While parts of the correlation network methodology have been described in separate publications, there is a need to provide a user-friendly, comprehensive, and consistent software implementation and an accompanying tutorial The WGCNA R software package is a comprehensive collection of R functions for performing various aspects of weighted correlation network analysis The package includes functions for network construction, module detection, gene selection, calculations of topological properties, data simulation, visualization, and interfacing with external software Along with the R package we also present R software tutorials While the methods development was motivated by gene expression data, the underlying data mining approach can be applied to a variety of different settings The WGCNA package provides R functions for weighted correlation network analysis, eg co-expression network analysis of gene expression data The R package along with its source code and additional material are freely available at http://wwwgeneticsuclaedu/labs/horvath/CoexpressionNetwork/Rpackages/WGCNA

...read moreread less

14,243 citations

Journal Article•DOI•

Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists

[...]

Da-Wei Huang¹, Brad T. Sherman¹, Richard A. Lempicki¹•Institutions (1)

Science Applications International Corporation¹

01 Jan 2009-Nucleic Acids Research

TL;DR: The survey will help tool designers/developers and experienced end users understand the underlying algorithms and pertinent details of particular tool categories/tools, enabling them to make the best choices for their particular research interests.

...read moreread less

Abstract: Functional analysis of large gene lists, derived in most cases from emerging high-throughput genomic, proteomic and bioinformatics scanning approaches, is still a challenging and daunting task. The gene-annotation enrichment analysis is a promising high-throughput strategy that increases the likelihood for investigators to identify biological processes most pertinent to their study. Approximately 68 bioinformatics enrichment tools that are currently available in the community are collected in this survey. Tools are uniquely categorized into three major classes, according to their underlying enrichment algorithms. The comprehensive collections, unique tool classifications and associated questions/issues will provide a more comprehensive and up-to-date view regarding the advantages, pitfalls and recent trends in a simpler tool-class level rather than by a tool-by-tool approach. Thus, the survey will help tool designers/developers and experienced end users understand the underlying algorithms and pertinent details of particular tool categories/tools, enabling them to make the best choices for their particular research interests.

...read moreread less

13,102 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse

References

PDF

Open Access

More filters

Journal Article•DOI•

Cluster analysis and display of genome-wide expression patterns

[...]

Michael B. Eisen¹, Paul T. Spellman¹, Patrick O. Brown¹, David Botstein¹•Institutions (1)

Stanford University¹

08 Dec 1998-Proceedings of the National Academy of Sciences of the United States of America

TL;DR: A system of cluster analysis for genome-wide expression data from DNA microarray hybridization is described that uses standard statistical algorithms to arrange genes according to similarity in pattern of gene expression, finding in the budding yeast Saccharomyces cerevisiae that clustering gene expression data groups together efficiently genes of known similar function.

...read moreread less

Abstract: A system of cluster analysis for genome-wide expression data from DNA microarray hybridization is de- scribed that uses standard statistical algorithms to arrange genes according to similarity in pattern of gene expression. The output is displayed graphically, conveying the clustering and the underlying expression data simultaneously in a form intuitive for biologists. We have found in the budding yeast Saccharomyces cerevisiae that clustering gene expression data groups together efficiently genes of known similar function, and we find a similar tendency in human data. Thus patterns seen in genome-wide expression experiments can be inter- preted as indications of the status of cellular processes. Also, coexpression of genes of known function with poorly charac- terized or novel genes may provide a simple means of gaining leads to the functions of many genes for which information is not available currently.

...read moreread less

16,371 citations

Journal Article•DOI•

The Pfam protein families database

[...]

Marco Punta¹, Penny Coggill¹, Ruth Y. Eberhardt¹, Jaina Mistry¹, John Tate¹, Chris Boursnell¹, Ningze Pang¹, Kristoffer Forslund¹, Goran Ceric¹, Jody Clements¹, Andreas Heger¹, Liisa Holm¹, Erik L. L. Sonnhammer¹, Sean R. Eddy¹, Alex Bateman¹, Robert D. Finn¹ - Show less +12 more•Institutions (1)

Wellcome Trust Sanger Institute¹

01 Jan 2000-Nucleic Acids Research

TL;DR: The definition and use of family-specific, manually curated gathering thresholds are explained and some of the features of domains of unknown function (also known as DUFs) are discussed, which constitute a rapidly growing class of families within Pfam.

...read moreread less

Abstract: Pfam is a widely used database of protein families and domains. This article describes a set of major updates that we have implemented in the latest release (version 24.0). The most important change is that we now use HMMER3, the latest version of the popular profile hidden Markov model package. This software is approximately 100 times faster than HMMER2 and is more sensitive due to the routine use of the forward algorithm. The move to HMMER3 has necessitated numerous changes to Pfam that are described in detail. Pfam release 24.0 contains 11,912 families, of which a large number have been significantly updated during the past two years. Pfam is available via servers in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/).

...read moreread less

14,075 citations

Journal Article•DOI•

The genome sequence of Drosophila melanogaster

[...]

Mark Raymond Adams¹, Susan E. Celniker², Robert A. Holt¹, Cheryl A. Evans¹ +191 more•Institutions (23)

24 Mar 2000-Science

TL;DR: The nucleotide sequence of nearly all of the approximately 120-megabase euchromatic portion of the Drosophila genome is determined using a whole-genome shotgun sequencing strategy supported by extensive clone-based sequence and a high-quality bacterial artificial chromosome physical map.

...read moreread less

Abstract: The fly Drosophila melanogaster is one of the most intensively studied organisms in biology and serves as a model system for the investigation of many developmental and cellular processes common to higher eukaryotes, including humans. We have determined the nucleotide sequence of nearly all of the approximately 120-megabase euchromatic portion of the Drosophila genome using a whole-genome shotgun sequencing strategy supported by extensive clone-based sequence and a high-quality bacterial artificial chromosome physical map. Efforts are under way to close the remaining gaps; however, the sequence is of sufficient accuracy and contiguity to be declared substantially complete and to support an initial analysis of genome structure and preliminary gene annotation and interpretation. The genome encodes approximately 13,600 genes, somewhat fewer than the smaller Caenorhabditis elegans genome, but with comparable functional diversity.

...read moreread less

6,180 citations

"Gene Ontology: tool for the unifica..." refers background in this paper

...2 ); and the fruitfly Drosophila melanogaster , completed earlier this yea...
[...]

Journal Article•DOI•

Comprehensive Identification of Cell Cycle–regulated Genes of the Yeast Saccharomyces cerevisiae by Microarray Hybridization

[...]

Paul T. Spellman¹, Gavin Sherlock¹, Gavin Sherlock², Michael Q. Zhang², Vishwanath R. Iyer¹, Kirk R. Anders¹, Michael B. Eisen¹, Patrick O. Brown¹, Patrick O. Brown³, David Botstein¹, Bruce Futcher² - Show less +7 more•Institutions (3)

Stanford University¹, Cold Spring Harbor Laboratory², Howard Hughes Medical Institute³

01 Dec 1998-Molecular Biology of the Cell

TL;DR: A comprehensive catalog of yeast genes whose transcript levels vary periodically within the cell cycle is created, and it is found that the mRNA levels of more than half of these 800 genes respond to one or both of these cyclins.

...read moreread less

Abstract: We sought to create a comprehensive catalog of yeast genes whose transcript levels vary periodically within the cell cycle. To this end, we used DNA microarrays and samples from yeast cultures sync...

...read moreread less

5,176 citations

"Gene Ontology: tool for the unifica..." refers background in this paper

...Another use for GO ontologies that is gaining rapid adherence is the annotation of gene-expression data, especially after these have been clustered by similarities in pattern of gene expressio...
[...]

Journal Article•DOI•

Life with 6000 Genes

[...]

André Goffeau¹, Bart Barrell, Howard Bussey², Ronald W. Davis³, Bernard Dujon⁴, Horst Feldmann⁵, Francis Galibert⁶, J D Hoheisel, Claude Jacq⁷, Mark Johnston⁸, Edward J. Louis⁹, Hans-Werner Mewes¹⁰, Yasufumi Murakami, Peter Philippsen¹¹, Hervé Tettelin¹, Stephen G. Oliver¹² - Show less +12 more•Institutions (12)

Université catholique de Louvain¹, McGill University², Stanford University³, Pierre-and-Marie-Curie University⁴, Ludwig Maximilian University of Munich⁵, Centre national de la recherche scientifique⁶, École Normale Supérieure⁷, Washington University in St. Louis⁸, John Radcliffe Hospital⁹, Max Planck Society¹⁰, University of Basel¹¹, University of Manchester¹²

25 Oct 1996-Science

TL;DR: The genome of the yeast Saccharomyces cerevisiae has been completely sequenced through a worldwide collaboration and provides information about the higher order organization of yeast's 16 chromosomes and allows some insight into their evolutionary history.

...read moreread less

Abstract: The genome of the yeast Saccharomyces cerevisiae has been completely sequenced through a worldwide collaboration. The sequence of 12,068 kilobases defines 5885 potential protein-encoding genes, approximately 140 genes specifying ribosomal RNA, 40 genes for small nuclear RNA molecules, and 275 transfer RNA genes. In addition, the complete sequence provides information about the higher order organization of yeast's 16 chromosomes and allows some insight into their evolutionary history. The genome shows a considerable amount of apparent genetic redundancy, and one of the major problems to be tackled during the next stage of the yeast genome project is to elucidate the biological functions of all of these genes.

...read moreread less

4,254 citations

"Gene Ontology: tool for the unifica..." refers background in this paper

...Functional conservation requires a common language for annotation Nowhere is the impact of the grand biological unification more evident than in the eukaryotes, where the genomic sequences of three model systems are already available (budding yeast, Saccharomyces cerevisiae , completed in 1996 (ref...
[...]