Gene Ontology: tool for the unification of biology

doi:10.1038/75556

Journal Article•DOI•

M Ashburner¹, Catherine A. Ball, Judith A. Blake, David Botstein, Heather Butler, J. M. Cherry, Allan Peter Davis, Kara Dolinski, Selina S. Dwight, J.T. Eppig, Midori A. Harris, David P. Hill, Laurie Issel-Tarver, Andrew Kasarskis, Suzanna E. Lewis, John C. Matese, Joel E. Richardson, M. Ringwald, Gerald M. Rubin, Gavin Sherlock

Stanford University¹

01 May 2000-Nature Genetics (NIH Public Access)-Vol. 25, Iss: 1, pp 25-29

TL;DR: The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing.

Abstract: Genomic sequencing has made it clear that a large fraction of the genes specifying the core biological functions are shared by all eukaryotes. Knowledge of the biological role of such shared proteins in one organism can often be transferred to other organisms. The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing. To this end, three independent ontologies accessible on the World-Wide Web (http://www.geneontology.org) are being constructed: biological process, molecular function and cellular component.

Citations

Journal Article•DOI•

Cytoscape: A Software Environment for Integrated Models of Biomolecular Interaction Networks

Paul Shannon¹, Andrew Markiel, Owen Ozier, Nitin S. Baliga, Jonathan T. Wang, Daniel Ramage, Nada Amin, Benno Schwikowski, Trey Ideker

Institute for Systems Biology¹

01 Nov 2003-Genome Research

TL;DR: Several case studies of Cytoscape plug-ins are surveyed, including a search for interaction pathways correlating with changes in gene expression, a study of protein complexes involved in cellular recovery to DNA damage, inference of a combined physical/functional interaction network for Halobacterium, and an interface to detailed stochastic/kinetic gene regulatory models.

Abstract: Cytoscape is an open source software project for integrating biomolecular interaction networks with high-throughput expression data and other molecular states into a unified conceptual framework. Although applicable to any system of molecular components and interactions, Cytoscape is most powerful when used in conjunction with large databases of protein-protein, protein-DNA, and genetic interactions that are increasingly available for humans and model organisms. Cytoscape's software Core provides basic functionality to layout and query the network; to visually integrate the network with expression profiles, phenotypes, and other molecular states; and to link the network to databases of functional annotations. The Core is extensible through a straightforward plug-in architecture, allowing rapid development of additional computational analyses and features. Several case studies of Cytoscape plug-ins are surveyed, including a search for interaction pathways correlating with changes in gene expression, a study of protein complexes involved in cellular recovery to DNA damage, inference of a combined physical/functional interaction network for Halobacterium, and an interface to detailed stochastic/kinetic gene regulatory models.

32,980 citations

Cites background from "Gene Ontology: tool for the unifica..."

...Annotations typically correspond to an existing repository of knowledge, such as the Gene Ontology database (2)....

Journal Article•DOI•

limma powers differential expression analyses for RNA-sequencing and microarray studies

Matthew E. Ritchie¹, Belinda Phipson², Di Wu³, Yifang Hu¹, Charity W. Law⁴, Wei Shi¹, Gordon K. Smyth¹,⁵

Walter and Eliza Hall Institute of Medical Research¹, Royal Children's Hospital², Harvard University³, University of Zurich⁴, University of Melbourne⁵

20 Apr 2015-Nucleic Acids Research

TL;DR: The philosophy and design of the limma package is reviewed, summarizing both new and historical features, with an emphasis on recent enhancements and features that have not been previously described.

Abstract: limma is an R/Bioconductor software package that provides an integrated solution for analysing data from gene expression experiments. It contains rich features for handling complex experimental designs and for information borrowing to overcome the problem of small sample sizes. Over the past decade, limma has been a popular choice for gene discovery through differential expression analyses of microarray and high-throughput PCR data. The package contains particularly strong facilities for reading, normalizing and exploring such data. Recently, the capabilities of limma have been significantly expanded in two important directions. First, the package can now perform both differential expression and differential splicing analyses of RNA sequencing (RNA-seq) data. All the downstream analysis tools previously restricted to microarray data are now available for RNA-seq as well. These capabilities allow users to analyse both RNA-seq and microarray data with very similar pipelines. Second, the package is now able to go past the traditional gene-wise expression analyses in a variety of ways, analysing expression profiles in terms of co-regulated sets of genes or in terms of higher-order expression signatures. This provides enhanced possibilities for biological interpretation of gene expression differences. This article reviews the philosophy and design of the limma package, summarizing both new and historical features, with an emphasis on recent enhancements and features that have not been previously described.

22,147 citations

Journal Article•DOI•

clusterProfiler: an R Package for Comparing Biological Themes Among Gene Clusters

Guangchuang Yu¹, Li Gen Wang, Yanyan Han, Qing-Yu He

Jinan University¹

03 May 2012-Omics A Journal of Integrative Biology

TL;DR: clusterProfiler, an R package that automates the process of biological-term classification and the enrichment analysis of gene clusters and can be easily extended to other species and ontologies, is presented.

Abstract: Increasing quantitative data generated from transcriptomics and proteomics require integrative strategies for analysis. Here, we present an R package, clusterProfiler, that automates the process of biological-term classification and the enrichment analysis of gene clusters. The analysis module and visualization module were combined into a reusable workflow. Currently, clusterProfiler supports three species, including humans, mice, and yeast. Methods provided in this package can be easily extended to other species and ontologies. The clusterProfiler package is released under Artistic-2.0 License within Bioconductor project. The source code and vignette are freely available at http://bioconductor.org/packages/release/bioc/html/clusterProfiler.html

16,644 citations

Cites background from "Gene Ontology: tool for the unifica..."

...For instance, Gene Ontology (GO) (Ashburner et al., 2000) annotates genes to biological processes, molecular functions, and cellular components in a directed acyclic graph structure, Kyoto Encyclopedia of Genes and Genomes (KEGG) (Kanehisa et al., 2010) annotates genes to pathways, and Disease…...

Journal Article•DOI•

WGCNA: an R package for weighted correlation network analysis.

Peter Langfelder¹, Steve Horvath¹

University of California, Los Angeles¹

29 Dec 2008-BMC Bioinformatics

TL;DR: The WGCNA R software package is a comprehensive collection of R functions for performing various aspects of weighted correlation network analysis that includes functions for network construction, module detection, gene selection, calculations of topological properties, data simulation, visualization, and interfacing with external software.

Abstract: Correlation networks are increasingly being used in bioinformatics applications. For example, weighted gene co-expression network analysis is a systems biology method for describing the correlation patterns among genes across microarray samples. Weighted correlation network analysis (WGCNA) can be used for finding clusters (modules) of highly correlated genes, for summarizing such clusters using the module eigengene or an intramodular hub gene, for relating modules to one another and to external sample traits (using eigengene network methodology), and for calculating module membership measures. Correlation networks facilitate network based gene screening methods that can be used to identify candidate biomarkers or therapeutic targets. These methods have been successfully applied in various biological contexts, e.g. cancer, mouse genetics, yeast genetics, and analysis of brain imaging data. While parts of the correlation network methodology have been described in separate publications, there is a need to provide a user-friendly, comprehensive, and consistent software implementation and an accompanying tutorial. The WGCNA R software package is a comprehensive collection of R functions for performing various aspects of weighted correlation network analysis. The package includes functions for network construction, module detection, gene selection, calculations of topological properties, data simulation, visualization, and interfacing with external software. Along with the R package we also present R software tutorials. While the methods development was motivated by gene expression data, the underlying data mining approach can be applied to a variety of different settings. The WGCNA package provides R functions for weighted correlation network analysis, e.g. co-expression network analysis of gene expression data. The R package along with its source code and additional material are freely available at http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/Rpackages/WGCNA

14,243 citations

Journal Article•DOI•

Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists

Da-Wei Huang¹, Brad T. Sherman¹, Richard A. Lempicki¹

Science Applications International Corporation¹

01 Jan 2009-Nucleic Acids Research

TL;DR: The survey will help tool designers/developers and experienced end users understand the underlying algorithms and pertinent details of particular tool categories/tools, enabling them to make the best choices for their particular research interests.

Abstract: Functional analysis of large gene lists, derived in most cases from emerging high-throughput genomic, proteomic and bioinformatics scanning approaches, is still a challenging and daunting task. The gene-annotation enrichment analysis is a promising high-throughput strategy that increases the likelihood for investigators to identify biological processes most pertinent to their study. Approximately 68 bioinformatics enrichment tools that are currently available in the community are collected in this survey. Tools are uniquely categorized into three major classes, according to their underlying enrichment algorithms. The comprehensive collections, unique tool classifications and associated questions/issues will provide a more comprehensive and up-to-date view regarding the advantages, pitfalls and recent trends in a simpler tool-class level rather than by a tool-by-tool approach. Thus, the survey will help tool designers/developers and experienced end users understand the underlying algorithms and pertinent details of particular tool categories/tools, enabling them to make the best choices for their particular research interests.

13,102 citations

References

Journal Article•DOI•

The COG database: a tool for genome-scale analysis of protein functions and evolution

Roman L. Tatusov¹, Michael Y. Galperin¹, Darren A. Natale¹, Eugene V. Koonin¹

National Institutes of Health¹

01 Jan 2000-Nucleic Acids Research

TL;DR: The database of Clusters of Orthologous Groups of proteins (COGs) is an attempt on a phylogenetic classification of the proteins encoded in 21 complete genomes of bacteria, archaea and eukaryotes.

Abstract: Rational classification of proteins encoded in sequenced genomes is critical for making the genome sequences maximally useful for functional and evolutionary studies. The database of Clusters of Orthologous Groups of proteins (COGs) is an attempt on a phylogenetic classification of the proteins encoded in 21 complete genomes of bacteria, archaea and eukaryotes (http://www.ncbi.nlm.nih.gov/COG). The COGs were constructed by applying the criterion of consistency of genome-specific best hits to the results of an exhaustive comparison of all protein sequences from these genomes. The database comprises 2091 COGs that include 56-83% of the gene products from each of the complete bacterial and archaeal genomes and approximately 35% of those from the yeast Saccharomyces cerevisiae genome. The COG database is accompanied by the COGNITOR program that is used to fit new proteins into the COGs and can be applied to functional and phylogenetic annotation of newly sequenced genomes.

3,656 citations

Journal Article•DOI•

The SWISS-PROT protein sequence data bank and its supplement TrEMBL in 1999.

Amos Marc Bairoch¹, Rolf Apweiler²

University of Geneva¹, European Bioinformatics Institute²

01 Jan 1998-Nucleic Acids Research

TL;DR: The Human Proteomics Initiative (HPI), a major project to annotate all known human sequences according to the quality standards of SWISS-PROT, is described.

Abstract: SWISS-PROT is a curated protein sequence database which strives to provide a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy and high level of integration with other databases. Recent developments of the database include: cross-references to additional databases; a variety of new documentation files and improvements to TrEMBL, a computer annotated supplement to SWISS-PROT. TrEMBL consists of entries in SWISS-PROT-like format derived from the translation of all coding sequences (CDS) in the EMBL nucleotide sequence database, except the CDS already included in SWISS-PROT. The URLs for SWISS-PROT on the WWW are: http://www.expasy.ch/sprot and http://www.ebi.ac.uk/sprot

3,244 citations

Journal Article•

Genome sequence of the nematode C. elegans: A platform for investigating biology

Andrew R. Smith

11 Dec 1998-Science

3,185 citations

Journal Article•DOI•

The SWISS-PROT protein sequence data bank and its supplement TrEMBL

Amos Marc Bairoch¹, Rolf Apweiler²

University of Geneva¹, European Bioinformatics Institute²

01 Jan 1997-Nucleic Acids Research

TL;DR: This supplement consists of entries in SWISS-PROT-like format derived from the translation of all coding sequences in the EMBL nucleotide sequence database, except the CDS already included in SWISS-PROT.

Abstract: SWISS-PROT is a curated protein sequence database which strives to provide a high level of annotations (such as the description of the function of a protein, structure of its domains, post-translational modifications, variants, etc.), a minimal level of redundancy and high level of integration with other databases. Recent developments of the database include: an increase in the number and scope of model organisms; cross-references to two additional databases; a variety of new documentation files and the creation of TrEMBL, a computer annotated supplement to SWISS-PROT. This supplement consists of entries in SWISS-PROT-like format derived from the translation of all coding sequences (CDS) in the EMBL nucleotide sequence database, except the CDS already included in SWISS-PROT.

1,828 citations

Journal Article•DOI•

Comparative Genomics of the Eukaryotes

Gerald M. Rubin¹, Mark Yandell², Jennifer R. Wortman², George L. Gabor Miklos, Catherine R. Nelson³, Iswar K. Hariharan⁴, Mark E. Fortini⁵, Peter W. Li², Rolf Apweiler⁶, Wolfgang Fleischmann⁶, J. Michael Cherry⁷, Steven Henikoff⁸, Marian P. Skupski², Sima Misra³, Michael Ashburner⁶, Ewan Birney⁶, Mark S. Boguski⁹, Thomas Brody⁹, Peter Brokstein³, Susan E. Celniker¹⁰, Stephen A. Chervitz, David Coates¹¹, Anibal Cravchik², Andrei Gabrielian², Richard F. Galle¹⁰, William M. Gelbart⁴, Reed A. George¹⁰, Lawrence S.B. Goldstein¹², Fangcheng Gong², Ping Guan², Nomi L. Harris¹⁰, Bruce A. Hay¹³, Roger A. Hoskins¹⁰, Jiayin Li², Zhenya Li², Richard O. Hynes¹⁴, Steven J.M. Jones¹⁵, Peter M. Kuehl¹⁶, Bruno Lemaitre¹⁷, J. Troy Littleton¹⁴, Deborah K. Morrison⁹, Christopher J. Mungall¹⁰, Patrick H. O'Farrell¹⁸, Oxana K. Pickeral⁹, Chris Shue², Leslie B. Vosshall¹⁹, Jiong Zhang⁹, Qi Zhao², Xiangqun H. Zheng², Fei Zhong², Wenyan Zhong², Richard A. Gibbs²⁰, J. Craig Venter², Mark Raymond Adams², Suzanna E. Lewis³

Howard Hughes Medical Institute¹, Celera Corporation², University of California, Berkeley³, Harvard University⁴, University of Pennsylvania⁵, Wellcome Trust⁶, Stanford University⁷, Fred Hutchinson Cancer Research Center⁸, National Institutes of Health⁹, Lawrence Berkeley National Laboratory¹⁰, University of Leeds¹¹, University of California, San Diego¹², California Institute of Technology¹³, Massachusetts Institute of Technology¹⁴, BC Cancer Research Centre¹⁵, University of Maryland, Baltimore¹⁶, Centre national de la recherche scientifique¹⁷, University of California, San Francisco¹⁸, Columbia University¹⁹, Baylor College of Medicine²⁰

24 Mar 2000-Science

TL;DR: The fly has orthologs to 177 of the 289 human disease genes examined and provides the foundation for rapid analysis of some of the basic processes involved in human disease.

Abstract: A comparative analysis of the genomes of Drosophila melanogaster, Caenorhabditis elegans, and Saccharomyces cerevisiae (and the proteins they are predicted to encode) was undertaken in the context of cellular, developmental, and evolutionary processes. The nonredundant protein sets of flies and worms are similar in size and are only twice that of yeast, but different gene families are expanded in each genome, and the multidomain proteins and signaling pathways of the fly and worm are far more complex than those of yeast. The fly has orthologs to 177 of the 289 human disease genes examined and provides the foundation for rapid analysis of some of the basic processes involved in human disease.

1,563 citations