Transcriptional regulatory code of a eukaryotic genome

doi:10.1038/NATURE02800

Journal Article•DOI•

Christopher T. Harbison¹, D. Benjamin Gordon¹, Tong Ihn Lee¹, Nicola J. Rinaldi¹, Kenzie D. MacIsaac¹, Timothy Danford¹, Nancy M. Hannett¹, Jean-Bosco Tagne¹, David B. Reynolds¹, Jane Yoo¹, Ezra G. Jennings¹, Julia Zeitlinger¹, Dmitry K. Pokholok¹, Manolis Kellis¹,², P. Alex Rolfe¹, Ken T. Takusagawa¹, Eric S. Lander¹,², David K. Gifford¹,², Ernest Fraenkel¹, Richard A. Young¹,²

Massachusetts Institute of Technology¹, Broad Institute²

02 Sep 2004-Nature (Nature Publishing Group)-Vol. 431, Iss: 7004, pp 99-104

TL;DR: An initial map of yeast's transcriptional regulatory code is constructed by identifying the sequence elements that are bound by regulators under various conditions and that are conserved among Saccharomyces species.

Abstract: DNA-binding transcriptional regulators interpret the genome's regulatory code by binding to specific sequences to induce or repress gene expression. Comparative genomics has recently been used to identify potential cis-regulatory sequences within the yeast genome on the basis of phylogenetic conservation, but this information alone does not reveal if or when transcriptional regulators occupy these binding sites. We have constructed an initial map of yeast's transcriptional regulatory code by identifying the sequence elements that are bound by regulators under various conditions and that are conserved among Saccharomyces species. The organization of regulatory elements in promoters and the environment-dependent use of these elements by regulators are discussed. We find that environment-specific use of regulatory elements predicts mechanistic models for the function of a large population of yeast's transcriptional regulators.

Citations

Journal Article•DOI•

Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles.

Jeremiah J. Faith¹, Boris Hayete¹, Joshua T. Thaden¹, Ilaria Mogno¹,², Jamey Wierzbowski¹, Guillaume Cottarel¹, Simon Kasif¹, James J. Collins¹, Timothy S. Gardner¹

Boston University¹, Sapienza University of Rome²

09 Jan 2007-PLOS Biology

TL;DR: The compendium of expression data compiled in this study, coupled with RegulonDB, provides a valuable model system for further improvement of network inference algorithms using experimental data.

Abstract: Machine learning approaches offer the potential to systematically identify transcriptional regulatory interactions from a compendium of microarray expression profiles. However, experimental validation of the performance of these methods at the genome scale has remained elusive. Here we assess the global performance of four existing classes of inference algorithms using 445 Escherichia coli Affymetrix arrays and 3,216 known E. coli regulatory interactions from RegulonDB. We also developed and applied the context likelihood of relatedness (CLR) algorithm, a novel extension of the relevance networks class of algorithms. CLR demonstrates an average precision gain of 36% relative to the next-best performing algorithm. At a 60% true positive rate, CLR identifies 1,079 regulatory interactions, of which 338 were in the previously known network and 741 were novel predictions. We tested the predicted interactions for three transcription factors with chromatin immunoprecipitation, confirming 21 novel interactions and verifying our RegulonDB-based performance estimates. CLR also identified a regulatory link providing central metabolic control of iron transport, which we confirmed with real-time quantitative PCR. The compendium of expression data compiled in this study, coupled with RegulonDB, provides a valuable model system for further improvement of network inference algorithms using experimental data.
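The abstract names the CLR algorithm but does not give its scoring rule. As a rough sketch only, the Python below follows the commonly described form of CLR: each pairwise mutual-information (MI) value is converted to a z-score against the background MI distribution of each of the two genes, negative z-scores are clipped to zero, and the two are combined as a Euclidean norm. The function name `clr_scores`, the use of a population standard deviation, and the toy matrix are illustrative assumptions, not taken from the paper.

```python
import math
from statistics import mean, pstdev


def clr_scores(mi):
    """Background-correct a symmetric matrix of pairwise mutual information.

    mi[i][j] is the MI between genes i and j (diagonal entries are ignored).
    Returns a symmetric matrix of CLR-style scores.
    """
    n = len(mi)
    # Per-gene background: mean and spread of that gene's MI with all others.
    mu, sigma = [], []
    for i in range(n):
        background = [mi[i][j] for j in range(n) if j != i]
        mu.append(mean(background))
        sigma.append(pstdev(background))

    def z(i, j):
        # Clipped z-score of MI(i, j) against gene i's background.
        if sigma[i] == 0:
            return 0.0
        return max(0.0, (mi[i][j] - mu[i]) / sigma[i])

    scores = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Combine the two one-sided z-scores as a Euclidean norm.
            scores[i][j] = scores[j][i] = math.hypot(z(i, j), z(j, i))
    return scores


# Toy MI matrix (made-up values): genes 0-1 and 2-3 stand out from background.
toy_mi = [
    [0.0, 0.9, 0.1, 0.1],
    [0.9, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.0, 0.2],
    [0.1, 0.1, 0.2, 0.0],
]
toy_scores = clr_scores(toy_mi)
```

In this toy case the pair (2, 3) scores as highly as (0, 1) even though its raw MI is much lower, because the score is relative to each gene's own background rather than to a global threshold.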

1,587 citations

Journal Article•DOI•

Single-cell proteomic analysis of S. cerevisiae reveals the architecture of biological noise

John R. S. Newman¹, Sina Ghaemmaghami¹,², Jan Ihmels¹, David K. Breslow¹, Matthew Noble¹, Joseph L. DeRisi¹,², Jonathan S. Weissman¹

Howard Hughes Medical Institute¹, University of California, San Francisco²

15 Jun 2006-Nature

TL;DR: A strategy that pairs high-throughput flow cytometry and a library of GFP-tagged yeast strains to monitor rapidly and precisely protein levels at single-cell resolution is presented, revealing a remarkable structure to biological noise.

Abstract: A major goal of biology is to provide a quantitative description of cellular behaviour. This task, however, has been hampered by the difficulty in measuring protein abundances and their variation. Here we present a strategy that pairs high-throughput flow cytometry and a library of GFP-tagged yeast strains to monitor rapidly and precisely protein levels at single-cell resolution. Bulk protein abundance measurements of >2,500 proteins in rich and minimal media provide a detailed view of the cellular response to these conditions, and capture many changes not observed by DNA microarray analyses. Our single-cell data argue that noise in protein expression is dominated by the stochastic production/destruction of messenger RNAs. Beyond this global trend, there are dramatic protein-specific differences in noise that are strongly correlated with a protein's mode of transcription and its function. For example, proteins that respond to environmental changes are noisy whereas those involved in protein synthesis are quiet. Thus, these studies reveal a remarkable structure to biological noise and suggest that protein noise levels have been selected to reflect the costs and potential benefits of this variation.

1,550 citations

Journal Article•DOI•

Genome-wide map of nucleosome acetylation and methylation in yeast.

Dmitry K. Pokholok¹, Christopher T. Harbison¹, Stuart S. Levine¹, Megan F. Cole¹, Nancy M. Hannett¹, Tong Ihn Lee¹, George W. Bell¹, Kimberly Walker¹, P. Alex Rolfe¹, Elizabeth Herbolsheimer¹, Julia Zeitlinger¹, Fran Lewitter¹, David K. Gifford¹, Richard A. Young¹

Massachusetts Institute of Technology¹

26 Aug 2005-Cell

TL;DR: These maps take into account changes in nucleosome occupancy at actively transcribed genes and, in doing so, revise previous assessments of the modifications associated with gene expression, providing the foundation for further understanding the roles of chromatin in gene expression and genome maintenance.

1,483 citations

Cites background or methods from "Transcriptional regulatory code of ..."

...001) (Harbison et al., 2004), the presence of a perfect or near perfect Gcn4 consensus binding site (TGASTCA) in the region of −400 bp to +50 bp, and a greater than 2-fold change in steady-state mRNA levels dependent on Gcn4 when shifted to amino acid starvation medium (Natarajan et al....
[...]
...Conserved binding sites for transcriptional regulators (Harbison et al., 2004) are depicted as colored boxes....
[...]
...Global Map of Histone Marks We recently mapped the locations of conserved transcription-factor binding sites throughout the yeast genome (Harbison et al., 2004)....
[...]
...These results demonstrate that the new array and protocol modifications provide substantially higher resolution and accuracy than our previous method using self-printed arrays (Harbison et al., 2004; Lee et al., 2002)....
[...]
...A positive list of 84 genes (Table S1) was selected on the basis of previous high-confidence binding data (p ≤ 0.001) (Harbison et al., 2004), the presence of a perfect or near perfect Gcn4 consensus binding site (TGASTCA) in the region of −400 bp to +50 bp, and a greater than 2-fold change in…...
[...]
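The citation contexts above select genes partly by the presence of a "perfect or near perfect" Gcn4 consensus site (TGASTCA). A minimal sketch of such a scan is shown below, under two stated assumptions: S is read as the IUPAC code for C or G, and "near perfect" is taken to mean at most one mismatch (the excerpt does not define it). The function name `find_sites` and the example sequence are hypothetical.

```python
# IUPAC codes needed for this motif; S stands for "strong" (C or G).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "S": "CG"}


def find_sites(seq, motif="TGASTCA", max_mismatches=1):
    """Return (start, matched_window, mismatches) for each motif hit in seq.

    Only one strand is scanned; TGASTCA equals its own reverse complement,
    so for this particular motif the other strand yields the same positions.
    """
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - len(motif) + 1):
        window = seq[start:start + len(motif)]
        # Count positions where the base is not allowed by the motif code.
        mismatches = sum(base not in IUPAC[code]
                         for base, code in zip(window, motif))
        if mismatches <= max_mismatches:
            hits.append((start, window, mismatches))
    return hits


# Made-up sequence with one perfect site and one single-mismatch site.
example = "ACGTTGACTCAGGGTGAGTAACC"
perfect_only = find_sites(example, max_mismatches=0)
with_near = find_sites(example, max_mismatches=1)
```

Restricting the scan to the −400 bp to +50 bp window would only require slicing the sequence before calling the function; the coordinates depend on how the promoter region is indexed, which the excerpt does not specify.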

Journal Article•DOI•

Wisdom of crowds for robust gene network inference

Daniel Marbach¹, James C. Costello², Robert Küffner³, Nicole M. Vega², Robert J. Prill⁴, Diogo M. Camacho⁵, Kyle R. Allison², Manolis Kellis⁶, James J. Collins⁷, Gustavo Stolovitzky⁴

Massachusetts Institute of Technology¹, Boston University², Ludwig Maximilian University of Munich³, IBM⁴, Pfizer⁵, Broad Institute⁶, Harvard University⁷

01 Aug 2012-Nature Methods

TL;DR: In this paper, the authors performed a comprehensive blind assessment of over 30 network inference methods on Escherichia coli, Staphylococcus aureus, Saccharomyces cerevisiae and in silico microarray data.

Abstract: Reconstructing gene regulatory networks from high-throughput data is a long-standing challenge. Through the Dialogue on Reverse Engineering Assessment and Methods (DREAM) project, we performed a comprehensive blind assessment of over 30 network inference methods on Escherichia coli, Staphylococcus aureus, Saccharomyces cerevisiae and in silico microarray data. We characterize the performance, data requirements and inherent biases of different inference approaches, and we provide guidelines for algorithm application and development. We observed that no single inference method performs optimally across all data sets. In contrast, integration of predictions from multiple inference methods shows robust and high performance across diverse data sets. We thereby constructed high-confidence networks for E. coli and S. aureus, each comprising ~1,700 transcriptional interactions at a precision of ~50%. We experimentally tested 53 previously unobserved regulatory interactions in E. coli, of which 23 (43%) were supported. Our results establish community-based methods as a powerful and robust tool for the inference of transcriptional gene regulatory networks.
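The abstract reports that integrating predictions from multiple inference methods is robust but does not state the integration rule. One simple scheme, assumed here purely for illustration, is rank averaging: each method ranks its predicted edges by confidence, and edges are re-ordered by their mean rank across methods. The function name `average_rank`, the treatment of unlisted edges (assigned a rank one past the end of that method's list), and the toy scores are all hypothetical.

```python
def average_rank(predictions):
    """Combine edge rankings from several methods by mean rank.

    predictions: list of dicts, one per method, mapping an edge (a tuple of
    regulator and target) to a confidence score; higher means more confident.
    Returns a list of (edge, mean_rank) sorted from best to worst.
    """
    edges = set()
    for scores in predictions:
        edges.update(scores)

    totals = {edge: 0.0 for edge in edges}
    for scores in predictions:
        ranked = sorted(scores, key=scores.get, reverse=True)
        rank = {edge: r for r, edge in enumerate(ranked, start=1)}
        worst = len(ranked) + 1  # rank given to edges this method omitted
        for edge in edges:
            totals[edge] += rank.get(edge, worst)

    n = len(predictions)
    # Sort by mean rank, then by edge name so ties are broken deterministically.
    return sorted(((edge, totals[edge] / n) for edge in edges),
                  key=lambda item: (item[1], item[0]))


# Three toy "methods" scoring edges among genes A, B and C (made-up values).
method_1 = {("A", "B"): 0.9, ("A", "C"): 0.5, ("B", "C"): 0.1}
method_2 = {("A", "B"): 0.7, ("B", "C"): 0.6, ("A", "C"): 0.2}
method_3 = {("A", "C"): 0.8, ("A", "B"): 0.6}
community = average_rank([method_1, method_2, method_3])
```

Working on ranks rather than raw scores avoids having to put the methods' confidence values on a common scale, which is the main reason this kind of scheme is convenient when the methods are heterogeneous.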

1,424 citations

Journal Article•

The Genomic Code for Nucleosome Positioning

Jonathan Widom

10 Mar 2008-Bulletin of the American Physical Society

TL;DR: In this article, a nucleosome-DNA interaction model was proposed to predict the genome-wide organization of nucleosomes, and it was shown that genomes encode an intrinsic nucleosomal organization and that this intrinsic organization can explain ∼50% of the in-vivo positions.

Abstract: Eukaryotic genomes are packaged into nucleosome particles that occlude the DNA from interacting with most DNA binding proteins. Nucleosomes have higher affinity for particular DNA sequences, reflecting the ability of the sequence to bend sharply, as required by the nucleosome structure. However, it is not known whether these sequence preferences have a significant influence on nucleosome position in vivo, and thus regulate the access of other proteins to DNA. Here we isolated nucleosome-bound sequences at high resolution from yeast and used these sequences in a new computational approach to construct and validate experimentally a nucleosome–DNA interaction model, and to predict the genome-wide organization of nucleosomes. Our results demonstrate that genomes encode an intrinsic nucleosome organization and that this intrinsic organization can explain ∼50% of the in vivo nucleosome positions. This nucleosome positioning code may facilitate specific chromosome functions including transcription factor binding, transcription initiation, and even remodelling of the nucleosomes themselves.

1,399 citations

References

Journal Article•DOI•

Basic Local Alignment Search Tool

Stephen F. Altschul¹, Warren Gish¹, Webb Miller², Eugene W. Myers³, David J. Lipman¹

National Institutes of Health¹, Pennsylvania State University², University of Arizona³

01 Oct 1990-Journal of Molecular Biology

TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.
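To make the maximal segment pair (MSP) concrete: it is the highest-scoring pair of equal-length, ungapped segments drawn from the two sequences. The sketch below computes it exactly by brute force over every alignment diagonal. This is not BLAST's heuristic, which approximates the MSP much faster. The function name `msp_score` and the +1/−1 match/mismatch scores are illustrative assumptions.

```python
def msp_score(a, b, match=1, mismatch=-1):
    """Exact maximal segment pair between sequences a and b (no gaps).

    Returns (best_score, (start_in_a, start_in_b, length)).
    Each diagonal is scanned with a running sum that restarts whenever it
    drops to zero or below (a maximum-subarray scan).
    """
    best, best_segment = 0, (0, 0, 0)
    # A diagonal is fixed by offset = i - j between positions in a and b.
    for offset in range(-(len(b) - 1), len(a)):
        i, j = max(offset, 0), max(-offset, 0)
        run, run_start = 0, (i, j)
        while i < len(a) and j < len(b):
            if run <= 0:
                # A non-positive prefix can never help; start a new segment.
                run, run_start = 0, (i, j)
            run += match if a[i] == b[j] else mismatch
            if run > best:
                best = run
                best_segment = (run_start[0], run_start[1],
                                i - run_start[0] + 1)
            i += 1
            j += 1
    return best, best_segment


# Made-up sequences sharing the ungapped segment "ATTA".
result = msp_score("GATTACA", "CATTAGA")
```

The exhaustive scan takes time proportional to the product of the two sequence lengths, which is the cost a fast approximation is meant to avoid on database-scale searches.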

88,255 citations

Journal Article•DOI•

CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice

Julie D. Thompson, Desmond G. Higgins, Toby J. Gibson

11 Nov 1994-Nucleic Acids Research

TL;DR: The sensitivity of the commonly used progressive multiple sequence alignment method has been greatly improved and modifications are incorporated into a new program, CLUSTAL W, which is freely available.

Abstract: The sensitivity of the commonly used progressive multiple sequence alignment method has been greatly improved for the alignment of divergent protein sequences. Firstly, individual weights are assigned to each sequence in a partial alignment in order to down-weight near-duplicate sequences and up-weight the most divergent ones. Secondly, amino acid substitution matrices are varied at different alignment stages according to the divergence of the sequences to be aligned. Thirdly, residue-specific gap penalties and locally reduced gap penalties in hydrophilic regions encourage new gaps in potential loop regions rather than regular secondary structure. Fourthly, positions in early alignments where gaps have been opened receive locally reduced gap penalties to encourage the opening up of new gaps at these positions. These modifications are incorporated into a new program, CLUSTAL W which is freely available.

63,427 citations

Book•

The Elements of Statistical Learning: Data Mining, Inference, and Prediction

Trevor Hastie¹, Robert Tibshirani, Jerome H. Friedman

University of New South Wales¹

28 Jul 2013

TL;DR: In this paper, the authors describe the important ideas in these areas in a common conceptual framework, and the emphasis is on concepts rather than mathematics, with a liberal use of color graphics.

Abstract: During the past decade there has been an explosion in computation and information technology. With it have come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. The challenge of understanding these data has led to the development of new tools in the field of statistics, and spawned new areas such as data mining, machine learning, and bioinformatics. Many of these tools have common underpinnings but are often expressed with different terminology. This book describes the important ideas in these areas in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of color graphics. It is a valuable resource for statisticians and anyone interested in data mining in science or industry. The book's coverage is broad, from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees and boosting, the first comprehensive treatment of this topic in any book. This major new edition features many topics not covered in the original, including graphical models, random forests, ensemble methods, least angle regression and path algorithms for the lasso, non-negative matrix factorization, and spectral clustering. There is also a chapter on methods for "wide" data (p bigger than n), including multiple testing and false discovery rates. Trevor Hastie, Robert Tibshirani, and Jerome Friedman are professors of statistics at Stanford University. They are prominent researchers in this area: Hastie and Tibshirani developed generalized additive models and wrote a popular book of that title. Hastie co-developed much of the statistical modeling software and environment in R/S-PLUS and invented principal curves and surfaces. Tibshirani proposed the lasso and is co-author of the very successful An Introduction to the Bootstrap. Friedman is the co-inventor of many data-mining tools including CART, MARS, projection pursuit and gradient boosting.

19,261 citations

Journal Article•DOI•

The Elements of Statistical Learning: Data Mining, Inference, and Prediction

David Ruppert

01 Jun 2004-Journal of the American Statistical Association

TL;DR: A review of the book The Elements of Statistical Learning: Data Mining, Inference, and Prediction, published in the Journal of the American Statistical Association.

Abstract: (2004). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Journal of the American Statistical Association: Vol. 99, No. 466, pp. 567-567.

10,549 citations

Journal Article•DOI•

Initial sequencing and comparative analysis of the mouse genome.

Robert H. Waterston¹, Kerstin Lindblad-Toh², Ewan Birney, Jane Rogers³, and 219 more authors (26 institutions)

05 Dec 2002-Nature

TL;DR: The results of an international collaboration to produce a high-quality draft sequence of the mouse genome are reported, and an initial comparative analysis of the mouse and human genomes is presented, describing some of the insights that can be gleaned from the two sequences.

Abstract: The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.

6,643 citations