Model-based Analysis of ChIP-Seq (MACS)

doi:10.1186/GB-2008-9-9-R137

Journal Article

Yong Zhang¹, Tao Liu¹, Clifford A. Meyer¹, Jérôme Eeckhoute², David S. Johnson, Bradley E. Bernstein¹,³, Chad Nusbaum³, Richard M. Myers⁴, Myles Brown², Wei Li⁵, X. Shirley Liu¹

Harvard University¹, Brigham and Women's Hospital², Broad Institute³, Stanford University⁴, Baylor College of Medicine⁵

Genome Biology (BioMed Central), 17 Sep 2008, Vol. 9, Iss. 9, pp. 1-9

TL;DR: This work presents Model-based Analysis of ChIP-Seq data, MACS, which analyzes data generated by short read sequencers such as Solexa's Genome Analyzer, and uses a dynamic Poisson distribution to effectively capture local biases in the genome, allowing for more robust predictions.


Abstract: We present Model-based Analysis of ChIP-Seq data, MACS, which analyzes data generated by short read sequencers such as Solexa's Genome Analyzer. MACS empirically models the shift size of ChIP-Seq tags, and uses it to improve the spatial resolution of predicted binding sites. MACS also uses a dynamic Poisson distribution to effectively capture local biases in the genome, allowing for more robust predictions. MACS compares favorably to existing ChIP-Seq peak-finding algorithms, and is freely available.
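The shift-size idea in the abstract can be illustrated with a minimal sketch. It assumes tags have already been split by strand within a handful of high-confidence enriched regions; the fragment size d is taken as the gap between the most frequent plus-strand and minus-strand tag positions, and each tag is then moved d/2 toward the fragment centre. Using a per-region mode and the median across regions, and the names `strand_mode`, `estimate_d` and `shift_tags`, are simplifications chosen for illustration, not the MACS implementation.

```python
import statistics
from collections import Counter


def strand_mode(positions):
    """Most frequent tag position on one strand within one region."""
    return Counter(positions).most_common(1)[0][0]


def estimate_d(regions):
    """Estimate fragment size d from (plus_positions, minus_positions) pairs.

    Plus-strand tags pile up upstream of a binding site and minus-strand tags
    downstream, so the gap between the two strand modes approximates d.
    """
    gaps = [strand_mode(minus) - strand_mode(plus) for plus, minus in regions]
    return int(statistics.median(gaps))


def shift_tags(plus_positions, minus_positions, d):
    """Shift plus-strand tags d//2 downstream and minus-strand tags d//2 upstream."""
    half = d // 2
    return [p + half for p in plus_positions] + [m - half for m in minus_positions]


regions = [
    ([100, 102, 102, 105], [300, 302, 302, 310]),
    ([50, 50, 60], [240, 240, 241]),
    ([10, 10], [220, 220]),
]
d = estimate_d(regions)
print(d)                            # 200
print(shift_tags([100], [300], d))  # [200, 200]
```

After shifting, tags from both strands land on roughly the same coordinate, which is what sharpens the predicted binding position.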

Citations

Journal Article

featureCounts: an efficient general-purpose program for assigning sequence reads to genomic features

Yang Liao¹, Gordon K. Smyth¹, Wei Shi¹

Walter and Eliza Hall Institute of Medical Research¹

Bioinformatics, 01 Apr 2014

TL;DR: featureCounts is a read summarization program suitable for counting reads generated from either RNA or genomic DNA sequencing experiments; it implements highly efficient chromosome hashing and feature blocking techniques.


Abstract: MOTIVATION: Next-generation sequencing technologies generate millions of short sequence reads, which are usually aligned to a reference genome. In many applications, the key information required for downstream analysis is the number of reads mapping to each genomic feature, for example to each exon or each gene. The process of counting reads is called read summarization. Read summarization is required for a great variety of genomic analyses but has so far received relatively little attention in the literature. RESULTS: We present featureCounts, a read summarization program suitable for counting reads generated from either RNA or genomic DNA sequencing experiments. featureCounts implements highly efficient chromosome hashing and feature blocking techniques. It is considerably faster than existing methods (by an order of magnitude for gene-level summarization) and requires far less computer memory. It works with either single or paired-end reads and provides a wide range of options appropriate for different sequencing applications. AVAILABILITY AND IMPLEMENTATION: featureCounts is available under GNU General Public License as part of the Subread (http://subread.sourceforge.net) or Rsubread (http://www.bioconductor.org) software packages.
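The chromosome hashing and feature blocking named in this abstract can be illustrated with a simplified sketch: features are grouped by chromosome in a dictionary (a hash table) and then into fixed-size genomic blocks, so each read is compared only with features in the blocks it touches instead of with every feature. The block size, the tuple layouts, and the rule of skipping reads that overlap more than one gene are assumptions for this illustration; this is not the Subread/featureCounts code.

```python
from collections import defaultdict

BLOCK_SIZE = 10_000  # illustrative block width in bp (an assumption, not featureCounts' value)


def build_index(features):
    """Index (gene_id, chrom, start, end) features by chromosome, then by block."""
    index = defaultdict(lambda: defaultdict(list))
    for gene_id, chrom, start, end in features:
        for block in range(start // BLOCK_SIZE, end // BLOCK_SIZE + 1):
            index[chrom][block].append((gene_id, start, end))
    return index


def count_reads(index, reads):
    """Count (chrom, start, end) reads per gene; reads hitting several genes are skipped."""
    counts = defaultdict(int)
    for chrom, start, end in reads:
        blocks = index.get(chrom)
        if blocks is None:
            continue  # no features indexed on this chromosome
        hits = set()
        for block in range(start // BLOCK_SIZE, end // BLOCK_SIZE + 1):
            for gene_id, f_start, f_end in blocks.get(block, ()):
                if start <= f_end and f_start <= end:  # closed-interval overlap test
                    hits.add(gene_id)
        if len(hits) == 1:
            counts[hits.pop()] += 1
    return dict(counts)


features = [
    ("geneA", "chr1", 100, 200),  # two exons of the same gene
    ("geneA", "chr1", 300, 400),
    ("geneB", "chr1", 350, 500),  # overlaps geneA's second exon
    ("geneC", "chr2", 15_000, 25_000),
]
reads = [
    ("chr1", 150, 180),        # geneA
    ("chr1", 310, 320),        # geneA
    ("chr1", 360, 370),        # geneA and geneB: ambiguous, skipped
    ("chr1", 450, 460),        # geneB
    ("chr2", 19_990, 20_010),  # spans two blocks, still counted once for geneC
    ("chr3", 1, 10),           # no features on chr3
]
print(count_reads(build_index(features), reads))
# {'geneA': 2, 'geneB': 1, 'geneC': 1}
```

Exons sharing a gene ID are summarized into one count, which mirrors the gene-level summarization the abstract describes; collecting hits in a set keeps a read that spans two blocks from being counted twice.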

14,103 citations

Journal Article

An integrated encyclopedia of DNA elements in the human genome

Principal investigators¹, NHGRI groups², Data production leads³, Lead analysts³

Wellcome Trust¹, University of Washington², Pennsylvania State University³

Nature, 06 Sep 2012

TL;DR: The Encyclopedia of DNA Elements project provides new insights into the organization and regulation of human genes and the human genome, and is an expansive resource of functional annotations for biomedical research.


Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

13,548 citations

Journal Article

An integrated encyclopedia of DNA elements in the human genome.

ENCODE Consortium

Nature, 01 Jan 2012

8,106 citations

Journal Article

High-resolution profiling of histone methylations in the human genome.

Artem Barski¹, Suresh Cuddapah¹, Kairong Cui¹, Tae-Young Roh¹, Dustin E. Schones¹, Zhibin Wang¹, Gang Wei¹, Iouri Chepelev², Keji Zhao¹

National Institutes of Health¹, University of California, Los Angeles²

Cell, 18 May 2007

TL;DR: High-resolution maps of the genome-wide distribution of 20 histone lysine and arginine methylations, as well as the histone variant H2A.Z, RNA polymerase II and the insulator binding protein CTCF, are generated across the human genome using the Solexa 1G sequencing technology.

6,488 citations

Journal Article

Topological domains in mammalian genomes identified by analysis of chromatin interactions

Jesse R. Dixon¹, Siddarth Selvaraj¹,², Feng Yue¹, Audrey Kim¹, Yan-Yan Li¹, Yin-Zhong Shen¹, Ming Hu³, Jun Liu³, Bing Ren¹,²

Ludwig Institute for Cancer Research¹, University of California, San Diego², Harvard University³

Nature, 17 May 2012

TL;DR: It is found that the boundaries of topological domains are enriched for the insulator binding protein CTCF, housekeeping genes, transfer RNAs and short interspersed element (SINE) retrotransposons, indicating that these factors may have a role in establishing the topological domain structure of the genome.


Abstract: The spatial organization of the genome is intimately linked to its biological function, yet our understanding of higher order genomic structure is coarse, fragmented and incomplete. In the nucleus of eukaryotic cells, interphase chromosomes occupy distinct chromosome territories, and numerous models have been proposed for how chromosomes fold within chromosome territories. These models, however, provide only few mechanistic details about the relationship between higher order chromatin structure and genome function. Recent advances in genomic technologies have led to rapid advances in the study of three-dimensional genome organization. In particular, Hi-C has been introduced as a method for identifying higher order chromatin interactions genome wide. Here we investigate the three-dimensional organization of the human and mouse genomes in embryonic stem cells and terminally differentiated cell types at unprecedented resolution. We identify large, megabase-sized local chromatin interaction domains, which we term 'topological domains', as a pervasive structural feature of the genome organization. These domains correlate with regions of the genome that constrain the spread of heterochromatin. The domains are stable across different cell types and highly conserved across species, indicating that topological domains are an inherent property of mammalian genomes. Finally, we find that the boundaries of topological domains are enriched for the insulator binding protein CTCF, housekeeping genes, transfer RNAs and short interspersed element (SINE) retrotransposons, indicating that these factors may have a role in establishing the topological domain structure of the genome.


5,774 citations

References

Journal Article

High-resolution profiling of histone methylations in the human genome.

Artem Barski¹, Suresh Cuddapah¹, Kairong Cui¹, Tae-Young Roh¹, Dustin E. Schones¹, Zhibin Wang¹, Gang Wei¹, Iouri Chepelev², Keji Zhao¹

National Institutes of Health¹, University of California, Los Angeles²

Cell, 18 May 2007

6,488 citations

"Model-based Analysis of ChIP-Seq (MACS)" cites this paper for background or results:

...This implies that the λlocal is critical for ChIP-Seq studies when matching control samples are not available [5,9]....
[...]
...When applied to three human ChIP-Seq datasets to identify binding sites of FoxA1 in MCF7 cells, NRSF (neuron-restrictive silencer factor) in Jurkat T cells [8], and CTCF (CCCTC-binding factor) in CD4+ T cells [5] (summarized in Table S1 in Additional data file 1), MACS gives results superior to those produced by other published ChIP-Seq peak finding algorithms [8,11,12]....
[...]
...and sequencing (ChIP-Seq) [5-8] have become popular tech-...
[...]
...However, among the four recently published ChIP-Seq studies [5-8], one did not have a control sample [5] and only one of the three with control samples systematically used them to guide peak finding [8]....
[...]
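The first excerpt's point about λlocal can be made concrete with a minimal sketch. It assumes a genome-wide background rate λBG for a scan window of width w, plus control-tag counts in a few larger windows centred on a candidate peak; each control count is rescaled to the scan width and to ChIP sequencing depth, λlocal is taken as the largest of these rates (our reading of the MACS approach, treated here as an assumption), and the candidate is scored with a Poisson upper-tail p-value. The window lengths, example numbers and function names are illustrative assumptions, not MACS's code or defaults.

```python
import math


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), summed term by term so small p-values keep precision."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        raise ValueError("lam must be positive")
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))  # P(X = k)
    total, i = 0.0, k
    while True:
        total += term
        i += 1
        term *= lam / i  # P(X = i) from P(X = i - 1)
        # Past the mode the terms shrink geometrically, so stop once they are negligible.
        if i > lam and term <= total * 1e-17:
            return total


def lambda_local(chip_total, control_total, genome_size, scan_width, control_windows):
    """Largest of the genome-wide and control-derived rates for one scan window.

    control_windows maps a window length (bp) to the number of control tags in a
    window of that length centred on the candidate peak (hypothetical input layout).
    """
    depth_ratio = chip_total / control_total  # rescale control to ChIP depth
    lam_bg = chip_total * scan_width / genome_size
    local = [n * depth_ratio * scan_width / length for length, n in control_windows.items()]
    return max([lam_bg, *local])


lam = lambda_local(
    chip_total=10_000_000,
    control_total=10_000_000,
    genome_size=3_000_000_000,
    scan_width=300,
    control_windows={1_000: 20, 5_000: 30, 10_000: 40},
)
print(lam)                                 # 6.0 (the 1 kb control window dominates)
print(poisson_upper_tail(20, lam) < 1e-5)  # True
```

Here the genome-wide rate alone would be 1.0 tag per window, so a locally inflated control (for example from a copy-number gain) would otherwise make 20 tags look far more significant than it is.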

Journal Article

Global variation in copy number in the human genome

Richard Redon¹, Shumpei Ishikawa², Karen R. Fitch³, Lars Feuk⁴, George H. Perry⁵, T. Daniel Andrews¹, Heike Fiegler¹, Michael H. Shapero³, Andrew R. Carson⁴, Wenwei Chen³, Eun Kyung Cho⁶, Stephanie Dallaire⁶, Jennifer L. Freeman⁶, Juan R. González⁷, Mònica Gratacòs⁷, Jing Huang³, Dimitrios Kalaitzopoulos¹, Daisuke Komura², Jeffrey R. MacDonald⁴, Christian R. Marshall⁴, Rui Mei³, Lyndal Montgomery¹, Keunihiro Nishimura², Kohji Okamura⁴, Fan Shen³, Martin J. Somerville⁸, Joelle Tchinda⁶, Armand Valsesia¹, Cara Woodwark¹, Fengtang Yang¹, Junjun Zhang⁴, Tatiana Zerjal¹, Jane Zhang³, Lluís Armengol⁷, Donald F. Conrad⁹, Xavier Estivill⁷, Chris Tyler-Smith¹, Nigel P. Carter¹, Hiroyuki Aburatani², Charles Lee⁶, Keith W. Jones³, Stephen W. Scherer⁴, Matthew E. Hurles¹

Wellcome Trust Sanger Institute¹, University of Tokyo², Thermo Fisher Scientific³, University of Toronto⁴, Brigham and Women's Hospital⁵, Harvard University⁶, Pompeu Fabra University⁷, University of Alberta⁸, University of Chicago⁹

Nature, 23 Nov 2006

TL;DR: A first-generation CNV map of the human genome is constructed through the study of 270 individuals from four populations with ancestry in Europe, Africa or Asia; the results underscore the importance of CNV in genetic diversity and evolution, and the utility of this resource for genetic disease studies.


Abstract: Copy number variation (CNV) of DNA sequences is functionally significant but has yet to be fully ascertained. We have constructed a first-generation CNV map of the human genome through the study of 270 individuals from four populations with ancestry in Europe, Africa or Asia (the HapMap collection). DNA from these individuals was screened for CNV using two complementary technologies: single-nucleotide polymorphism (SNP) genotyping arrays, and clone-based comparative genomic hybridization. A total of 1,447 copy number variable regions (CNVRs), which can encompass overlapping or adjacent gains or losses, covering 360 megabases (12% of the genome) were identified in these populations. These CNVRs contained hundreds of genes, disease loci, functional elements and segmental duplications. Notably, the CNVRs encompassed more nucleotide content per genome than SNPs, underscoring the importance of CNV in genetic diversity and evolution. The data obtained delineate linkage disequilibrium patterns for many CNVs, and reveal marked variation in copy number among populations. We also demonstrate the utility of this resource for genetic disease studies.


4,275 citations

Additional excerpts

...Second, ChIP-Seq data exhibit regional biases along the genome due to sequencing and mapping biases, chromatin structure and genome copy number variations [10]....
[...]

Journal Article

Genome-wide maps of chromatin state in pluripotent and lineage-committed cells

Tarjei S. Mikkelsen¹, Manching Ku¹,², David B. Jaffe¹, Biju Issac¹,², Erez Lieberman Aiden¹,³, Georgia Giannoukos¹, Pablo Alvarez¹, William Brockman¹, Tae Kyung Kim⁴, Richard Koche¹,²,³, William Lee¹, Eric M. Mendenhall¹,², Aisling O'Donovan², Aviva Presser¹, Carsten Russ¹, Xiaohui Xie¹, Alexander Meissner³, Marius Wernig³, Rudolf Jaenisch³, Chad Nusbaum¹, Eric S. Lander¹,³, Bradley E. Bernstein¹,²

Broad Institute¹, Harvard University², Massachusetts Institute of Technology³, Boston Children's Hospital⁴

Nature, 02 Aug 2007

TL;DR: The application of single-molecule-based sequencing technology for high-throughput profiling of histone modifications in mammalian cells is reported and it is shown that chromatin state can be read in an allele-specific manner by using single nucleotide polymorphisms.


Abstract: We report the application of single-molecule-based sequencing technology for high-throughput profiling of histone modifications in mammalian cells. By obtaining over four billion bases of sequence from chromatin immunoprecipitated DNA, we generated genome-wide chromatin-state maps of mouse embryonic stem cells, neural progenitor cells and embryonic fibroblasts. We find that lysine 4 and lysine 27 trimethylation effectively discriminates genes that are expressed, poised for expression, or stably repressed, and therefore reflect cell state and lineage potential. Lysine 36 trimethylation marks primary coding and non-coding transcripts, facilitating gene annotation. Trimethylation of lysine 9 and lysine 20 is detected at satellite, telomeric and active long-terminal repeats, and can spread into proximal unique sequences. Lysine 4 and lysine 9 trimethylation marks imprinting control regions. Finally, we show that chromatin state can be read in an allele-specific manner by using single nucleotide polymorphisms. This study provides a framework for the application of comprehensive chromatin profiling towards characterization of diverse mammalian cell populations.

4,166 citations

"Model-based Analysis of ChIP-Seq (MACS)" cites this paper for background or methods:

...With the current genome coverage of most ChIP-Seq experiments, tag distribution along the genome could be modeled by a Poisson distribution [7]....
[...]
...and sequencing (ChIP-Seq) [5-8] have become popular tech-...
[...]
...However, among the four recently published ChIP-Seq studies [5-8], one did not have a control sample [5] and only one of the three with control samples systematically used them to guide peak finding [8]....
[...]
...[7], we find that while the ChIP-Seq efficiency of the active mark H3K4me3 remains high as pluripotent cells differentiate, that of repressive marks H3K27me3 and H3K9me3 becomes lower with differentiation (Table S2 in Additional data file 1), even though it is likely that there are more targets for these repressive marks as cells differentiate....
[...]
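The Poisson assumption quoted in the first excerpt of this entry can be illustrated with a short sketch: if N tags fall uniformly at random on a mappable genome of size G, the count in a window of width w is approximately Poisson with mean λ = N·w/G, and one can find the smallest tag count whose upper-tail probability falls at or below a cutoff α. The function names, the example depth, genome size and cutoff are hypothetical values for illustration.

```python
import math


def expected_tags(total_tags, window_width, genome_size):
    """Mean tag count per window if tags were spread uniformly: lambda = N * w / G."""
    return total_tags * window_width / genome_size


def min_significant_count(lam, alpha):
    """Smallest k with P(X >= k) <= alpha for X ~ Poisson(lam)."""
    if not 0 < lam < 700:
        raise ValueError("lam must lie in (0, 700) so exp(-lam) does not underflow")
    if not 1e-12 < alpha < 1:
        raise ValueError("alpha must lie in (1e-12, 1) for this simple recurrence")
    k, pmf, upper = 0, math.exp(-lam), 1.0  # upper holds P(X >= k), pmf holds P(X = k)
    while upper > alpha:
        upper -= pmf   # now P(X >= k + 1)
        k += 1
        pmf *= lam / k  # now P(X = k)
    return k


lam = expected_tags(total_tags=10_000_000, window_width=300, genome_size=3_000_000_000)
print(lam)                               # 1.0
print(min_significant_count(lam, 0.05))  # 4
```

With these numbers an average window holds one tag, and four or more tags are needed before the window passes a 0.05 cutoff; deeper sequencing raises λ and therefore the count required.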

Journal Article

Genome-Wide Mapping of in Vivo Protein-DNA Interactions

David S. Johnson¹, Ali Mortazavi¹,², Richard M. Myers¹,², Barbara J. Wold¹,²

Stanford University¹, California Institute of Technology²

Science, 08 Jun 2007

TL;DR: A large-scale chromatin immunoprecipitation assay based on direct ultrahigh-throughput DNA sequencing was developed, which was then used to map in vivo binding of the neuron-restrictive silencer factor (NRSF; also known as REST) to 1946 locations in the human genome.


Abstract: In vivo protein-DNA interactions connect each transcription factor with its direct targets to form a gene network scaffold. To map these protein-DNA interactions comprehensively across entire mammalian genomes, we developed a large-scale chromatin immunoprecipitation assay (ChIPSeq) based on direct ultrahigh-throughput DNA sequencing. This sequence census method was then used to map in vivo binding of the neuron-restrictive silencer factor (NRSF; also known as REST, for repressor element–1 silencing transcription factor) to 1946 locations in the human genome. The data display sharp resolution of binding position [±50 base pairs (bp)], which facilitated our finding motifs and allowed us to identify noncanonical NRSF-binding motifs. These ChIPSeq data also have high sensitivity and specificity [ROC (receiver operator characteristic) area ≥ 0.96] and statistical confidence (P < 10⁻⁴), properties that were important for inferring new candidate interactions. These include key transcription factors in the gene network that regulates pancreatic islet cell development.

2,789 citations

"Model-based Analysis of ChIP-Seq (MACS)" cites this paper for background, methods or results:

...On the [...] Figure 2: Comparison of MACS with ChIPSeq Peak Finder, FindPeaks and QuEST....
[...]
...and sequencing (ChIP-Seq) [5-8] have become popular tech-...
[...]
...Libraries were prepared as described in [8] using a PCR preamplification step and size selection for DNA fragments between 150 and 400 bp....
[...]
...When applied to three human ChIP-Seq datasets to identify binding sites of FoxA1 in MCF7 cells, NRSF (neuron-restrictive silencer factor) in Jurkat T cells [8], and CTCF (CCCTC-binding factor) in CD4+ T cells [5] (summarized in Table S1 in Additional data file 1), MACS gives results superior to those produced by other published ChIP-Seq peak finding algorithms [8,11,12]....
[...]
...For CTCF, since QuEST does not run on samples without controls, we only compared MACS to ChIPSeq Peak Finder and FindPeaks....
[...]

Patent

Genome-wide location and function of DNA binding proteins

John J. Wyrick¹, Richard A. Young¹, Bing Ren¹, François Robert¹, Itamar Simon¹

Massachusetts Institute of Technology¹

Science, 21 Dec 2001

TL;DR: The patent also encompasses a method for identifying a set of genes where cell cycle regulator binding correlates with gene expression, and for identifying genomic targets of cell cycle transcription activators in living cells.


Abstract: The present invention relates to a method of identifying a region (one or more) of a genome of a cell to which a protein of interest binds. In the methods described herein, DNA binding protein of a cell is linked (e.g., covalently crosslinked) to genomic DNA of a cell. The genomic DNA to which the DNA binding protein is linked is removed and combined or contacted with DNA comprising a sequence complementary to genomic DNA of the cell under conditions in which hybridization between the identified genomic DNA and the sequence complementary to genomic DNA occurs. Region(s) of hybridization are region(s) of the genome of the cell to which the protein of interest binds. A method of identifying a set of genes where cell cycle regulator binding correlates with gene expression and of identifying genomic targets of cell cycle transcription activators in living cells is also encompassed.


1,931 citations