Journal ArticleDOI

A comparative encyclopedia of DNA elements in the mouse genome

Feng Yue1, Feng Yue2, Yong Cheng3, Alessandra Breschi, Jeff Vierstra4, Weisheng Wu5, Weisheng Wu1, Tyrone Ryba6, Tyrone Ryba7, Richard Sandstrom4, Zhihai Ma3, Carrie A. Davis8, Benjamin D. Pope6, Yin Shen2, Dmitri D. Pervouchine, Sarah Djebali, Robert E. Thurman4, Rajinder Kaul4, Eric Rynes4, Anthony Kirilusha9, Georgi K. Marinov9, Brian A. Williams9, Diane Trout9, Henry Amrhein9, Katherine I. Fisher-Aylor9, Igor Antoshechkin9, Gilberto DeSalvo9, Lei Hoon See8, Meagan Fastuca8, Jorg Drenkow8, Chris Zaleski8, Alexander Dobin8, Pablo Prieto, Julien Lagarde, Giovanni Bussotti, Andrea Tanzer10, Olgert Denas11, Kanwei Li11, M. A. Bender4, M. A. Bender12, Miaohua Zhang12, Rachel Byron12, Mark Groudine12, Mark Groudine4, David McCleary2, Long Pham2, Zhen Ye2, Samantha Kuan2, Lee Edsall2, Yi-Chieh Wu13, Matthew D. Rasmussen13, Mukul S. Bansal13, Manolis Kellis13, Manolis Kellis14, Cheryl A. Keller1, Christapher S. Morrissey1, Tejaswini Mishra1, Deepti Jain1, Nergiz Dogan1, Robert S. Harris1, Philip Cayting3, Trupti Kawli3, Alan P. Boyle5, Alan P. Boyle3, Ghia Euskirchen3, Anshul Kundaje3, Shin Lin3, Yiing Lin3, Camden Jansen15, Venkat S. Malladi3, Melissa S. Cline16, Drew T. Erickson3, Vanessa M. Kirkup16, Katrina Learned16, Cricket A. Sloan3, Kate R. Rosenbloom16, Beatriz Lacerda de Sousa17, Kathryn Beal, Miguel Pignatelli, Paul Flicek, Jin Lian18, Tamer Kahveci19, Dongwon Lee20, W. James Kent16, Miguel Santos17, Javier Herrero21, Cedric Notredame, Audra K. Johnson4, Shinny Vong4, Kristen Lee4, Daniel Bates4, Fidencio Neri4, Morgan Diegel4, Theresa K. Canfield4, Peter J. Sabo4, Matthew S. Wilken4, Thomas A. Reh4, Erika Giste4, Anthony Shafer4, Tanya Kutyavin4, Eric Haugen4, Douglas Dunn4, Alex Reynolds4, Shane Neph4, Richard Humbert4, R. Scott Hansen4, Marella F. T. R. de Bruijn22, Licia Selleri23, Alexander Y. Rudensky24, Steven Z. Josefowicz24, Robert M. Samstein24, Evan E. Eichler4, Stuart H. Orkin25, Dana N. 
Levasseur26, Thalia Papayannopoulou4, Kai Hsin Chang4, Arthur I. Skoultchi27, Srikanta Gosh27, Christine M. Disteche4, Piper M. Treuting4, Yanli Wang1, Mitchell J. Weiss, Gerd A. Blobel28, Xiaoyi Cao2, Sheng Zhong2, Ting Wang29, Peter J. Good30, Rebecca F. Lowdon30, Rebecca F. Lowdon29, Leslie B. Adams31, Leslie B. Adams30, Xiao Qiao Zhou30, Michael J. Pazin30, Elise A. Feingold30, Barbara J. Wold9, James Taylor11, Ali Mortazavi15, Sherman M. Weissman18, John A. Stamatoyannopoulos4, Michael Snyder3, Roderic Guigó, Thomas R. Gingeras8, David M. Gilbert6, Ross C. Hardison1, Michael A. Beer20, Bing Ren2 
20 Nov 2014-Nature (Nature Publishing Group)-Vol. 515, Iss: 7527, pp 355-364
TL;DR: The Mouse ENCODE Consortium has mapped transcription, DNase I hypersensitivity, transcription factor binding, chromatin modifications and replication domains throughout the mouse genome in diverse cell and tissue types.
Abstract: The laboratory mouse shares the majority of its protein-coding genes with humans, making it the premier model organism in biomedical research, yet the two mammals differ in significant ways. To gain greater insights into both shared and species-specific transcriptional and cellular regulatory programs in the mouse, the Mouse ENCODE Consortium has mapped transcription, DNase I hypersensitivity, transcription factor binding, chromatin modifications and replication domains throughout the mouse genome in diverse cell and tissue types. By comparing with the human genome, we not only confirm substantial conservation in the newly annotated potential functional sequences, but also find a large degree of divergence of sequences involved in transcriptional regulation, chromatin state and higher order chromatin organization. Our results illuminate the wide range of evolutionary forces acting on genes and their regulatory regions, and provide a general resource for research into mammalian biology and mechanisms of human diseases.


Citations
Journal ArticleDOI
29 Jul 2020-Nature
TL;DR: The authors summarize the data produced by phase III of the Encyclopedia of DNA Elements (ENCODE) project, a resource for better understanding of the human and mouse genomes, which has produced 5,992 new experimental datasets, including systematic determinations across mouse fetal development.
Abstract: The human and mouse genomes contain instructions that specify RNAs and proteins and govern the timing, magnitude, and cellular context of their production. To better delineate these elements, phase III of the Encyclopedia of DNA Elements (ENCODE) Project has expanded analysis of the cell and tissue repertoires of RNA transcription, chromatin structure and modification, DNA methylation, chromatin looping, and occupancy by transcription factors and RNA-binding proteins. Here we summarize these efforts, which have produced 5,992 new experimental datasets, including systematic determinations across mouse fetal development. All data are available through the ENCODE data portal (https://www.encodeproject.org), including phase II ENCODE¹ and Roadmap Epigenomics² data. We have developed a registry of 926,535 human and 339,815 mouse candidate cis-regulatory elements, covering 7.9 and 3.4% of their respective genomes, by integrating selected datatypes associated with gene regulation, and constructed a web-based server (SCREEN; http://screen.encodeproject.org) to provide flexible, user-defined access to this resource. Collectively, the ENCODE data and registry provide an expansive resource for the scientific community to build a better understanding of the organization and function of the human and mouse genomes.

999 citations

Journal ArticleDOI
27 Sep 2017-Nature
TL;DR: It is shown that deletion of the cohesin-loading factor Nipbl in mouse liver leads to a marked reorganization of chromosomal folding, and the disappearance of TADs unmasks a finer compartment structure that accurately reflects the underlying epigenetic landscape.
Abstract: Imaging and chromosome conformation capture studies have revealed several layers of chromosome organization, including segregation into megabase-sized active and inactive compartments, and partitioning into sub-megabase domains (TADs). It remains unclear, however, how these layers of organization form, interact with one another and influence genome function. Here we show that deletion of the cohesin-loading factor Nipbl in mouse liver leads to a marked reorganization of chromosomal folding. TADs and associated Hi-C peaks vanish globally, even in the absence of transcriptional changes. By contrast, compartmental segregation is preserved and even reinforced. Strikingly, the disappearance of TADs unmasks a finer compartment structure that accurately reflects the underlying epigenetic landscape. These observations demonstrate that the three-dimensional organization of the genome results from the interplay of two independent mechanisms: cohesin-independent segregation of the genome into fine-scale compartments, defined by chromatin state; and cohesin-dependent formation of TADs, possibly by loop extrusion, which helps to guide distant enhancers to their target genes.

893 citations

Journal ArticleDOI
TL;DR: The ENCODE blacklist is defined as a comprehensive set of regions in the human, mouse, worm, and fly genomes that have anomalous, unstructured, or high signal in next-generation sequencing experiments independent of cell line or experiment.
Abstract: Functional genomics assays based on high-throughput sequencing greatly expand our ability to understand the genome. Here, we define the ENCODE blacklist: a comprehensive set of regions in the human, mouse, worm, and fly genomes that have anomalous, unstructured, or high signal in next-generation sequencing experiments independent of cell line or experiment. The removal of the ENCODE blacklist is an essential quality measure when analyzing functional genomics data.

850 citations
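
As an illustration of the blacklist removal described in the entry above, the following is a minimal sketch in Python, not the method used by the blacklist authors. It assumes peaks and blacklist regions are given as BED-style half-open intervals; the names `overlaps` and `remove_blacklisted` and all coordinates are hypothetical and chosen only for this example.

```python
from typing import List, Tuple

# (chromosome, start, end), half-open like BED; an assumed representation
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if two half-open intervals on the same chromosome overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def remove_blacklisted(peaks: List[Interval], blacklist: List[Interval]) -> List[Interval]:
    """Keep only peaks that overlap no blacklisted region (simple O(n*m) scan)."""
    return [p for p in peaks if not any(overlaps(p, b) for b in blacklist)]


# Hypothetical coordinates, for illustration only
blacklist = [("chr1", 100, 200)]
peaks = [("chr1", 50, 120), ("chr1", 200, 300), ("chr2", 100, 200)]
kept = remove_blacklisted(peaks, blacklist)
print(kept)
```

The quadratic scan is adequate for a small example; for genome-scale peak sets a sorted sweep or an interval tree would be the more usual choice.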

Journal ArticleDOI
TL;DR: Long intergenic non-coding RNA genes have diverse features that distinguish them from mRNA-encoding genes and exercise functions such as remodelling chromatin and genome architecture, RNA stabilization and transcription regulation, including enhancer-associated activity.
Abstract: Long intergenic non-coding RNA (lincRNA) genes have diverse features that distinguish them from mRNA-encoding genes and exercise functions such as remodelling chromatin and genome architecture, RNA stabilization and transcription regulation, including enhancer-associated activity. Some genes currently annotated as encoding lincRNAs include small open reading frames (smORFs) and encode functional peptides and thus may be more properly classified as coding RNAs. lincRNAs may broadly serve to fine-tune the expression of neighbouring genes with remarkable tissue specificity through a diversity of mechanisms, highlighting our rapidly evolving understanding of the non-coding genome.

829 citations

Journal ArticleDOI
20 Nov 2014-Nature
TL;DR: It is demonstrated that, collectively, replication domain boundaries share a near one-to-one correlation with TAD boundaries, whereas within a cell type, adjacent TADs that replicate at similar times obscure replication domain boundaries, largely accounting for the previously reported lack of alignment.
Abstract: Eukaryotic chromosomes replicate in a temporal order known as the replication-timing program. In mammals, replication timing is cell-type-specific with at least half the genome switching replication timing during development, primarily in units of 400-800 kilobases ('replication domains'), whose positions are preserved in different cell types, conserved between species, and appear to confine long-range effects of chromosome rearrangements. Early and late replication correlate, respectively, with open and closed three-dimensional chromatin compartments identified by high-resolution chromosome conformation capture (Hi-C), and, to a lesser extent, late replication correlates with lamina-associated domains (LADs). Recent Hi-C mapping has unveiled substructure within chromatin compartments called topologically associating domains (TADs) that are largely conserved in their positions between cell types and are similar in size to replication domains. However, TADs can be further sub-stratified into smaller domains, challenging the significance of structures at any particular scale. Moreover, attempts to reconcile TADs and LADs to replication-timing data have not revealed a common, underlying domain structure. Here we localize boundaries of replication domains to the early-replicating border of replication-timing transitions and map their positions in 18 human and 13 mouse cell types. We demonstrate that, collectively, replication domain boundaries share a near one-to-one correlation with TAD boundaries, whereas within a cell type, adjacent TADs that replicate at similar times obscure replication domain boundaries, largely accounting for the previously reported lack of alignment. 
Moreover, cell-type-specific replication timing of TADs partitions the genome into two large-scale sub-nuclear compartments revealing that replication-timing transitions are indistinguishable from late-replicating regions in chromatin composition and lamina association and accounting for the reduced correlation of replication timing to LADs and heterochromatin. Our results reconcile cell-type-specific sub-nuclear compartmentalization and replication timing with developmentally stable structural domains and offer a unified model for large-scale chromosome structure and function.

783 citations

References
Journal ArticleDOI
06 Sep 2012-Nature
TL;DR: The Encyclopedia of DNA Elements project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.
Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

13,548 citations

Journal ArticleDOI
Robert H. Waterston1, Kerstin Lindblad-Toh2, Ewan Birney, Jane Rogers3, and 219 more authors (26 institutions)
05 Dec 2002-Nature
TL;DR: The results of an international collaboration to produce a high-quality draft sequence of the mouse genome are reported and an initial comparative analysis of the mouse and human genomes is presented, describing some of the insights that can be gleaned from the two sequences.
Abstract: The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.

6,643 citations

Journal ArticleDOI
17 May 2012-Nature
TL;DR: It is found that the boundaries of topological domains are enriched for the insulator binding protein CTCF, housekeeping genes, transfer RNAs and short interspersed element (SINE) retrotransposons, indicating that these factors may have a role in establishing the topological domain structure of the genome.
Abstract: The spatial organization of the genome is intimately linked to its biological function, yet our understanding of higher order genomic structure is coarse, fragmented and incomplete. In the nucleus of eukaryotic cells, interphase chromosomes occupy distinct chromosome territories, and numerous models have been proposed for how chromosomes fold within chromosome territories. These models, however, provide only few mechanistic details about the relationship between higher order chromatin structure and genome function. Recent advances in genomic technologies have led to rapid advances in the study of three-dimensional genome organization. In particular, Hi-C has been introduced as a method for identifying higher order chromatin interactions genome wide. Here we investigate the three-dimensional organization of the human and mouse genomes in embryonic stem cells and terminally differentiated cell types at unprecedented resolution. We identify large, megabase-sized local chromatin interaction domains, which we term 'topological domains', as a pervasive structural feature of the genome organization. These domains correlate with regions of the genome that constrain the spread of heterochromatin. The domains are stable across different cell types and highly conserved across species, indicating that topological domains are an inherent property of mammalian genomes. Finally, we find that the boundaries of topological domains are enriched for the insulator binding protein CTCF, housekeeping genes, transfer RNAs and short interspersed element (SINE) retrotransposons, indicating that these factors may have a role in establishing the topological domain structure of the genome.

5,774 citations

Journal ArticleDOI
14 Jun 2007-Nature
TL;DR: Functional data from multiple, diverse experiments performed on a targeted 1% of the human genome as part of the pilot phase of the ENCODE Project are reported, providing convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts.
Abstract: We report the generation and analysis of functional data from multiple, diverse experiments performed on a targeted 1% of the human genome as part of the pilot phase of the ENCODE Project. These data have been further integrated and augmented by a number of evolutionary and computational analyses. Together, our results advance the collective knowledge about human genome function in several major areas. First, our studies provide convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts, including non-protein-coding transcripts, and those that extensively overlap one another. Second, systematic examination of transcriptional regulation has yielded new understanding about transcription start sites, including their relationship to specific regulatory sequences and features of chromatin accessibility and histone modification. Third, a more sophisticated view of chromatin structure has emerged, including its inter-relationship with DNA replication and transcriptional regulation. Finally, integration of these new sources of information, in particular with respect to mammalian evolution based on inter- and intra-species sequence comparisons, has yielded new mechanistic and evolutionary insights concerning the functional landscape of the human genome. Together, these studies are defining a path for pursuit of a more comprehensive characterization of human genome function.

5,091 citations

Journal ArticleDOI
Sarah Djebali, Carrie A. Davis1, Angelika Merkel, Alexander Dobin1, Timo Lassmann, Ali Mortazavi2, Ali Mortazavi3, Andrea Tanzer, Julien Lagarde, Wei Lin1, Felix Schlesinger1, Chenghai Xue1, Georgi K. Marinov3, Jainab Khatun4, Brian A. Williams3, Chris Zaleski1, Joel Rozowsky5, Marion S. Röder, Felix Kokocinski6, Rehab F. Abdelhamid, Tyler Alioto, Igor Antoshechkin3, Michael T. Baer1, Nadav Bar7, Philippe Batut1, Kimberly Bell1, Ian Bell8, Sudipto K. Chakrabortty1, Xian Chen9, Jacqueline Chrast10, Joao Curado, Thomas Derrien, Jorg Drenkow1, Erica Dumais8, Jacqueline Dumais8, Radha Duttagupta8, Emilie Falconnet11, Meagan Fastuca1, Kata Fejes-Toth1, Pedro G. Ferreira, Sylvain Foissac8, Melissa J. Fullwood12, Hui Gao8, David Gonzalez, Assaf Gordon1, Harsha P. Gunawardena9, Cédric Howald10, Sonali Jha1, Rory Johnson, Philipp Kapranov8, Brandon King3, Colin Kingswood, Oscar Junhong Luo12, Eddie Park2, Kimberly Persaud1, Jonathan B. Preall1, Paolo Ribeca, Brian A. Risk4, Daniel Robyr11, Michael Sammeth, Lorian Schaffer3, Lei-Hoon See1, Atif Shahab12, Jørgen Skancke7, Ana Maria Suzuki, Hazuki Takahashi, Hagen Tilgner13, Diane Trout3, Nathalie Walters10, Huaien Wang1, John A. Wrobel4, Yanbao Yu9, Xiaoan Ruan12, Yoshihide Hayashizaki, Jennifer Harrow6, Mark Gerstein5, Tim Hubbard6, Alexandre Reymond10, Stylianos E. Antonarakis11, Gregory J. Hannon1, Morgan C. Giddings4, Morgan C. Giddings9, Yijun Ruan12, Barbara J. Wold3, Piero Carninci, Roderic Guigó14, Thomas R. Gingeras1, Thomas R. Gingeras8 
06 Sep 2012-Nature
TL;DR: Evidence that three-quarters of the human genome is capable of being transcribed is reported, as well as observations about the range and levels of expression, localization, processing fates, regulatory regions and modifications of almost all currently annotated and thousands of previously unannotated RNAs that prompt a redefinition of the concept of a gene.
Abstract: Eukaryotic cells make many types of primary and processed RNAs that are found either in specific subcellular compartments or throughout the cells. A complete catalogue of these RNAs is not yet available and their characteristic subcellular localizations are also poorly understood. Because RNA represents the direct output of the genetic information encoded by genomes and a significant proportion of a cell's regulatory capabilities are focused on its synthesis, processing, transport, modification and translation, the generation of such a catalogue is crucial for understanding genome function. Here we report evidence that three-quarters of the human genome is capable of being transcribed, as well as observations about the range and levels of expression, localization, processing fates, regulatory regions and modifications of almost all currently annotated and thousands of previously unannotated RNAs. These observations, taken together, prompt a redefinition of the concept of a gene.

4,450 citations