Journal ArticleDOI

Pan-cancer analysis of somatic copy-number alterations implicates IRS4 and IGF2 in enhancer hijacking.

TL;DR: A framework for inferring cancer-related gene overexpression resulting from CRE reorganization by integrating SCNAs, gene expression data and information on topologically associating domains (TADs) is presented and enables systematic inference of CRE rearrangements mediating dysregulation in cancer.
Abstract: Extensive prior research focused on somatic copy-number alterations (SCNAs) affecting cancer genes, yet the extent to which recurrent SCNAs exert their influence through rearrangement of cis-regulatory elements (CREs) remains unclear. Here we present a framework for inferring cancer-related gene overexpression resulting from CRE reorganization (e.g., enhancer hijacking) by integrating SCNAs, gene expression data and information on topologically associating domains (TADs). Analysis of 7,416 cancer genomes uncovered several pan-cancer candidate genes, including IRS4, SMARCA1 and TERT. We demonstrate that IRS4 overexpression in lung cancer is associated with recurrent deletions in cis, and we present evidence supporting a tumor-promoting role. We additionally pursued cancer-type-specific analyses and uncovered IGF2 as a target for enhancer hijacking in colorectal cancer. Recurrent tandem duplications intersecting with a TAD boundary mediate de novo formation of a 3D contact domain comprising IGF2 and a lineage-specific super-enhancer, resulting in high-level gene activation. Our framework enables systematic inference of CRE rearrangements mediating dysregulation in cancer.
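As a rough illustration of one ingredient described in the abstract (copy-number segments that intersect a TAD boundary), the following minimal Python sketch flags segments that span a boundary position. It is not the authors' framework: the function names, the data layout and the toy coordinates (`chrA`, `chrB`) are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Hypothetical data layout: (chromosome, start, end) with start < end.
Segment = Tuple[str, int, int]
# Hypothetical data layout: (chromosome, position) of a TAD boundary.
Boundary = Tuple[str, int]


def spans_boundary(segment: Segment, boundaries: List[Boundary]) -> bool:
    """Return True if the segment strictly contains at least one boundary."""
    chrom, start, end = segment
    return any(b_chrom == chrom and start < pos < end
               for b_chrom, pos in boundaries)


def boundary_spanning_segments(segments: List[Segment],
                               boundaries: List[Boundary]) -> List[Segment]:
    """Keep only the segments that cross a TAD boundary."""
    return [seg for seg in segments if spans_boundary(seg, boundaries)]


# Toy, made-up coordinates (not real loci).
tad_boundaries = [("chrA", 2_000_000), ("chrB", 5_000_000)]
duplications = [
    ("chrA", 1_900_000, 2_200_000),  # crosses the chrA boundary
    ("chrA", 2_300_000, 2_400_000),  # lies within a single domain
    ("chrB", 4_000_000, 4_500_000),  # ends before the chrB boundary
]

hits = boundary_spanning_segments(duplications, tad_boundaries)
print(hits)
```

In a real analysis, such candidate segments would still need to be related to expression of nearby genes and to enhancer annotations, as the abstract describes; this sketch covers only the interval test.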
Citations
01 Feb 2015
TL;DR: In this article, the authors describe the integrative analysis of 111 reference human epigenomes generated as part of the NIH Roadmap Epigenomics Consortium, profiled for histone modification patterns, DNA accessibility, DNA methylation and RNA expression.
Abstract: The reference human genome sequence set the stage for studies of genetic variation and its association with human disease, but epigenomic studies lack a similar reference. To address this need, the NIH Roadmap Epigenomics Consortium generated the largest collection so far of human epigenomes for primary cells and tissues. Here we describe the integrative analysis of 111 reference human epigenomes generated as part of the programme, profiled for histone modification patterns, DNA accessibility, DNA methylation and RNA expression. We establish global maps of regulatory elements, define regulatory modules of coordinated activity, and their likely activators and repressors. We show that disease- and trait-associated genetic variants are enriched in tissue-specific epigenomic marks, revealing biologically relevant cell types for diverse human traits, and providing a resource for interpreting the molecular basis of human disease. Our results demonstrate the central role of epigenomic information for understanding gene regulation, cellular differentiation and human disease.

4,409 citations

Posted ContentDOI
12 Jul 2017-bioRxiv
TL;DR: The integrative analysis of more than 2,600 whole cancer genomes and their matching normal tissues across 39 distinct tumour types represents the most comprehensive look at cancer whole genomes to date.
Abstract: We report the integrative analysis of more than 2,600 whole cancer genomes and their matching normal tissues across 39 distinct tumour types. By studying whole genomes we have been able to catalogue non-coding cancer driver events, study patterns of structural variation, infer tumour evolution, probe the interactions among variants in the germline genome, the tumour genome and the transcriptome, and derive an understanding of how coding and non-coding variations together contribute to driving individual patients' tumours. This work represents the most comprehensive look at cancer whole genomes to date. NOTE TO READERS: This is an incomplete draft of the marker paper for the Pan-Cancer Analysis of Whole Genomes Project, and is intended to provide the background information for a series of in-depth papers that will be posted to bioRxiv during the summer of 2017.

735 citations

Journal ArticleDOI
Paul A. Northcott, Ivo Buchhalter, A. Sorana Morrissy, Volker Hovestadt, Joachim Weischenfeldt, Tobias Ehrenberger, Susanne Gröbner, Maia Segura-Wang, Thomas Zichner, Vasilisa A. Rudneva, Hans-Jörg Warnatz, Nikos Sidiropoulos, Aaron H. Phillips, Steven E. Schumacher, Kortine Kleinheinz, Sebastian M. Waszak, Serap Erkek, David T.W. Jones, Barbara C. Worst, Marcel Kool, Marc Zapatka, Natalie Jäger, Lukas Chavez, Barbara Hutter, Matthias Bieg, Nagarajan Paramasivam, Michael Heinold, Zuguang Gu, Naveed Ishaque, Christina Jäger-Schmidt, Charles D. Imbusch, Alke Jugold, Daniel Hübschmann, Thomas Risch, Vyacheslav Amstislavskiy, Francisco German Rodriguez Gonzalez, Ursula D. Weber, Stephan Wolf, Giles W. Robinson, Xin Zhou, Gang Wu, David Finkelstein, Yanling Liu, Florence M.G. Cavalli, Betty Luu, Vijay Ramaswamy, Xiaochong Wu, Jan Koster, Marina Ryzhova, Yoon Jae Cho, Scott L. Pomeroy, Christel Herold-Mende, Martin U. Schuhmann, Martin Ebinger, Linda M. Liau, Jaume Mora, Roger E. McLendon, Nada Jabado, Toshihiro Kumabe, Eric Chuah, Yussanne Ma, Richard A. Moore, Andrew J. Mungall, Karen Mungall, Nina Thiessen, Kane Tse, Tina Wong, Steven J.M. Jones, Olaf Witt, Till Milde, Andreas von Deimling, David Capper, Andrey Korshunov, Marie-Laure Yaspo, Richard W. Kriwacki, Amar Gajjar, Jinghui Zhang, Rameen Beroukhim, Ernest Fraenkel, Jan O. Korbel, Benedikt Brors, Matthias Schlesner, Roland Eils, Marco A. Marra, Stefan M. Pfister, Michael D. Taylor, Peter Lichter
19 Jul 2017-Nature
TL;DR: The application of integrative genomics to an extensive cohort of clinical samples derived from a single childhood cancer entity revealed a series of cancer genes and biologically relevant subtype diversity that represent attractive therapeutic targets for the treatment of patients with medulloblastoma.
Abstract: Current therapies for medulloblastoma, a highly malignant childhood brain tumour, impose debilitating effects on the developing child, and highlight the need for molecularly targeted treatments with reduced toxicity. Previous studies have been unable to identify the full spectrum of driver genes and molecular processes that operate in medulloblastoma subgroups. Here we analyse the somatic landscape across 491 sequenced medulloblastoma samples and the molecular heterogeneity among 1,256 epigenetically analysed cases, and identify subgroup-specific driver alterations that include previously undiscovered actionable targets. Driver mutations were confidently assigned to most patients belonging to Group 3 and Group 4 medulloblastoma subgroups, greatly enhancing previous knowledge. New molecular subtypes were differentially enriched for specific driver events, including hotspot in-frame insertions that target KBTBD4 and 'enhancer hijacking' events that activate PRDM6. Thus, the application of integrative genomics to an extensive cohort of clinical samples derived from a single childhood cancer entity revealed a series of cancer genes and biologically relevant subtype diversity that represent attractive therapeutic targets for the treatment of patients with medulloblastoma.

706 citations

01 Nov 2015
TL;DR: Hnisz et al. investigated whether proto-oncogenes occur within insulated neighborhoods and whether oncogene activation can occur via disruption of insulated neighborhood boundaries in cancer cells.
Abstract: The spread of bad neighborhoods. Our genomes have complex three-dimensional (3D) arrangements that partition and regulate gene expression. Cancer cells frequently have their genomes grossly rearranged, disturbing this intricate 3D organization. Hnisz et al. show that the disruption of these 3D neighborhoods can bring oncogenes under the control of regulatory elements normally kept separate from them (see the Perspective by Wala and Beroukhim). These novel juxtapositions can result in the inappropriate activation of oncogenes. Science, this issue p. 1454; see also p. 1398. Disrupting the boundaries between three-dimensional neighborhoods in the genome can activate cancer-promoting genes. Oncogenes are activated through well-known chromosomal alterations such as gene fusion, translocation, and focal amplification. In light of recent evidence that the control of key genes depends on chromosome structures called insulated neighborhoods, we investigated whether proto-oncogenes occur within these structures and whether oncogene activation can occur via disruption of insulated neighborhood boundaries in cancer cells. We mapped insulated neighborhoods in T cell acute lymphoblastic leukemia (T-ALL) and found that tumor cell genomes contain recurrent microdeletions that eliminate the boundary sites of insulated neighborhoods containing prominent T-ALL proto-oncogenes. Perturbation of such boundaries in nonmalignant cells was sufficient to activate proto-oncogenes. Mutations affecting chromosome neighborhood boundaries were found in many types of cancer. Thus, oncogene activation can occur via genetic alterations that disrupt insulated neighborhoods in malignant cells.

553 citations

Journal ArticleDOI
TL;DR: The authors review the role of genetic structural variation in disease and the pathogenic potential of changes to the 3D genome.
Abstract: Structural and quantitative chromosomal rearrangements, collectively referred to as structural variation (SV), contribute to a large extent to the genetic diversity of the human genome and thus are of high relevance for cancer genetics, rare diseases and evolutionary genetics. Recent studies have shown that SVs can not only affect gene dosage but also modulate basic mechanisms of gene regulation. SVs can alter the copy number of regulatory elements or modify the 3D genome by disrupting higher-order chromatin organization such as topologically associating domains. As a result of these position effects, SVs can influence the expression of genes distant from the SV breakpoints, thereby causing disease. The impact of SVs on the 3D genome and on gene expression regulation has to be considered when interpreting the pathogenic potential of these variant types.

451 citations

References
Journal ArticleDOI
TL;DR: This work introduces PLINK, an open-source C/C++ WGAS tool set, and describes its five main domains of function: data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation. It focuses on the estimation and use of identity-by-state and identity-by-descent information in the context of population-based whole-genome studies.
Abstract: Whole-genome association studies (WGAS) bring new computational, as well as analytic, challenges to researchers. Many existing genetic-analysis tools are not designed to handle such large data sets in a convenient manner and do not necessarily exploit the new opportunities that whole-genome data bring. To address these issues, we developed PLINK, an open-source C/C++ WGAS tool set. With PLINK, large data sets comprising hundreds of thousands of markers genotyped for thousands of individuals can be rapidly manipulated and analyzed in their entirety. As well as providing tools to make the basic analytic steps computationally efficient, PLINK also supports some novel approaches to whole-genome data that take advantage of whole-genome coverage. We introduce PLINK and describe the five main domains of function: data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation. In particular, we focus on the estimation and use of identity-by-state and identity-by-descent information in the context of population-based whole-genome studies. This information can be used to detect and correct for population stratification and to identify extended chromosomal segments that are shared identical by descent between very distantly related individuals. Analysis of the patterns of segmental sharing has the potential to map disease loci that contain multiple rare variants in a population-based linkage analysis.

26,280 citations

Journal ArticleDOI
TL;DR: A new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format, which allows the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks.
Abstract: Motivation: Testing for correlations between different sets of genomic features is a fundamental task in genomics research. However, searching for overlaps between features with existing web-based methods is complicated by the massive datasets that are routinely produced with current sequencing technologies. Fast and flexible tools are therefore required to ask complex questions of these data in an efficient manner. Results: This article introduces a new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format. BEDTools also supports the comparison of sequence alignments in BAM format to both BED and GFF features. The tools are extremely efficient and allow the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks. BEDTools can be combined with one another as well as with standard UNIX commands, thus facilitating routine genomics tasks as well as pipelines that can quickly answer intricate questions of large genomic datasets. Availability and implementation: BEDTools was written in C++. Source code and a comprehensive user manual are freely available at http://code.google.com/p/bedtools

18,858 citations

Journal ArticleDOI
06 Sep 2012-Nature
TL;DR: The Encyclopedia of DNA Elements project provides new insights into the organization and regulation of human genes and the human genome, and is an expansive resource of functional annotations for biomedical research.
Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

13,548 citations

Journal ArticleDOI
TL;DR: It is demonstrated in macrophages and B cells that collaborative interactions of the common factor PU.1 with small sets of macrophage- or B cell lineage-determining transcription factors establish cell-specific binding sites that are associated with the majority of promoter-distal H3K4me1-marked genomic regions.

9,620 citations

Journal ArticleDOI
TL;DR: This work describes a method that enables explicit detection and correction of population stratification on a genome-wide scale and uses principal components analysis to explicitly model ancestry differences between cases and controls.
Abstract: Population stratification (allele frequency differences between cases and controls due to systematic ancestry differences) can cause spurious associations in disease studies. We describe a method that enables explicit detection and correction of population stratification on a genome-wide scale. Our method uses principal components analysis to explicitly model ancestry differences between cases and controls. The resulting correction is specific to a candidate marker’s variation in frequency across ancestral populations, minimizing spurious associations while maximizing power to detect true associations. Our simple, efficient approach can easily be applied to disease studies with hundreds of thousands of markers.
Population stratification (allele frequency differences between cases and controls due to systematic ancestry differences) can cause spurious associations in disease studies [1-8]. Because the effects of stratification vary in proportion to the number of samples [9], stratification will be an increasing problem in the large-scale association studies of the future, which will analyze thousands of samples in an effort to detect common genetic variants of weak effect. The two prevailing methods for dealing with stratification are genomic control and structured association [9-14]. Although genomic control and structured association have proven useful in a variety of contexts, they have limitations. Genomic control corrects for stratification by adjusting association statistics at each marker by a uniform overall inflation factor. However, some markers differ in their allele frequencies across ancestral populations more than others. Thus, the uniform adjustment applied by genomic control may be insufficient at markers having unusually strong differentiation across ancestral populations and may be superfluous at markers devoid of such differentiation, leading to a loss in power.
Structured association uses a program such as STRUCTURE [15] to assign the samples to discrete subpopulation clusters and then aggregates evidence of association within each cluster. If fractional membership in more than one cluster is allowed, the method cannot currently be applied to genome-wide association studies because of its intensive computational cost on large data sets. Furthermore, assignments of individuals to clusters are highly sensitive to the number of clusters, which is not well defined [14,16].
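The uniform adjustment that the passage attributes to genomic control can be sketched in a few lines of Python. This is only an illustration of the baseline being criticised, not the principal-components method the paper proposes. It assumes 1-degree-of-freedom chi-square association statistics, estimates the inflation factor as the median observed statistic divided by the theoretical median (about 0.4549), and floors the factor at 1, which is a common convention rather than something stated in the text. The function names and the toy statistics are hypothetical.

```python
from statistics import median
from typing import List

# Approximate median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549


def inflation_factor(chi2_stats: List[float]) -> float:
    """Estimate one genome-wide inflation factor from all marker statistics."""
    # Flooring at 1.0 is an assumed convention: statistics are never inflated.
    return max(1.0, median(chi2_stats) / CHI2_1DF_MEDIAN)


def genomic_control(chi2_stats: List[float]) -> List[float]:
    """Divide every marker's statistic by the same inflation factor."""
    lam = inflation_factor(chi2_stats)
    return [stat / lam for stat in chi2_stats]


# Toy, made-up association statistics for five markers.
observed = [0.2, 0.5, 0.9098, 1.5, 6.0]

lam = inflation_factor(observed)
adjusted = genomic_control(observed)
print(lam, adjusted)
```

Because every marker is divided by the same factor, a strongly differentiated marker and an undifferentiated one receive identical treatment, which is the limitation the passage describes.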

9,387 citations
