
Showing papers on "Gene" published in 2017


Journal ArticleDOI
12 Oct 2017-Nature
TL;DR: It is found that local genetic variation affects gene expression levels for the majority of genes, and inter-chromosomal genetic effects for 93 genes and 112 loci are identified, enabling a mechanistic interpretation of gene regulation and the genetic basis of disease.
Abstract: Characterization of the molecular function of the human genome and its variation across individuals is essential for identifying the cellular mechanisms that underlie human genetic traits and diseases. The Genotype-Tissue Expression (GTEx) project aims to characterize variation in gene expression levels across individuals and diverse tissues of the human body, many of which are not easily accessible. Here we describe genetic effects on gene expression levels across 44 human tissues. We find that local genetic variation affects gene expression levels for the majority of genes, and we further identify inter-chromosomal genetic effects for 93 genes and 112 loci. On the basis of the identified genetic effects, we characterize patterns of tissue specificity, compare local and distal effects, and evaluate the functional properties of the genetic effects. We also demonstrate that multi-tissue, multi-individual data can be used to identify genes and pathways affected by human disease-associated variation, enabling a mechanistic interpretation of gene regulation and the genetic basis of disease.
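The idea of a local genetic effect on expression can be illustrated with a single-variant regression. The sketch below is a minimal illustration, not the GTEx pipeline: it assumes genotypes coded as alternate-allele dosages (0, 1, 2), uses invented toy data, and omits the covariates, normalization and multiple-testing correction a real analysis would need. The function name `eqtl_slope` is hypothetical.

```python
from statistics import mean


def eqtl_slope(dosages, expression):
    """Least-squares slope of expression on genotype dosage (0, 1 or 2)."""
    if len(dosages) != len(expression):
        raise ValueError("one expression value is needed per genotype")
    g_mean = mean(dosages)
    e_mean = mean(expression)
    # Sum of squared deviations of the genotype dosages.
    sxx = sum((g - g_mean) ** 2 for g in dosages)
    if sxx == 0:
        raise ValueError("variant is monomorphic in this sample")
    # Cross-product of genotype and expression deviations.
    sxy = sum((g - g_mean) * (e - e_mean) for g, e in zip(dosages, expression))
    return sxy / sxx


# Toy data invented for illustration: six individuals, one variant, one gene.
dosages = [0, 0, 1, 1, 2, 2]
expression = [1.0, 1.2, 1.9, 2.1, 3.0, 3.2]
slope = eqtl_slope(dosages, expression)
print(f"estimated effect per alternate allele: {slope:.2f}")
```

A non-zero slope suggests that expression changes with each additional copy of the allele; whether it is significant would have to be assessed separately.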

3,289 citations


Journal ArticleDOI
15 Jun 2017-Cell
TL;DR: Integrative molecular HCC subtyping incorporating unsupervised clustering of five data platforms identified three subtypes, one of which was associated with poorer prognosis in three HCC cohorts, and enabled development of a p53 target gene expression signature correlating with poor survival.

1,623 citations


Journal ArticleDOI
19 Oct 2017-Cell
TL;DR: This work comprehensively mapped 3D chromatin organization during mouse neural differentiation in vitro and in vivo, generating the highest-resolution Hi-C maps available to date and shows that multiple factors influence the dynamics of chromatin interactions in development.

973 citations


Journal ArticleDOI
27 Sep 2017-Nature
TL;DR: It is shown that deletion of the cohesin-loading factor Nipbl in mouse liver leads to a marked reorganization of chromosomal folding, and the disappearance of TADs unmasks a finer compartment structure that accurately reflects the underlying epigenetic landscape.
Abstract: Imaging and chromosome conformation capture studies have revealed several layers of chromosome organization, including segregation into megabase-sized active and inactive compartments, and partitioning into sub-megabase domains (TADs). It remains unclear, however, how these layers of organization form, interact with one another and influence genome function. Here we show that deletion of the cohesin-loading factor Nipbl in mouse liver leads to a marked reorganization of chromosomal folding. TADs and associated Hi-C peaks vanish globally, even in the absence of transcriptional changes. By contrast, compartmental segregation is preserved and even reinforced. Strikingly, the disappearance of TADs unmasks a finer compartment structure that accurately reflects the underlying epigenetic landscape. These observations demonstrate that the three-dimensional organization of the genome results from the interplay of two independent mechanisms: cohesin-independent segregation of the genome into fine-scale compartments, defined by chromatin state; and cohesin-dependent formation of TADs, possibly by loop extrusion, which helps to guide distant enhancers to their target genes.

893 citations


Journal ArticleDOI
TL;DR: This updated Arabidopsis genome annotation with a substantially increased resolution of gene models will not only further the understanding of the biological processes of this plant model but also of other species.
Abstract: Summary The flowering plant Arabidopsis thaliana is a dicot model organism for research in many aspects of plant biology. A comprehensive annotation of its genome paves the way for understanding the functions and activities of all types of transcripts, including mRNA, the various classes of non-coding RNA, and small RNA. The TAIR10 annotation update had a profound impact on Arabidopsis research but was released more than 5 years ago. Maintaining the accuracy of the annotation continues to be a prerequisite for future progress. Using an integrative annotation pipeline, we assembled tissue-specific RNA-Seq libraries from 113 datasets and constructed 48 359 transcript models of protein-coding genes in eleven tissues. In addition, we annotated various classes of non-coding RNA including microRNA, long intergenic RNA, small nucleolar RNA, natural antisense transcript, small nuclear RNA, and small RNA using published datasets and in-house analytic results. Altogether, we identified 635 novel protein-coding genes, 508 novel transcribed regions, 5178 non-coding RNAs, and 35 846 small RNA loci that were formerly unannotated. Analysis of the splicing events and RNA-Seq based expression profiles revealed the landscapes of gene structures, untranslated regions, and splicing activities to be more intricate than previously appreciated. Furthermore, we present 692 uniformly expressed housekeeping genes, 43% of whose human orthologs are also housekeeping genes. This updated Arabidopsis genome annotation with a substantially increased resolution of gene models will not only further our understanding of the biological processes of this plant model but also of other species.

769 citations


Journal ArticleDOI
16 Feb 2017-Nature
TL;DR: This work measures the entire transcriptome of thousands of mouse liver cells and infers their lobule coordinates on the basis of a panel of zonated landmark genes, characterized with single-molecule fluorescence in situ hybridization. It finds that around 50% of liver genes are significantly zonated and uncovers abundant non-monotonic profiles that peak at the mid-lobule layers.
Abstract: The mammalian liver consists of hexagon-shaped lobules that are radially polarized by blood flow and morphogens. Key liver genes have been shown to be differentially expressed along the lobule axis, a phenomenon termed zonation, but a detailed genome-wide reconstruction of this spatial division of labour has not been achieved. Here we measure the entire transcriptome of thousands of mouse liver cells and infer their lobule coordinates on the basis of a panel of zonated landmark genes, characterized with single-molecule fluorescence in situ hybridization. Using this approach, we obtain the zonation profiles of all liver genes with high spatial resolution. We find that around 50% of liver genes are significantly zonated and uncover abundant non-monotonic profiles that peak at the mid-lobule layers. These include a spatial order of bile acid biosynthesis enzymes that matches their position in the enzymatic cascade. Our approach can facilitate the reconstruction of similar spatial genomic blueprints for other mammalian organs.

732 citations


Journal ArticleDOI
07 Dec 2017-Nature
TL;DR: Together, these data define METTL3 as a regulator of a chromatin-based pathway that is necessary for maintenance of the leukaemic state and identify this enzyme as a potential therapeutic target for acute myeloid leukaemia.
Abstract: N6-methyladenosine (m6A) is an abundant internal RNA modification in both coding and non-coding RNAs that is catalysed by the METTL3-METTL14 methyltransferase complex. However, the specific role of these enzymes in cancer is still largely unknown. Here we define a pathway that is specific for METTL3 and is implicated in the maintenance of a leukaemic state. We identify METTL3 as an essential gene for growth of acute myeloid leukaemia cells in two distinct genetic screens. Downregulation of METTL3 results in cell cycle arrest, differentiation of leukaemic cells and failure to establish leukaemia in immunodeficient mice. We show that METTL3, independently of METTL14, associates with chromatin and localizes to the transcriptional start sites of active genes. The vast majority of these genes have the CAATT-box binding protein CEBPZ present at the transcriptional start site, and this is required for recruitment of METTL3 to chromatin. Promoter-bound METTL3 induces m6A modification within the coding region of the associated mRNA transcript, and enhances its translation by relieving ribosome stalling. We show that genes regulated by METTL3 in this way are necessary for acute myeloid leukaemia. Together, these data define METTL3 as a regulator of a chromatin-based pathway that is necessary for maintenance of the leukaemic state and identify this enzyme as a potential therapeutic target for acute myeloid leukaemia.

705 citations


Journal ArticleDOI
TL;DR: An overview of methods and tools used to create and analyse co-expression networks constructed from gene expression data is provided, and it is explained how these can be used to identify genes with a regulatory role in disease.
Abstract: Gene co-expression networks can be used to associate genes of unknown function with biological processes, to prioritize candidate disease genes or to discern transcriptional regulatory programmes. With recent advances in transcriptomics and next-generation sequencing, co-expression networks constructed from RNA sequencing data also enable the inference of functions and disease associations for non-coding genes and splice variants. Although gene co-expression networks typically do not provide information about causality, emerging methods for differential co-expression analysis are enabling the identification of regulatory genes underlying various phenotypes. Here, we introduce and guide researchers through a (differential) co-expression analysis. We provide an overview of methods and tools used to create and analyse co-expression networks constructed from gene expression data, and we explain how these can be used to identify genes with a regulatory role in disease. Furthermore, we discuss the integration of other data types with co-expression networks and offer future perspectives of co-expression analysis.
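One common way to build a co-expression network is to correlate every pair of expression profiles and keep the strongly correlated pairs as edges. The sketch below assumes Pearson correlation and a fixed absolute-correlation cutoff of 0.8; both are illustrative choices rather than the specific methods the review covers, and the gene names and values are invented.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return sxy / sqrt(sxx * syy)


def coexpression_edges(profiles, threshold=0.8):
    """Return (gene1, gene2, r) for every pair with |r| >= threshold."""
    edges = []
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges


# Toy expression matrix: gene -> expression across four samples (invented).
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 3.9, 6.2, 8.0],
    "geneC": [5.0, 1.0, 4.0, 2.0],
}
edges = coexpression_edges(profiles)
for g1, g2, r in edges:
    print(f"{g1} -- {g2}  r={r:.3f}")
```

With this toy data only geneA and geneB are linked. As the abstract notes, such an edge indicates association, not causality.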

700 citations


Journal ArticleDOI
TL;DR: It is shown that the ability of Cpf1 to process its own CRISPR RNA (crRNA) can be used to simplify multiplexed genome editing.
Abstract: Targeting of multiple genomic loci with Cas9 is limited by the need for multiple or large expression constructs. Here we show that the ability of Cpf1 to process its own CRISPR RNA (crRNA) can be used to simplify multiplexed genome editing. Using a single customized CRISPR array, we edit up to four genes in mammalian cells and three in the mouse brain, simultaneously.

673 citations


Journal ArticleDOI
TL;DR: Evidence suggests indels are a highly immunogenic mutational class, which can trigger an increased abundance of neoantigens and greater mutant-binding specificity.
Abstract: Summary Background The focus of tumour-specific antigen analyses has been on single nucleotide variants (SNVs), with the contribution of small insertions and deletions (indels) less well characterised. We investigated whether the frameshift nature of indel mutations, which create novel open reading frames and a large quantity of mutagenic peptides highly distinct from self, might contribute to the immunogenic phenotype. Methods We analysed whole-exome sequencing data from 5777 solid tumours, spanning 19 cancer types from The Cancer Genome Atlas. We compared the proportion and number of indels across the cohort, with a subset of results replicated in two independent datasets. We assessed in-silico tumour-specific neoantigen predictions by mutation type with pan-cancer analysis, together with RNAseq profiling in renal clear cell carcinoma cases (n=392), to compare immune gene expression across patient subgroups. Associations between indel burden and treatment response were assessed across four checkpoint inhibitor datasets. Findings We observed renal cell carcinomas to have the highest proportion (0·12) and number of indel mutations across the pan-cancer cohort (p −16 ), more than double the median proportion of indel mutations in all other cancer types examined. Analysis of tumour-specific neoantigens showed that enrichment of indel mutations for high-affinity binders was three times that of non-synonymous SNV mutations. Furthermore, neoantigens derived from indel mutations were nine times enriched for mutant specific binding, as compared with non-synonymous SNV derived neoantigens. Immune gene expression analysis in the renal clear cell carcinoma cohort showed that the presence of mutant-specific neoantigens was associated with upregulation of antigen presentation genes, which correlated (r=0·78) with T-cell activation as measured by CD8-positive expression. Finally, analysis of checkpoint inhibitor response data revealed frameshift indel count to be significantly associated with checkpoint inhibitor response across three separate melanoma cohorts (p=4·7 × 10⁻⁴). Interpretation Renal cell carcinomas have the highest pan-cancer proportion and number of indel mutations. Evidence suggests indels are a highly immunogenic mutational class, which can trigger an increased abundance of neoantigens and greater mutant-binding specificity. Funding Cancer Research UK, UK National Institute for Health Research (NIHR) at the Royal Marsden Hospital National Health Service Foundation Trust, Institute of Cancer Research and University College London Hospitals Biomedical Research Centres, the UK Medical Research Council, the Rosetrees Trust, Novo Nordisk Foundation, the Prostate Cancer Foundation, the Breast Cancer Research Foundation, the European Research Council.

666 citations


Journal ArticleDOI
14 Dec 2017-Cell
TL;DR: It is shown that the ubiquitously expressed transcription factor Yin Yang 1 (YY1) contributes to enhancer-promoter structural interactions in a manner analogous to DNA interactions mediated by CTCF.

Journal ArticleDOI
13 Mar 2017-Oncogene
TL;DR: In this paper, a survey of p53 target genes is presented, and the results show that high-confidence p53 targets are involved in multiple cellular responses, including cell cycle arrest, DNA repair, apoptosis, metabolism, autophagy, mRNA translation and feedback mechanisms.
Abstract: The tumor suppressor p53 functions primarily as a transcription factor. Mutation of the TP53 gene alters its response pathway, and is central to the development of many cancers. The discovery of a large number of p53 target genes, which confer p53's tumor suppressor function, has led to increasingly complex models of p53 function. Recent meta-analysis approaches, however, are simplifying our understanding of how p53 functions as a transcription factor. In the survey presented here, a total set of 3661 direct p53 target genes is identified that comprise 3509 potential targets from 13 high-throughput studies, and 346 target genes from individual gene analyses. Comparison of the p53 target genes reported in individual studies with those identified in 13 high-throughput studies reveals limited consistency. Here, p53 target genes have been evaluated based on the meta-analysis data, and the results show that high-confidence p53 target genes are involved in multiple cellular responses, including cell cycle arrest, DNA repair, apoptosis, metabolism, autophagy, mRNA translation and feedback mechanisms. However, many p53 target genes are identified only in a small number of studies and have a higher likelihood of being false positives. While numerous mechanisms have been proposed for mediating gene regulation in response to p53, recent advances in our understanding of p53 function show that p53 itself is solely an activator of transcription, and gene downregulation by p53 is indirect and requires p21. Taking into account the function of p53 as an activator of transcription, recent results point to an unsophisticated means of regulation.
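The counts quoted in the abstract allow a quick consistency check. Assuming the total of 3661 is the union of the two sets (an interpretation, not something the abstract states explicitly), inclusion-exclusion gives the number of genes reported by both kinds of study. The variable names below are illustrative.

```python
# Counts taken from the abstract above.
total_targets = 3661             # all direct p53 target genes in the survey
high_throughput_targets = 3509   # potential targets from 13 high-throughput studies
individual_study_targets = 346   # targets from individual gene analyses

# Inclusion-exclusion: |A ∪ B| = |A| + |B| - |A ∩ B|, solved for the overlap.
# Assumes total_targets is the union of the two sets.
implied_overlap = high_throughput_targets + individual_study_targets - total_targets

# Share of individually studied targets also seen in a high-throughput study.
share_confirmed = implied_overlap / individual_study_targets

print(f"implied overlap: {implied_overlap} genes")
print(f"share of individual-study targets in the overlap: {share_confirmed:.0%}")
```

Under that assumption the overlap is 194 genes, so a large share of the individually reported targets do not appear in any of the high-throughput sets, in line with the limited consistency the abstract describes.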

Journal ArticleDOI
Aldo Scarpa, David K. Chang, Katia Nones, Vincenzo Corbo, Ann-Marie Patch, Peter Bailey, Rita T. Lawlor, Amber L. Johns, David Miller, Andrea Mafficini, Borislav Rusev, Maria Scardoni, Davide Antonello, Stefano Barbi, Katarzyna O. Sikora, Sara Cingarlini, Caterina Vicentini, Skye McKay, Michael C.J. Quinn, Timothy J. C. Bruxner, Angelika N. Christ, Ivon Harliwong, Senel Idrisoglu, Suzanne McLean, Craig Nourse, Ehsan Nourbakhsh, Peter J. Wilson, Matthew J. Anderson, J. Lynn Fink, Felicity Newell, Nick Waddell, Oliver Holmes, Stephen H. Kazakoff, Conrad Leonard, Scott Wood, Qinying Xu, Shivashankar H. Nagaraj, Eliana Amato, Irene Dalai, Samantha Bersani, Ivana Cataldo, Angelo Paolo Dei Tos, Paola Capelli, Maria Vittoria Davì, Luca Landoni, Anna Malpaga, Marco Miotto, Vicki L. J. Whitehall, Barbara A. Leggett, Janelle L. Harris, Jonathan M. Harris, Marc D. Jones, Jeremy L. Humphris, Lorraine A. Chantrill, Venessa T. Chin, Adnan Nagrial, Marina Pajic, Christopher J. Scarlett, Andreia V. Pinho, Ilse Rooman, Christopher W. Toon, Jianmin Wu, Mark Pinese, Mark J. Cowley, Andrew Barbour, Amanda Mawson, Emily S. Humphrey, Emily K. Colvin, Angela Chou, Jessica A. Lovell, Nigel B. Jamieson, Fraser Duthie, Marie-Claude Gingras, William E. Fisher, Rebecca A. Dagg, Loretta Lau, Michael Lee, Hilda A. Pickett, Roger R. Reddel, Jaswinder S. Samra, James G. Kench, Neil D. Merrett, Krishna Epari, Nam Q. Nguyen, Nikolajs Zeps, Massimo Falconi, Michele Simbolo, Giovanni Butturini, George Van Buren, Stefano Partelli, Matteo Fassan, Kum Kum Khanna, Anthony J. Gill, David A. Wheeler, Richard A. Gibbs, Elizabeth A. Musgrove, Claudio Bassi, Giampaolo Tortora, Paolo Pederzoli, John V. Pearson, Nicola Waddell, Andrew V. Biankin, Sean M. Grimmond
02 Mar 2017-Nature
TL;DR: In this paper, the authors performed whole-genome sequencing of 102 primary pancreatic neuroendocrine tumours and defined the genomic events that characterize their pathogenesis, including a deficiency in G:C > T:A base excision repair due to inactivation of MUTYH, which encodes a DNA glycosylase.
Abstract: The diagnosis of pancreatic neuroendocrine tumours (PanNETs) is increasing owing to more sensitive detection methods, and this increase is creating challenges for clinical management. We performed whole-genome sequencing of 102 primary PanNETs and defined the genomic events that characterize their pathogenesis. Here we describe the mutational signatures they harbour, including a deficiency in G:C > T:A base excision repair due to inactivation of MUTYH, which encodes a DNA glycosylase. Clinically sporadic PanNETs contain a larger-than-expected proportion of germline mutations, including previously unreported mutations in the DNA repair genes MUTYH, CHEK2 and BRCA2. Together with mutations in MEN1 and VHL, these mutations occur in 17% of patients. Somatic mutations, including point mutations and gene fusions, were commonly found in genes involved in four main pathways: chromatin remodelling, DNA damage repair, activation of mTOR signalling (including previously undescribed EWSR1 gene fusions), and telomere maintenance. In addition, our gene expression analyses identified a subgroup of tumours associated with hypoxia and HIF signalling.

Journal ArticleDOI
TL;DR: The RNA expression and protein sequencing assay (REAP-seq) is used to assess the costimulatory effects of a CD27 agonist on human CD8+ lymphocytes and to identify and characterize an unknown cell type.
Abstract: We present a tool to measure gene and protein expression levels in single cells with DNA-labeled antibodies and droplet microfluidics. Using the RNA expression and protein sequencing assay (REAP-seq), we quantified proteins with 82 barcoded antibodies and >20,000 genes in a single workflow. We used REAP-seq to assess the costimulatory effects of a CD27 agonist on human CD8+ lymphocytes and to identify and characterize an unknown cell type.

Journal ArticleDOI
TL;DR: A model is emerging whereby lncRNA bridges DNA and protein by binding to chromatin and serving as a scaffold for modifying protein complexes, and can bridge promoters to enhancers or enhancer-like non-coding genes by regulating chromatin looping.

Journal ArticleDOI
TL;DR: The generation of the highly selective BCL-2 inhibitor venetoclax, which is now approved in the United States for the treatment of patients with chronic lymphocytic leukaemia with 17p deletion who have received at least one prior therapy, is reviewed.
Abstract: The B cell lymphoma 2 (BCL-2) family of proteins has a key role in regulating apoptosis and is often dysregulated in cancer. This has led to the development of several inhibitors of pro-survival BCL-2 family proteins such as BCL-2, BCL-XL and MCL1, including the BCL-2 inhibitor venetoclax, which has recently gained regulatory approval. Here, Ashkenazi and colleagues discuss the latest progress in developing small-molecule inhibitors of pro-survival BCL-2 family proteins.

Journal ArticleDOI
TL;DR: CjCas9, delivered via AAV, induces targeted mutations at high frequencies in mouse muscle cells or retinal pigment epithelium (RPE) cells, and reduces the size of laser-induced choroidal neovascularization, suggesting that in vivo genome editing with CjCas9 is a new option for the treatment of age-related macular degeneration.
Abstract: Several CRISPR-Cas9 orthologues have been used for genome editing. Here, we present the smallest Cas9 orthologue characterized to date, derived from Campylobacter jejuni (CjCas9), for efficient genome editing in vivo. After determining protospacer-adjacent motif (PAM) sequences and optimizing single-guide RNA (sgRNA) length, we package the CjCas9 gene, its sgRNA sequence, and a marker gene in an all-in-one adeno-associated virus (AAV) vector and produce the resulting virus at a high titer. CjCas9 is highly specific, cleaving only a limited number of sites in the human or mouse genome. CjCas9, delivered via AAV, induces targeted mutations at high frequencies in mouse muscle cells or retinal pigment epithelium (RPE) cells. Furthermore, CjCas9 targeted to the Vegfa or Hif1a gene in RPE cells reduces the size of laser-induced choroidal neovascularization, suggesting that in vivo genome editing with CjCas9 is a new option for the treatment of age-related macular degeneration.

Journal ArticleDOI
TL;DR: The first attempts to study the whole transcriptome began in the early 1990s, and technological advances since the late 1990s have made transcriptomics a widespread discipline. Transcriptomic analysis has enabled the study of how gene expression changes in different organisms and has been instrumental in the understanding of human disease.
Abstract: Transcriptomics technologies are the techniques used to study an organism’s transcriptome, the sum of all of its RNA transcripts. The information content of an organism is recorded in the DNA of its genome and expressed through transcription. Here, mRNA serves as a transient intermediary molecule in the information network, whilst noncoding RNAs perform additional diverse functions. A transcriptome captures a snapshot in time of the total transcripts present in a cell. The first attempts to study the whole transcriptome began in the early 1990s, and technological advances since the late 1990s have made transcriptomics a widespread discipline. Transcriptomics has been defined by repeated technological innovations that transform the field. There are two key contemporary techniques in the field: microarrays, which quantify a set of predetermined sequences, and RNA sequencing (RNA-Seq), which uses high-throughput sequencing to capture all sequences. Measuring the expression of an organism’s genes in different tissues, conditions, or time points gives information on how genes are regulated and reveals details of an organism’s biology. It can also help to infer the functions of previously unannotated genes. Transcriptomic analysis has enabled the study of how gene expression changes in different organisms and has been instrumental in the understanding of human disease. An analysis of gene expression in its entirety allows detection of broad coordinated trends which cannot be discerned by more targeted assays.

Journal ArticleDOI
02 Mar 2017-Nature
TL;DR: It is shown that, in mouse embryonic stem cells, Dnmt3b-dependent intragenic DNA methylation protects the gene body from spurious RNA polymerase II entry and cryptic transcription initiation, with implications for intragenic hypomethylation in cancer.
Abstract: In mammals, DNA methylation occurs mainly at CpG dinucleotides. Methylation of the promoter suppresses gene expression, but the functional role of gene-body DNA methylation in highly expressed genes has yet to be clarified. Here we show that, in mouse embryonic stem cells, Dnmt3b-dependent intragenic DNA methylation protects the gene body from spurious RNA polymerase II entry and cryptic transcription initiation. Using different genome-wide approaches, we demonstrate that this Dnmt3b function is dependent on its enzymatic activity and recruitment to the gene body by H3K36me3. Furthermore, the spurious transcripts can either be degraded by the RNA exosome complex or capped, polyadenylated, and delivered to the ribosome to produce aberrant proteins. Elongating RNA polymerase II therefore triggers an epigenetic crosstalk mechanism that involves SetD2, H3K36me3, Dnmt3b and DNA methylation to ensure the fidelity of gene transcription initiation, with implications for intragenic hypomethylation in cancer.

Journal ArticleDOI
23 Feb 2017-Cell
TL;DR: In this paper, a gene essentiality dataset across 14 human acute myeloid leukemia (AML) cell lines was generated by using genome-wide CRISPR-based screens, which revealed new gene relationships, the essential substrates of enzymes and the molecular functions of uncharacterized proteins.

Journal ArticleDOI
TL;DR: Cell type scores calculated from these genes are concordant with flow cytometry and IHC readings, show high reproducibility in replicate RNA samples from FFPE tissue and enable detailed analyses of the anti-tumor immune response in TCGA.
Abstract: Assays of the abundance of immune cell populations in the tumor microenvironment promise to inform immune oncology research and the choice of immunotherapy for individual patients. We propose to measure the intratumoral abundance of various immune cell populations with gene expression. In contrast to IHC and flow cytometry, gene expression assays yield high information content from a clinically practical workflow. Previous studies of gene expression in purified immune cells have reported hundreds of genes showing enrichment in a single cell type, but the utility of these genes in tumor samples is unknown. We use co-expression patterns in large tumor gene expression datasets to evaluate previously reported candidate cell type marker genes lists, eliminate numerous false positives and identify a subset of high confidence marker genes. Using a novel statistical tool, we use co-expression patterns in 9986 samples from The Cancer Genome Atlas (TCGA) to evaluate previously reported cell type marker genes. We compare immune cell scores derived from these genes to measurements from flow cytometry and immunohistochemistry. We characterize the reproducibility of our cell scores in replicate runs of RNA extracted from FFPE tumor tissue. We identify a list of 60 marker genes whose expression levels measure 14 immune cell populations. Cell type scores calculated from these genes are concordant with flow cytometry and IHC readings, show high reproducibility in replicate RNA samples from FFPE tissue and enable detailed analyses of the anti-tumor immune response in TCGA. In an immunotherapy dataset, they separate responders and non-responders early on therapy and provide an intricate picture of the effects of checkpoint inhibition. Most genes previously reported to be enriched in a single cell type have co-expression patterns inconsistent with cell type specificity. 
Due to their concise gene set, computational simplicity and utility in tumor samples, these cell type gene signatures may be useful in future discovery research and clinical trials to understand how tumors and therapeutic intervention shape the immune response.
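A marker-gene cell type score of the kind described above can be sketched as an average over a small gene set. The example below assumes the score is the mean log2 expression of a cell type's marker genes, with a pseudocount of 1 to handle zero counts; the exact scoring formula is not given in the abstract, and the gene names and expression values are hypothetical placeholders.

```python
from math import log2


def cell_type_score(expression, marker_genes, pseudocount=1.0):
    """Mean log2 expression over the marker genes measured in one sample."""
    values = [
        log2(expression[gene] + pseudocount)
        for gene in marker_genes
        if gene in expression
    ]
    if not values:
        raise ValueError("none of the marker genes were measured")
    return sum(values) / len(values)


# Hypothetical sample: gene -> normalized expression (invented values).
sample = {"MARKER_A": 7.0, "MARKER_B": 15.0, "OTHER": 100.0}

# Hypothetical marker list for one immune cell population.
score = cell_type_score(sample, ["MARKER_A", "MARKER_B"])
print(f"cell type score: {score:.2f}")
```

Averaging on the log scale keeps a single highly expressed marker from dominating the score, which is one reason a short, well-validated marker list can be enough.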

Journal ArticleDOI
02 Mar 2017-Nature
TL;DR: It is shown that ecDNA was found in nearly half of human cancers; its frequency varied by tumour type, but it was almost never found in normal cells, and the results suggest that ecDNA contributes to accelerated evolution in cancer.
Abstract: Human cells have twenty-three pairs of chromosomes. In cancer, however, genes can be amplified in chromosomes or in circular extrachromosomal DNA (ecDNA), although the frequency and functional importance of ecDNA are not understood. We performed whole-genome sequencing, structural modelling and cytogenetic analyses of 17 different cancer types, including analysis of the structure and function of chromosomes during metaphase of 2,572 dividing cells, and developed a software package called ECdetect to conduct unbiased, integrated ecDNA detection and analysis. Here we show that ecDNA was found in nearly half of human cancers; its frequency varied by tumour type, but it was almost never found in normal cells. Driver oncogenes were amplified most commonly in ecDNA, thereby increasing transcript level. Mathematical modelling predicted that ecDNA amplification would increase oncogene copy number and intratumoural heterogeneity more effectively than chromosomal amplification. We validated these predictions by quantitative analyses of cancer samples. The results presented here suggest that ecDNA contributes to accelerated evolution in cancer.

Journal ArticleDOI
TL;DR: Mouse DUX and human DUX4 are proposed as major drivers of the cleavage or 2C state; mouse Dux expression converts embryonic stem cells into 2C-like cells whose chromatin landscape strongly resembles that of mouse 2C embryos.
Abstract: To better understand transcriptional regulation during human oogenesis and preimplantation development, we defined stage-specific transcription, which highlighted the cleavage stage as being highly distinctive. Here, we present multiple lines of evidence that a eutherian-specific multicopy retrogene, DUX4, encodes a transcription factor that activates hundreds of endogenous genes (for example, ZSCAN4, KDM4E and PRAMEF-family genes) and retroviral elements (MERVL/HERVL family) that define the cleavage-specific transcriptional programs in humans and mice. Remarkably, mouse Dux expression is both necessary and sufficient to convert mouse embryonic stem cells (mESCs) into 2-cell-embryo-like ('2C-like') cells, measured here by the reactivation of '2C' genes and repeat elements, the loss of POU5F1 (also known as OCT4) protein and chromocenters, and the conversion of the chromatin landscape (as assessed by transposase-accessible chromatin using sequencing (ATAC-seq)) to a state strongly resembling that of mouse 2C embryos. Thus, we propose mouse DUX and human DUX4 as major drivers of the cleavage or 2C state.

Journal ArticleDOI
13 Jul 2017-Cell
TL;DR: The genomes of malaria parasites contain many genes of unknown function and the level of genetic redundancy in a single-celled organism may reflect the degree of environmental variation it experiences, which helps rationalize both the relative successes of drugs and the greater difficulty of making an effective vaccine.

Journal ArticleDOI
TL;DR: The protein coding regions of 2,735 mutant lines of tetraploid and hexaploid wheat were sequenced and a public database including more than 10 million mutations was developed, enabling rapid identification of mutations in the different copies of the wheat genes.
Abstract: Comprehensive reverse genetic resources, which have been key to understanding gene function in diploid model organisms, are missing in many polyploid crops. Young polyploid species such as wheat, which was domesticated less than 10,000 years ago, have high levels of sequence identity among subgenomes that mask the effects of recessive alleles. Such redundancy reduces the probability of selection of favorable mutations during natural or human selection, but also allows wheat to tolerate high densities of induced mutations. Here we exploited this property to sequence and catalog more than 10 million mutations in the protein-coding regions of 2,735 mutant lines of tetraploid and hexaploid wheat. We detected, on average, 2,705 and 5,351 mutations per tetraploid and hexaploid line, respectively, which resulted in 35-40 mutations per kb in each population. With these mutation densities, we identified an average of 23-24 missense and truncation alleles per gene, with at least one truncation or deleterious missense mutation in more than 90% of the captured wheat genes per population. This public collection of mutant seed stocks and sequence data enables rapid identification of mutations in the different copies of the wheat genes, which can be combined to uncover previously hidden variation. Polyploidy is a central phenomenon in plant evolution, and many crop species have undergone recent genome duplication events. Therefore, the general strategy and methods developed herein can benefit other polyploid crops.

Journal ArticleDOI
23 Mar 2017-Nature
TL;DR: It is demonstrated that KZFPs partner with transposable elements to build a largely species-restricted layer of epigenetic regulation and exploit evolutionarily conserved fragments of transposable elements as regulatory platforms long after the arms race against these genetic invaders has ended.
Abstract: The human genome encodes some 350 Kruppel-associated box (KRAB) domain-containing zinc-finger proteins (KZFPs), the products of a rapidly evolving gene family that has been traced back to early tetrapods. The function of most KZFPs is unknown, but a few have been demonstrated to repress transposable elements in embryonic stem (ES) cells by recruiting the transcriptional regulator TRIM28 and associated mediators of histone H3 Lys9 trimethylation (H3K9me3)-dependent heterochromatin formation and DNA methylation. Depletion of TRIM28 in human or mouse ES cells triggers the upregulation of a broad range of transposable elements, and recent data based on a few specific examples have pointed to an arms race between hosts and transposable elements as an important driver of KZFP gene selection. Here, to obtain a global view of this phenomenon, we combined phylogenetic and genomic studies to investigate the evolutionary emergence of KZFP genes in vertebrates and to identify their targets in the human genome. First, we unexpectedly reassigned the root of the family to a common ancestor of coelacanths and tetrapods. Second, although we confirmed that the majority of KZFPs bind transposable elements and pinpoint cases of ongoing co-evolution, we found that most of their transposable element targets have lost all transposition potential. Third, by examining the interplay between human KZFPs and other transcriptional modulators, we obtained evidence that KZFPs exploit evolutionarily conserved fragments of transposable elements as regulatory platforms long after the arms race against these genetic invaders has ended. Together, our results demonstrate that KZFPs partner with transposable elements to build a largely species-restricted layer of epigenetic regulation.

Journal ArticleDOI
TL;DR: Transcriptome sequencing is shown to molecularly diagnose 10% of mitochondriopathy patients and to identify candidate genes for the remainder, and examples of intronic loss-of-function variants with pathological relevance are provided.
Abstract: Across a variety of Mendelian disorders, ∼50-75% of patients do not receive a genetic diagnosis by exome sequencing, indicating disease-causing variants in non-coding regions. Although genome sequencing in principle reveals all genetic variants, their sizeable number and poorer annotation make prioritization challenging. Here, we demonstrate the power of transcriptome sequencing to molecularly diagnose 10% (5 of 48) of mitochondriopathy patients and identify candidate genes for the remainder. We find a median of one aberrantly expressed gene, five aberrant splicing events and six mono-allelically expressed rare variants in patient-derived fibroblasts and establish disease-causing roles for each kind. Private exons often arise from cryptic splice sites, providing an important clue for variant prioritization. One such event is found in the complex I assembly factor TIMMDC1, establishing a novel disease-associated gene. In conclusion, our study expands the diagnostic tools for detecting non-exonic variants and provides examples of intronic loss-of-function variants with pathological relevance.

Journal ArticleDOI
TL;DR: 3'-UTRs seem to be major players in gene regulation that enable local functions, compartmentalization, and cooperativity, which makes them important tools for the regulation of phenotypic diversity of higher organisms.
Abstract: 3′-untranslated regions (3′-UTRs) are the noncoding parts of mRNAs. Compared to yeast, in humans, median 3′-UTR length has expanded approximately tenfold alongside an increased generation of alternative 3′-UTR isoforms. In contrast, the number of coding genes, as well as coding region length, has remained similar. This suggests an important role for 3′-UTRs in the biology of higher organisms. 3′-UTRs are best known to regulate diverse fates of mRNAs, including degradation, translation, and localization, but they can also function like long noncoding or small RNAs, as has been shown for whole 3′-UTRs as well as for cleaved fragments. Furthermore, 3′-UTRs determine the fate of proteins through the regulation of protein–protein interactions. They facilitate cotranslational protein complex formation, which establishes a role for 3′-UTRs as evolved eukaryotic operons. Whereas bacterial operons promote the interaction of two subunits, 3′-UTRs enable the formation of protein complexes with diverse compositions. ...

Journal ArticleDOI
TL;DR: It is shown that H3K27ac HiChIP generates high-resolution contact maps of active enhancers and target genes in rare primary human T cell subtypes and coronary artery smooth muscle cells, providing a principled means of assigning molecular functions to autoimmune and cardiovascular disease risk variants.
Abstract: The challenge of linking intergenic mutations to target genes has limited molecular understanding of human diseases. Here we show that H3K27ac HiChIP generates high-resolution contact maps of active enhancers and target genes in rare primary human T cell subtypes and coronary artery smooth muscle cells. Differentiation of naive T cells into T helper 17 cells or regulatory T cells creates subtype-specific enhancer-promoter interactions, specifically at regions of shared DNA accessibility. These data provide a principled means of assigning molecular functions to autoimmune and cardiovascular disease risk variants, linking hundreds of noncoding variants to putative gene targets. Target genes identified with HiChIP are further supported by CRISPR interference and activation at linked enhancers, by the presence of expression quantitative trait loci, and by allele-specific enhancer loops in patient-derived primary cells. The majority of disease-associated enhancers contact genes beyond the nearest gene in the linear genome, leading to a fourfold increase in the number of potential target genes for autoimmune and cardiovascular diseases.

Journal ArticleDOI
06 Apr 2017-Cell
TL;DR: Analysis of chromatin conformation during Drosophila embryogenesis offers insight into when spatial genome organization is first established during development and identifies a key factor that helps trigger this architecture.