Showing papers in "Genome Research in 2014"

Open Access

Journal Article

Highly efficient RNA-guided genome editing in human cells via delivery of purified Cas9 ribonucleoproteins

Sojung Kim¹, Daesik Kim¹, Seung Woo Cho¹, Jung-Eun Kim¹, Jin-Soo Kim¹

01 Jun 2014-Genome Research

TL;DR: Purified recombinant Cas9 protein and guide RNA are delivered into cultured human cells, including hard-to-transfect fibroblasts and pluripotent stem cells; the resulting RGEN ribonucleoproteins (RNPs) induce site-specific mutations at frequencies of up to 79% while reducing the off-target mutations associated with plasmid transfection.

Abstract: RNA-guided engineered nucleases (RGENs) derived from the prokaryotic adaptive immune system known as CRISPR (clustered, regularly interspaced, short palindromic repeat)/Cas (CRISPR-associated) enable genome editing in human cell lines, animals, and plants, but are limited by off-target effects and unwanted integration of DNA segments derived from plasmids encoding Cas9 and guide RNA at both on-target and off-target sites in the genome. Here, we deliver purified recombinant Cas9 protein and guide RNA into cultured human cells including hard-to-transfect fibroblasts and pluripotent stem cells. RGEN ribonucleoproteins (RNPs) induce site-specific mutations at frequencies of up to 79%, while reducing off-target mutations associated with plasmid transfection at off-target sites that differ by one or two nucleotides from on-target sites. RGEN RNPs cleave chromosomal DNA almost immediately after delivery and are degraded rapidly in cells, reducing off-target effects. Furthermore, RNP delivery is less stressful to human embryonic stem cells, producing at least twofold more colonies than does plasmid transfection.

1,526 citations

Journal Article

Analysis of off-target effects of CRISPR/Cas-derived RNA-guided endonucleases and nickases

Seung Woo Cho¹, Sojung Kim¹, Yongsub Kim¹, Jiyeon Kweon¹, Heon Seok Kim¹, Sangsu Bae¹, Jin-Soo Kim¹

Seoul National University¹

01 Jan 2014-Genome Research

TL;DR: Off-target effects of RGENs can be reduced below the detection limits of deep sequencing by choosing unique target sequences in the genome and modifying both guide RNA and Cas9, and paired nickases induced chromosomal deletions in a targeted manner without causing unwanted translocations.

Abstract: RNA-guided endonucleases (RGENs), derived from the prokaryotic adaptive immune system known as CRISPR/Cas, enable targeted genome engineering in cells and organisms. RGENs are ribonucleoproteins that consist of guide RNA and Cas9, a protein component originating from Streptococcus pyogenes. These enzymes cleave chromosomal DNA whose sequence is complementary to the guide RNA in a targeted manner, producing site-specific DNA double-strand breaks (DSBs), the repair of which gives rise to targeted genome modifications. Despite broad interest in RGEN-mediated genome editing, these nucleases are limited by off-target mutations and unwanted chromosomal translocations associated with off-target DNA cleavages. Here, we show that off-target effects of RGENs can be reduced below the detection limits of deep sequencing by choosing unique target sequences in the genome and modifying both guide RNA and Cas9. We found that both the composition and structure of guide RNA can affect RGEN activities in cells to reduce off-target effects. RGENs efficiently discriminated on-target sites from off-target sites that differ by two bases. Furthermore, exome sequencing analysis showed that no off-target mutations were induced by two RGENs in four clonal populations of mutant cells. In addition, paired Cas9 nickases, composed of D10A Cas9 and guide RNA, which generate two single-strand breaks (SSBs) or nicks on different DNA strands, were highly specific in human cells, avoiding off-target mutations without sacrificing genome-editing efficiency. Interestingly, paired nickases induced chromosomal deletions in a targeted manner without causing unwanted translocations. Our results highlight the importance of choosing unique target sequences and optimizing guide RNA and Cas9 to avoid or reduce RGEN-induced off-target mutations.
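
The abstract's notion of off-target sites that "differ by two bases" from the on-target site can be read as a mismatch (Hamming) count between two equal-length sequences. The following is a minimal sketch of that count only, not the authors' analysis pipeline; the function name `mismatches` and both 20-nt sequences are hypothetical and are not taken from the paper.

```python
def mismatches(on_target: str, candidate: str) -> int:
    """Count positions at which two equal-length DNA sequences differ."""
    if len(on_target) != len(candidate):
        raise ValueError("sequences must have the same length")
    # Compare case-insensitively, position by position.
    return sum(a != b for a, b in zip(on_target.upper(), candidate.upper()))


# Hypothetical 20-nt protospacer and a candidate site (illustration only).
ON_TARGET = "GACCTTGAGCATCGGATCCA"
CANDIDATE = "GACGTTGAGCATCGCATCCA"  # differs at two positions

print(mismatches(ON_TARGET, CANDIDATE))
```

A site with a small mismatch count relative to the intended target is the kind of sequence the abstract describes as a potential off-target site; choosing a target with no such near matches elsewhere in the genome is what "unique target sequences" refers to.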

1,332 citations

Journal Article

Efficient de novo assembly of highly heterozygous genomes from whole-genome shotgun short reads

Rei Kajitani¹, Kouta Toshimoto¹, Hideki Noguchi², Atsushi Toyoda², Yoshitoshi Ogura³, Miki Okuno¹, Mitsuru Yabana¹, Masayuki Harada¹, Eiji Nagayasu³, Haruhiko Maruyama³, Yuji Kohara², Asao Fujiyama², Tetsuya Hayashi³, Takehiko Itoh¹

Tokyo Institute of Technology¹, National Institute of Genetics², University of Miyazaki³

22 Apr 2014-Genome Research

TL;DR: Platanus provides a novel and efficient approach for the assembly of gigabase-sized highly heterozygous genomes and is an attractive alternative to the existing assemblers designed for genomes of lower heterozygosity.

Abstract: Although many de novo genome assembly projects have recently been conducted using high-throughput sequencers, assembling highly heterozygous diploid genomes is a substantial challenge due to the increased complexity of the de Bruijn graph structure predominantly used. To address the increasing demand for sequencing of nonmodel and/or wild-type samples, in most cases inbred lines or fosmid-based hierarchical sequencing methods are used to overcome such problems. However, these methods are costly and time consuming, forfeiting the advantages of massive parallel sequencing. Here, we describe a novel de novo assembler, Platanus, that can effectively manage high-throughput data from heterozygous samples. Platanus assembles DNA fragments (reads) into contigs by constructing de Bruijn graphs with automatically optimized k-mer sizes followed by the scaffolding of contigs based on paired-end information. The complicated graph structures that result from the heterozygosity are simplified during not only the contig assembly step but also the scaffolding step. We evaluated the assembly results on eukaryotic samples with various levels of heterozygosity. Compared with other assemblers, Platanus yields assembly results that have a larger scaffold NG50 length without any accompanying loss of accuracy in both simulated and real data. In addition, Platanus recorded the largest scaffold NG50 values for two of the three low-heterozygosity species used in the de novo assembly contest, Assemblathon 2. Platanus therefore provides a novel and efficient approach for the assembly of gigabase-sized highly heterozygous genomes and is an attractive alternative to the existing assemblers designed for genomes of lower heterozygosity.
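
The scaffold NG50 metric used in the comparison above can be illustrated with a short calculation: NG50 is the length of the scaffold at which the running total of scaffold lengths, taken from longest to shortest, first reaches half of the estimated genome size. This is a minimal sketch under that standard definition, not code from Platanus; the function name `ng50` and the example lengths are hypothetical.

```python
def ng50(scaffold_lengths, genome_size):
    """Return the NG50 of a set of scaffolds.

    Scaffolds are visited from longest to shortest; the NG50 is the length
    of the scaffold at which the running total first reaches half of the
    (estimated) genome size. Returns 0 if the scaffolds never cover half
    of the genome.
    """
    half = genome_size / 2
    running_total = 0
    for length in sorted(scaffold_lengths, reverse=True):
        running_total += length
        if running_total >= half:
            return length
    return 0


# Hypothetical scaffold lengths (bp) and genome size, for illustration only.
scaffolds = [400, 300, 200, 100]
print(ng50(scaffolds, genome_size=1200))
```

Unlike N50, which uses the total assembly length as the reference, NG50 uses the genome size, so assemblies of different total length can be compared on the same footing.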

924 citations

Journal Article

Tn5 transposase and tagmentation procedures for massively scaled sequencing projects

Simone Picelli¹, Åsa K. Björklund¹, Björn Reinius¹,², Sven Sagasser¹,², Gösta Winberg¹,², Rickard Sandberg¹,²

Ludwig Institute for Cancer Research¹, Karolinska Institutet²

01 Dec 2014-Genome Research

TL;DR: This work presents simple and robust procedures for Tn5 transposase production and optimized reaction conditions for tagmentation-based sequencing library construction and shows how molecular crowding agents both modulate library lengths and enable efficient tagmentation from subpicogram amounts of cDNA.

Abstract: Massively parallel DNA sequencing of thousands of samples in a single machine-run is now possible, but the preparation of the individual sequencing libraries is expensive and time-consuming. Tagmentation-based library construction, using the Tn5 transposase, is efficient for generating sequencing libraries but currently relies on undisclosed reagents, which severely limits development of novel applications and the execution of large-scale projects. Here, we present simple and robust procedures for Tn5 transposase production and optimized reaction conditions for tagmentation-based sequencing library construction. We further show how molecular crowding agents both modulate library lengths and enable efficient tagmentation from subpicogram amounts of cDNA. The comparison of single-cell RNA-sequencing libraries generated using produced and commercial Tn5 demonstrated equal performances in terms of gene detection and library characteristics. Finally, because naked Tn5 can be annealed to any oligonucleotide of choice, for example, molecular barcodes in single-cell assays or methylated oligonucleotides for bisulfite sequencing, custom Tn5 production and tagmentation enable innovation in sequencing-based applications.

690 citations

Journal Article

Gene disruption by cell-penetrating peptide-mediated delivery of Cas9 protein and guide RNA

Suresh Ramakrishna¹, Abu-Bonsrah Kwaku Dad¹, Jagadish Beloor¹, Ramu Gopalappa¹, Sang-Kyung Lee¹, Hyongbum Kim¹

Hanyang University¹

01 Jun 2014-Genome Research

TL;DR: This work shows that simple treatment with cell-penetrating peptide (CPP)-conjugated recombinant Cas9 protein and CPP-complexed guide RNAs leads to endogenous gene disruptions in human cell lines, and envisages that this method will facilitate RGEN-directed genome editing.

Abstract: RNA-guided endonucleases (RGENs) derived from the CRISPR/Cas system represent an efficient tool for genome editing. RGENs consist of two components: Cas9 protein and guide RNA. Plasmid-mediated delivery of these components into cells can result in uncontrolled integration of the plasmid sequence into the host genome, and unwanted immune responses and potential safety problems that can be caused by the bacterial sequences. Furthermore, this delivery method requires transfection tools. Here we show that simple treatment with cell-penetrating peptide (CPP)–conjugated recombinant Cas9 protein and CPP-complexed guide RNAs leads to endogenous gene disruptions in human cell lines. The Cas9 protein was conjugated to CPP via a thioether bond, whereas the guide RNA was complexed with CPP, forming condensed, positively charged nanoparticles. Simultaneous and sequential treatment of human cells, including embryonic stem cells, dermal fibroblasts, HEK293T cells, HeLa cells, and embryonic carcinoma cells, with the modified Cas9 and guide RNA, leads to efficient gene disruptions with reduced off-target mutations relative to plasmid transfections, resulting in the generation of clones containing RGEN-induced mutations. Our CPP-mediated RGEN delivery process provides a plasmid-free and additional transfection reagent–free method to use this tool with reduced off-target effects. We envision that our method will facilitate RGEN-directed genome editing.

654 citations

Journal Article

Characterizing the genetic basis of transcriptome diversity through RNA-sequencing of 922 individuals

Alexis Battle¹, Sara Mostafavi¹, Xiaowei Zhu¹, James B. Potash², Myrna M. Weissman³, Courtney McCormick⁴, Christian D. Haudenschild, Kenneth B. Beckman⁵, Jianxin Shi, Rui Mei, Alexander E. Urban¹, Stephen B. Montgomery¹, Douglas F. Levinson¹, Daphne Koller¹

Stanford University¹, University of Iowa Hospitals and Clinics², Columbia University³, Illumina⁴, University of Minnesota⁵

01 Jan 2014-Genome Research

TL;DR: This work provides a direct window into the regulatory consequences of genetic variation by sequencing RNA from 922 genotyped individuals, and presents a comprehensive description of the distribution of regulatory variation--by the specific expression phenotypes altered, the properties of affected genes, and the genomic characteristics of regulatory variants.

Abstract: Understanding the consequences of regulatory variation in the human genome remains a major challenge, with important implications for understanding gene regulation and interpreting the many disease-risk variants that fall outside of protein-coding regions. Here, we provide a direct window into the regulatory consequences of genetic variation by sequencing RNA from 922 genotyped individuals. We present a comprehensive description of the distribution of regulatory variation--by the specific expression phenotypes altered, the properties of affected genes, and the genomic characteristics of regulatory variants. We detect variants influencing expression of over ten thousand genes, and through the enhanced resolution offered by RNA-sequencing, for the first time we identify thousands of variants associated with specific phenotypes including splicing and allelic expression. Evaluating the effects of both long-range intra-chromosomal and trans (cross-chromosomal) regulation, we observe modularity in the regulatory network, with three-dimensional chromosomal configuration playing a particular role in regulatory modules within each chromosome. We also observe a significant depletion of regulatory variants affecting central and critical genes, along with a trend of reduced effect sizes as variant frequency increases, providing evidence that purifying selection and buffering have limited the deleterious impact of regulatory variation on the cell. Further, generalizing beyond observed variants, we have analyzed the genomic properties of variants associated with expression and splicing and developed a Bayesian model to predict regulatory consequences of genetic variants, applicable to the interpretation of individual genomes and disease studies. Together, these results represent a critical step toward characterizing the complete landscape of human regulatory variation.

577 citations

Journal Article

Natural variation in genome architecture among 205 Drosophila melanogaster Genetic Reference Panel lines

Wen Huang¹, Andreas Massouras²,³, Yutaka Inoue⁴, Jason A. Peiffer¹, Miquel Ràmia⁵, Aaron M. Tarone⁶, Lavanya Turlapati¹, Thomas Zichner⁷, Dianhui Zhu⁸, Richard F. Lyman¹, Michael M. Magwire¹, Kerstin P. Blankenburg⁸, Mary Anna Carbone¹, Kyle Chang⁸, Lisa L. Ellis⁶, Sonia Fernandez⁸, Yi Han⁸, Gareth Highnam⁹, Carl E. Hjelmen⁶, John Jack¹, Mehwish Javaid⁸, Joy Jayaseelan⁸, Divya Kalra⁸, Sandy Lee⁸, Lora Lewis⁸, Mala Munidasa⁸, Fiona Ongeri⁸, Shohba Patel⁸, Lora Perales⁸, Agapito Perez⁸, Ling-Ling Pu⁸, Stephanie M. Rollmann¹, Robert Ruth⁸, Nehad Saada⁸, Crystal B. Warner⁸, Aneisa Williams⁸, Yuanqing Wu⁸, Akihiko Yamamoto¹, Yiqing Zhang⁸, Yiming Zhu⁸, Robert R. H. Anholt¹, Jan O. Korbel⁷, David Mittelman⁹, Donna M. Muzny⁸, Richard A. Gibbs⁸, Antonio Barbadilla⁵, J. Spencer Johnston⁶, Eric A. Stone¹, Stephen Richards⁸, Bart Deplancke²,³, Trudy F. C. Mackay¹

North Carolina State University¹, École Polytechnique Fédérale de Lausanne², Swiss Institute of Bioinformatics³, Osaka University⁴, Autonomous University of Barcelona⁵, Texas A&M University⁶, European Bioinformatics Institute⁷, Baylor College of Medicine⁸, Virginia Bioinformatics Institute⁹

01 Jul 2014-Genome Research

TL;DR: An integrated genotyping strategy identified 4,853,802 single nucleotide polymorphisms (SNPs) and 1,296,080 non-SNP variants in the DGRP, and cytogenetic analysis identified 16 polymorphic inversions; variation in genome size and many quantitative traits is significantly associated with inversions.

Abstract: The Drosophila melanogaster Genetic Reference Panel (DGRP) is a community resource of 205 sequenced inbred lines, derived to improve our understanding of the effects of naturally occurring genetic variation on molecular and organismal phenotypes. We used an integrated genotyping strategy to identify 4,853,802 single nucleotide polymorphisms (SNPs) and 1,296,080 non-SNP variants. Our molecular population genomic analyses show higher deletion than insertion mutation rates and stronger purifying selection on deletions. Weaker selection on insertions than deletions is consistent with our observed distribution of genome size determined by flow cytometry, which is skewed toward larger genomes. Insertion/deletion and single nucleotide polymorphisms are positively correlated with each other and with local recombination, suggesting that their nonrandom distributions are due to hitchhiking and background selection. Our cytogenetic analysis identified 16 polymorphic inversions in the DGRP. Common inverted and standard karyotypes are genetically divergent and account for most of the variation in relatedness among the DGRP lines. Intriguingly, variation in genome size and many quantitative traits are significantly associated with inversions. Approximately 50% of the DGRP lines are infected with Wolbachia, and four lines have germline insertions of Wolbachia sequences, but effects of Wolbachia infection on quantitative traits are rarely significant. The DGRP complements ongoing efforts to functionally annotate the Drosophila genome. Indeed, 15% of all D. melanogaster genes segregate for potentially damaged proteins in the DGRP, and genome-wide analyses of quantitative traits identify novel candidate genes. The DGRP lines, sequence data, genotypes, quality scores, phenotypes, and analysis and visualization tools are publicly available.

569 citations

Journal Article

Highly efficient CRISPR/Cas9-mediated knock-in in zebrafish by homology-independent DNA repair

Thomas O. Auer¹, Karine Duroure¹,²,³, Anne De Cian²,³, Jean-Paul Concordet²,³, Filippo Del Bene¹,²,³

Curie Institute¹, Centre national de la recherche scientifique², French Institute of Health and Medical Research³

01 Jan 2014-Genome Research

TL;DR: CRISPR/Cas9-mediated knock-in of DNA cassettes into the zebrafish genome at a very high rate by homology-independent double-strand break (DSB) repair pathways is reported and the possibility of easily targeting DNA integration at endogenous loci is shown, thus greatly facilitating the creation of reporter and loss-of-function alleles.

Abstract: Sequence-specific nucleases like TALENs and the CRISPR/Cas9 system have greatly expanded the genome editing possibilities in model organisms such as zebrafish. Both systems have recently been used to create knock-out alleles with great efficiency, and TALENs have also been successfully employed in knock-in of DNA cassettes at defined loci via homologous recombination (HR). Here we report CRISPR/Cas9-mediated knock-in of DNA cassettes into the zebrafish genome at a very high rate by homology-independent double-strand break (DSB) repair pathways. After co-injection of a donor plasmid with a short guide RNA (sgRNA) and Cas9 nuclease mRNA, concurrent cleavage of donor plasmid DNA and the selected chromosomal integration site resulted in efficient targeted integration of donor DNA. We successfully employed this approach to convert eGFP into Gal4 transgenic lines, and the same plasmids and sgRNAs can be applied in any species where eGFP lines were generated as part of enhancer and gene trap screens. In addition, we show the possibility of easily targeting DNA integration at endogenous loci, thus greatly facilitating the creation of reporter and loss-of-function alleles. Due to its simplicity, flexibility, and very high efficiency, our method greatly expands the repertoire for genome editing in zebrafish and can be readily adapted to many other organisms.

565 citations

Journal Article

Neo-antigens predicted by tumor genome meta-analysis correlate with increased patient survival

Scott D. Brown¹,², René L. Warren¹, Ewan A. Gibb¹,², Spencer D. Martin¹,², John J. Spinelli¹,², Brad H. Nelson¹,²,³, Robert A. Holt¹,²,⁴

BC Cancer Agency¹, University of British Columbia², University of Victoria³, Simon Fraser University⁴

01 May 2014-Genome Research

TL;DR: For 515 patients from six tumor sites, RNA-seq data from The Cancer Genome Atlas were used to identify mutations predicted to be immunogenic, in that they yielded mutational epitopes presented by the MHC proteins encoded by each patient's autologous HLA-A alleles; these mutational epitopes were associated with increased patient survival.

Abstract: Somatic missense mutations can initiate tumorigenesis and, conversely, anti-tumor cytotoxic T cell (CTL) responses. Tumor genome analysis has revealed extreme heterogeneity among tumor missense mutation profiles, but their relevance to tumor immunology and patient outcomes has awaited comprehensive evaluation. Here, for 515 patients from six tumor sites, we used RNA-seq data from The Cancer Genome Atlas to identify mutations that are predicted to be immunogenic in that they yielded mutational epitopes presented by the MHC proteins encoded by each patient’s autologous HLA-A alleles. Mutational epitopes were associated with increased patient survival. Moreover, the corresponding tumors had higher CTL content, inferred from CD8A gene expression, and elevated expression of the CTL exhaustion markers PDCD1 and CTLA4. Mutational epitopes were very scarce in tumors without evidence of CTL infiltration. These findings suggest that the abundance of predicted immunogenic mutations may be useful for identifying patients likely to benefit from checkpoint blockade and related immunotherapies.

547 citations

Journal Article

Widespread intron retention in mammals functionally tunes transcriptomes

Ulrich Braunschweig¹, Nuno L. Barbosa-Morais¹,², Qun Pan¹, Emil N. Nachman¹, Babak Alipanahi¹, Thomas Gonatopoulos-Pournatzis¹, Brendan J. Frey¹, Manuel Irimia¹, Benjamin J. Blencowe¹

University of Toronto¹, Instituto de Medicina Molecular²

01 Nov 2014-Genome Research

TL;DR: It is shown that intron retention acts widely to reduce the levels of transcripts that are less or not required for the physiology of the cell or tissue type in which they are detected, and this "transcriptome tuning" function of IR acts through both nonsense-mediated mRNA decay and nuclear sequestration and turnover of IR transcripts.

Abstract: Alternative splicing (AS) of precursor RNAs is responsible for greatly expanding the regulatory and functional capacity of eukaryotic genomes. Of the different classes of AS, intron retention (IR) is the least well understood. In plants and unicellular eukaryotes, IR is the most common form of AS, whereas in animals, it is thought to represent the least prevalent form. Using high-coverage poly(A)+ RNA-seq data, we observe that IR is surprisingly frequent in mammals, affecting transcripts from as many as three-quarters of multiexonic genes. A highly correlated set of cis features comprising an “IR code” reliably discriminates retained from constitutively spliced introns. We show that IR acts widely to reduce the levels of transcripts that are less or not required for the physiology of the cell or tissue type in which they are detected. This “transcriptome tuning” function of IR acts through both nonsense-mediated mRNA decay and nuclear sequestration and turnover of IR transcripts. We further show that IR is linked to a cross-talk mechanism involving localized stalling of RNA polymerase II (Pol II) and reduced availability of spliceosomal components. Collectively, the results implicate a global checkpoint-type mechanism whereby reduced recruitment of splicing components coupled to Pol II pausing underlies widespread IR-mediated suppression of inappropriately expressed transcripts.

547 citations

Journal Article

From single-cell to cell-pool transcriptomes: Stochasticity in gene expression and RNA splicing

Georgi K. Marinov¹, Brian A. Williams¹, Kenneth McCue¹, Gary P. Schroth², Jason Gertz, Richard M. Myers, Barbara J. Wold¹

California Institute of Technology¹, Illumina²

01 Mar 2014-Genome Research

TL;DR: The SMART-seq single-cell RNA-seq protocol is applied to study the reference lymphoblastoid cell line GM12878, and it is shown that transcriptomes from small pools of 30-100 cells approach the information content and reproducibility of contemporary RNA-seq from large amounts of input material.

Abstract: Single-cell RNA-seq mammalian transcriptome studies are at an early stage in uncovering cell-to-cell variation in gene expression, transcript processing and editing, and regulatory module activity. Despite great progress recently, substantial challenges remain, including discriminating biological variation from technical noise. Here we apply the SMART-seq single-cell RNA-seq protocol to study the reference lymphoblastoid cell line GM12878. By using spike-in quantification standards, we estimate the absolute number of RNA molecules per cell for each gene and find significant variation in total mRNA content: between 50,000 and 300,000 transcripts per cell. We directly measure technical stochasticity by a pool/split design and find that there are significant differences in expression between individual cells, over and above technical variation. Specific gene coexpression modules were preferentially expressed in subsets of individual cells, including one enriched for mRNA processing and splicing factors. We assess cell-to-cell variation in alternative splicing and allelic bias and report evidence of significant differences in splice site usage that exceed splice variation in the pool/split comparison. Finally, we show that transcriptomes from small pools of 30–100 cells approach the information content and reproducibility of contemporary RNA-seq from large amounts of input material. Together, our results define an experimental and computational path forward for analyzing gene expression in rare cell types and cell states.

Journal Article

A-to-I RNA editing occurs at over a hundred million genomic sites, located in a majority of human genes

Lily Bazak¹, Ami Haviv¹, Michal Barak¹, Jasmine Jacob-Hirsch¹,², Patricia Deng³, Rui Zhang³, Farren J. Isaacs⁴, Gideon Rechavi²,⁵, Jin Billy Li³, Eli Eisenberg⁵, Erez Y. Levanon¹

Bar-Ilan University¹, Sheba Medical Center², Stanford University³, Yale University⁴, Tel Aviv University⁵

01 Mar 2014-Genome Research

TL;DR: Virtually all adenosines within Alu repeats that form double-stranded RNA are found to undergo A-to-I editing, although most sites exhibit editing at only low levels; editing of transcripts resulting from residual antisense expression doubles the number of edited sites in the human genome.

Abstract: RNA molecules transmit the information encoded in the genome and generally reflect its content. Adenosine-to-inosine (A-to-I) RNA editing by ADAR proteins converts a genomically encoded adenosine into inosine. It is known that most RNA editing in human takes place in the primate-specific Alu sequences, but the extent of this phenomenon and its effect on transcriptome diversity are not yet clear. Here, we analyzed large-scale RNA-seq data and detected ∼1.6 million editing sites. As detection sensitivity increases with sequencing coverage, we performed ultradeep sequencing of selected Alu sequences and showed that the scope of editing is much larger than anticipated. We found that virtually all adenosines within Alu repeats that form double-stranded RNA undergo A-to-I editing, although most sites exhibit editing at only low levels (<1%). Moreover, using high coverage sequencing, we observed editing of transcripts resulting from residual antisense expression, doubling the number of edited sites in the human genome. Based on bioinformatic analyses and deep targeted sequencing, we estimate that there are over 100 million human Alu RNA editing sites, located in the majority of human genes. These findings set the stage for exploring how this primate-specific massive diversification of the transcriptome is utilized.

Journal Article

Statistical confidence estimation for Hi-C data reveals regulatory chromatin contacts

Ferhat Ay¹, Timothy L. Bailey², William Stafford Noble¹

University of Washington¹, University of Queensland²

05 Feb 2014-Genome Research

TL;DR: Fit-Hi-C is described, a method that assigns statistical confidence estimates to mid-range intra-chromosomal contacts by jointly modeling the random polymer looping effect and previously observed technical biases in Hi-C data sets and shows that insulators and heterochromatin regions are hubs for high-confidence contacts, while promoters and strong enhancers are involved in fewer contacts.

Abstract: Our current understanding of how DNA is packed in the nucleus is most accurate at the fine scale of individual nucleosomes and at the large scale of chromosome territories. However, accurate modeling of DNA architecture at the intermediate scale of ∼50 kb-10 Mb is crucial for identifying functional interactions among regulatory elements and their target promoters. We describe a method, Fit-Hi-C, that assigns statistical confidence estimates to mid-range intra-chromosomal contacts by jointly modeling the random polymer looping effect and previously observed technical biases in Hi-C data sets. We demonstrate that our proposed approach computes accurate empirical null models of contact probability without any distribution assumption, corrects for binning artifacts, and provides improved statistical power relative to a previously described method. High-confidence contacts identified by Fit-Hi-C preferentially link expressed gene promoters to active enhancers identified by chromatin signatures in human embryonic stem cells (ESCs), capture 77% of RNA polymerase II-mediated enhancer-promoter interactions identified using ChIA-PET in mouse ESCs, and confirm previously validated, cell line-specific interactions in mouse cortex cells. We observe that insulators and heterochromatin regions are hubs for high-confidence contacts, while promoters and strong enhancers are involved in fewer contacts. We also observe that binding peaks of master pluripotency factors such as NANOG and POU5F1 are highly enriched in high-confidence contacts for human ESCs. Furthermore, we show that pairs of loci linked by high-confidence contacts exhibit similar replication timing in human and mouse ESCs and preferentially lie within the boundaries of topological domains for human and mouse cell lines.

Journal Article

A cloud-compatible bioinformatics pipeline for ultrarapid pathogen identification from next-generation sequencing of clinical samples

Samia N. Naccache¹, Scot Federman¹, Narayanan Veeraraghavan¹, Matei Zaharia², Deanna Lee¹, Erik Samayoa¹, Jerome Bouquet¹, Alexander L. Greninger¹, Ka Cheung Luk, Barryett Enge³, Debra A. Wadford³, Sharon Messenger³, Gillian Genrich¹, Kristen Pellegrino¹, Gilda Grard, Eric M. Leroy, Bradley S. Schneider, Joseph N. Fair, Miguel Ángel Martínez⁴, Pavel Isa⁴, John A. Crump⁵, Joseph L. DeRisi¹, Taylor Sittler¹, John Hackett, Steve Miller¹, Charles Y. Chiu¹

University of California, San Francisco¹, University of California, Berkeley², California Department of Public Health³, National Autonomous University of Mexico⁴, Duke University⁵

01 Jul 2014-Genome Research

TL;DR: This work describes SURPI, a computational pipeline for pathogen identification from complex metagenomic NGS data generated from clinical samples, and demonstrates its use in the analysis of 237 clinical samples comprising more than 1.1 billion sequences.

Abstract: Unbiased next-generation sequencing (NGS) approaches enable comprehensive pathogen detection in the clinical microbiology laboratory and have numerous applications for public health surveillance, outbreak investigation, and the diagnosis of infectious diseases. However, practical deployment of the technology is hindered by the bioinformatics challenge of analyzing results accurately and in a clinically relevant timeframe. Here we describe SURPI ("sequence-based ultrarapid pathogen identification"), a computational pipeline for pathogen identification from complex metagenomic NGS data generated from clinical samples, and demonstrate use of the pipeline in the analysis of 237 clinical samples comprising more than 1.1 billion sequences. Deployable on both cloud-based and standalone servers, SURPI leverages two state-of-the-art aligners for accelerated analyses, SNAP and RAPSearch, which are as accurate as existing bioinformatics tools but orders of magnitude faster in performance. In fast mode, SURPI detects viruses and bacteria by scanning data sets of 7-500 million reads in 11 min to 5 h, while in comprehensive mode, all known microorganisms are identified, followed by de novo assembly and protein homology searches for divergent viruses in 50 min to 16 h. SURPI has also directly contributed to real-time microbial diagnosis in acutely ill patients, underscoring its potential key role in the development of unbiased NGS-based clinical assays in infectious diseases that demand rapid turnaround times.

Journal Article

Functional and topological characteristics of mammalian regulatory domains

Orsolya Symmons, Veli Vural Uslu, Taro Tsujimura, Sandra Ruf, Sonya Nassari, Wibke Schwarzer, Laurence Ettwiller¹, François Spitz

Heidelberg University¹

07 Jan 2014-Genome Research

TL;DR: A large operational analysis charting the distribution of gene regulatory activities along the mouse genome, using hundreds of insertions of a regulatory sensor, finds that enhancers distribute their activities along broad regions and not in a gene-centric manner, defining large regulatory domains.

Abstract: Long-range regulatory interactions play an important role in shaping gene-expression programs. However, the genomic features that organize these activities are still poorly characterized. We conducted a large operational analysis to chart the distribution of gene regulatory activities along the mouse genome, using hundreds of insertions of a regulatory sensor. We found that enhancers distribute their activities along broad regions and not in a gene-centric manner, defining large regulatory domains. Remarkably, these domains correlate strongly with the recently described TADs, which partition the genome into distinct self-interacting blocks. Different features, including specific repeats and CTCF-binding sites, correlate with the transition zones separating regulatory domains, and may help to further organize promiscuously distributed regulatory influences within large domains. These findings support a model of genomic organization where TADs confine regulatory activities to specific but large regulatory domains, contributing to the establishment of specific gene expression profiles.

Journal Article

Widespread contribution of transposable elements to the innovation of gene regulatory networks

Vasavi Sundaram¹, Yong Cheng², Zhihai Ma², Daofeng Li¹, Xiaoyun Xing¹, Peter Edge³, Michael Snyder², Ting Wang¹

Washington University in St. Louis¹, Stanford University², University of Minnesota³

01 Dec 2014-Genome Research

TL;DR: Transposable elements have significantly and continuously shaped gene regulatory networks during mammalian evolution, and are an important driving force for regulatory innovation.

Abstract: Transposable elements (TEs) have been shown to contain functional binding sites for certain transcription factors (TFs). However, the extent to which TEs contribute to the evolution of TF binding sites is not well known. We comprehensively mapped binding sites for 26 pairs of orthologous TFs in two pairs of human and mouse cell lines (representing two cell lineages), along with epigenomic profiles, including DNA methylation and six histone modifications. Overall, we found that 20% of binding sites were embedded within TEs. This number varied across different TFs, ranging from 2% to 40%. We further identified 710 TF–TE relationships in which genomic copies of a TE subfamily contributed a significant number of binding peaks for a TF, and we found that LTR elements dominated these relationships in human. Importantly, TE-derived binding peaks were strongly associated with open and active chromatin signatures, including reduced DNA methylation and increased enhancer-associated histone marks. On average, 66% of TE-derived binding events were cell type-specific with a cell type-specific epigenetic landscape. Most of the binding sites contributed by TEs were species-specific, but we also identified binding sites conserved between human and mouse, the functional relevance of which was supported by a signature of purifying selection on DNA sequences of these TEs. Interestingly, several TFs had significantly expanded binding site landscapes only in one species, which were linked to species-specific gene functions, suggesting that TEs are an important driving force for regulatory innovation. Taken together, our data suggest that TEs have significantly and continuously shaped gene regulatory networks during mammalian evolution.

Journal Article

Seamless gene correction of β-thalassemia mutations in patient-specific iPSCs using CRISPR/Cas9 and piggyBac

Fei Xie¹, Lin Ye¹, Judy C. Chang¹, Ashley I. Beyer², Jiaming Wang¹, Marcus O. Muench¹, Yuet Wai Kan¹

University of California, San Francisco¹, Systems Research Institute²

01 Sep 2014-Genome Research

TL;DR: This study provides an effective approach to correct HBB mutations without leaving any genetic footprint in patient-derived iPSCs, thereby demonstrating a critical step toward the future application of stem cell-based gene therapy to monogenic diseases.

Abstract: β-thalassemia, one of the most common genetic diseases worldwide, is caused by mutations in the human hemoglobin beta (HBB) gene. Creation of human induced pluripotent stem cells (iPSCs) from β-thalassemia patients could offer an approach to cure this disease. Correction of the disease-causing mutations in iPSCs could restore normal function and provide a rich source of cells for transplantation. In this study, we used the latest gene-editing tool, CRISPR/Cas9 technology, combined with the piggyBac transposon to efficiently correct the HBB mutations in patient-derived iPSCs without leaving any residual footprint. No off-target effects were detected in the corrected iPSCs, and the cells retain full pluripotency and exhibit normal karyotypes. When differentiated into erythroblasts using a monolayer culture, gene-corrected iPSCs restored expression of HBB compared to the parental iPSC line. Our study provides an effective approach to correct HBB mutations without leaving any genetic footprint in patient-derived iPSCs, thereby demonstrating a critical step toward the future application of stem cell-based gene therapy to monogenic diseases.

Journal Article

Genome-wide analysis of HPV integration in human cancers reveals recurrent, focal genomic instability

Keiko Akagi¹, Jingfeng Li¹, Tatevik Broutian¹, Hesed Padilla-Nash, Weihong Xiao¹, Bo Jiang¹, James W. Rocco²,³, Theodoros N. Teknos⁴, Bhavna Kumar⁴, Danny Wangsa, Dandan He¹, Thomas Ried, David E. Symer, Maura L. Gillison¹

Ohio State University¹, Harvard University², Massachusetts Eye and Ear Infirmary³, The Ohio State University Wexner Medical Center⁴

01 Feb 2014-Genome Research

TL;DR: This work presents a model of "looping" by which HPV integrant-mediated DNA replication and recombination may result in viral-host DNA concatemers, frequently disrupting genes involved in oncogenesis and amplifying HPV oncogenes E6 and E7.

Abstract: Genomic instability is a hallmark of human cancers, including the 5% caused by human papillomavirus (HPV). Here we report a striking association between HPV integration and adjacent host genomic structural variation in human cancer cell lines and primary tumors. Whole-genome sequencing revealed HPV integrants flanking and bridging extensive host genomic amplifications and rearrangements, including deletions, inversions, and chromosomal translocations. We present a model of "looping" by which HPV integrant-mediated DNA replication and recombination may result in viral-host DNA concatemers, frequently disrupting genes involved in oncogenesis and amplifying HPV oncogenes E6 and E7. Our high-resolution results shed new light on a catastrophic process, distinct from chromothripsis and other mutational processes, by which HPV directly promotes genomic instability.

Journal Article

Analysis of 41 plant genomes supports a wave of successful genome duplications in association with the Cretaceous-Paleogene boundary

Kevin Vanneste¹, Guy Baele², Steven Maere¹, Yves Van de Peer³

Ghent University¹, Katholieke Universiteit Leuven², University of Pretoria³

01 Aug 2014-Genome Research

TL;DR: It is argued that considering the evolutionary potential of polyploids in light of the environmental and ecological conditions present around the time of polyploidization could mitigate the stark contrast in the proposed evolutionary fates of polyploids.

Abstract: Ancient whole-genome duplications (WGDs), also referred to as paleopolyploidizations, have been reported in most evolutionary lineages. Their attributed role remains a major topic of discussion, ranging from an evolutionary dead end to a road toward evolutionary success, with evidence supporting both fates. Previously, based on dating WGDs in a limited number of plant species, we found a clustering of angiosperm paleopolyploidizations around the Cretaceous-Paleogene (K-Pg) extinction event about 66 million years ago. Here we revisit this finding, which has proven controversial, by combining genome sequence information for many more plant lineages and using more sophisticated analyses. We include 38 full genome sequences and three transcriptome assemblies in a Bayesian evolutionary analysis framework that incorporates uncorrelated relaxed clock methods and fossil uncertainty. In accordance with earlier findings, we demonstrate a strongly nonrandom pattern of genome duplications over time with many WGDs clustering around the K-Pg boundary. We interpret these results in the context of recent studies on invasive polyploid plant species, and suggest that polyploid establishment is promoted during times of environmental stress. We argue that considering the evolutionary potential of polyploids in light of the environmental and ecological conditions present around the time of polyploidization could mitigate the stark contrast in the proposed evolutionary fates of polyploids.

Journal Article

TITAN: inference of copy number architectures in clonal cell populations from tumor whole-genome sequence data

Gavin Ha¹,², Andrew Roth¹,², Jaswinder Khattra¹, Julie Ho, Damian Yap¹, Leah M Prentice, Nataliya Melnyk, Andrew McPherson¹,², Ali Bashashati¹, Emma Laks¹, Justina Biele¹, Jiarui Ding¹,², Alan Le¹, Jamie Rosner¹, Karey Shumansky¹, Marco A. Marra¹, C. Blake Gilks³, David G. Huntsman², Jessica N. McAlpine², Samuel Aparicio¹,², Sohrab P. Shah¹,²

BC Cancer Agency¹, University of British Columbia², Vancouver General Hospital³

01 Nov 2014-Genome Research

TL;DR: A novel probabilistic model, TITAN, is presented to infer CNA and LOH events while accounting for mixtures of cell populations, thereby estimating the proportion of cells harboring each event.

Abstract: The evolution of cancer genomes within a single tumor creates mixed cell populations with divergent somatic mutational landscapes. Inference of tumor subpopulations has been disproportionately focused on the assessment of somatic point mutations, whereas computational methods targeting evolutionary dynamics of copy number alterations (CNA) and loss of heterozygosity (LOH) in whole-genome sequencing data remain underdeveloped. We present a novel probabilistic model, TITAN, to infer CNA and LOH events while accounting for mixtures of cell populations, thereby estimating the proportion of cells harboring each event. We evaluate TITAN on idealized mixtures, simulating clonal populations from whole-genome sequences taken from genomically heterogeneous ovarian tumor sites collected from the same patient. In addition, we show in 23 whole genomes of breast tumors that the inference of CNA and LOH using TITAN critically informs population structure and the nature of the evolving cancer genome. Finally, we experimentally validated subclonal predictions using fluorescence in situ hybridization (FISH) and single-cell sequencing from an ovarian cancer patient sample, thereby recapitulating the key modeling assumptions of TITAN.

Journal Article

Combinatorial effects of multiple enhancer variants in linkage disequilibrium dictate levels of gene expression to confer susceptibility to common traits.

Olivia Corradin¹, Alina Saiakhova¹, Batool Akhtar-Zaidi¹, Lois Myeroff¹, Joseph Willis¹, Richard Cowper-Sal·lari², Mathieu Lupien², Sanford D. Markowitz¹, Peter C. Scacheri¹

Case Western Reserve University¹, University Health Network²

01 Jan 2014-Genome Research

TL;DR: Evidence is provided that for six common autoimmune disorders, the GWAS association arises from multiple polymorphisms in LD that map to clusters of enhancer elements active in the same cell type, which suggests a "multiple enhancer variant" hypothesis for common traits.

Abstract: DNA variants (SNPs) that predispose to common traits often localize within noncoding regulatory elements such as enhancers. Moreover, loci identified by genome-wide association studies (GWAS) often contain multiple SNPs in linkage disequilibrium (LD), any of which may be causal. Thus, determining the effect of these multiple variant SNPs on target transcript levels has been a major challenge. Here, we provide evidence that for six common autoimmune disorders (rheumatoid arthritis, Crohn's disease, celiac disease, multiple sclerosis, lupus, and ulcerative colitis), the GWAS association arises from multiple polymorphisms in LD that map to clusters of enhancer elements active in the same cell type. This finding suggests a "multiple enhancer variant" hypothesis for common traits, where several variants in LD impact multiple enhancers and cooperatively affect gene expression. Using a novel method to delineate enhancer-gene interactions, we show that multiple enhancer variants within a given locus typically target the same gene. Using available data from HapMap and B lymphoblasts as a model system, we provide evidence at numerous loci that multiple enhancer variants cooperatively contribute to altered expression of their gene targets. The effects on target transcript levels tend to be modest and can be either gain- or loss-of-function. Additionally, the genes associated with multiple enhancer variants encode proteins that are often functionally related and enriched in common pathways. Overall, the multiple enhancer variant hypothesis offers a new paradigm by which noncoding variants can confer susceptibility to common traits.

Journal Article

DNA copy number analysis of fresh and formalin-fixed specimens by shallow whole-genome sequencing with identification and exclusion of problematic regions in the genome assembly

Ilari Scheinin¹, Daoud Sie², Henrik Bengtsson³, Mark A. van de Wiel², Adam B. Olshen³, Hinke F. van Thuijl², Hendrik F. van Essen², Paul P. Eijk², François Rustenburg², Gerrit A. Meijer², Jaap C. Reijneveld², Pieter Wesseling², Daniel Pinkel³, Donna G. Albertson⁴, Bauke Ylstra²

Helsinki University Central Hospital¹, VU University Medical Center², University of California, San Francisco³, New York University⁴

01 Dec 2014-Genome Research

TL;DR: This work improves on previous methods by first implementing a combined correction for sequence mappability and GC content, and second, by applying this procedure to sequence data from the 1000 Genomes Project in order to develop a blacklist of problematic genome regions.

Abstract: Detection of DNA copy number aberrations by shallow whole-genome sequencing (WGS) faces many challenges, including lack of completion and errors in the human reference genome, repetitive sequences, polymorphisms, variable sample quality, and biases in the sequencing procedures. Formalin-fixed paraffin-embedded (FFPE) archival material, the analysis of which is important for studies of cancer, presents particular analytical difficulties due to degradation of the DNA and frequent lack of matched reference samples. We present a robust, cost-effective WGS method for DNA copy number analysis that addresses these challenges more successfully than currently available procedures. In practice, very useful profiles can be obtained with ∼0.1× genome coverage. We improve on previous methods by first implementing a combined correction for sequence mappability and GC content, and second, by applying this procedure to sequence data from the 1000 Genomes Project in order to develop a blacklist of problematic genome regions. A small subset of these blacklisted regions was previously identified by ENCODE, but the vast majority are novel unappreciated problematic regions. Our procedures are implemented in a pipeline called QDNAseq. We have analyzed over 1000 samples, most of which were obtained from the fixed tissue archives of more than 25 institutions. We demonstrate that for most samples our sequencing and analysis procedures yield genome profiles with noise levels near the statistical limit imposed by read counting. The described procedures also provide better correction of artifacts introduced by low DNA quality than prior approaches and better copy number data than high-resolution microarrays at a substantially lower cost.

Journal Article

Evolutionary dynamics and tissue specificity of human long noncoding RNAs in six mammals

Stefan Washietl¹, Manolis Kellis¹,², Manuel Garber²,³

Massachusetts Institute of Technology¹, Broad Institute², University of Massachusetts Medical School³

15 Jan 2014-Genome Research

TL;DR: It is found that ∼20% of human lincRNAs are not expressed beyond chimpanzee and are undetectable even in rhesus, while abundant splice-site turnover among mammalian-expressed lincRNAs suggests that exact splice sites are not critical.

Abstract: Long intergenic noncoding RNAs (lincRNAs) play diverse regulatory roles in human development and disease, but little is known about their evolutionary history and constraint. Here, we characterize human lincRNA expression patterns in nine tissues across six mammalian species and multiple individuals. Of the 1898 human lincRNAs expressed in these tissues, we find orthologous transcripts for 80% in chimpanzee, 63% in rhesus, 39% in cow, 38% in mouse, and 35% in rat. Mammalian-expressed lincRNAs show remarkably strong conservation of tissue specificity, suggesting that it is selectively maintained. In contrast, abundant splice-site turnover suggests that exact splice sites are not critical. Relative to evolutionarily young lincRNAs, mammalian-expressed lincRNAs show higher primary sequence conservation in their promoters and exons, increased proximity to protein-coding genes enriched for tissue-specific functions, fewer repeat elements, and more frequent single-exon transcripts. Remarkably, we find that ∼20% of human lincRNAs are not expressed beyond chimpanzee and are undetectable even in rhesus. These hominid-specific lincRNAs are more tissue specific, enriched for testis, and faster evolving within the human lineage.

Journal Article

Improved exome prioritization of disease genes through cross-species phenotype comparison.

Peter N. Robinson¹, Sebastian Köhler¹, Anika Oellrich², Sanger Mouse Genetics², Kai Wang³, Christopher J. Mungall⁴, Suzanna E. Lewis⁴, Nicole L. Washington⁴, Sebastian Bauer¹,⁵, Dominik Seelow¹, Peter Krawitz¹,⁵, Christian Gilissen⁶, Melissa A. Haendel⁷, Damian Smedley²

Charité¹, Wellcome Trust Sanger Institute², University of Southern California³, Lawrence Berkeley National Laboratory⁴, Max Planck Society⁵, Radboud University Nijmegen Medical Centre⁶, Oregon Health & Science University⁷

01 Feb 2014-Genome Research

TL;DR: It is concluded that incorporation of phenotype data can play a vital role in translational bioinformatics, and it is proposed that exome sequencing projects should systematically capture clinical phenotypes to take advantage of the strategy presented here.

Abstract: Numerous new disease-gene associations have been identified by whole-exome sequencing studies in the last few years. However, many cases remain unsolved due to the sheer number of candidate variants remaining after common filtering strategies such as removing low quality and common variants and those deemed unlikely to be pathogenic. The observation that each of our genomes contains about 100 genuine loss-of-function variants makes identification of the causative mutation problematic when using these strategies alone. We propose using the wealth of genotype to phenotype data that already exists from model organism studies to assess the potential impact of these exome variants. Here, we introduce PHenotypic Interpretation of Variants in Exomes (PHIVE), an algorithm that integrates the calculation of phenotype similarity between human diseases and genetically modified mouse models with evaluation of the variants according to allele frequency, pathogenicity, and mode of inheritance approaches in our Exomiser tool. Large-scale validation of PHIVE analysis using 100,000 exomes containing known mutations demonstrated a substantial improvement (up to 54.1-fold) over purely variant-based (frequency and pathogenicity) methods with the correct gene recalled as the top hit in up to 83% of samples, corresponding to an area under the ROC curve of >95%. We conclude that incorporation of phenotype data can play a vital role in translational bioinformatics and propose that exome sequencing projects should systematically capture clinical phenotypes to take advantage of the strategy presented here.

Journal Article

Epigenetic modification and inheritance in sexual reversal of fish

Changwei Shao, Qiye Li, Songlin Chen, Pei Zhang, Jinmin Lian, Qiaomu Hu, Bing Sun, Lijun Jin, Shanshan Liu, Zongji Wang¹, Hongmei Zhao, Zonghui Jin, Zhuo Liang, Yangzhen Li, Qiumei Zheng, Yong Zhang, Jun Wang²,³, Guojie Zhang²

South China University of Technology¹, University of Copenhagen², King Abdulaziz University³

01 Apr 2014-Genome Research

TL;DR: It is concluded that epigenetic regulation plays multiple crucial roles in sexual reversal of tongue sole fish, and the first clues on the mechanisms behind gene dosage balancing in an organism that undergoes sexual reversal are offered.

Abstract: Environmental sex determination (ESD) occurs in divergent, phylogenetically unrelated taxa, and in some species, co-occurs with genetic sex determination (GSD) mechanisms. Although epigenetic regulation in response to environmental effects has long been proposed to be associated with ESD, a systemic analysis on epigenetic regulation of ESD is still lacking. Using half-smooth tongue sole (Cynoglossus semilaevis) as a model (a marine fish that has both ZW chromosomal GSD and temperature-dependent ESD), we investigated the role of DNA methylation in transition from GSD to ESD. Comparative analysis of the gonadal DNA methylomes of pseudomale, female, and normal male fish revealed that genes in the sex determination pathways are the major targets of substantial methylation modification during sexual reversal. Methylation modification in pseudomales is globally inherited in their ZW offspring, which can naturally develop into pseudomales without temperature incubation. Transcriptome analysis revealed that dosage compensation occurs in a restricted, methylated cytosine enriched Z chromosomal region in pseudomale testes, achieving equal expression level in normal male testes. In contrast, female-specific W chromosomal genes are suppressed in pseudomales by methylation regulation. We conclude that epigenetic regulation plays multiple crucial roles in sexual reversal of tongue sole fish. We also offer the first clues on the mechanisms behind gene dosage balancing in an organism that undergoes sexual reversal. Finally, we suggest a causal link between the biased sex chromosome assortment in the offspring of a pseudomale family and the transgenerational epigenetic inheritance of sexual reversal in tongue sole fish.

Journal Article

RNA-seq of 272 gliomas revealed a novel, recurrent PTPRZ1-MET fusion transcript in secondary glioblastomas

Zhaoshi Bao¹, Hui Min Chen², Mingyu Yang², Chuanbao Zhang¹, Kai Yu², Wan Lu Ye², Bo Qiang Hu², Wei Yan³, Wei Zhang¹, Johnny C. Akers⁴, Valya Ramakrishnan⁴, Jie Li⁴, Bob S. Carter⁴, Yan Wei Liu¹, Hui Min Hu, Zheng Wang¹, Mingyang Li¹, Kun Yao¹, Xiao Guang Qiu¹, Chunsheng Kang⁵, Yong ping You³, Xiao Long Fan⁶, Wei Sonya Song, Rui Qiang Li², Xiao-Dong Su², Clark C. Chen⁴, Tao Jiang

Capital Medical University¹, Peking University², Nanjing Medical University³, University of California, San Diego⁴, Tianjin Medical University General Hospital⁵, Beijing Normal University⁶

18 Aug 2014-Genome Research

TL;DR: This study profiles the shifting RNA landscape of gliomas during progression, reveals PTPRZ1-MET (ZM) as a novel, recurrent fusion transcript in sGBMs, and shows that the fusion arose from translocation events involving introns 3 or 8 of PTPRZ and intron 1 of MET.

Abstract: Studies of gene rearrangements and the consequent oncogenic fusion proteins have laid the foundation for targeted cancer therapy. To identify oncogenic fusions associated with glioma progression, we catalogued fusion transcripts by RNA-seq of 272 gliomas. Fusion transcripts were more frequently found in high-grade gliomas, in the classical subtype of gliomas, and in gliomas treated with radiation/temozolomide. Sixty-seven in-frame fusion transcripts were identified, including three recurrent fusion transcripts: FGFR3-TACC3, RNF213-SLC26A11, and PTPRZ1-MET (ZM). Interestingly, the ZM fusion was found only in grade III astrocytomas (1/13; 7.7%) or secondary GBMs (sGBMs, 3/20; 15.0%). In an independent cohort of sGBMs, the ZM fusion was found in three of 20 (15%) specimens. Genomic analysis revealed that the fusion arose from translocation events involving introns 3 or 8 of PTPRZ and intron 1 of MET. ZM fusion transcripts were found in GBMs irrespective of isocitrate dehydrogenase 1 (IDH1) mutation status. sGBMs harboring ZM fusion showed higher expression of genes required for PIK3CA signaling and lowered expression of genes that suppressed RB1 or TP53 function. Expression of the ZM fusion was mutually exclusive with EGFR overexpression in sGBMs. Exogenous expression of the ZM fusion in the U87MG glioblastoma line enhanced cell migration and invasion. Clinically, patients afflicted with ZM fusion-harboring glioblastomas survived poorly relative to those afflicted with non-ZM-harboring sGBMs (P < 0.001). Our study profiles the shifting RNA landscape of gliomas during progression and revealed ZM as a novel, recurrent fusion transcript in sGBMs.

Journal Article

Genome-wide identification of long noncoding natural antisense transcripts and their responses to light in Arabidopsis

Huan Wang¹, Pil Joong Chung¹, Jun Liu¹, In-Cheol Jang¹, Michelle J. Kean¹, Jun Xu¹, Nam-Hai Chua¹

Rockefeller University¹

08 Jan 2014-Genome Research

TL;DR: This work systematically identified long noncoding natural antisense transcripts (lncNATs), defined as lncRNAs transcribed from the opposite DNA strand of coding or noncoding genes, in Arabidopsis.

Abstract: Recent research on long noncoding RNAs (lncRNAs) has expanded our understanding of gene transcription regulation and the generation of cellular complexity. Depending on their genomic origins, lncRNAs can be transcribed from intergenic or intragenic regions or from introns of protein-coding genes. We have recently reported more than 6000 intergenic lncRNAs in Arabidopsis. Here, we systematically identified long noncoding natural antisense transcripts (lncNATs), defined as lncRNAs transcribed from the opposite DNA strand of coding or noncoding genes. We found a total of 37,238 sense-antisense transcript pairs and 70% of annotated mRNAs to be associated with antisense transcripts in Arabidopsis. These lncNATs could be reproducibly detected by different technical platforms, including strand-specific tiling arrays, Agilent custom expression arrays, strand-specific RNA-seq, and qRT-PCR experiments. Moreover, we investigated the expression profiles of sense-antisense pairs in response to light and observed spatial and developmental-specific light effects on 626 concordant and 766 discordant NAT pairs. Genes for a large number of the light-responsive NAT pairs are associated with histone modification peaks, and histone acetylation is dynamically correlated with light-responsive expression changes of NATs.

Journal Article

The effect of genotype and in utero environment on interindividual variation in neonate DNA methylomes

Ai Ling Teh¹, Hong Pan², Li Chen¹, Mei-Lyn Ong¹, Shaillay Kumar Dogra¹, Johnny Wong¹, Julia L. MacIsaac³, Sarah M Mah³, Lisa M. McEwen³, Seang-Mei Saw, Keith M. Godfrey⁴, Yap Seng Chong⁵, Kenneth Kwek⁶, Chee Keong Kwoh², Shu E Soh⁵, Mary Ff F. Chong¹,⁵, Sheila J. Barton⁴, Neerja Karnani¹, Clara Y. Cheong¹, Jan Paul Buschdorf¹, Walter Stünkel¹, Michael S. Kobor³, Michael J. Meaney⁷, Peter D. Gluckman⁸, Joanna D. Holbrook¹

Agency for Science, Technology and Research¹, Nanyang Technological University², University of British Columbia³, University Hospital Southampton NHS Foundation Trust⁴, National University of Singapore⁵, Boston Children's Hospital⁶, McGill University⁷, University of Auckland⁸

01 Jul 2014-Genome Research

TL;DR: This study surveyed the genotypes and DNA methylomes of 237 neonates and found 1423 punctuate regions of the methylome that were highly variable across individuals, termed variably methylated regions (VMRs), against a backdrop of homogeneity.

Abstract: Integrating the genotype with epigenetic marks holds the promise of better understanding the biology that underlies the complex interactions of inherited and environmental components that define the developmental origins of a range of disorders. The quality of the in utero environment significantly influences health over the lifecourse. Epigenetics, and in particular DNA methylation marks, have been postulated as a mechanism for the enduring effects of the prenatal environment. Accordingly, neonate methylomes contain molecular memory of the individual in utero experience. However, interindividual variation in methylation can also be a consequence of DNA sequence polymorphisms that result in methylation quantitative trait loci (methQTLs) and, potentially, the interaction between fixed genetic variation and environmental influences. We surveyed the genotypes and DNA methylomes of 237 neonates and found 1423 punctuate regions of the methylome that were highly variable across individuals, termed variably methylated regions (VMRs), against a backdrop of homogeneity. MethQTLs were readily detected in neonatal methylomes, and genotype alone best explained ∼25% of the VMRs. We found that the best explanation for 75% of VMRs was the interaction of genotype with different in utero environments, including maternal smoking, maternal depression, maternal BMI, infant birth weight, gestational age, and birth order. Our study sheds new light on the complex relationship between biological inheritance as represented by genotype and individual prenatal experience and suggests the importance of considering both fixed genetic variation and environmental factors in interpreting epigenetic variation.

Journal Article•DOI•

Genome-wide parent-of-origin DNA methylation analysis reveals the intricacies of human imprinting and suggests a germline methylation-independent mechanism of establishment

Franck Court, Chiharu Tayama, Valeria Romanelli, Alex Martin-Trujillo, Isabel Iglesias-Platas, Kohji Okamura, Naoko Sugahara, Carlos Simón¹, Harry Moore², Julie V. Harness³, Hans S. Keirstead³, Jose V. Sanchez-Mut, Eisuke Kaneki⁴, Pablo Lapunzina⁵, Hidenobu Soejima⁶, Norio Wake⁴, Manel Esteller⁷,⁸, Tsutomu Ogata⁹, Kenichiro Hata, Kazuhiko Nakabayashi, David Monk

University of Valencia¹, University of Sheffield², University of California, Irvine³, Kyushu University⁴, Autonomous University of Madrid⁵, Saga University⁶, University of Barcelona⁷, Catalan Institution for Research and Advanced Studies⁸, Hamamatsu University School of Medicine⁹

01 Apr 2014-Genome Research

TL;DR: Placenta-specific imprinting provides evidence for an inheritable epigenetic state that is independent of DNA methylation and points to a novel imprinting mechanism at these loci.

Abstract: Genomic imprinting is a form of epigenetic regulation that results in the expression of either the maternally or paternally inherited allele of a subset of genes (Ramowitz and Bartolomei 2011). This imprinted expression of transcripts is crucial for normal mammalian development. In humans, loss-of-imprinting of specific loci results in a number of diseases exemplified by the reciprocal growth phenotypes of the Beckwith-Wiedemann and Silver-Russell syndromes, and the behavioral disorders Angelman and Prader-Willi syndromes (Kagami et al. 2008; Buiting 2010; Choufani et al. 2010; Eggermann 2010; Kelsey 2010; Mackay and Temple 2010). In addition, aberrant imprinting also contributes to multigenic disorders associated with various complex traits and cancer (Kong et al. 2009; Monk 2010). Imprinted loci contain differentially methylated regions (DMRs) where cytosine methylation marks one of the parental alleles, providing cis-acting regulatory elements that influence the allelic expression of surrounding genes. Some DMRs acquire their allelic methylation during gametogenesis, when the two parental genomes are separated, resulting from the cooperation of the de novo methyltransferase DNMT3A and its cofactor DNMT3L (Bourc'his et al. 2001; Hata et al. 2002). These primary, or germline imprinted DMRs are stably maintained throughout somatic development, surviving the epigenetic reprogramming at the oocyte-to-embryo transition (Smallwood et al. 2011; Smith et al. 2012). To confirm that an imprinted DMR functions as an imprinting control region (ICR), disruption of the imprinted expression upon genetic deletion of that DMR, either through experimental targeting in mouse or that which occurs spontaneously in humans, is required. A subset of DMRs, known as secondary DMRs, acquire methylation during development and are regulated by nearby germline DMRs in a hierarchical fashion (Coombes et al. 2003; Lopes et al. 2003; Kagami et al. 2010). 
With the advent of large-scale, base-resolution methylation technologies, it is now possible to discriminate allelic methylation dictated by sequence variants from imprinted methylation. Yet our knowledge of the total number of imprinted DMRs in humans, and their developmental dynamics, remains incomplete, hampered by genetic heterogeneity of human samples. Here we present high-resolution mapping of human imprinted methylation. We performed whole-genome-wide bisulfite sequencing (WGBS) on leukocyte-, brain-, liver-, and placenta-derived DNA samples to identify partially methylated regions common to all tissues consistent with imprinted DMRs. We subsequently confirmed the partial methylated states in tissues using high-density methylation microarrays. The parental origin of methylation was determined by comparing microarray data for DNA samples from reciprocal genome-wide uniparental disomy (UPD) samples, in which all chromosomes are inherited from one parent (Lapunzina and Monk 2011), and androgenetic hydatidiform moles, which are created by the fertilization of an oocyte lacking a nucleus by a sperm that endoreduplicates. The use of uniparental disomies and hydatidiform moles meant that our analyses were not subjected to genotype influences, enabling us to characterize all known imprinted DMRs at base-pair resolution and to identify 21 imprinted domains, which we show are absent in mice. Lastly, we extended our analyses to determine the methylation profiles of all imprinted DMRs in sperm, stem cells derived from parthenogenetically activated metaphase-2 oocyte blastocytes (phES) (Mai et al. 2007; Harness et al. 2011), and stem cells (hES) generated from both six-cell blastomeres and the inner cell mass of blastocysts, delineating the extent of embryonic reprogramming that occurs at these loci during human development.

Journal Article•DOI•

MultiBLUP: improved SNP-based prediction for complex traits

Doug Speed¹, David J. Balding¹

University College London¹

24 Jun 2014-Genome Research

TL;DR: MultiBLUP is proposed, which extends the BLUP model to include multiple random effects, allowing greatly improved prediction when the random effects correspond to classes of SNPs with distinct effect-size variances, and is computationally very efficient.

Abstract: BLUP (best linear unbiased prediction) is widely used to predict complex traits in plant and animal breeding, and increasingly in human genetics. The BLUP mathematical model, which consists of a single random effect term, was adequate when kinships were measured from pedigrees. However, when genome-wide SNPs are used to measure kinships, the BLUP model implicitly assumes that all SNPs have the same effect-size distribution, which is a severe and unnecessary limitation. We propose MultiBLUP, which extends the BLUP model to include multiple random effects, allowing greatly improved prediction when the random effects correspond to classes of SNPs with distinct effect-size variances. The SNP classes can be specified in advance, for example, based on SNP functional annotations, and we also provide an adaptive procedure for determining a suitable partition of SNPs. We apply MultiBLUP to genome-wide association data from the Wellcome Trust Case Control Consortium (seven diseases), and from much larger studies of celiac disease and inflammatory bowel disease, finding that it consistently provides better prediction than alternative methods. Moreover, MultiBLUP is computationally very efficient; for the largest data set, which includes 12,678 individuals and 1.5 M SNPs, the total analysis can be run on a single desktop PC in less than a day and can be parallelized to run even faster. Tools to perform MultiBLUP are freely available in our software LDAK.
