Showing papers on "Genome" published in 2013


Journal ArticleDOI
TL;DR: A set of tools for Cas9-mediated genome editing via nonhomologous end joining (NHEJ) or homology-directed repair (HDR) in mammalian cells, as well as generation of modified cell lines for downstream functional studies are described.
Abstract: Targeted nucleases are powerful tools for mediating genome alteration with high precision. The RNA-guided Cas9 nuclease from the microbial clustered regularly interspaced short palindromic repeats (CRISPR) adaptive immune system can be used to facilitate efficient genome engineering in eukaryotic cells by simply specifying a 20-nt targeting sequence within its guide RNA. Here we describe a set of tools for Cas9-mediated genome editing via nonhomologous end joining (NHEJ) or homology-directed repair (HDR) in mammalian cells, as well as generation of modified cell lines for downstream functional studies. To minimize off-target cleavage, we further describe a double-nicking strategy using the Cas9 nickase mutant with paired guide RNAs. This protocol provides experimentally derived guidelines for the selection of target sites, evaluation of cleavage efficiency and analysis of off-target activity. Beginning with target design, gene modifications can be achieved within as little as 1-2 weeks, and modified clonal cell lines can be derived within 2-3 weeks.

8,663 citations
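
Note: the target-selection step mentioned in the abstract above (specifying a 20-nt targeting sequence) can be illustrated with a minimal sketch. It assumes the 5'-NGG PAM of the Streptococcus pyogenes Cas9, which the abstract does not state, and it scans only the strand it is given. The function name `find_cas9_targets` is hypothetical and is not part of the published protocol or its tools.

```python
import re

def find_cas9_targets(seq, guide_len=20):
    """Return (start, protospacer, pam) for every guide_len-nt site
    that is immediately followed by an NGG PAM on the given strand.

    Assumption: the PAM is 5'-NGG (S. pyogenes Cas9); this is not
    taken from the abstract.
    """
    seq = seq.upper()
    # A zero-width lookahead lets overlapping candidate sites all be reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]

# Illustrative input: a made-up 20-nt site followed by an AGG PAM.
example = "GATTACAGATTACAGATTAC" + "AGG" + "TT"
for start, protospacer, pam in find_cas9_targets(example):
    print(start, protospacer, pam)
```

A real design workflow would also scan the reverse complement and score candidates for off-target matches, which this sketch omits.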


Journal ArticleDOI
29 Mar 2013-Science
TL;DR: Comprehensive sequencing efforts have revealed the genomic landscapes of common forms of human cancer, which consist of a small number of “mountains” (genes altered in a high percentage of tumors) and a much larger number of “hills” (genes altered infrequently).
Abstract: Over the past decade, comprehensive sequencing efforts have revealed the genomic landscapes of common forms of human cancer. For most cancer types, this landscape consists of a small number of “mountains” (genes altered in a high percentage of tumors) and a much larger number of “hills” (genes altered infrequently). To date, these studies have revealed ~140 genes that, when altered by intragenic mutations, can promote or “drive” tumorigenesis. A typical tumor contains two to eight of these “driver gene” mutations; the remaining mutations are passengers that confer no selective growth advantage. Driver genes can be classified into 12 signaling pathways that regulate three core cellular processes: cell fate, cell survival, and genome maintenance. A better understanding of these pathways is one of the most pressing needs in basic cancer research. Even now, however, our knowledge of cancer genomes is sufficient to guide the development of more effective approaches for reducing cancer morbidity and mortality.

6,441 citations


Journal ArticleDOI
Michael S. Lawrence, Petar Stojanov, Paz Polak, Gregory V. Kryukov, Kristian Cibulskis, Andrey Sivachenko, Scott L. Carter, Chip Stewart, Craig H. Mermel, Steven A. Roberts, Adam Kiezun, Peter S. Hammerman, Aaron McKenna, Yotam Drier, Lihua Zou, Alex H. Ramos, Trevor J. Pugh, Nicolas Stransky, Elena Helman, Jaegil Kim, Carrie Sougnez, Lauren Ambrogio, Elizabeth Nickerson, Erica Shefler, Maria L. Cortes, Daniel Auclair, Gordon Saksena, Douglas Voet, Michael S. Noble, Daniel DiCara, Pei Lin, Lee Lichtenstein, David I. Heiman, Timothy Fennell, Marcin Imielinski, Bryan Hernandez, Eran Hodis, Sylvan C. Baca, Austin M. Dulak, Jens G. Lohr, Dan A. Landau, Catherine J. Wu, Jorge Melendez-Zajgla, Alfredo Hidalgo-Miranda, Amnon Koren, Steven A. McCarroll, Jaume Mora, Ryan S. Lee, Brian D. Crompton, Robert C. Onofrio, Melissa Parkin, Wendy Winckler, Kristin G. Ardlie, Stacey Gabriel, Charles W. M. Roberts, Jaclyn A. Biegel, Kimberly Stegmaier, Adam J. Bass, Levi A. Garraway, Matthew Meyerson, Todd R. Golub, Dmitry A. Gordenin, Shamil R. Sunyaev, Eric S. Lander, Gad Getz
11 Jul 2013-Nature
TL;DR: A fundamental problem with cancer genome studies is described: as the sample size increases, the list of putatively significant genes produced by current analytical methods burgeons into the hundreds and the list includes many implausible genes, suggesting extensive false-positive findings that overshadow true driver events.
Abstract: Major international projects are underway that are aimed at creating a comprehensive catalogue of all the genes responsible for the initiation and progression of cancer. These studies involve the sequencing of matched tumour-normal samples followed by mathematical analysis to identify those genes in which mutations occur more frequently than expected by random chance. Here we describe a fundamental problem with cancer genome studies: as the sample size increases, the list of putatively significant genes produced by current analytical methods burgeons into the hundreds. The list includes many implausible genes (such as those encoding olfactory receptors and the muscle protein titin), suggesting extensive false-positive findings that overshadow true driver events. We show that this problem stems largely from mutational heterogeneity and provide a novel analytical methodology, MutSigCV, for resolving the problem. We apply MutSigCV to exome sequences from 3,083 tumour-normal pairs and discover extraordinary variation in mutation frequency and spectrum within cancer types, which sheds light on mutational processes and disease aetiology, and in mutation frequency across the genome, which is strongly correlated with DNA replication timing and also with transcriptional activity. By incorporating mutational heterogeneity into the analyses, MutSigCV is able to eliminate most of the apparent artefactual findings and enable the identification of genes truly associated with cancer.

4,411 citations


Journal ArticleDOI
Kerstin Howe, Matthew D. Clark, Carlos Torroja, +171 more
25 Apr 2013-Nature
TL;DR: A high-quality sequence assembly of the zebrafish genome is generated, made up of an overlapping set of completely sequenced large-insert clones that were ordered and oriented using a high-resolution high-density meiotic map, providing a clearer understanding of key genomic features such as a unique repeat content, a scarcity of pseudogenes, an enrichment of zebrafish-specific genes on chromosome 4 and chromosomal regions that influence sex determination.
Abstract: Zebrafish have become a popular organism for the study of vertebrate gene function. The virtually transparent embryos of this species, and the ability to accelerate genetic studies by gene knockdown or overexpression, have led to the widespread use of zebrafish in the detailed investigation of vertebrate gene function and increasingly, the study of human genetic disease. However, for effective modelling of human genetic disease it is important to understand the extent to which zebrafish genes and gene structures are related to orthologous human genes. To examine this, we generated a high-quality sequence assembly of the zebrafish genome, made up of an overlapping set of completely sequenced large-insert clones that were ordered and oriented using a high-resolution high-density meiotic map. Detailed automatic and manual annotation provides evidence of more than 26,000 protein-coding genes, the largest gene set of any vertebrate so far sequenced. Comparison to the human reference genome shows that approximately 70% of human genes have at least one obvious zebrafish orthologue. In addition, the high quality of this genome assembly provides a clearer understanding of key genomic features such as a unique repeat content, a scarcity of pseudogenes, an enrichment of zebrafish-specific genes on chromosome 4 and chromosomal regions that influence sex determination.

3,573 citations


Journal ArticleDOI
12 Sep 2013-Cell
TL;DR: An approach that combines a Cas9 nickase mutant with paired guide RNAs to introduce targeted double-strand breaks is described.

3,026 citations


Journal ArticleDOI
TL;DR: PolyPhen‐2 (Polymorphism Phenotyping v2), available as software and via a Web server, predicts the possible impact of amino acid substitutions on the stability and function of human proteins using structural and comparative evolutionary considerations.
Abstract: PolyPhen-2 (Polymorphism Phenotyping v2), available as software and via a Web server, predicts the possible impact of amino acid substitutions on the stability and function of human proteins using structural and comparative evolutionary considerations. It performs functional annotation of single-nucleotide polymorphisms (SNPs), maps coding SNPs to gene transcripts, extracts protein sequence annotations and structural attributes, and builds conservation profiles. It then estimates the probability of the missense mutation being damaging based on a combination of all these properties. PolyPhen-2 features include a high-quality multiple protein sequence alignment pipeline and a prediction method employing machine-learning classification. The software also integrates the UCSC Genome Browser's human genome annotations and MultiZ multiple alignments of vertebrate genomes with the human genome. PolyPhen-2 is capable of analyzing large volumes of data produced by next-generation sequencing projects, thanks to built-in support for high-performance computing environments like Grid Engine and Platform LSF.

2,681 citations


01 Dec 2013
TL;DR: A pooled, loss-of-function genetic screening approach suitable for both positive and negative selection that uses a genome-scale lentiviral single-guide RNA (sgRNA) library is described and it is shown that sgRNA efficiency is associated with specific sequence motifs, enabling the prediction of more effective sgRNAs.
Abstract: The bacterial CRISPR/Cas9 system for genome editing has greatly expanded the toolbox for mammalian genetics, enabling the rapid generation of isogenic cell lines and mice with modified alleles. Here, we describe a pooled, loss-of-function genetic screening approach suitable for both positive and negative selection that uses a genome-scale lentiviral single guide RNA (sgRNA) library. sgRNA expression cassettes were stably integrated into the genome, which enabled a complex mutant pool to be tracked by massively parallel sequencing. We used a library containing 73,000 sgRNAs to generate knockout collections and performed screens in two human cell lines. A screen for resistance to the nucleotide analog 6-thioguanine identified all expected members of the DNA mismatch repair pathway, while another for the DNA topoisomerase II (TOP2A) poison etoposide identified TOP2A, as expected, and also cyclin-dependent kinase 6, CDK6. A negative selection screen for essential genes identified numerous gene sets corresponding to fundamental processes. Finally, we show that sgRNA efficiency is associated with specific sequence motifs, enabling the prediction of more effective sgRNAs. Collectively, these results establish Cas9/sgRNA screens as a powerful tool for systematic genetic analysis in mammalian cells.

2,130 citations


01 Sep 2013
TL;DR: It is demonstrated that paired nicking can reduce off-target activity by 50- to 1,500-fold in cell lines and facilitate gene knockout in mouse zygotes without sacrificing on-target cleavage efficiency.
Abstract: Targeted genome editing technologies have enabled a broad range of research and medical applications. The Cas9 nuclease from the microbial CRISPR-Cas system is targeted to specific genomic loci by a 20 nt guide sequence, which can tolerate certain mismatches to the DNA target and thereby promote undesired off-target mutagenesis. Here, we describe an approach that combines a Cas9 nickase mutant with paired guide RNAs to introduce targeted double-strand breaks. Because individual nicks in the genome are repaired with high fidelity, simultaneous nicking via appropriately offset guide RNAs is required for double-stranded breaks and extends the number of specifically recognized bases for target cleavage. We demonstrate that using paired nicking can reduce off-target activity by 50- to 1,500-fold in cell lines and facilitate gene knockout in mouse zygotes without sacrificing on-target cleavage efficiency. This versatile strategy enables a wide variety of genome editing applications that require high specificity.

1,947 citations


Journal ArticleDOI
TL;DR: It is shown that complexes of the Cas9 protein and artificial chimeric RNAs efficiently cleave two genomic sites and induce indels with frequencies of up to 33% in human cells.
Abstract: We employ the CRISPR-Cas system of Streptococcus pyogenes as programmable RNA-guided endonucleases (RGENs) to cleave DNA in a targeted manner for genome editing in human cells. We show that complexes of the Cas9 protein and artificial chimeric RNAs efficiently cleave two genomic sites and induce indels with frequencies of up to 33%.

1,893 citations


Journal ArticleDOI
26 Sep 2013-Nature
TL;DR: Sequencing and deep analysis of messenger RNA and microRNA from lymphoblastoid cell lines of 462 individuals from the 1000 Genomes Project, the first uniformly processed high-throughput RNA-sequencing data from multiple human populations with high-quality genome sequences, reveals extremely widespread genetic variation affecting the regulation of most genes.
Abstract: Genome sequencing projects are discovering millions of genetic variants in humans, and interpretation of their functional effects is essential for understanding the genetic basis of variation in human traits. Here we report sequencing and deep analysis of messenger RNA and microRNA from lymphoblastoid cell lines of 462 individuals from the 1000 Genomes Project--the first uniformly processed high-throughput RNA-sequencing data from multiple human populations with high-quality genome sequences. We discover extremely widespread genetic variation affecting the regulation of most genes, with transcript structure and expression level variation being equally common but genetically largely independent. Our characterization of causal regulatory variation sheds light on the cellular mechanisms of regulatory and loss-of-function variation, and allows us to infer putative causal variants for dozens of disease-associated loci. Altogether, this study provides a deep understanding of the cellular mechanisms of transcriptome variation and of the landscape of functional variants in the human genome.

1,892 citations


Journal ArticleDOI
TL;DR: The prokaryotic type II CRISPR-Cas system is engineered to enable RNA-guided genome regulation in human cells by tethering transcriptional activation domains either directly to a nuclease-null Cas9 protein or to an aptamer-modified single guide RNA (sgRNA).
Abstract: Prokaryotic type II CRISPR-Cas systems can be adapted to enable targeted genome modifications across a range of eukaryotes. Here we engineer this system to enable RNA-guided genome regulation in human cells by tethering transcriptional activation domains either directly to a nuclease-null Cas9 protein or to an aptamer-modified single guide RNA (sgRNA). Using this functionality we developed a transcriptional activation-based assay to determine the landscape of off-target binding of sgRNA:Cas9 complexes and compared it with the off-target activity of transcription activator-like (TALs) effectors. Our results reveal that specificity profiles are sgRNA dependent, and that sgRNA:Cas9 complexes and 18-mer TAL effectors can potentially tolerate 1-3 and 1-2 target mismatches, respectively. By engineering a requirement for cooperativity through offset nicking for genome editing or through multiple synergistic sgRNAs for robust transcriptional activation, we suggest methods to mitigate off-target phenomena. Our results expand the versatility of the sgRNA:Cas9 tool and highlight the critical need to engineer improved specificity.

Journal ArticleDOI
22 Feb 2013-Science
TL;DR: Two independent mutations within the core promoter of telomerase reverse transcriptase (TERT) are described, which collectively occur in 50 of 70 melanomas examined, suggesting somatic mutations in regulatory regions of the genome may represent an important tumorigenic mechanism.
Abstract: Systematic sequencing of human cancer genomes has identified many recurrent mutations in the protein-coding regions of genes but rarely in gene regulatory regions. Here, we describe two independent mutations within the core promoter of telomerase reverse transcriptase (TERT), the gene coding for the catalytic subunit of telomerase, which collectively occur in 50 of 70 (71%) melanomas examined. These mutations generate de novo consensus binding motifs for E-twenty-six (ETS) transcription factors, and in reporter assays, the mutations increased transcriptional activity from the TERT promoter by two- to fourfold. Examination of 150 cancer cell lines derived from diverse tumor types revealed the same mutations in 24 cases (16%), with preliminary evidence of elevated frequency in bladder and hepatocellular cancer cells. Thus, somatic mutations in regulatory regions of the genome may represent an important tumorigenic mechanism.

Journal ArticleDOI
TL;DR: The use of type II bacterial CRISPR-Cas system in Saccharomyces cerevisiae for genome engineering provides foundations for a simple and powerful genome engineering tool for site-specific mutagenesis and allelic replacement in yeast.
Abstract: Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) and CRISPR-associated (Cas) systems in bacteria and archaea use RNA-guided nuclease activity to provide adaptive immunity against invading foreign nucleic acids. Here, we report the use of type II bacterial CRISPR-Cas system in Saccharomyces cerevisiae for genome engineering. The CRISPR-Cas components, Cas9 gene and a designer genome targeting CRISPR guide RNA (gRNA), show robust and specific RNA-guided endonuclease activity at targeted endogenous genomic loci in yeast. Using constitutive Cas9 expression and a transient gRNA cassette, we show that targeted double-strand breaks can increase homologous recombination rates of single- and double-stranded oligonucleotide donors by 5-fold and 130-fold, respectively. In addition, co-transformation of a gRNA plasmid and a donor DNA in cells constitutively expressing Cas9 resulted in near 100% donor DNA recombination frequency. Our approach provides foundations for a simple and powerful genome engineering tool for site-specific mutagenesis and allelic replacement in yeast.

Journal ArticleDOI
06 Feb 2013-Rice
TL;DR: A revised, error-corrected, and validated assembly of the Nipponbare cultivar of rice was generated using optical map data, re-sequencing data, and manual curation that will facilitate on-going and future research in rice.
Abstract: Rice research has been enabled by access to the high quality reference genome sequence generated in 2005 by the International Rice Genome Sequencing Project (IRGSP). To further facilitate genomic-enabled research, we have updated and validated the genome assembly and sequence for the Nipponbare cultivar of Oryza sativa (japonica group). The Nipponbare genome assembly was updated by revising and validating the minimal tiling path of clones with the optical map for rice. Sequencing errors in the revised genome assembly were identified by re-sequencing the genome of two different Nipponbare individuals using the Illumina Genome Analyzer II/IIx platform. A total of 4,886 sequencing errors were identified in 321 Mb of the assembled genome indicating an error rate in the original IRGSP assembly of only 0.15 per 10,000 nucleotides. A small number (five) of insertions/deletions were identified using longer reads generated using the Roche 454 pyrosequencing platform. As the re-sequencing data were generated from two different individuals, we were able to identify a number of allelic differences between the original individual used in the IRGSP effort and the two individuals used in the re-sequencing effort. The revised assembly, termed Os-Nipponbare-Reference-IRGSP-1.0, is now being used in updated releases of the Rice Annotation Project and the Michigan State University Rice Genome Annotation Project, thereby providing a unified set of pseudomolecules for the rice community. A revised, error-corrected, and validated assembly of the Nipponbare cultivar of rice was generated using optical map data, re-sequencing data, and manual curation that will facilitate on-going and future research in rice. Detection of polymorphisms between three different Nipponbare individuals highlights that allelic differences between individuals should be considered in diversity studies.
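
Note: the error rate quoted in the abstract above follows directly from the two counts it gives. The short check below reproduces that arithmetic, assuming "321 Mb" means 321,000,000 bases (decimal megabases); the function name is illustrative only.

```python
def errors_per_10k(n_errors, assembled_bases):
    """Sequencing errors per 10,000 nucleotides of assembled sequence."""
    return n_errors / assembled_bases * 10_000

# Counts taken from the abstract: 4,886 errors in 321 Mb.
# Assumption: 1 Mb = 1,000,000 bases.
rate = errors_per_10k(4_886, 321_000_000)
print(f"{rate:.2f} errors per 10,000 nt")  # prints "0.15 errors per 10,000 nt"
```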

Journal ArticleDOI
03 Oct 2013-Nature
TL;DR: Single-cell Hi-C data bridge current gaps between genomics and microscopy studies of chromosomes, demonstrating how modular organization underlies dynamic chromosome structure, and how this structure is probabilistically linked with genome activity patterns.
Abstract: Large-scale chromosome structure and spatial nuclear arrangement have been linked to control of gene expression and DNA replication and repair. Genomic techniques based on chromosome conformation capture (3C) assess contacts for millions of loci simultaneously, but do so by averaging chromosome conformations from millions of nuclei. Here we introduce single-cell Hi-C, combined with genome-wide statistical analysis and structural modelling of single-copy X chromosomes, to show that individual chromosomes maintain domain organization at the megabase scale, but show variable cell-to-cell chromosome structures at larger scales. Despite this structural stochasticity, localization of active gene domains to boundaries of chromosome territories is a hallmark of chromosomal conformation. Single-cell Hi-C data bridge current gaps between genomics and microscopy studies of chromosomes, demonstrating how modular organization underlies dynamic chromosome structure, and how this structure is probabilistically linked with genome activity patterns.

Journal ArticleDOI
22 May 2013-Nature
TL;DR: The draft assembly of the 20-gigabase genome of Norway spruce (Picea abies), the first available for any gymnosperm, is presented, revealing numerous long (>10,000 base pairs) introns, gene-like fragments, uncharacterized long non-coding RNAs and short RNAs, which opens up new genomic avenues for conifer forestry and breeding.
Abstract: Conifers have dominated forests for more than 200 million years and are of huge ecological and economic importance. Here we present the draft assembly of the 20-gigabase genome of Norway spruce (Picea abies), the first available for any gymnosperm. The number of well-supported genes (28,354) is similar to the >100 times smaller genome of Arabidopsis thaliana, and there is no evidence of a recent whole-genome duplication in the gymnosperm lineage. Instead, the large genome size seems to result from the slow and steady accumulation of a diverse set of long-terminal repeat transposable elements, possibly owing to the lack of an efficient elimination mechanism. Comparative sequencing of Pinus sylvestris, Abies sibirica, Juniperus communis, Taxus baccata and Gnetum gnemon reveals that the transposable element diversity is shared among extant conifers. Expression of 24-nucleotide small RNAs, previously implicated in transposable element silencing, is tissue-specific and much lower than in other plants. We further identify numerous long (>10,000 base pairs) introns, gene-like fragments, uncharacterized long non-coding RNAs and short RNAs. This opens up new genomic avenues for conifer forestry and breeding.

Journal ArticleDOI
TL;DR: Multiplex and homologous recombination–mediated genome editing in Arabidopsis and Nicotiana benthamiana using guide RNA and Cas9 is described.
Abstract: Multiplex and homologous recombination–mediated genome editing in Arabidopsis and Nicotiana benthamiana using guide RNA and Cas9

Journal ArticleDOI
TL;DR: OrganellarGenomeDraw (OGDRAW), a suite of software tools that enable users to create high-quality visual representations of both circular and linear annotated genome sequences provided as GenBank files or accession numbers, is developed.
Abstract: Mitochondria and plastids (chloroplasts) are cell organelles of endosymbiotic origin that possess their own genetic information. Most organellar DNAs map as circular double-stranded genomes. Across the eukaryotic kingdom, organellar genomes display great size variation, ranging from ∼15 to 20 kb (the size of the mitochondrial genome in most animals) to >10 Mb (the size of the mitochondrial genome in some lineages of flowering plants). We have developed OrganellarGenomeDraw (OGDRAW), a suite of software tools that enable users to create high-quality visual representations of both circular and linear annotated genome sequences provided as GenBank files or accession numbers. Although all types of DNA sequences are accepted as input, the software has been specifically optimized to properly depict features of organellar genomes. A recent extension facilitates the plotting of quantitative gene expression data, such as transcript or protein abundance data, directly onto the genome map. OGDRAW has already become widely used and is available as a free web tool (http://ogdraw.mpimp-golm.mpg.de/). The core processing components can be downloaded as a Perl module, thus also allowing for convenient integration into custom processing pipelines.

Journal ArticleDOI
TL;DR: An improved CRISPR/Cas system in zebrafish, with custom guide RNAs and a zebrafish codon-optimized Cas9 protein, efficiently targeted a reporter transgene Tg(-5.1mnx1:egfp) and four endogenous loci; five genomic loci can be targeted simultaneously, resulting in multiple loss-of-function phenotypes in the same injected fish.
Abstract: A simple and robust method for targeted mutagenesis in zebrafish has long been sought. Previous methods generate monoallelic mutations in the germ line of F0 animals, usually delaying homozygosity for the mutation to the F2 generation. Generation of robust biallelic mutations in the F0 would allow for phenotypic analysis directly in injected animals. Recently the type II prokaryotic clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated proteins (Cas) system has been adapted to serve as a targeted genome mutagenesis tool. Here we report an improved CRISPR/Cas system in zebrafish with custom guide RNAs and a zebrafish codon-optimized Cas9 protein that efficiently targeted a reporter transgene Tg(-5.1mnx1:egfp) and four endogenous loci (tyr, golden, mitfa, and ddx19). Mutagenesis rates reached 75–99%, indicating that most cells contained biallelic mutations. Recessive null-like phenotypes were observed in four of the five targeting cases, supporting high rates of biallelic gene disruption. We also observed efficient germ-line transmission of the Cas9-induced mutations. Finally, five genomic loci can be targeted simultaneously, resulting in multiple loss-of-function phenotypes in the same injected fish. This CRISPR/Cas9 system represents a highly effective and scalable gene knockout method in zebrafish and has the potential for applications in other model organisms.

Journal ArticleDOI
28 Mar 2013-Cell
TL;DR: This work has revealed scores of new cancer genes, including many in processes not previously known to be causal targets in cancer, and the genes affect cell signaling, chromatin, and epigenomic regulation; RNA splicing; protein homeostasis; metabolism; and lineage maturation.

Journal ArticleDOI
TL;DR: Reanalysis of published metagenomes reveals that differential coverage binning facilitates recovery of more complete and higher fidelity genome bins than other currently used methods, which are primarily based on sequence composition.
Abstract: Reference genomes are required to understand the diverse roles of microorganisms in ecology, evolution, human and animal health, but most species remain uncultured. Here we present a sequence composition-independent approach to recover high-quality microbial genomes from deeply sequenced metagenomes. Multiple metagenomes of the same community, which differ in relative population abundances, were used to assemble 31 bacterial genomes, including rare (<1% relative abundance) species, from an activated sludge bioreactor. Twelve genomes were assembled into complete or near-complete chromosomes. Four belong to the candidate bacterial phylum TM7 and represent the most complete genomes for this phylum to date (relative abundances, 0.06-1.58%). Reanalysis of published metagenomes reveals that differential coverage binning facilitates recovery of more complete and higher fidelity genome bins than other currently used methods, which are primarily based on sequence composition. This approach will be an important addition to the standard metagenome toolbox and greatly improve access to genomes of uncultured microorganisms.
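
Note: the idea behind differential coverage binning, as described in the abstract above, is that contigs from the same genome should show similar read coverage in each of several metagenomes of one community. The sketch below groups contigs by their log-coverage in two samples with a simple greedy rule. It is an illustration under stated assumptions, not the authors' procedure: the greedy seeding, the `tol` threshold, and all names are hypothetical.

```python
import math

def bin_by_differential_coverage(coverage, tol=0.35):
    """Greedy single-pass binning of contigs by coverage in two samples.

    coverage: dict mapping contig name -> (cov_sample_a, cov_sample_b), both > 0.
    tol: hypothetical maximum Euclidean distance (in log10 units) between a
         contig and the seed contig of a bin.
    Returns a list of bins, each a list of contig names.
    """
    bins = []  # each entry is (seed_point, [contig names])
    for name, (cov_a, cov_b) in sorted(coverage.items()):
        point = (math.log10(cov_a), math.log10(cov_b))
        for seed, members in bins:
            if math.dist(point, seed) <= tol:
                members.append(name)
                break
        else:
            # No existing bin is close enough: this contig seeds a new one.
            bins.append((point, [name]))
    return [members for _, members in bins]

# Made-up coverages: two populations whose abundances differ between samples.
example = {"c1": (100, 10), "c2": (110, 9), "c3": (5, 50), "c4": (6, 55)}
print(bin_by_differential_coverage(example))
```

With the made-up values above, contigs c1 and c2 fall into one bin and c3 and c4 into another, because each pair shares a coverage profile across the two samples.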

Journal ArticleDOI
TL;DR: Genome-wide chromatin interaction data sets, such as those generated by Hi-C, are shown to be a rich source of long-range information for assigning, ordering and orienting genomic sequences to chromosomes, including across centromeres, which addresses the fragmentation of genomes assembled de novo from short reads.
Abstract: Genomes assembled de novo from short reads are highly fragmented relative to the finished chromosomes of Homo sapiens and key model organisms generated by the Human Genome Project. To address this problem, we need scalable, cost-effective methods to obtain assemblies with chromosome-scale contiguity. Here we show that genome-wide chromatin interaction data sets, such as those generated by Hi-C, are a rich source of long-range information for assigning, ordering and orienting genomic sequences to chromosomes, including across centromeres. To exploit this finding, we developed an algorithm that uses Hi-C data for ultra-long-range scaffolding of de novo genome assemblies. We demonstrate the approach by combining shotgun fragment and short jump mate-pair sequences with Hi-C data to generate chromosome-scale de novo assemblies of the human, mouse and Drosophila genomes, achieving--for the human genome--98% accuracy in assigning scaffolds to chromosome groups and 99% accuracy in ordering and orienting scaffolds within chromosome groups. Hi-C data can also be used to validate chromosomal translocations in cancer genomes.
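
Note: the first step named in the abstract above, assigning scaffolds to chromosome groups, can be pictured with a toy rule: place a scaffold in the group with which it shares the most Hi-C links on average. This rests on the general observation that contacts within a chromosome are more frequent than contacts between chromosomes, which the abstract does not spell out. The function and its inputs are hypothetical and are not the published algorithm.

```python
def assign_to_group(links_to_query, groups):
    """Pick the chromosome group with the highest mean Hi-C link count.

    links_to_query: dict mapping scaffold name -> number of Hi-C links
                    between that scaffold and the query scaffold.
    groups: dict mapping group name -> non-empty list of member scaffolds.
    """
    def mean_links(members):
        # Scaffolds with no observed links to the query count as zero.
        return sum(links_to_query.get(s, 0) for s in members) / len(members)

    return max(groups, key=lambda g: mean_links(groups[g]))

# Made-up link counts for one unplaced scaffold.
groups = {"groupA": ["a1", "a2"], "groupB": ["b1", "b2"]}
links = {"a1": 40, "a2": 30, "b1": 2}
print(assign_to_group(links, groups))  # prints "groupA"
```

Ordering and orienting scaffolds within a group, the later steps the abstract mentions, would need distance-dependent contact information that this sketch does not model.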

Journal ArticleDOI
TL;DR: Comparisons showed that peach has not undergone recent whole-genome duplication, and even though the ancestral triplicated blocks in peach are fragmentary compared to those in grape, all seven paleosets of paralogs from the putative paleoancestor are detectable.
Abstract: Rosaceae is the most important fruit-producing clade, and its key commercially relevant genera (Fragaria, Rosa, Rubus and Prunus) show broadly diverse growth habits, fruit types and compact diploid genomes. Peach, a diploid Prunus species, is one of the best genetically characterized deciduous trees. Here we describe the high-quality genome sequence of peach obtained from a completely homozygous genotype. We obtained a complete chromosome-scale assembly using Sanger whole-genome shotgun methods. We predicted 27,852 protein-coding genes, as well as noncoding RNAs. We investigated the path of peach domestication through whole-genome resequencing of 14 Prunus accessions. The analyses suggest major genetic bottlenecks that have substantially shaped peach genome diversity. Furthermore, comparative analyses showed that peach has not undergone recent whole-genome duplication, and even though the ancestral triplicated blocks in peach are fragmentary compared to those in grape, all seven paleosets of paralogs from the putative paleoancestor are detectable.

Journal ArticleDOI
27 Feb 2013-PLOS ONE
TL;DR: It is demonstrated that the information on 16S rRNA copy numbers and genome sizes of genome-sequenced bacteria may be used as an estimate for the closest related taxon in an environmental dataset to calculate alternative estimates of the relative abundance of individual bacterial taxa in environmental samples.
Abstract: 16S ribosomal RNA currently represents the most important target of study in bacterial ecology. Its use for the description of bacterial diversity is, however, limited by the presence of variable copy numbers in bacterial genomes and sequence variation within closely related taxa or within a genome. Here we use the information from sequenced bacterial genomes to explore the variability of 16S rRNA sequences and copy numbers at various taxonomic levels and apply it to estimate bacterial genome and DNA abundances. In total, 7,081 16S rRNA sequences were in silico extracted from 1,690 available bacterial genomes (1–15 per genome). While there are several phyla containing low 16S rRNA copy numbers, in certain taxa, e.g., the Firmicutes and Gammaproteobacteria, the variation is large. Genome sizes are more conserved at all tested taxonomic levels than 16S rRNA copy numbers. Only a minority of bacterial genomes harbor identical 16S rRNA gene copies, and sequence diversity increases with increasing copy numbers. While certain taxa harbor dissimilar 16S rRNA genes, others contain sequences common to multiple species. Sequence identity clusters (often termed operational taxonomic units) thus provide an imperfect representation of bacterial taxa of a certain phylogenetic rank. We have demonstrated that the information on 16S rRNA copy numbers and genome sizes of genome-sequenced bacteria may be used as an estimate for the closest related taxon in an environmental dataset to calculate alternative estimates of the relative abundance of individual bacterial taxa in environmental samples. Using an example from forest soil, this procedure would increase the abundance estimates of Acidobacteria and decrease those of Firmicutes. Using the currently available information, alternative estimates of bacterial community composition may be obtained in this way if the variation of 16S rRNA copy numbers among bacteria is considered.
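
The copy-number correction described in this abstract can be illustrated with a minimal sketch. It assumes the simplest form of the idea: divide each taxon's 16S read count by an assumed number of 16S copies per genome, then renormalize. The function name, the taxa chosen and all numbers are hypothetical and are not taken from the paper, whose procedure also considers genome sizes.

```python
def copy_number_corrected_abundance(read_counts, copy_numbers):
    """Turn per-taxon 16S read counts into relative genome abundances.

    read_counts:  dict mapping taxon name -> number of 16S reads observed
    copy_numbers: dict mapping taxon name -> assumed 16S copies per genome
    """
    # Reads divided by copies per genome approximates the number of genomes.
    genome_equivalents = {
        taxon: reads / copy_numbers[taxon]
        for taxon, reads in read_counts.items()
    }
    total = sum(genome_equivalents.values())
    # Renormalize so the corrected abundances sum to 1.
    return {taxon: value / total for taxon, value in genome_equivalents.items()}


# Hypothetical values for illustration only (not data from the paper).
read_counts = {"Acidobacteria": 300, "Firmicutes": 700}
copy_numbers = {"Acidobacteria": 1, "Firmicutes": 7}

corrected = copy_number_corrected_abundance(read_counts, copy_numbers)
for taxon, fraction in corrected.items():
    print(f"{taxon}: {fraction:.2f}")
```

With these made-up inputs, the taxon with few 16S copies gains relative abundance after correction and the taxon with many copies loses it, which is the direction of change the abstract reports for the forest soil example.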

Journal ArticleDOI
TL;DR: The use of clustered, regularly interspaced, short palindromic repeats (CRISPR)-associated endonuclease Cas9 to target genomic sequences in the Caenorhabditis elegans germ line using single-guide RNAs that are expressed from a U6 small nuclear RNA promoter is reported.
Abstract: CRISPR-Cas systems have been used with single-guide RNAs for accurate gene disruption and conversion in multiple biological systems. Here we report the use of the endonuclease Cas9 to target genomic sequences in the C. elegans germline, utilizing single-guide RNAs that are expressed from a U6 small nuclear RNA promoter. Our results demonstrate that targeted, heritable genetic alterations can be achieved in C. elegans, providing a convenient and effective approach for generating loss-of-function mutants.

Journal ArticleDOI
TL;DR: A method to edit the C. elegans genome using the clustered, regularly interspersed, short palindromic repeats (CRISPR) RNA-guided Cas9 nuclease and homologous recombination is reported, and Cas9 is shown to induce DNA double-strand breaks with specificity for targeted sites.
Abstract: Study of the nematode Caenorhabditis elegans has provided important insights in a wide range of fields in biology. The ability to precisely modify genomes is critical to fully realize the utility of model organisms. Here we report a method to edit the C. elegans genome using the clustered, regularly interspersed, short palindromic repeats (CRISPR) RNA-guided Cas9 nuclease and homologous recombination. We demonstrate that Cas9 is able to induce DNA double-strand breaks with specificity for targeted sites and that these breaks can be repaired efficiently by homologous recombination. By supplying engineered homologous repair templates, we generated gfp knock-ins and targeted mutations. Together our results outline a flexible methodology to produce essentially any desired modification in the C. elegans genome quickly and at low cost. This technology is an important addition to the array of genetic techniques already available in this experimentally tractable model organism.

Journal ArticleDOI
01 Mar 2013-Science
TL;DR: STARR-seq identifies thousands of cell type–specific enhancers across a broad continuum of strengths, links differential gene expression to differences in enhancer activity, and creates a genome-wide quantitative enhancer map, revealing the highly complex regulation of transcription.
Abstract: Genomic enhancers are important regulators of gene expression, but their identification is a challenge, and methods depend on indirect measures of activity. We developed a method termed STARR-seq to directly and quantitatively assess enhancer activity for millions of candidates from arbitrary sources of DNA, which enables screens across entire genomes. When applied to the Drosophila genome, STARR-seq identifies thousands of cell type–specific enhancers across a broad continuum of strengths, links differential gene expression to differences in enhancer activity, and creates a genome-wide quantitative enhancer map. This map reveals the highly complex regulation of transcription, with several independent enhancers for both developmental regulators and ubiquitously expressed genes. STARR-seq can be used to identify and quantify enhancer activity in other eukaryotes, including humans.

Journal ArticleDOI
TL;DR: It is reported that RNA-guided Cas9 nuclease efficiently facilitates genome editing in both mammalian cells and zebrafish embryos in a simple and robust manner, and that site-specific insertion of an mloxP sequence induced by the Cas9/gRNA system was achieved in zebrafish embryos.
Abstract: Recent advances with the type II clustered regularly interspaced short palindromic repeats (CRISPR) system promise an improved approach to genome editing. However, the applicability and efficiency of this system in model organisms, such as zebrafish, are little studied. Here, we report that RNA-guided Cas9 nuclease efficiently facilitates genome editing in both mammalian cells and zebrafish embryos in a simple and robust manner. Over 35% of site-specific somatic mutations were found when specific Cas/gRNA was used to target either etsrp, gata4 or gata5 in zebrafish embryos in vivo. The Cas9/gRNA efficiently induced biallelic conversion of etsrp or gata5 in the resulting somatic cells, recapitulating their respective vessel phenotypes in etsrpy11 mutant embryos or cardia bifida phenotypes in fautm236a mutant embryos. Finally, we successfully achieved site-specific insertion of mloxP sequence induced by Cas9/gRNA system in zebrafish embryos. These results demonstrate that the Cas9/gRNA system has the potential of becoming a simple, robust and efficient reverse genetic tool for zebrafish and other model organisms. Together with other genome-engineering technologies, the Cas9 system is promising for applications in biology, agriculture, environmental studies and medicine.

Journal ArticleDOI
TL;DR: Focused analysis on genes involved in vitamin C metabolism showed that GalUR, encoding the rate-limiting enzyme of the galacturonate pathway, is significantly upregulated in orange fruit, and the recent expansion of this gene family may provide a genomic basis.
Abstract: Oranges are an important nutritional source for human health and have immense economic value. Here we present a comprehensive analysis of the draft genome of sweet orange (Citrus sinensis). The assembled sequence covers 87.3% of the estimated orange genome, which is relatively compact, as 20% is composed of repetitive elements. We predicted 29,445 protein-coding genes, half of which are in the heterozygous state. With additional sequencing of two more citrus species and comparative analyses of seven citrus genomes, we present evidence to suggest that sweet orange originated from a backcross hybrid between pummelo and mandarin. Focused analysis on genes involved in vitamin C metabolism showed that GalUR, encoding the rate-limiting enzyme of the galacturonate pathway, is significantly upregulated in orange fruit, and the recent expansion of this gene family may provide a genomic basis. This draft genome represents a valuable resource for understanding and improving many important citrus traits in the future.

Journal ArticleDOI
28 Mar 2013-Cell
TL;DR: The ways in which alterations in the genome and epigenome influence each other and cooperate to promote oncogenic transformation are explored.