
Showing papers on "Genome published in 2015"


Journal ArticleDOI
TL;DR: An objective measure of genome quality is proposed that can be used to select genomes suitable for specific gene- and genome-centric analyses of microbial communities and is shown to provide accurate estimates of genome completeness and contamination and to outperform existing approaches.
Abstract: Large-scale recovery of genomes from isolates, single cells, and metagenomic data has been made possible by advances in computational methods and substantial reductions in sequencing costs. Although this increasing breadth of draft genomes is providing key information regarding the evolutionary and functional diversity of microbial life, it has become impractical to finish all available reference genomes. Making robust biological inferences from draft genomes requires accurate estimates of their completeness and contamination. Current methods for assessing genome quality are ad hoc and generally make use of a limited number of “marker” genes conserved across all bacterial or archaeal genomes. Here we introduce CheckM, an automated method for assessing the quality of a genome using a broader set of marker genes specific to the position of a genome within a reference genome tree and information about the collocation of these genes. We demonstrate the effectiveness of CheckM using synthetic data and a wide range of isolate-, single-cell-, and metagenome-derived genomes. CheckM is shown to provide accurate estimates of genome completeness and contamination and to outperform existing approaches. Using CheckM, we identify a diverse range of errors currently impacting publicly available isolate genomes and demonstrate that genomes obtained from single cells and metagenomic data vary substantially in quality. In order to facilitate the use of draft genomes, we propose an objective measure of genome quality that can be used to select genomes suitable for specific gene- and genome-centric analyses of microbial communities.

5,788 citations
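
The completeness and contamination estimates mentioned in the CheckM abstract above can be illustrated with a deliberately simplified sketch. It assumes a flat set of single-copy marker genes and treats each marker independently; CheckM itself uses lineage-specific marker sets and gene collocation, which are not modeled here. The function name `estimate_quality` and the marker IDs are hypothetical.

```python
from collections import Counter


def estimate_quality(marker_hits, marker_set):
    """Return (completeness %, contamination %) from single-copy marker hits.

    marker_hits: list of marker gene IDs found in the genome (repeats allowed).
    marker_set:  set of marker gene IDs expected exactly once per genome.
    Simplifying assumption: markers are counted independently, not as
    collocated sets.
    """
    counts = Counter(h for h in marker_hits if h in marker_set)
    # A marker seen at least once counts toward completeness.
    present = sum(1 for m in marker_set if counts[m] >= 1)
    # Every copy beyond the first counts toward contamination.
    extra = sum(counts[m] - 1 for m in marker_set if counts[m] > 1)
    n = len(marker_set)
    return 100.0 * present / n, 100.0 * extra / n


# Hypothetical example: four expected markers, one missing, one duplicated.
markers = {"m1", "m2", "m3", "m4"}
hits = ["m1", "m2", "m2", "m4"]
completeness, contamination = estimate_quality(hits, markers)
print(completeness, contamination)
```

In this toy case three of four markers are found and one is duplicated, so the genome would be reported as incomplete and partly contaminated.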


Journal ArticleDOI
TL;DR: COSMIC, the Catalogue Of Somatic Mutations In Cancer, is the world's largest and most comprehensive resource for exploring the impact of somatic mutations in human cancer, describing 2 002 811 coding point mutations in over one million tumor samples and across most human genes.
Abstract: COSMIC, the Catalogue Of Somatic Mutations In Cancer (http://cancer.sanger.ac.uk) is the world's largest and most comprehensive resource for exploring the impact of somatic mutations in human cancer. Our latest release (v70; Aug 2014) describes 2 002 811 coding point mutations in over one million tumor samples and across most human genes. To emphasize depth of knowledge on known cancer genes, mutation information is curated manually from the scientific literature, allowing very precise definitions of disease types and patient details. Combination of almost 20 000 published studies gives substantial resolution of how mutations and phenotypes relate in human cancer, providing insights into the stratification of mutations and biomarkers across cancer patient populations. Conversely, our curation of cancer genomes (over 12 000) emphasizes knowledge breadth, driving discovery of unrecognized cancer-driving hotspots and molecular targets. Our high-resolution curation approach is globally unique, giving substantial insight into molecular biomarkers in human oncology. In addition, COSMIC also details more than six million noncoding mutations, 10 534 gene fusions, 61 299 genome rearrangements, 695 504 abnormal copy number segments and 60 119 787 abnormal expression variants. All these types of somatic mutation are annotated to both the human genome and each affected coding gene, then correlated across disease and mutation types.

2,229 citations


Journal ArticleDOI
TL;DR: A web server to predict the functional effect of single or multiple amino acid substitutions, insertions and deletions using the prediction tool PROVEAN, which provides rapid analysis of protein variants from any organism, and also supports high-throughput analysis for human and mouse variants at both the genomic and protein levels.
Abstract: Summary: We present a web server to predict the functional effect of single or multiple amino acid substitutions, insertions and deletions using the prediction tool PROVEAN. The server provides rapid analysis of protein variants from any organism, and also supports high-throughput analysis for human and mouse variants at both the genomic and protein levels. Availability and implementation: The web server is freely available and open to all users with no login requirements at http://provean.jcvi.org. Contact: achan@jcvi.org Supplementary information: Supplementary data are available at Bioinformatics online.

1,886 citations


01 Apr 2015
TL;DR: Six smaller Cas9 orthologues are characterized to address the size of SpCas9, which limits its use with the adeno-associated virus (AAV) delivery vehicle; Cas9 from Staphylococcus aureus (SaCas9) edits the genome with efficiencies similar to those of SpCas9 while being more than 1 kilobase shorter.
Abstract: The RNA-guided endonuclease Cas9 has emerged as a versatile genome-editing platform. However, the size of the commonly used Cas9 from Streptococcus pyogenes (SpCas9) limits its utility for basic research and therapeutic applications that use the highly versatile adeno-associated virus (AAV) delivery vehicle. Here, we characterize six smaller Cas9 orthologues and show that Cas9 from Staphylococcus aureus (SaCas9) can edit the genome with efficiencies similar to those of SpCas9, while being more than 1 kilobase shorter. We packaged SaCas9 and its single guide RNA expression cassette into a single AAV vector and targeted the cholesterol regulatory gene Pcsk9 in the mouse liver. Within one week of injection, we observed >40% gene modification, accompanied by significant reductions in serum Pcsk9 and total cholesterol levels. We further assess the genome-wide targeting specificity of SaCas9 and SpCas9 using BLESS, and demonstrate that SaCas9-mediated in vivo genome editing has the potential to be efficient and specific.

1,826 citations


Journal ArticleDOI
09 Apr 2015-Nature
TL;DR: Six smaller Cas9 orthologues are characterized and it is shown that Cas9 from Staphylococcus aureus (SaCas9) can edit the genome with efficiencies similar to those of SpCas9, while being more than 1 kilobase shorter.
Abstract: The RNA-guided endonuclease Cas9 has emerged as a versatile genome-editing platform. However, the size of the commonly used Cas9 from Streptococcus pyogenes (SpCas9) limits its utility for basic research and therapeutic applications that use the highly versatile adeno-associated virus (AAV) delivery vehicle. Here, we characterize six smaller Cas9 orthologues and show that Cas9 from Staphylococcus aureus (SaCas9) can edit the genome with efficiencies similar to those of SpCas9, while being more than 1 kilobase shorter. We packaged SaCas9 and its single guide RNA expression cassette into a single AAV vector and targeted the cholesterol regulatory gene Pcsk9 in the mouse liver. Within one week of injection, we observed >40% gene modification, accompanied by significant reductions in serum Pcsk9 and total cholesterol levels. We further assess the genome-wide targeting specificity of SaCas9 and SpCas9 using BLESS, and demonstrate that SaCas9-mediated in vivo genome editing has the potential to be efficient and specific.

1,756 citations


Journal ArticleDOI
23 Jul 2015-Nature
TL;DR: A robust method for mapping the accessible genome of individual cells by assay for transposase-accessible chromatin using sequencing (ATAC-seq) integrated into a programmable microfluidics platform is developed and single-cell analysis of DNA accessibility provides new insight into cellular variation of the ‘regulome’.
Abstract: Cell-to-cell variation is a universal feature of life that affects a wide range of biological phenomena, from developmental plasticity to tumour heterogeneity. Although recent advances have improved our ability to document cellular phenotypic variation, the fundamental mechanisms that generate variability from identical DNA sequences remain elusive. Here we reveal the landscape and principles of mammalian DNA regulatory variation by developing a robust method for mapping the accessible genome of individual cells by assay for transposase-accessible chromatin using sequencing (ATAC-seq) integrated into a programmable microfluidics platform. Single-cell ATAC-seq (scATAC-seq) maps from hundreds of single cells in aggregate closely resemble accessibility profiles from tens of millions of cells and provide insights into cell-to-cell variation. Accessibility variance is systematically associated with specific trans-factors and cis-elements, and we discover combinations of trans-factors associated with either induction or suppression of cell-to-cell variability. We further identify sets of trans-factors associated with cell-type-specific accessibility variance across eight cell types. Targeted perturbations of cell cycle or transcription factor signalling evoke stimulus-specific changes in this observed variability. The pattern of accessibility variation in cis across the genome recapitulates chromosome compartments de novo, linking single-cell accessibility variation to three-dimensional genome organization. Single-cell analysis of DNA accessibility provides new insight into cellular variation of the 'regulome'.

1,677 citations



Journal ArticleDOI
TL;DR: Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems.
Abstract: A 2.91-billion base pair (bp) consensus sequence of the euchromatic portion of the human genome was generated by the whole-genome shotgun sequencing method. The 14.8-billion bp DNA sequence was generated over 9 months from 27,271,853 high-quality sequence reads (5.11-fold coverage of the genome) from both ends of plasmid clones made from the DNA of five individuals. Two assembly strategies—a whole-genome assembly and a regional chromosome assembly—were used, each combining sequence data from Celera and the publicly funded genome effort. The public data were shredded into 550-bp segments to create a 2.9-fold coverage of those genome regions that had been sequenced, without including biases inherent in the cloning and assembly procedure used by the publicly funded group. This brought the effective coverage in the assemblies to eightfold, reducing the number and size of gaps in the final assembly over what would be obtained with 5.11-fold coverage. The two assembly strategies yielded very similar results that largely agree with independent mapping data. The assemblies effectively cover the euchromatic regions of the human chromosomes. More than 90% of the genome is in scaffold assemblies of 100,000 bp or more, and 25% of the genome is in scaffolds of 10 million bp or larger. Analysis of the genome sequence revealed 26,588 protein-encoding transcripts for which there was strong corroborating evidence and an additional ∼12,000 computationally derived genes with mouse matches or other weak supporting evidence. Although gene-dense clusters are obvious, almost half the genes are dispersed in low G+C sequence separated by large tracts of apparently noncoding sequence. Only 1.1% of the genome is spanned by exons, whereas 24% is in introns, with 75% of the genome being intergenic DNA. Duplications of segmental blocks, ranging in size up to chromosomal lengths, are abundant throughout the genome and reveal a complex evolutionary history. 
Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems. DNA sequence comparisons between the consensus sequence and publicly funded genome data provided locations of 2.1 million single-nucleotide polymorphisms (SNPs). A random pair of human haploid genomes differed at a rate of 1 bp per 1250 on average, but there was marked heterogeneity in the level of polymorphism across the genome. Less than 1% of all SNPs resulted in variation in proteins, but the task of determining which SNPs have functional consequences remains an open challenge.

1,674 citations
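
The coverage figures quoted in the abstract above follow from simple arithmetic, sketched here as a check. The variable names are illustrative, and the result is only expected to land close to the quoted 5.11-fold, presumably because of rounding or a slightly different genome-size denominator in the original calculation.

```python
# Figures taken from the abstract above.
total_bases = 14.8e9      # bp of sequence generated
genome_size = 2.91e9      # bp in the euchromatic consensus
reads = 27_271_853        # high-quality sequence reads


def fold_coverage(total_bases, genome_size):
    """Fold coverage = total sequenced bases / genome size."""
    return total_bases / genome_size


celera = fold_coverage(total_bases, genome_size)
mean_read_length = total_bases / reads

# Effective coverage after adding the shredded public data (2.9-fold)
# to the quoted 5.11-fold, which the abstract rounds to "eightfold".
combined = 5.11 + 2.9

print(f"{celera:.2f}x coverage, {mean_read_length:.0f} bp mean read, {combined:.2f}x combined")
```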


Journal ArticleDOI
TL;DR: The RAST tool kit (RASTtk) is a modular version of RAST that enables researchers to build custom annotation pipelines; it offers a choice of software for identifying and annotating genomic features as well as the ability to add custom features to an annotation job.
Abstract: The RAST (Rapid Annotation using Subsystem Technology) annotation engine was built in 2008 to annotate bacterial and archaeal genomes. It works by offering a standard software pipeline for identifying genomic features (i.e., protein-encoding genes and RNA) and annotating their functions. Recently, in order to make RAST a more useful research tool and to keep pace with advancements in bioinformatics, it has become desirable to build a version of RAST that is both customizable and extensible. In this paper, we describe the RAST tool kit (RASTtk), a modular version of RAST that enables researchers to build custom annotation pipelines. RASTtk offers a choice of software for identifying and annotating genomic features as well as the ability to add custom features to an annotation job. RASTtk also accommodates the batch submission of genomes and the ability to customize annotation protocols for batch submissions. This is the first major software restructuring of RAST since its inception.

1,666 citations


Journal ArticleDOI
TL;DR: Online Mendelian Inheritance in Man, OMIM®, is a comprehensive, authoritative and timely research resource of curated descriptions of human genes and phenotypes and the relationships between them.
Abstract: Online Mendelian Inheritance in Man, OMIM®, is a comprehensive, authoritative and timely research resource of curated descriptions of human genes and phenotypes and the relationships between them. The new official website for OMIM, OMIM.org (http://omim.org), was launched in January 2011. OMIM is based on the published peer-reviewed biomedical literature and is used by overlapping and diverse communities of clinicians, molecular biologists and genome scientists, as well as by students and teachers of these disciplines. Genes and phenotypes are described in separate entries and are given unique, stable six-digit identifiers (MIM numbers). OMIM entries have a structured free-text format that provides the flexibility necessary to describe the complex and nuanced relationships between genes and genetic phenotypes in an efficient manner. OMIM also has a derivative table of genes and genetic phenotypes, the Morbid Map. OMIM.org has enhanced search capabilities such as genome coordinate searching and thesaurus-enhanced search term options. Phenotypic series have been created to facilitate viewing genetic heterogeneity of phenotypes. Clinical synopsis features are enhanced with UMLS, Human Phenotype Ontology and Elements of Morphology terms and image links. All OMIM data are available for FTP download and through an API. MIMmatch is a novel outreach feature to disseminate updates and encourage collaboration.

1,613 citations


Journal ArticleDOI
TL;DR: Gubbins is an iterative algorithm that uses spatial scanning statistics to identify loci containing elevated densities of base substitutions suggestive of horizontal sequence transfer while concurrently constructing a maximum likelihood phylogeny based on the putative point mutations outside these regions of high sequence diversity.
Abstract: The emergence of new sequencing technologies has facilitated the use of bacterial whole genome alignments for evolutionary studies and outbreak analyses. These datasets, of increasing size, often include examples of multiple different mechanisms of horizontal sequence transfer resulting in substantial alterations to prokaryotic chromosomes. The impact of these processes demands rapid and flexible approaches able to account for recombination when reconstructing isolates' recent diversification. Gubbins is an iterative algorithm that uses spatial scanning statistics to identify loci containing elevated densities of base substitutions suggestive of horizontal sequence transfer while concurrently constructing a maximum likelihood phylogeny based on the putative point mutations outside these regions of high sequence diversity. Simulations demonstrate the algorithm generates highly accurate reconstructions under realistically parameterized models of bacterial evolution, and achieves convergence in only a few hours on alignments of hundreds of bacterial genome sequences. Gubbins is appropriate for reconstructing the recent evolutionary history of a variety of haploid genotype alignments, as it makes no assumptions about the underlying mechanism of recombination. The software is freely available for download at github.com/sanger-pathogens/Gubbins, implemented in Python and C and supported on Linux and Mac OS X.

Journal ArticleDOI
TL;DR: A robust CRISPR/Cas9 vector system, utilizing a plant codon optimized Cas9 gene, for convenient and high-efficiency multiplex genome editing in monocot and dicot plants and provides examples of loss-of-function gene mutations in T0 rice and Arabidopsis plants.

Journal ArticleDOI
27 Nov 2015-Science
TL;DR: Using the bacterial clustered regularly interspaced short palindromic repeats (CRISPR) system, this article constructed a genome-wide single-guide RNA library to screen for genes required for proliferation and survival in a human cancer cell line.
Abstract: Large-scale genetic analysis of lethal phenotypes has elucidated the molecular underpinnings of many biological processes. Using the bacterial clustered regularly interspaced short palindromic repeats (CRISPR) system, we constructed a genome-wide single-guide RNA library to screen for genes required for proliferation and survival in a human cancer cell line. Our screen revealed the set of cell-essential genes, which was validated with an orthogonal gene-trap-based screen and comparison with yeast gene knockouts. This set is enriched for genes that encode components of fundamental pathways, are expressed at high levels, and contain few inactivating polymorphisms in the human population. We also uncovered a large group of uncharacterized genes involved in RNA processing, a number of whose products localize to the nucleolus. Last, screens in additional cell lines showed a high degree of overlap in gene essentiality but also revealed differences specific to each cell line and cancer type that reflect the developmental origin, oncogenic drivers, paralogous gene expression pattern, and chromosomal structure of each line. These results demonstrate the power of CRISPR-based screens and suggest a general strategy for identifying liabilities in cancer cells.

Journal ArticleDOI
TL;DR: Genomic signatures of selection and domestication are associated with positively selected genes (PSGs) for fiber improvement in the A subgenome and for stress tolerance in the D subgenome.
Abstract: Upland cotton is a model for polyploid crop domestication and transgenic improvement. Here we sequenced the allotetraploid Gossypium hirsutum L. acc. TM-1 genome by integrating whole-genome shotgun reads, bacterial artificial chromosome (BAC)-end sequences and genotype-by-sequencing genetic maps. We assembled and annotated 32,032 A-subgenome genes and 34,402 D-subgenome genes. Structural rearrangements, gene loss, disrupted genes and sequence divergence were more common in the A subgenome than in the D subgenome, suggesting asymmetric evolution. However, no genome-wide expression dominance was found between the subgenomes. Genomic signatures of selection and domestication are associated with positively selected genes (PSGs) for fiber improvement in the A subgenome and for stress tolerance in the D subgenome. This draft genome sequence provides a resource for engineering superior cotton lines.

Journal ArticleDOI
TL;DR: The Escherichia coli K-12 MG1655 chromosome is assembled de novo in a single 4.6-Mb contig using only nanopore data; the assembly reconstructs gene order and has 99.5% nucleotide identity.
Abstract: We have assembled de novo the Escherichia coli K-12 MG1655 chromosome in a single 4.6-Mb contig using only nanopore data. Our method has three stages: (i) overlaps are detected between reads and then corrected by a multiple-alignment process; (ii) corrected reads are assembled using the Celera Assembler; and (iii) the assembly is polished using a probabilistic model of the signal-level data. The assembly reconstructs gene order and has 99.5% nucleotide identity.

Journal ArticleDOI
14 May 2015-Nature
TL;DR: The discovery of ‘Lokiarchaeota’ is described, a novel candidate archaeal phylum which forms a monophyletic group with eukaryotes in phylogenomic analyses, and whose genomes encode an expanded repertoire of eukaryotic signature proteins that are suggestive of sophisticated membrane remodelling capabilities.
Abstract: The origin of the eukaryotic cell remains one of the most contentious puzzles in modern biology. Recent studies have provided support for the emergence of the eukaryotic host cell from within the archaeal domain of life, but the identity and nature of the putative archaeal ancestor remain a subject of debate. Here we describe the discovery of 'Lokiarchaeota', a novel candidate archaeal phylum, which forms a monophyletic group with eukaryotes in phylogenomic analyses, and whose genomes encode an expanded repertoire of eukaryotic signature proteins that are suggestive of sophisticated membrane remodelling capabilities. Our results provide strong support for hypotheses in which the eukaryotic host evolved from a bona fide archaeon, and demonstrate that many components that underpin eukaryote-specific features were already present in that ancestor. This provided the host with a rich genomic 'starter-kit' to support the increase in the cellular and genomic complexity that is characteristic of eukaryotes.

Posted ContentDOI
13 May 2015-bioRxiv
TL;DR: Roary is introduced, a tool that rapidly builds large-scale pan genomes, identifying the core and dispensable accessory genes and making construction of the pan genome of thousands of prokaryote samples possible on a standard desktop without compromising on the accuracy of results.
Abstract: A typical prokaryote population sequencing study can now consist of hundreds or thousands of isolates. Interrogating these datasets can provide detailed insights into the genetic structure of prokaryotic genomes. We introduce Roary, a tool that rapidly builds large-scale pan genomes, identifying the core and dispensable accessory genes. Roary makes construction of the pan genome of thousands of prokaryote samples possible on a standard desktop without compromising on the accuracy of results. Using a single CPU Roary can produce a pan genome consisting of 1000 isolates in 4.5 hours using 13 GB of RAM, with further speedups possible using multiple processors.
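
The core versus accessory split described in the abstract above can be sketched from a gene presence/absence table. This is only an illustration of the concept, not Roary's implementation: the input structure, the function name `split_pan_genome`, and the 99% threshold are assumptions, and the gene clustering step that produces such a table is not shown.

```python
def split_pan_genome(presence, core_fraction=0.99):
    """Split genes into core and accessory sets.

    presence: dict mapping gene name -> set of isolate names carrying it.
    core_fraction: assumed minimum fraction of isolates for a "core" gene.
    """
    # Every isolate that carries at least one gene.
    isolates = set().union(*presence.values())
    n = len(isolates)
    core, accessory = set(), set()
    for gene, carriers in presence.items():
        if len(carriers) / n >= core_fraction:
            core.add(gene)
        else:
            accessory.add(gene)
    return core, accessory


# Hypothetical three-isolate example.
presence = {
    "dnaA": {"iso1", "iso2", "iso3"},
    "gyrB": {"iso1", "iso2", "iso3"},
    "blaTEM": {"iso2"},
}
core, accessory = split_pan_genome(presence)
print(sorted(core), sorted(accessory))
```

Here the two genes carried by all three isolates fall in the core set, and the gene found in a single isolate is accessory.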

Journal ArticleDOI
TL;DR: Preassembled complexes of purified Cas9 protein and guide RNA were transfected into plant protoplasts of Arabidopsis thaliana, tobacco, lettuce and rice, achieving targeted mutagenesis in regenerated plants at frequencies of up to 46%.
Abstract: Editing plant genomes without introducing foreign DNA into cells may alleviate regulatory concerns related to genetically modified plants. We transfected preassembled complexes of purified Cas9 protein and guide RNA into plant protoplasts of Arabidopsis thaliana, tobacco, lettuce and rice and achieved targeted mutagenesis in regenerated plants at frequencies of up to 46%. The targeted sites contained germline-transmissible small insertions or deletions that are indistinguishable from naturally occurring genetic variation.

Journal ArticleDOI
TL;DR: This first report of the whole genome sequence of A. cerana provides resources for comparative sociogenomics, especially in the field of social insect communication, to contribute to a better understanding of the complex behaviors and natural biology of the Asian honey bee and to anticipate its future evolutionary trajectory.
Abstract: The honey bee is an important model system for increasing understanding of molecular and neural mechanisms underlying social behaviors relevant to the agricultural industry and basic science. The western honey bee, Apis mellifera, has served as a model species, and its genome sequence has been published. In contrast, the genome of the Asian honey bee, Apis cerana, has not yet been sequenced. A. cerana has been raised in Asian countries for thousands of years and has brought considerable economic benefits to the apicultural industry. A. cerana has divergent biological traits compared to A. mellifera and it has played a key role in maintaining biodiversity in eastern and southern Asia. Here we report the first whole genome sequence of A. cerana. Using de novo assembly methods, we produced a 238 Mbp draft of the A. cerana genome and generated 10,651 genes. A. cerana-specific genes were analyzed to better understand the novel characteristics of this honey bee species. Seventy-two percent of the A. cerana-specific genes had more than one GO term, and 1,696 enzymes were categorized into 125 pathways. Genes involved in chemoreception and immunity were carefully identified and compared to those from other sequenced insect models. These included 10 gustatory receptors, 119 odorant receptors, 10 ionotropic receptors, and 160 immune-related genes. This first report of the whole genome sequence of A. cerana provides resources for comparative sociogenomics, especially in the field of social insect communication. These important tools will contribute to a better understanding of the complex behaviors and natural biology of the Asian honey bee and to anticipate its future evolutionary trajectory.

Journal ArticleDOI
TL;DR: Co-delivering chemically modified sgRNAs with Cas9 mRNA or protein is an efficient RNA- or ribonucleoprotein (RNP)-based delivery method for the CRISPR-Cas system, without the toxicity associated with DNA delivery.
Abstract: CRISPR-Cas-mediated genome editing relies on guide RNAs that direct site-specific DNA cleavage facilitated by the Cas endonuclease. Here we report that chemical alterations to synthesized single guide RNAs (sgRNAs) enhance genome editing efficiency in human primary T cells and CD34+ hematopoietic stem and progenitor cells. Co-delivering chemically modified sgRNAs with Cas9 mRNA or protein is an efficient RNA- or ribonucleoprotein (RNP)-based delivery method for the CRISPR-Cas system, without the toxicity associated with DNA delivery. This approach is a simple and effective way to streamline the development of genome editing with the potential to accelerate a wide array of biotechnological and therapeutic applications of the CRISPR-Cas technology.

Journal ArticleDOI
TL;DR: In this article, the authors use Capture Hi-C (CHi-C) to examine the long-range interactions of almost 22,000 promoters in 2 human blood cell types and identify over 1.6 million shared and cell type-restricted interactions spanning hundreds of kilobases between promoters and distal loci.
Abstract: Transcriptional control in large genomes often requires looping interactions between distal DNA elements, such as enhancers and target promoters. Current chromosome conformation capture techniques do not offer sufficiently high resolution to interrogate these regulatory interactions on a genomic scale. Here we use Capture Hi-C (CHi-C), an adapted genome conformation assay, to examine the long-range interactions of almost 22,000 promoters in 2 human blood cell types. We identify over 1.6 million shared and cell type-restricted interactions spanning hundreds of kilobases between promoters and distal loci. Transcriptionally active genes contact enhancer-like elements, whereas transcriptionally inactive genes interact with previously uncharacterized elements marked by repressive features that may act as long-range silencers. Finally, we show that interacting loci are enriched for disease-associated SNPs, suggesting how distal mutations may disrupt the regulation of relevant genes. This study provides new insights and accessible tools to dissect the regulatory interactions that underlie normal and aberrant gene regulation.

Journal ArticleDOI
28 May 2015-PeerJ
TL;DR: VirSorter is a tool designed to detect viral signal in these different types of microbial sequence data in both a reference-dependent and reference-independent manner, leveraging probabilistic models and extensive virome data to maximize detection of novel viruses.
Abstract: Viruses of microbes impact all ecosystems where microbes drive key energy and substrate transformations including the oceans, humans and industrial fermenters. However, despite this recognized importance, our understanding of viral diversity and impacts remains limited by too few model systems and reference genomes. One way to fill these gaps in our knowledge of viral diversity is through the detection of viral signal in microbial genomic data. While multiple approaches have been developed and applied for the detection of prophages (viral genomes integrated in a microbial genome), new types of microbial genomic data are emerging that are more fragmented and larger scale, such as Single-cell Amplified Genomes (SAGs) of uncultivated organisms or genomic fragments assembled from metagenomic sequencing. Here, we present VirSorter, a tool designed to detect viral signal in these different types of microbial sequence data in both a reference-dependent and reference-independent manner, leveraging probabilistic models and extensive virome data to maximize detection of novel viruses. Performance testing shows that VirSorter's prophage prediction capability compares to that of available prophage predictors for complete genomes, but is superior in predicting viral sequences outside of a host genome (i.e., from extrachromosomal prophages, lytic infections, or partially assembled prophages). Furthermore, VirSorter outperforms existing tools for fragmented genomic and metagenomic datasets, and can identify viral signal in assembled sequence (contigs) as short as 3kb, while providing near-perfect identification (>95% Recall and 100% Precision) on contigs of at least 10kb. Because VirSorter scales to large datasets, it can also be used in "reverse" to more confidently identify viral sequence in viral metagenomes by sorting away cellular DNA whether derived from gene transfer agents, generalized transduction or contamination. 
Finally, VirSorter is made available through the iPlant Cyberinfrastructure that provides a web-based user interface interconnected with the required computing resources. VirSorter thus complements existing prophage prediction software to better leverage fragmented, SAG and metagenomic datasets in a way that will scale to modern sequencing. Given these features, VirSorter should enable the discovery of new viruses in microbial datasets, and further our understanding of uncultivated viral communities across diverse ecosystems.

Journal ArticleDOI
TL;DR: Digenome-seq is a robust, sensitive, unbiased and cost-effective method for profiling genome-wide off-target effects of programmable nucleases including Cas9, and shows that Cas9 off-target effects can be avoided by replacing 'promiscuous' single guide RNAs (sgRNAs) with modified sgRNAs.
Abstract: Although RNA-guided genome editing via the CRISPR-Cas9 system is now widely used in biomedical research, genome-wide target specificities of Cas9 nucleases remain controversial. Here we present Digenome-seq, in vitro Cas9-digested whole-genome sequencing, to profile genome-wide Cas9 off-target effects in human cells. This in vitro digest yields sequence reads with the same 5' ends at cleavage sites that can be computationally identified. We validated off-target sites at which insertions or deletions were induced with frequencies below 0.1%, near the detection limit of targeted deep sequencing. We also showed that Cas9 nucleases can be highly specific, inducing off-target mutations at merely several, rather than thousands of, sites in the entire genome and that Cas9 off-target effects can be avoided by replacing 'promiscuous' single guide RNAs (sgRNAs) with modified sgRNAs. Digenome-seq is a robust, sensitive, unbiased and cost-effective method for profiling genome-wide off-target effects of programmable nucleases including Cas9.
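
The abstract above notes that in vitro digestion yields reads sharing the same 5' ends at cleavage sites, which can be identified computationally. A minimal sketch of that idea follows, assuming read alignments have already been reduced to (chromosome, 5' position) pairs. The function name `candidate_cleavage_sites` and the `min_reads` threshold are hypothetical, and the published method's handling of strands and its scoring are not reproduced here.

```python
from collections import Counter


def candidate_cleavage_sites(read_starts, min_reads=5):
    """Return positions where many reads share the same 5' end.

    read_starts: iterable of (chrom, position) tuples, one per aligned read.
    min_reads:   assumed minimum pile-up of identical 5' ends to call a site.
    """
    counts = Counter(read_starts)
    return sorted(site for site, n in counts.items() if n >= min_reads)


# Hypothetical data: two positions with stacked 5' ends plus scattered reads.
starts = (
    [("chr1", 100)] * 6
    + [("chr1", 250), ("chr1", 251), ("chr2", 40)]
    + [("chr2", 900)] * 5
)
sites = candidate_cleavage_sites(starts)
print(sites)
```

Randomly fragmented DNA gives scattered start positions, so only the stacked positions pass the threshold in this toy input.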

01 Feb 2015
TL;DR: Current progress toward developing programmable nuclease–based therapies as well as future prospects and challenges are discussed.
Abstract: Recent advances in the development of genome editing technologies based on programmable nucleases have substantially improved our ability to make precise changes in the genomes of eukaryotic cells. Genome editing is already broadening our ability to elucidate the contribution of genetics to disease by facilitating the creation of more accurate cellular and animal models of pathological processes. A particularly tantalizing application of programmable nucleases is the potential to directly correct genetic mutations in affected tissues and cells to treat diseases that are refractory to traditional therapies. Here we discuss current progress toward developing programmable nuclease–based therapies as well as future prospects and challenges.

Journal ArticleDOI
17 Dec 2015-Cell
TL;DR: 3D genome simulation suggests a model of chromatin folding around chromosomal axes, where CTCF is involved in defining the interface between condensed and open compartments for structural regulation, and provides unique insights in the topological mechanism of human variations and diseases.

Journal ArticleDOI
TL;DR: A draft genome using 181-fold paired-end sequences assisted by fivefold BAC-to-BAC sequences and a high-resolution genetic map is produced for G. hirsutum, revealing conserved gene order and concerted evolution of different regulatory mechanisms for Cellulose synthase and 1-Aminocyclopropane-1-carboxylic acid oxidase1 and 3 may be important for enhanced fiber production.
Abstract: Gossypium hirsutum has proven difficult to sequence owing to its complex allotetraploid (AtDt) genome. Here we produce a draft genome using 181-fold paired-end sequences assisted by fivefold BAC-to-BAC sequences and a high-resolution genetic map. In our assembly 88.5% of the 2,173-Mb scaffolds, which cover 89.6%∼96.7% of the AtDt genome, are anchored and oriented to 26 pseudochromosomes. Comparison of this G. hirsutum AtDt genome with the already sequenced diploid Gossypium arboreum (AA) and Gossypium raimondii (DD) genomes revealed conserved gene order. Repeated sequences account for 67.2% of the AtDt genome, and transposable elements (TEs) originating from Dt seem more active than from At. Reduction in the AtDt genome size occurred after allopolyploidization. The A or At genome may have undergone positive selection for fiber traits. Concerted evolution of different regulatory mechanisms for Cellulose synthase (CesA) and 1-Aminocyclopropane-1-carboxylic acid oxidase1 and 3 (ACO1,3) may be important for enhanced fiber production in G. hirsutum.

Journal ArticleDOI
TL;DR: A targeted, continual multigene editing strategy that was applied to the Escherichia coli genome by using the Streptococcus pyogenes type II CRISPR-Cas9 system to realize a variety of precise genome modifications, including gene deletion and insertion, with the highest efficiency of 100%, is described.
Abstract: An efficient genome-scale editing tool is required for construction of industrially useful microbes. We describe a targeted, continual multigene editing strategy that was applied to the Escherichia coli genome by using the Streptococcus pyogenes type II CRISPR-Cas9 system to realize a variety of precise genome modifications, including gene deletion and insertion, with efficiencies of up to 100%, and that achieved simultaneous multigene editing of up to three targets. The system also demonstrated successful targeted chromosomal deletions in Tatumella citrea, another species of the Enterobacteriaceae, with efficiencies of up to 100%.

Journal ArticleDOI
TL;DR: A high-resolution sequencing–based method is presented to detect G4s in the human genome and observed a high G4 density in functional regions, as well as in genes previously not predicted to contain these structures (such as BRCA2).
Abstract: G-quadruplexes (G4s) are nucleic acid secondary structures that form within guanine-rich DNA or RNA sequences. G4 formation can affect chromatin architecture and gene regulation and has been associated with genomic instability, genetic diseases and cancer progression. Here we present a high-resolution sequencing-based method to detect G4s in the human genome. We identified 716,310 distinct G4 structures, 451,646 of which were not predicted by computational methods. These included previously uncharacterized noncanonical long loop and bulged structures. We observed a high G4 density in functional regions, such as 5' untranslated regions and splicing sites, as well as in genes previously not predicted to contain these structures (such as BRCA2). G4 formation was significantly associated with oncogenes, tumor suppressors and somatic copy number alterations related to cancer development. The G4s identified in this study may therefore represent promising targets for cancer intervention.
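
To make concrete what a purely computational G4 prediction can look like, and why long-loop structures may escape it, here is a minimal sketch using a widely used sequence motif (four runs of at least three guanines separated by loops of 1 to 7 nucleotides). Whether this is the predictor the authors compared against is an assumption; the names `CANONICAL_G4` and `find_canonical_g4` are hypothetical.

```python
import re

# Four G-runs (>= 3 G each) separated by three loops of 1-7 nucleotides.
CANONICAL_G4 = re.compile(r"G{3,}(?:[ACGT]{1,7}G{3,}){3}")


def find_canonical_g4(seq):
    """Return (start, end) spans of canonical G4 motifs on the given strand."""
    return [(m.start(), m.end()) for m in CANONICAL_G4.finditer(seq.upper())]


short_loops = "TTGGGAGGGTGGGCGGGTT"
long_loop = "GGG" + "A" * 10 + "GGGTGGGCGGG"  # first loop is 10 nt long

print(find_canonical_g4(short_loops))  # motif with 1-nt loops is found
print(find_canonical_g4(long_loop))    # 10-nt loop exceeds the 1-7 nt limit
```

The sketch scans one strand only; a fuller scan would also search the reverse complement (equivalently, C-rich motifs).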

Journal ArticleDOI
TL;DR: SpCas9 and guide RNAs were delivered with adeno-associated viral (AAV) vectors to edit single (Mecp2) and multiple (Dnmt1, Dnmt3a and Dnmt3b) genes in the adult mouse brain in vivo, showing that AAV-mediated SpCas9 genome editing can enable reverse genetic studies of gene function in the brain.
Abstract: Probing gene function in the mammalian brain can be greatly assisted with methods to manipulate the genome of neurons in vivo. The clustered, regularly interspaced, short palindromic repeats (CRISPR)-associated endonuclease (Cas)9 from Streptococcus pyogenes (SpCas9) can be used to edit single or multiple genes in replicating eukaryotic cells, resulting in frame-shifting insertion/deletion (indel) mutations and subsequent protein depletion. Here, we delivered SpCas9 and guide RNAs using adeno-associated viral (AAV) vectors to target single (Mecp2) as well as multiple genes (Dnmt1, Dnmt3a and Dnmt3b) in the adult mouse brain in vivo. We characterized the effects of genome modifications in postmitotic neurons using biochemical, genetic, electrophysiological and behavioral readouts. Our results demonstrate that AAV-mediated SpCas9 genome editing can enable reverse genetic studies of gene function in the brain.

Journal ArticleDOI
24 Apr 2015-PLOS ONE
TL;DR: CCTop provides the bench biologist with a tool for the rapid and efficient identification of high quality target sites and was experimentally validated for gene inactivation, non-homologous end-joining as well as homology directed repair.
Abstract: Engineering of the CRISPR/Cas9 system has opened a plethora of new opportunities for site-directed mutagenesis and targeted genome modification. Fundamental to this is a stretch of twenty nucleotides at the 5’ end of a guide RNA that provides specificity to the bound Cas9 endonuclease. Since a sequence of twenty nucleotides can occur multiple times in a given genome and some mismatches seem to be accepted by the CRISPR/Cas9 complex, an efficient and reliable in silico selection and evaluation of the targeting site is a key prerequisite for experimental success. Here we present the CRISPR/Cas9 target online predictor (CCTop, http://crispr.cos.uni-heidelberg.de) to overcome limitations of already available tools. CCTop provides an intuitive user interface with reasonable default parameters that can easily be tuned by the user. From a given query sequence, CCTop identifies and ranks all candidate sgRNA target sites according to their off-target quality and displays full documentation. CCTop was experimentally validated for gene inactivation, non-homologous end-joining as well as homology directed repair. Thus, CCTop provides the bench biologist with a tool for the rapid and efficient identification of high quality target sites.
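
The two ingredients named in the abstract, enumerating twenty-nucleotide target sites and tolerating some mismatches, can be sketched as follows. This is not CCTop's algorithm or ranking: the sketch assumes the canonical SpCas9 `NGG` PAM (not stated in the abstract), scans the forward strand only, and uses hypothetical function names.

```python
import re


def find_target_sites(seq):
    """Yield (start, protospacer, pam) for 20-nt sites followed by an NGG PAM.

    Forward strand only. The lookahead lets overlapping sites be reported.
    """
    seq = seq.upper()
    for m in re.finditer(r"(?=([ACGT]{20})([ACGT]GG))", seq):
        yield m.start(), m.group(1), m.group(2)


def mismatches(a, b):
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


query = "ACGTACGTACGTACGTACGT" + "TGG" + "AAA"
for start, protospacer, pam in find_target_sites(query):
    print(start, protospacer, pam)

# A putative off-target site would be compared to the guide by mismatch count.
print(mismatches("ACGTACGTACGTACGTACGT", "ACGTACGTACGTACGTACGA"))
```

A real evaluation would also scan the reverse complement and weight mismatches by their position relative to the PAM; both are left out here for brevity.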