Showing papers on "Genome" published in 2018


Journal ArticleDOI
TL;DR: FastANI is developed, a method to compute ANI using alignment-free approximate sequence mapping, and it is shown 95% ANI is an accurate threshold for demarcating prokaryotic species by analyzing about 90,000 prokaryotic genomes.
Abstract: A fundamental question in microbiology is whether there is a continuum of genetic diversity among genomes, or clear species boundaries prevail instead. Whole-genome similarity metrics such as Average Nucleotide Identity (ANI) help address this question by facilitating high resolution taxonomic analysis of thousands of genomes from diverse phylogenetic lineages. To scale to available genomes and beyond, we present FastANI, a new method to estimate ANI using alignment-free approximate sequence mapping. FastANI is accurate for both finished and draft genomes, and is up to three orders of magnitude faster compared to alignment-based approaches. We leverage FastANI to compute pairwise ANI values among all prokaryotic genomes available in the NCBI database. Our results reveal clear genetic discontinuity, with 99.8% of the total 8 billion genome pairs analyzed conforming to >95% intra-species and <83% inter-species ANI values. This discontinuity is manifested with or without the most frequently sequenced species, and is robust to historic additions in the genome databases. Average Nucleotide Identity (ANI) is a robust and useful measure to gauge genetic relatedness between two genomes. Here, the authors develop FastANI, a method to compute ANI using alignment-free approximate sequence mapping, and show 95% ANI is an accurate threshold for demarcating prokaryotic species by analyzing about 90,000 prokaryotic genomes.
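As a rough illustration of the thresholds quoted in this abstract (>95% intra-species, <83% inter-species), the sketch below labels genome pairs from ANI values that are assumed to have been computed already. It does not run or reproduce FastANI; the function name, the "intermediate" label, and the example values are hypothetical.

```python
def classify_pair(ani_percent, intra_cutoff=95.0, inter_cutoff=83.0):
    """Label a genome pair from its ANI value, given in percent.

    The cutoffs default to the values quoted in the abstract; the
    "intermediate" label for the gap between them is an assumption.
    """
    if ani_percent > intra_cutoff:
        return "same species"
    if ani_percent < inter_cutoff:
        return "different species"
    return "intermediate"


# Hypothetical, made-up ANI values for three genome pairs.
example_pairs = {
    ("genome_A", "genome_B"): 98.7,
    ("genome_A", "genome_C"): 79.2,
    ("genome_B", "genome_D"): 90.0,
}

labels = {pair: classify_pair(ani) for pair, ani in example_pairs.items()}
for pair, label in labels.items():
    print(pair, label)
```

Pairs that fall between the two cutoffs are the rare cases the abstract reports as making up about 0.2% of the comparisons.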

2,176 citations


Journal ArticleDOI
Rudi Appels, Kellye Eversole, Nils Stein +204 more, Institutions (45)
17 Aug 2018-Science
TL;DR: This annotated reference sequence of wheat is a resource that can now drive disruptive innovation in wheat improvement, as this community resource establishes the foundation for accelerating wheat research and application through improved understanding of wheat biology and genomics-assisted breeding.
Abstract: An annotated reference sequence representing the hexaploid bread wheat genome in 21 pseudomolecules has been analyzed to identify the distribution and genomic context of coding and noncoding elements across the A, B, and D subgenomes. With an estimated coverage of 94% of the genome and containing 107,891 high-confidence gene models, this assembly enabled the discovery of tissue- and developmental stage-related coexpression networks by providing a transcriptome atlas representing major stages of wheat development. Dynamics of complex gene families involved in environmental adaptation and end-use quality were revealed at subgenome resolution and contextualized to known agronomic single-gene or quantitative trait loci. This community resource establishes the foundation for accelerating wheat research and application through improved understanding of wheat biology and genomics-assisted breeding.

2,118 citations


Journal ArticleDOI
TL;DR: The minimal standards for the quality of genome sequences and how they can be applied for taxonomic purposes are described.
Abstract: Advancement of DNA sequencing technology allows the routine use of genome sequences in the various fields of microbiology. The information held in genome sequences proved to provide objective and reliable means in the taxonomy of prokaryotes. Here, we describe the minimal standards for the quality of genome sequences and how they can be applied for taxonomic purposes.

1,908 citations


Journal ArticleDOI
TL;DR: Two standards developed by the Genomic Standards Consortium (GSC) for reporting bacterial and archaeal genome sequences are presented: the Minimum Information about a Single Amplified Genome (MISAG) and the Minimum Information about a Metagenome-Assembled Genome (MIMAG), including estimates of genome completeness and contamination.
Abstract: We present two standards developed by the Genomic Standards Consortium (GSC) for reporting bacterial and archaeal genome sequences. Both are extensions of the Minimum Information about Any (x) Sequence (MIxS). The standards are the Minimum Information about a Single Amplified Genome (MISAG) and the Minimum Information about a Metagenome-Assembled Genome (MIMAG), including, but not limited to, assembly quality, and estimates of genome completeness and contamination. These standards can be used in combination with other GSC checklists, including the Minimum Information about a Genome Sequence (MIGS), Minimum Information about a Metagenomic Sequence (MIMS), and Minimum Information about a Marker Gene Sequence (MIMARKS). Community-wide adoption of MISAG and MIMAG will facilitate more robust comparative genomic analyses of bacterial and archaeal diversity.

1,171 citations


Journal ArticleDOI
TL;DR: MUMmer4 is described, a substantially improved version of MUMmer that addresses genome size constraints by changing the 32-bit suffix tree data structure at the core of MUMmer to a 48-bit suffix array, and that offers improved speed through parallel processing of input query sequences.
Abstract: The MUMmer system and the genome sequence aligner nucmer included within it are among the most widely used alignment packages in genomics. Since the last major release of MUMmer version 3 in 2004, it has been applied to many types of problems including aligning whole genome sequences, aligning reads to a reference genome, and comparing different assemblies of the same genome. Despite its broad utility, MUMmer3 has limitations that can make it difficult to use for large genomes and for the very large sequence data sets that are common today. In this paper we describe MUMmer4, a substantially improved version of MUMmer that addresses genome size constraints by changing the 32-bit suffix tree data structure at the core of MUMmer to a 48-bit suffix array, and that offers improved speed through parallel processing of input query sequences. With a theoretical limit on the input size of 141Tbp, MUMmer4 can now work with input sequences of any biologically realistic length. We show that as a result of these enhancements, the nucmer program in MUMmer4 is easily able to handle alignments of large genomes; we illustrate this with an alignment of the human and chimpanzee genomes, which allows us to compute that the two species are 98% identical across 96% of their length. With the enhancements described here, MUMmer4 can also be used to efficiently align reads to reference genomes, although it is less sensitive and accurate than the dedicated read aligners. The nucmer aligner in MUMmer4 can now be called from scripting languages such as Perl, Python and Ruby. These improvements make MUMmer4 one of the most versatile genome alignment packages available.
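To make the suffix array mentioned in this abstract more concrete, here is a minimal sketch of the general data structure: a sorted list of suffix start positions that can be binary-searched for exact matches. It is a naive teaching version with hypothetical function names, not MUMmer4's implementation, and it would not scale to genome-sized inputs.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text, in sorted order.

    Naive construction (sorting full suffixes); fine for short strings only.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, pattern):
    """Return every position where pattern occurs in text, in ascending order."""
    m = len(pattern)

    def prefix(rank):
        start = suffix_array[rank]
        return text[start:start + m]

    # Leftmost rank whose length-m prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Leftmost rank whose length-m prefix is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(suffix_array[first:lo])


text = "ACGTACGA"
sa = build_suffix_array(text)
print(find_occurrences(text, sa, "ACG"))
```

As a side note on the quoted 141 Tbp limit: 2^47 is 140,737,488,355,328, which is numerically close to 141 × 10^12, although the abstract does not say how the limit is derived.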

1,131 citations


Journal ArticleDOI
TL;DR: CRISPOR tries to provide a comprehensive solution from selection, cloning and expression of guide RNA as well as providing primers needed for testing guide activity and potential off-targets.
Abstract: CRISPOR.org is a web tool for genome editing experiments with the CRISPR-Cas9 system. It finds guide RNAs in an input sequence and ranks them according to different scores that evaluate potential off-targets in the genome of interest and predict on-target activity. The list of genomes is continuously expanded, with more than 150 genomes added in the last two years. CRISPOR tries to provide a comprehensive solution from selection, cloning and expression of guide RNA as well as providing primers needed for testing guide activity and potential off-targets. Recent developments include batch design for genome-wide CRISPR and saturation screens, creating custom oligonucleotides for guide cloning and the design of next generation sequencing primers to test for off-target mutations. CRISPOR is available from http://crispor.org, including the full source code of the website and a stand-alone, command-line version.
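The first step described here, finding candidate guide RNAs in an input sequence, can be sketched in a simplified form. The example below assumes the commonly used SpCas9 convention of a 20-nt protospacer followed by an NGG PAM, scans the forward strand only, and performs no off-target or activity scoring. It is not CRISPOR's code, and the function name is hypothetical.

```python
import re


def find_guides(sequence, guide_length=20):
    """Return (position, protospacer, PAM) for each forward-strand candidate.

    Assumes an NGG PAM directly 3' of the protospacer; the reverse strand
    and all ranking steps are deliberately left out of this sketch.
    """
    seq = sequence.upper()
    # A lookahead is used so that overlapping candidates are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_length)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


# Made-up input sequence for illustration.
example = "GATTACAGATTACAGATTACTGGAA"
for position, protospacer, pam in find_guides(example):
    print(position, protospacer, pam)
```

A real design tool would also scan the reverse complement and then score each candidate against the target genome, which is the part this abstract attributes to CRISPOR's ranking.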

864 citations


Journal ArticleDOI
TL;DR: Long intergenic non-coding RNA genes have diverse features that distinguish them from mRNA-encoding genes and exercise functions such as remodelling chromatin and genome architecture, RNA stabilization and transcription regulation, including enhancer-associated activity.
Abstract: Long intergenic non-coding RNA (lincRNA) genes have diverse features that distinguish them from mRNA-encoding genes and exercise functions such as remodelling chromatin and genome architecture, RNA stabilization and transcription regulation, including enhancer-associated activity. Some genes currently annotated as encoding lincRNAs include small open reading frames (smORFs) and encode functional peptides and thus may be more properly classified as coding RNAs. lincRNAs may broadly serve to fine-tune the expression of neighbouring genes with remarkable tissue specificity through a diversity of mechanisms, highlighting our rapidly evolving understanding of the non-coding genome.

829 citations


Journal ArticleDOI
26 Oct 2018-Science
TL;DR: These chromatin accessibility profiles identify cancer- and tissue-specific DNA regulatory elements that enable classification of tumor subtypes with newly recognized prognostic importance, and identify distinct TF activities in cancer based on differences in the inferred patterns of TF-DNA interaction and gene expression.
Abstract: INTRODUCTION Cancer is one of the leading causes of death worldwide. Although the 2% of the human genome that encodes proteins has been extensively studied, much remains to be learned about the noncoding genome and gene regulation in cancer. Genes are turned on and off in the proper cell types and cell states by transcription factor (TF) proteins acting on DNA regulatory elements that are scattered over the vast noncoding genome and exert long-range influences. The Cancer Genome Atlas (TCGA) is a global consortium that aims to accelerate the understanding of the molecular basis of cancer. TCGA has systematically collected DNA mutation, methylation, RNA expression, and other comprehensive datasets from primary human cancer tissue. TCGA has served as an invaluable resource for the identification of genomic aberrations, altered transcriptional networks, and cancer subtypes. Nonetheless, the gene regulatory landscapes of these tumors have largely been inferred through indirect means. RATIONALE A hallmark of active DNA regulatory elements is chromatin accessibility. Eukaryotic genomes are compacted in chromatin, a complex of DNA and proteins, and only the active regulatory elements are accessible by the cell’s machinery such as TFs. The assay for transposase-accessible chromatin using sequencing (ATAC-seq) quantifies DNA accessibility through the use of transposase enzymes that insert sequencing adapters at these accessible chromatin sites. ATAC-seq enables the genome-wide profiling of TF binding events that orchestrate gene expression programs and give a cell its identity. RESULTS We generated high-quality ATAC-seq data in 410 tumor samples from TCGA, identifying diverse regulatory landscapes across 23 cancer types. These chromatin accessibility profiles identify cancer- and tissue-specific DNA regulatory elements that enable classification of tumor subtypes with newly recognized prognostic importance. 
We identify distinct TF activities in cancer based on differences in the inferred patterns of TF-DNA interaction and gene expression. Genome-wide correlation of gene expression and chromatin accessibility predicts tens of thousands of putative interactions between distal regulatory elements and gene promoters, including key oncogenes and targets in cancer immunotherapy, such as MYC, SRC, BCL2, and PDL1. Moreover, these regulatory interactions inform known genetic risk loci linked to cancer predisposition, nominating biochemical mechanisms and target genes for many cancer-linked genetic variants. Lastly, integration with mutation profiling by whole-genome sequencing identifies cancer-relevant noncoding mutations that are associated with altered gene expression. A single-base mutation located 12 kilobases upstream of the FGD4 gene, a regulator of the actin cytoskeleton, generates a putative de novo binding site for an NKX TF and is associated with an increase in chromatin accessibility and a concomitant increase in FGD4 gene expression. CONCLUSION The accessible genome of primary human cancers provides a wealth of information on the susceptibility, mechanisms, prognosis, and potential therapeutic strategies of diverse cancer types. Prediction of interactions between DNA regulatory elements and gene promoters sets the stage for future integrative gene regulatory network analyses. The discovery of hundreds of noncoding somatic mutations that exhibit allele-specific regulatory effects suggests a pervasive mechanism for cancer cells to manipulate gene expression and increase cellular fitness. These data may serve as a foundational resource for the cancer research community.

774 citations


Journal ArticleDOI
TL;DR: FastQ Screen is a tool to validate the origin of DNA samples by quantifying the proportion of reads that map to a panel of reference genomes and is intended to be used routinely as a quality control measure and for analysing samples in which the origin of the DNA is uncertain or has multiple sources.
Abstract: DNA sequencing analysis typically involves mapping reads to just one reference genome. Mapping against multiple genomes is necessary, however, when the genome of origin requires confirmation. Mapping against multiple genomes is also advisable for detecting contamination or for identifying sample swaps which, if left undetected, may lead to incorrect experimental conclusions. Consequently, we present FastQ Screen, a tool to validate the origin of DNA samples by quantifying the proportion of reads that map to a panel of reference genomes. FastQ Screen is intended to be used routinely as a quality control measure and for analysing samples in which the origin of the DNA is uncertain or has multiple sources.
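The quantity this abstract describes, the proportion of reads mapping to each genome in a panel, can be illustrated with a small sketch. It assumes each read has already been given a single genome label (or None if unmapped) by some upstream aligner, so it ignores reads that map to several genomes. It is not FastQ Screen itself, and the function name and data are hypothetical.

```python
from collections import Counter


def mapping_proportions(read_assignments):
    """Return the fraction of reads assigned to each genome label.

    read_assignments is an iterable with one entry per read: a genome
    label, or None for a read that mapped to nothing in the panel.
    """
    counts = Counter(
        label if label is not None else "unmapped" for label in read_assignments
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {genome: n / total for genome, n in counts.items()}


# Made-up assignments for ten reads.
assignments = ["human"] * 6 + ["mouse"] * 3 + [None]
proportions = mapping_proportions(assignments)
for genome, fraction in sorted(proportions.items()):
    print(f"{genome}: {fraction:.0%}")
```

In a quality control setting, an unexpectedly large fraction for a genome other than the expected one would be the signal of contamination or a sample swap that the abstract mentions.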

738 citations


Journal ArticleDOI
11 Apr 2018-Nature
TL;DR: Whole-genome sequencing and phenotyping of 1,011 natural isolates of the yeast Saccharomyces cerevisiae reveal its evolutionary history, including a single out-of-China origin and multiple domestication events, and provides a framework for genotype–phenotype studies in this model organism.
Abstract: Large-scale population genomic surveys are essential to explore the phenotypic diversity of natural populations. Here we report the whole-genome sequencing and phenotyping of 1,011 Saccharomyces cerevisiae isolates, which together provide an accurate evolutionary picture of the genomic variants that shape the species-wide phenotypic landscape of this yeast. Genomic analyses support a single ‘out-of-China’ origin for this species, followed by several independent domestication events. Although domesticated isolates exhibit high variation in ploidy, aneuploidy and genome content, genome evolution in wild isolates is mainly driven by the accumulation of single nucleotide polymorphisms. A common feature is the extensive loss of heterozygosity, which represents an essential source of inter-individual variation in this mainly asexual species. Most of the single nucleotide polymorphisms, including experimentally identified functional polymorphisms, are present at very low frequencies. The largest numbers of variants identified by genome-wide association are copy-number changes, which have a greater phenotypic effect than do single nucleotide polymorphisms. This resource will guide future population genomics and genotype–phenotype studies in this classic model system. Whole-genome sequencing of 1,011 natural isolates of the yeast Saccharomyces cerevisiae reveals its evolutionary history, including a single out-of-China origin and multiple domestication events, and provides a framework for genotype–phenotype studies in this model organism.

727 citations


Journal ArticleDOI
Alison M. Taylor, Juliann Shih, Gavin Ha +729 more, Institutions (4)
TL;DR: The genomic and phenotypic correlates of cancer aneuploidy are defined and genome engineering is applied to delete 3p in lung cells, causing decreased proliferation rescued in part by chromosome 3 duplication.

Journal ArticleDOI
04 May 2018-Science
TL;DR: Saturation-scale mutagenesis allows prioritization of intervention targets in the genome of the most important cause of malaria, and confirms the proteasome-degradation pathway is a high-value druggable target.
Abstract: INTRODUCTION Malaria remains a devastating global parasitic disease, with the majority of malaria deaths caused by the highly virulent Plasmodium falciparum. The extreme AT-bias of the P. falciparum genome has hampered genetic studies through targeted approaches such as homologous recombination or CRISPR-Cas9, and only a few hundred P. falciparum mutants have been experimentally generated in the past decades. In this study, we have used high-throughput piggyBac transposon insertional mutagenesis and quantitative insertion site sequencing (QIseq) to reach saturation-level mutagenesis of this parasite. RATIONALE Our study exploits the AT-richness of the P. falciparum genome, which provides numerous piggyBac transposon insertion targets within both gene coding and noncoding flanking sequences, to generate more than 38,000 P. falciparum mutants. At this level of mutagenesis, we could distinguish essential genes as nonmutable and dispensable genes as mutable. Subsequently, we identified 2680 genes essential for in vitro asexual blood-stage growth. RESULTS We calculated mutagenesis index scores (MISs) and mutagenesis fitness scores (MFSs) in order to functionally define the relative fitness cost of disruption for 5399 genes. A competitive growth phenotype screen confirmed that MIS and MFS were predictive of the fitness cost for in vitro asexual growth. Genes predicted to be essential included genes implicated in drug resistance—such as the “K13” Kelch propeller, mdr, and dhfr-ts—as well as targets considered to be high value for drug development, such as pkg and cdpk5. The screen revealed essential genes that are specific to human Plasmodium parasites but absent from rodent-infective species, such as lipid metabolic genes that may be crucial to transmission commitment in human infections. MIS and MFS profiling provides a clear ranking of the relative essentiality of gene ontology (GO) functions in P. falciparum.
GO pathways associated with translation, RNA metabolism, and cell cycle control are more essential, whereas genes associated with protein phosphorylation, virulence factors, and transcription are more likely to be dispensable. Last, we confirm that the proteasome-degradation pathway is a high-value druggable target on the basis of its high ratio of essential to dispensable genes, and by functionally confirming its link to the mode of action of artemisinin, the current front-line antimalarial. CONCLUSION Saturation-scale mutagenesis allows prioritization of intervention targets in the genome of the most important cause of malaria. The identification of more than 2680 essential genes, including ~1000 Plasmodium-conserved essential genes, will be valuable for antimalarial therapeutic research.

Journal ArticleDOI
TL;DR: A high-quality genome assembly of Camellia sinensis var.
Abstract: Tea, one of the world’s most important beverage crops, provides numerous secondary metabolites that account for its rich taste and health benefits. Here we present a high-quality sequence of the genome of tea, Camellia sinensis var. sinensis (CSS), using both Illumina and PacBio sequencing technologies. At least 64% of the 3.1-Gb genome assembly consists of repetitive sequences, and the rest yields 33,932 high-confidence predictions of encoded proteins. Divergence between two major lineages, CSS and Camellia sinensis var. assamica (CSA), is calculated to ∼0.38 to 1.54 million years ago (Mya). Analysis of genic collinearity reveals that the tea genome is the product of two rounds of whole-genome duplications (WGDs) that occurred ∼30 to 40 and ∼90 to 100 Mya. We provide evidence that these WGD events, and subsequent paralogous duplications, had major impacts on the copy numbers of secondary metabolite genes, particularly genes critical to producing three key quality compounds: catechins, theanine, and caffeine. Analyses of transcriptome and phytochemistry data show that amplification and transcriptional divergence of genes encoding a large acyltransferase family and leucoanthocyanidin reductases are associated with the characteristic young leaf accumulation of monomeric galloylated catechins in tea, while functional divergence of a single member of the glutamine synthetase gene family yielded theanine synthetase. This genome sequence will facilitate understanding of tea genome evolution and tea metabolite pathways, and will promote germplasm utilization for breeding improved tea varieties.

Journal ArticleDOI
TL;DR: Purge Haplotigs improves the haploid and diploid representations of third-gen sequencing based genome assemblies by identifying and reassigning allelic contigs and is less likely to over-purge repetitive or paralogous elements compared to alignment-only based methods.
Abstract: Recent developments in third-gen long read sequencing and diploid-aware assemblers have resulted in the rapid release of numerous reference-quality assemblies for diploid genomes. However, assembly of highly heterozygous genomes is still problematic when regional heterogeneity is so high that haplotype homology is not recognised during assembly. This results in regional duplication rather than consolidation into allelic variants and can cause issues with downstream analysis, for example variant discovery, or haplotype reconstruction using the diploid assembly with unpaired allelic contigs. A new pipeline—Purge Haplotigs—was developed specifically for third-gen sequencing-based assemblies to automate the reassignment of allelic contigs, and to assist in the manual curation of genome assemblies. The pipeline uses a draft haplotype-fused assembly or a diploid assembly, read alignments, and repeat annotations to identify allelic variants in the primary assembly. The pipeline was tested on a simulated dataset and on four recent diploid (phased) de novo assemblies from third-generation long-read sequencing, and compared with a similar tool. After processing with Purge Haplotigs, haploid assemblies were less duplicated with minimal impact on genome completeness, and diploid assemblies had more pairings of allelic contigs. Purge Haplotigs improves the haploid and diploid representations of third-gen sequencing based genome assemblies by identifying and reassigning allelic contigs. The implementation is fast and scales well with large genomes, and it is less likely to over-purge repetitive or paralogous elements compared to alignment-only based methods. The software is available at https://bitbucket.org/mroachawri/purge_haplotigs under a permissive MIT licence.

Journal ArticleDOI
TL;DR: Software to identify high-resolution TAD boundaries and reveal their relationship to underlying DNA motifs is developed and it is demonstrated that boundaries can be accurately predicted using only the motif sequences at open chromatin sites.
Abstract: Despite an abundance of new studies about topologically associating domains (TADs), the role of genetic information in TAD formation is still not fully understood. Here we use our software, HiCExplorer (hicexplorer.readthedocs.io) to annotate >2800 high-resolution (570 bp) TAD boundaries in Drosophila melanogaster. We identify eight DNA motifs enriched at boundaries, including a motif bound by the M1BP protein, and two new boundary motifs. In contrast to mammals, the CTCF motif is only enriched on a small fraction of boundaries flanking inactive chromatin while most active boundaries contain the motifs bound by the M1BP or Beaf-32 proteins. We demonstrate that boundaries can be accurately predicted using only the motif sequences at open chromatin sites. We propose that DNA sequence guides the genome architecture by allocation of boundary proteins in the genome. Finally, we present an interactive online database to access and explore the spatial organization of fly, mouse and human genomes, available at http://chorogenome.ie-freiburg.mpg.de. Although topologically associating domains (TADs) have been extensively investigated, it is not clear to what extent DNA sequence contributes to their formation. Here the authors develop software to identify high-resolution TAD boundaries and reveal their relationship to underlying DNA motifs.

Journal ArticleDOI
TL;DR: The authors review the role of genetic structural variation in disease and the pathogenic potential of changes to the 3D genome.
Abstract: Structural and quantitative chromosomal rearrangements, collectively referred to as structural variation (SV), contribute to a large extent to the genetic diversity of the human genome and thus are of high relevance for cancer genetics, rare diseases and evolutionary genetics. Recent studies have shown that SVs can not only affect gene dosage but also modulate basic mechanisms of gene regulation. SVs can alter the copy number of regulatory elements or modify the 3D genome by disrupting higher-order chromatin organization such as topologically associating domains. As a result of these position effects, SVs can influence the expression of genes distant from the SV breakpoints, thereby causing disease. The impact of SVs on the 3D genome and on gene expression regulation has to be considered when interpreting the pathogenic potential of these variant types.

Journal ArticleDOI
TL;DR: A new visualization tool that is specifically designed for chloroplast genomes is announced that allows the users to depict the genetic architecture of up to ten chloroplast genomes in the vicinity of the sites connecting the inverted repeats to the short and long single copy regions.
Abstract: Motivation Genome plotting is performed using a wide range of visualization tools, each with an emphasis on a different informative dimension of the genome. These tools can provide a deeper insight into the genomic structure of the organism. Results Here, we announce a new visualization tool that is specifically designed for chloroplast genomes. It allows the users to depict the genetic architecture of up to ten chloroplast genomes in the vicinity of the sites connecting the inverted repeats to the short and long single copy regions. The software and its dependent libraries are fully coded in R and the reflected plot is scaled up to realistic size of nucleotide base pairs in the vicinity of the junction sites. We introduce a website for easier use of the program and R source code of the software to be used in case of preferences to be changed and integrated into personal pipelines. The input of the program is an annotation GenBank (.gb) file, the accession or GI number of the sequence or a DOGMA output file. The software was tested using over 100 embryophyte chloroplast genomes and in all cases a reliable output was obtained. Availability and implementation Source codes and the online suite are available at https://irscope.shinyapps.io/irapp/ or https://github.com/Limpfrog/irscope.

Journal ArticleDOI
TL;DR: Nicholas K. Hayward*, James S. Wilmott*, Nicola Waddell*, Peter A. Johansson*, Matthew A. Spillane, Robyn P. Lau, Rebecca A. Dagg, Sarah-Jane Schramm, Antonia Pritchard, Ken Dutton-Regester, Felicity Newell, Anna Fitzgerald, Catherine A. Shang, Sean M.

Journal ArticleDOI
TL;DR: A pan-genome dataset of the Oryza sativa–Oryza rufipogon species complex generated through deep sequencing and de novo genome assembly of 66 divergent accessions will be helpful in pinpointing new causal variants underlying complex traits and in promoting evolutionary and functional studies in rice.
Abstract: The rich genetic diversity in Oryza sativa and Oryza rufipogon serves as the main sources in rice breeding. Large-scale resequencing has been undertaken to discover allelic variants in rice, but much of the information for genetic variation is often lost by direct mapping of short sequence reads onto the O. sativa japonica Nipponbare reference genome. Here we constructed a pan-genome dataset of the O. sativa–O. rufipogon species complex through deep sequencing and de novo assembly of 66 divergent accessions. Intergenomic comparisons identified 23 million sequence variants in the rice genome. This catalog of sequence variations includes many known quantitative trait nucleotides and will be helpful in pinpointing new causal variants that underlie complex traits. In particular, we systemically investigated the whole set of coding genes using this pan-genome data, which revealed extensive presence and absence of variation among rice accessions. This pan-genome resource will further promote evolutionary and functional studies in rice. A pan-genome dataset of the Oryza sativa–Oryza rufipogon species complex generated through deep sequencing and de novo genome assembly of 66 divergent accessions will be helpful in pinpointing new causal variants underlying complex traits and in promoting evolutionary and functional studies in rice.

Journal ArticleDOI
16 May 2018-Nature
TL;DR: A large-scale mutagenesis screen identifies mutant phenotypes for over 11,000 protein-coding genes in bacteria that had previously not been assigned a specific function, demonstrating the scalability of microbial genetics and its utility for improving gene annotations.
Abstract: One-third of all protein-coding genes from bacterial genomes cannot be annotated with a function. Here, to investigate the functions of these genes, we present genome-wide mutant fitness data from 32 diverse bacteria across dozens of growth conditions. We identified mutant phenotypes for 11,779 protein-coding genes that had not been annotated with a specific function. Many genes could be associated with a specific condition because the gene affected fitness only in that condition, or with another gene in the same bacterium because they had similar mutant phenotypes. Of the poorly annotated genes, 2,316 had associations that have high confidence because they are conserved in other bacteria. By combining these conserved associations with comparative genomics, we identified putative DNA repair proteins; in addition, we propose specific functions for poorly annotated enzymes and transporters and for uncharacterized protein families. Our study demonstrates the scalability of microbial genetics and its utility for improving gene annotations.

Journal ArticleDOI
TL;DR: Application of CRISPR/Cas9 techniques will result in the development of non-genetically modified (Non-GMO) crops with the desired trait that can contribute to increased yield potential under biotic and abiotic stress conditions.
Abstract: The availability of genome sequences for several crops and advances in genome editing approaches have opened up possibilities to breed for almost any given desirable trait. Advancements in genome editing technologies such as zinc finger nucleases (ZFNs) and transcription activator-like effector nucleases (TALENs) have made it possible for molecular biologists to more precisely target any gene of interest. However, these methodologies are expensive and time-consuming as they involve complicated steps that require protein engineering. Unlike first-generation genome editing tools, CRISPR/Cas9 genome editing involves simple designing and cloning methods, with the same Cas9 being potentially available for use with different guide RNAs targeting multiple sites in the genome. After proof-of-concept demonstrations in crop plants involving the primary CRISPR-Cas9 module, several modified Cas9 cassettes have been utilized in crop plants for improving target specificity and reducing off-target cleavage (e.g., Nmcas9, Sacas9, Stcas9). Further, the availability of Cas9 enzymes from additional bacterial species has made available options to enhance specificity and efficiency of gene editing methodologies. This review summarizes the options available to plant biotechnologists to bring about crop improvement using CRISPR/Cas9 based genome editing tools and also presents studies where CRISPR/Cas9 has been used for enhancing biotic and abiotic stress tolerance. Application of these techniques will result in the development of non-genetically modified (Non-GMO) crops with the desired trait that can contribute to increased yield potential under biotic and abiotic stress conditions.

Journal ArticleDOI
Jisen Zhang1, Xingtan Zhang2, Haibao Tang2, Qing Zhang2, Xiuting Hua2, Xiaokai Ma2, Fan Zhu2, Tyler Jones, Xin-Guang Zhu3, John E. Bowers4, Ching Man Wai5, Chunfang Zheng6, Yan Shi2, Shuai Chen2, Xiuming Xu2, Jingjing Yue2, David R. Nelson7, Lixian Huang2, Zhen Li2, Huimin Xu2, Dong Zhou2, Yongjun Wang2, Weichang Hu2, Jishan Lin2, Youjin Deng2, Neha Pandey2, Melina Cristina Mancini2, Dessireé Zerpa2, Julie K. Nguyen2, Liming Wang2, Liang Yu2, Yinghui Xin2, Liangfa Ge2, Jie Arro2, Jennifer Han2, Setu Chakrabarty2, Marija Pushko2, Wenping Zhang2, Yanhong Ma2, Panpan Ma2, Mingju Lv3, Faming Chen8, Guangyong Zheng8, Jingsheng Xu2, Zhenhui Yang2, Fang Deng2, Xuequn Chen2, Zhenyang Liao2, Xunxiao Zhang2, Zhicong Lin2, Hai Lin2, Hansong Yan2, Zheng Kuang2, Weimin Zhong2, Pingping Liang2, Guofeng Wang2, Yuan Yuan2, Jiaxian Shi2, Jinxiang Hou2, Jingxian Lin2, Jingjing Jin, Peijian Cao, Qiaochu Shen2, Qing Jiang2, Ping Zhou2, Yaying Ma2, Xiaodan Zhang2, Rongrong Xu2, Juan Liu2, Yongmei Zhou2, Haifeng Jia2, Qing Ma2, Rui Qi2, Zhiliang Zhang2, Jingping Fang2, Hongkun Fang2, Jinjin Song2, Mengjuan Wang2, Guangrui Dong2, Gang Wang2, Zheng Chen2, Teng Ma2, Hong Liu2, Singha R. Dhungana9, Sarah E. Huss2, Xiping Yang10, Anupma Sharma11, Jhon H. Trujillo, Maria C. Martinez, Matthew E. Hudson2, John J. Riascos, Mary A. Schuler2, Li Qing Chen2, David M. Braun9, Lei Li2, Qingyi Yu11, Jianping Wang1, Jianping Wang10, Kai Wang2, Michael C. Schatz12, David Heckerman13, Marie-Anne Van Sluys14, Glaucia Mendes Souza14, Paul H. Moore, David Sankoff6, Robert VanBuren5, Andrew H. Paterson4, Chifumi Nagai, Ray Ming2, Ray Ming1 
TL;DR: Sequencing of a haploid S. spontaneum, AP85-441, facilitated the assembly of 32 pseudo-chromosomes comprising 8 homologous groups of 4 members each, bearing 35,525 genes with alleles defined.
Abstract: Modern sugarcanes are polyploid interspecific hybrids, combining high sugar content from Saccharum officinarum with hardiness, disease resistance and ratooning of Saccharum spontaneum. Sequencing of a haploid S. spontaneum, AP85-441, facilitated the assembly of 32 pseudo-chromosomes comprising 8 homologous groups of 4 members each, bearing 35,525 genes with alleles defined. The reduction of basic chromosome number from 10 to 8 in S. spontaneum was caused by fissions of 2 ancestral chromosomes followed by translocations to 4 chromosomes. Surprisingly, 80% of nucleotide binding site-encoding genes associated with disease resistance are located in 4 rearranged chromosomes and 51% of those in rearranged regions. Resequencing of 64 S. spontaneum genomes identified balancing selection in rearranged regions, maintaining their diversity. Introgressed S. spontaneum chromosomes in modern sugarcanes are randomly distributed in the AP85-441 genome, indicating random recombination among homologs in different S. spontaneum accessions. The allele-defined Saccharum genome offers new knowledge and resources to accelerate sugarcane improvement.

Journal ArticleDOI
TL;DR: This work sequenced 484 genomes of bacterial isolates from roots of Brassicaceae, poplar, and maize and validated candidates from two sets of plant-associated genes, one involved in plant colonization and the other serving in microbe-microbe competition between plant-associated bacteria.
Abstract: Plants intimately associate with diverse bacteria. Plant-associated bacteria have ostensibly evolved genes that enable them to adapt to plant environments. However, the identities of such genes are mostly unknown, and their functions are poorly characterized. We sequenced 484 genomes of bacterial isolates from roots of Brassicaceae, poplar, and maize. We then compared 3,837 bacterial genomes to identify thousands of plant-associated gene clusters. Genomes of plant-associated bacteria encode more carbohydrate metabolism functions and fewer mobile elements than related non-plant-associated genomes do. We experimentally validated candidates from two sets of plant-associated genes: one involved in plant colonization, and the other serving in microbe-microbe competition between plant-associated bacteria. We also identified 64 plant-associated protein domains that potentially mimic plant domains; some are shared with plant-associated fungi and oomycetes. This work expands the genome-based understanding of plant-microbe interactions and provides potential leads for efficient and sustainable agriculture through microbiome engineering.

Journal ArticleDOI
24 Jan 2018-Nature
TL;DR: The sequencing and assembly of the 32-gigabase-pair axolotl genome is reported using an approach that combined long-read sequencing, optical mapping and development of a new genome assembler (MARVEL).
Abstract: Salamanders serve as important tetrapod models for developmental, regeneration and evolutionary studies. An extensive molecular toolkit makes the Mexican axolotl (Ambystoma mexicanum) a key representative salamander for molecular investigations. Here we report the sequencing and assembly of the 32-gigabase-pair axolotl genome using an approach that combined long-read sequencing, optical mapping and development of a new genome assembler (MARVEL). We observed a size expansion of introns and intergenic regions, largely attributable to multiplication of long terminal repeat retroelements. We provide evidence that intron size in developmental genes is under constraint and that species-restricted genes may contribute to limb regeneration. The axolotl genome assembly does not contain the essential developmental gene Pax3. However, mutation of the axolotl Pax3 paralogue Pax7 resulted in an axolotl phenotype that was similar to those seen in Pax3-/- and Pax7-/- mutant mice. The axolotl genome provides a rich biological resource for developmental and evolutionary studies.

Journal ArticleDOI
TL;DR: Findings show that evolutionary events based on horizontal gene transfer occur within an ongoing CDI and contribute to the adaptation of the species by the introduction of new genes into the genomes.
Abstract: Clostridioides difficile infections (CDI) have emerged over the past decade causing symptoms that range from mild, antibiotic-associated diarrhea (AAD) to life-threatening toxic megacolon. In this study, we describe a multiple and isochronal (mixed) CDI caused by the isolates DSM 27638, DSM 27639 and DSM 27640, which showed different morphotypes on solid media from the outset. The three isolates, belonging to the ribotypes (RT) 012 (DSM 27639) and 027 (DSM 27638 and DSM 27640), were phenotypically characterized and high-quality closed genome sequences were generated. The genomes were compared with seven reference strains including three strains of the RT 027, two of the RT 017, and one of the RT 078 as well as a multi-resistant RT 012 strain. The analysis of horizontal gene transfer events revealed gene acquisition incidents that sort the strains within the timeline of the spread of their RTs within Germany. We also show that horizontal gene transfer between the members of different RTs occurred within this multiple infection. In addition, acquisition and exchange of virulence-related features including antibiotic resistance genes were observed. Analysis of the two genomes assigned to RT 027 revealed three single nucleotide polymorphisms (SNPs) and apparently a regional genome modification within the flagellar switch that regulates the fli operon. Our findings show that (i) evolutionary events based on horizontal gene transfer occur within an ongoing CDI and contribute to the adaptation of the species by the introduction of new genes into the genomes, and (ii) within a multiple infection of a single patient, the exchange of genetic material was responsible for much higher genome variation than the observed SNPs.

Journal ArticleDOI
TL;DR: Recently, CRISPR/Cas9 has largely overtaken the other genome editing technologies due to the fact that it is easier to design and implement, has a higher success rate, and is more versatile and less expensive.
Abstract: Genome editing technologies have progressed rapidly and become one of the most important genetic tools in the implementation of pathogen resistance in plants. Recent years have witnessed the emergence of site-directed modification methods using meganucleases, zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), and clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated protein 9 (Cas9). Recently, CRISPR/Cas9 has largely overtaken the other genome editing technologies because it is easier to design and implement, has a higher success rate, and is more versatile and less expensive. This review focuses on the recent advances in plant protection using CRISPR/Cas9 technology in model plants and crops in response to viral, fungal and bacterial diseases. For viral disease resistance, the main strategies employed in model species such as Arabidopsis and Nicotiana benthamiana, which include the integration of CRISPR-encoding sequences that target and interfere with the viral genome and the induction of a CRISPR-mediated targeted mutation in the host plant genome, will be discussed. Furthermore, for fungal and bacterial disease resistance, the strategies based on CRISPR/Cas9 targeted modification of susceptibility genes in crop species such as rice, tomato, wheat, and citrus will be reviewed. After spending years deciphering and reading genomes, researchers are now editing and rewriting them to develop crop plants resistant to specific pests and pathogens.

Journal ArticleDOI
TL;DR: The Cancer Genome Interpreter is presented, a versatile platform that automates the interpretation of newly sequenced cancer genomes, annotating the potential of alterations detected in tumors to act as drivers and their possible effect on treatment response.
Abstract: While tumor genome sequencing has become widely available in clinical and research settings, the interpretation of tumor somatic variants remains an important bottleneck. Here we present the Cancer Genome Interpreter, a versatile platform that automates the interpretation of newly sequenced cancer genomes, annotating the potential of alterations detected in tumors to act as drivers and their possible effect on treatment response. The results are organized in different levels of evidence according to current knowledge, which we envision can support a broad range of oncology use cases. The resource is publicly available at http://www.cancergenomeinterpreter.org.

Journal ArticleDOI
TL;DR: The generation and characterization of an antibody fragment (iMab) is reported that recognizes i-motif structures with high selectivity and affinity, enabling the detection of i-motifs in the nuclei of human cells and providing evidence that i-motif structures are formed in regulatory regions of the human genome, including promoters and telomeric regions.
Abstract: Human genome function is underpinned by the primary storage of genetic information in canonical B-form DNA, with a second layer of DNA structure providing regulatory control. I-motif structures are thought to form in cytosine-rich regions of the genome and to have regulatory functions; however, in vivo evidence for the existence of such structures has so far remained elusive. Here we report the generation and characterization of an antibody fragment (iMab) that recognizes i-motif structures with high selectivity and affinity, enabling the detection of i-motifs in the nuclei of human cells. We demonstrate that the in vivo formation of such structures is cell-cycle and pH dependent. Furthermore, we provide evidence that i-motif structures are formed in regulatory regions of the human genome, including promoters and telomeric regions. Our results support the notion that i-motif structures provide key regulatory roles in the genome.

Journal ArticleDOI
TL;DR: The Microbial Genomes Atlas (MiGA) is a webserver that allows the classification of an unknown query genomic sequence, complete or partial, against all taxonomically classified taxa with available genome sequences, as well as comparisons to other related genomes including uncultivated ones, based on the genome-aggregate Average Nucleotide and Amino Acid Identity concepts.
Abstract: The small subunit ribosomal RNA gene (16S rRNA) has been successfully used to catalogue and study the diversity of prokaryotic species and communities but it offers limited resolution at the species and finer levels, and cannot represent the whole-genome diversity and fluidity. To overcome these limitations, we introduced the Microbial Genomes Atlas (MiGA), a webserver that allows the classification of an unknown query genomic sequence, complete or partial, against all taxonomically classified taxa with available genome sequences, as well as comparisons to other related genomes including uncultivated ones, based on the genome-aggregate Average Nucleotide and Amino Acid Identity (ANI/AAI) concepts. MiGA integrates best practices in sequence quality trimming and assembly and allows input to be raw reads or assemblies from isolate genomes, single-cell sequences, and metagenome-assembled genomes (MAGs). Further, MiGA can take as input hundreds of closely related genomes of the same or closely related species (a so-called ‘Clade Project’) to assess their gene content diversity and evolutionary relationships, and calculate important clade properties such as the pangenome and core gene sets. Therefore, MiGA is expected to facilitate a range of genome-based taxonomic and diversity studies, and quality assessment across environmental and clinical settings. MiGA is available at http://microbial-genomes.org/.

Journal ArticleDOI
TL;DR: Next-generation sequencing technologies have enabled the comparison of editomes from multiple individuals and from multiple species and the results have changed the understanding of the extent and distribution of A-to-I editing and its role in evolution and disease.
Abstract: Modifications of RNA affect its function and stability. RNA editing is unique among these modifications because it not only alters the cellular fate of RNA molecules but also alters their sequence relative to the genome. The most common type of RNA editing is A-to-I editing by double-stranded RNA-specific adenosine deaminase (ADAR) enzymes. Recent transcriptomic studies have identified a number of 'recoding' sites at which A-to-I editing results in non-synonymous substitutions in protein-coding sequences. Many of these recoding sites are conserved within (but not usually across) lineages, are under positive selection and have functional and evolutionary importance. However, systematic mapping of the editome across the animal kingdom has revealed that most A-to-I editing sites are located within mobile elements in non-coding parts of the genome. Editing of these non-coding sites is thought to have a critical role in protecting against activation of innate immunity by self-transcripts. Both recoding and non-coding events have implications for genome evolution and, when deregulated, may lead to disease. Finally, ADARs are now being adapted for RNA engineering purposes.