Showing papers in "Genome Biology in 2017"


Journal ArticleDOI
TL;DR: This work presents xCell, a novel gene signature-based method that infers 64 immune and stromal cell types, and shows that it outperforms other methods.
Abstract: Tissues are complex milieus consisting of numerous cell types. Several recent methods have attempted to enumerate cell subsets from transcriptomes. However, the available methods have used limited sources for training and give only a partial portrayal of the full cellular landscape. Here we present xCell, a novel gene signature-based method, and use it to infer 64 immune and stromal cell types. We harmonized 1822 pure human cell type transcriptomes from various sources and employed a curve fitting approach for linear comparison of cell types and introduced a novel spillover compensation technique for separating them. Using extensive in silico analyses and comparison to cytometry immunophenotyping, we show that xCell outperforms other methods. xCell is available at http://xCell.ucsf.edu/ .

2,040 citations


Journal ArticleDOI
TL;DR: This review provides an overview of omics technologies and of methods for integrating them across multiple omics layers, arguing that multi-omics, compared with studies of a single omics type, offers the opportunity to understand the flow of information that underlies disease.
Abstract: High-throughput technologies have revolutionized medical research. The advent of genotyping arrays enabled large-scale genome-wide association studies and methods for examining global transcript levels, which gave rise to the field of "integrative genetics". Other omics technologies, such as proteomics and metabolomics, are now often incorporated into the everyday methodology of biological researchers. In this review, we provide an overview of such omics technologies and focus on methods for their integration across multiple omics layers. As compared to studies of a single omics type, multi-omics offers the opportunity to understand the flow of information that underlies disease.

1,307 citations


Journal ArticleDOI
TL;DR: This review discusses the functional interactions that lncRNAs establish with other molecules and the relationship between lncRNA transcription and function, noting that some of these mechanisms are specific to lncRNAs while others might be shared with other types of genes.
Abstract: A major shift in our understanding of genome regulation has emerged recently. It is now apparent that the majority of cellular transcripts do not code for proteins, and many of them are long noncoding RNAs (lncRNAs). Increasingly, studies suggest that lncRNAs regulate gene expression through diverse mechanisms. We review emerging mechanistic views of lncRNAs in gene regulation in the cell nucleus. We discuss the functional interactions that lncRNAs establish with other molecules as well as the relationship between lncRNA transcription and function. While some of these mechanisms are specific to lncRNAs, others might be shared with other types of genes.

734 citations


Journal ArticleDOI
TL;DR: The Splatter Bioconductor package is presented for simple, reproducible, and well-documented simulation of scRNA-seq data. It provides an interface to multiple simulation methods, including Splat, the authors' own simulation, which is based on a gamma-Poisson distribution.
Abstract: As single-cell RNA sequencing (scRNA-seq) technologies have rapidly developed, so have analysis methods. Many methods have been tested, developed, and validated using simulated datasets. Unfortunately, current simulations are often poorly documented, their similarity to real data is not demonstrated, or reproducible code is not available. Here, we present the Splatter Bioconductor package for simple, reproducible, and well-documented simulation of scRNA-seq data. Splatter provides an interface to multiple simulation methods including Splat, our own simulation, based on a gamma-Poisson distribution. Splat can simulate single populations of cells, populations with multiple cell types, or differentiation paths.

568 citations
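
The gamma-Poisson idea named in the abstract above can be illustrated with a minimal sketch. This is not Splatter's interface (Splatter is an R/Bioconductor package) and it omits most of what a realistic simulation would model. The function names `sample_poisson` and `simulate_counts`, and the `shape` and `rate` defaults, are hypothetical choices made only for illustration.

```python
import math
import random


def sample_poisson(rng, lam):
    """Draw one Poisson(lam) value using Knuth's multiplication method."""
    # Adequate for the small-to-moderate means used in this sketch.
    threshold = math.exp(-lam)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_counts(n_genes, n_cells, shape=0.6, rate=0.3, seed=0):
    """Return a genes x cells count matrix from a gamma-Poisson hierarchy.

    shape and rate are illustrative values, not estimates from real data.
    """
    rng = random.Random(seed)
    # Each gene gets a mean expression level drawn from a gamma distribution.
    # random.gammavariate takes (alpha, beta), where beta is the scale (1 / rate).
    gene_means = [rng.gammavariate(shape, 1.0 / rate) for _ in range(n_genes)]
    # Each observed count is Poisson-distributed around its gene's mean.
    return [[sample_poisson(rng, mean) for _ in range(n_cells)]
            for mean in gene_means]


counts = simulate_counts(n_genes=5, n_cells=4)
for row in counts:
    print(row)
```

Fixing the seed makes the toy matrix reproducible, which mirrors the abstract's emphasis on reproducible simulation.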


Journal ArticleDOI
TL;DR: It is shown that the generalist aphid pest M. persicae is able to colonise diverse host plant species in the absence of genetic specialisation through rapid transcriptional plasticity of genes that have duplicated during aphid evolution.
Abstract: The prevailing paradigm of host-parasite evolution is that arms races lead to increasing specialisation via genetic adaptation. Insect herbivores are no exception and the majority have evolved to colonise a small number of closely related host species. Remarkably, the green peach aphid, Myzus persicae, colonises plant species across 40 families and single M. persicae clonal lineages can colonise distantly related plants. This remarkable ability makes M. persicae a highly destructive pest of many important crop species. To investigate the exceptional phenotypic plasticity of M. persicae, we sequenced the M. persicae genome and assessed how one clonal lineage responds to host plant species of different families. We show that genetically identical individuals are able to colonise distantly related host species through the differential regulation of genes belonging to aphid-expanded gene families. Multigene clusters collectively upregulate in single aphids within two days upon host switch. Furthermore, we demonstrate the functional significance of this rapid transcriptional change using RNA interference (RNAi)-mediated knock-down of genes belonging to the cathepsin B gene family. Knock-down of cathepsin B genes reduced aphid fitness, but only on the host that induced upregulation of these genes. Previous research has focused on the role of genetic adaptation of parasites to their hosts. Here we show that the generalist aphid pest M. persicae is able to colonise diverse host plant species in the absence of genetic specialisation. This is achieved through rapid transcriptional plasticity of genes that have duplicated during aphid evolution.

538 citations


Journal ArticleDOI
TL;DR: In this review, both the involvement of chromatin in stress responses and the current evidence on somatic, intergenerational, and transgenerational stress memory are discussed.
Abstract: Plants frequently have to weather both biotic and abiotic stressors, and have evolved sophisticated adaptation and defense mechanisms. In recent years, chromatin modifications, nucleosome positioning, and DNA methylation have been recognized as important components in these adaptations. Given their potential epigenetic nature, such modifications may provide a mechanistic basis for a stress memory, enabling plants to respond more efficiently to recurring stress or even to prepare their offspring for potential future assaults. In this review, we discuss both the involvement of chromatin in stress responses and the current evidence on somatic, intergenerational, and transgenerational stress memory.

443 citations


Journal ArticleDOI
TL;DR: It is concluded that blood-derived TAMs significantly infiltrate pre-treatment gliomas, to a degree that varies by glioma subtype and tumor compartment, and a novel signature that distinguishes TAMs by ontogeny in human gliomas is presented.
Abstract: Tumor-associated macrophages (TAMs) are abundant in gliomas and immunosuppressive TAMs are a barrier to emerging immunotherapies. It is unknown to what extent macrophages derived from peripheral blood adopt the phenotype of brain-resident microglia in pre-treatment gliomas. The relative proportions of blood-derived macrophages and microglia have been poorly quantified in clinical samples due to a paucity of markers that distinguish these cell types in malignant tissue. We perform single-cell RNA-sequencing of human gliomas and identify phenotypic differences in TAMs of distinct lineages. We isolate TAMs from patient biopsies and compare them with macrophages from non-malignant human tissue, glioma atlases, and murine glioma models. We present a novel signature that distinguishes TAMs by ontogeny in human gliomas. Blood-derived TAMs upregulate immunosuppressive cytokines and show an altered metabolism compared to microglial TAMs. They are also enriched in perivascular and necrotic regions. The gene signature of blood-derived TAMs, but not microglial TAMs, correlates with significantly inferior survival in low-grade glioma. Surprisingly, TAMs frequently co-express canonical pro-inflammatory (M1) and alternatively activated (M2) genes in individual cells. We conclude that blood-derived TAMs significantly infiltrate pre-treatment gliomas, to a degree that varies by glioma subtype and tumor compartment. Blood-derived TAMs do not universally conform to the phenotype of microglia, but preferentially express immunosuppressive cytokines and show an altered metabolism. Our results argue against status quo therapeutic strategies that target TAMs indiscriminately and in favor of strategies that specifically target immunosuppressive blood-derived TAMs.

425 citations


Journal ArticleDOI
TL;DR: This work introduces CIDR (Clustering through Imputation and Dimensionality Reduction), an ultrafast algorithm that uses a novel yet very simple implicit imputation approach to alleviate the impact of dropouts in scRNA-seq data in a principled manner.
Abstract: Most existing dimensionality reduction and clustering packages for single-cell RNA-seq (scRNA-seq) data deal with dropouts by heavy modeling and computational machinery. Here, we introduce CIDR (Clustering through Imputation and Dimensionality Reduction), an ultrafast algorithm that uses a novel yet very simple implicit imputation approach to alleviate the impact of dropouts in scRNA-seq data in a principled manner. Using a range of simulated and real data, we show that CIDR improves the standard principal component analysis and outperforms the state-of-the-art methods, namely t-SNE, ZIFA, and RaceID, in terms of clustering accuracy. CIDR typically completes within seconds when processing a data set of hundreds of cells and minutes for a data set of thousands of cells. CIDR can be downloaded at https://github.com/VCCRI/CIDR .

397 citations


Journal ArticleDOI
TL;DR: DeepCpG, a computational approach based on deep neural networks to predict methylation states in single cells, yields substantially more accurate predictions than previous methods and can be interpreted, thereby providing insights into how sequence composition affects methylation variability.
Abstract: Recent technological advances have enabled DNA methylation to be assayed at single-cell resolution. However, current protocols are limited by incomplete CpG coverage and hence methods to predict missing methylation states are critical to enable genome-wide analyses. We report DeepCpG, a computational approach based on deep neural networks to predict methylation states in single cells. We evaluate DeepCpG on single-cell methylation data from five cell types generated using alternative sequencing protocols. DeepCpG yields substantially more accurate predictions than previous methods. Additionally, we show that the model parameters can be interpreted, thereby providing insights into how sequence composition affects methylation variability.

388 citations


Journal ArticleDOI
TL;DR: Advances in genetic screens, imputation, and analyses of non-additive and epistatic effects have contributed to a better understanding of the shared and specific roles of MHC variants in different diseases.
Abstract: In the past 50 years, variants in the major histocompatibility complex (MHC) locus, also known as the human leukocyte antigen (HLA), have been reported as major risk factors for complex diseases. Recent advances, including large genetic screens, imputation, and analyses of non-additive and epistatic effects, have contributed to a better understanding of the shared and specific roles of MHC variants in different diseases. We review these advances and discuss the relationships between MHC variants involved in autoimmune and infectious diseases. Further work in this area will help to distinguish between alternative hypotheses for the role of pathogens in autoimmune disease development.

379 citations


Journal ArticleDOI
TL;DR: This study provides insights into the genetic correlation among complex traits and will facilitate future soybean functional studies and breeding through molecular design.
Abstract: Soybean (Glycine max [L.] Merr.) is one of the most important oil and protein crops. Ever-increasing soybean consumption necessitates the improvement of varieties for more efficient production. However, both correlations among different traits and genetic interactions among genes that affect a single trait pose a challenge to soybean breeding. To understand the genetic networks underlying phenotypic correlations, we collected 809 soybean accessions worldwide and phenotyped them for two years at three locations for 84 agronomic traits. Genome-wide association studies identified 245 significant genetic loci, among which 95 genetically interacted with other loci. We determined that 14 oil synthesis-related genes are responsible for fatty acid accumulation in soybean and function in line with an additive model. Network analyses demonstrated that 51 traits could be linked through the linkage disequilibrium of 115 associated loci and these links reflect phenotypic correlations. We revealed that 23 loci, including the known Dt1, E2, E1, Ln, Dt2, Fan, and Fap loci, as well as 16 undefined associated loci, have pleiotropic effects on different traits. This study provides insights into the genetic correlation among complex traits and will facilitate future soybean functional studies and breeding through molecular design.

Journal ArticleDOI
TL;DR: This work provides a guide to the currently available alignment-free sequence analysis tools and addresses how these methods work, how they compare to alignment-based methods, and what their potential is for researchers' own work.
Abstract: Alignment-free sequence analyses have been applied to problems ranging from whole-genome phylogeny to the classification of protein families, identification of horizontally transferred genes, and detection of recombined sequences. The strength of these methods makes them particularly useful for next-generation sequencing data processing and analysis. However, many researchers are unclear about how these methods work, how they compare to alignment-based methods, and what their potential is for use for their research. We address these questions and provide a guide to the currently available alignment-free sequence analysis tools.

Journal ArticleDOI
TL;DR: It is shown that a double cut HDR donor, which is flanked by single guide RNA (sgRNA)-PAM sequences and is released after CRISPR/Cas9 cleavage, increases HDR efficiency by twofold to fivefold relative to circular plasmid donors.
Abstract: Precise genome editing via homology-directed repair (HDR) after double-stranded DNA (dsDNA) cleavage facilitates functional genomic research and holds promise for gene therapy. However, HDR efficiency remains low in some cell types, including some of great research and clinical interest, such as human induced pluripotent stem cells (iPSCs). Here, we show that a double cut HDR donor, which is flanked by single guide RNA (sgRNA)-PAM sequences and is released after CRISPR/Cas9 cleavage, increases HDR efficiency by twofold to fivefold relative to circular plasmid donors at one genomic locus in 293T cells and two distinct genomic loci in iPSCs. We find that a 600 bp homology in both arms leads to high-level genome knockin, with 97–100% of the donor insertion events being mediated by HDR. The combined use of CCND1, a cyclin that functions in G1/S transition, and nocodazole, a G2/M phase synchronizer, doubles HDR efficiency to up to 30% in iPSCs. Taken together, these findings provide guidance for the design of HDR donor vectors and the selection of HDR-enhancing factors for applications in genome research and precision medicine.

Journal ArticleDOI
TL;DR: Easi-CRISPR solves the major problem of animal genome engineering, namely the inefficiency of targeted DNA cassette insertion, as treating an average of only 50 zygotes is sufficient to produce a correctly targeted allele in up to 100% of live offspring.
Abstract: Conditional knockout mice and transgenic mice expressing recombinases, reporters, and inducible transcriptional activators are key for many genetic studies and comprise over 90% of mouse models created. Conditional knockout mice are generated using labor-intensive methods of homologous recombination in embryonic stem cells and are available for only ~25% of all mouse genes. Transgenic mice generated by random genomic insertion approaches pose problems of unreliable expression, and thus there is a need for targeted-insertion models. Although CRISPR-based strategies were reported to create conditional and targeted-insertion alleles via one-step delivery of targeting components directly to zygotes, these strategies are quite inefficient. Here we describe Easi-CRISPR (Efficient additions with ssDNA inserts-CRISPR), a targeting strategy in which long single-stranded DNA donors are injected with pre-assembled crRNA + tracrRNA + Cas9 ribonucleoprotein (ctRNP) complexes into mouse zygotes. We show for over a dozen loci that Easi-CRISPR generates correctly targeted conditional and insertion alleles in 8.5–100% of the resulting live offspring. Easi-CRISPR solves the major problem of animal genome engineering, namely the inefficiency of targeted DNA cassette insertion. The approach is robust, succeeding for all tested loci. It is versatile, generating both conditional and targeted insertion alleles. Finally, it is highly efficient, as treating an average of only 50 zygotes is sufficient to produce a correctly targeted allele in up to 100% of live offspring. Thus, Easi-CRISPR offers a comprehensive means of building large-scale Cre-LoxP animal resources.

Journal ArticleDOI
Ronald P. de Vries1, Robert Riley2, Ad Wiebenga1, Guillermo Aguilar-Osorio3, Sotiris Amillis4, Cristiane Uchima, Gregor Anderluh, Mojtaba Asadollahi5, Marion Askin6, Marion Askin7, Kerrie Barry2, Evy Battaglia1, Özgür Bayram8, Özgür Bayram9, Tiziano Benocci1, Susanna A. Braus-Stromeyer8, Camila Caldana, David Cánovas10, David Cánovas11, Gustavo C. Cerqueira12, Fusheng Chen13, Wanping Chen13, Cindy Choi2, Alicia Clum2, Renato Augusto Corrêa dos Santos, André Damasio14, George Diallinas4, Tamás Emri5, Erzsébet Fekete5, Michel Flipphi5, Susanne Freyberg8, Antonia Gallo15, Christos Gournas16, Rob Habgood17, Matthieu Hainaut18, María Harispe19, Bernard Henrissat18, Bernard Henrissat20, Bernard Henrissat21, Kristiina Hildén22, Ryan Hope17, Abeer Hossain23, Eugenia Karabika24, Eugenia Karabika25, Levente Karaffa5, Zsolt Karányi5, Nada Kraševec, Alan Kuo2, Harald Kusch8, Kurt LaButti2, Ellen Lagendijk6, Alla Lapidus26, Alla Lapidus2, Anthony Levasseur18, Erika Lindquist2, Anna Lipzen2, Antonio F. Logrieco15, Andrew MacCabe27, Miia R. Mäkelä22, Iran Malavazi28, Petter Melin29, Vera Meyer30, Natalia Mielnichuk10, Natalia Mielnichuk31, Márton Miskei5, Ákos Molnár5, Giuseppina Mulè15, Chew Yee Ngan2, Margarita Orejas27, Erzsébet Orosz5, Erzsébet Orosz1, Jean Paul Ouedraogo6, Jean Paul Ouedraogo32, Karin M. Overkamp, Hee-Soo Park33, Giancarlo Perrone15, François Piumi21, François Piumi18, Peter J. Punt6, Arthur F. J. Ram6, Ana Ramón34, Stefan Rauscher35, Eric Record18, Diego Mauricio Riaño-Pachón, Vincent Robert1, Julian Röhrig35, Roberto Ruller, Asaf Salamov2, Nadhira Salih17, Nadhira Salih36, Rob Samson1, Erzsébet Sándor5, Manuel Sanguinetti34, Tabea Schütze6, Tabea Schütze30, Kristina Sepčić37, Ekaterina Shelest38, Gavin Sherlock39, Vicky Sophianopoulou, Fabio M. Squina, Hui Sun2, Antonia Susca15, Richard B. Todd40, Adrian Tsang32, Shiela E. Unkles24, Nathalie van de Wiele1, Diana van Rossen-Uffink6, Juliana Velasco de Castro Oliveira, Tammi Camilla Vesth41, Jaap Visser1, Jae-Hyuk Yu42, Miaomiao Zhou1, Mikael Rørdam Andersen41, David B. Archer17, Scott E. Baker43, Isabelle Benoit32, Isabelle Benoit1, Axel A. Brakhage44, Gerhard H. Braus8, Reinhard Fischer35, Jens Christian Frisvad41, Gustavo H. Goldman45, Jos Houbraken1, Berl R. Oakley46, István Pócsi5, Claudio Scazzocchio47, Claudio Scazzocchio48, Bernhard Seiboth49, Patricia A. vanKuyk6, Patricia A. vanKuyk1, Jennifer R. Wortman12, Paul S. Dyer17, Igor V. Grigoriev2
Utrecht University1, United States Department of Energy2, National Autonomous University of Mexico3, National and Kapodistrian University of Athens4, University of Debrecen5, Leiden University6, Commonwealth Scientific and Industrial Research Organisation7, University of Göttingen8, Maynooth University9, University of Seville10, University of Natural Resources and Life Sciences, Vienna11, Broad Institute12, Huazhong Agricultural University13, State University of Campinas14, International Sleep Products Association15, Université libre de Bruxelles16, University of Nottingham17, Aix-Marseille University18, Pasteur Institute19, King Abdulaziz University20, Institut national de la recherche agronomique21, University of Helsinki22, University of Amsterdam23, University of St Andrews24, University of Ioannina25, Saint Petersburg State University26, Spanish National Research Council27, Federal University of São Carlos28, Swedish University of Agricultural Sciences29, Technical University of Berlin30, National Scientific and Technical Research Council31, Concordia University32, Kyungpook National University33, University of the Republic34, Karlsruhe Institute of Technology35, University of Sulaymaniyah36, University of Ljubljana37, Leibniz Association38, Stanford University39, Kansas State University40, Technical University of Denmark41, University of Wisconsin-Madison42, Pacific Northwest National Laboratory43, University of Jena44, University of São Paulo45, University of Kansas46, Imperial College London47, Université Paris-Saclay48, Vienna University of Technology49
TL;DR: This comparative genomics and experimental study provides the first genus-wide view of the biological diversity of the aspergilli and, in many but not all cases, links genome differences to phenotype.
Abstract: Background: The fungal genus Aspergillus is of critical importance to humankind. Species include those with industrial applications, important pathogens of humans, animals and crops, a source of potent carcinogenic contaminants of food, and an important genetic model. The genome sequences of eight aspergilli have already been explored to investigate aspects of fungal biology, raising questions about evolution and specialization within this genus. Results: We have generated genome sequences for ten novel, highly diverse Aspergillus species and compared these in detail to sister and more distant genera. Comparative studies of key aspects of fungal biology, including primary and secondary metabolism, stress response, biomass degradation, and signal transduction, revealed both conservation and diversity among the species. Observed genomic differences were validated with experimental studies. This revealed several highlights, such as the potential for sex in asexual species, organic acid production genes being a key feature of black aspergilli, alternative approaches for degrading plant biomass, and indications for the genetic basis of stress response. A genome-wide phylogenetic analysis demonstrated in detail the relationship of the newly genome sequenced species with other aspergilli. Conclusions: Many aspects of biological differences between fungal species cannot be explained by current knowledge obtained from genome sequences. The comparative genomics and experimental study, presented here, allows for the first time a genus-wide view of the biological diversity of the aspergilli and in many, but not all, cases linked genome differences to phenotype. Insights gained could be exploited for biotechnological and medical applications of fungi.

Journal ArticleDOI
TL;DR: This study presents the first comprehensive picture of cytosine methylation in the epitranscriptome of pluripotent and differentiated stages in the mouse and analyses potential correlations between m5C and micro RNA target sites, binding sites of RNA binding proteins, and N6-methyladenosine.
Abstract: Recent work has identified and mapped a range of posttranscriptional modifications in mRNA, including methylation of the N6 and N1 positions in adenine, pseudouridylation, and methylation of carbon 5 in cytosine (m5C). However, knowledge about the prevalence and transcriptome-wide distribution of m5C is still extremely limited; thus, studies in different cell types, tissues, and organisms are needed to gain insight into possible functions of this modification and implications for other regulatory processes. We have carried out an unbiased global analysis of m5C in total and nuclear poly(A) RNA of mouse embryonic stem cells and murine brain. We show that there are intriguing differences in these samples and cell compartments with respect to the degree of methylation, functional classification of methylated transcripts, and position bias within the transcript. Specifically, we observe a pronounced accumulation of m5C sites in the vicinity of the translational start codon, depletion in coding sequences, and mixed patterns of enrichment in the 3′ UTR. Degree and pattern of methylation distinguish transcripts modified in both embryonic stem cells and brain from those methylated in either one of the samples. We also analyze potential correlations between m5C and micro RNA target sites, binding sites of RNA binding proteins, and N6-methyladenosine. Our study presents the first comprehensive picture of cytosine methylation in the epitranscriptome of pluripotent and differentiated stages in the mouse. These data provide an invaluable resource for future studies of function and biological significance of m5C in mRNA in mammals.

Journal ArticleDOI
TL;DR: An epigenetic predictor of age in mice is identified and characterised, which will be instrumental for understanding the biology of ageing and will allow modulation of its ticking rate and resetting the clock in vivo to study the impact on biological age.
Abstract: DNA methylation changes at a discrete set of sites in the human genome are predictive of chronological and biological age. However, it is not known whether these changes are causative or a consequence of an underlying ageing process. It has also not been shown whether this epigenetic clock is unique to humans or conserved in the more experimentally tractable mouse. We have generated a comprehensive set of genome-scale base-resolution methylation maps from multiple mouse tissues spanning a wide range of ages. Many CpG sites show significant tissue-independent correlations with age which allowed us to develop a multi-tissue predictor of age in the mouse. Our model, which estimates age based on DNA methylation at 329 unique CpG sites, has a median absolute error of 3.33 weeks and has similar properties to the recently described human epigenetic clock. Using publicly available datasets, we find that the mouse clock is accurate enough to measure effects on biological age, including in the context of interventions. While females and males show no significant differences in predicted DNA methylation age, ovariectomy results in significant age acceleration in females. Furthermore, we identify significant differences in age-acceleration dependent on the lipid content of the diet. Here we identify and characterise an epigenetic predictor of age in mice, the mouse epigenetic clock. This clock will be instrumental for understanding the biology of ageing and will allow modulation of its ticking rate and resetting the clock in vivo to study the impact on biological age.
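
The accuracy figure quoted in the abstract above is a median absolute error, which can be computed as in the following sketch. The helper names and the toy ages are hypothetical and are not data from the study. `age_acceleration` is defined here as the simple difference between predicted and chronological age, which is an assumption made for illustration and may differ from the definition the authors used.

```python
from statistics import median


def median_absolute_error(true_ages, predicted_ages):
    """Median of |true - predicted| across samples (same units as the ages)."""
    if len(true_ages) != len(predicted_ages):
        raise ValueError("true_ages and predicted_ages must have equal length")
    return median(abs(t - p) for t, p in zip(true_ages, predicted_ages))


def age_acceleration(true_ages, predicted_ages):
    """Predicted minus chronological age per sample (assumed definition)."""
    return [p - t for t, p in zip(true_ages, predicted_ages)]


# Toy ages in weeks, invented for illustration only.
chronological = [10, 20, 30, 40]
predicted = [12, 19, 35, 40]

print(median_absolute_error(chronological, predicted))  # 1.5
print(age_acceleration(chronological, predicted))       # [2, -1, 5, 0]
```

A positive acceleration value under this definition means a sample looks epigenetically older than its chronological age.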

Journal ArticleDOI
TL;DR: Experimental approaches and current knowledge on the contribution of low-frequency and rare variants in complex disease and the challenges and opportunities for personalised medicine are reviewed.
Abstract: Despite thousands of genetic loci identified to date, a large proportion of genetic variation predisposing to complex disease and traits remains unaccounted for. Advances in sequencing technology enable focused explorations on the contribution of low-frequency and rare variants to human traits. Here we review experimental approaches and current knowledge on the contribution of these genetic variants in complex disease and discuss challenges and opportunities for personalised medicine.

Journal ArticleDOI
TL;DR: It is shown that proper experimental design and analysis parameters can reduce false positives, provide greater resolution of species in complex metagenomic samples, and improve the interpretation of results.
Abstract: One of the main challenges in metagenomics is the identification of microorganisms in clinical and environmental samples. While an extensive and heterogeneous set of computational tools is available to classify microorganisms using whole-genome shotgun sequencing data, comprehensive comparisons of these methods are limited. In this study, we use the largest-to-date set of laboratory-generated and simulated controls across 846 species to evaluate the performance of 11 metagenomic classifiers. Tools were characterized on the basis of their ability to identify taxa at the genus, species, and strain levels, quantify relative abundances of taxa, and classify individual reads to the species level. Strikingly, the number of species identified by the 11 tools can differ by over three orders of magnitude on the same datasets. Various strategies can ameliorate taxonomic misclassification, including abundance filtering, ensemble approaches, and tool intersection. Nevertheless, these strategies were often insufficient to completely eliminate false positives from environmental samples, which are especially important where they concern medically relevant species. Overall, pairing tools with different classification strategies (k-mer, alignment, marker) can combine their respective advantages. This study provides positive and negative controls, titrated standards, and a guide for selecting tools for metagenomic analyses by comparing ranges of precision, accuracy, and recall. We show that proper experimental design and analysis parameters can reduce false positives, provide greater resolution of species in complex metagenomic samples, and improve the interpretation of results.
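
Two of the strategies named in the abstract above, abundance filtering and tool intersection, can be sketched alongside precision and recall. The species labels, abundance values, threshold, and function names below are hypothetical. They do not correspond to any of the 11 benchmarked tools or to the study's data.

```python
def abundance_filter(profile, min_abundance):
    """Keep species whose relative abundance meets the threshold."""
    return {species for species, abundance in profile.items()
            if abundance >= min_abundance}


def intersect_calls(*call_sets):
    """Species reported by every tool (tool intersection)."""
    return set.intersection(*call_sets) if call_sets else set()


def precision_recall(called, truth):
    """Precision and recall of a set of species calls against a known truth set."""
    true_positives = len(called & truth)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Toy profiles (species -> relative abundance), invented for illustration.
truth = {"species_A", "species_B", "species_C"}
kmer_tool = {"species_A": 0.50, "species_B": 0.30, "species_C": 0.15,
             "species_X": 0.04, "species_Y": 0.01}
marker_tool = {"species_A": 0.60, "species_B": 0.35, "species_Z": 0.05}

kmer_calls = abundance_filter(kmer_tool, min_abundance=0.02)
marker_calls = abundance_filter(marker_tool, min_abundance=0.02)
consensus = intersect_calls(kmer_calls, marker_calls)

print(precision_recall(kmer_calls, truth))  # (0.75, 1.0)
print(precision_recall(consensus, truth))   # precision 1.0, recall 2/3
```

In this toy case the intersection removes the false positive but also loses one true species, which illustrates the precision-versus-recall trade-off such strategies involve.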

Journal ArticleDOI
TL;DR: The state of the field and recent advances are discussed, and significant challenges remain in the analysis, integration, and interpretation of single-cell omics data.
Abstract: Single-cell analysis is a rapidly evolving approach to characterize genome-scale molecular information at the individual cell level. Development of single-cell technologies and computational methods has enabled systematic investigation of cellular heterogeneity in a wide range of tissues and cell populations, yielding fresh insights into the composition, dynamics, and regulatory mechanisms of cell states in development and disease. Despite substantial advances, significant challenges remain in the analysis, integration, and interpretation of single-cell omics data. Here, we discuss the state of the field and recent advances and look to future opportunities.

Journal ArticleDOI
TL;DR: This study shows that lifespan-extending conditions can slow molecular changes associated with an epigenetic clock in mouse livers: mice treated with lifespan-extending interventions were significantly younger in epigenetic age than their untreated, wild-type, age-matched controls.
Abstract: Global but predictable changes impact the DNA methylome as we age, acting as a type of molecular clock. This clock can be hastened by conditions that decrease lifespan, raising the question of whether it can also be slowed, for example, by conditions that increase lifespan. Mice are particularly appealing organisms for studies of mammalian aging; however, epigenetic clocks have thus far been formulated only in humans. We first examined whether mice and humans experience similar patterns of change in the methylome with age. We found moderate conservation of CpG sites for which methylation is altered with age, with both species showing an increase in methylome disorder during aging. Based on this analysis, we formulated an epigenetic-aging model in mice using the liver methylomes of 107 mice from 0.2 to 26.0 months old. To examine whether epigenetic aging signatures are slowed by longevity-promoting interventions, we analyzed 28 additional methylomes from mice subjected to lifespan-extending conditions, including Prop1df/df dwarfism, calorie restriction or dietary rapamycin. We found that mice treated with these lifespan-extending interventions were significantly younger in epigenetic age than their untreated, wild-type age-matched controls. This study shows that lifespan-extending conditions can slow molecular changes associated with an epigenetic clock in mice livers.

Journal ArticleDOI
TL;DR: The data suggest biomarkers identified in this study might participate in the pathogenesis or development process of ankylosing spondylitis, providing new leads for the development of new diagnostic tools and potential treatments.
Abstract: The assessment and characterization of the gut microbiome has become a focus of research in the area of human autoimmune diseases. Ankylosing spondylitis is an inflammatory autoimmune disease and evidence showed that ankylosing spondylitis may be a microbiome-driven disease. To investigate the relationship between the gut microbiome and ankylosing spondylitis, a quantitative metagenomics study based on deep shotgun sequencing was performed, using gut microbial DNA from 211 Chinese individuals. A total of 23,709 genes and 12 metagenomic species were shown to be differentially abundant between ankylosing spondylitis patients and healthy controls. Patients were characterized by a form of gut microbial dysbiosis that is more prominent than previously reported cases with inflammatory bowel disease. Specifically, the ankylosing spondylitis patients demonstrated increases in the abundance of Prevotella melaninogenica, Prevotella copri, and Prevotella sp. C561 and decreases in Bacteroides spp. It is noteworthy that the Bifidobacterium genus, which is commonly used in probiotics, accumulated in the ankylosing spondylitis patients. Diagnostic algorithms were established using a subset of these gut microbial biomarkers. Alterations of the gut microbiome are associated with development of ankylosing spondylitis. Our data suggest biomarkers identified in this study might participate in the pathogenesis or development process of ankylosing spondylitis, providing new leads for the development of new diagnostic tools and potential treatments.

Journal ArticleDOI
TL;DR: A Bayesian method to control bias and inflation in EWAS and TWAS based on estimation of the empirical null distribution is proposed and it is demonstrated that the method maximizes power while properly controlling the false positive rate.
Abstract: We show that epigenome- and transcriptome-wide association studies (EWAS and TWAS) are prone to significant inflation and bias of test statistics, an unrecognized phenomenon introducing spurious findings if left unaddressed. Neither GWAS-based methodology nor state-of-the-art confounder adjustment methods completely remove bias and inflation. We propose a Bayesian method to control bias and inflation in EWAS and TWAS based on estimation of the empirical null distribution. Using simulations and real data, we demonstrate that our method maximizes power while properly controlling the false positive rate. We illustrate the utility of our method in large-scale EWAS and TWAS meta-analyses of age and smoking.
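
To make the notion of "inflation" of test statistics concrete, the sketch below computes the conventional genomic inflation factor used in GWAS: the median of the squared z-scores divided by the median of a chi-square distribution with one degree of freedom. This is not the Bayesian empirical-null method proposed in the paper, and the abstract itself notes that GWAS-based methodology does not fully remove bias and inflation in EWAS and TWAS. The function name and the input values are illustrative assumptions.

```python
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom (approximate).
CHI2_1DF_MEDIAN = 0.4549364


def inflation_factor(z_scores):
    """Conventional genomic inflation factor: median(z^2) / median of chi2(1).

    Values near 1 suggest well-calibrated test statistics; values well above 1
    suggest inflation. Illustrative only; not the paper's Bayesian estimator.
    """
    squared = [z * z for z in z_scores]
    if not squared:
        raise ValueError("z_scores must not be empty")
    return median(squared) / CHI2_1DF_MEDIAN


if __name__ == "__main__":
    # Hypothetical z-scores, chosen only to exercise the function.
    example = [-2.1, -0.4, 0.3, 0.9, 1.7]
    print(f"inflation factor: {inflation_factor(example):.3f}")
```

Because it is a single ratio of medians, this statistic captures only the spread of the test statistics; it says nothing about a shift in their mean, which is the "bias" the abstract refers to.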

Journal ArticleDOI
TL;DR: This study identifies novel relationships between the composition of the gut microbiota and circulating metabolites and provides a resource for future studies to understand host–gut microbiota relationships.
Abstract: The gut microbiome is a complex and metabolically active community that directly influences host phenotypes. In this study, we profile gut microbiota using 16S rRNA gene sequencing in 531 well-phenotyped Finnish men from the Metabolic Syndrome In Men (METSIM) study. We investigate gut microbiota relationships with a variety of factors that have an impact on the development of metabolic and cardiovascular traits. We identify novel associations between gut microbiota and fasting serum levels of a number of metabolites, including fatty acids, amino acids, lipids, and glucose. In particular, we detect associations with fasting plasma trimethylamine N-oxide (TMAO) levels, a gut microbiota-dependent metabolite associated with coronary artery disease and stroke. We further investigate the gut microbiota composition and microbiota–metabolite relationships in subjects with different body mass index and individuals with normal or altered oral glucose tolerance. Finally, we perform microbiota co-occurrence network analysis, which shows that certain metabolites strongly correlate with microbial community structure and that some of these correlations are specific for the pre-diabetic state. Our study identifies novel relationships between the composition of the gut microbiota and circulating metabolites and provides a resource for future studies to understand host–gut microbiota relationships.

Journal ArticleDOI
TL;DR: A probabilistic method, CancerLocator, which exploits the diagnostic potential of cell-free DNA by determining not only the presence but also the location of tumors, outperforms two established multi-class classification methods on simulations and real data.
Abstract: We propose a probabilistic method, CancerLocator, which exploits the diagnostic potential of cell-free DNA by determining not only the presence but also the location of tumors. CancerLocator simultaneously infers the proportions and the tissue-of-origin of tumor-derived cell-free DNA in a blood sample using genome-wide DNA methylation data. CancerLocator outperforms two established multi-class classification methods on simulations and real data, even with the low proportion of tumor-derived DNA in the cell-free DNA scenarios. CancerLocator also achieves promising results on patient plasma samples with low DNA methylation sequencing coverage.

Journal ArticleDOI
TL;DR: This study reveals that retroduplication has played a key role in the massive emergence of NLR genes, including functional disease-resistance genes, in pepper plants.
Abstract: Transposable elements are major evolutionary forces that can generate new genome structures and drive species diversification. The role of transposable elements in the expansion of nucleotide-binding and leucine-rich-repeat proteins (NLRs), the major disease-resistance gene families, has been unexplored in plants. We report two high-quality de novo genomes (Capsicum baccatum and C. chinense) and an improved reference genome (C. annuum) for peppers. Dynamic genome rearrangements involving translocations among chromosomes 3, 5, and 9 were detected in comparison between C. baccatum and the two other peppers. The amplification of athila LTR-retrotransposons, members of the gypsy superfamily, led to genome expansion in C. baccatum. In-depth genome-wide comparison of genes and repeats revealed that the copy numbers of NLRs were greatly increased by LTR-retrotransposon-mediated retroduplication. Moreover, retroduplicated NLRs are abundant across the angiosperms and, in most cases, are lineage-specific. Our study reveals that retroduplication has played a key role in the massive emergence of NLR genes, including functional disease-resistance genes, in pepper plants.

Journal ArticleDOI
TL;DR: This study provides new insights into dynamic DNA methylation reprogramming events during seed development and germination and suggests possible mechanisms of regulation in seed dormancy.
Abstract: Unlike animals, plants can pause their life cycle as dormant seeds. In both plants and animals, DNA methylation is involved in the regulation of gene expression and genome integrity. In animals, reprogramming erases and re-establishes DNA methylation during development. However, knowledge of reprogramming or reconfiguration in plants has been limited to pollen and the central cell. To better understand epigenetic reconfiguration in the embryo, which forms the plant body, we compared time-series methylomes of dry and germinating seeds to publicly available seed development methylomes. Time-series whole genome bisulfite sequencing reveals extensive gain of CHH methylation during seed development and drastic loss of CHH methylation during germination. These dynamic changes in methylation mainly occur within transposable elements. Active DNA methylation during seed development depends on both RNA-directed DNA methylation and heterochromatin formation pathways, whereas global demethylation during germination occurs in a passive manner. However, an active DNA demethylation pathway is initiated during late seed development. This study provides new insights into dynamic DNA methylation reprogramming events during seed development and germination and suggests possible mechanisms of regulation. The observed sequential methylation/demethylation cycle suggests an important role of DNA methylation in seed dormancy.

Journal ArticleDOI
TL;DR: This study shows the oncogenic role of circPVT1 in HNSCC, extending current knowledge about the role of circular RNAs in cancer.
Abstract: Circular RNAs are a class of endogenous RNAs with various functions in eukaryotic cells. Notably, circular RNAs play a critical role in cancer. Currently, nothing is known about their role in head and neck squamous cell carcinoma (HNSCC). The identification of circular RNAs in HNSCC might become useful for diagnostic and therapeutic strategies in HNSCC. Using samples from 115 HNSCC patients, we find that circPVT1 is over-expressed in tumors compared to matched non-tumoral tissues, with particular enrichment in patients with TP53 mutations. Up- and down-regulation of circPVT1 respectively increase and reduce the malignant phenotype in HNSCC cell lines. We show that circPVT1 expression is transcriptionally enhanced by the mut-p53/YAP/TEAD complex. circPVT1 acts as an oncogene modulating the expression of miR-497-5p and genes involved in the control of cell proliferation. This study shows the oncogenic role of circPVT1 in HNSCC, extending current knowledge about the role of circular RNAs in cancer.

Journal ArticleDOI
TL;DR: Genome-wide demographic analyses reveal that maize experienced pronounced declines in effective population size due to both a protracted domestication bottleneck and serial founder effects during post-domestication spread, while parviglumis in the Balsas River Valley experienced population growth.
Abstract: The history of maize has been characterized by major demographic events, including population size changes associated with domestication and range expansion, and gene flow with wild relatives. The interplay between demographic history and selection has shaped diversity across maize populations and genomes. We investigate these processes using high-depth resequencing data from 31 maize landraces spanning the pre-Columbian distribution of maize, and four wild teosinte individuals (Zea mays ssp. parviglumis). Genome-wide demographic analyses reveal that maize experienced pronounced declines in effective population size due to both a protracted domestication bottleneck and serial founder effects during post-domestication spread, while parviglumis in the Balsas River Valley experienced population growth. The domestication bottleneck and subsequent spread led to an increase in deleterious alleles in the domesticate compared to the wild progenitor. This cost is particularly pronounced in Andean maize, which has experienced a more dramatic founder event compared to other maize populations. Additionally, we detect introgression from the wild teosinte Zea mays ssp. mexicana into maize in the highlands of Mexico, Guatemala, and the southwestern USA, which reduces the prevalence of deleterious alleles likely due to the higher long-term effective population size of teosinte. These findings underscore the strong interaction between historical demography and the efficiency of selection and illustrate how domesticated species are particularly useful for understanding these processes. The landscape of deleterious alleles and therefore evolutionary potential is clearly influenced by recent demography, a factor that could bear importantly on many species that have experienced recent demographic shifts.

Journal ArticleDOI
TL;DR: Analysis of 2573 samples showed that IR occurs in all tissues analyzed, affects over 80% of all coding genes and is associated with cell differentiation and the cell cycle.
Abstract: Intron retention (IR) occurs when an intron is transcribed into pre-mRNA and remains in the final mRNA. We have developed a program and database called IRFinder to accurately detect IR from mRNA sequencing data. Analysis of 2573 samples showed that IR occurs in all tissues analyzed, affects over 80% of all coding genes, and is associated with cell differentiation and the cell cycle. Frequently retained introns are enriched for specific RNA binding protein sites and are often retained in clusters in the same gene. IR is associated with lower protein levels, and intron-retaining transcripts that escape nonsense-mediated decay are not actively translated.