Showing papers in "Genome Biology in 2015"


Journal ArticleDOI
TL;DR: A novel orthogroup inference algorithm called OrthoFinder is provided that solves a previously undetected gene length bias in orthogroup inference, resulting in significant improvements in accuracy and utility.
Abstract: Identifying homology relationships between sequences is fundamental to biological research. Here we provide a novel orthogroup inference algorithm called OrthoFinder that solves a previously undetected gene length bias in orthogroup inference, resulting in significant improvements in accuracy. Using real benchmark datasets we demonstrate that OrthoFinder is more accurate than other orthogroup inference methods by between 8 % and 33 %. Furthermore, we demonstrate the utility of OrthoFinder by providing a complete classification of transcription factor gene families in plants revealing 6.9 million previously unobserved relationships.

2,478 citations


Journal ArticleDOI
TL;DR: This work argues that the cellular detection rate, the fraction of genes expressed in a cell, should be adjusted for as a source of nuisance variation and provides gene set enrichment analysis tailored to single-cell data.
Abstract: Single-cell transcriptomics reveals gene expression heterogeneity but suffers from stochastic dropout and characteristic bimodal expression distributions in which expression is either strongly non-zero or non-detectable. We propose a two-part, generalized linear model for such bimodal data that parameterizes both of these features. We argue that the cellular detection rate, the fraction of genes expressed in a cell, should be adjusted for as a source of nuisance variation. Our model provides gene set enrichment analysis tailored to single-cell data. It provides insights into how networks of co-expressed genes evolve across an experimental treatment. MAST is available at https://github.com/RGLab/MAST .

1,770 citations
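
The cellular detection rate mentioned in the abstract above is defined there as the fraction of genes expressed in a cell. A minimal sketch of that quantity follows, assuming a genes-by-cells matrix and a detection threshold of zero; the function name and the toy matrix are illustrative and are not taken from the MAST package.

```python
def cellular_detection_rate(expr, threshold=0.0):
    """Return, for each cell, the fraction of genes with expression above `threshold`.

    `expr` is a list of rows: one row per gene, one column per cell.
    The threshold of 0.0 is an assumption made for this sketch.
    """
    n_genes = len(expr)
    if n_genes == 0:
        raise ValueError("expression matrix has no genes")
    n_cells = len(expr[0])
    detected = [0] * n_cells
    for gene_row in expr:
        for cell, value in enumerate(gene_row):
            if value > threshold:
                detected[cell] += 1
    return [d / n_genes for d in detected]


# Toy data: 4 genes (rows) measured in 3 cells (columns).
counts = [
    [5, 0, 2],
    [0, 0, 1],
    [3, 0, 0],
    [0, 7, 4],
]
cdr = cellular_detection_rate(counts)
print(cdr)  # [0.5, 0.25, 0.75]
```

In a regression setting, a per-cell value like `cdr` would enter the model as a covariate so that differences in detection are not mistaken for biological signal.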


Journal ArticleDOI
TL;DR: This work applied HiC-Pro to different Hi-C datasets, demonstrating its ability to easily process large data in a reasonable time and its fast implementation of the iterative correction method.
Abstract: HiC-Pro is an optimized and flexible pipeline for processing Hi-C data from raw reads to normalized contact maps. HiC-Pro maps reads, detects valid ligation products, performs quality controls and generates intra- and inter-chromosomal contact maps. It includes a fast implementation of the iterative correction method and is based on a memory-efficient data format for Hi-C contact maps. In addition, HiC-Pro can use phased genotype data to build allele-specific contact maps. We applied HiC-Pro to different Hi-C datasets, demonstrating its ability to easily process large data in a reasonable time. Source code and documentation are available at http://github.com/nservant/HiC-Pro .

1,444 citations
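
The iterative correction method named in the abstract above can be illustrated with a generic matrix-balancing loop. This is a sketch under stated assumptions, not HiC-Pro's implementation: it assumes a small, dense, symmetric contact matrix held as nested lists, and the function name, iteration cap, and tolerance are hypothetical.

```python
def iterative_correction(matrix, n_iter=50, tol=1e-10):
    """Balance a symmetric contact matrix so that all row sums become equal.

    Each pass divides entry (i, j) by the relative coverage of bins i and j,
    and accumulates those factors into a per-bin bias vector.
    """
    n = len(matrix)
    w = [[float(x) for x in row] for row in matrix]
    bias = [1.0] * n
    for _ in range(n_iter):
        sums = [sum(row) for row in w]
        mean = sum(sums) / n
        if mean == 0:
            break  # empty matrix: nothing to balance
        # Relative coverage of each bin; empty bins are left untouched.
        step = [s / mean if s > 0 else 1.0 for s in sums]
        if max(abs(s - 1.0) for s in step) < tol:
            break  # row sums are already equal
        for i in range(n):
            for j in range(n):
                w[i][j] /= step[i] * step[j]
        bias = [b * s for b, s in zip(bias, step)]
    return w, bias


# Toy data: uniform true contacts distorted by a per-bin bias of 1, 2 and 4.
true_bias = [1.0, 2.0, 4.0]
raw = [[bi * bj for bj in true_bias] for bi in true_bias]
corrected, bias = iterative_correction(raw)
row_sums = [sum(row) for row in corrected]
```

For this toy input the distortion is purely multiplicative, so the corrected matrix has equal row sums and the recovered `bias` is proportional to `true_bias`. Real contact maps are sparse and much larger, which is why a memory-efficient data format matters in practice.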


Journal ArticleDOI
Brian Tjaden
TL;DR: This work presents novel algorithms, specific to bacterial gene structures and transcriptomes, for analysis of bacterial RNA-seq data and de novo transcriptome assembly, implemented in an open source software system called Rockhopper 2.
Abstract: Transcriptome assays are increasingly being performed by high-throughput RNA sequencing (RNA-seq). For organisms whose genomes have not been sequenced and annotated, transcriptomes must be assembled de novo from the RNA-seq data. Here, we present novel algorithms, specific to bacterial gene structures and transcriptomes, for analysis of bacterial RNA-seq data and de novo transcriptome assembly. The algorithms are implemented in an open source software system called Rockhopper 2. We find that Rockhopper 2 outperforms other de novo transcriptome assemblers and offers accurate and efficient analysis of bacterial RNA-seq data. Rockhopper 2 is available at http://cs.wellesley.edu/~btjaden/Rockhopper.

1,437 citations


Journal ArticleDOI
TL;DR: DNA methylation-derived measures of accelerated aging are heritable traits that predict mortality independently of health status, lifestyle factors, and known genetic factors.
Abstract: Background: DNA methylation levels change with age. Recent studies have identified biomarkers of chronological age based on DNA methylation levels. It is not yet known whether DNA methylation age captures aspects of biological age. Results: Here we test whether differences between people’s chronological ages and estimated ages, DNA methylation age, predict all-cause mortality in later life. The difference between DNA methylation age and chronological age (Δage) was calculated in four longitudinal cohorts of older people. Meta-analysis of proportional hazards models from the four cohorts was used to determine the association between Δage and mortality. A 5-year higher Δage is associated with a 21% higher mortality risk, adjusting for age and sex. After further adjustments for childhood IQ, education, social class, hypertension, diabetes, cardiovascular disease, and APOE e4 status, there is a 16% increased mortality risk for those with a 5-year higher Δage. A pedigree-based heritability analysis of Δage was conducted in a separate cohort. The heritability of Δage was 0.43. Conclusions: DNA methylation-derived measures of accelerated aging are heritable traits that predict mortality independently of health status, lifestyle factors, and known genetic factors.

916 citations
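
The Δage arithmetic in the abstract above can be made concrete with a short sketch. It assumes the usual log-linear form of a proportional hazards model, so that the reported 21% increase per 5 years of Δage scales multiplicatively; the function names and example ages are illustrative only.

```python
def delta_age(dnam_age, chronological_age):
    """Difference between DNA methylation age and chronological age, in years."""
    return dnam_age - chronological_age


def relative_hazard(delta, hr_per_5_years=1.21):
    """Relative mortality hazard for a given delta-age.

    Assumes a log-linear effect: each additional 5 years of delta-age
    multiplies the hazard by `hr_per_5_years` (1.21 is the age- and
    sex-adjusted estimate quoted in the abstract).
    """
    return hr_per_5_years ** (delta / 5.0)


# Hypothetical person: methylation age 75, chronological age 65.
d = delta_age(75.0, 65.0)
hr = relative_hazard(d)
print(d, round(hr, 4))  # 10.0 1.4641
```

Passing `hr_per_5_years=1.16` gives the corresponding figure for the fully adjusted estimate reported in the same abstract.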


Journal ArticleDOI
TL;DR: A novel chiastic clipping signal-based algorithm, CIRI, is presented, to unbiasedly and accurately detect circRNAs from transcriptome data by employing multiple filtration strategies and to identify and experimentally validate the prevalence of intronic/intergenic circRNAs as well as fragments specific to them in the human transcriptome.
Abstract: Recent studies reveal that circular RNAs (circRNAs) are a novel class of abundant, stable and ubiquitous noncoding RNA molecules in animals. Comprehensive detection of circRNAs from high-throughput transcriptome data is an initial and crucial step to study their biogenesis and function. Here, we present a novel chiastic clipping signal-based algorithm, CIRI, to unbiasedly and accurately detect circRNAs from transcriptome data by employing multiple filtration strategies. By applying CIRI to ENCODE RNA-seq data, we for the first time identify and experimentally validate the prevalence of intronic/intergenic circRNAs as well as fragments specific to them in the human transcriptome.

798 citations


Journal ArticleDOI
TL;DR: Comparisons of 12 combinations of eight promoters and two terminators found that the efficiency of the egg cell-specific promoter-controlled CRISPR/Cas9 system depended on the presence of a suitable terminator, and the composite promoter generated by fusing two egg cell-specific promoters resulted in much higher efficiency of mutation in the T1 generation compared with the single promoters.
Abstract: Arabidopsis mutants produced by constitutive overexpression of the CRISPR/Cas9 genome editing system are usually mosaics in the T1 generation. In this study, we used egg cell-specific promoters to drive the expression of Cas9 and obtained non-mosaic T1 mutants for multiple target genes with high efficiency. Comparisons of 12 combinations of eight promoters and two terminators found that the efficiency of the egg cell-specific promoter-controlled CRISPR/Cas9 system depended on the presence of a suitable terminator, and the composite promoter generated by fusing two egg cell-specific promoters resulted in much higher efficiency of mutation in the T1 generation compared with the single promoters.

715 citations


Journal ArticleDOI
TL;DR: Circlator, the first tool to automate assembly circularization and produce accurate linear representations of circular sequences, correctly circularized 26 of 27 circularizable sequences.
Abstract: The assembly of DNA sequence data is undergoing a renaissance thanks to emerging technologies capable of producing reads tens of kilobases long. Assembling complete bacterial and small eukaryotic genomes is now possible, but the final step of circularizing sequences remains unsolved. Here we present Circlator, the first tool to automate assembly circularization and produce accurate linear representations of circular sequences. Using Pacific Biosciences and Oxford Nanopore data, Circlator correctly circularized 26 of 27 circularizable sequences, comprising 11 chromosomes and 12 plasmids from bacteria, the apicoplast and mitochondrion of Plasmodium falciparum and a human mitochondrion. Circlator is available at http://sanger-pathogens.github.io/circlator/.

681 citations


Journal ArticleDOI
TL;DR: The resulting data is assembled into a centralized data resource that contains web-based tools and data-access points for the research community to search and extract data related to samples, genes, promoter activities, transcription factors and enhancers across the FANTOM5 atlas.
Abstract: The FANTOM5 project investigates transcription initiation activities in more than 1,000 human and mouse primary cells, cell lines and tissues using CAGE. Based on manual curation of sample information and development of an ontology for sample classification, we assemble the resulting data into a centralized data resource (http://fantom.gsc.riken.jp/5/). This resource contains web-based tools and data-access points for the research community to search and extract data related to samples, genes, promoter activities, transcription factors and enhancers across the FANTOM5 atlas.

656 citations


Journal ArticleDOI
TL;DR: The role of host genetic variation in shaping the composition of the human microbiome is highlighted, and the results provide a starting point toward understanding the complex interaction between human genetics and the microbiome in the context of human evolution and disease.
Abstract: Background: The composition of bacteria in and on the human body varies widely across human individuals, and has been associated with multiple health conditions. While microbial communities are influenced by environmental factors, some degree of genetic influence of the host on the microbiome is also expected. This study is part of an expanding effort to comprehensively profile the interactions between human genetic variation and the composition of this microbial ecosystem on a genome- and microbiome-wide scale. Results: Here, we jointly analyze the composition of the human microbiome and host genetic variation. By mining the shotgun metagenomic data from the Human Microbiome Project for host DNA reads, we gathered information on host genetic variation for 93 individuals for whom bacterial abundance data are also available. Using this dataset, we identify significant associations between host genetic variation and microbiome composition in 10 of the 15 body sites tested. These associations are driven by host genetic variation in immunity-related pathways, and are especially enriched in host genes that have been previously associated with microbiome-related complex diseases, such as inflammatory bowel disease and obesity-related disorders. Lastly, we show that host genomic regions associated with the microbiome have high levels of genetic differentiation among human populations, possibly indicating host genomic adaptation to environment-specific microbiomes. Conclusions: Our results highlight the role of host genetic variation in shaping the composition of the human microbiome, and provide a starting point toward understanding the complex interaction between human genetics and the microbiome in the context of human evolution and disease.

598 citations


Journal ArticleDOI
TL;DR: A dimensionality-reduction method is developed, (Z)ero (I)nflated (F)actor (A)nalysis (ZIFA), which explicitly models the dropout characteristics, and it is shown that it improves modeling accuracy on simulated and biological data sets.
Abstract: Single-cell RNA-seq data allows insight into normal cellular function and various disease states through molecular characterization of gene expression on the single cell level. Dimensionality reduction of such high-dimensional data sets is essential for visualization and analysis, but single-cell RNA-seq data are challenging for classical dimensionality-reduction methods because of the prevalence of dropout events, which lead to zero-inflated data. Here, we develop a dimensionality-reduction method, (Z)ero (I)nflated (F)actor (A)nalysis (ZIFA), which explicitly models the dropout characteristics, and show that it improves modeling accuracy on simulated and biological data sets.

Journal ArticleDOI
TL;DR: A systematic, high-resolution survey of lncRNA localization reveals aspects of lncRNAs that are similar to mRNAs, such as cell-to-cell variability, but also several distinct properties that may correspond to particular functional roles.
Abstract: Long non-coding RNAs (lncRNAs) have been implicated in diverse biological processes. In contrast to extensive genomic annotation of lncRNA transcripts, far fewer have been characterized for subcellular localization and cell-to-cell variability. Addressing this requires systematic, direct visualization of lncRNAs in single cells at single-molecule resolution. We use single-molecule RNA-FISH to systematically quantify and categorize the subcellular localization patterns of a representative set of 61 lncRNAs in three different cell types. Our survey yields high-resolution quantification and stringent validation of the number and spatial positions of these lncRNAs, with an mRNA set for comparison. Using this highly quantitative image-based dataset, we observe a variety of subcellular localization patterns, ranging from bright sub-nuclear foci to almost exclusively cytoplasmic localization. We also find that the low abundance of lncRNAs observed from cell population measurements cannot be explained by high expression in a small subset of ‘jackpot’ cells. Additionally, nuclear lncRNA foci dissolve during mitosis and become widely dispersed, suggesting these lncRNAs are not mitotic bookmarking factors. Moreover, we see that divergently transcribed lncRNAs do not always correlate with their cognate mRNA, nor do they have a characteristic localization pattern. Our systematic, high-resolution survey of lncRNA localization reveals aspects of lncRNAs that are similar to mRNAs, such as cell-to-cell variability, but also several distinct properties. These characteristics may correspond to particular functional roles. Our study also provides a quantitative description of lncRNAs at the single-cell level and a universally applicable framework for future study and validation of lncRNAs.

Journal ArticleDOI
TL;DR: High-frequency, precise modification of the tomato genome was achieved using geminivirus replicons, suggesting that these vectors can overcome the efficiency barrier that has made gene targeting in plants challenging.
Abstract: The use of homologous recombination to precisely modify plant genomes has been challenging, due to the lack of efficient methods for delivering DNA repair templates to plant cells. Even with the advent of sequence-specific nucleases, which stimulate homologous recombination at predefined genomic sites by creating targeted DNA double-strand breaks, there are only a handful of studies that report precise editing of endogenous genes in crop plants. More efficient methods are needed to modify plant genomes through homologous recombination, ideally without randomly integrating foreign DNA. Here, we use geminivirus replicons to create heritable modifications to the tomato genome at frequencies tenfold higher than traditional methods of DNA delivery (i.e., Agrobacterium). A strong promoter was inserted upstream of a gene controlling anthocyanin biosynthesis, resulting in overexpression and ectopic accumulation of pigments in tomato tissues. More than two-thirds of the insertions were precise, and had no unanticipated sequence modifications. Both TALENs and CRISPR/Cas9 achieved gene targeting at similar efficiencies. Further, the targeted modification was transmitted to progeny in a Mendelian fashion. Even though donor molecules were replicated in the vectors, no evidence was found of persistent extra-chromosomal replicons or off-target integration of T-DNA or replicon sequences. High-frequency, precise modification of the tomato genome was achieved using geminivirus replicons, suggesting that these vectors can overcome the efficiency barrier that has made gene targeting in plants challenging. This work provides a foundation for efficient genome editing of crop genomes without the random integration of foreign DNA.

Journal ArticleDOI
TL;DR: A new algorithm is presented that increases the sensitivity and specificity of circular RNA detection by discovering and quantifying circular and linear RNA splicing events at both annotated and un-annotated exon boundaries, including intergenic regions of the genome, with high statistical confidence.
Abstract: Background: The pervasive expression of circular RNA is a recently discovered feature of gene expression in highly diverged eukaryotes, but the functions of most circular RNAs are still unknown. Computational methods to discover and quantify circular RNA are essential. Moreover, discovering biological contexts where circular RNAs are regulated will shed light on potential functional roles they may play. Results: We present a new algorithm that increases the sensitivity and specificity of circular RNA detection by discovering and quantifying circular and linear RNA splicing events at both annotated and un-annotated exon boundaries, including intergenic regions of the genome, with high statistical confidence. Unlike approaches that rely on read count and exon homology to determine confidence in prediction of circular RNA expression, our algorithm uses a statistical approach. Using our algorithm, we unveiled striking induction of general and tissue-specific circular RNAs, including in the heart and lung, during human fetal development. We discover regions of the human fetal brain, such as the frontal cortex, with marked enrichment for genes where circular RNA isoforms are dominant. Conclusions: The vast majority of circular RNA production occurs at major spliceosome splice sites; however, we find the first examples of developmentally induced circular RNAs processed by the minor spliceosome, and an enriched propensity of minor spliceosome donors to splice into circular RNA at un-annotated, rather than annotated, exons. Together, these results suggest a potentially significant role for circular RNA in human development.

Journal ArticleDOI
TL;DR: Focusing primarily on the aggrecanases and proteoglycanases, a perspective on the evolution of the ADAMTS family, their links with developmental and disease mechanisms, and key questions for the future are provided.
Abstract: The ADAMTS (A Disintegrin and Metalloproteinase with Thrombospondin motifs) enzymes are secreted, multi-domain matrix-associated zinc metalloendopeptidases that have diverse roles in tissue morphogenesis and patho-physiological remodeling, in inflammation and in vascular biology. The human family includes 19 members that can be sub-grouped on the basis of their known substrates, namely the aggrecanases or proteoglycanases (ADAMTS1, 4, 5, 8, 9, 15 and 20), the procollagen N-propeptidases (ADAMTS2, 3 and 14), the cartilage oligomeric matrix protein-cleaving enzymes (ADAMTS7 and 12), the von-Willebrand Factor proteinase (ADAMTS13) and a group of orphan enzymes (ADAMTS6, 10, 16, 17, 18 and 19). Control of the structure and function of the extracellular matrix (ECM) is a central theme of the biology of the ADAMTS, as exemplified by the actions of the procollagen-N-propeptidases in collagen fibril assembly and of the aggrecanases in the cleavage or modification of ECM proteoglycans. Defects in certain family members give rise to inherited genetic disorders, while the aberrant expression or function of others is associated with arthritis, cancer and cardiovascular disease. In particular, ADAMTS4 and 5 have emerged as therapeutic targets in arthritis. Multiple ADAMTSs from different sub-groupings exert either positive or negative effects on tumorigenesis and metastasis, with both metalloproteinase-dependent and -independent actions known to occur. The basic ADAMTS structure comprises a metalloproteinase catalytic domain and a carboxy-terminal ancillary domain, the latter determining substrate specificity and the localization of the protease and its interaction partners; ancillary domains probably also have independent biological functions. Focusing primarily on the aggrecanases and proteoglycanases, this review provides a perspective on the evolution of the ADAMTS family, their links with developmental and disease mechanisms, and key questions for the future.

Journal ArticleDOI
TL;DR: The use of RNA-guided Cas9 is demonstrated to generate mutations in target genes of both barley and B. oleracea and show stable transmission of these mutations thus establishing the potential for rapid characterisation of gene function in these species.
Abstract: The RNA-guided Cas9 system represents a flexible approach for genome editing in plants. This method can create specific mutations that knock-out or alter target gene function. It provides a valuable tool for plant research and offers opportunities for crop improvement. We investigate the use and target specificity requirements of RNA-guided Cas9 genome editing in barley (Hordeum vulgare) and Brassica oleracea by targeting multicopy genes. In barley, we target two copies of HvPM19 and observe Cas9-induced mutations in the first generation of 23 % and 10 % of the lines, respectively. In B. oleracea, targeting of BolC.GA4.a leads to Cas9-induced mutations in 10 % of first generation plants screened. In addition, a phenotypic screen identifies T0 plants with the expected dwarf phenotype associated with knock-out of the target gene. In both barley and B. oleracea stable Cas9-induced mutations are transmitted to T2 plants independently of the T-DNA construct. We observe off-target activity in both species, despite the presence of at least one mismatch between the single guide RNA and the non-target gene sequences. In barley, a transgene-free plant has concurrent mutations in the target and non-target copies of HvPM19. We demonstrate the use of RNA-guided Cas9 to generate mutations in target genes of both barley and B. oleracea and show stable transmission of these mutations thus establishing the potential for rapid characterisation of gene function in these species. In addition, the off-target effects reported offer both potential difficulties and specific opportunities to target members of multigene families in crops.

Journal ArticleDOI
TL;DR: The immunophenotypes of the tumors and the cancer antigenome remain widely unexplored, and the findings represent a step toward the development of personalized cancer immunotherapies.
Abstract: Background: While large-scale cancer genomic projects are comprehensively characterizing the mutational spectrum of various cancers, so far little attention has been devoted to either define the antigenicity of these mutations or to characterize the immune responses they elicit. Here we present a strategy to characterize the immunophenotypes and the antigenome of human colorectal cancer. Results: We apply our strategy to a large colorectal cancer cohort (n = 598) and show that subpopulations of tumor-infiltrating lymphocytes are associated with distinct molecular phenotypes. The characterization of the antigenome shows that a large number of cancer-germline antigens are expressed in all patients. In contrast, neo-antigens are rarely shared between patients, indicating that cancer vaccination requires an individualized strategy. Analysis of the genetic basis of the tumors reveals distinct tumor escape mechanisms for the patient subgroups. Hypermutated tumors are depleted of immunosuppressive cells and show upregulation of immunoinhibitory molecules. Non-hypermutated tumors are enriched with immunosuppressive cells, and the expression of immunoinhibitors and MHC molecules is downregulated. Reconstruction of the interaction network of tumor-infiltrating lymphocytes and immunomodulatory molecules followed by a validation with 11 independent cohorts (n = 1,945) identifies BCMA as a novel druggable target. Finally, linear regression modeling identifies major determinants of tumor immunogenicity, which include well-characterized modulators as well as a novel candidate, CCR8, which is then tested in an orthologous immunodeficient mouse model. Conclusions: The immunophenotypes of the tumors and the cancer antigenome remain widely unexplored, and our findings represent a step toward the development of personalized cancer immunotherapies.

Journal ArticleDOI
TL;DR: A principled phylogenic correction for VAFs in loci affected by copy number alterations is introduced and it is shown that this correction greatly improves subclonal reconstruction compared to existing methods.
Abstract: Tumors often contain multiple subpopulations of cancerous cells defined by distinct somatic mutations. We describe a new method, PhyloWGS, which can be applied to whole-genome sequencing data from one or more tumor samples to reconstruct complete genotypes of these subpopulations based on variant allele frequencies (VAFs) of point mutations and population frequencies of structural variations. We introduce a principled phylogenic correction for VAFs in loci affected by copy number alterations and we show that this correction greatly improves subclonal reconstruction compared to existing methods. PhyloWGS is free, open-source software, available at https://github.com/morrislab/phylowgs.
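
For readers unfamiliar with variant allele frequencies, the sketch below shows the naive conversion from a VAF to the fraction of sampled cells carrying a mutation. It assumes a heterozygous point mutation at a copy-neutral diploid locus, which is exactly the case where no correction for copy number alterations is needed; it is not the PhyloWGS correction, and the function names and read counts are illustrative.

```python
def variant_allele_frequency(variant_reads, total_reads):
    """Fraction of reads at a locus that carry the variant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return variant_reads / total_reads


def population_frequency_diploid(vaf):
    """Fraction of sampled cells carrying a heterozygous mutation.

    Assumes two copies of the locus per cell and one mutated copy per
    mutated cell, so each mutated cell contributes half of its reads.
    The result is capped at 1.0 because sampling noise can push 2 * VAF above 1.
    """
    return min(1.0, 2.0 * vaf)


# Hypothetical locus: 30 of 120 reads support the variant.
vaf = variant_allele_frequency(30, 120)
freq = population_frequency_diploid(vaf)
print(vaf, freq)  # 0.25 0.5
```

When a copy number alteration changes the number of copies of the locus in some cells, the factor of two no longer holds, which is the situation the correction described in the abstract addresses.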

Journal ArticleDOI
TL;DR: It is demonstrated that circRNAs are highly abundant and dynamically expressed in a spatio-temporal manner in porcine fetal brain, suggesting important functions during mammalian brain development.
Abstract: Recently, thousands of circular RNAs (circRNAs) have been discovered in various tissues and cell types from human, mouse, fruit fly and nematodes. However, expression of circRNAs across mammalian brain development has never been examined. Here we profile the expression of circRNA in five brain tissues at up to six time-points during fetal porcine development, constituting the first report of circRNA in the brain development of a large animal. An unbiased analysis reveals a highly complex regulation pattern of thousands of circular RNAs, with a distinct spatio-temporal expression profile. The amount and complexity of circRNA expression was most pronounced in cortex at day 60 of gestation. At this time-point we find 4634 unique circRNAs expressed from 2195 genes out of a total of 13,854 expressed genes. Approximately 20 % of the porcine splice sites involved in circRNA production are functionally conserved between mouse and human. Furthermore, we observe that “hot-spot” genes produce multiple circRNA isoforms, which are often differentially expressed across porcine brain development. A global comparison of porcine circRNAs reveals that introns flanking circularized exons are longer than average and more frequently contain proximal complementary SINEs, which potentially can facilitate base pairing between the flanking introns. Finally, we report the first use of RNase R treatment in combination with in situ hybridization to show dynamic subcellular localization of circRNA during development. These data demonstrate that circRNAs are highly abundant and dynamically expressed in a spatio-temporal manner in porcine fetal brain, suggesting important functions during mammalian brain development.

Journal ArticleDOI
TL;DR: This work collected public data on epigenetic marks and transcription factor binding in human cell types and used it to construct an intuitive summary of regulatory regions in the human genome, which is then verified against independent assays for sensitivity.
Abstract: Most genomic variants associated with phenotypic traits or disease do not fall within gene coding regions, but in regulatory regions, rendering their interpretation difficult. We collected public data on epigenetic marks and transcription factor binding in human cell types and used it to construct an intuitive summary of regulatory regions in the human genome. We verified it against independent assays for sensitivity. The Ensembl Regulatory Build will be progressively enriched when more data is made available. It is freely available on the Ensembl browser, from the Ensembl Regulation MySQL database server and in a dedicated track hub.

Journal ArticleDOI
TL;DR: In this paper, the CRISPR/Cas9 system was used in plants to confer molecular immunity against DNA viruses, including the tomato yellow leaf curl virus (TYLCV).
Abstract: The CRISPR/Cas9 system provides bacteria and archaea with molecular immunity against invading phages and conjugative plasmids. Recently, CRISPR/Cas9 has been used for targeted genome editing in diverse eukaryotic species. In this study, we investigate whether the CRISPR/Cas9 system could be used in plants to confer molecular immunity against DNA viruses. We deliver sgRNAs specific for coding and non-coding sequences of tomato yellow leaf curl virus (TYLCV) into Nicotiana benthamiana plants stably overexpressing the Cas9 endonuclease, and subsequently challenge these plants with TYLCV. Our data demonstrate that the CRISPR/Cas9 system targeted TYLCV for degradation and introduced mutations at the target sequences. All tested sgRNAs exhibit interference activity, but those targeting the stem-loop sequence within the TYLCV origin of replication in the intergenic region (IR) are the most effective. N. benthamiana plants expressing CRISPR/Cas9 exhibit delayed or reduced accumulation of viral DNA, abolishing or significantly attenuating symptoms of infection. Moreover, this system could simultaneously target multiple DNA viruses. These data establish the efficacy of the CRISPR/Cas9 system for viral interference in plants, thereby extending the utility of this technology and opening the possibility of producing plants resistant to multiple viral infections.

Journal ArticleDOI
Ben M. Sadd, Seth M. Barribeau, and 151 more authors (51 institutions)
TL;DR: Overall, gene repertoires suggest that the route to advanced eusociality in bees was mediated by many small changes in many genes and processes, and not by notable expansion or depauperation.
Abstract: The shift from solitary to social behavior is one of the major evolutionary transitions. Primitively eusocial bumblebees are uniquely placed to illuminate the evolution of highly eusocial insect societies. Bumblebees are also invaluable natural and agricultural pollinators, and there is widespread concern over recent population declines in some species. High-quality genomic data will inform key aspects of bumblebee biology, including susceptibility to implicated population viability threats. We report the high quality draft genome sequences of Bombus terrestris and Bombus impatiens, two ecologically dominant bumblebees and widely utilized study species. Comparing these new genomes to those of the highly eusocial honeybee Apis mellifera and other Hymenoptera, we identify deeply conserved similarities, as well as novelties key to the biology of these organisms. Some honeybee genome features thought to underpin advanced eusociality are also present in bumblebees, indicating an earlier evolution in the bee lineage. Xenobiotic detoxification and immune genes are similarly depauperate in bumblebees and honeybees, and multiple categories of genes linked to social organization, including development and behavior, show high conservation. Key differences identified include a bias in bumblebee chemoreception towards gustation from olfaction, and striking differences in microRNAs, potentially responsible for gene regulation underlying social and other traits. These two bumblebee genomes provide a foundation for post-genomic research on these key pollinators and insect societies. Overall, gene repertoires suggest that the route to advanced eusociality in bees was mediated by many small changes in many genes and processes, and not by notable expansion or depauperation.

Journal ArticleDOI
TL;DR: ALLMAPS is a method capable of computing a scaffold ordering that maximizes colinearity across a collection of maps; it is robust against common mapping errors and generates sequences that are maximally concordant with the input maps.
Abstract: The ordering and orientation of genomic scaffolds to reconstruct chromosomes is an essential step during de novo genome assembly. Because this process utilizes various mapping techniques that each provide an independent line of evidence, a combination of multiple maps can improve the accuracy of the resulting chromosomal assemblies. We present ALLMAPS, a method capable of computing a scaffold ordering that maximizes colinearity across a collection of maps. ALLMAPS is robust against common mapping errors, and generates sequences that are maximally concordant with the input maps. ALLMAPS is a useful tool in building high-quality genome assemblies. ALLMAPS is available at: https://github.com/tanghaibao/jcvi/wiki/ALLMAPS.

Journal ArticleDOI
TL;DR: A single-cell universal poly(A)-independent RNA sequencing (SUPeR-seq) method is developed to sequence both polyadenylated and non-polyadenylated RNAs from individual cells, work that is key to deciphering regulation mechanisms of circRNAs during mammalian early embryonic development.
Abstract: Circular RNAs (circRNAs) are a new class of non-polyadenylated non-coding RNAs that may play important roles in many biological processes. Here we develop a single-cell universal poly(A)-independent RNA sequencing (SUPeR-seq) method to sequence both polyadenylated and non-polyadenylated RNAs from individual cells. This method exhibits robust sensitivity, precision and accuracy. We discover 2891 circRNAs and 913 novel linear transcripts in mouse preimplantation embryos and further analyze the abundance of circRNAs along development, the function of enriched genes, and sequence features of circRNAs. Our work is key to deciphering regulation mechanisms of circRNAs during mammalian early embryonic development.

Journal ArticleDOI
TL;DR: Cumulative lifetime stress may accelerate epigenetic aging, an effect that could be driven by glucocorticoid-induced epigenetic changes; these findings contribute to the understanding of mechanisms linking chronic stress with accelerated aging and heightened disease risk.
Abstract: Background: Chronic psychological stress is associated with accelerated aging and increased risk for aging-related diseases, but the underlying molecular mechanisms are unclear. Results: We examined the effect of lifetime stressors on a DNA methylation-based age predictor, epigenetic clock. After controlling for blood cell-type composition and lifestyle parameters, cumulative lifetime stress, but not childhood maltreatment or current stress alone, predicted accelerated epigenetic aging in an urban, African American cohort (n = 392). This effect was primarily driven by personal life stressors, was more pronounced with advancing age, and was blunted in individuals with higher childhood abuse exposure. Hypothesizing that these epigenetic effects could be mediated by glucocorticoid signaling, we found that a high number (n = 85) of epigenetic clock CpG sites were located within glucocorticoid response elements. We further examined the functional effects of glucocorticoids on epigenetic clock CpGs in an independent sample with genome-wide DNA methylation (n = 124) and gene expression data (n = 297) before and after exposure to the glucocorticoid receptor agonist dexamethasone. Dexamethasone induced dynamic changes in methylation in 31.2 % (110/353) of these CpGs and transcription in 81.7 % (139/170) of genes neighboring epigenetic clock CpGs. Disease enrichment analysis of these dexamethasone-regulated genes showed enriched association for aging-related diseases, including coronary artery disease, arteriosclerosis, and leukemias. Conclusions: Cumulative lifetime stress may accelerate epigenetic aging, an effect that could be driven by glucocorticoid-induced epigenetic changes. These findings contribute to our understanding of mechanisms linking chronic stress with accelerated aging and heightened disease risk.

Journal ArticleDOI
TL;DR: This work analyzes the properties of allelic expression read count data and technical sources of error, such as low-quality or double-counted RNA-seq reads, genotyping errors, allelic mapping bias, and technical covariates due to sample preparation and sequencing, and variation in total read depth.
Abstract: Allelic expression analysis has become important for integrating genome and transcriptome data to characterize various biological phenomena such as cis-regulatory variation and nonsense-mediated decay. We analyze the properties of allelic expression read count data and technical sources of error, such as low-quality or double-counted RNA-seq reads, genotyping errors, allelic mapping bias, and technical covariates due to sample preparation and sequencing, and variation in total read depth. We provide guidelines for correcting such errors, show that our quality control measures improve the detection of relevant allelic expression, and introduce tools for the high-throughput production of allelic expression data from RNA-sequencing data.

Journal ArticleDOI
TL;DR: It is demonstrated that RNA-seq outperforms microarrays in determining the transcriptomic characteristics of cancer, while RNA-seq and microarray-based models perform similarly in clinical endpoint prediction.
Abstract: Gene expression profiling is being widely applied in cancer research to identify biomarkers for clinical endpoint prediction. Since RNA-seq provides a powerful tool for transcriptome-based applications beyond the limitations of microarrays, we sought to systematically evaluate the performance of RNA-seq-based and microarray-based classifiers in this MAQC-III/SEQC study for clinical endpoint prediction using neuroblastoma as a model. We generate gene expression profiles from 498 primary neuroblastomas using both RNA-seq and 44 k microarrays. Characterization of the neuroblastoma transcriptome by RNA-seq reveals that more than 48,000 genes and 200,000 transcripts are being expressed in this malignancy. We also find that RNA-seq provides much more detailed information on specific transcript expression patterns in clinico-genetic neuroblastoma subgroups than microarrays. To systematically compare the power of RNA-seq and microarray-based models in predicting clinical endpoints, we divide the cohort randomly into training and validation sets and develop 360 predictive models on six clinical endpoints of varying predictability. Evaluation of factors potentially affecting model performances reveals that prediction accuracies are most strongly influenced by the nature of the clinical endpoint, whereas technological platforms (RNA-seq vs. microarrays), RNA-seq data analysis pipelines, and feature levels (gene vs. transcript vs. exon-junction level) do not significantly affect performances of the models. We demonstrate that RNA-seq outperforms microarrays in determining the transcriptomic characteristics of cancer, while RNA-seq and microarray-based models perform similarly in clinical endpoint prediction. Our findings may be valuable to guide future studies on the development of gene expression-based predictive models and their implementation in clinical practice.

Journal ArticleDOI
TL;DR: MAGeCK-VISPR defines a set of QC measures to assess the quality of an experiment, and includes a maximum-likelihood algorithm that calls essential genes simultaneously under multiple conditions, using expectation-maximization to iteratively estimate sgRNA knockout efficiency and gene essentiality.
Abstract: High-throughput CRISPR screens have shown great promise in functional genomics. We present MAGeCK-VISPR, a comprehensive quality control (QC), analysis, and visualization workflow for CRISPR screens. MAGeCK-VISPR defines a set of QC measures to assess the quality of an experiment, and includes a maximum-likelihood algorithm to call essential genes simultaneously under multiple conditions. The algorithm uses a generalized linear model to deconvolute different effects, and employs expectation-maximization to iteratively estimate sgRNA knockout efficiency and gene essentiality. MAGeCK-VISPR also includes VISPR, a framework for the interactive visualization and exploration of QC and analysis results. MAGeCK-VISPR is freely available at http://bitbucket.org/liulab/mageck-vispr .

Journal ArticleDOI
TL;DR: A systematic investigation of sgRNA structure finds that extending the duplex by approximately 5 bp combined with mutating the continuous sequence of thymines at position 4 to cytosine or guanine significantly increases gene knockout efficiency in CRISPR-Cas9-based genome editing experiments.
Abstract: Single-guide RNA (sgRNA) is one of the two key components of the clustered regularly interspaced short palindromic repeats (CRISPR)-Cas9 genome-editing system. The current commonly used sgRNA structure has a shortened duplex compared with the native bacterial CRISPR RNA (crRNA)–transactivating crRNA (tracrRNA) duplex and contains a continuous sequence of thymines, which is the pause signal for RNA polymerase III and thus could potentially reduce transcription efficiency. Here, we systematically investigate the effect of these two elements on knockout efficiency and show that modifying the sgRNA structure by extending the duplex length and mutating the fourth thymine of the continuous sequence of thymines to cytosine or guanine significantly, and sometimes dramatically, improves knockout efficiency in cells. In addition, the optimized sgRNA structure also significantly increases the efficiency of more challenging genome-editing procedures, such as gene deletion, which is important for inducing a loss of function in non-coding genes. By a systematic investigation of sgRNA structure we find that extending the duplex by approximately 5 bp combined with mutating the continuous sequence of thymines at position 4 to cytosine or guanine significantly increases gene knockout efficiency in CRISPR-Cas9-based genome editing experiments.

Journal ArticleDOI
TL;DR: The N6-methyladenosine (m6A) modification of mRNA has a crucial function in regulating pluripotency in murine stem cells: it facilitates resolution of naive pluripotency towards differentiation.
Abstract: The N6-methyladenosine (m6A) modification of mRNA has a crucial function in regulating pluripotency in murine stem cells: it facilitates resolution of naive pluripotency towards differentiation.