Showing papers in "Genome Biology in 2015"

Journal Article

OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy

David M. Emms¹, Steven L. Kelly¹

06 Aug 2015-Genome Biology

TL;DR: A novel orthogroups inference algorithm called OrthoFinder is provided that solves a previously undetected gene length bias in orthogroup inference, resulting in significant improvements in accuracy and utility.

Abstract: Identifying homology relationships between sequences is fundamental to biological research. Here we provide a novel orthogroup inference algorithm called OrthoFinder that solves a previously undetected gene length bias in orthogroup inference, resulting in significant improvements in accuracy. Using real benchmark datasets we demonstrate that OrthoFinder is more accurate than other orthogroup inference methods by between 8 % and 33 %. Furthermore, we demonstrate the utility of OrthoFinder by providing a complete classification of transcription factor gene families in plants revealing 6.9 million previously unobserved relationships.

2,478 citations

Journal Article

MAST: a flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell RNA sequencing data

Greg Finak¹, Andrew McDavid¹, Masanao Yajima¹, Jingyuan Deng¹, Vivian H. Gersuk², Alex K. Shalek, Chloe K. Slichter¹, Hannah W. Miller¹, M. Juliana McElrath¹, Martin Prlic¹, Peter S. Linsley², Raphael Gottardo¹

Fred Hutchinson Cancer Research Center¹, Benaroya Research Institute²

10 Dec 2015-Genome Biology

TL;DR: This work argues that the cellular detection rate, the fraction of genes expressed in a cell, should be adjusted for as a source of nuisance variation and provides gene set enrichment analysis tailored to single-cell data.

Abstract: Single-cell transcriptomics reveals gene expression heterogeneity but suffers from stochastic dropout and characteristic bimodal expression distributions in which expression is either strongly non-zero or non-detectable. We propose a two-part, generalized linear model for such bimodal data that parameterizes both of these features. We argue that the cellular detection rate, the fraction of genes expressed in a cell, should be adjusted for as a source of nuisance variation. Our model provides gene set enrichment analysis tailored to single-cell data. It provides insights into how networks of co-expressed genes evolve across an experimental treatment. MAST is available at https://github.com/RGLab/MAST .
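
The cellular detection rate described in this abstract, the fraction of genes expressed in a cell, can be illustrated with a minimal sketch. The function name, the detection threshold of zero, and the toy expression values are assumptions made for illustration; this is not MAST's API (MAST itself is an R package).

```python
def cellular_detection_rate(expression, threshold=0.0):
    """Return, for each cell, the fraction of genes detected above `threshold`.

    `expression` is a list of cells; each cell is a list of per-gene
    expression values (hypothetical toy layout, not MAST's data structure).
    """
    return [
        sum(1 for value in cell if value > threshold) / len(cell)
        for cell in expression
    ]


# Two toy cells measured over four genes (made-up values).
cells = [
    [0.0, 3.2, 0.0, 1.1],  # 2 of 4 genes detected
    [2.0, 0.0, 0.0, 0.0],  # 1 of 4 genes detected
]
print(cellular_detection_rate(cells))  # [0.5, 0.25]
```

In a model of the kind the abstract describes, such a per-cell value would enter as a covariate so that differences in detection are not mistaken for biological signal.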

1,770 citations

Journal Article

HiC-Pro: an optimized and flexible pipeline for Hi-C data processing

Nicolas Servant¹, Nelle Varoquaux¹,²,³, Bryan R. Lajoie⁴, Eric Viara, Chong-Jian Chen, Jean-Philippe Vert¹,²,³, Edith Heard¹,³,⁵, Job Dekker⁴, Emmanuel Barillot¹,²,³

Curie Institute¹, PSL Research University², French Institute of Health and Medical Research³, University of Massachusetts Medical School⁴, Centre national de la recherche scientifique⁵

01 Dec 2015-Genome Biology

TL;DR: This work applied HiC-Pro to different Hi-C datasets, demonstrating its ability to easily process large data in a reasonable time and its fast implementation of the iterative correction method.

Abstract: HiC-Pro is an optimized and flexible pipeline for processing Hi-C data from raw reads to normalized contact maps. HiC-Pro maps reads, detects valid ligation products, performs quality controls and generates intra- and inter-chromosomal contact maps. It includes a fast implementation of the iterative correction method and is based on a memory-efficient data format for Hi-C contact maps. In addition, HiC-Pro can use phased genotype data to build allele-specific contact maps. We applied HiC-Pro to different Hi-C datasets, demonstrating its ability to easily process large data in a reasonable time. Source code and documentation are available at http://github.com/nservant/HiC-Pro .

1,444 citations

Journal Article

De novo assembly of bacterial transcriptomes from RNA-seq data

Brian Tjaden¹

Wellesley College¹

13 Jan 2015-Genome Biology

TL;DR: This work presents novel algorithms, specific to bacterial gene structures and transcriptomes, for analysis of bacterial RNA-seq data and de novo transcriptome assembly, implemented in an open source software system called Rockhopper 2.

Abstract: Transcriptome assays are increasingly being performed by high-throughput RNA sequencing (RNA-seq). For organisms whose genomes have not been sequenced and annotated, transcriptomes must be assembled de novo from the RNA-seq data. Here, we present novel algorithms, specific to bacterial gene structures and transcriptomes, for analysis of bacterial RNA-seq data and de novo transcriptome assembly. The algorithms are implemented in an open source software system called Rockhopper 2. We find that Rockhopper 2 outperforms other de novo transcriptome assemblers and offers accurate and efficient analysis of bacterial RNA-seq data. Rockhopper 2 is available at http://cs.wellesley.edu/~btjaden/Rockhopper.

1,437 citations

Journal Article

DNA methylation age of blood predicts all-cause mortality in later life.

Riccardo E. Marioni¹,², Sonia Shah², Allan F. McRae², Brian H. Chen³, Elena Colicino⁴, Sarah E. Harris¹, Jude Gibson⁵, Anjali K. Henders⁶, Paul Redmond¹, Simon R. Cox¹, Alison Pattie¹, Janie Corley¹, Lee Murphy⁵, Nicholas G. Martin⁶, Grant W. Montgomery⁶, Andrew P. Feinberg⁷, M. Daniele Fallin⁷,⁸, Michael L. Multhaup⁷, Andrew E. Jaffe⁸, Roby Joehanes³,⁴, Joel Schwartz⁴, Allan C. Just⁴, Kathryn L. Lunetta³,⁹, Joanne M. Murabito³,⁹, John M. Starr¹, Steve Horvath¹⁰, Andrea A. Baccarelli⁴, Daniel Levy³, Peter M. Visscher¹,², Naomi R. Wray², Ian J. Deary¹

University of Edinburgh¹, University of Queensland², National Institutes of Health³, Harvard University⁴, Western General Hospital⁵, QIMR Berghofer Medical Research Institute⁶, Johns Hopkins University School of Medicine⁷, Johns Hopkins University⁸, Boston University⁹, University of California, Los Angeles¹⁰

30 Jan 2015-Genome Biology

TL;DR: DNA methylation-derived measures of accelerated aging are heritable traits that predict mortality independently of health status, lifestyle factors, and known genetic factors.

Abstract: Background: DNA methylation levels change with age. Recent studies have identified biomarkers of chronological age based on DNA methylation levels. It is not yet known whether DNA methylation age captures aspects of biological age. Results: Here we test whether differences between people’s chronological ages and estimated ages, DNA methylation age, predict all-cause mortality in later life. The difference between DNA methylation age and chronological age (Δage) was calculated in four longitudinal cohorts of older people. Meta-analysis of proportional hazards models from the four cohorts was used to determine the association between Δage and mortality. A 5-year higher Δage is associated with a 21% higher mortality risk, adjusting for age and sex. After further adjustments for childhood IQ, education, social class, hypertension, diabetes, cardiovascular disease, and APOE e4 status, there is a 16% increased mortality risk for those with a 5-year higher Δage. A pedigree-based heritability analysis of Δage was conducted in a separate cohort. The heritability of Δage was 0.43. Conclusions: DNA methylation-derived measures of accelerated aging are heritable traits that predict mortality independently of health status, lifestyle factors, and known genetic factors.
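
As a rough illustration of the quantities in this abstract, the sketch below computes Δage as DNA methylation age minus chronological age and rescales the reported 21% higher mortality risk per 5 years of Δage to a per-year figure. The function names and the sample ages are hypothetical, and the rescaling assumes the log-linear hazard of a proportional hazards model; it is not the authors' analysis code.

```python
def delta_age(dnam_age, chron_age):
    """Difference between estimated (DNA methylation) age and chronological age."""
    return dnam_age - chron_age


def rescale_hazard_ratio(hazard_ratio, from_years, to_years):
    """Rescale a hazard ratio reported per `from_years` of Δage to `to_years`.

    Assumes the log hazard is linear in Δage, as in a proportional
    hazards model, so the ratio scales as a power of the year ratio.
    """
    return hazard_ratio ** (to_years / from_years)


# Made-up individual: methylation age 74.5, chronological age 70.0.
print(delta_age(74.5, 70.0))  # 4.5

# 21% higher risk per 5 years of Δage, expressed per single year.
print(round(rescale_hazard_ratio(1.21, 5, 1), 4))  # 1.0389
```

Under that assumption, a hazard ratio of 1.21 per 5 years corresponds to roughly a 3.9% higher risk per additional year of Δage.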

916 citations

Journal Article

CIRI: an efficient and unbiased algorithm for de novo circular RNA identification.

Yuan Gao¹, Jinfeng Wang¹, Fangqing Zhao¹

Chinese Academy of Sciences¹

13 Jan 2015-Genome Biology

TL;DR: A novel chiastic clipping signal-based algorithm, CIRI, is presented that detects circRNAs from transcriptome data accurately and without bias by employing multiple filtration strategies; applying it to ENCODE RNA-seq data reveals, with experimental validation, the prevalence of intronic/intergenic circRNAs as well as fragments specific to them in the human transcriptome.

Abstract: Recent studies reveal that circular RNAs (circRNAs) are a novel class of abundant, stable and ubiquitous noncoding RNA molecules in animals. Comprehensive detection of circRNAs from high-throughput transcriptome data is an initial and crucial step to study their biogenesis and function. Here, we present a novel chiastic clipping signal-based algorithm, CIRI, to unbiasedly and accurately detect circRNAs from transcriptome data by employing multiple filtration strategies. By applying CIRI to ENCODE RNA-seq data, we for the first time identify and experimentally validate the prevalence of intronic/intergenic circRNAs as well as fragments specific to them in the human transcriptome.

798 citations

Journal Article

Egg cell-specific promoter-controlled CRISPR/Cas9 efficiently generates homozygous mutants for multiple target genes in Arabidopsis in a single generation.

Zhi-Ping Wang¹, Hui Li Xing¹, Li Dong¹, Hai-Yan Zhang¹, Chun Yan Han¹, Xue Chen Wang¹, Qi-Jun Chen¹

China Agricultural University¹

21 Jul 2015-Genome Biology

TL;DR: Comparisons of 12 combinations of eight promoters and two terminators found that the efficiency of the egg cell-specific promoter-controlled CRISPR/Cas9 system depended on the presence of a suitable terminator, and the composite promoter generated by fusing two egg cell-specific promoters resulted in much higher efficiency of mutation in the T1 generation compared with the single promoters.

Abstract: Arabidopsis mutants produced by constitutive overexpression of the CRISPR/Cas9 genome editing system are usually mosaics in the T1 generation. In this study, we used egg cell-specific promoters to drive the expression of Cas9 and obtained non-mosaic T1 mutants for multiple target genes with high efficiency. Comparisons of 12 combinations of eight promoters and two terminators found that the efficiency of the egg cell-specific promoter-controlled CRISPR/Cas9 system depended on the presence of a suitable terminator, and the composite promoter generated by fusing two egg cell-specific promoters resulted in much higher efficiency of mutation in the T1 generation compared with the single promoters.

715 citations

Journal Article

Circlator: automated circularization of genome assemblies using long sequencing reads

Martin Hunt¹, Nishadi De Silva¹, Thomas D. Otto¹, Julian Parkhill¹, Jacqueline A. Keane¹, Simon R. Harris¹

Wellcome Trust Sanger Institute¹

29 Dec 2015-Genome Biology

TL;DR: Circlator, the first tool to automate assembly circularization and produce accurate linear representations of circular sequences, correctly circularized 26 of 27 circularizable sequences.

Abstract: The assembly of DNA sequence data is undergoing a renaissance thanks to emerging technologies capable of producing reads tens of kilobases long. Assembling complete bacterial and small eukaryotic genomes is now possible, but the final step of circularizing sequences remains unsolved. Here we present Circlator, the first tool to automate assembly circularization and produce accurate linear representations of circular sequences. Using Pacific Biosciences and Oxford Nanopore data, Circlator correctly circularized 26 of 27 circularizable sequences, comprising 11 chromosomes and 12 plasmids from bacteria, the apicoplast and mitochondrion of Plasmodium falciparum and a human mitochondrion. Circlator is available at http://sanger-pathogens.github.io/circlator/.

681 citations

Journal Article

Gateways to the FANTOM5 promoter level mammalian expression atlas

Marina Lizio, Jayson Harshbarger, Hisashi Shimoji, Jessica Severin, Takeya Kasukawa, Serkan Sahin, Imad Abugessaisa, Shiro Fukuda, Fumi Hori, Sachi Ishikawa-Kato, Christopher J. Mungall¹, Erik Arner, J Kenneth Baillie², Nicolas Bertin³, Hidemasa Bono, Michiel J. L. de Hoon, Alexander D. Diehl⁴, Emmanuel Dimont⁵, Tom C. Freeman², Kaori Fujieda, Winston Hide⁵,⁶, Rajaram Kaliyaperumal⁷, Toshiaki Katayama, Timo Lassmann⁸, Terrence F. Meehan⁹, Koro Nishikata, Hiromasa Ono, Michael Rehli¹⁰, Albin Sandelin¹¹, Erik Anthony Schultes⁷,¹², Peter A C 't Hoen⁷, Zuotian Tatum⁷, Mark Thompson⁷, Tetsuro Toyoda, Derek W. Wright², Carsten O. Daub, Masayoshi Itoh, Piero Carninci, Yoshihide Hayashizaki, Alistair R. R. Forrest, Hideya Kawaji

Lawrence Berkeley National Laboratory¹, University of Edinburgh², National University of Singapore³, University at Buffalo⁴, Harvard University⁵, University of Sheffield⁶, Leiden University Medical Center⁷, University of Western Australia⁸, European Bioinformatics Institute⁹, University Hospital Regensburg¹⁰, University of Copenhagen¹¹, Leiden University¹²

05 Jan 2015-Genome Biology

TL;DR: The resulting data is assembled into a centralized data resource that contains web-based tools and data-access points for the research community to search and extract data related to samples, genes, promoter activities, transcription factors and enhancers across the FANTOM5 atlas.

Abstract: The FANTOM5 project investigates transcription initiation activities in more than 1,000 human and mouse primary cells, cell lines and tissues using CAGE. Based on manual curation of sample information and development of an ontology for sample classification, we assemble the resulting data into a centralized data resource (http://fantom.gsc.riken.jp/5/). This resource contains web-based tools and data-access points for the research community to search and extract data related to samples, genes, promoter activities, transcription factors and enhancers across the FANTOM5 atlas.

656 citations

Journal Article

Host genetic variation impacts microbiome composition across human body sites

Ran Blekhman¹, Julia K. Goodrich², Katherine H. Huang³, Qi Sun², Robert Bukowski², Jordana T. Bell⁴, Tim D. Spector⁴, Alon Keinan², Ruth E. Ley², Dirk Gevers³,⁵, Andrew G. Clark²

University of Minnesota¹, Cornell University², Broad Institute³, King's College London⁴, Janssen Pharmaceutica⁵

15 Sep 2015-Genome Biology

TL;DR: The role of host genetic variation in shaping the composition of the human microbiome is highlighted, and the results provide a starting point toward understanding the complex interaction between human genetics and the microbiome in the context of human evolution and disease.

Abstract: Background: The composition of bacteria in and on the human body varies widely across human individuals, and has been associated with multiple health conditions. While microbial communities are influenced by environmental factors, some degree of genetic influence of the host on the microbiome is also expected. This study is part of an expanding effort to comprehensively profile the interactions between human genetic variation and the composition of this microbial ecosystem on a genome- and microbiome-wide scale. Results: Here, we jointly analyze the composition of the human microbiome and host genetic variation. By mining the shotgun metagenomic data from the Human Microbiome Project for host DNA reads, we gathered information on host genetic variation for 93 individuals for whom bacterial abundance data are also available. Using this dataset, we identify significant associations between host genetic variation and microbiome composition in 10 of the 15 body sites tested. These associations are driven by host genetic variation in immunity-related pathways, and are especially enriched in host genes that have been previously associated with microbiome-related complex diseases, such as inflammatory bowel disease and obesity-related disorders. Lastly, we show that host genomic regions associated with the microbiome have high levels of genetic differentiation among human populations, possibly indicating host genomic adaptation to environment-specific microbiomes. Conclusions: Our results highlight the role of host genetic variation in shaping the composition of the human microbiome, and provide a starting point toward understanding the complex interaction between human genetics and the microbiome in the context of human evolution and disease.

598 citations

Journal Article

ZIFA: Dimensionality reduction for zero-inflated single-cell gene expression analysis.

Emma Pierson¹, Christopher Yau¹,²

University of Oxford¹, Wellcome Trust Centre for Human Genetics²

02 Nov 2015-Genome Biology

TL;DR: A dimensionality-reduction method is developed, (Z)ero (I)nflated (F)actor (A)nalysis (ZIFA), which explicitly models the dropout characteristics, and it is shown that it improves modeling accuracy on simulated and biological data sets.

Abstract: Single-cell RNA-seq data allows insight into normal cellular function and various disease states through molecular characterization of gene expression on the single cell level. Dimensionality reduction of such high-dimensional data sets is essential for visualization and analysis, but single-cell RNA-seq data are challenging for classical dimensionality-reduction methods because of the prevalence of dropout events, which lead to zero-inflated data. Here, we develop a dimensionality-reduction method, (Z)ero (I)nflated (F)actor (A)nalysis (ZIFA), which explicitly models the dropout characteristics, and show that it improves modeling accuracy on simulated and biological data sets.

Journal Article

Localization and abundance analysis of human lncRNAs at single-cell and single-molecule resolution.

Moran N. Cabili¹,², Margaret C. Dunagin³, Patrick D. McClanahan³, Andrew G. Biaesch³, Olivia Padovan-Merhar³, Aviv Regev²,⁴, John L. Rinn¹,², Arjun Raj³

Harvard University¹, Broad Institute², University of Pennsylvania³, Massachusetts Institute of Technology⁴

29 Jan 2015-Genome Biology

TL;DR: A systematic, high-resolution survey of lncRNA localization reveals aspects of lncRNAs that are similar to mRNAs, such as cell-to-cell variability, but also several distinct properties that may correspond to particular functional roles.

Abstract: Long non-coding RNAs (lncRNAs) have been implicated in diverse biological processes. In contrast to extensive genomic annotation of lncRNA transcripts, far fewer have been characterized for subcellular localization and cell-to-cell variability. Addressing this requires systematic, direct visualization of lncRNAs in single cells at single-molecule resolution. We use single-molecule RNA-FISH to systematically quantify and categorize the subcellular localization patterns of a representative set of 61 lncRNAs in three different cell types. Our survey yields high-resolution quantification and stringent validation of the number and spatial positions of these lncRNA, with an mRNA set for comparison. Using this highly quantitative image-based dataset, we observe a variety of subcellular localization patterns, ranging from bright sub-nuclear foci to almost exclusively cytoplasmic localization. We also find that the low abundance of lncRNAs observed from cell population measurements cannot be explained by high expression in a small subset of ‘jackpot’ cells. Additionally, nuclear lncRNA foci dissolve during mitosis and become widely dispersed, suggesting these lncRNAs are not mitotic bookmarking factors. Moreover, we see that divergently transcribed lncRNAs do not always correlate with their cognate mRNA, nor do they have a characteristic localization pattern. Our systematic, high-resolution survey of lncRNA localization reveals aspects of lncRNAs that are similar to mRNAs, such as cell-to-cell variability, but also several distinct properties. These characteristics may correspond to particular functional roles. Our study also provides a quantitative description of lncRNAs at the single-cell level and a universally applicable framework for future study and validation of lncRNAs.

Journal Article

High-frequency, precise modification of the tomato genome

Tomas Cermak¹, Nicholas J. Baltes¹, Radim Cegan², Yong Zhang³, Daniel F. Voytas¹

University of Minnesota¹, Academy of Sciences of the Czech Republic², University of Electronic Science and Technology of China³

06 Nov 2015-Genome Biology

TL;DR: High-frequency, precise modification of the tomato genome was achieved using geminivirus replicons, suggesting that these vectors can overcome the efficiency barrier that has made gene targeting in plants challenging.

Abstract: The use of homologous recombination to precisely modify plant genomes has been challenging, due to the lack of efficient methods for delivering DNA repair templates to plant cells. Even with the advent of sequence-specific nucleases, which stimulate homologous recombination at predefined genomic sites by creating targeted DNA double-strand breaks, there are only a handful of studies that report precise editing of endogenous genes in crop plants. More efficient methods are needed to modify plant genomes through homologous recombination, ideally without randomly integrating foreign DNA. Here, we use geminivirus replicons to create heritable modifications to the tomato genome at frequencies tenfold higher than traditional methods of DNA delivery (i.e., Agrobacterium). A strong promoter was inserted upstream of a gene controlling anthocyanin biosynthesis, resulting in overexpression and ectopic accumulation of pigments in tomato tissues. More than two-thirds of the insertions were precise, and had no unanticipated sequence modifications. Both TALENs and CRISPR/Cas9 achieved gene targeting at similar efficiencies. Further, the targeted modification was transmitted to progeny in a Mendelian fashion. Even though donor molecules were replicated in the vectors, no evidence was found of persistent extra-chromosomal replicons or off-target integration of T-DNA or replicon sequences. High-frequency, precise modification of the tomato genome was achieved using geminivirus replicons, suggesting that these vectors can overcome the efficiency barrier that has made gene targeting in plants challenging. This work provides a foundation for efficient genome editing of crop genomes without the random integration of foreign DNA.

Journal Article

Statistically based splicing detection reveals neural enrichment and tissue-specific induction of circular RNA during human fetal development

Linda Szabo¹, Robert Morey², Nathan J. Palpant³, Peter L. Wang¹, Nastaran Afari², Chuan Jiang², Mana M. Parast², Charles E. Murry³, Louise C. Laurent², Julia Salzman¹

Stanford University¹, University of California, San Diego², University of Washington³

16 Jun 2015-Genome Biology

TL;DR: A new algorithm is presented that increases the sensitivity and specificity of circular RNA detection by discovering and quantifying circular and linear RNA splicing events at both annotated and un-annotated exon boundaries, including intergenic regions of the genome, with high statistical confidence.

Abstract: Background: The pervasive expression of circular RNA is a recently discovered feature of gene expression in highly diverged eukaryotes, but the functions of most circular RNAs are still unknown. Computational methods to discover and quantify circular RNA are essential. Moreover, discovering biological contexts where circular RNAs are regulated will shed light on potential functional roles they may play. Results: We present a new algorithm that increases the sensitivity and specificity of circular RNA detection by discovering and quantifying circular and linear RNA splicing events at both annotated and un-annotated exon boundaries, including intergenic regions of the genome, with high statistical confidence. Unlike approaches that rely on read count and exon homology to determine confidence in prediction of circular RNA expression, our algorithm uses a statistical approach. Using our algorithm, we unveiled striking induction of general and tissue-specific circular RNAs, including in the heart and lung, during human fetal development. We discover regions of the human fetal brain, such as the frontal cortex, with marked enrichment for genes where circular RNA isoforms are dominant. Conclusions: The vast majority of circular RNA production occurs at major spliceosome splice sites; however, we find the first examples of developmentally induced circular RNAs processed by the minor spliceosome, and an enriched propensity of minor spliceosome donors to splice into circular RNA at un-annotated, rather than annotated, exons. Together, these results suggest a potentially significant role for circular RNA in human development.

Journal Article

The ADAMTS (A Disintegrin and Metalloproteinase with Thrombospondin motifs) family.

Richard Kelwick¹, Ines Desanlis¹, Grant N. Wheeler¹, Dylan R. Edwards¹

University of East Anglia¹

30 May 2015-Genome Biology

TL;DR: Focusing primarily on the aggrecanases and proteoglycanases, a perspective on the evolution of the ADAMTS family, their links with developmental and disease mechanisms, and key questions for the future are provided.

Abstract: The ADAMTS (A Disintegrin and Metalloproteinase with Thrombospondin motifs) enzymes are secreted, multi-domain matrix-associated zinc metalloendopeptidases that have diverse roles in tissue morphogenesis and patho-physiological remodeling, in inflammation and in vascular biology. The human family includes 19 members that can be sub-grouped on the basis of their known substrates, namely the aggrecanases or proteoglycanases (ADAMTS1, 4, 5, 8, 9, 15 and 20), the procollagen N-propeptidases (ADAMTS2, 3 and 14), the cartilage oligomeric matrix protein-cleaving enzymes (ADAMTS7 and 12), the von-Willebrand Factor proteinase (ADAMTS13) and a group of orphan enzymes (ADAMTS6, 10, 16, 17, 18 and 19). Control of the structure and function of the extracellular matrix (ECM) is a central theme of the biology of the ADAMTS, as exemplified by the actions of the procollagen-N-propeptidases in collagen fibril assembly and of the aggrecanases in the cleavage or modification of ECM proteoglycans. Defects in certain family members give rise to inherited genetic disorders, while the aberrant expression or function of others is associated with arthritis, cancer and cardiovascular disease. In particular, ADAMTS4 and 5 have emerged as therapeutic targets in arthritis. Multiple ADAMTSs from different sub-groupings exert either positive or negative effects on tumorigenesis and metastasis, with both metalloproteinase-dependent and -independent actions known to occur. The basic ADAMTS structure comprises a metalloproteinase catalytic domain and a carboxy-terminal ancillary domain, the latter determining substrate specificity and the localization of the protease and its interaction partners; ancillary domains probably also have independent biological functions. Focusing primarily on the aggrecanases and proteoglycanases, this review provides a perspective on the evolution of the ADAMTS family, their links with developmental and disease mechanisms, and key questions for the future.

Journal Article

Induction of targeted, heritable mutations in barley and Brassica oleracea using RNA-guided Cas9 nuclease

Tom Lawrenson¹, Oluwaseyi Shorinola¹, Nicola Stacey¹, Chengdao Li², Lars Østergaard¹, Nicola J. Patron¹, Cristobal Uauy¹, Wendy Harwood¹

Norwich Research Park¹, Murdoch University²

30 Nov 2015-Genome Biology

TL;DR: The use of RNA-guided Cas9 is demonstrated to generate mutations in target genes of both barley and B. oleracea and show stable transmission of these mutations thus establishing the potential for rapid characterisation of gene function in these species.

Abstract: The RNA-guided Cas9 system represents a flexible approach for genome editing in plants. This method can create specific mutations that knock-out or alter target gene function. It provides a valuable tool for plant research and offers opportunities for crop improvement. We investigate the use and target specificity requirements of RNA-guided Cas9 genome editing in barley (Hordeum vulgare) and Brassica oleracea by targeting multicopy genes. In barley, we target two copies of HvPM19 and observe Cas9-induced mutations in the first generation of 23 % and 10 % of the lines, respectively. In B. oleracea, targeting of BolC.GA4.a leads to Cas9-induced mutations in 10 % of first generation plants screened. In addition, a phenotypic screen identifies T0 plants with the expected dwarf phenotype associated with knock-out of the target gene. In both barley and B. oleracea stable Cas9-induced mutations are transmitted to T2 plants independently of the T-DNA construct. We observe off-target activity in both species, despite the presence of at least one mismatch between the single guide RNA and the non-target gene sequences. In barley, a transgene-free plant has concurrent mutations in the target and non-target copies of HvPM19. We demonstrate the use of RNA-guided Cas9 to generate mutations in target genes of both barley and B. oleracea and show stable transmission of these mutations thus establishing the potential for rapid characterisation of gene function in these species. In addition, the off-target effects reported offer both potential difficulties and specific opportunities to target members of multigene families in crops.

Journal Article

Characterization of the immunophenotypes and antigenomes of colorectal cancers reveals distinct tumor escape mechanisms and novel targets for immunotherapy

Mihaela Angelova¹, Pornpimol Charoentong¹, Hubert Hackl¹, Maria Fischer¹, Rene Snajder¹, Anne Krogsdam¹, Maximilian J. Waldner², Gabriela Bindea³,⁴, Bernhard Mlecnik³,⁴, Jérôme Galon³,⁴, Zlatko Trajanoski¹

Innsbruck Medical University¹, University of Erlangen-Nuremberg², French Institute of Health and Medical Research³, Pierre-and-Marie-Curie University⁴

31 Mar 2015-Genome Biology

TL;DR: The immunophenotypes of the tumors and the cancer antigenome remain widely unexplored, and the findings represent a step toward the development of personalized cancer immunotherapies.

Abstract: Background: While large-scale cancer genomic projects are comprehensively characterizing the mutational spectrum of various cancers, so far little attention has been devoted either to defining the antigenicity of these mutations or to characterizing the immune responses they elicit. Here we present a strategy to characterize the immunophenotypes and the antigenome of human colorectal cancer. Results: We apply our strategy to a large colorectal cancer cohort (n = 598) and show that subpopulations of tumor-infiltrating lymphocytes are associated with distinct molecular phenotypes. The characterization of the antigenome shows that a large number of cancer-germline antigens are expressed in all patients. In contrast, neo-antigens are rarely shared between patients, indicating that cancer vaccination requires an individualized strategy. Analysis of the genetic basis of the tumors reveals distinct tumor escape mechanisms for the patient subgroups. Hypermutated tumors are depleted of immunosuppressive cells and show upregulation of immunoinhibitory molecules. Non-hypermutated tumors are enriched with immunosuppressive cells, and the expression of immunoinhibitors and MHC molecules is downregulated. Reconstruction of the interaction network of tumor-infiltrating lymphocytes and immunomodulatory molecules followed by a validation with 11 independent cohorts (n = 1,945) identifies BCMA as a novel druggable target. Finally, linear regression modeling identifies major determinants of tumor immunogenicity, which include well-characterized modulators as well as a novel candidate, CCR8, which is then tested in an orthologous immunodeficient mouse model. Conclusions: The immunophenotypes of the tumors and the cancer antigenome remain widely unexplored, and our findings represent a step toward the development of personalized cancer immunotherapies.

Journal Article•DOI•

PhyloWGS: Reconstructing subclonal composition and evolution from whole-genome sequencing of tumors

Amit G. Deshwar¹, Shankar Vembu¹, Christina K. Yung², Gun Ho Jang², Lincoln Stein², Lincoln Stein¹, Quaid Morris - Show less +3 more•Institutions (2)

University of Toronto¹, Ontario Institute for Cancer Research²

13 Feb 2015-Genome Biology

TL;DR: A principled phylogenic correction for VAFs in loci affected by copy number alterations is introduced and it is shown that this correction greatly improves subclonal reconstruction compared to existing methods.

Abstract: Tumors often contain multiple subpopulations of cancerous cells defined by distinct somatic mutations. We describe a new method, PhyloWGS, which can be applied to whole-genome sequencing data from one or more tumor samples to reconstruct complete genotypes of these subpopulations based on variant allele frequencies (VAFs) of point mutations and population frequencies of structural variations. We introduce a principled phylogenic correction for VAFs in loci affected by copy number alterations and we show that this correction greatly improves subclonal reconstruction compared to existing methods. PhyloWGS is free, open-source software, available at https://github.com/morrislab/phylowgs.

Journal Article•DOI•

Spatio-temporal regulation of circular RNA expression during porcine embryonic brain development

Morten T. Venø¹, Thomas B. Hansen¹, Susanne T. Venø¹, Bettina Hjelm Clausen², Manuela Grebing², Bente Finsen², Ida Elisabeth Holm, Jørgen Kjems¹ - Show less +4 more•Institutions (2)

Aarhus University¹, University of Southern Denmark²

05 Nov 2015-Genome Biology

TL;DR: It is demonstrated that circRNAs are highly abundant and dynamically expressed in a spatio-temporal manner in porcine fetal brain, suggesting important functions during mammalian brain development.

Abstract: Recently, thousands of circular RNAs (circRNAs) have been discovered in various tissues and cell types from human, mouse, fruit fly and nematodes. However, expression of circRNAs across mammalian brain development has never been examined. Here we profile the expression of circRNA in five brain tissues at up to six time-points during fetal porcine development, constituting the first report of circRNA in the brain development of a large animal. An unbiased analysis reveals a highly complex regulation pattern of thousands of circular RNAs, with a distinct spatio-temporal expression profile. The amount and complexity of circRNA expression was most pronounced in cortex at day 60 of gestation. At this time-point we find 4634 unique circRNAs expressed from 2195 genes out of a total of 13,854 expressed genes. Approximately 20 % of the porcine splice sites involved in circRNA production are functionally conserved between mouse and human. Furthermore, we observe that “hot-spot” genes produce multiple circRNA isoforms, which are often differentially expressed across porcine brain development. A global comparison of porcine circRNAs reveals that introns flanking circularized exons are longer than average and more frequently contain proximal complementary SINEs, which potentially can facilitate base pairing between the flanking introns. Finally, we report the first use of RNase R treatment in combination with in situ hybridization to show dynamic subcellular localization of circRNA during development. These data demonstrate that circRNAs are highly abundant and dynamically expressed in a spatio-temporal manner in porcine fetal brain, suggesting important functions during mammalian brain development.

Journal Article•DOI•

The Ensembl Regulatory Build

Daniel R. Zerbino¹, Steven P. Wilder¹, Nathan Johnson¹, Thomas Juettemann¹, Paul Flicek¹ - Show less +1 more•Institutions (1)

European Bioinformatics Institute¹

24 Mar 2015-Genome Biology

TL;DR: This work collected public data on epigenetic marks and transcription factor binding in human cell types and used it to construct an intuitive summary of regulatory regions in the human genome, which is then verified against independent assays for sensitivity.

Abstract: Most genomic variants associated with phenotypic traits or disease do not fall within gene coding regions, but in regulatory regions, rendering their interpretation difficult. We collected public data on epigenetic marks and transcription factor binding in human cell types and used it to construct an intuitive summary of regulatory regions in the human genome. We verified it against independent assays for sensitivity. The Ensembl Regulatory Build will be progressively enriched when more data is made available. It is freely available on the Ensembl browser, from the Ensembl Regulation MySQL database server and in a dedicated track hub.

Journal Article•DOI•

CRISPR/Cas9-mediated viral interference in plants

Zahir Ali¹, Aala A. Abulfaraj¹, Ali M. Idris¹, Shawkat Ali¹, Manal Tashkandi¹, Magdy M. Mahfouz¹ - Show less +2 more•Institutions (1)

King Abdullah University of Science and Technology¹

11 Nov 2015-Genome Biology

TL;DR: In this paper, the CRISPR/Cas9 system was used in plants to confer molecular immunity against DNA viruses, including the tomato yellow leaf curl virus (TYLCV).

Abstract: The CRISPR/Cas9 system provides bacteria and archaea with molecular immunity against invading phages and conjugative plasmids. Recently, CRISPR/Cas9 has been used for targeted genome editing in diverse eukaryotic species. In this study, we investigate whether the CRISPR/Cas9 system could be used in plants to confer molecular immunity against DNA viruses. We deliver sgRNAs specific for coding and non-coding sequences of tomato yellow leaf curl virus (TYLCV) into Nicotiana benthamiana plants stably overexpressing the Cas9 endonuclease, and subsequently challenge these plants with TYLCV. Our data demonstrate that the CRISPR/Cas9 system targeted TYLCV for degradation and introduced mutations at the target sequences. All tested sgRNAs exhibit interference activity, but those targeting the stem-loop sequence within the TYLCV origin of replication in the intergenic region (IR) are the most effective. N. benthamiana plants expressing CRISPR/Cas9 exhibit delayed or reduced accumulation of viral DNA, abolishing or significantly attenuating symptoms of infection. Moreover, this system could simultaneously target multiple DNA viruses. These data establish the efficacy of the CRISPR/Cas9 system for viral interference in plants, thereby extending the utility of this technology and opening the possibility of producing plants resistant to multiple viral infections.

Journal Article•DOI•

The genomes of two key bumblebee species with primitive eusocial organization

Ben M. Sadd¹, Ben M. Sadd², Seth M. Barribeau¹, Seth M. Barribeau³ +151 more•Institutions (51)

24 Apr 2015-Genome Biology

TL;DR: Overall, gene repertoires suggest that the route to advanced eusociality in bees was mediated by many small changes in many genes and processes, and not by notable expansion or depauperation.

Abstract: The shift from solitary to social behavior is one of the major evolutionary transitions. Primitively eusocial bumblebees are uniquely placed to illuminate the evolution of highly eusocial insect societies. Bumblebees are also invaluable natural and agricultural pollinators, and there is widespread concern over recent population declines in some species. High-quality genomic data will inform key aspects of bumblebee biology, including susceptibility to implicated population viability threats. We report the high quality draft genome sequences of Bombus terrestris and Bombus impatiens, two ecologically dominant bumblebees and widely utilized study species. Comparing these new genomes to those of the highly eusocial honeybee Apis mellifera and other Hymenoptera, we identify deeply conserved similarities, as well as novelties key to the biology of these organisms. Some honeybee genome features thought to underpin advanced eusociality are also present in bumblebees, indicating an earlier evolution in the bee lineage. Xenobiotic detoxification and immune genes are similarly depauperate in bumblebees and honeybees, and multiple categories of genes linked to social organization, including development and behavior, show high conservation. Key differences identified include a bias in bumblebee chemoreception towards gustation from olfaction, and striking differences in microRNAs, potentially responsible for gene regulation underlying social and other traits. These two bumblebee genomes provide a foundation for post-genomic research on these key pollinators and insect societies. Overall, gene repertoires suggest that the route to advanced eusociality in bees was mediated by many small changes in many genes and processes, and not by notable expansion or depauperation.

Journal Article•DOI•

ALLMAPS: robust scaffold ordering based on multiple maps

Haibao Tang¹, Haibao Tang², Xingtan Zhang³, Chenyong Miao¹, Jisen Zhang¹, Ray Ming¹, James C. Schnable⁴, Patrick S. Schnable⁵, Eric Lyons², Jianguo Lu - Show less +6 more•Institutions (5)

Fujian Agriculture and Forestry University¹, University of Arizona², J. Craig Venter Institute³, University of Nebraska–Lincoln⁴, Iowa State University⁵

13 Jan 2015-Genome Biology

TL;DR: ALLMAPS is a method capable of computing a scaffold ordering that maximizes colinearity across a collection of maps, which is robust against common mapping errors, and generates sequences that are maximally concordant with the input maps.

Abstract: The ordering and orientation of genomic scaffolds to reconstruct chromosomes is an essential step during de novo genome assembly. Because this process utilizes various mapping techniques that each provides an independent line of evidence, a combination of multiple maps can improve the accuracy of the resulting chromosomal assemblies. We present ALLMAPS, a method capable of computing a scaffold ordering that maximizes colinearity across a collection of maps. ALLMAPS is robust against common mapping errors, and generates sequences that are maximally concordant with the input maps. ALLMAPS is a useful tool in building high-quality genome assemblies. ALLMAPS is available at: https://github.com/tanghaibao/jcvi/wiki/ALLMAPS.

Journal Article•DOI•

Single-cell RNA-seq transcriptome analysis of linear and circular RNAs in mouse preimplantation embryos

Xiaoying Fan¹, Xiannian Zhang¹, Xinglong Wu¹, Hongshan Guo¹, Yuqiong Hu¹, Fuchou Tang, Yanyi Huang¹ - Show less +3 more•Institutions (1)

Peking University¹

23 Jul 2015-Genome Biology

TL;DR: A single-cell universal poly(A)-independent RNA sequencing (SUPeR-seq) method is developed to sequence both polyadenylated and non-polyadenylated RNAs from individual cells, work that the authors describe as key to deciphering regulation mechanisms of circRNAs during mammalian early embryonic development.

Abstract: Circular RNAs (circRNAs) are a new class of non-polyadenylated non-coding RNAs that may play important roles in many biological processes. Here we develop a single-cell universal poly(A)-independent RNA sequencing (SUPeR-seq) method to sequence both polyadenylated and non-polyadenylated RNAs from individual cells. This method exhibits robust sensitivity, precision and accuracy. We discover 2891 circRNAs and 913 novel linear transcripts in mouse preimplantation embryos and further analyze the abundance of circRNAs along development, the function of enriched genes, and sequence features of circRNAs. Our work is key to deciphering regulation mechanisms of circRNAs during mammalian early embryonic development.

Journal Article•DOI•

Lifetime stress accelerates epigenetic aging in an urban, African American cohort: Relevance of glucocorticoid signaling

Anthony S. Zannas¹, Anthony S. Zannas², Janinez Arloth¹, Tania Carrillo-Roa¹, Stella Iurato¹, Simone Röh¹, Kerry J. Ressler³, Kerry J. Ressler⁴, Charles B. Nemeroff⁵, Alicia K. Smith⁴, Bekh Bradley⁴, Bekh Bradley⁶, Christine Heim⁷, Christine Heim⁸, Andreas Menke¹, Andreas Menke⁹, Jennifer Lange¹, Tanja Brückl¹, Marcus Ising¹, Naomi R. Wray¹⁰, Angelika Erhardt¹, Elisabeth B. Binder⁴, Elisabeth B. Binder¹, Divya Mehta¹⁰ - Show less +20 more•Institutions (10)

Max Planck Society¹, Duke University², Howard Hughes Medical Institute³, Emory University⁴, University of Miami⁵, Veterans Health Administration⁶, Pennsylvania State University⁷, Charité⁸, University of Würzburg⁹, University of Queensland¹⁰

17 Dec 2015-Genome Biology

TL;DR: Cumulative lifetime stress may accelerate epigenetic aging, an effect that could be driven by glucocorticoid-induced epigenetic changes, which contribute to the understanding of mechanisms linking chronic stress with accelerated aging and heightened disease risk.

Abstract: Background: Chronic psychological stress is associated with accelerated aging and increased risk for aging-related diseases, but the underlying molecular mechanisms are unclear. Results: We examined the effect of lifetime stressors on a DNA methylation-based age predictor, the epigenetic clock. After controlling for blood cell-type composition and lifestyle parameters, cumulative lifetime stress, but not childhood maltreatment or current stress alone, predicted accelerated epigenetic aging in an urban, African American cohort (n = 392). This effect was primarily driven by personal life stressors, was more pronounced with advancing age, and was blunted in individuals with higher childhood abuse exposure. Hypothesizing that these epigenetic effects could be mediated by glucocorticoid signaling, we found that a high number (n = 85) of epigenetic clock CpG sites were located within glucocorticoid response elements. We further examined the functional effects of glucocorticoids on epigenetic clock CpGs in an independent sample with genome-wide DNA methylation (n = 124) and gene expression data (n = 297) before and after exposure to the glucocorticoid receptor agonist dexamethasone. Dexamethasone induced dynamic changes in methylation in 31.2 % (110/353) of these CpGs and transcription in 81.7 % (139/170) of genes neighboring epigenetic clock CpGs. Disease enrichment analysis of these dexamethasone-regulated genes showed enriched association for aging-related diseases, including coronary artery disease, arteriosclerosis, and leukemias. Conclusions: Cumulative lifetime stress may accelerate epigenetic aging, an effect that could be driven by glucocorticoid-induced epigenetic changes. These findings contribute to our understanding of mechanisms linking chronic stress with accelerated aging and heightened disease risk.

Journal Article•DOI•

Tools and best practices for data processing in allelic expression analysis

Stephane E. Castel¹, Ami Levy-Moonshine², Pejman Mohammadi¹, Eric Banks², Tuuli Lappalainen¹ - Show less +1 more•Institutions (2)

Columbia University¹, Broad Institute²

17 Sep 2015-Genome Biology

TL;DR: This work analyzes the properties of allelic expression read count data and technical sources of error, such as low-quality or double-counted RNA-seq reads, genotyping errors, allelic mapping bias, and technical covariates due to sample preparation and sequencing, and variation in total read depth.

Abstract: Allelic expression analysis has become important for integrating genome and transcriptome data to characterize various biological phenomena such as cis-regulatory variation and nonsense-mediated decay. We analyze the properties of allelic expression read count data and technical sources of error, such as low-quality or double-counted RNA-seq reads, genotyping errors, allelic mapping bias, and technical covariates due to sample preparation and sequencing, and variation in total read depth. We provide guidelines for correcting such errors, show that our quality control measures improve the detection of relevant allelic expression, and introduce tools for the high-throughput production of allelic expression data from RNA-sequencing data.

Journal Article•DOI•

Comparison of RNA-seq and microarray-based models for clinical endpoint prediction

Wenqian Zhang, Ying Yu¹, Falk Hertwig², Falk Hertwig³, Jean Thierry-Mieg, Wenwei Zhang, Danielle Thierry-Mieg, Jian Wang⁴, Cesare Furlanello⁵, Viswanath Devanarayan⁶, Jie Cheng⁷, Youping Deng⁸, Barbara Hero³, Huixiao Hong⁹, Meiwen Jia¹, Li Li¹⁰, Simon Lin¹¹, Yuri Nikolsky¹², André Oberthuer³, Tao Qing¹, Zhenqiang Su⁹, Ruth Volland³, Charles Wang¹³, May D. Wang¹⁴, Junmei Ai⁸, Davide Albanese, Shahab Asgharzadeh¹⁵, Smadar Avigad, Wenjun Bao¹⁰, Marina Bessarabova¹², Murray H. Brilliant¹⁶, Benedikt Brors¹⁷, Marco Chierici⁵, Tzu-Ming Chu¹⁰, Jibin Zhang, Richard Grundy¹⁸, Min Max He¹¹, Scott J. Hebbring¹⁶, Howard L. Kaufman⁸, Samir Lababidi¹⁹, Lee Lancashire¹², Yan Li⁸, Xin X. Lu⁶, Heng Luo⁹, Heng Luo²⁰, Xiwen Ma⁴, Baitang Ning⁹, Rosa Noguera²¹, Martin Peifer², John H. Phan¹⁴, Frederik Roels², Frederik Roels³, Carolina Rosswog³, Susan Shao¹⁰, Jie Shen⁹, Jessica Theissen³, Gian Paolo Tonini²², Jo Vandesompele²³, Po-Yen Wu²⁴, Wenzhong Xiao²⁵, Joshua Xu⁹, Weihong Xu²⁶, Jiekun Xuan⁹, Yong Yang⁴, Zhan Ye¹¹, Zirui Dong, Ke Zhang²⁷, Ye Yin, Chen Zhao¹, Yuanting Zheng¹, Russell D. Wolfinger¹⁰, Tieliu Shi²⁸, Linda H. Malkas²⁹, Frank Berthold², Frank Berthold³, Jun Wang, Weida Tong⁹, Leming Shi⁹, Leming Shi¹, Zhiyu Peng³⁰, Matthias Fischer², Matthias Fischer³ - Show less +78 more•Institutions (30)

Fudan University¹, University of Cologne², Boston Children's Hospital³, Eli Lilly and Company⁴, fondazione bruno kessler⁵, AbbVie⁶, GlaxoSmithKline⁷, Rush University Medical Center⁸, Food and Drug Administration⁹, SAS Institute¹⁰, Marshfield Clinic¹¹, Thomson Reuters¹², Loma Linda University¹³, Emory University¹⁴, Children's Hospital Los Angeles¹⁵, Foundation Center¹⁶, German Cancer Research Center¹⁷, University of Nottingham¹⁸, Center for Biologics Evaluation and Research¹⁹, University of Arkansas at Little Rock²⁰, University of Valencia²¹, University of Padua²², Ghent University²³, Georgia Institute of Technology²⁴, Harvard University²⁵, Stanford University²⁶, University of North Dakota²⁷, East China Normal University²⁸, Beckman Research Institute²⁹, Guangzhou Higher Education Mega Center³⁰

25 Jun 2015-Genome Biology

TL;DR: It is demonstrated that RNA-seq outperforms microarrays in determining the transcriptomic characteristics of cancer, while RNA-seq and microarray-based models perform similarly in clinical endpoint prediction.

Abstract: Gene expression profiling is being widely applied in cancer research to identify biomarkers for clinical endpoint prediction. Since RNA-seq provides a powerful tool for transcriptome-based applications beyond the limitations of microarrays, we sought to systematically evaluate the performance of RNA-seq-based and microarray-based classifiers in this MAQC-III/SEQC study for clinical endpoint prediction using neuroblastoma as a model. We generate gene expression profiles from 498 primary neuroblastomas using both RNA-seq and 44 k microarrays. Characterization of the neuroblastoma transcriptome by RNA-seq reveals that more than 48,000 genes and 200,000 transcripts are being expressed in this malignancy. We also find that RNA-seq provides much more detailed information on specific transcript expression patterns in clinico-genetic neuroblastoma subgroups than microarrays. To systematically compare the power of RNA-seq and microarray-based models in predicting clinical endpoints, we divide the cohort randomly into training and validation sets and develop 360 predictive models on six clinical endpoints of varying predictability. Evaluation of factors potentially affecting model performances reveals that prediction accuracies are most strongly influenced by the nature of the clinical endpoint, whereas technological platforms (RNA-seq vs. microarrays), RNA-seq data analysis pipelines, and feature levels (gene vs. transcript vs. exon-junction level) do not significantly affect performances of the models. We demonstrate that RNA-seq outperforms microarrays in determining the transcriptomic characteristics of cancer, while RNA-seq and microarray-based models perform similarly in clinical endpoint prediction. Our findings may be valuable to guide future studies on the development of gene expression-based predictive models and their implementation in clinical practice.

Journal Article•DOI•

Quality control, modeling, and visualization of CRISPR screens with MAGeCK-VISPR

Wei Li¹, Johannes Köster¹, Han Xu², Chen-Hao Chen¹, Tengfei Xiao¹, Jun Liu¹, Myles Brown¹, Myles Brown³, X. Shirley Liu⁴, X. Shirley Liu¹ - Show less +6 more•Institutions (4)

Harvard University¹, Broad Institute², Brigham and Women's Hospital³, Tongji University⁴

16 Dec 2015-Genome Biology

TL;DR: MAGeCK-VISPR defines a set of QC measures to assess the quality of an experiment, and includes a maximum-likelihood algorithm that calls essential genes simultaneously under multiple conditions and uses expectation-maximization to iteratively estimate sgRNA knockout efficiency and gene essentiality.

Abstract: High-throughput CRISPR screens have shown great promise in functional genomics. We present MAGeCK-VISPR, a comprehensive quality control (QC), analysis, and visualization workflow for CRISPR screens. MAGeCK-VISPR defines a set of QC measures to assess the quality of an experiment, and includes a maximum-likelihood algorithm to call essential genes simultaneously under multiple conditions. The algorithm uses a generalized linear model to deconvolute different effects, and employs expectation-maximization to iteratively estimate sgRNA knockout efficiency and gene essentiality. MAGeCK-VISPR also includes VISPR, a framework for the interactive visualization and exploration of QC and analysis results. MAGeCK-VISPR is freely available at http://bitbucket.org/liulab/mageck-vispr .

Journal Article•DOI•

Optimizing sgRNA structure to improve CRISPR-Cas9 knockout efficiency

Ying Dang¹, Gengxiang Jia¹, Jennie Choi¹, Hongming Ma¹, Edgar Anaya¹, Chunting Ye¹, Premlata Shankar¹, Haoquan Wu¹ - Show less +4 more•Institutions (1)

Texas Tech University Health Sciences Center at El Paso¹

15 Dec 2015-Genome Biology

TL;DR: A systematic investigation of sgRNA structure finds that extending the duplex by approximately 5 bp combined with mutating the continuous sequence of thymines at position 4 to cytosine or guanine significantly increases gene knockout efficiency in CRISPR-Cas9-based genome editing experiments.

Abstract: Single-guide RNA (sgRNA) is one of the two key components of the clustered regularly interspaced short palindromic repeats (CRISPR)-Cas9 genome-editing system. The current commonly used sgRNA structure has a shortened duplex compared with the native bacterial CRISPR RNA (crRNA)–transactivating crRNA (tracrRNA) duplex and contains a continuous sequence of thymines, which is the pause signal for RNA polymerase III and thus could potentially reduce transcription efficiency. Here, we systematically investigate the effect of these two elements on knockout efficiency and show that modifying the sgRNA structure by extending the duplex length and mutating the fourth thymine of the continuous sequence of thymines to cytosine or guanine significantly, and sometimes dramatically, improves knockout efficiency in cells. In addition, the optimized sgRNA structure also significantly increases the efficiency of more challenging genome-editing procedures, such as gene deletion, which is important for inducing a loss of function in non-coding genes. By a systematic investigation of sgRNA structure we find that extending the duplex by approximately 5 bp combined with mutating the continuous sequence of thymines at position 4 to cytosine or guanine significantly increases gene knockout efficiency in CRISPR-Cas9-based genome editing experiments.

Journal Article•DOI•

Fate by RNA methylation: m6A steers stem cell pluripotency

Boxuan Simen Zhao¹, Boxuan Simen Zhao², Chuan He¹, Chuan He²•Institutions (2)

University of Chicago¹, Howard Hughes Medical Institute²

22 Feb 2015-Genome Biology

TL;DR: The N6-methyladenosine (m6A) modification of mRNA has a crucial function in regulating pluripotency in murine stem cells: it facilitates resolution of naive pluripotency towards differentiation.

Abstract: The N6-methyladenosine (m6A) modification of mRNA has a crucial function in regulating pluripotency in murine stem cells: it facilitates resolution of naive pluripotency towards differentiation.
