
Showing papers in "Genome Biology in 2018"


Journal Article
TL;DR: This work presents Scanpy, a scalable toolkit for analyzing single-cell gene expression data that includes methods for preprocessing, visualization, clustering, pseudotime and trajectory inference, differential expression testing, and simulation of gene regulatory networks, and AnnData, a generic class for handling annotated data matrices.
Abstract: Scanpy is a scalable toolkit for analyzing single-cell gene expression data. It includes methods for preprocessing, visualization, clustering, pseudotime and trajectory inference, differential expression testing, and simulation of gene regulatory networks. Its Python-based implementation efficiently deals with data sets of more than one million cells ( https://github.com/theislab/Scanpy ). Along with Scanpy, we present AnnData, a generic class for handling annotated data matrices ( https://github.com/theislab/anndata ).

3,343 citations
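Scanpy's preprocessing methods include per-cell library-size normalization followed by a log transform (in current Scanpy releases, `sc.pp.normalize_total` and `sc.pp.log1p`). Below is a minimal pure-Python sketch of that arithmetic. It assumes a plain list-of-lists count matrix with cells as rows and uses a hypothetical helper name, `normalize_log1p`. It is not Scanpy's implementation, which operates on AnnData objects and sparse matrices.

```python
import math


def normalize_log1p(counts, target_sum=1e4):
    """Scale each cell (row) to target_sum total counts, then apply log(1 + x).

    counts: list of rows, one per cell, each a list of per-gene counts.
    Cells with zero total counts are returned as all zeros.
    """
    result = []
    for cell in counts:
        total = sum(cell)
        if total == 0:
            result.append([0.0] * len(cell))
            continue
        scale = target_sum / total  # library-size scaling factor for this cell
        result.append([math.log1p(value * scale) for value in cell])
    return result
```

For example, `normalize_log1p([[1, 3]], target_sum=4)` leaves the counts unscaled (the cell already sums to 4) and returns their `log1p` values.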


Journal Article
TL;DR: The data indicate that CRISPR/Cas13a can be used for engineering interference against RNA viruses, providing a potential novel mechanism for RNA-guided immunity against RNA viruses and for other RNA manipulations in plants.
Abstract: CRISPR/Cas systems confer immunity against invading nucleic acids and phages in bacteria and archaea. CRISPR/Cas13a (known previously as C2c2) is a class 2 type VI-A ribonuclease capable of targeting and cleaving single-stranded RNA (ssRNA) molecules of the phage genome. Here, we employ CRISPR/Cas13a to engineer interference with an RNA virus, Turnip Mosaic Virus (TuMV), in plants. CRISPR/Cas13a produces interference against green fluorescent protein (GFP)-expressing TuMV in transient assays and stable overexpression lines of Nicotiana benthamiana. CRISPR RNAs (crRNAs) targeting the HC-Pro and GFP sequences exhibit better interference than those targeting other regions such as coat protein (CP) sequence. Cas13a can also process pre-crRNAs into functional crRNAs. Our data indicate that CRISPR/Cas13a can be used for engineering interference against RNA viruses, providing a potential novel mechanism for RNA-guided immunity against RNA viruses and for other RNA manipulations in plants.

771 citations


Journal Article
TL;DR: The fundamental properties of TEs and their complex interactions with their cellular environment are introduced, which are crucial to understanding their impact and manifold consequences for organismal biology.
Abstract: Transposable elements (TEs) are major components of eukaryotic genomes. However, the extent of their impact on genome evolution, function, and disease remains a matter of intense interrogation. The rise of genomics and large-scale functional assays has shed new light on the multi-faceted activities of TEs and suggests that they should no longer be marginalized. Here, we introduce the fundamental properties of TEs and their complex interactions with their cellular environment, which are crucial to understanding their impact and manifold consequences for organismal biology. While we draw examples primarily from mammalian systems, the core concepts outlined here are relevant to a broad range of organisms.

691 citations


Journal Article
TL;DR: Cell Hashing is introduced, where oligo-tagged antibodies against ubiquitously expressed surface proteins uniquely label cells from distinct samples, which can be subsequently pooled and can robustly identify cross-sample multiplets.
Abstract: Despite rapid developments in single cell sequencing, sample-specific batch effects, detection of cell multiplets, and experimental costs remain outstanding challenges. Here, we introduce Cell Hashing, where oligo-tagged antibodies against ubiquitously expressed surface proteins uniquely label cells from distinct samples, which can be subsequently pooled. By sequencing these tags alongside the cellular transcriptome, we can assign each cell to its original sample, robustly identify cross-sample multiplets, and “super-load” commercial droplet-based systems for significant cost reduction. We validate our approach using a complementary genetic approach and demonstrate how hashing can generalize the benefits of single cell multiplexing to diverse samples and experimental designs.

608 citations
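Assigning each cell to its sample of origin comes down to deciding, from the hashtag-oligo (HTO) counts, whether one tag dominates, two tags are both strong (a cross-sample multiplet), or no tag is clearly present. The following is a minimal sketch using fixed fraction thresholds. The threshold values and the function name are hypothetical, and this simple rule is not the classification procedure used in the paper.

```python
def classify_cell(hto_counts, min_total=10, min_top_fraction=0.5, min_second_fraction=0.2):
    """Assign one cell to a sample from its hashtag-oligo (HTO) counts.

    hto_counts: dict mapping hashtag name -> UMI count for this cell.
    Returns the hashtag name, "Multiplet", or "Negative".
    """
    total = sum(hto_counts.values())
    if total < min_total:
        return "Negative"  # too few hashtag reads to make a call
    ranked = sorted(hto_counts.items(), key=lambda kv: kv[1], reverse=True)
    top_tag, top_count = ranked[0]
    second_count = ranked[1][1] if len(ranked) > 1 else 0
    if second_count / total >= min_second_fraction:
        return "Multiplet"  # a second sample contributes substantially
    if top_count / total >= min_top_fraction:
        return top_tag
    return "Negative"
```

Because multiplets are recognized from the tags rather than from expression alone, more cells can be loaded per run and the cross-sample doublets discarded afterwards.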


Journal Article
TL;DR: HiGlass is presented, an open source visualization tool built on web technologies that provides a rich interface for rapid, multiplex, and multiscale navigation of 2D genomic maps alongside 1D genomic tracks, allowing users to combine various data types, synchronize multiple visualization modalities, and share fully customizable views with others.
Abstract: We present HiGlass, an open source visualization tool built on web technologies that provides a rich interface for rapid, multiplex, and multiscale navigation of 2D genomic maps alongside 1D genomic tracks, allowing users to combine various data types, synchronize multiple visualization modalities, and share fully customizable views with others. We demonstrate its utility in exploring different experimental conditions, comparing the results of analyses, and creating interactive snapshots to share with collaborators and the broader public. HiGlass is accessible online at http://higlass.io and is also available as a containerized application that can be run on any platform.

569 citations
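Multiscale navigation of 2D genomic maps is commonly supported by a precomputed resolution pyramid, where each zoom level sums blocks of bins from the finer level below it. The sketch below illustrates that general idea on a small square contact matrix. The function names and the factor of 2 are assumptions, and the sketch does not reproduce how HiGlass itself stores or serves data.

```python
def coarsen(matrix, factor=2):
    """Sum factor x factor blocks of a square contact matrix into single bins."""
    n = len(matrix)
    size = (n + factor - 1) // factor  # ceil(n / factor)
    out = [[0] * size for _ in range(size)]
    for i in range(n):
        for j in range(n):
            out[i // factor][j // factor] += matrix[i][j]
    return out


def build_pyramid(matrix, factor=2):
    """Return zoom levels from finest (the input) to coarsest (a single bin)."""
    if factor < 2:
        raise ValueError("factor must be at least 2")
    levels = [matrix]
    while len(levels[-1]) > 1:
        levels.append(coarsen(levels[-1], factor))
    return levels
```

Summing (rather than averaging) keeps the total contact count identical at every zoom level, so a viewer can switch levels without rescaling.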


Journal Article
TL;DR: This work develops Single Cell ProtEomics by Mass Spectrometry (SCoPE-MS), validates its ability to identify distinct human cancer cell types based on their proteomes, and uses it to quantify over a thousand proteins in differentiating mouse embryonic stem cells.
Abstract: Some exciting biological questions require quantifying thousands of proteins in single cells. To achieve this goal, we develop Single Cell ProtEomics by Mass Spectrometry (SCoPE-MS) and validate its ability to identify distinct human cancer cell types based on their proteomes. We use SCoPE-MS to quantify over a thousand proteins in differentiating mouse embryonic stem cells. The single-cell proteomes enable us to deconstruct cell populations and infer protein abundance relationships. Comparison between single-cell proteomes and transcriptomes indicates coordinated mRNA and protein covariation, yet many genes exhibit functionally concerted and distinct regulatory patterns at the mRNA and the protein level.

508 citations


Journal Article
TL;DR: Computational approaches that determine the nanopore sequencing error rate are reviewed, and strategies are outlined for translating raw sequencing data into base calls, detecting base modifications, and obtaining consensus sequences.
Abstract: Nanopore sequencing is a rapidly maturing technology delivering long reads in real time on a portable instrument at low cost. Not surprisingly, the community has rapidly taken up this new way of sequencing and has used it successfully for a variety of research applications. A major limitation of nanopore sequencing is its high error rate, which despite recent improvements to the nanopore chemistry and computational tools still ranges between 5% and 15%. Here, we review computational approaches that determine the nanopore sequencing error rate. Furthermore, we outline strategies for translating raw sequencing data into base calls, for detecting base modifications, and for obtaining consensus sequences.

451 citations
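An error rate such as the 5% to 15% quoted above is measured by comparing reads with a reference. One common definition, sketched below, divides mismatches plus inserted and deleted bases by the total number of aligned columns. The sketch assumes the reads were aligned with an extended CIGAR string (`=` for matches, `X` for mismatches); the function name is hypothetical, and published definitions vary (for example, in how they treat clipped bases).

```python
import re

# One CIGAR operation: a run length followed by an operator character.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def error_rate_from_cigar(cigar):
    """Return (mismatches + insertions + deletions) / aligned columns.

    Clips (S/H), skips (N) and padding (P) are ignored.
    Plain 'M' is rejected because it does not separate matches from mismatches.
    """
    totals = dict.fromkeys("MIDNSHP=X", 0)
    for length, op in CIGAR_OP.findall(cigar):
        totals[op] += int(length)
    if totals["M"]:
        raise ValueError("use an extended CIGAR with '=' and 'X' instead of 'M'")
    errors = totals["X"] + totals["I"] + totals["D"]
    columns = totals["="] + errors
    return errors / columns if columns else 0.0
```

For instance, `"90=5X3I2D"` has 10 erroneous columns out of 100, giving an error rate of 0.1.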


Journal Article
TL;DR: The 3D Genome Browser is introduced, which provides multiple methods linking distal cis-regulatory elements with their potential target genes and a new binary data format for Hi-C data that reduces the file size by at least an order of magnitude and allows users to visualize chromatin interactions over millions of base pairs within seconds.
Abstract: Here, we introduce the 3D Genome Browser, http://3dgenome.org , which allows users to conveniently explore both their own and over 300 publicly available chromatin interaction datasets of different types. We design a new binary data format for Hi-C data that reduces the file size by at least an order of magnitude and allows users to visualize chromatin interactions over millions of base pairs within seconds. Our browser provides multiple methods linking distal cis-regulatory elements with their potential target genes. Users can seamlessly integrate thousands of other omics datasets to gain a comprehensive view of both regulatory landscape and 3D genome structure.

390 citations


Journal Article
TL;DR: A new plant adenine base editor based on an evolved tRNA adenosine deaminase fused to a CRISPR/Cas9 nickase is described, enabling A•T to G•C conversion at frequencies up to 7.5% in protoplasts and 59.1% in regenerated rice and wheat plants.
Abstract: Nucleotide base editors in plants have been limited to conversion of cytosine to thymine. Here, we describe a new plant adenine base editor based on an evolved tRNA adenosine deaminase fused to a CRISPR/Cas9 nickase, enabling A•T to G•C conversion at frequencies up to 7.5% in protoplasts and 59.1% in regenerated rice and wheat plants. An endogenous gene is also successfully modified by introducing a gain-of-function point mutation, directly producing an herbicide-tolerant rice plant. With this new adenine base editing system, it is now possible to precisely edit all base pairs, thus expanding the toolset for precise editing in plants.

343 citations
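Editing frequencies like those above are ratios of edited to total observations at a target site. For a sequencing-based readout, a minimal sketch of that calculation follows. It assumes the reads have already been aligned gap-free to the reference in the same coordinates and that the target A is on the read strand, so an A•T to G•C conversion appears as A to G. The function name is hypothetical, and this is not the quantification procedure used in the paper.

```python
def a_to_g_frequency(reference, reads, position):
    """Fraction of covering reads carrying G where the reference has A.

    Reads are assumed pre-aligned to the reference without gaps in the same
    coordinates; '-' or 'N' at the position counts as no coverage.
    """
    if reference[position] != "A":
        raise ValueError("reference base at position must be A")
    covering = [read for read in reads if position < len(read) and read[position] in "ACGT"]
    if not covering:
        return 0.0
    edited = sum(1 for read in covering if read[position] == "G")
    return edited / len(covering)
```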


Journal Article
TL;DR: This work uses SUPPA2 to identify novel Transformer2-regulated exons, novel microexons induced during differentiation of bipolar neurons, and novel intron retention events during erythroblast differentiation.
Abstract: Despite the many approaches to study differential splicing from RNA-seq, many challenges remain unsolved, including computing capacity and sequencing depth requirements. Here we present SUPPA2, a new method that addresses these challenges, and enables streamlined analysis across multiple conditions taking into account biological variability. Using experimental and simulated data, we show that SUPPA2 achieves higher accuracy compared to other methods, especially at low sequencing depth and short read length. We use SUPPA2 to identify novel Transformer2-regulated exons, novel microexons induced during differentiation of bipolar neurons, and novel intron retention events during erythroblast differentiation.

328 citations
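SUPPA2 works from transcript-level quantification. In that setting, the inclusion level (PSI) of an event can be computed as the summed abundance of the transcripts that include the event, divided by the summed abundance of all transcripts that define it. Below is a minimal sketch under that definition, with made-up transcript identifiers and TPM values. SUPPA2's statistical test for differential splicing, which accounts for biological variability across replicates, is not reproduced.

```python
import math


def psi(tpm, inclusion_ids, event_ids):
    """Percent spliced in (as a fraction) from transcript abundances.

    tpm: dict transcript id -> TPM.
    inclusion_ids: transcripts that contain the event's inclusion form.
    event_ids: all transcripts that define the event (inclusion + skipping).
    Returns NaN when none of the event's transcripts is expressed.
    """
    total = sum(tpm.get(t, 0.0) for t in event_ids)
    if total == 0:
        return math.nan
    return sum(tpm.get(t, 0.0) for t in inclusion_ids) / total


def delta_psi(tpm_a, tpm_b, inclusion_ids, event_ids):
    """Change in PSI from condition A to condition B."""
    return psi(tpm_b, inclusion_ids, event_ids) - psi(tpm_a, inclusion_ids, event_ids)
```

Transcripts outside the event (here `tx3` in the test data) do not affect its PSI, which is why only transcript-level abundances for the relevant isoforms are needed.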


Journal Article
TL;DR: Comparison with SPAdes and MegaHit shows that SKESA produces assemblies that have high sequence quality and contiguity, handles low-level contamination in reads, is fast, and produces an identical assembly for the same input when assembled multiple times with the same or different compute resources.
Abstract: SKESA is a de Bruijn graph-based de novo assembler designed for assembling reads of microbial genomes sequenced using Illumina. Comparison with SPAdes and MegaHit shows that SKESA produces assemblies that have high sequence quality and contiguity, handles low-level contamination in reads, is fast, and produces an identical assembly for the same input when assembled multiple times with the same or different compute resources. SKESA has been used for assembling over 272,000 read sets in the Sequence Read Archive at NCBI and for real-time pathogen detection. Source code for SKESA is freely available at https://github.com/ncbi/SKESA/releases .
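In a de Bruijn graph assembler, reads are broken into k-mers, and rare k-mers (often sequencing errors) are discarded. Each remaining k-mer becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix, and contigs are read off unambiguous paths. A minimal sketch of the graph-construction step follows, with a hypothetical `min_count` cutoff. It ignores reverse complements and does not reproduce SKESA's own heuristics.

```python
from collections import Counter, defaultdict


def count_kmers(reads, k):
    """Count every k-mer occurring in the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def de_bruijn_graph(reads, k, min_count=2):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it.

    Only k-mers seen at least min_count times ("solid" k-mers) become edges.
    """
    counts = count_kmers(reads, k)
    graph = defaultdict(set)
    for kmer, count in counts.items():
        if count >= min_count:
            graph[kmer[:-1]].add(kmer[1:])  # edge: prefix -> suffix
    return graph
```

In the test data, the single read carrying an error (`ACTTA`) contributes only k-mers seen once, so the count filter removes it from the graph.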

Journal Article
TL;DR: DeepCRISPR is presented, a comprehensive computational platform to unify sgRNA on-target and off-target site prediction into one framework with deep learning, surpassing available state-of-the-art in silico tools.
Abstract: A major challenge for effective application of CRISPR systems is to accurately predict the single guide RNA (sgRNA) on-target knockout efficacy and off-target profile, which would facilitate the optimized design of sgRNAs with high sensitivity and specificity. Here we present DeepCRISPR, a comprehensive computational platform to unify sgRNA on-target and off-target site prediction into one framework with deep learning, surpassing available state-of-the-art in silico tools. In addition, DeepCRISPR fully automates the identification of sequence and epigenetic features that may affect sgRNA knockout efficacy in a data-driven manner. DeepCRISPR is available at http://www.deepcrispr.net/ .
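Deep-learning sgRNA models need the guide sequence as numeric input. A common encoding represents each nucleotide as a one-hot vector over A, C, G, and T, so a 20-nt protospacer plus a 3-nt PAM becomes a 23 by 4 matrix. The sketch below is a generic illustration with a hypothetical function name. DeepCRISPR's actual input representation, which also incorporates epigenetic features, is not reproduced here.

```python
BASES = "ACGT"


def one_hot(sequence):
    """Encode a DNA sequence as a list of [A, C, G, T] indicator rows."""
    rows = []
    for base in sequence.upper():
        if base not in BASES:
            raise ValueError(f"unsupported base: {base!r}")
        rows.append([1 if base == b else 0 for b in BASES])
    return rows
```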

Journal Article
TL;DR: Data suggest that FECR1 circular RNA acts as an upstream regulator to control breast cancer tumor growth by coordinating the regulation of DNA methylating and demethylating enzymes.
Abstract: Friend leukemia virus integration 1 (FLI1), an ETS transcription factor family member, acts as an oncogenic driver in hematological malignancies and promotes tumor growth in solid tumors. However, little is known about the mechanisms underlying the activation of this proto-oncogene in tumors. Immunohistochemical staining showed that FLI1 is aberrantly overexpressed in advanced stage and metastatic breast cancers. Using a CRISPR Cas9-guided immunoprecipitation assay, we identify a circular RNA in the FLI1 promoter chromatin complex, consisting of FLI1 exons 4-2-3, referred to as FECR1. Overexpression of FECR1 enhances invasiveness of MDA-MB-231 breast cancer cells. Notably, FECR1 utilizes a positive feedback mechanism to activate FLI1 by inducing DNA hypomethylation in CpG islands of the promoter. FECR1 binds to the FLI1 promoter in cis and recruits TET1, a demethylase that is actively involved in DNA demethylation. FECR1 also binds to DNMT1 in trans and downregulates it; DNMT1 is a methyltransferase that is essential for the maintenance of DNA methylation. These data suggest that FECR1 circular RNA acts as an upstream regulator to control breast cancer tumor growth by coordinating the regulation of DNA methylating and demethylating enzymes. Thus, FLI1 drives tumor metastasis not only through the canonical oncoprotein pathway, but also by using epigenetic mechanisms mediated by its exonic circular RNA.

Journal Article
TL;DR: The current applications of genome editing in plants are described, focusing on its potential for crop improvement in terms of adaptation, resilience, and end-use, and novel breakthroughs that are extending the potential of genome-edited crops and the possibilities of their commercialization are reviewed.
Abstract: Genome-editing tools provide advanced biotechnological techniques that enable the precise and efficient targeted modification of an organism’s genome. Genome-editing systems have been utilized in a wide variety of plant species to characterize gene functions and improve agricultural traits. We describe the current applications of genome editing in plants, focusing on its potential for crop improvement in terms of adaptation, resilience, and end-use. In addition, we review novel breakthroughs that are extending the potential of genome-edited crops and the possibilities of their commercialization. Future prospects for integrating this revolutionary technology with conventional and new-age crop breeding strategies are also discussed.

Journal Article
TL;DR: The sequences from deep RNA sequencing experiments by the Genotype-Tissue Expression (GTEx) project are assembled to create a new catalog of human genes and transcripts, called CHESS, revealing a heretofore unappreciated amount of transcriptional noise in human cells.
Abstract: We assembled the sequences from deep RNA sequencing experiments by the Genotype-Tissue Expression (GTEx) project, to create a new catalog of human genes and transcripts, called CHESS. The new database contains 42,611 genes, of which 20,352 are potentially protein-coding and 22,259 are noncoding, and a total of 323,258 transcripts. These include 224 novel protein-coding genes and 116,156 novel transcripts. We detected over 30 million additional transcripts at more than 650,000 genomic loci, nearly all of which are likely nonfunctional, revealing a heretofore unappreciated amount of transcriptional noise in human cells. The CHESS database is available at http://ccb.jhu.edu/chess .

Journal Article
TL;DR: KrakenUniq is a novel metagenomics classifier that combines the fast k-mer-based classification of Kraken with an efficient algorithm for assessing the coverage of unique k-mers found in each species in a dataset by using the probabilistic cardinality estimator HyperLogLog.
Abstract: False-positive identifications are a significant problem in metagenomics classification. We present KrakenUniq, a novel metagenomics classifier that combines the fast k-mer-based classification of Kraken with an efficient algorithm for assessing the coverage of unique k-mers found in each species in a dataset. On various test datasets, KrakenUniq gives better recall and precision than other methods and effectively classifies and distinguishes pathogens with low abundance from false positives in infectious disease samples. By using the probabilistic cardinality estimator HyperLogLog, KrakenUniq runs as fast as Kraken and requires little additional memory. KrakenUniq is freely available at https://github.com/fbreitwieser/krakenuniq .
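HyperLogLog estimates how many distinct items (here, unique k-mers) were seen, using a fixed, small array of registers instead of storing the items. Each item is hashed. The first p bits select a register, and the register keeps the maximum "rank" (position of the first 1-bit) among the remaining bits. Below is a minimal textbook-style sketch. The choice of SHA-1 truncated to 64 bits and p = 12 are assumptions, and this is not KrakenUniq's C++ implementation.

```python
import hashlib
import math


class HyperLogLog:
    """Approximate count of distinct strings using 2**p small registers."""

    def __init__(self, p=12):
        if not 7 <= p <= 18:
            raise ValueError("p must be between 7 and 18")
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)  # bias correction, valid for m >= 128

    def add(self, item):
        # 64-bit hash: the first p bits pick a register, the rest give the rank.
        h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        index = h >> (64 - self.p)
        remainder = h & ((1 << (64 - self.p)) - 1)
        rank = (64 - self.p) - remainder.bit_length() + 1  # leading zeros + 1
        if rank > self.registers[index]:
            self.registers[index] = rank

    def estimate(self):
        raw = self.alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)  # small-range (linear counting) correction
        return raw
```

Adding the same k-mer again never changes any register, so repeated k-mers do not inflate the count. Memory stays at 2**p registers regardless of how many k-mers are added, which is the property that lets a classifier track unique k-mer coverage per taxon with little extra memory. With p = 12 the expected relative error is roughly 1.6%.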

Journal Article
TL;DR: It is demonstrated that neural stem/progenitor cell (NSPC) self-renewal and spatiotemporal generation of neurons and other cell types are severely impacted by the loss of Ythdf2 in embryonic neocortex.
Abstract: N6-methyladenosine (m6A) modification in mRNAs was recently shown to be dynamically regulated, indicating a pivotal role in multiple developmental processes. Most recently, it was shown that the Mettl3-Mettl14 writer complex of this mark is required for the temporal control of cortical neurogenesis. The m6A reader protein Ythdf2 promotes mRNA degradation by recognizing m6A and recruiting the mRNA decay machinery. We show that the conditional depletion of the m6A reader protein Ythdf2 in mice causes lethality at late embryonic developmental stages, with embryos characterized by compromised neural development. We demonstrate that neural stem/progenitor cell (NSPC) self-renewal and spatiotemporal generation of neurons and other cell types are severely impacted by the loss of Ythdf2 in embryonic neocortex. Combining in vivo and in vitro assays, we show that the proliferation and differentiation capabilities of NSPCs decrease significantly in Ythdf2−/− embryos. The Ythdf2−/− neurons are unable to produce normally functioning neurites, leading to failure in recovery upon reactive oxygen species stimulation. Consistently, expression of genes enriched in neural development pathways is significantly disturbed. Detailed analysis of the m6A-methylomes of Ythdf2−/− NSPCs reveals that JAK-STAT cascade inhibitory genes contributing to neuroprotection and neurite outgrowth show increased expression and m6A enrichment. In agreement with the function of Ythdf2, delayed degradation of neuron differentiation-related m6A-containing mRNAs is seen in Ythdf2−/− NSPCs. We show that the m6A reader protein Ythdf2 modulates neural development by promoting m6A-dependent degradation of neural development-related mRNA targets.

Journal Article
TL;DR: This comprehensive study demonstrates a lower alpha diversity in normal lung as compared to non-tumor adjacent or tumor tissue, and shows both microbiome-gene and microbiome-exposure interactions in squamous cell carcinoma lung cancer tissue.
Abstract: Lung cancer is the leading cancer diagnosis worldwide and the number one cause of cancer deaths. Exposure to cigarette smoke, the primary risk factor in lung cancer, reduces epithelial barrier integrity and increases susceptibility to infections. Herein, we hypothesize that somatic mutations together with cigarette smoke generate a dysbiotic microbiota that is associated with lung carcinogenesis. Using lung tissue from 33 controls and 143 cancer cases, we conduct 16S ribosomal RNA (rRNA) bacterial gene sequencing, with RNA-sequencing data from lung cancer cases in The Cancer Genome Atlas serving as the validation cohort. Overall, we demonstrate a lower alpha diversity in normal lung as compared to non-tumor adjacent or tumor tissue. In squamous cell carcinoma specifically, a separate group of taxa are identified, in which Acidovorax is enriched in smokers. Acidovorax temporans is identified within tumor sections by fluorescent in situ hybridization and confirmed by two separate 16S rRNA strategies. Further, these taxa, including Acidovorax, exhibit higher abundance among the subset of squamous cell carcinoma cases with TP53 mutations, an association not seen in adenocarcinomas. The results of this comprehensive study show both microbiome-gene and microbiome-exposure interactions in squamous cell carcinoma lung cancer tissue. Specifically, tumors harboring TP53 mutations, which can impair epithelial function, have a unique bacterial consortium that is higher in relative abundance in smoking-associated tumors of this type. Given the significant need for clinical diagnostic tools in lung cancer, this study may provide novel biomarkers for early detection.
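The abstract does not state which alpha-diversity index was used. The Shannon index, H = -(sum of p_i ln p_i) over the taxon proportions p_i, is one common choice and is sketched below purely as an illustration, with a hypothetical function name. Lower values indicate fewer taxa or a community dominated by a few taxa.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over taxa with nonzero counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    h = 0.0
    for count in counts:
        if count > 0:
            p = count / total
            h -= p * math.log(p)
    return h
```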

Journal Article
TL;DR: Three approaches to select reference libraries for deconvoluting neutrophil, monocyte, B-lymphocyte, natural killer, and CD4+ and CD8+ T-cell fractions based on blood-derived DNA methylation signatures assayed using the Illumina HumanMethylationEPIC array are compared.
Abstract: Genome-wide methylation arrays are powerful tools for assessing cell composition of complex mixtures. We compare three approaches to select reference libraries for deconvoluting neutrophil, monocyte, B-lymphocyte, natural killer, and CD4+ and CD8+ T-cell fractions based on blood-derived DNA methylation signatures assayed using the Illumina HumanMethylationEPIC array. The IDOL algorithm identifies a library of 450 CpGs, resulting in an average R² of 99.2% across cell types when applied to EPIC methylation data collected on artificial mixtures constructed from the above cell types. Of the 450 CpGs, 69% are unique to EPIC. This library has the potential to reduce unintended technical differences across array platforms.
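Once a reference library is chosen, cell-type proportions are estimated by fitting the mixture's methylation values as a non-negative combination of the reference profiles at the library CpGs. The sketch below uses projected-gradient non-negative least squares, then rescales the result to sum to one. This is an illustrative stand-in with made-up beta values, not the constrained projection routine used with IDOL, and the function name is hypothetical.

```python
def estimate_fractions(reference, mixture, iterations=2000):
    """Estimate cell-type fractions w >= 0 minimizing ||R w - y||^2.

    reference: one row per CpG, each a list of per-cell-type beta values (R).
    mixture: beta values of the mixed sample at the same CpGs (y).
    Returns fractions rescaled to sum to 1.
    """
    n_cpgs = len(reference)
    n_types = len(reference[0])
    # 1 / squared Frobenius norm is at most 1 / (largest eigenvalue of R^T R): a safe step size.
    step = 1.0 / sum(value * value for row in reference for value in row)
    w = [1.0 / n_types] * n_types
    for _ in range(iterations):
        residual = [sum(row[j] * w[j] for j in range(n_types)) - y
                    for row, y in zip(reference, mixture)]
        gradient = [sum(reference[i][j] * residual[i] for i in range(n_cpgs))
                    for j in range(n_types)]
        # Gradient step, then project back onto the non-negative orthant.
        w = [max(0.0, wj - step * gj) for wj, gj in zip(w, gradient)]
    total = sum(w)
    return [wj / total for wj in w] if total > 0 else w
```

The quality of such estimates depends heavily on how well the library CpGs discriminate between cell types, which is the problem that library selection addresses.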

Journal Article
TL;DR: Even though the intergenic space is changed by the TE turnover, an unexpected preservation is observed between the A, B, and D subgenomes for features like TE family proportions, gene spacing, and TE enrichment near genes.
Abstract: Transposable elements (TEs) are major components of large plant genomes and main drivers of genome evolution. The most recent assembly of hexaploid bread wheat recovered the highly repetitive TE space in an almost complete chromosomal context and enabled a detailed view into the dynamics of TEs in the A, B, and D subgenomes. The overall TE content is very similar between the A, B, and D subgenomes, although we find no evidence for bursts of TE amplification after the polyploidization events. Despite the near-complete turnover of TEs since the subgenome lineages diverged from a common ancestor, 76% of TE families are still present in similar proportions in each subgenome. Moreover, spacing between syntenic genes is also conserved, even though syntenic TEs have been replaced by new insertions over time, suggesting that distances between genes, but not sequences, are under evolutionary constraints. The TE composition of the immediate gene vicinity differs from the core intergenic regions. We find the same TE families to be enriched or depleted near genes in all three subgenomes. Evaluations at the subfamily level of timed long terminal repeat-retrotransposon insertions highlight the independent evolution of the diploid A, B, and D lineages before polyploidization and cases of concerted proliferation in the AB tetraploid. Even though the intergenic space is changed by the TE turnover, an unexpected preservation is observed between the A, B, and D subgenomes for features like TE family proportions, gene spacing, and TE enrichment near genes.

Journal Article
TL;DR: A comprehensive and rigorous analysis of WGS data across multiple sample types suggests both Cas9 and Cpf1 nucleases are very specific in generating targeted DNA modifications and off-targeting can be avoided by designing guide RNAs with high specificity.
Abstract: Targeting specificity has been a barrier to applying genome editing systems in functional genomics, precision medicine and plant breeding. In plants, only limited studies have used whole-genome sequencing (WGS) to test off-target effects of Cas9. The cause of numerous discovered mutations is still controversial. Furthermore, WGS-based off-target analysis of Cpf1 (Cas12a) has not been reported in any higher organism to date. We conduct a WGS analysis of 34 plants edited by Cas9 and 15 plants edited by Cpf1 in T0 and T1 generations along with 20 diverse control plants in rice. The sequencing depths range from 45× to 105× with read mapping rates above 96%. Our results clearly show that most mutations in edited plants are created by the tissue culture process, which causes approximately 102 to 148 single nucleotide variations (SNVs) and approximately 32 to 83 insertions/deletions (indels) per plant. Among 12 Cas9 single guide RNAs (sgRNAs) and three Cpf1 CRISPR RNAs (crRNAs) assessed by WGS, only one Cas9 sgRNA resulted in off-target mutations in T0 lines at sites predicted by computer programs. Moreover, we cannot find evidence for bona fide off-target mutations due to continued expression of Cas9 or Cpf1 with guide RNAs in T1 generation. Our comprehensive and rigorous analysis of WGS data across multiple sample types suggests both Cas9 and Cpf1 nucleases are very specific in generating targeted DNA modifications and off-targeting can be avoided by designing guide RNAs with high specificity.
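Computational off-target prediction starts from sequence similarity to the guide. A minimal sketch of the simplest version follows. It scans one strand for 20-nt sites followed by an NGG PAM (the SpCas9 PAM) and reports those within a mismatch budget. The function names and the default of three mismatches are assumptions. Real tools also scan the reverse strand, allow bulges, and weight mismatches by position, and Cpf1 would need its own 5' T-rich PAM.

```python
def count_mismatches(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def find_candidate_sites(genome, protospacer, max_mismatches=3):
    """Return (start, mismatches) for forward-strand sites followed by an NGG PAM."""
    k = len(protospacer)
    hits = []
    for start in range(len(genome) - k - 3 + 1):
        pam = genome[start + k:start + k + 3]
        if pam[1:] != "GG":
            continue  # NGG: any base, then two Gs
        mismatches = count_mismatches(genome[start:start + k], protospacer)
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits
```

Guides whose only hit with few mismatches is the intended site are the "high specificity" designs that the study suggests avoid off-target editing.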

Journal Article
TL;DR: The data demonstrate that abnormal histone modification-activated HOXC-AS3 may play important roles in gastric cancer oncogenesis and may serve as a target for gastric cancer diagnosis and therapy.
Abstract: Recently, increasing evidence shows that long noncoding RNAs (lncRNAs) play a significant role in human tumorigenesis. However, the function of lncRNAs in human gastric cancer remains largely unknown. By using publicly available expression profiling data from gastric cancer and integrating bioinformatics analyses, we screen and identify a novel lncRNA, HOXC-AS3. HOXC-AS3 is significantly increased in gastric cancer tissues and is correlated with clinical outcomes of gastric cancer. In addition, HOXC-AS3 regulates cell proliferation and migration both in vitro and in vivo. RNA-seq analysis reveals that HOXC-AS3 knockdown preferentially affects genes that are linked to proliferation and migration. Mechanistically, we find that HOXC-AS3 is activated by gain of H3K4me3 and H3K27ac, both in cells and in tissues. RNA pull-down mass spectrometry analysis shows that YBX1 interacts with HOXC-AS3, and RNA-seq analysis reveals a marked overlap between genes differentially expressed after YBX1 knockdown and those transcriptionally regulated by HOXC-AS3, suggesting that YBX1 participates in HOXC-AS3-mediated gene transcriptional regulation in the tumorigenesis of gastric cancer. Together, our data demonstrate that abnormal histone modification-activated HOXC-AS3 may play important roles in gastric cancer oncogenesis and may serve as a target for gastric cancer diagnosis and therapy.

Journal Article
TL;DR: Findings provide strong evidence that RNA m6A methylation is controlled in a precise spatiotemporal manner and participates in the regulation of postnatal development of the mouse cerebellum.
Abstract: N6-methyladenosine (m6A) is an important epitranscriptomic mark with high abundance in the brain. Recently, it has been found to be involved in the regulation of memory formation and mammalian cortical neurogenesis. However, while it is now established that m6A methylation occurs in a spatially restricted manner, its functions in specific brain regions still await elucidation. We identify widespread and dynamic RNA m6A methylation in the developing mouse cerebellum and further uncover distinct features of continuous and temporal-specific m6A methylation across the four postnatal developmental processes. Temporal-specific m6A peaks from P7 to P60 exhibit remarkable changes in their distribution patterns along the mRNA transcripts. We also show spatiotemporal-specific expression of m6A writers METTL3, METTL14, and WTAP and erasers ALKBH5 and FTO in the mouse cerebellum. Ectopic expression of METTL3 mediated by lentivirus infection leads to disorganized structure of both Purkinje and glial cells. In addition, under hypobaric hypoxia exposure, Alkbh5 deletion causes abnormal cell proliferation and differentiation in the cerebellum by disturbing the balance of RNA m6A methylation in different cell fate determination genes. Notably, nuclear export of the hypermethylated RNAs is enhanced in the cerebellum of Alkbh5-deficient mice exposed to hypobaric hypoxia. Together, our findings provide strong evidence that RNA m6A methylation is controlled in a precise spatiotemporal manner and participates in the regulation of postnatal development of the mouse cerebellum.

Journal Article
TL;DR: How genome-scale data can inform species delineation in the face of admixture, facilitate evolution through the identification of adaptive alleles, and enhance evolutionary rescue based on genomic patterns of inbreeding are discussed.
Abstract: “Conservation genomics” encompasses the idea that genome-scale data will improve the capacity of resource managers to protect species. Although genetic approaches have long been used in conservation research, it has only recently become tractable to generate genome-wide data at a scale that is useful for conservation. In this Review, we discuss how genome-scale data can inform species delineation in the face of admixture, facilitate evolution through the identification of adaptive alleles, and enhance evolutionary rescue based on genomic patterns of inbreeding. As genomic approaches become more widely adopted in conservation, we expect that they will have a positive impact on management and policy decisions.

Journal Article
TL;DR: It is shown that amplification-free library preparation is the least biased approach for WGBS, and in protocols with amplification, the choice of bisulfite conversion protocol or polymerase can significantly minimize artefacts.
Abstract: Whole-genome bisulfite sequencing (WGBS) is becoming an increasingly accessible technique, used widely for both fundamental and disease-oriented research. Library preparation methods benefit from a variety of available kits, polymerases and bisulfite conversion protocols. Although some steps in the procedure, such as PCR amplification, are known to introduce biases, a systematic evaluation of biases in WGBS strategies is missing. We perform a comparative analysis of several commonly used pre- and post-bisulfite WGBS library preparation protocols for their performance and quality of sequencing outputs. Our results show that bisulfite conversion per se is the main trigger of pronounced sequencing biases, and PCR amplification builds on these underlying artefacts. The majority of standard library preparation methods yield a significantly biased sequence output and overestimate global methylation. Importantly, both absolute and relative methylation levels at specific genomic regions vary substantially between methods, with clear implications for DNA methylation studies. We show that amplification-free library preparation is the least biased approach for WGBS. In protocols with amplification, the choice of bisulfite conversion protocol or polymerase can significantly minimize artefacts. To aid with the quality assessment of existing WGBS datasets, we have integrated a bias diagnostic tool in the Bismark package and offer several approaches for consideration during the preparation and analysis of WGBS datasets.
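Bisulfite converts unmethylated cytosines (read as T after PCR) but leaves methylated cytosines as C. The methylation level at a CpG is therefore typically estimated as C / (C + T) over the reads covering it, so any bias that distorts these counts translates directly into biased methylation estimates. A minimal sketch of per-site and global levels follows. The input layout (position mapped to methylated and unmethylated read counts) and the function names are hypothetical. This is not Bismark's methylation extractor or its bias diagnostic.

```python
def cpg_methylation(counts):
    """Per-site methylation level C / (C + T); sites with no coverage are skipped.

    counts: dict mapping CpG position -> (methylated_reads, unmethylated_reads).
    """
    return {pos: meth / (meth + unmeth)
            for pos, (meth, unmeth) in counts.items()
            if meth + unmeth > 0}


def global_methylation(counts):
    """Pooled (read-weighted) methylation level across all covered CpGs."""
    meth_total = sum(meth for meth, _ in counts.values())
    covered = meth_total + sum(unmeth for _, unmeth in counts.values())
    return meth_total / covered if covered else float("nan")
```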

Journal Article
TL;DR: The Wheat@URGI portal has been developed to provide the international community of researchers and breeders with access to the bread wheat reference genome sequence produced by the International Wheat Genome Sequencing Consortium.
Abstract: The Wheat@URGI portal has been developed to provide the international community of researchers and breeders with access to the bread wheat reference genome sequence produced by the International Wheat Genome Sequencing Consortium. Genome browsers, BLAST, and InterMine tools have been established for in-depth exploration of the genome sequence together with additional linked datasets including physical maps, sequence variations, gene expression, and genetic and phenomic data from other international collaborative projects already stored in the GnpIS information system. The portal provides enhanced search and browser features that will facilitate the deployment of the latest genomics resources in wheat improvement.

Journal ArticleDOI
TL;DR: This study provides a reference for the analysis of chromatin domains from Hi-C experiments and useful guidelines for choosing a suitable approach based on the experimental design, available data, and biological question of interest.
Abstract: Chromatin folding gives rise to structural elements, including clusters of densely interacting DNA regions termed topologically associating domains (TADs). TADs have been characterized across multiple species, tissue types, and differentiation stages, sometimes in association with regulation of biological functions. The reliability and reproducibility of these findings are intrinsically tied to the correct identification of these domains from high-throughput chromatin conformation capture (Hi-C) experiments. Here, we test and compare 22 computational methods to identify TADs across 20 different conditions. We find that TAD sizes and numbers vary significantly among callers and data resolutions, challenging the definition of an average TAD size, but strengthening the hypothesis that TADs are hierarchically organized domains, rather than disjoint structural elements. The performance of these methods differs based on data resolution and normalization strategy, but a core set of TAD callers consistently retrieves reproducible domains that are enriched for TAD-associated biological features, even at low sequencing depths. This study provides a reference for the analysis of chromatin domains from Hi-C experiments and useful guidelines for choosing a suitable approach based on the experimental design, available data, and biological question of interest.
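The benchmarked callers use many different algorithms, and this entry does not describe them. The sketch below illustrates one widely used idea only: an insulation-score-style scan. It is not the implementation of any caller in the study. It assumes a square, already normalized contact matrix. For each gap between adjacent bins, the scan averages the contacts that cross that gap within a fixed window, then calls boundaries at local minima. The function names and the `window` parameter are hypothetical.

```python
import numpy as np

def insulation_scores(contacts: np.ndarray, window: int) -> np.ndarray:
    """Score for the gap between bin i and bin i+1: mean contact frequency between
    the `window` bins ending at i and the `window` bins starting at i+1.
    Positions too close to the matrix edge are left as NaN."""
    n = contacts.shape[0]
    scores = np.full(n, np.nan)
    for i in range(window - 1, n - window):
        crossing = contacts[i - window + 1 : i + 1, i + 1 : i + 1 + window]
        scores[i] = crossing.mean()
    return scores

def boundary_bins(scores: np.ndarray) -> list[int]:
    """Indices i whose score is a strict local minimum; a boundary is called
    between bins i and i+1. NaN neighbours never satisfy the comparison."""
    return [i for i in range(1, len(scores) - 1)
            if scores[i] < scores[i - 1] and scores[i] < scores[i + 1]]
```

Real tools typically add normalization (for example a log ratio to the genome-wide mean) and filter minima by strength. Note also that a window fixed in bins spans very different genomic distances at different resolutions. This is one way domain calls can depend on data resolution.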

Journal ArticleDOI
TL;DR: A weighting strategy is introduced, based on a zero-inflated negative binomial model, that identifies excess zero counts and generates gene- and cell-specific weights to unlock bulk RNA-seq DE pipelines for zero-inflated data, boosting performance for scRNA-seq.
Abstract: Dropout events in single-cell RNA sequencing (scRNA-seq) cause many transcripts to go undetected and induce an excess of zero read counts, leading to power issues in differential expression (DE) analysis. This has triggered the development of bespoke scRNA-seq DE methods to cope with zero inflation. Recent evaluations, however, have shown that dedicated scRNA-seq tools provide no advantage compared to traditional bulk RNA-seq tools. We introduce a weighting strategy, based on a zero-inflated negative binomial model, that identifies excess zero counts and generates gene- and cell-specific weights to unlock bulk RNA-seq DE pipelines for zero-inflated data, boosting performance for scRNA-seq.
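Under a zero-inflated negative binomial (ZINB) model, a count is either an excess zero, with probability π, or a draw from an NB distribution with mean μ and size θ. A natural observation weight is the posterior probability that the count came from the NB component. For non-zero counts this is 1. For zeros it is (1-π)f_NB(0) / (π + (1-π)f_NB(0)), where f_NB(0) = (θ/(θ+μ))^θ. The sketch below assumes that μ, θ and π have already been estimated for each gene and cell. Fitting the ZINB model is not shown, and the function names are illustrative rather than taken from the paper's software.

```python
import numpy as np

def nb_zero_prob(mu, theta):
    """P(Y = 0) for a negative binomial with mean mu and size (inverse dispersion) theta."""
    return (theta / (theta + mu)) ** theta

def zinb_weights(counts, mu, theta, pi):
    """Posterior probability that each count was generated by the NB component
    rather than the excess-zero component. Non-zero counts get weight 1.
    All inputs broadcast against a genes x cells count matrix."""
    counts = np.asarray(counts)
    mu = np.asarray(mu, dtype=float)
    theta = np.asarray(theta, dtype=float)
    pi = np.asarray(pi, dtype=float)
    nb_zero = (1.0 - pi) * nb_zero_prob(mu, theta)  # joint prob. of NB-generated zero
    w_zero = nb_zero / (pi + nb_zero)                # divide by total prob. of a zero
    return np.where(counts == 0, w_zero, 1.0)
```

A zero is unlikely under the NB component alone when μ is large. Such zeros therefore get low weight and contribute little to a downstream weighted fit. Zeros for lowly expressed genes keep more weight. The resulting matrix is what a bulk DE pipeline that accepts per-observation weights would consume.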

Journal ArticleDOI
TL;DR: It is found that African and Hispanic or Latin American ancestry populations contribute a disproportionately high number of associations.
Abstract: The accurate description of ancestry is essential to interpret, access, and integrate human genomics data, and to ensure that these benefit individuals from all ancestral backgrounds. However, there are no established guidelines for the representation of ancestry information. Here we describe a framework for the accurate and standardized description of sample ancestry, and validate it by application to the NHGRI-EBI GWAS Catalog. We confirm known biases and gaps in diversity, and find that African and Hispanic or Latin American ancestry populations contribute a disproportionately high number of associations. It is our hope that widespread adoption of this framework will lead to improved analysis, interpretation, and integration of human genomics data.

Journal ArticleDOI
TL;DR: A flexible pipeline for processing droplet-based transcriptome data that implements barcode corrections, classification of cell quality, and diagnostic information about the droplet libraries is described.
Abstract: Recent single-cell RNA-seq protocols based on droplet microfluidics use massively multiplexed barcoding to enable simultaneous measurements of transcriptomes for thousands of individual cells. The increasing complexity of such data creates challenges for subsequent computational processing and troubleshooting of these experiments, with few software options currently available. Here, we describe a flexible pipeline for processing droplet-based transcriptome data that implements barcode corrections, classification of cell quality, and diagnostic information about the droplet libraries. We introduce advanced methods for correcting composition bias and sequencing errors affecting cellular and molecular barcodes to provide more accurate estimates of molecular counts in individual cells.
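The abstract does not spell out the pipeline's correction algorithm, which it describes as more advanced. The sketch below illustrates only the general idea behind correcting sequencing errors in cell barcodes, and is not the pipeline's own method. It assumes that a low-count barcode one substitution away from a much more abundant barcode is an error copy of it. Barcodes are processed from most to least abundant. The function names and the `min_ratio` threshold are hypothetical.

```python
from collections import Counter

BASES = "ACGT"

def one_substitution_neighbors(barcode: str):
    """Yield every sequence that differs from `barcode` at exactly one position."""
    for i, original in enumerate(barcode):
        for base in BASES:
            if base != original:
                yield barcode[:i] + base + barcode[i + 1:]

def merge_barcodes(counts: dict[str, int], min_ratio: float = 10.0) -> dict[str, str]:
    """Map each observed barcode to the barcode its reads are assigned to.
    A barcode is merged into an already accepted neighbour (one substitution away)
    if that neighbour has at least `min_ratio` times as many reads; otherwise
    it is kept as a distinct cell."""
    assignment: dict[str, str] = {}
    accepted: dict[str, int] = {}  # accepted barcode -> its own read count
    for bc, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        parents = [nb for nb in one_substitution_neighbors(bc)
                   if nb in accepted and accepted[nb] >= min_ratio * n]
        if parents:
            # Assign to the most abundant qualifying neighbour (ties broken by sequence).
            assignment[bc] = max(parents, key=lambda p: (accepted[p], p))
        else:
            accepted[bc] = n
            assignment[bc] = bc
    return assignment

def corrected_counts(counts: dict[str, int], assignment: dict[str, str]) -> dict[str, int]:
    """Sum read counts per corrected barcode."""
    totals: Counter = Counter()
    for bc, n in counts.items():
        totals[assignment[bc]] += n
    return dict(totals)
```

The ratio test keeps two genuinely distinct cells with similar barcodes and similar depth from being collapsed. Molecular barcodes (UMIs) within a cell could be treated analogously, under the same simplifying assumptions.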