Open access · Journal Article · DOI: 10.1016/J.CELL.2019.05.031

Comprehensive Integration of Single-Cell Data.

13 Jun 2019 · Cell (Cell Press) · Vol. 177, Iss. 7, pp. 1888
Abstract: Single-cell transcriptomics has transformed our ability to characterize cell states, but deep biological understanding requires more than a taxonomic listing of clusters. As new methods arise to measure distinct cellular modalities, a key analytical challenge is to integrate these datasets to better understand cellular identity and function. Here, we develop a strategy to "anchor" diverse datasets together, enabling us to integrate single-cell measurements not only across scRNA-seq technologies, but also across different modalities. After demonstrating improvement over existing methods for integrating scRNA-seq data, we anchor scRNA-seq experiments with scATAC-seq to explore chromatin differences in closely related interneuron subsets and project protein expression measurements onto a bone marrow atlas to characterize lymphocyte populations. Lastly, we harmonize in situ gene expression and scRNA-seq datasets, allowing transcriptome-wide imputation of spatial gene expression patterns. Our work presents a strategy for the assembly of harmonized references and transfer of information across datasets.
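The "anchors" in this abstract are pairs of cells matched across two datasets. As we understand the Seurat v3 method, anchors start as mutual nearest neighbors (MNNs) in a shared low-dimensional space. The Python sketch below shows only that MNN pairing step on precomputed coordinates; the function names are hypothetical, and the dimensional reduction (CCA), normalization, anchor scoring and filtering are deliberately left out.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest(query, points, k):
    """Indices of the k points closest to `query` (ties broken by index)."""
    order = sorted(range(len(points)), key=lambda j: (euclidean(query, points[j]), j))
    return set(order[:k])


def mutual_nearest_neighbors(ref, qry, k):
    """Return sorted (i, j) pairs where ref[i] and qry[j] are in each other's k-NN sets."""
    ref_to_qry = [k_nearest(r, qry, k) for r in ref]
    qry_to_ref = [k_nearest(q, ref, k) for q in qry]
    return sorted(
        (i, j)
        for i in range(len(ref))
        for j in ref_to_qry[i]
        if i in qry_to_ref[j]  # keep the pair only if the match goes both ways
    )


print(mutual_nearest_neighbors([[0, 0], [5, 5]], [[0.1, 0], [5, 5.2], [10, 10]], k=1))
```

Here the third query cell's nearest reference cell is the second one, but that reference cell prefers a different query cell, so no pair is formed; requiring the match in both directions is what makes the criterion "mutual".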


Open access · Journal Article · DOI: 10.1016/J.CELL.2020.04.004
14 May 2020 · Cell
Abstract: We have previously provided the first genetic evidence that angiotensin converting enzyme 2 (ACE2) is the critical receptor for severe acute respiratory syndrome coronavirus (SARS-CoV), and ACE2 protects the lung from injury, providing a molecular explanation for the severe lung failure and death due to SARS-CoV infections. ACE2 has now also been identified as a key receptor for SARS-CoV-2 infections, and it has been proposed that inhibiting this interaction might be used in treating patients with COVID-19. However, it is not known whether human recombinant soluble ACE2 (hrsACE2) blocks growth of SARS-CoV-2. Here, we show that clinical grade hrsACE2 reduced SARS-CoV-2 recovery from Vero cells by a factor of 1,000-5,000. An equivalent mouse rsACE2 had no effect. We also show that SARS-CoV-2 can directly infect engineered human blood vessel organoids and human kidney organoids, which can be inhibited by hrsACE2. These data demonstrate that hrsACE2 can significantly block early stages of SARS-CoV-2 infections.

Topics: Vero cell (51%)

1,234 Citations

Open access · Journal Article · DOI: 10.1038/S41591-020-0901-9
Mingfeng Liao, Yang Liu, Jing Yuan, Yanling Wen, +10 more · Institutions (3)
12 May 2020 · Nature Medicine
Abstract: Respiratory immune characteristics associated with Coronavirus Disease 2019 (COVID-19) severity are currently unclear. We characterized bronchoalveolar lavage fluid immune cells from patients with varying severity of COVID-19 and from healthy people by using single-cell RNA sequencing. Proinflammatory monocyte-derived macrophages were abundant in the bronchoalveolar lavage fluid from patients with severe COVID-19. Moderate cases were characterized by the presence of highly clonally expanded CD8+ T cells. This atlas of the bronchoalveolar immune microenvironment suggests potential mechanisms underlying pathogenesis and recovery in COVID-19.

Topics: Bronchoalveolar lavage (65%), Immune system (57%)

1,090 Citations

Open access · Journal Article · DOI: 10.1186/S13059-019-1874-1
Christoph Hafemeister, Rahul Satija · Institutions (1)
23 Dec 2019 · Genome Biology
Abstract: Single-cell RNA-seq (scRNA-seq) data exhibits significant cell-to-cell variation due to technical factors, including the number of molecules detected in each cell, which can confound biological heterogeneity with technical effects. To address this, we present a modeling framework for the normalization and variance stabilization of molecular count data from scRNA-seq experiments. We propose that the Pearson residuals from “regularized negative binomial regression,” where cellular sequencing depth is utilized as a covariate in a generalized linear model, successfully remove the influence of technical characteristics from downstream analyses while preserving biological heterogeneity. Importantly, we show that an unconstrained negative binomial model may overfit scRNA-seq data, and overcome this by pooling information across genes with similar abundances to obtain stable parameter estimates. Our procedure omits the need for heuristic steps including pseudocount addition or log-transformation and improves common downstream analytical tasks such as variable gene selection, dimensional reduction, and differential expression. Our approach can be applied to any UMI-based scRNA-seq dataset and is freely available as part of the R package sctransform, with a direct interface to our single-cell toolkit Seurat.

Topics: Count data (60%), Negative binomial distribution (55%), Covariate (53%)

817 Citations
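The sctransform abstract rests on Pearson residuals under a negative binomial model: residual = (x - mu) / sqrt(mu + mu^2 / theta), where x is the observed UMI count, mu the expected count and theta the dispersion parameter. The sketch below is a deliberate simplification and not sctransform itself. It assumes mu is the gene's share of all UMIs times the cell's total UMIs (so sequencing depth enters only as a scale factor), and it uses one fixed theta (100, an arbitrary choice here) in place of the per-gene regularized regression the paper describes. The function name is hypothetical.

```python
import math


def pearson_residuals(counts, theta=100.0):
    """counts[g][c]: UMI count for gene g in cell c. Returns residuals of the same shape."""
    n_genes = len(counts)
    n_cells = len(counts[0])
    depth = [sum(counts[g][c] for g in range(n_genes)) for c in range(n_cells)]
    total = sum(depth)
    residuals = []
    for g in range(n_genes):
        gene_frac = sum(counts[g]) / total  # share of all UMIs assigned to gene g
        row = []
        for c in range(n_cells):
            mu = gene_frac * depth[c]        # expected count scales with sequencing depth
            var = mu + mu * mu / theta       # negative binomial variance
            row.append((counts[g][c] - mu) / math.sqrt(var) if var > 0 else 0.0)
        residuals.append(row)
    return residuals


print(pearson_residuals([[1, 2], [3, 6]]))
```

When every gene's counts are exactly proportional to cell depth, as in the example, all residuals are zero: depth differences alone produce no signal, which is the property the abstract describes as removing the influence of technical characteristics.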

Open access · Journal Article · DOI: 10.1126/SCIENCE.ABD2985
13 Nov 2020 · Science
Abstract: The causative agent of coronavirus disease 2019 (COVID-19) is the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). For many viruses, tissue tropism is determined by the availability of virus receptors and entry cofactors on the surface of host cells. In this study, we found that neuropilin-1 (NRP1), known to bind furin-cleaved substrates, significantly potentiates SARS-CoV-2 infectivity, an effect blocked by a monoclonal blocking antibody against NRP1. A SARS-CoV-2 mutant with an altered furin cleavage site did not depend on NRP1 for infectivity. Pathological analysis of olfactory epithelium obtained from human COVID-19 autopsies revealed that SARS-CoV-2 infected NRP1-positive cells facing the nasal cavity. Our data provide insight into SARS-CoV-2 cell infectivity and define a potential target for antiviral intervention.

Topics: Infectivity (64%), Tissue tropism (57%), Furin (53%)

631 Citations

Open access
Atray Dixit, Oren Parnas, Biyu Li, +22 more · Institutions (6)
01 Dec 2016
Abstract: Genetic screens help infer gene function in mammalian cells, but it has remained difficult to assay complex phenotypes, such as transcriptional profiles, at scale. Here, we develop Perturb-seq, combining single-cell RNA sequencing (RNA-seq) and clustered regularly interspaced short palindromic repeats (CRISPR)-based perturbations to perform many such assays in a pool. We demonstrate Perturb-seq by analyzing 200,000 cells across immune cells and cell lines, focusing on transcription factors regulating the response of dendritic cells to lipopolysaccharide (LPS). Perturb-seq accurately identifies individual gene targets, gene signatures, and cell states affected by individual perturbations and their genetic interactions. We posit new functions for regulators of differentiation, the anti-viral response, and mitochondrial function during immune activation. By decomposing many high content measurements into the effects of perturbations, their interactions, and diverse cell metadata, Perturb-seq dramatically increases the scope of pooled genomic assays.

Topics: CRISPR (55%), Genetic screen (53%), Gene (52%)

539 Citations


Open access · Journal Article · DOI: 10.1093/BIOINFORMATICS/BTP352
Heng Li, Bob Handsaker, Alec Wysoker, T. J. Fennell, +5 more · Institutions (4)
01 Aug 2009 · Bioinformatics
Abstract: Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments. Availability: Contact:

Topics: Variant Call Format (62%), Stockholm format (61%), FASTQ format (56%)

35,747 Citations
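A SAM alignment line is tab-separated: 11 mandatory fields (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL) followed by optional TAG:TYPE:VALUE fields, with POS 1-based and CIGAR operations M, D, N, = and X consuming reference bases. The sketch below parses one such line and derives the alignment's reference end from its CIGAR string. `parse_sam_line` and `SamRecord` are hypothetical helpers, not part of SAMtools; header lines (starting with "@") are assumed to be skipped by the caller, and only integer-typed ("i") tags are converted. Real pipelines would normally use SAMtools itself or a library such as pysam.

```python
import re
from dataclasses import dataclass, field

SAM_FIELDS = ["qname", "flag", "rname", "pos", "mapq", "cigar",
              "rnext", "pnext", "tlen", "seq", "qual"]
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


@dataclass
class SamRecord:
    qname: str
    flag: int
    rname: str
    pos: int      # 1-based leftmost mapping position
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str
    tags: dict = field(default_factory=dict)

    @property
    def is_unmapped(self):
        return bool(self.flag & 0x4)  # FLAG bit 0x4: segment unmapped

    def reference_end(self):
        """1-based inclusive end position on the reference implied by the CIGAR."""
        span = sum(int(n) for n, op in CIGAR_OP.findall(self.cigar) if op in REF_CONSUMING)
        return self.pos + span - 1


def parse_sam_line(line):
    """Parse one non-header SAM alignment line into a SamRecord."""
    cols = line.rstrip("\n").split("\t")
    values = dict(zip(SAM_FIELDS, cols[:11]))
    for key in ("flag", "pos", "mapq", "pnext", "tlen"):
        values[key] = int(values[key])
    tags = {}
    for tag in cols[11:]:
        name, typ, value = tag.split(":", 2)
        tags[name] = int(value) if typ == "i" else value
    return SamRecord(**values, tags=tags)


rec = parse_sam_line("read1\t0\tchr1\t100\t60\t10M2D5M\t*\t0\t0\tACGTACGTACGTACG\t*\tNM:i:2")
print(rec.reference_end(), rec.tags)
```

For the 10M2D5M alignment starting at position 100, the deletion consumes reference but not read bases, so the alignment covers 17 reference bases and ends at position 116 even though the read is 15 bases long.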

Open access · Journal Article · DOI: 10.1186/S13059-014-0550-8
05 Dec 2014 · Genome Biology
Abstract: In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-seq, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. We present DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression. The DESeq2 package is available at .

Topics: MRNA Sequencing (54%), Integrator complex (51%), Count data (50%)

29,675 Citations
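DESeq2's dispersion and fold-change shrinkage is too involved for a short example, so this sketch illustrates an earlier step in the same count-data analysis: the median-of-ratios size factors that DESeq2 uses by default to correct for differences in sequencing depth between samples. Each sample's factor is the median, over genes, of its count divided by that gene's geometric mean across samples; genes with a zero in any sample are skipped because their geometric mean is zero. The function name `size_factors` is hypothetical and this is not the package's code.

```python
import math
from statistics import median


def size_factors(counts):
    """counts[g][s]: read count of gene g in sample s. Returns one size factor per sample."""
    n_samples = len(counts[0])
    # Genes with a zero in any sample have a zero geometric mean, so they are skipped.
    usable = [row for row in counts if all(x > 0 for x in row)]
    log_geo_means = [sum(math.log(x) for x in row) / n_samples for row in usable]
    factors = []
    for s in range(n_samples):
        # log(count / geometric mean) for every usable gene in sample s
        log_ratios = [math.log(row[s]) - lgm for row, lgm in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


print(size_factors([[10, 20], [100, 200], [5, 10], [0, 7]]))
```

In the example the second sample is sequenced twice as deeply for every usable gene, so its factor is twice the first (about 0.707 and 1.414); dividing each sample's counts by its factor puts them on a common scale before testing.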

Open access · Journal Article · DOI: 10.1093/BIOINFORMATICS/BTS635
01 Jan 2013 · Bioinformatics
Abstract: Motivation: Accurate alignment of high-throughput RNA-seq data is a challenging and yet unsolved problem because of the non-contiguous transcript structure, relatively short read lengths and constantly increasing throughput of the sequencing technologies. Currently available RNA-seq aligners suffer from high mapping error rates, low mapping speed, read length limitation and mapping biases. Results: To align our large (>80 billion reads) ENCODE Transcriptome RNA-seq dataset, we developed the Spliced Transcripts Alignment to a Reference (STAR) software based on a previously undescribed RNA-seq alignment algorithm that uses sequential maximum mappable seed search in uncompressed suffix arrays followed by a seed clustering and stitching procedure. STAR outperforms other aligners by a factor of >50 in mapping speed, aligning to the human genome 550 million 2 × 76 bp paired-end reads per hour on a modest 12-core server, while at the same time improving alignment sensitivity and precision. In addition to unbiased de novo detection of canonical junctions, STAR can discover non-canonical splices and chimeric (fusion) transcripts, and is also capable of mapping full-length RNA sequences. Using Roche 454 sequencing of reverse transcription polymerase chain reaction amplicons, we experimentally validated 1960 novel intergenic splice junctions with an 80-90% success rate, corroborating the high precision of the STAR mapping strategy. Availability and implementation: STAR is implemented as a standalone C++ code. STAR is free open source software distributed under GPLv3 license and can be downloaded from

Topics: MRNA Sequencing (57%)

20,172 Citations
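The "sequential maximum mappable seed search" in the STAR abstract can be pictured as follows: find the longest prefix of the read that occurs in the reference (its maximal mappable prefix, MMP) using a suffix array, then restart the search at the first read base that was not mapped. A read spanning a splice junction therefore breaks into seeds whose reference positions are separated by the intron. This is a toy sketch under simplifying assumptions: the suffix array is built naively, only exact matches on the forward strand are considered, a base absent from the reference is simply skipped, and seed clustering and stitching are omitted. All function names are hypothetical.

```python
import bisect


def build_suffix_array(ref):
    """Naive construction (sorting all suffixes); fine only for a toy reference."""
    return sorted(range(len(ref)), key=lambda i: ref[i:])


def common_prefix_len(a, b):
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def maximal_mappable_prefix(read, ref, sa):
    """Return (length, ref_position) of the longest prefix of `read` found in `ref`."""
    suffixes = [ref[i:] for i in sa]  # materialised for clarity, not efficiency
    idx = bisect.bisect_left(suffixes, read)
    best = (0, -1)
    # The suffix sharing the longest prefix with `read` sits next to its sorted position.
    for j in (idx - 1, idx):
        if 0 <= j < len(sa):
            length = common_prefix_len(read, suffixes[j])
            if length > best[0]:
                best = (length, sa[j])
    return best


def sequential_mmp_seeds(read, ref):
    """Split a read into exact seeds: map the MMP, then restart just after it."""
    sa = build_suffix_array(ref)
    seeds, start = [], 0
    while start < len(read):
        length, pos = maximal_mappable_prefix(read[start:], ref, sa)
        if length == 0:  # base absent from the reference: skip it in this sketch
            start += 1
            continue
        seeds.append((start, pos, length))  # (read offset, reference offset, length)
        start += length
    return seeds


ref = "ACGTAC" + "TTTTTT" + "GGATCC"  # exon 1, a 6-base "intron", exon 2
print(sequential_mmp_seeds("ACGTACGGATCC", ref))
```

The read maps as two 6-base seeds at reference offsets 0 and 12; the 6-base gap between the end of the first seed and the start of the second is what a later stitching step would interpret as a spliced intron.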

Open access · Journal Article · DOI: 10.1186/GB-2009-10-3-R25
04 Mar 2009 · Genome Biology
Abstract: Bowtie is an ultrafast, memory-efficient alignment program for aligning short DNA sequence reads to large genomes. For the human genome, Burrows-Wheeler indexing allows Bowtie to align more than 25 million reads per CPU hour with a memory footprint of approximately 1.3 gigabytes. Bowtie extends previous Burrows-Wheeler techniques with a novel quality-aware backtracking algorithm that permits mismatches. Multiple processor cores can be used simultaneously to achieve even greater alignment speeds. Bowtie is open source

Topics: Hybrid genome assembly (51%)

18,079 Citations
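Burrows-Wheeler indexing, which the Bowtie abstract builds on, supports exact matching by "backward search": the pattern is consumed from its last character to its first, narrowing a range of suffix-array rows using a count table C and occurrence counts over the BWT string. The sketch below shows only that exact-match search. It builds the index naively and stores full occurrence tables and the full suffix array; Bowtie's quality-aware backtracking for mismatches and its compact, sampled index structures are not shown. Function names are hypothetical.

```python
def bwt_index(text):
    """Build the suffix array, C table and occurrence counts for text + '$'."""
    text += "$"  # unique terminator that sorts before A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # text[-1] is '$' when i == 0
    alphabet = sorted(set(text))
    # C[c]: number of characters in text lexicographically smaller than c
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += text.count(c)
    # occ[c][k]: occurrences of c in bwt[:k]
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for k, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][k + 1] = occ[c][k] + (ch == c)
    return sa, C, occ


def backward_search(pattern, sa, C, occ):
    """Return sorted start positions of exact occurrences of `pattern`."""
    lo, hi = 0, len(sa)  # half-open range of suffix-array rows matching so far
    for c in reversed(pattern):
        if c not in C:
            return []
        lo = C[c] + occ[c][lo]
        hi = C[c] + occ[c][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


sa, C, occ = bwt_index("ACATACA")
print(backward_search("ACA", sa, C, occ))
```

Each pattern character costs two table lookups regardless of genome size, which is why this style of index scales to large references; the memory-saving tricks that make it fit in about 1.3 GB for the human genome are beyond this sketch.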

Open access · Journal Article · DOI: 10.1093/BIOINFORMATICS/BTQ033
Aaron R. Quinlan, Ira M. Hall · Institutions (1)
15 Mar 2010 · Bioinformatics
Abstract: Motivation: Testing for correlations between different sets of genomic features is a fundamental task in genomics research. However, searching for overlaps between features with existing web-based methods is complicated by the massive datasets that are routinely produced with current sequencing technologies. Fast and flexible tools are therefore required to ask complex questions of these data in an efficient manner. Results: This article introduces a new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format. BEDTools also supports the comparison of sequence alignments in BAM format to both BED and GFF features. The tools are extremely efficient and allow the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks. BEDTools can be combined with one another as well as with standard UNIX commands, thus facilitating routine genomics tasks as well as pipelines that can quickly answer intricate questions of large genomic datasets. Availability and implementation: BEDTools was written in C++. Source code and a comprehensive user manual are freely available at

Topics: Software suite (52%), Source code (50%)

14,088 Citations
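The core BEDTools question, which features overlap, depends on BED's coordinate convention: intervals are 0-based and half-open, so [start, end) and [b_start, b_end) on the same chromosome overlap exactly when start < b_end and b_start < end, and intervals that merely touch do not overlap. The sketch below applies that rule and reports the overlapping portion of each pair, similar in spirit to a default `bedtools intersect` run but not a reimplementation of it (no strand handling, extra columns, or the tool's faster sorted-input algorithms). The helper names are hypothetical.

```python
from collections import defaultdict


def parse_bed(lines):
    """Yield (chrom, start, end) from BED lines; BED coordinates are 0-based, half-open."""
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment and track/browser header lines
        chrom, start, end = line.split("\t")[:3]
        yield chrom, int(start), int(end)


def intersect(a_features, b_features):
    """Report (chrom, start, end) for the overlapping portion of each A/B pair."""
    by_chrom = defaultdict(list)
    for chrom, start, end in b_features:
        by_chrom[chrom].append((start, end))
    for chrom in by_chrom:
        by_chrom[chrom].sort()
    hits = []
    for chrom, a_start, a_end in a_features:
        for b_start, b_end in by_chrom.get(chrom, []):
            if b_start >= a_end:
                break  # B is sorted by start, so no later interval can overlap
            if b_end > a_start:  # half-open overlap test
                hits.append((chrom, max(a_start, b_start), min(a_end, b_end)))
    return hits


a = list(parse_bed(["chr1\t100\t200\tgeneA\n"]))
b = list(parse_bed(["chr1\t150\t250\n", "chr1\t200\t300\n", "chr2\t0\t500\n"]))
print(intersect(a, b))
```

Only chr1:150-200 is reported: the B interval starting at 200 touches A's end but, under half-open coordinates, shares no base with it. The inner loop is still quadratic in the worst case; a sweep over two sorted files avoids that.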
