Data exploration, quality control and testing in single-cell qPCR-based gene expression experiments

01 Feb 2013, Bioinformatics (Oxford University Press), Vol. 29, Iss. 4, pp. 461-467
TL;DR: The authors propose a statistical model that accounts for the fact that, at the single-cell level, a gene is either on (a continuous expression measure is recorded) or off (the recorded expression is zero), and they derive from it a combined likelihood ratio test for differential expression.
Abstract:

Motivation: Cell populations are never truly homogeneous; individual cells exist in biochemical states that define functional differences between them. New technology based on microfluidic arrays combined with multiplexed quantitative polymerase chain reactions now enables high-throughput single-cell gene expression measurement, allowing assessment of cellular heterogeneity. However, few analytic tools have been developed specifically for the statistical and analytical challenges of single-cell quantitative polymerase chain reaction data.

Results: We present a statistical framework for the exploration, quality control and analysis of single-cell gene expression data from microfluidic arrays. We assess accuracy and within-sample heterogeneity of single-cell expression and develop quality control criteria to filter unreliable cell measurements. We propose a statistical model accounting for the fact that genes at the single-cell level can be on (and a continuous expression measure is recorded) or dichotomously off (and the recorded expression is zero). Based on this model, we derive a combined likelihood ratio test for differential expression that incorporates both the discrete and continuous components. Using an experiment that examines treatment-specific changes in expression, we show that this combined test is more powerful than either the continuous or dichotomous component in isolation, or a t-test on the zero-inflated data. Although developed for measurements from a specific platform (Fluidigm), these tools are generalizable to other multi-parametric measures over large numbers of events.

Availability: All results presented here were obtained using the SingleCellAssay R package available on GitHub (http://github.com/RGLab/SingleCellAssay).

Contact: rgottard@fhcrc.org

Supplementary information: Supplementary data are available at Bioinformatics online.
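
The combined likelihood ratio test described in the abstract can be illustrated with a minimal Python sketch. It is not the SingleCellAssay implementation. It assumes a two-group comparison for a single gene, a Bernoulli model for whether the gene is detected, and a normal model with a shared variance for the positive expression values. The names `combined_lrt` and `CombinedLRT` are hypothetical. Under these assumptions the two statistics are added and compared against a chi-square distribution with two degrees of freedom; when the continuous part cannot be estimated, the sketch falls back to the discrete part with one degree of freedom.

```python
import math
from typing import NamedTuple, Sequence


class CombinedLRT(NamedTuple):
    discrete: float    # LRT statistic for the on/off (detection frequency) part
    continuous: float  # LRT statistic for the mean of the positive values
    statistic: float   # sum of the two components
    df: int            # degrees of freedom used for the p-value
    p_value: float


def _xlogy(x: float, y: float) -> float:
    """Return x * log(y), using the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def _bernoulli_loglik(k: int, n: int) -> float:
    """Maximized Bernoulli log-likelihood for k expressing cells out of n."""
    p = k / n
    return _xlogy(k, p) + _xlogy(n - k, 1.0 - p)


def _rss(values: Sequence[float], center: float) -> float:
    """Residual sum of squares of values around center."""
    return sum((v - center) ** 2 for v in values)


def combined_lrt(group_a: Sequence[float], group_b: Sequence[float]) -> CombinedLRT:
    """Two-group hurdle-style test; a value of 0 means the gene was not detected."""
    if not group_a or not group_b:
        raise ValueError("both groups need at least one cell")
    pos_a = [v for v in group_a if v > 0]
    pos_b = [v for v in group_b if v > 0]

    # Discrete part: does the fraction of expressing cells differ between groups?
    discrete = 2.0 * (
        _bernoulli_loglik(len(pos_a), len(group_a))
        + _bernoulli_loglik(len(pos_b), len(group_b))
        - _bernoulli_loglik(len(pos_a) + len(pos_b), len(group_a) + len(group_b))
    )
    discrete = max(0.0, discrete)  # guard against tiny negative rounding error

    # Continuous part: does the mean of the positive values differ
    # (assumption: normal errors with one variance shared by both groups)?
    continuous, df = 0.0, 1
    if pos_a and pos_b:
        pooled = pos_a + pos_b
        rss_null = _rss(pooled, sum(pooled) / len(pooled))
        rss_alt = _rss(pos_a, sum(pos_a) / len(pos_a)) + _rss(pos_b, sum(pos_b) / len(pos_b))
        if rss_alt > 0:
            continuous = max(0.0, len(pooled) * math.log(rss_null / rss_alt))
            df = 2

    statistic = discrete + continuous
    # Chi-square survival function in closed form for df = 2 and df = 1.
    if df == 2:
        p_value = math.exp(-statistic / 2.0)
    else:
        p_value = math.erfc(math.sqrt(statistic / 2.0))
    return CombinedLRT(discrete, continuous, statistic, df, p_value)


if __name__ == "__main__":
    untreated = [0, 0, 0, 0, 0, 0, 1.0, 1.1]
    treated = [3.0, 3.2, 2.9, 3.1, 3.0, 2.8, 0, 3.3]
    print(combined_lrt(untreated, treated))
```

Because the statistic is the sum of both parts, a gene whose detection frequency and positive expression level both shift between groups contributes evidence from each, which is the intuition behind the power gain the abstract reports over testing either component alone.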