Feature Selection and Dimension Reduction for Single Cell RNA-Seq based on a Multinomial Model
Citations
Eleven grand challenges in single-cell data science
The art of using t-SNE for single-cell transcriptomics.
Orchestrating Single-Cell Analysis with Bioconductor
muscat detects subpopulation-specific state transitions from multi-sample multi-condition single-cell transcriptomics data.
Identifying gene expression programs of cell-type identity and cellular activity with single-cell RNA-Seq
References
Integrating single-cell transcriptomic data across different conditions, technologies, and species.
Highly Parallel Genome-wide Expression Profiling of Individual Cells Using Nanoliter Droplets
Massively parallel digital transcriptional profiling of single cells
Related Papers (5)
Computational and analytical challenges in single-cell transcriptomics
Frequently Asked Questions (10)
Q2. What is the process of lysing a cell?
When the cell is processed by a scRNA-Seq protocol, it is lysed, then some fraction of the transcripts are captured by beads within the droplets.
Q3. What is the way to model a high dimensional dataset?
Any high dimensional, sparse dataset where samples contain only relative information in the form of counts may conceivably be modeled by the multinomial distribution.
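As a minimal sketch of what "relative information in the form of counts" means, the snippet below draws one multinomial sample with NumPy. The five gene proportions and the total of 50 counts are made-up illustrative values, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical relative abundances of 5 genes in one cell (must sum to 1).
true_proportions = np.array([0.70, 0.20, 0.07, 0.02, 0.01])

# Hypothetical total number of counts observed for this cell.
total_counts = 50

# One multinomial draw: a count per gene, constrained to sum to total_counts.
counts = rng.multinomial(total_counts, true_proportions)

# Only relative information is recoverable: counts / total estimates the proportions.
estimated_proportions = counts / counts.sum()
```

Because the total is fixed, the counts carry no information about absolute abundance, and rare genes will often be observed as zeros, which is where the sparsity comes from.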
Q4. What is the popular method for calculating the size factor for a cell?
A numerically stable and popular method is to set the size factor for each cell as the total counts divided by 10^6 (counts per million, CPM).
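The CPM size factor can be sketched in a few lines. The function names and the cells-by-genes layout of the toy matrix are assumptions made for illustration.

```python
import numpy as np

def cpm_size_factors(counts):
    """Size factor per cell: total counts divided by one million."""
    counts = np.asarray(counts, dtype=float)
    return counts.sum(axis=1) / 1e6

def normalize_cpm(counts):
    """Divide each cell (row) by its size factor to get counts per million."""
    counts = np.asarray(counts, dtype=float)
    size_factors = cpm_size_factors(counts)
    return counts / size_factors[:, None]

# Toy matrix: 2 cells (rows) by 3 genes (columns).
counts = np.array([[10, 0, 90],
                   [1, 1, 2]])
cpm = normalize_cpm(counts)
```

After normalization every row sums to one million. A cell with zero total counts would cause a division by zero, so such cells would need to be filtered out first.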
Q5. What is the probability of zeros in the biological replicates data?
After down-sampling replicates to 10,000 UMIs per droplet to remove variability due to the differences in sequencing depth, the fraction of zeros is computed for each gene and plotted against the log of expression across all samples for the technical replicates data.
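The down-sampling and zero-fraction steps could look roughly like the sketch below. It makes two assumptions that the text above does not state: counts are subsampled without replacement, and droplets already at or below the target depth are left unchanged. The toy example uses a target of 10 rather than 10,000.

```python
import numpy as np

def downsample_cell(counts, target, rng):
    """Subsample one droplet's counts to `target` total, without replacement."""
    counts = np.asarray(counts, dtype=np.int64)
    if counts.sum() <= target:
        # Assumption: droplets at or below the target are kept as they are.
        return counts.copy()
    return rng.multivariate_hypergeometric(counts, target)

def fraction_zeros_per_gene(matrix):
    """Fraction of droplets (rows) in which each gene (column) has zero counts."""
    return (np.asarray(matrix) == 0).mean(axis=0)

rng = np.random.default_rng(1)

# Toy matrix: 3 droplets (rows) by 3 genes (columns).
matrix = np.array([[30, 0, 10],
                   [5, 5, 0],
                   [100, 0, 0]])

down = np.vstack([downsample_cell(row, 10, rng) for row in matrix])
zero_fraction = fraction_zeros_per_gene(down)
```

The per-gene zero fractions can then be plotted against log expression; genes with no counts at all would have to be excluded before taking the log.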
Q6. What is the way to replace the normal null model with a multinomial?
The authors propose to replace the normal null model with a multinomial null model as a better match to the data-generating mechanism.
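One way a multinomial null could be turned into a per-gene score is a deviance statistic that measures how far each gene departs from constant relative expression across cells. The sketch below uses a per-gene binomial approximation to the multinomial. This choice, the function name, and the toy data are all assumptions for illustration and are not confirmed by the text above.

```python
import numpy as np

def binomial_deviance_per_gene(counts):
    """Deviance of each gene against a null of constant proportion across cells."""
    y = np.asarray(counts, dtype=float)   # cells x genes
    n = y.sum(axis=1, keepdims=True)      # total counts per cell
    pi = y.sum(axis=0) / n.sum()          # pooled proportion per gene under the null
    mu = n * pi                           # expected counts under the null

    def a_log_a_over_b(a, b):
        # a * log(a / b), using the convention 0 * log(0) = 0.
        out = np.zeros_like(a)
        mask = a > 0
        out[mask] = a[mask] * np.log(a[mask] / b[mask])
        return out

    observed_term = a_log_a_over_b(y, mu)
    remainder_term = a_log_a_over_b(n - y, n - mu)
    return 2.0 * (observed_term + remainder_term).sum(axis=0)

# Toy matrix: 2 cells by 3 genes. Gene 2 has the same proportion in both cells.
counts = np.array([[10, 0, 10],
                   [0, 10, 10]])
deviance = binomial_deviance_per_gene(counts)
```

In this toy example the third gene fits the null exactly and gets a deviance of zero, while the first two genes, which differ between the cells, get larger values. Ranking genes by such a score is one plausible route to feature selection.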
Q7. What is the framework for scRNA-Seq analysis?
The authors have outlined a statistical framework for analysis of scRNA-Seq data with UMI counts based on a multinomial model, providing effective and simple to compute methods for feature selection and dimension reduction.
Q8. What is the way to model biological variability in scRNA-Seq counts?
While the multinomial likelihood is ideal for modeling technical variability in scRNA-Seq UMI counts (Fig. 1), in many cases, there may be excess biological variability present as well.
Q9. What is the meaning of the term technical replicates negative control?
The authors refer to this dataset as the technical replicates negative control as there is no biological variability whatsoever, and in principle, each expression profile should be the same.
Q10. What is the argument that the Poisson model is insufficient to describe the sampling distribution of genes?
It has been argued that the Poisson model is insufficient to describe the sampling distribution of genes with high counts and the negative binomial model is more appropriate [11].
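For context, the standard mean-variance relationships of the two models show why the distinction matters mostly at high counts. The symbols below (mean \mu, negative binomial size parameter \theta) are notation chosen for this note, not taken from the source.

```latex
\begin{align}
\text{Poisson:} \quad & \operatorname{Var}(Y) = \mu \\
\text{Negative binomial:} \quad & \operatorname{Var}(Y) = \mu + \frac{\mu^{2}}{\theta}, \qquad \theta > 0
\end{align}
```

The extra term \mu^{2}/\theta is negligible when \mu is small and dominates when \mu is large, so the two models are nearly indistinguishable for low-count genes and diverge for high-count genes.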