Construction of continuously expandable single-cell atlases through integration of heterogeneous datasets in a generalized cell-embedding space
Citations
Dissecting the multicellular ecosystem of metastatic melanoma by single-cell RNA-seq
IMGG: Integrating Multiple Single-Cell Datasets through Connected Graphs and Generative Adversarial Networks
Application of Deep Learning on Single-cell RNA Sequencing Data Analysis: A Review
Integrating Multiple Single-Cell RNA Sequencing Datasets Using Adversarial Autoencoders
References
Fast, sensitive and accurate integration of single-cell data with Harmony.
From Louvain to Leiden: guaranteeing well-connected communities
Tackling the widespread and critical impact of batch effects in high-throughput data
Single-cell transcriptomics of 20 mouse organs creates a "Tabula Muris"
Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors.
Related Papers (5)
Evaluating the reproducibility of single-cell gene regulatory network inference algorithms
pcaReduce: hierarchical clustering of single cell transcriptional profiles
Frequently Asked Questions (13)
Q2. What are the main features of the scATAC-seq technologies?
Single-cell RNA sequencing (scRNA-seq) and assay for transposase-accessible chromatin using sequencing (scATAC-seq) technologies enable decomposition of diverse cell-types and states to elucidate their function and regulation in tissues and heterogeneous systems1-4.
Q3. What is the way to integrate scATAC data?
SCALEX can be used to integrate scATAC-seq data as well as cross-modality data (e.g. scRNA-seq and scATAC-seq) (Methods).
Q4. What is the way to integrate a cell-embedding space?
The accurate, scalable, and efficient integration performance of SCALEX depends on its encoder’s capacity to project cells from various sources into a generalized, batch-invariant cell-embedding space.
Q5. How did the authors visualize the integration performance of all methods?
The authors used Uniform Manifold Approximation and Projection (UMAP)36 embeddings to visualize the integration performance of all methods (Methods).
Q6. What is the composition of the COVID-19 dataset?
COVID-19 dataset composition, including healthy controls and influenza patients, as well as mild/moderate, severe, and convalescent COVID-19 patients.
Q7. Why did Seurat v3 and Harmony achieve the integration performance?
Seurat v3 and Harmony may have obtained a high batch entropy mixing score because of misaligning different cell-types together.
Q8. What is the composition of the COVID-19 atlas?
COVID-19 dataset composition, including healthy controls and influenza patients, as well as mild/moderate, severe, and convalescent COVID-19 patients.
Q9. What did the authors use to visualize the integration performance of the raw datasets?
Note that all of the raw datasets displayed strong batch effects: cell-types that were common in different batches were separately distributed.
Q10. What is the corresponding expression level of the marker?
Dot plot of canonical markers of cell-types of reference pancreas dataset; dot color represents average expression level, while dot size represents the proportion of cells in the group expressing the marker.
Q11. What datasets were used to build a single cell atlas?
The authors applied SCALEX integration to two large and complex datasets—the mouse atlas dataset (comprising multiple organs from two studies assayed by 10X, Smart-seq2, and Microwell-seq6,51) (Fig. 4a) and the human atlas dataset (comprising multiple organs from two studies assayed by 10X and Microwell-seq39,52).
Q12. What was the function used to normalize the total counts of cells?
Total counts of each cell were normalized to the median of the total counts of all cells by using the normalize_total function, with parameter target_sum=None, in the Scanpy69 package.
Q13. How did the UMAP score evaluate batch mixing?
By only considering the degree of batch mixing but ignoring cell-type differences, the batch entropy mixing score is not ideally suited for assessing batch mixing for partially-overlapping datasets.
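To make this limitation concrete, here is a simplified sketch of a neighborhood-based batch entropy score. It is not the exact score used in the paper: the function `batch_entropy_mixing`, the brute-force neighbor search, the choice of `k`, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def batch_entropy_mixing(embedding, batches, k=15):
    """Mean Shannon entropy of batch labels among each cell's k nearest neighbors."""
    embedding = np.asarray(embedding, dtype=float)
    labels, codes = np.unique(np.asarray(batches), return_inverse=True)
    n = embedding.shape[0]
    k = min(k, n - 1)
    # Brute-force pairwise squared distances (fine for small illustrative inputs).
    sq = ((embedding[:, None, :] - embedding[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(sq, np.inf)  # a cell is not its own neighbor
    neighbors = np.argsort(sq, axis=1)[:, :k]
    entropies = np.empty(n)
    for i in range(n):
        freq = np.bincount(codes[neighbors[i]], minlength=len(labels)) / k
        nonzero = freq[freq > 0]
        entropies[i] = -(nonzero * np.log(nonzero)).sum()
    return float(entropies.mean())

# Toy 1-D embeddings: batches interleaved vs. batches far apart.
mixed_emb = np.arange(20, dtype=float)[:, None]
mixed_batches = np.array(["A", "B"] * 10)
sep_emb = np.concatenate([np.arange(10.0), 100.0 + np.arange(10.0)])[:, None]
sep_batches = np.array(["A"] * 10 + ["B"] * 10)

mixed_score = batch_entropy_mixing(mixed_emb, mixed_batches, k=3)
separated_score = batch_entropy_mixing(sep_emb, sep_batches, k=3)
```

The function never sees cell-type labels, so it rewards any overlap between batches. If two batches contain different cell-types that an integration method wrongly places on top of each other, a score of this kind still rises, which is the concern raised for partially-overlapping datasets.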