Author

Huidong Chen

Other affiliations: Broad Institute, Tongji University
Bio: Huidong Chen is an academic researcher from Harvard University. The author has contributed to research in topics: Cluster analysis & Data pre-processing. The author has an h-index of 10, co-authored 18 publications receiving 681 citations. Previous affiliations of Huidong Chen include Broad Institute & Tongji University.

Papers
Journal ArticleDOI
TL;DR: Application of GiniClust to public single-cell RNA-seq datasets uncovers previously unrecognized rare cell types, including Zscan4-expressing cells within mouse embryonic stem cells and hemoglobin-expressing cells in the mouse cortex and hippocampus.
Abstract: High-throughput single-cell technologies have great potential to discover new cell types; however, it remains challenging to detect rare cell types that are distinct from a large population. We present a novel computational method, called GiniClust, to overcome this challenge. Validation against a benchmark dataset indicates that GiniClust achieves high sensitivity and specificity. Application of GiniClust to public single-cell RNA-seq datasets uncovers previously unrecognized rare cell types, including Zscan4-expressing cells within mouse embryonic stem cells and hemoglobin-expressing cells in the mouse cortex and hippocampus. GiniClust also correctly detects a small number of normal cells that are mixed in a cancer cell population.

240 citations

Journal ArticleDOI
TL;DR: A benchmarking framework is presented that is applied to 10 computational methods for scATAC-seq on 13 synthetic and real datasets from different assays, profiling cell types from diverse tissues and organisms.
Abstract: Recent innovations in single-cell Assay for Transposase Accessible Chromatin using sequencing (scATAC-seq) enable profiling of the epigenetic landscape of thousands of individual cells. scATAC-seq data analysis presents unique methodological challenges. scATAC-seq experiments sample DNA, which, due to low copy numbers (diploid in humans), leads to inherent data sparsity (1–10% of peaks detected per cell) compared to transcriptomic (scRNA-seq) data (10–45% of expressed genes detected per cell). Such challenges in data generation emphasize the need for informative features to assess cell heterogeneity at the chromatin level. We present a benchmarking framework that is applied to 10 computational methods for scATAC-seq on 13 synthetic and real datasets from different assays, profiling cell types from diverse tissues and organisms. Methods for processing and featurizing scATAC-seq data were compared by their ability to discriminate cell types when combined with common unsupervised clustering approaches. We rank evaluated methods and discuss computational challenges associated with scATAC-seq analysis including inherently sparse data, determination of features, peak calling, the effects of sequencing coverage and noise, and clustering performance. Running times and memory requirements are also discussed. This reference summary of scATAC-seq methods offers recommendations for best practices with consideration for both the non-expert user and the methods developer. Despite variation across methods and datasets, SnapATAC, Cusanovich2018, and cisTopic outperform other methods in separating cell populations of different coverages and noise levels in both synthetic and real datasets. Notably, SnapATAC is the only method able to analyze a large dataset (> 80,000 cells).

225 citations

Journal ArticleDOI
TL;DR: STREAM is a pipeline for reconstruction and visualization of differentiation trajectories from both single-cell RNA-seq and ATAC-seq data and its utility for understanding myoblast differentiation and disentangling known heterogeneity in hematopoiesis for different organisms is demonstrated.
Abstract: Single-cell transcriptomic assays have enabled the de novo reconstruction of lineage differentiation trajectories, along with the characterization of cellular heterogeneity and state transitions. Several methods have been developed for reconstructing developmental trajectories from single-cell transcriptomic data, but efforts on analyzing single-cell epigenomic data and on trajectory visualization remain limited. Here we present STREAM, an interactive pipeline capable of disentangling and visualizing complex branching trajectories from both single-cell transcriptomic and epigenomic data. We have tested STREAM on several synthetic and real datasets generated with different single-cell technologies. We further demonstrate its utility for understanding myoblast differentiation and disentangling known heterogeneity in hematopoiesis for different organisms. STREAM is an open-source software package. The increasing accessibility of single cell omics technologies beyond transcriptomics demands parallel advances in analysis. Here, the authors introduce STREAM, a pipeline for reconstruction and visualization of differentiation trajectories from both single-cell RNA-seq and ATAC-seq data.

196 citations

Journal ArticleDOI
TL;DR: This work provides the first, comprehensive, single-cell, transcriptomic analysis of kidney and marrow cells in the adult zebrafish, uncovering novel cell types including two classes of natural killer immune cells, classically defined and erythroid-primed hematopoietic stem and progenitor cells, mucin-secreting kidney cells, and kidney stem/progenitor cells.
Abstract: Recent advances in single-cell, transcriptomic profiling have provided unprecedented access to investigate cell heterogeneity during tissue and organ development. In this study, we used massively parallel, single-cell RNA sequencing to define cell heterogeneity within the zebrafish kidney marrow, constructing a comprehensive molecular atlas of definitive hematopoiesis and functionally distinct renal cells found in adult zebrafish. Because our method analyzed blood and kidney cells in an unbiased manner, our approach was useful in characterizing immune-cell deficiencies within DNA-protein kinase catalytic subunit (prkdc), interleukin-2 receptor γ a (il2rga), and double-homozygous-mutant fish, identifying blood cell losses in T, B, and natural killer cells within specific genetic mutants. Our analysis also uncovered novel cell types, including two classes of natural killer immune cells, classically defined and erythroid-primed hematopoietic stem and progenitor cells, mucin-secreting kidney cells, and kidney stem/progenitor cells. In total, our work provides the first, comprehensive, single-cell, transcriptomic analysis of kidney and marrow cells in the adult zebrafish.

128 citations

Journal ArticleDOI
TL;DR: This work uses high throughput single-cell RNA-sequencing (scRNA-seq), based on optimized microfluidic circuits, to profile early differentiation lineages in the human embryoid body system and reveals the cellular-state landscape of hPSC early differentiation.
Abstract: Human pluripotent stem cells (hPSCs) provide powerful models for studying cellular differentiations and unlimited sources of cells for regenerative medicine. However, a comprehensive single-cell level differentiation roadmap for hPSCs has not been achieved. We use high throughput single-cell RNA-sequencing (scRNA-seq), based on optimized microfluidic circuits, to profile early differentiation lineages in the human embryoid body system. We present a cellular-state landscape for hPSC early differentiation that covers multiple cellular lineages, including neural, muscle, endothelial, stromal, liver, and epithelial cells. Through pseudotime analysis, we construct the developmental trajectories of these progenitor cells and reveal the gene expression dynamics in the process of cell differentiation. We further reprogram primed H9 cells into naive-like H9 cells to study the cellular-state transition process. We find that genes related to hemogenic endothelium development are enriched in naive-like H9. Functionally, naive-like H9 show higher potency for differentiation into hematopoietic lineages than primed cells. Our single-cell analysis reveals the cellular-state landscape of hPSC early differentiation, offering new insights that can be harnessed for optimization of differentiation protocols.

83 citations


Cited by
Journal ArticleDOI
TL;DR: On a compendium of single-cell data from tumors and brain, it is demonstrated that cis-regulatory analysis can be exploited to guide the identification of transcription factors and cell states.
Abstract: We present SCENIC, a computational method for simultaneous gene regulatory network reconstruction and cell-state identification from single-cell RNA-seq data (http://scenic.aertslab.org). On a compendium of single-cell data from tumors and brain, we demonstrate that cis-regulatory analysis can be exploited to guide the identification of transcription factors and cell states. SCENIC provides critical biological insights into the mechanisms driving cellular heterogeneity.

2,277 citations

25 Apr 2017
TL;DR: This presentation is a case study taken from the travel and holiday industry and describes the effectiveness of various techniques as well as the performance of Python-based libraries such as Python Data Analysis Library (Pandas), and Scikit-learn (built on NumPy, SciPy and matplotlib).
Abstract: This presentation is a case study taken from the travel and holiday industry. Paxport/Multicom, based in UK and Sweden, have recently adopted a recommendation system for holiday accommodation bookings. Machine learning techniques such as Collaborative Filtering have been applied using Python (3.5.1), with Jupyter (4.0.6) as the main framework. Data scale and sparsity present significant challenges in the case study, and so the effectiveness of various techniques is described as well as the performance of Python-based libraries such as Python Data Analysis Library (Pandas), and Scikit-learn (built on NumPy, SciPy and matplotlib). The presentation is suitable for all levels of programmers.

1,338 citations

Journal ArticleDOI
TL;DR: It is demonstrated that SC3 is capable of identifying subclones from the transcriptomes of neoplastic cells collected from patients and achieves high accuracy and robustness by combining multiple clustering solutions through a consensus approach.
Abstract: Single-cell RNA-seq enables the quantitative characterization of cell types based on global transcriptome profiles. We present single-cell consensus clustering (SC3), a user-friendly tool for unsupervised clustering, which achieves high accuracy and robustness by combining multiple clustering solutions through a consensus approach (http://bioconductor.org/packages/SC3). We demonstrate that SC3 is capable of identifying subclones from the transcriptomes of neoplastic cells collected from patients.

1,120 citations

Posted ContentDOI
31 May 2017-bioRxiv
TL;DR: SCENIC (Single Cell rEgulatory Network Inference and Clustering) is the first method to analyze scRNA-seq data using a network-centric, rather than cell-centric approach and allows for the simultaneous tracing of genomic regulatory programs and the mapping of cellular identities emerging from these programs.
Abstract: Single-cell RNA-seq allows building cell atlases of any given tissue and inferring the dynamics of cellular state transitions during developmental or disease trajectories. Both the maintenance and transitions of cell states are encoded by regulatory programs in the genome sequence. However, this regulatory code has not yet been exploited to guide the identification of cellular states from single-cell RNA-seq data. Here we describe a computational resource, called SCENIC (Single Cell rEgulatory Network Inference and Clustering), for the simultaneous reconstruction of gene regulatory networks (GRNs) and the identification of stable cell states, using single-cell RNA-seq data. SCENIC outperforms existing approaches at the level of cell clustering and transcription factor identification. Importantly, we show that cell state identification based on GRNs is robust towards batch effects and technical biases. We applied SCENIC to a compendium of single-cell data from the mouse and human brain and demonstrate that the proper combinations of transcription factors, target genes, enhancers, and cell types can be identified. Moreover, we used SCENIC to map the cell state landscape in melanoma and identified a gene regulatory network underlying a proliferative melanoma state driven by MITF and STAT and a contrasting network controlling an invasive state governed by NFATC2 and NFIB. We further validated these predictions by showing that two transcription factors are predominantly expressed in early metastatic sentinel lymph nodes. In summary, SCENIC is the first method to analyze scRNA-seq data using a network-centric, rather than cell-centric approach. SCENIC is generic, easy to use, and flexible, and allows for the simultaneous tracing of genomic regulatory programs and the mapping of cellular identities emerging from these programs. Availability: SCENIC is available as an R workflow based on three new R/Bioconductor packages: GENIE3, RcisTarget and AUCell. As a scalable alternative to GENIE3, we also provide GRNboost, paving the way towards the network analysis across millions of single cells.

1,101 citations

Journal ArticleDOI
07 Jun 2017-Nature
TL;DR: It is shown that human melanoma cells can display profound transcriptional variability at the single-cell level that predicts which cells will ultimately resist drug treatment, and this work reveals the multistage nature of the acquisition of drug resistance and provides a framework for understanding resistance dynamics in single cells.
Abstract: Through drug exposure, a rare, transient transcriptional program characterized by high levels of expression of known resistance drivers can get ‘burned in’, leading to the selection of cells endowed with a transcriptional drug resistance and thus more chemoresistant cancers. Therapies that target signalling molecules that are mutated in cancers can often have substantial short-term effects, but the emergence of resistant cancer cells is a major barrier to full cures. Resistance can result from secondary mutations, but in other cases there is no clear genetic cause, raising the possibility of non-genetic rare cell variability. Here we show that human melanoma cells can display profound transcriptional variability at the single-cell level that predicts which cells will ultimately resist drug treatment. This variability involves infrequent, semi-coordinated transcription of a number of resistance markers at high levels in a very small percentage of cells. The addition of drug then induces epigenetic reprogramming in these cells, converting the transient transcriptional state to a stably resistant state. This reprogramming begins with a loss of SOX10-mediated differentiation followed by activation of new signalling pathways, partially mediated by the activity of the transcription factors JUN and/or AP-1 and TEAD. Our work reveals the multistage nature of the acquisition of drug resistance and provides a framework for understanding resistance dynamics in single cells. We find that other cell types also exhibit sporadic expression of many of these same marker genes, suggesting the existence of a general program in which expression is displayed in rare subpopulations of cells.

854 citations