Journal ArticleDOI

SC3: consensus clustering of single-cell RNA-seq data

TL;DR: It is demonstrated that SC3 is capable of identifying subclones from the transcriptomes of neoplastic cells collected from patients and achieves high accuracy and robustness by combining multiple clustering solutions through a consensus approach.
Abstract: Single-cell RNA-seq enables the quantitative characterization of cell types based on global transcriptome profiles. We present single-cell consensus clustering (SC3), a user-friendly tool for unsupervised clustering, which achieves high accuracy and robustness by combining multiple clustering solutions through a consensus approach (http://bioconductor.org/packages/SC3). We demonstrate that SC3 is capable of identifying subclones from the transcriptomes of neoplastic cells collected from patients.
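The consensus idea in the abstract can be illustrated with a minimal sketch. This is not SC3's implementation: the function names, the toy labelings, and the use of complete-linkage merging on the consensus matrix are assumptions made for illustration. It shows how several clustering solutions for the same cells can be combined into one.

```python
from itertools import combinations


def consensus_matrix(labelings):
    """Fraction of clustering runs in which each pair of cells shares a cluster."""
    n = len(labelings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(labelings) for c in row] for row in counts]


def consensus_clusters(matrix, k):
    """Merge cells bottom-up (complete linkage, an assumption) until k clusters remain."""
    clusters = [[i] for i in range(len(matrix))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Distance between two clusters = largest (1 - consensus) over their cell pairs.
            dist = max(1.0 - matrix[i][j] for i in clusters[a] for j in clusters[b])
            if best is None or dist < best[0]:
                best = (dist, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = [0] * len(matrix)
    for cluster_id, members in enumerate(clusters):
        for i in members:
            labels[i] = cluster_id
    return labels


# Hypothetical toy input: three clustering runs over six cells.
# The second run uses permuted label names; the third misplaces cell 2.
runs = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 1],
]
print(consensus_clusters(consensus_matrix(runs), k=2))  # [0, 0, 0, 1, 1, 1]
```

Because the consensus matrix only records whether two cells were grouped together, it is unaffected by how each individual run happens to name its clusters, and a single run's misassignment is outvoted by the others.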


Citations
Journal ArticleDOI
TL;DR: An analytical strategy for integrating scRNA-seq data sets based on common sources of variation is introduced, enabling the identification of shared populations across data sets and downstream comparative analysis.
Abstract: Computational single-cell RNA-seq (scRNA-seq) methods have been successfully applied to experiments representing a single condition, technology, or species to discover and define cellular phenotypes. However, identifying subpopulations of cells that are present across multiple data sets remains challenging. Here, we introduce an analytical strategy for integrating scRNA-seq data sets based on common sources of variation, enabling the identification of shared populations across data sets and downstream comparative analysis. We apply this approach, implemented in our R toolkit Seurat (http://satijalab.org/seurat/), to align scRNA-seq data sets of peripheral blood mononuclear cells under resting and stimulated conditions, hematopoietic progenitors sequenced using two profiling technologies, and pancreatic cell 'atlases' generated from human and mouse islets. In each case, we learn distinct or transitional cell states jointly across data sets, while boosting statistical power through integrated analysis. Our approach facilitates general comparisons of scRNA-seq data sets, potentially deepening our understanding of how distinct cell states respond to perturbation, disease, and evolution.

7,741 citations

Journal ArticleDOI
TL;DR: On a compendium of single-cell data from tumors and brain, it is demonstrated that cis-regulatory analysis can be exploited to guide the identification of transcription factors and cell states.
Abstract: We present SCENIC, a computational method for simultaneous gene regulatory network reconstruction and cell-state identification from single-cell RNA-seq data (http://scenic.aertslab.org). On a compendium of single-cell data from tumors and brain, we demonstrate that cis-regulatory analysis can be exploited to guide the identification of transcription factors and cell states. SCENIC provides critical biological insights into the mechanisms driving cellular heterogeneity.

2,277 citations

Journal ArticleDOI
Aviv Regev, Sarah A. Teichmann, Eric S. Lander, Ido Amit, Christophe Benoist, Ewan Birney, Bernd Bodenmiller, Peter J. Campbell, Piero Carninci, Menna R. Clatworthy, Hans Clevers, Bart Deplancke, Ian Dunham, James Eberwine, Roland Eils, Wolfgang Enard, Andrew Farmer, Lars Fugger, Berthold Göttgens, Nir Hacohen, Muzlifah Haniffa, Martin Hemberg, Seung K. Kim, Paul Klenerman, Arnold R. Kriegstein, Ed S. Lein, Sten Linnarsson, Emma Lundberg, Joakim Lundeberg, Partha P. Majumder, John C. Marioni, Miriam Merad, Musa M. Mhlanga, Martijn C. Nawijn, Mihai G. Netea, Garry P. Nolan, Dana Pe'er, Anthony Phillipakis, Chris P. Ponting, Stephen R. Quake, Wolf Reik, Orit Rozenblatt-Rosen, Joshua R. Sanes, Rahul Satija, Ton N. Schumacher, Alex K. Shalek, Ehud Shapiro, Padmanee Sharma, Jay W. Shin, Oliver Stegle, Michael R. Stratton, Michael J. T. Stubbington, Fabian J. Theis, Matthias Uhlen, Alexander van Oudenaarden, Allon Wagner, Fiona M. Watt, Jonathan S. Weissman, Barbara J. Wold, Ramnik J. Xavier, Nir Yosef, Human Cell Atlas Meeting Participants
05 Dec 2017-eLife
TL;DR: An open comprehensive reference map of the molecular state of cells in healthy human tissues would propel the systematic study of physiological states, developmental trajectories, regulatory circuitry and interactions of cells, and also provide a framework for understanding cellular dysregulation in human disease.
Abstract: The recent advent of methods for high-throughput single-cell molecular profiling has catalyzed a growing sense in the scientific community that the time is ripe to complete the 150-year-old effort to identify all cell types in the human body. The Human Cell Atlas Project is an international collaborative effort that aims to define all human cell types in terms of distinctive molecular profiles (such as gene expression profiles) and to connect this information with classical cellular descriptions (such as location and morphology). An open comprehensive reference map of the molecular state of cells in healthy human tissues would propel the systematic study of physiological states, developmental trajectories, regulatory circuitry and interactions of cells, and also provide a framework for understanding cellular dysregulation in human disease. Here we describe the idea, its potential utility, early proofs-of-concept, and some design considerations for the Human Cell Atlas, including a commitment to open data, code, and community.

1,391 citations


Cites background from "SC3: consensus clustering of single..."

  • ...…(Patel et al., 2014; Tirosh et al., 2016a; Tirosh et al., 2016b) – and related them to each other, showing, for example, that only stem-like cells proliferate in low-grade glioma (Tirosh et al., 2016b) and that individual sub-clones can be readily identified in one patient (Kiselev et al., 2017)....

    [...]

  • ...…of melanoma (Tirosh et al., 2016a), glioblastoma (Patel et al., 2014), low-grade glioma (Tirosh et al., 2016b), and myeloproliferative neoplasms (Kiselev et al., 2017), single-cell RNA-seq of fresh tumors resected directly from patients readily distinguished among malignant, immune, stromal and endothelial cells....

    [...]

Journal ArticleDOI
TL;DR: The concept of ensemble learning is introduced, traditional, novel and state‐of‐the‐art ensemble methods are reviewed and current challenges and trends in the field are discussed.
Abstract: Ensemble methods are considered the state-of-the-art solution for many machine learning challenges. Such methods improve the predictive performance of a single model by training multiple models and combining their predictions. This paper introduces the concept of ensemble learning, reviews traditional, novel and state-of-the-art ensemble methods and discusses current challenges and trends in the field.

1,381 citations


Cites methods from "SC3: consensus clustering of single..."

  • ...Consensus clustering has been shown to be very effective in discovering biologically meaningful clusters in gene expression data (Kiselev et al., 2017; Monti et al., 2003; Verhaak et al., 2010), video shot segmentation (Chang, Lee, Hong, & Archibald, 2008; Zheng, Zhang, & Li, 2012), online event…...

    [...]

Journal ArticleDOI
15 Jun 2017-Cell
TL;DR: Deep single-cell RNA sequencing on 5,063 single T cells isolated from peripheral blood, tumor, and adjacent normal tissues from six hepatocellular carcinoma patients enables us to identify 11 T cell subsets based on their molecular and functional properties and delineate their developmental trajectory.

1,232 citations


Cites background or methods from "SC3: consensus clustering of single..."

  • ...For each SC3 run, the silhouette was calculated, the consensus matrix plotted, and cluster-specific genes identified....

    [...]

  • ...SC3's parameter k, which was used in the k-means and hierarchical clustering, was chosen from 2 to 10 iteratively....

    [...]

  • ...T Cell Clustering and Subtype Analysis: To reveal the intrinsic structure and potential functional subtypes of the overall T cell populations, we performed unsupervised clustering of all T cells using the spectral clustering method implemented in SC3 (Kiselev et al., 2017)....

    [...]

  • ...Based on the SC3 cluster analysis, tumor-infiltrating CD8+ T cells in C4_CD8-LAYN cluster were defined as exhausted T cells, while others as non-exhausted T cells....

    [...]

  • ...REAGENT or RESOURCE | SOURCE | IDENTIFIER
    Antibodies
    Anti-Human CD3 eFluor 450 (FACS) | eBioscience | Cat#48-0037-41
    Anti-Human CD4 FITC (FACS) | eBioscience | Cat#11-0048-41
    Anti-Human CD8a APC (FACS) | eBioscience | Cat#17-0086-41
    Anti-Human CD25 PE (FACS) | eBioscience | Cat#12-0259-42
    Human Layilin Antibody (FACS) | Sino Biological | Cat#10208-MM02
    7-AAD Viability Staining Solution (FACS) | eBioscience | Cat#00-6993-50
    Anti-CD3 antibody (IHC) | Abcam | Cat#ab16669
    Anti-CD4 antibody (IHC) | Abcam | Cat#ab846
    Anti-CD8 antibody (IHC) | Abcam | Cat#ab17147
    Anti-FOXP3 antibody (IHC) | Abcam | Cat#ab22510
    CD3 Functional Grade Monoclonal Antibody | eBioscience | Cat#16-0037-85
    CD28 Functional Grade Monoclonal Antibody | eBioscience | Cat#16-0289-85
    Biological Samples
    Human PBMC | AllCells | https://www.allcells.com
    Critical Commercial Assays
    Live/Dead Fixable Blue Dead Cell Stain Kit | Invitrogen | Cat#L34962
    Alexa Fluor 647 Conjugation Kit | Molecular Probes | Cat#A20186
    IFN Gamma Human Uncoated ELISA Kit | eBioscience | Cat#88-7316-88
    Dynabeads Human T-activator CD3/CD28 for T Cell Expansion and Activation | ThermoFisher Scientific | Cat#11131D
    Retro-X Universal Packaging System | Clontech | Cat#631530
    Pan T cell Isolation Kit | Miltenyi Biotec | Cat#130-096-535
    Human T cell Activation/Expansion Kit | Miltenyi Biotec | Cat#130-091-441
    NEBNext Ultra RNA Library Prep Kit for Illumina Paired-end Multiplexed Sequencing Library | NEB | Cat#E7530
    SureSelectXT Target Enrichment System for Illumina Paired-End Multiplexed Sequencing Library kit | Agilent | Cat#G9701
    TruePrep DNA Library Prep Kit V2 for Illumina | Vazyme Biotech | Cat#TD503
    Hiseq 3000/4000 SBS kit | Illumina | Cat#FC-410-1003
    Hiseq 3000/4000 PE cluster kit | Illumina | Cat#PE-410-1001
    Deposited Data
    Data files for single-cell RNA sequencing (raw data) | This paper | EGAS00001002072
    Data files for bulk RNA sequencing (raw data) | This paper | EGAS00001002072
    Data files for bulk exome sequencing (raw data) | This paper | EGAS00001002072
    Data files for single-cell RNA sequencing (processed data) | This paper | GSE98638
    Oligonucleotides
    Primer: CD3D Forward: TCATTGCCACTCTGCTCC | This paper | N/A
    Primer: CD3D Reverse: GTTCACTTGTTCCGAGCC | This paper | N/A
    Software and Algorithms
    SC3 | Kiselev et al., 2017 | https://github.com/hemberg-lab/SC3
    Monocle 2.0 | Trapnell et al., 2014 | http://monocle-bio.sourceforge.net/
    ScLVM | Buettner et al., 2015 | https://github.com/PMBio/scLVM
    TraCeR | Stubbington et al., 2016 | https://github.com/teichlab/tracer
    Cell 169, 1342–1356.e1–e5, June 15, 2017 e1...

    [...]
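The excerpts above mention computing the silhouette for each run while k ranged from 2 to 10. The sketch below shows one way such a score could be computed and used to compare candidate values of k. It is an illustration only: the Euclidean distance, the toy points, the function names, and the rule of picking the k with the highest mean silhouette are all assumptions, since the excerpts do not state how the final k was selected.

```python
import math


def mean_silhouette(points, labels):
    """Average silhouette width over all points (requires at least two clusters)."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)

    def mean_dist(i, members):
        return sum(math.dist(points[i], points[j]) for j in members) / len(members)

    widths = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            widths.append(0.0)  # common convention: a singleton cluster scores 0
            continue
        a = mean_dist(i, own)  # cohesion: mean distance to the point's own cluster
        b = min(mean_dist(i, members)  # separation: nearest other cluster
                for other, members in clusters.items() if other != label)
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return sum(widths) / len(widths)


def pick_k(points, labelings_by_k):
    """Return the candidate k whose labeling has the highest mean silhouette (assumed rule)."""
    return max(labelings_by_k, key=lambda k: mean_silhouette(points, labelings_by_k[k]))


# Hypothetical toy data: three well-separated pairs of points.
points = [(0, 0), (0, 1), (10, 0), (10, 1), (20, 0), (20, 1)]
candidates = {
    2: [0, 0, 0, 0, 1, 1],
    3: [0, 0, 1, 1, 2, 2],
    4: [0, 0, 1, 1, 2, 3],
}
print(pick_k(points, candidates))  # 3
```

In practice the candidate labelings would come from the clustering tool itself, one per value of k; here they are written by hand to keep the example self-contained.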

References
Journal ArticleDOI
TL;DR: This work presents DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates, which enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression.
Abstract: In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-seq, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. We present DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression. The DESeq2 package is available at http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html .

47,038 citations

Journal ArticleDOI
TL;DR: Trimmomatic is developed as a more flexible and efficient preprocessing tool, which could correctly handle paired-end data and is shown to produce output that is at least competitive with, and in many cases superior to, that produced by other tools, in all scenarios tested.
Abstract: Motivation: Although many next-generation sequencing (NGS) read preprocessing tools already existed, we could not find any tool or combination of tools that met our requirements in terms of flexibility, correct handling of paired-end data and high performance. We have developed Trimmomatic as a more flexible and efficient preprocessing tool, which could correctly handle paired-end data. Results: The value of NGS read preprocessing is demonstrated for both reference-based and reference-free tasks. Trimmomatic is shown to produce output that is at least competitive with, and in many cases superior to, that produced by other tools, in all scenarios tested. Availability and implementation: Trimmomatic is licensed under GPL V3. It is cross-platform (Java 1.5+ required) and available at http://www.usadellab.org/cms/index.php?page=trimmomatic Contact: usadel@bio1.rwth-aachen.de Supplementary information: Supplementary data are available at Bioinformatics online.

39,291 citations

Journal Article
TL;DR: A new technique called t-SNE that visualizes high-dimensional data by giving each datapoint a location in a two or three-dimensional map, a variation of Stochastic Neighbor Embedding that is much easier to optimize, and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map.
Abstract: We present a new technique called “t-SNE” that visualizes high-dimensional data by giving each datapoint a location in a two or three-dimensional map. The technique is a variation of Stochastic Neighbor Embedding (Hinton and Roweis, 2002) that is much easier to optimize, and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map. t-SNE is better than existing techniques at creating a single map that reveals structure at many different scales. This is particularly important for high-dimensional data that lie on several different, but related, low-dimensional manifolds, such as images of objects from multiple classes seen from multiple viewpoints. For visualizing the structure of very large datasets, we show how t-SNE can use random walks on neighborhood graphs to allow the implicit structure of all of the data to influence the way in which a subset of the data is displayed. We illustrate the performance of t-SNE on a wide variety of datasets and compare it with many other non-parametric visualization techniques, including Sammon mapping, Isomap, and Locally Linear Embedding. The visualizations produced by t-SNE are significantly better than those produced by the other techniques on almost all of the datasets.

30,124 citations

Journal ArticleDOI
TL;DR: The philosophy and design of the limma package is reviewed, summarizing both new and historical features, with an emphasis on recent enhancements and features that have not been previously described.
Abstract: limma is an R/Bioconductor software package that provides an integrated solution for analysing data from gene expression experiments. It contains rich features for handling complex experimental designs and for information borrowing to overcome the problem of small sample sizes. Over the past decade, limma has been a popular choice for gene discovery through differential expression analyses of microarray and high-throughput PCR data. The package contains particularly strong facilities for reading, normalizing and exploring such data. Recently, the capabilities of limma have been significantly expanded in two important directions. First, the package can now perform both differential expression and differential splicing analyses of RNA sequencing (RNA-seq) data. All the downstream analysis tools previously restricted to microarray data are now available for RNA-seq as well. These capabilities allow users to analyse both RNA-seq and microarray data with very similar pipelines. Second, the package is now able to go past the traditional gene-wise expression analyses in a variety of ways, analysing expression profiles in terms of co-regulated sets of genes or in terms of higher-order expression signatures. This provides enhanced possibilities for biological interpretation of gene expression differences. This article reviews the philosophy and design of the limma package, summarizing both new and historical features, with an emphasis on recent enhancements and features that have not been previously described.

22,147 citations

Journal ArticleDOI
TL;DR: It is shown that accurate gene-level abundance estimates are best obtained with large numbers of short single-end reads, and estimates of the relative frequencies of isoforms within single genes may be improved through the use of paired-end reads, depending on the number of possible splice forms for each gene.
Abstract: RNA-Seq is revolutionizing the way transcript abundances are measured. A key challenge in transcript quantification from RNA-Seq data is the handling of reads that map to multiple genes or isoforms. This issue is particularly important for quantification with de novo transcriptome assemblies in the absence of sequenced genomes, as it is difficult to determine which transcripts are isoforms of the same gene. A second significant issue is the design of RNA-Seq experiments, in terms of the number of reads, read length, and whether reads come from one or both ends of cDNA fragments. We present RSEM, a user-friendly software package for quantifying gene and isoform abundances from single-end or paired-end RNA-Seq data. RSEM outputs abundance estimates, 95% credibility intervals, and visualization files and can also simulate RNA-Seq data. In contrast to other existing tools, the software does not require a reference genome. Thus, in combination with a de novo transcriptome assembler, RSEM enables accurate transcript quantification for species without sequenced genomes. On simulated and real data sets, RSEM has superior or comparable performance to quantification methods that rely on a reference genome. Taking advantage of RSEM's ability to effectively use ambiguously-mapping reads, we show that accurate gene-level abundance estimates are best obtained with large numbers of short single-end reads. On the other hand, estimates of the relative frequencies of isoforms within single genes may be improved through the use of paired-end reads, depending on the number of possible splice forms for each gene. RSEM is an accurate and user-friendly software tool for quantifying transcript abundances from RNA-Seq data. As it does not rely on the existence of a reference genome, it is particularly useful for quantification with de novo transcriptome assemblies. In addition, RSEM has enabled valuable guidance for cost-efficient design of quantification experiments with RNA-Seq, which is currently relatively expensive.

14,524 citations
