Journal Article

Automated identification of stratifying signatures in cellular subpopulations

TL;DR: A data-driven method termed Citrus is presented that identifies cell subsets associated with an experimental endpoint of interest and is demonstrated through the systematic identification of blood cells that signal in response to experimental stimuli and T-cell subsets whose abundance is predictive of AIDS-free survival risk in patients with HIV.
Abstract: Elucidation and examination of cellular subpopulations that display condition-specific behavior can play a critical contributory role in understanding disease mechanism, as well as provide a focal point for development of diagnostic criteria linking such a mechanism to clinical prognosis. Despite recent advancements in single-cell measurement technologies, the identification of relevant cell subsets through manual efforts remains standard practice. As new technologies such as mass cytometry increase the parameterization of single-cell measurements, the limited scalability and the subjectivity inherent in manual analyses slow both analysis and progress. We therefore developed Citrus (cluster identification, characterization, and regression), a data-driven approach for the identification of stratifying subpopulations in multidimensional cytometry datasets. The methodology of Citrus is demonstrated through the identification of known and unexpected pathway responses in a dataset of stimulated peripheral blood mononuclear cells measured by mass cytometry. Additionally, the performance of Citrus is compared with that of existing methods through the analysis of several publicly available datasets. As the complexity of flow cytometry datasets continues to increase, methods such as Citrus will be needed to aid investigators in the performance of unbiased (and potentially more thorough) correlation-based mining and inspection of cell subsets nested within high-dimensional datasets.
Citations
Journal Article
TL;DR: A mechanistically relevant population of CD27+PD-1–CD8+ CAR T cells expressing high levels of the IL-6 receptor predicts therapeutic response and is responsible for tumor control, and new features of CAR T cell biology are uncovered.
Abstract: Tolerance to self-antigens prevents the elimination of cancer by the immune system (1, 2). We used synthetic chimeric antigen receptors (CARs) to overcome immunological tolerance and mediate tumor rejection in patients with chronic lymphocytic leukemia (CLL). Remission was induced in a subset of subjects, but most did not respond. Comprehensive assessment of patient-derived CAR T cells to identify mechanisms of therapeutic success and failure has not been explored. We performed genomic, phenotypic and functional evaluations to identify determinants of response. Transcriptomic profiling revealed that CAR T cells from complete-responding patients with CLL were enriched in memory-related genes, including IL-6/STAT3 signatures, whereas T cells from nonresponders upregulated programs involved in effector differentiation, glycolysis, exhaustion and apoptosis. Sustained remission was associated with an elevated frequency of CD27+CD45RO–CD8+ T cells before CAR T cell generation, and these lymphocytes possessed memory-like characteristics. Highly functional CAR T cells from patients produced STAT3-related cytokines, and serum IL-6 correlated with CAR T cell expansion. IL-6/STAT3 blockade diminished CAR T cell proliferation. Furthermore, a mechanistically relevant population of CD27+PD-1–CD8+ CAR T cells expressing high levels of the IL-6 receptor predicts therapeutic response and is responsible for tumor control. These findings uncover new features of CAR T cell biology and underscore the potential of using pretreatment biomarkers of response to advance immunotherapies. An IL-6/STAT3 signature and memory CD8 T cell subset in preinfusion chimeric antigen receptor–expressing T cells associate with response in patients with high-risk chronic lymphocytic leukemia.

980 citations

Journal Article
05 May 2016-Cell
TL;DR: The current state of mass cytometry is reviewed, providing an overview of the instrumentation, its present capabilities, and methods of data analysis, as well as thoughts on future developments and applications.

888 citations


Cites methods from "Automated identification of stratif..."

  • ...The algorithm is termed Cluster Identification, Characterization, and Regression (Citrus) (Bruggner et al., 2014) and combines hierarchical clustering of cell events with machine learning approaches to identify statistically significant features between groups of samples or to build a predictive model for a particular sample type (Bair and Tibshirani, 2004)....

    [...]
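The excerpt above describes Citrus as hierarchical clustering of cell events followed by a statistical or predictive model over per-sample features. A minimal sketch of the first two stages might look like the following. Average linkage, the minimum cluster size, the toy marker values, and the names `hierarchical_clusters` and `abundance_features` are all assumptions made for illustration; this is not the Citrus implementation, and the regression stage is omitted.

```python
import math
from itertools import combinations

def hierarchical_clusters(points, min_size):
    """Agglomerative clustering with average linkage (an assumed choice).

    Returns every cluster formed during merging that has at least
    min_size events, each as a sorted list of event indices.
    """
    clusters = [[i] for i in range(len(points))]
    kept = []

    def linkage(a, b):
        # Mean pairwise Euclidean distance between two clusters.
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        clusters.remove(a)
        clusters.remove(b)
        merged = sorted(a + b)
        clusters.append(merged)
        if len(merged) >= min_size:
            kept.append(merged)
    return kept

def abundance_features(kept, sample_of_event, samples):
    """For each sample, the fraction of its events falling in each kept cluster."""
    totals = {s: sample_of_event.count(s) for s in samples}
    return {
        s: [sum(1 for i in cluster if sample_of_event[i] == s) / totals[s]
            for cluster in kept]
        for s in samples
    }

# Toy pooled events (two hypothetical markers) from two hypothetical samples.
events = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0),   # "ctrl"
          (0.1, 0.1), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]   # "stim"
sample_of_event = ["ctrl"] * 4 + ["stim"] * 4

kept = hierarchical_clusters(events, min_size=3)
features = abundance_features(kept, sample_of_event, ["ctrl", "stim"])
```

Each row of `features` could then be passed, together with the sample's group label, to a regularized classifier or a significance test to find clusters whose abundance differs between groups.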

Journal Article
TL;DR: This compendium is for established researchers, newcomers, and students alike, highlighting interesting and rewarding problems for the coming years in single-cell data science.
Abstract: The recent boom in microfluidics and combinatorial indexing strategies, combined with low sequencing costs, has empowered single-cell sequencing technology. Thousands, or even millions, of cells analyzed in a single experiment amount to a data revolution in single-cell biology and pose unique data science problems. Here, we outline eleven challenges that will be central to bringing this emerging field of single-cell data science forward. For each challenge, we highlight motivating research questions, review prior work, and formulate open problems. This compendium is for established researchers, newcomers, and students alike, highlighting interesting and rewarding problems for the coming years.

677 citations

Journal Article
26 Jan 2017-Cell
TL;DR: The critical impact of systemic immune responses that drive tumor rejection is demonstrated by developing intuitive models for visualizing single-cell data with statistical inference and analyzing immune responses in several tissues after immunotherapy.

653 citations


Cites methods from "Automated identification of stratif..."

  • ...We therefore determined whether the statistical inference integrated into Citrus could, instead, be applied to Scaffold maps....

    [...]

  • ...We then used the Significance Analysis of Microarrays framework to identify statistically significant features between the sample types (effective versus ineffective treatment groups) as in Citrus (Bair and Tibshirani, 2004; Bruggner et al., 2014)....

    [...]

  • ...Another algorithm for mass cytometry analysis, Citrus (Bruggner et al., 2014), provides statistical comparisons between groups....

    [...]

  • ...The results from Citrus, however, can be cumbersome to interpret....

    [...]

Journal Article
TL;DR: The use of palladium-based labeling reagents expands the number of measurement channels available for mass cytometry and reduces interference with lanthanide-based antibody measurement, and an error-detecting combinatorial barcoding scheme allows cell doublets to be identified and removed from the analysis.
Abstract: Mass-tag cell barcoding (MCB) labels individual cell samples with unique combinatorial barcodes, after which they are pooled for processing and measurement as a single multiplexed sample. The MCB method eliminates variability between samples in antibody staining and instrument sensitivity, reduces antibody consumption and shortens instrument measurement time. Here we present an optimized MCB protocol. The use of palladium-based labeling reagents expands the number of measurement channels available for mass cytometry and reduces interference with lanthanide-based antibody measurement. An error-detecting combinatorial barcoding scheme allows cell doublets to be identified and removed from the analysis. A debarcoding algorithm that is single cell-based rather than population-based improves the accuracy and efficiency of sample deconvolution. This debarcoding algorithm has been packaged into software that allows rapid and unbiased sample deconvolution. The MCB procedure takes 3-4 h, not including sample acquisition time of ∼1 h per million cells.
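To illustrate why an error-detecting combinatorial scheme lets doublets be identified, here is a small sketch. It assumes a k-of-n scheme in which every barcode has exactly `K_POSITIVE` positive channels out of `N_CHANNELS`; those two values, and the function names, are illustrative assumptions. It also assumes each channel has already been called positive or negative per cell, a step the real debarcoding algorithm performs from measured intensities.

```python
from itertools import combinations

N_CHANNELS = 6   # assumed number of barcoding channels (illustrative)
K_POSITIVE = 3   # assumed number of positive channels per barcode (illustrative)

def make_barcodes(n=N_CHANNELS, k=K_POSITIVE):
    """Enumerate every k-of-n barcode as a frozenset of positive channel indices."""
    return [frozenset(c) for c in combinations(range(n), k)]

def debarcode(positive_channels, barcodes, k=K_POSITIVE):
    """Return the barcode index for one event, or None if the event is invalid.

    A single cell shows exactly k positive channels. A doublet of two cells
    with different barcodes shows the union of their channels, which always
    has more than k positives, so it is rejected here.
    """
    positives = frozenset(positive_channels)
    if len(positives) != k or positives not in barcodes:
        return None
    return barcodes.index(positives)

barcodes = make_barcodes()
singlet = barcodes[0]
doublet = barcodes[0] | barcodes[1]   # two different samples stuck together

singlet_call = debarcode(singlet, barcodes)
doublet_call = debarcode(doublet, barcodes)
```

Under these assumptions `singlet_call` is a valid sample index and `doublet_call` is `None`. A doublet of two cells from the same sample carries the same barcode twice and would not be caught by this check.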

427 citations

References
Journal Article
TL;DR: A new method for estimation in linear models called the lasso, which minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant, is proposed.
Abstract: We propose a new method for estimation in linear models. The 'lasso' minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant. Because of the nature of this constraint it tends to produce some coefficients that are exactly 0 and hence gives interpretable models. Our simulation studies suggest that the lasso enjoys some of the favourable properties of both subset selection and ridge regression. It produces interpretable models like subset selection and exhibits the stability of ridge regression. There is also an interesting relationship with recent work in adaptive function estimation by Donoho and Johnstone. The lasso idea is quite general and can be applied in a variety of statistical models: extensions to generalized regression models and tree-based models are briefly described.
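The claim that the lasso sets some coefficients to exactly 0 can be seen in a small sketch. The code below solves the penalized (Lagrangian) form, minimizing (1/2)·RSS + lam·Σ|β_j| by coordinate descent, which is equivalent to the constrained form in the abstract for a matching constant. Coordinate descent is a swapped-in solver rather than the algorithm of the original paper, and the toy data and function names are assumptions.

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam, returning exactly 0.0 inside [-lam, lam]."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimize 0.5 * sum((y - X b)^2) + lam * sum(|b_j|) one coordinate at a time."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0   # correlation of feature j with the partial residual
            z = 0.0     # squared norm of feature j
            for i in range(n):
                partial = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
                z += X[i][j] ** 2
            beta[j] = soft_threshold(rho, lam) / z
    return beta

# Toy data: y depends strongly on feature 0 and only weakly on feature 1.
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [2.1, 1.9, -1.9, -2.1]   # y = 2.0 * x0 + 0.1 * x1

beta_unpenalized = lasso_coordinate_descent(X, y, lam=0.0)
beta_lasso = lasso_coordinate_descent(X, y, lam=1.0)
```

With `lam=0.0` the fit recovers coefficients close to 2.0 and 0.1. With `lam=1.0` the strong coefficient is shrunk and the weak one becomes exactly zero, which is the variable-selection behaviour the abstract describes.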

40,785 citations


"Automated identification of stratif..." refers to methods in this paper

  • ...Accordingly, Citrus constructs classification models using the lasso-regularized logistic regression and nearest shrunken centroid methods (21, 22)....

    [...]

Journal Article
TL;DR: In this paper, instead of selecting factors by stepwise backward elimination, the authors focus on the accuracy of estimation and consider extensions of the lasso, the LARS algorithm and the non-negative garrotte for factor selection.
Abstract: Summary. We consider the problem of selecting grouped variables (factors) for accurate prediction in regression. Such a problem arises naturally in many practical situations with the multifactor analysis-of-variance problem as the most important and well-known example. Instead of selecting factors by stepwise backward elimination, we focus on the accuracy of estimation and consider extensions of the lasso, the LARS algorithm and the non-negative garrotte for factor selection. The lasso, the LARS algorithm and the non-negative garrotte are recently proposed regression methods that can be used to select individual variables. We study and propose efficient algorithms for the extensions of these methods for factor selection and show that these extensions give superior performance to the traditional stepwise backward elimination method in factor selection problems. We study the similarities and the differences between these methods. Simulations and real examples are used to illustrate the methods.
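One way to see how a lasso extension can select whole factors instead of individual variables is the group soft-thresholding step sketched below. It assumes the common group lasso penalty lam·Σ_g √(p_g)·‖β_g‖₂ and shows only the shrinkage applied to one group's coefficient vector, not a full fitting algorithm. The function name and example numbers are illustrative.

```python
import math

def group_soft_threshold(coefs, lam):
    """Shrink one group's coefficient vector toward zero as a unit.

    The whole group is set to zero when its Euclidean norm is at most
    lam * sqrt(group size); otherwise every member is scaled by the same
    factor, so the group keeps its direction.
    """
    norm = math.sqrt(sum(c * c for c in coefs))
    weight = lam * math.sqrt(len(coefs))
    if norm <= weight:
        return [0.0] * len(coefs)
    scale = 1.0 - weight / norm
    return [scale * c for c in coefs]

strong_group = group_soft_threshold([3.0, 4.0], lam=1.0)   # norm 5.0, kept
weak_group = group_soft_threshold([0.3, 0.4], lam=1.0)     # norm 0.5, dropped
```

Here `weak_group` is zeroed entirely while `strong_group` is shrunk but retained. Coefficients within a group therefore enter or leave the model together.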

7,400 citations


"Automated identification of stratif..." refers to methods in this paper

  • ...Alternatively, sparse regression models such as the group lasso (30) that explicitly account for correlated features could be incorporated into the Citrus workflow, thus eliminating a need to reconcile related cell subsets after regression....

    [...]

Journal Article
TL;DR: KEGG Mapper, a collection of tools for KEGG PATHWAY, BRITE and MODULE mapping that enables integration and interpretation of large-scale data sets, is reported, along with recent enhancements to the KEGG content, especially the incorporation of disease and drug information used in practice and in society, to support translational bioinformatics.
Abstract: Kyoto Encyclopedia of Genes and Genomes (KEGG, http://www.genome.jp/kegg/ or http://www.kegg.jp/) is a database resource that integrates genomic, chemical and systemic functional information. In particular, gene catalogs from completely sequenced genomes are linked to higher-level systemic functions of the cell, the organism and the ecosystem. Major efforts have been undertaken to manually create a knowledge base for such systemic functions by capturing and organizing experimental knowledge in computable forms; namely, in the forms of KEGG pathway maps, BRITE functional hierarchies and KEGG modules. Continuous efforts have also been made to develop and improve the cross-species annotation procedure for linking genomes to the molecular networks through the KEGG Orthology system. Here we report KEGG Mapper, a collection of tools for KEGG PATHWAY, BRITE and MODULE mapping, enabling integration and interpretation of large-scale data sets. We also report a variant of the KEGG mapping procedure to extend the knowledge base, where different types of data and knowledge, such as disease genes and drug targets, are integrated as part of the KEGG molecular networks. Finally, we describe recent enhancements to the KEGG content, especially the incorporation of disease and drug information used in practice and in society, to support translational bioinformatics.

4,259 citations

Journal Article
TL;DR: The method of “nearest shrunken centroids” identifies subsets of genes that best characterize each class, which was highly efficient in finding genes for classifying small round blue cell tumors and leukemias.
Abstract: We have devised an approach to cancer class prediction from gene expression profiling, based on an enhancement of the simple nearest prototype (centroid) classifier. We shrink the prototypes and hence obtain a classifier that is often more accurate than competing methods. Our method of "nearest shrunken centroids" identifies subsets of genes that best characterize each class. The technique is general and can be used in many other classification problems. To demonstrate its effectiveness, we show that the method was highly efficient in finding genes for classifying small round blue cell tumors and leukemias.
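The idea of shrinking class prototypes can be sketched as follows. Each class centroid is standardized against the overall centroid, soft-thresholded by an amount `delta`, and mapped back. Features whose shrunken centroids all equal the overall centroid no longer influence classification. Several details here are assumptions: the stabilising constant is taken as the median of the pooled standard deviations, the class-size factor is taken as sqrt(1/n_k − 1/n), equal class priors are assumed, and the toy data and function names are illustrative.

```python
import math
from statistics import median

def fit_shrunken_centroids(X, labels, delta):
    """Return (overall centroid, {class: shrunken centroid}, per-feature scale)."""
    n, p = len(X), len(X[0])
    classes = sorted(set(labels))
    rows = {k: [x for x, lab in zip(X, labels) if lab == k] for k in classes}
    overall = [sum(x[j] for x in X) / n for j in range(p)]
    centroids = {k: [sum(x[j] for x in r) / len(r) for j in range(p)]
                 for k, r in rows.items()}
    # Pooled within-class standard deviation of each feature.
    s = [math.sqrt(sum((x[j] - centroids[lab][j]) ** 2
                       for x, lab in zip(X, labels)) / (n - len(classes)))
         for j in range(p)]
    s0 = median(s)                      # assumed stabilising constant
    scale = [sj + s0 for sj in s]
    shrunken = {}
    for k in classes:
        m_k = math.sqrt(1 / len(rows[k]) - 1 / n)   # assumed class-size factor
        shrunken[k] = []
        for j in range(p):
            d = (centroids[k][j] - overall[j]) / (m_k * scale[j])
            d_shrunk = math.copysign(max(abs(d) - delta, 0.0), d)
            shrunken[k].append(overall[j] + m_k * scale[j] * d_shrunk)
    return overall, shrunken, scale

def predict(x, shrunken, scale):
    """Assign x to the class with the nearest shrunken centroid (equal priors)."""
    def score(k):
        return sum(((xj - cj) / sj) ** 2
                   for xj, cj, sj in zip(x, shrunken[k], scale))
    return min(shrunken, key=score)

# Toy data: feature 0 separates the classes, feature 1 barely differs.
X = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0], [10.0, 1.1], [11.0, 2.1], [12.0, 3.1]]
labels = ["A", "A", "A", "B", "B", "B"]

overall, shrunken, scale = fit_shrunken_centroids(X, labels, delta=1.0)
label_low = predict([1.5, 2.0], shrunken, scale)
label_high = predict([10.5, 2.0], shrunken, scale)
```

In this toy case feature 1 is shrunk onto the overall centroid for both classes, so only feature 0 characterizes the classes.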

2,954 citations


"Automated identification of stratif..." refers to methods in this paper

  • ...Accordingly, Citrus constructs classification models using the lasso-regularized logistic regression and nearest shrunken centroid methods (21, 22)....

    [...]

Journal Article
TL;DR: This work proposes summarizing the discrimination potential of a marker X, measured at baseline (t = 0), by calculating ROC curves for cumulative disease or death incidence by time t, denoted ROC(t), and presents an example where ROC(t) is used to compare a standard and a modified flow cytometry measurement for predicting survival after detection of breast cancer.
Abstract: ROC curves are a popular method for displaying sensitivity and specificity of a continuous diagnostic marker, X, for a binary disease variable, D. However, many disease outcomes are time dependent, D(t), and ROC curves that vary as a function of time may be more appropriate. A common example of a time-dependent variable is vital status, where D(t) = 1 if a patient has died prior to time t and zero otherwise. We propose summarizing the discrimination potential of a marker X, measured at baseline (t = 0), by calculating ROC curves for cumulative disease or death incidence by time t, which we denote as ROC(t). A typical complexity with survival data is that observations may be censored. Two ROC curve estimators are proposed that can accommodate censored data. A simple estimator is based on using the Kaplan-Meier estimator for each possible subset X > c. However, this estimator does not guarantee the necessary condition that sensitivity and specificity are monotone in X. An alternative estimator that does guarantee monotonicity is based on a nearest neighbor estimator for the bivariate distribution function of (X, T), where T represents survival time (Akritas, M. J., 1994, Annals of Statistics 22, 1299-1327). We present an example where ROC(t) is used to compare a standard and a modified flow cytometry measurement for predicting survival after detection of breast cancer and an example where the ROC(t) curve displays the impact of modifying eligibility criteria for sample size and power in HIV prevention trials.
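The simple estimator mentioned in the abstract, which applies Kaplan-Meier to each subset X > c, can be sketched as below. One way to write it uses Bayes' theorem: sensitivity(c, t) = [1 − S(t | X > c)]·P(X > c) / [1 − S(t)] and specificity(c, t) = S(t | X ≤ c)·P(X ≤ c) / S(t), where S is the survival function. The function names and toy data are assumptions. As the abstract notes, this estimator does not guarantee monotone sensitivity and specificity.

```python
def kaplan_meier(times, events, t):
    """Kaplan-Meier estimate of S(t) = P(T > t).

    events[i] is 1 if subject i died at times[i] and 0 if censored there.
    """
    surv = 1.0
    death_times = sorted({u for u, e in zip(times, events) if e == 1 and u <= t})
    for u in death_times:
        at_risk = sum(1 for v in times if v >= u)
        deaths = sum(1 for v, e in zip(times, events) if v == u and e == 1)
        surv *= 1.0 - deaths / at_risk
    return surv

def roc_point(marker, times, events, c, t):
    """Return (sensitivity, specificity) of the rule X > c for death by time t."""
    n = len(marker)
    high = [i for i in range(n) if marker[i] > c]
    low = [i for i in range(n) if marker[i] <= c]
    s_all = kaplan_meier(times, events, t)
    if not 0.0 < s_all < 1.0:
        raise ValueError("S(t) must lie strictly between 0 and 1")
    s_high = kaplan_meier([times[i] for i in high], [events[i] for i in high], t)
    s_low = kaplan_meier([times[i] for i in low], [events[i] for i in low], t)
    p_high = len(high) / n
    sensitivity = (1.0 - s_high) * p_high / (1.0 - s_all)
    specificity = s_low * (1.0 - p_high) / s_all
    return sensitivity, specificity

# Toy data: higher marker values go with earlier deaths; nobody is censored.
marker = [1.0, 2.0, 3.0, 4.0]
times = [10.0, 8.0, 3.0, 2.0]
events = [1, 1, 1, 1]

sensitivity, specificity = roc_point(marker, times, events, c=2.5, t=5.0)
```

Sweeping `c` over the observed marker values at a fixed `t` traces out one ROC(t) curve. Censored subjects are handled by setting their entry in `events` to 0.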

2,177 citations
