Journal ArticleDOI

Enhanced flowType/RchyOptimyx: a BioConductor pipeline for discovery in high-dimensional cytometry data.

01 May 2014-Bioinformatics (Oxford University Press)-Vol. 30, Iss: 9, pp 1329-1330
TL;DR: A significantly improved version of the flowType and RchyOptimyx BioConductor-based pipeline is presented that is both 14 times faster and can accommodate multiple levels of biomarker expression for up to 96 markers, positioned to be an integral part of data analysis for high-throughput experiments on high-dimensional single-cell assay platforms.
Abstract: We present a significantly improved version of the flowType and RchyOptimyx BioConductor-based pipeline that is both 14 times faster and can accommodate multiple levels of biomarker expression for up to 96 markers. With these improvements, the pipeline is positioned to be an integral part of data analysis for high-throughput experiments on high-dimensional single-cell assay platforms, including flow cytometry, mass cytometry and single-cell RT-qPCR.


Citations
Posted ContentDOI
14 May 2018-bioRxiv
TL;DR: This work proposes a new method for the automated and unbiased analysis of high-dimensional single cell datasets that is simple and robust, with the goal of reducing this complex information into a familiar 2D scatter plot representation that is of immediate utility to a range of biomedical and clinical settings.
Abstract: New cytometric techniques continue to push the boundaries of multi-parameter quantitative data acquisition at the single-cell level particularly in immunology and medicine. Sophisticated analysis methods for such ever higher dimensional datasets are rapidly emerging, with advanced data representations and dimensional reduction approaches. However, these are not yet standardized and clinical scientists and cell biologists are not yet experienced in their interpretation. More fundamentally their range of statistical validity is not yet fully established. We therefore propose a new method for the automated and unbiased analysis of high-dimensional single cell datasets that is simple and robust, with the goal of reducing this complex information into a familiar 2D scatter plot representation that is of immediate utility to a range of biomedical and clinical settings. Using publicly available flow cytometry and mass cytometry datasets we demonstrate that this method (termed CytoBinning), recapitulates the results of traditional manual cytometric analyses and leads to new and testable hypotheses.

2 citations

Dissertation
08 Aug 2017
TL;DR: This thesis outlines a data processing pipeline used to isolate features for each FCS file and test the different types of features extracted on a benchmark data set from the FlowCAP-II challenge, containing data from healthy persons and patients with AML (acute myeloid leukemia).
Abstract: Flow cytometry (FCM) bioinformatics is a sub-field of bioinformatics, aimed at developing effective and efficient computational tools to store, organize, and analyze high-throughput/dimensional FCM data. Flow cytometers are capable of analyzing thousands of cells per second for up to 40 features. These features primarily signal the presence of different proteins on cells in the bloodstream, and hence contribute large amounts of data towards the big biological data paradigm. The data that a flow cytometer outputs from a biological sample is called a FCS file. The International Mouse Phenotyping Consortium (IMPC) is a collaboration between 23 international institutions and funding organizations. Its aim is to decipher the function of 20,000 mouse genes. IMPC is doing so by breeding mice with a certain gene knocked out (KO), cancelling the function of that gene. In turn, FCM is used to measure the immunological changes correlated to this knockout. Many tools exist to classify FCS files. However, there is a lack of tools to conduct unsupervised clustering of FCS files. One goal of IMPC is to compare and contrast KO genes, hence IMPC becomes a prime motivation for this problem. As such, this thesis outlines a data processing pipeline used to isolate features for each FCS file. We then test the different types of features extracted on a benchmark data set from the FlowCAP-II challenge, containing data from healthy persons and patients with AML (acute myeloid leukemia). We then evaluate how well these features separate out FCS files of different origin (i.e. healthy vs AML).

2 citations


Cites background or methods from "Enhanced flowType/RchyOptimyx: a Bi..."

  • ...We also elaborate on how we analyze each FCS file as a cell hierarchy as defined in [75], a representation of the FCS file in the form of a structured graph incorporating all possible cell populations and the relations between them....

    [...]

  • ...We then give a definition of the cell hierarchy [75], and elaborate on how we use it to extract features from each processed FCS file....

    [...]

  • ...Pipelines that have integrated this process include SamSPECTRAL [114], FloReMi [108], gEM/GANN [105], the flowType/RchyOptimyx pipeline, the flowType/FeaLect (Feature Selection for Sample Classification) pipeline [75], Citrus (hierarchical clustering) [17], and COMPASS [61]....

    [...]

  • ...Clustering tools such as PhenoGraph [59] and CLARA [101], and post-clustering analysis tools such as the flowType/RchyOptimyx pipeline [75] and FloReMi [108] also have visualization capabilities....

    [...]

  • ...The subsequent sections of this thesis described features we designed that take advantage of a graphical FCS file representation called the cell hierarchy [75]....

    [...]

Dissertation
04 Jun 2019
TL;DR: An extensive bibliographic research on unsupervised clustering algorithms applied to cytometry data has been performed and a methodology for performance evaluation has been developed, which has allowed the selection of a clustering algorithm, RPhenograph, to implement a Shiny application.
Abstract: Conventional flow cytometry is an experimental technique that can measure up to 30 fluorescence parameters per cell. Recently, flow cytometry has been fused with mass spectrometry, giving rise to a new methodology named mass cytometry that can potentially detect up to 100 parameters per cell. Cell populations are mainly characterized by a procedure known as gating, which consists of manually delimiting cell subsets using histograms or two-dimensional dot plots in a sequential manner. This procedure is time-consuming, imprecise and particularly inadequate for use with a high number of parameters. In the past few years new computational techniques have been developed in order to efficiently handle high-dimensional cytometry data. However, such developments are still under evaluation. Furthermore, dealing with these techniques requires proficiency in using R packages and script writing. The main objective of this project is to provide cytometrists with efficient and easy-to-use unsupervised learning algorithms and visualization tools to explore high-dimensional cytometry data in a reproducible way. To that end, an extensive bibliographic research on unsupervised clustering algorithms applied to cytometry data has been performed and a methodology for performance evaluation has been developed. A selection of algorithms has been benchmarked using this methodology and both real cytometry and synthetic data, the latter being specially generated to that end. This comparative study has allowed the selection of a clustering algorithm, RPhenograph, to implement a Shiny application. The developed methodology is now ready to be applied to benchmark further algorithms and compare performances on other experimental designs.

1 citation

Posted ContentDOI
09 Jul 2021-bioRxiv
TL;DR: A new cell population score called SpecEnr (specific enrichment) is introduced and a method that discovers robust and accurate candidate biomarkers from flow cytometry data is described that finds driver cell populations whose abundance is associated with a sample class, but not as a result of a change in a related population.
Abstract: We introduce a new cell population score called SpecEnr (specific enrichment) and describe a method that discovers robust and accurate candidate biomarkers from flow cytometry data. Our approach identifies a new class of candidate biomarkers we define as driver cell populations, whose abundance is associated with a sample class (e.g. disease), but not as a result of a change in a related population. We show that the driver cell populations we find are also easily interpretable using a lattice-based visualization tool. Our method is implemented in the R package flowGraph, freely available on GitHub (github.com/aya49/flowGraph) and will be available on BioConductor. This work was featured as a spotlight and poster presentation at MLCB 2019.

1 citation


Cites background or methods from "Enhanced flowType/RchyOptimyx: a Bi..."

  • ...For our experiments, given a FCM sample containing a cell × measurement matrix and threshold gates obtained via gating, we use flowType [14] to identify all possible cell populations and enumerate their cell count....

    [...]

  • ...However, given L measurements, there are 3 · 2L 3 such relationships not including the relationship between a cell population and its indirect ancestors [14]....

    [...]

  • ..., CytoDX [11] main goal is to classify FCM samples, but it also tries to find DCPs as a postprocessing step) or compare prespecified cell populations by evaluating whether there is a large difference in their proportional abundance across samples using some statistical significance test [12, 14, 19]....

    [...]

Posted ContentDOI
16 Nov 2019-bioRxiv
TL;DR: This paper introduces a method capable of discovering biologically meaningful, interpretable, and actionable differential cell population bio-markers from flow cytometry samples of different phenotypes that can be interpreted via a lattice-based visualization tool.
Abstract: Bio-markers are measurable indicators that predict a given phenotype or disease. We introduce a method capable of discovering biologically meaningful, interpretable, and actionable differential cell population bio-markers from flow cytometry samples of different phenotypes. Cell populations are groups of cells that contain the same set of proteins. Differential cell populations are those that have a significantly changed abundance between samples of different phenotypic types. Existing methods for differential cell population identification fall into one of three categories: methods that 1) compare a limited set of pre-specified mutually exclusive cell populations that do not share cells, 2) find differential cell populations as a byproduct of another procedure, and 3) compare overlapping cell populations in a search space of all possible cell populations. The cell populations analyzed in 3) are dependent on each other and can be difficult to interpret. For example, an increase in one cell population (e.g. a bio-marker for a phenotype of interest) may induce an increase in several cell populations that share its cells. Our method solves this issue by taking into account these dependencies by finding only cell populations that are the source of these changes. Bio-markers can then be interpreted via a lattice-based visualization tool that depicts how these bio-markers affect each cell population and how they differentiate between samples of different phenotypes. This abstract has been accepted as a spotlight poster at Machine Learning for Computational Biology 2019 (MLCB)

Cites background or methods from "Enhanced flowType/RchyOptimyx: a Bi..."

  • ...These gates and their corresponding pre-processed FCM samples (R×L matrix) are then given as input to flowType [17] which generates the cell count for each of the m = 3^L possible cell populations, each being labelled based on whether or not a specific group of markers have high FI or are present on the cells....

    [...]

  • ...The third category, implemented by flowType [17], considers each possible subset of markers as defining a potential differential cell population....

    [...]
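
The enumeration described in the two excerpts above can be illustrated with a small sketch. This is not flowType's implementation or API; it is a brute-force Python illustration under stated assumptions: one threshold gate per marker, and hypothetical marker names, cell values and function name (`enumerate_populations`). Each marker is either positive, negative, or left out of the phenotype, which gives 3^L populations for L markers (including the unconstrained population of all cells).

```python
from itertools import product

def enumerate_populations(cells, thresholds, markers):
    """Count cells in every phenotype over the given markers.

    cells: list of per-cell measurement tuples (R cells x L markers).
    thresholds: one gate per marker; a cell is '+' if value > threshold.
    Returns {phenotype label: cell count} with 3**L entries.
    """
    # Binarize each cell once: True means the marker is positive.
    binarized = [tuple(v > t for v, t in zip(cell, thresholds)) for cell in cells]
    counts = {}
    # Each marker state is '+', '-', or None (marker not used in the phenotype).
    for states in product(("+", "-", None), repeat=len(markers)):
        label = "".join(m + s for m, s in zip(markers, states) if s) or "all cells"
        counts[label] = sum(
            all(s is None or (s == "+") == pos for s, pos in zip(states, cell))
            for cell in binarized
        )
    return counts

# Hypothetical toy data: 4 cells measured on 2 markers.
markers = ["CD3", "CD4"]
cells = [(5.0, 0.2), (4.1, 3.3), (0.3, 0.1), (0.2, 2.9)]
counts = enumerate_populations(cells, thresholds=[1.0, 1.0], markers=markers)
print(len(counts))         # 3**2 = 9 phenotypes
print(counts["CD3+CD4-"])  # cells positive for CD3 and negative for CD4
```

The brute-force loop touches every cell for every phenotype, so it is only practical for a handful of markers; it is meant to make the 3^L search space concrete, not to reflect how a production tool would compute the counts.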

References
Journal ArticleDOI
TL;DR: Algorithms are given for finding the k shortest paths connecting a pair of vertices in a digraph, and applications to dynamic programming problems including the knapsack problem, sequence alignment, maximum inscribed polygons, and genealogical relationship discovery are described.
Abstract: We give algorithms for finding the k shortest paths (not required to be simple) connecting a pair of vertices in a digraph. Our algorithms output an implicit representation of these paths in a digraph with n vertices and m edges, in time O(m + n log n + k). We can also find the k shortest paths from a given source s to each vertex in the graph, in total time O(m + n log n + kn). We describe applications to dynamic programming problems including the knapsack problem, sequence alignment, maximum inscribed polygons, and genealogical relationship discovery.

1,413 citations

Proceedings ArticleDOI
20 Nov 1994
TL;DR: Algorithms are given for finding the k shortest paths connecting a pair of vertices in a digraph, and applications to dynamic programming problems including the knapsack problem, sequence alignment, and maximum inscribed polygons are described.
Abstract: We give algorithms for finding the k shortest paths (not required to be simple) connecting a pair of vertices in a digraph. Our algorithms output an implicit representation of these paths in a digraph with n vertices and m edges, in time O(m + n log n + k). We can also find the k shortest paths from a given source s to each vertex in the graph, in total time O(m + n log n + kn). We describe applications to dynamic programming problems including the knapsack problem, sequence alignment, and maximum inscribed polygons.

750 citations


"Enhanced flowType/RchyOptimyx: a Bi..." refers methods in this paper

  • ...RchyOptimyx uses a dynamic programming algorithm for efficiently constructing k-shortest paths (Eppstein, 1998)....

    [...]
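
To make the k-shortest-paths problem in the abstracts above concrete, here is a minimal sketch. It is not Eppstein's algorithm (which reaches O(m + n log n + k) through an implicit path representation) and not RchyOptimyx's implementation; it is a simpler heap-based best-first search, assuming non-negative edge weights and allowing non-simple paths, as in the problem statement. The graph, node names and function name (`k_shortest_paths`) are hypothetical.

```python
import heapq

def k_shortest_paths(graph, source, target, k):
    """Return up to k cheapest source-to-target paths as (cost, path) pairs.

    graph: {node: [(neighbor, weight), ...]} with non-negative weights.
    Paths may revisit nodes (they are not required to be simple).
    """
    found = []
    visits = {}                # how many times each node has been popped
    heap = [(0, [source])]     # entries are (cost so far, path so far)
    while heap and len(found) < k:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        visits[node] = visits.get(node, 0) + 1
        # The i-th pop of a node is its i-th cheapest path, so later pops
        # beyond k cannot contribute to the k best answers.
        if visits[node] > k:
            continue
        if node == target:
            found.append((cost, path))
        for neighbor, weight in graph.get(node, []):
            heapq.heappush(heap, (cost + weight, path + [neighbor]))
    return found

# Hypothetical toy digraph.
graph = {
    "s": [("a", 1), ("b", 4)],
    "a": [("b", 1), ("t", 5)],
    "b": [("t", 1)],
    "t": [],
}
paths = k_shortest_paths(graph, "s", "t", k=2)
print(paths)  # the two cheapest s-to-t paths, cheapest first
```

Storing whole paths on the heap keeps the sketch short but costs extra memory; the implicit representation described in the abstract avoids exactly that overhead.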

Journal ArticleDOI
TL;DR: These cytometric technologies, capable of high-content, high-throughput single-cell assays, and a new technology that promises to extend these capabilities significantly are reviewed.

593 citations


"Enhanced flowType/RchyOptimyx: a Bi..." refers background in this paper

  • ...Since then, mass cytometry has enabled measurement of 30–45 markers/cell (Bendall et al., 2012), whereas single-cell multiplexed RT-qPCR can measure 50–96 messenger RNAs/cell (White et al., 2011)....

    [...]

Journal ArticleDOI
TL;DR: Several methods performed well as compared to manual gating or external variables using statistical performance measures, which suggests that automated methods have reached a sufficient level of maturity and accuracy for reliable use in FCM data analysis.
Abstract: Traditional methods for flow cytometry (FCM) data processing rely on subjective manual gating. Recently, several groups have developed computational methods for identifying cell populations in multidimensional FCM data. The Flow Cytometry: Critical Assessment of Population Identification Methods (FlowCAP) challenges were established to compare the performance of these methods on two tasks: (i) mammalian cell population identification, to determine whether automated algorithms can reproduce expert manual gating and (ii) sample classification, to determine whether analysis pipelines can identify characteristics that correlate with external variables (such as clinical outcome). This analysis presents the results of the first FlowCAP challenges. Several methods performed well as compared to manual gating or external variables using statistical performance measures, which suggests that automated methods have reached a sufficient level of maturity and accuracy for reliable use in FCM data analysis.

562 citations


"Enhanced flowType/RchyOptimyx: a Bi..." refers methods in this paper

  • ...FlowType uses partitioning of cells, either manually or by clustering, into positive or negative for each marker to enumerate all cell types in a sample, e.g. Aghaeepour et al. (2013)....

    [...]

Journal ArticleDOI
TL;DR: This work presents a fully integrated microfluidic device capable of performing high-precision RT-qPCR measurements of gene expression from hundreds of single cells per run, and shows that nanoliter volume processing reduced measurement noise, increased sensitivity, and provided single nucleotide specificity.
Abstract: A long-sought milestone in microfluidics research has been the development of integrated technology for scalable analysis of transcription in single cells. Here we present a fully integrated microfluidic device capable of performing high-precision RT-qPCR measurements of gene expression from hundreds of single cells per run. Our device executes all steps of single-cell processing, including cell capture, cell lysis, reverse transcription, and quantitative PCR. In addition to higher throughput and reduced cost, we show that nanoliter volume processing reduced measurement noise, increased sensitivity, and provided single nucleotide specificity. We apply this technology to 3,300 single-cell measurements of (i) miRNA expression in K562 cells, (ii) coregulation of a miRNA and one of its target transcripts during differentiation in embryonic stem cells, and (iii) single nucleotide variant detection in primary lobular breast cancer cells. The core functionality established here provides the foundation from which a variety of on-chip single-cell transcription analyses will be developed.

493 citations