Author

Louis Faure

Bio: Louis Faure is an academic researcher. The author has contributed to research on the topics Distance matrix and Nonlinear dimensionality reduction. The author has an h-index of 3 and has co-authored 3 publications receiving 43 citations.

Papers
Journal Article
04 Mar 2020, Entropy
TL;DR: ElPiGraph exploits and further develops the concept of elastic energy, the topological graph grammar approach, and a gradient descent-like optimization of the graph topology, and is capable of approximating data point clouds via principal graph ensembles.
Abstract: Multidimensional datapoint clouds representing large datasets are frequently characterized by non-trivial low-dimensional geometry and topology which can be recovered by unsupervised machine learning approaches, in particular, by principal graphs. Principal graphs approximate the multivariate data by a graph injected into the data space with some constraints imposed on the node mapping. Here we present ElPiGraph, a scalable and robust method for constructing principal graphs. ElPiGraph exploits and further develops the concept of elastic energy, the topological graph grammar approach, and a gradient descent-like optimization of the graph topology. The method is able to withstand high levels of noise and is capable of approximating data point clouds via principal graph ensembles. This strategy can be used to estimate the statistical significance of complex data features and to summarize them into a single consensus principal graph. ElPiGraph deals efficiently with large datasets in various fields such as biology, where it can be used for example with single-cell transcriptomic or epigenomic datasets to infer gene expression dynamics and recover differentiation landscapes.

44 citations

Posted Content
TL;DR: ElPiGraph is a scalable and robust method for approximating datasets with complex structures. It does not require computing the complete data distance matrix or the data point neighbourhood graph, withstands high levels of noise, and can approximate complex topologies via principal graph ensembles.
Abstract: Large datasets represented by multidimensional data point clouds often possess non-trivial distributions with branching trajectories and excluded regions, with the recent single-cell transcriptomic studies of developing embryo being notable examples. Reducing the complexity and producing compact and interpretable representations of such data remains a challenging task. Most of the existing computational methods are based on exploring the local data point neighbourhood relations, a step that can perform poorly in the case of multidimensional and noisy data. Here we present ElPiGraph, a scalable and robust method for approximation of datasets with complex structures which does not require computing the complete data distance matrix or the data point neighbourhood graph. This method is able to withstand high levels of noise and is capable of approximating complex topologies via principal graph ensembles that can be combined into a consensus principal graph. ElPiGraph deals efficiently with large and complex datasets in various fields from biology, where it can be used to infer gene dynamics from single-cell RNA-Seq, to astronomy, where it can be used to explore complex structures in the distribution of galaxies.

29 citations

Posted Content
TL;DR: ElPiGraph is currently implemented in five programming languages and accompanied by a graphical user interface, which makes it a versatile tool to deal with complex data in various fields from molecular biology, where it can be used to infer pseudo-time trajectories from single-cell RNASeq, to astronomy, where it is used to approximate complex structures in the distribution of galaxies.
Abstract: We present ElPiGraph, a method for approximating data distributions having non-trivial topological features such as the existence of excluded regions or branching structures. Unlike many existing methods, ElPiGraph is not based on the construction of a k-nearest neighbour graph, a procedure that can perform poorly in the case of multidimensional and noisy data. Instead, ElPiGraph constructs elastic principal graphs in a more robust way by minimizing elastic energy, applying graph grammars and explicitly controlling topological complexity. Using a trimmed approximation error function makes ElPiGraph extremely robust to the presence of background noise without decreasing computational performance and allows it to deal with complex cases of manifold learning (for example, ElPiGraph can learn disconnected intersecting manifolds). Thanks to the quasi-quadratic nature of the elastic function, ElPiGraph performs almost as fast as a simple k-means clustering and, therefore, is much more scalable than alternative methods, and can work on large datasets containing millions of high dimensional points on a personal computer. The excellent performance of the method opens the possibility to apply resampling and to approximate complex data structures via principal graph ensembles which can be used to construct consensus principal graphs. ElPiGraph is currently implemented in five programming languages and accompanied by a graphical user interface, which makes it a versatile tool to deal with complex data in various fields from molecular biology, where it can be used to infer pseudo-time trajectories from single-cell RNASeq, to astronomy, where it can be used to approximate complex structures in the distribution of galaxies.
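
Illustrative sketch (not taken from the ElPiGraph code base): the abstract above mentions an elastic energy with a trimmed approximation error. The pure-Python example below shows one plausible form of such an energy, assuming a data-fit term capped at a trimming radius, an edge-stretching penalty, and a star-bending penalty. The function name `elastic_energy` and the parameters `lam`, `mu`, and `trim_radius` are hypothetical names chosen for this example, and the exact weighting used by the authors may differ.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def elastic_energy(
    data: List[Point],
    nodes: List[Point],
    edges: List[Tuple[int, int]],
    stars: List[Tuple[int, List[int]]],
    lam: float = 0.01,                   # hypothetical stretching weight
    mu: float = 0.1,                     # hypothetical bending weight
    trim_radius: float = float("inf"),   # hypothetical trimming radius
) -> float:
    """Assumed energy form: trimmed fit + edge stretching + star bending."""
    r2 = trim_radius ** 2
    # Fit term: each point contributes its squared distance to the nearest
    # node, capped at trim_radius**2 so distant noise has a bounded effect.
    fit = sum(min(min(sq_dist(x, v) for v in nodes), r2) for x in data) / len(data)
    # Stretching term: penalizes long edges.
    stretch = sum(sq_dist(nodes[i], nodes[j]) for i, j in edges)
    # Bending term: penalizes a star centre far from the mean of its leaves.
    bend = 0.0
    for centre, leaves in stars:
        dim = len(nodes[centre])
        mean_leaf = [sum(nodes[l][d] for l in leaves) / len(leaves) for d in range(dim)]
        bend += sq_dist(nodes[centre], mean_leaf)
    return fit + lam * stretch + mu * bend


# Three points on a line plus one distant outlier, fitted by a 3-node chain.
data = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (1.0, 50.0)]
nodes = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
edges = [(0, 1), (1, 2)]
stars = [(1, [0, 2])]  # node 1 is the centre, nodes 0 and 2 are its leaves

untrimmed = elastic_energy(data, nodes, edges, stars)
trimmed = elastic_energy(data, nodes, edges, stars, trim_radius=1.0)
print(untrimmed, trimmed)
```

In this toy case the single outlier dominates the untrimmed energy (about 625), while capping its contribution at the trimming radius brings the energy down to about 0.27. This is the kind of bounded influence of background noise that the abstract attributes to the trimmed error function.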

11 citations


Cited by
Book Chapter
24 Oct 2007
TL;DR: The book is meant to be useful for practitioners in applied data analysis in life sciences, engineering, physics and chemistry; it will also be valuable to PhD students.

237 citations

Journal Article
TL;DR: STREAM is a pipeline for reconstruction and visualization of differentiation trajectories from both single-cell RNA-seq and ATAC-seq data and its utility for understanding myoblast differentiation and disentangling known heterogeneity in hematopoiesis for different organisms is demonstrated.
Abstract: Single-cell transcriptomic assays have enabled the de novo reconstruction of lineage differentiation trajectories, along with the characterization of cellular heterogeneity and state transitions. Several methods have been developed for reconstructing developmental trajectories from single-cell transcriptomic data, but efforts on analyzing single-cell epigenomic data and on trajectory visualization remain limited. Here we present STREAM, an interactive pipeline capable of disentangling and visualizing complex branching trajectories from both single-cell transcriptomic and epigenomic data. We have tested STREAM on several synthetic and real datasets generated with different single-cell technologies. We further demonstrate its utility for understanding myoblast differentiation and disentangling known heterogeneity in hematopoiesis for different organisms. STREAM is an open-source software package. The increasing accessibility of single cell omics technologies beyond transcriptomics demands parallel advances in analysis. Here, the authors introduce STREAM, a pipeline for reconstruction and visualization of differentiation trajectories from both single-cell RNA-seq and ATAC-seq data.

196 citations

Journal Article
TL;DR: In this paper, single-nucleus RNA-seq was applied to map the plasticity of mouse epididymal white adipose tissue at single-nucleus resolution in response to high-fat-diet-induced obesity.

104 citations

Journal Article
TL;DR: This study reveals sources of intratumoral heterogeneity within EwS tumors by combining independent component analysis of single-cell RNA sequencing data from diverse cell types and model systems with time-resolved mapping of EWSR1-FLI1 binding sites and of open chromatin regions to characterize dynamic cellular processes associated with EWSR1-FLI1 activity.

84 citations

Journal Article
TL;DR: Deep single cell analysis is used to resolve fate splits and molecular biasing processes during sensory neurogenesis in mice to show that sensory neuron diversity is achieved by a transition through a bi-potential intermediate state.
Abstract: Somatic sensation is defined by the existence of a diversity of primary sensory neurons with unique biological features and response profiles to external and internal stimuli. However, there is no coherent picture about how this diversity of cell states is transcriptionally generated. Here, we use deep single cell analysis to resolve fate splits and molecular biasing processes during sensory neurogenesis in mice. Our results identify a complex series of successive and specific transcriptional changes in post-mitotic neurons that delineate hierarchical regulatory states leading to the generation of the main sensory neuron classes. In addition, our analysis identifies previously undetected early gene modules expressed long before fate determination although being clearly associated with defined sensory subtypes. Overall, the early diversity of sensory neurons is generated through successive bi-potential intermediates in which synchronization of relevant gene modules and concurrent repression of competing fate programs precede cell fate stabilization and final commitment. The diversity of primary sensory neurons and how fate choice is determined is unclear. Here, the authors use single cell RNA sequencing analysis of early murine somatosensory neurons to show that sensory neuron diversity is achieved by a transition through a bi-potential intermediate state.

45 citations