Optimal gene selection for cell type discrimination in single cell analyses

doi:10.1101/599654

Open Access, Posted Content

Bianca Dumitrascu, +3 more

- 04 Apr 2019 -

bioRxiv

- pp 599654

TLDR

Given single cell RNA-seq data and a set of cellular labels to discriminate, scGeneFit selects gene transcript markers that jointly optimize cell label recovery using label-aware compressive classification methods, resulting in a substantially more robust and less redundant set of markers.

Abstract:

Single-cell technologies characterize complex cell populations across multiple data modalities at unprecedented scale and resolution. Multi-omic data for single cell gene expression, in situ hybridization, or single cell chromatin states are increasingly available across diverse tissue types. When isolating specific cell types from a sample of disassociated cells or performing in situ sequencing in collections of heterogeneous cells, one challenging task is to select a small set of informative markers to identify and differentiate specific cell types or cell states as precisely as possible. Given single cell RNA-seq data and a set of cellular labels to discriminate, scGeneFit selects gene transcript markers that jointly optimize cell label recovery using label-aware compressive classification methods, resulting in a substantially more robust and less redundant set of markers than existing methods. When applied to a data set given a hierarchy of cell type labels, the markers found by our method enable the recovery of the label hierarchy through a computationally efficient and principled optimization.
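To make the contrast between joint and per-gene marker selection concrete, the following is a minimal sketch and not the authors' method: it swaps the label-aware compressive classification optimization for a simple greedy heuristic that, at each step, adds the gene that most increases the smallest squared centroid distance between any two labels. The function names (`pair_gaps`, `rank_individually`, `greedy_joint_markers`) and the toy expression matrix are hypothetical and chosen only for illustration.

```python
from itertools import combinations

def pair_gaps(X, labels):
    """For each label pair, the squared centroid difference of every gene."""
    sums, counts = {}, {}
    for row, lab in zip(X, labels):
        acc = sums.setdefault(lab, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[lab] = counts.get(lab, 0) + 1
    cents = {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}
    pairs = combinations(sorted(cents), 2)
    return [[(x - y) ** 2 for x, y in zip(cents[a], cents[b])] for a, b in pairs]

def rank_individually(X, labels, k):
    """Baseline: score each gene on its own and keep the top k."""
    gaps = pair_gaps(X, labels)
    total = [sum(g[j] for g in gaps) for j in range(len(X[0]))]
    return sorted(range(len(total)), key=lambda j: -total[j])[:k]

def greedy_joint_markers(X, labels, k):
    """Greedy stand-in for joint selection (an assumption, not scGeneFit).

    Each step adds the gene that maximizes the worst-case separation over
    all label pairs, using total separation only to break ties.
    """
    gaps = pair_gaps(X, labels)
    n_genes = len(X[0])
    selected, sep = [], [0.0] * len(gaps)
    for _ in range(min(k, n_genes)):
        best_j, best_score = None, None
        for j in range(n_genes):
            if j in selected:
                continue
            score = (min(s + g[j] for s, g in zip(sep, gaps)),
                     sum(g[j] for g in gaps))
            if best_score is None or score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
        sep = [s + g[best_j] for s, g in zip(sep, gaps)]
    return selected

# Toy data: rows are cells, columns are genes.
# Genes 0 and 1 both mark label A (redundant); gene 2 marks label C;
# gene 3 carries no label information.
X = [
    [11, 10, 0, 1], [9, 8, 0, 1],   # label A
    [0, 0, 0, 2],   [0, 0, 0, 0],   # label B
    [0, 0, 6, 1],   [0, 0, 4, 1],   # label C
]
labels = ["A", "A", "B", "B", "C", "C"]

print(rank_individually(X, labels, 2))
print(greedy_joint_markers(X, labels, 2))
```

On this toy matrix the individual ranking keeps genes 0 and 1, which leaves labels B and C indistinguishable, whereas the greedy joint selection keeps genes 0 and 2 and separates all three labels with the same marker budget.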

Citations

Posted Content

Estimation of Wasserstein distances in the Spiked Transport Model

Jonathan Niles-Weed, +1 more

- 16 Sep 2019 -

arXiv: Statistics Theory

TL;DR: A new statistical model is proposed, the spiked transport model, which formalizes the assumption that two probability distributions differ only on a low-dimensional subspace and establishes a lower bound showing that, in the absence of such structure, the plug-in estimator is nearly rate-optimal for estimating the Wasserstein distance in high dimension.

Journal Article

A rank-based marker selection method for high throughput scRNA-seq data.

Alexander Vargo, +1 more

- 23 Oct 2020 -

BMC Bioinformatics

TL;DR: RankCorr is a fast method with strong mathematical underpinnings that performs multi-class marker selection in an informed manner and is consistently among the best-performing marker selection methods on scRNA-seq data.

Journal Article

Ensemble feature selection for stable biomarker identification and cancer classification from microarray expression data

Aiguo Wang, +3 more

- 01 Jan 2022 -

Computers in Biology and Medicine

TL;DR: In this article, an ensemble feature selection framework is proposed to improve the discrimination and stability of the selected features in microarray data, and two aggregation strategies are developed to combine multiple feature subsets into one set.

Posted Content

Tree! I am no Tree! I am a Low Dimensional Hyperbolic Embedding

Rishi Sonthalia, +1 more

- 08 May 2020 -

arXiv: Learning

TL;DR: A novel fast algorithm, TreeRep, is presented that, given a $\delta$-hyperbolic metric, learns a tree structure that approximates the original metric; it is shown analytically that TreeRep exactly recovers the original tree structure.

Journal Article

A robust nonlinear low-dimensional manifold for single cell RNA-seq data

Archit Verma, +1 more

- 21 Jul 2020 -

BMC Bioinformatics

TL;DR: A nonlinear latent variable model with robust, heavy-tailed error and adaptive kernel learning to estimate low-dimensional nonlinear structure in scRNA-seq data is presented and is well suited for raw, unfiltered gene counts from high-throughput sequencing technologies for visualization, exploration, and uncertainty estimation of cell states.

References

Journal Article

Data exploration, quality control and testing in single-cell qPCR-based gene expression experiments

Andrew McDavid, +7 more

- 01 Feb 2013 -

Bioinformatics

TL;DR: The authors propose a statistical model accounting for the fact that genes at the single-cell level can be on (and a continuous expression measure is recorded) or dichotomously off (and the recorded expression is zero).

Posted Content

Solving Linear Programs in the Current Matrix Multiplication Time

Michael B. Cohen, +2 more

- 18 Oct 2018 -

arXiv: Data Structures and Algorithms

TL;DR: This paper shows how to solve linear programs of the form $\min_{Ax=b,\,x\ge 0} c^\top x$ with $n$ variables in time $O^*((n^{\omega} + n^{2.5-\alpha/2} + n^{2+1/6}) \log(n/\delta))$, where $\omega$ is the exponent of matrix multiplication, $\alpha$ is the dual exponent of matrix multiplication, and $\delta$ is the relative accuracy.

Proceedings Article

Solving linear programs in the current matrix multiplication time

Michael B. Cohen, +2 more

TL;DR: In this article, a stochastic central path method is proposed to solve linear programs of the form $\min_{Ax=b,\,x\ge 0} c^\top x$ with $n$ variables in time $O^*((n^{\omega} + n^{2.5-\alpha/2} + n^{2+1/6}) \log(n/\delta))$, where $\omega$ is the exponent of matrix multiplication, $\alpha$ is the dual exponent, and $\delta$ is the relative accuracy.

Journal Article

A unified statistical framework for single cell and bulk RNA sequencing data

Lingxue Zhu, +3 more

- 09 Mar 2018 -

The Annals of Applied Statistics

TL;DR: A Unified RNA-Sequencing Model is proposed for both single cell and bulk RNA-seq data, formulated as a hierarchical model that borrows the strength from both data sources and carefully models the dropouts in single cell data, leading to a more accurate estimation of cell type specific gene expression profile.

Journal Article

A Unified Statistical Framework for Single Cell and Bulk RNA Sequencing Data

Lingxue Zhu, +3 more

- 26 Sep 2016 -

arXiv: Applications

TL;DR: In this article, a Unified RNA-Sequencing Model (URSM) is proposed for both single cell and bulk RNA-seq data, formulated as a hierarchical model, which can estimate cell type specific gene expression profile.
