Optimal gene selection for cell type discrimination in single cell analyses

Abstract
Single-cell technologies characterize complex cell populations across multiple data modalities at unprecedented scale and resolution. Multi-omic data for single cell gene expression, in situ hybridization, or single cell chromatin states are increasingly available across diverse tissue types. When isolating specific cell types from a sample of dissociated cells, or when performing in situ sequencing in collections of heterogeneous cells, one challenging task is to select a small set of informative markers that identify and differentiate specific cell types or cell states as precisely as possible. Given single cell RNA-seq data and a set of cellular labels to discriminate, scGene-Fit selects gene transcript markers that jointly optimize cell label recovery using label-aware compressive classification methods, resulting in a substantially more robust and less redundant set of markers than existing methods. When applied to a data set with a given hierarchy of cell type labels, the markers found by our method enable the recovery of the label hierarchy through a computationally efficient and principled optimization.
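
The abstract does not spell out the optimization behind the marker selection, so the following is only a minimal sketch of one way label-aware marker selection could be posed as a linear program. It is not the scGene-Fit implementation. The function name `select_markers_lp`, the `margin` parameter, and the specific constraints (per-gene weights in [0, 1], a budget on their sum, and a soft margin on weighted squared distances between cells with different labels) are all assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import linprog


def select_markers_lp(X, labels, num_markers, margin=1.0):
    """Hypothetical LP sketch: pick genes that keep differently labeled cells apart.

    X           : (n_cells, n_genes) expression matrix
    labels      : (n_cells,) array of cell labels
    num_markers : budget on the total gene weight (assumed constraint)
    margin      : required weighted squared distance between classes (assumed)
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n, d = X.shape

    # Enumerate every pair of cells that carry different labels.
    i_idx, j_idx = np.triu_indices(n, k=1)
    different = labels[i_idx] != labels[j_idx]
    i_idx, j_idx = i_idx[different], j_idx[different]

    # Per-gene squared differences for each such pair, shape (n_pairs, n_genes).
    D = (X[i_idx] - X[j_idx]) ** 2
    m = D.shape[0]

    # Variables: d gene weights (alpha) followed by m slack variables.
    # Objective: minimize the total slack, i.e. the total margin violation.
    c = np.concatenate([np.zeros(d), np.ones(m)])

    # Pair constraints: D @ alpha + slack >= margin, written in <= form.
    A_pairs = np.hstack([-D, -np.eye(m)])
    b_pairs = -margin * np.ones(m)

    # Budget constraint: sum(alpha) <= num_markers.
    A_budget = np.concatenate([np.ones(d), np.zeros(m)])[None, :]
    A_ub = np.vstack([A_pairs, A_budget])
    b_ub = np.concatenate([b_pairs, [float(num_markers)]])

    # alpha in [0, 1], slack >= 0.
    bounds = [(0, 1)] * d + [(0, None)] * m

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)

    alpha = res.x[:d]
    # Report the genes with the largest weights as the selected markers.
    markers = np.argsort(-alpha)[:num_markers]
    return markers, alpha
```

Because the weights are chosen jointly, a gene that only repeats the separation already provided by another gene gains nothing in this objective, which is one way to read the "less redundant" claim. The number of pair constraints grows quadratically with the number of cells, so a sketch like this would need pair subsampling on data sets of realistic size.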
