Author

Lukas Großberger

Bio: Lukas Großberger is an academic researcher. The author has contributed to research in topics: Manifold (fluid mechanics) & Projection (mathematics). The author has an h-index of 1 and has co-authored 1 publication receiving 1972 citations.

Papers

Journal Article

UMAP: Uniform Manifold Approximation and Projection

Leland McInnes, John Healy, Nathaniel Saul, Lukas Großberger

02 Sep 2018

TL;DR: Uniform Manifold Approximation and Projection (UMAP) is a dimension reduction technique that can be used for visualisation similarly to t-SNE, but also for general non-linear dimension reduction.

Abstract: Uniform Manifold Approximation and Projection (UMAP) is a dimension reduction technique that can be used for visualisation similarly to t-SNE, but also for general non-linear dimension reduction. UMAP has a rigorous mathematical foundation, but is simple to use, with a scikit-learn compatible API. UMAP is among the fastest manifold learning implementations available – significantly faster than most t-SNE implementations.

4,141 citations

Cited by

Journal Article

The mutational constraint spectrum quantified from variation in 141,456 humans

Konrad J. Karczewski¹, Laurent C. Francioli¹, Grace Tiao¹, Beryl B. Cummings¹, Jessica Alföldi¹, Qingbo Wang¹, Ryan L. Collins¹, Kristen M. Laricchia¹, Andrea Ganna¹, Daniel P. Birnbaum¹, Laura D. Gauthier¹, Harrison Brand¹, Matthew Solomonson¹, Nicholas A. Watts¹, Daniel R. Rhodes², Moriel Singer-Berk¹, Eleina M. England¹, Eleanor G. Seaby¹, Jack A. Kosmicki¹, Raymond K. Walters¹, Katherine Tashman¹, Yossi Farjoun¹, Eric Banks¹, Timothy Poterba¹, Arcturus Wang¹, Cotton Seed¹, Nicola Whiffin¹, Jessica X. Chong³, Kaitlin E. Samocha⁴, Emma Pierce-Hoffman¹, Zachary Zappala¹, Anne H. O’Donnell-Luria¹, Eric Vallabh Minikel¹, Ben Weisburd¹, Monkol Lek⁵, James S. Ware¹, Christopher Vittal⁶, Irina M. Armean¹, Louis Bergelson¹, Kristian Cibulskis¹, Kristen M. Connolly¹, Miguel Covarrubias¹, Stacey Donnelly¹, Steven Ferriera¹, Stacey Gabriel¹, Jeff Gentry¹, Namrata Gupta¹, Thibault Jeandet¹, Diane Kaplan¹, Christopher Llanwarne¹, Ruchi Munshi¹, Sam Novod¹, Nikelle Petrillo¹, David Roazen¹, Valentin Ruano-Rubio¹, Andrea Saltzman¹, Molly Schleicher¹, Jose Soto¹, Kathleen Tibbetts¹, Charlotte Tolonen¹, Gordon Wade¹, Michael E. Talkowski¹, Benjamin M. Neale¹, Mark J. Daly¹, Daniel G. MacArthur¹

Broad Institute¹, Queen Mary University of London², University of Washington³, Wellcome Trust Sanger Institute⁴, Yale University⁵, Harvard University⁶

27 May 2020-Nature

TL;DR: A catalogue of predicted loss-of-function variants in 125,748 whole-exome and 15,708 whole-genome sequencing datasets from the Genome Aggregation Database (gnomAD) reveals the spectrum of mutational constraints that affect these human protein-coding genes.

Abstract: Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes that are crucial for the function of an organism will be depleted of such variants in natural populations, whereas non-essential genes will tolerate their accumulation. However, predicted loss-of-function variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes. Here we describe the aggregation of 125,748 exomes and 15,708 genomes from human sequencing studies into the Genome Aggregation Database (gnomAD). We identify 443,769 high-confidence predicted loss-of-function variants in this cohort after filtering for artefacts caused by sequencing and annotation errors. Using an improved model of human mutation rates, we classify human protein-coding genes along a spectrum that represents tolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve the power of gene discovery for both common and rare diseases.

4,913 citations

Journal Article

Dimensionality reduction for visualizing single-cell data using UMAP.

Etienne Becht¹, Leland McInnes, John Healy, Charles-Antoine Dutertre¹, Immanuel Kwok¹, Lai Guan Ng¹, Florent Ginhoux¹, Evan W. Newell¹,²

Agency for Science, Technology and Research¹, Fred Hutchinson Cancer Research Center²

01 Jan 2019-Nature Biotechnology

TL;DR: Comparing the performance of UMAP with five other tools, it is found that UMAP provides the fastest run times, highest reproducibility and the most meaningful organization of cell clusters.

Abstract: Advances in single-cell technologies have enabled high-resolution dissection of tissue composition. Several tools for dimensionality reduction are available to analyze the large number of parameters generated in single-cell studies. Recently, a nonlinear dimensionality-reduction technique, uniform manifold approximation and projection (UMAP), was developed for the analysis of any type of high-dimensional data. Here we apply it to biological data, using three well-characterized mass cytometry and single-cell RNA sequencing datasets. Comparing the performance of UMAP with five other tools, we find that UMAP provides the fastest run times, highest reproducibility and the most meaningful organization of cell clusters. The work highlights the use of UMAP for improved visualization and interpretation of single-cell data.

3,016 citations

Journal Article

Fast, sensitive and accurate integration of single-cell data with Harmony.

Ilya Korsunsky, Nghia Millard, Jean Fan¹, Kamil Slowikowski, Fan Zhang, Kevin Wei², Yuriy Baglaenko, Michael B. Brenner², Po-Ru Loh¹,²,³, Soumya Raychaudhuri

Harvard University¹, Brigham and Women's Hospital², Broad Institute³

18 Nov 2019-Nature Methods

TL;DR: Harmony, for the integration of single-cell transcriptomic data, identifies broad and fine-grained populations, scales to large datasets, and can integrate sequencing- and imaging-based data.

Abstract: The emerging diversity of single-cell RNA-seq datasets allows for the full transcriptional characterization of cell types across a wide variety of biological and clinical conditions. However, it is challenging to analyze them together, particularly when datasets are assayed with different technologies, because biological and technical differences are interspersed. We present Harmony (https://github.com/immunogenomics/harmony), an algorithm that projects cells into a shared embedding in which cells group by cell type rather than dataset-specific conditions. Harmony simultaneously accounts for multiple experimental and biological factors. In six analyses, we demonstrate the superior performance of Harmony to previously published algorithms while requiring fewer computational resources. Harmony enables the integration of ~10⁶ cells on a personal computer. We apply Harmony to peripheral blood mononuclear cells from datasets with large experimental differences, five studies of pancreatic islet cells, mouse embryogenesis datasets and the integration of scRNA-seq with spatial transcriptomics data.

2,459 citations

Journal Article

Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression

Christoph Hafemeister, Rahul Satija¹

New York University¹

23 Dec 2019-Genome Biology

TL;DR: It is proposed that the Pearson residuals from “regularized negative binomial regression,” where cellular sequencing depth is utilized as a covariate in a generalized linear model, successfully remove the influence of technical characteristics from downstream analyses while preserving biological heterogeneity.

Abstract: Single-cell RNA-seq (scRNA-seq) data exhibits significant cell-to-cell variation due to technical factors, including the number of molecules detected in each cell, which can confound biological heterogeneity with technical effects. To address this, we present a modeling framework for the normalization and variance stabilization of molecular count data from scRNA-seq experiments. We propose that the Pearson residuals from “regularized negative binomial regression,” where cellular sequencing depth is utilized as a covariate in a generalized linear model, successfully remove the influence of technical characteristics from downstream analyses while preserving biological heterogeneity. Importantly, we show that an unconstrained negative binomial model may overfit scRNA-seq data, and overcome this by pooling information across genes with similar abundances to obtain stable parameter estimates. Our procedure omits the need for heuristic steps including pseudocount addition or log-transformation and improves common downstream analytical tasks such as variable gene selection, dimensional reduction, and differential expression. Our approach can be applied to any UMI-based scRNA-seq dataset and is freely available as part of the R package sctransform, with a direct interface to our single-cell toolkit Seurat.

1,898 citations

Journal Article

The single-cell transcriptional landscape of mammalian organogenesis

Junyue Cao¹, Malte Spielmann¹, Xiaojie Qiu¹, Xingfan Huang¹, Daniel M. Ibrahim²,³, Andrew J. Hill¹, Fan Zhang⁴, Stefan Mundlos²,³, Lena Christiansen⁴, Frank J. Steemers⁴, Cole Trapnell¹, Jay Shendure

University of Washington¹, Charité², Max Planck Society³, Illumina⁴

01 Feb 2019-Nature

TL;DR: A cell atlas of mouse organogenesis provides a global view of developmental processes occurring during this critical period, including focused analyses of the apical ectodermal ridge, limb mesenchyme and skeletal muscle.

Abstract: Mammalian organogenesis is a remarkable process. Within a short timeframe, the cells of the three germ layers transform into an embryo that includes most of the major internal and external organs. Here we investigate the transcriptional dynamics of mouse organogenesis at single-cell resolution. Using single-cell combinatorial indexing, we profiled the transcriptomes of around 2 million cells derived from 61 embryos staged between 9.5 and 13.5 days of gestation, in a single experiment. The resulting ‘mouse organogenesis cell atlas’ (MOCA) provides a global view of developmental processes during this critical window. We use Monocle 3 to identify hundreds of cell types and 56 trajectories, many of which are detected only because of the depth of cellular coverage, and collectively define thousands of corresponding marker genes. We explore the dynamics of gene expression within cell types and trajectories over time, including focused analyses of the apical ectodermal ridge, limb mesenchyme and skeletal muscle. Data from single-cell combinatorial-indexing RNA-sequencing analysis of 2 million cells from mouse embryos between embryonic days 9.5 and 13.5 are compiled in a cell atlas of mouse organogenesis, which provides a global view of developmental processes occurring during this critical period.

1,865 citations
