A reference tissue atlas for the human kidney

TL;DR: In this article, the authors describe the construction of an integrated reference tissue map of cells, pathways and genes using unaffected regions of nephrectomy tissues and undiseased human biopsies from 55 subjects.
Abstract: The Kidney Precision Medicine Project (KPMP) is building a spatially specified human tissue atlas at single-cell resolution with molecular details of the kidney in health and disease. Here, we describe the construction of an integrated reference tissue map of cells, pathways and genes using unaffected regions of nephrectomy tissues and undiseased human biopsies from 55 subjects. We use single-cell and -nucleus transcriptomics, subsegmental laser microdissection bulk transcriptomics and proteomics, near-single-cell proteomics, 3-D nondestructive and CODEX imaging, and spatial metabolomics data to hierarchically identify genes, pathways and cells. Integrated data from these different technologies coherently describe cell types/subtypes within different nephron segments and interstitium. These spatial profiles identify cell-level functional organization of the kidney tissue as indicative of their physiological functions and map different cell subtypes to genes, proteins, metabolites and pathways. Comparison of transcellular sodium reabsorption along the nephron to levels of mRNAs encoding the different sodium transporter genes indicates that mRNA levels are largely congruent with physiological activity. This reference atlas provides an initial framework for molecular classification of kidney disease when multiple molecular mechanisms underlie convergent clinical phenotypes.

Summary (6 min read)

INTRODUCTION

  • The kidney is one of the most diverse organs in the human body in terms of its cellular heterogeneity, and possibly second only to the brain in its spatial complexity.
  • Delineating the cell types and subtypes in different regions of the kidney during health and disease will help identify the tissue-level, cellular and subcellular pathways and processes involved in disease initiation and progression, and aid in drug discovery.
  • The KPMP features an expanding set of complementary high-throughput assays for molecular entities that span transcriptomic, proteomic and metabolomic profiles and spatial/structural properties of kidney tissue.
  • The KPMP envisions that harmonization and integration of different types of molecular data from omics assays, combined with state-of-the-art pathological and clinical descriptors, will allow us to classify different disease subtypes and states for diagnostic and therapeutic purposes.
  • Numerous groups have proposed the use of integrated multiomics analysis to characterize disease phenotypes using tools that include Bayesian, correlative, network-based and machine learning-based clustering algorithms 2-4.

Outline of KPMP Data Types

  • In these analyses, there were four transcriptomic, two proteomic, one imaging-based, and one spatial metabolomics tissue interrogation assays that consisted of 3 to 48 different datasets obtained from 3 to 22 participants (Supplementary Table 1).
  • The copyright holder for this preprint (whichthis version posted July 24, 2020.
  • Hierarchical clustering of the correlation coefficients documented that the absolute gene and protein expression values are specific for a particular platform and not for their anatomical origin (Supplementary Figure 3A).
  • While imaging assays identify the spatial localization of individual cells together with their expression signatures for a limited number of proteins, single-cell RNASeq assays provide more extensive transcriptomic profiles for individual cells.
  • The pathways that are predicted for non-glomerular metabolites either overlapped with or were closely related to the pathways that are predicted for proximal tubule cells and subsegments based on the other datasets (Figure 2A).

DISCUSSION

  • The advances in transcriptomic technologies along with other omics and imaging assays offer unprecedented insights into the organization of tissues at cellular resolution and the molecular constituents of the different cell types and their subtypes.
  • The authors predict that these subtypes differ in their potential for lipid metabolism, which is critically important for the physiological function of proximal tubule cells, as cellular energetics have been shown to be critical for reabsorptive activity.
  • Nevertheless, their post-hoc power analysis can help to estimate the reliability of an identified cell subtype or predicted disease mechanism by documenting that it is consistently recovered using down-sampled datasets.
  • In addition to the integrated analytics presented here, the KPMP is also building a community-based Kidney Tissue Atlas Ontology (KTAO), which will systematically integrate different types of information (such as clinical, pathological, cell and molecular) into a logically defined tissue atlas, which can then be further utilized to support various applications 34.

METHODS

  • Omics and imaging assays used within KPMP target different types of molecular components with different resolution, sensitivity and precision.
  • An important function of the KPMP Central Hub is to integrate the different types of data using a set of analytical techniques.
  • The pilot data presented for each assay comprises 3 to 48 different datasets that are obtained from 3 to 22 participants (Supplementary Table 1).
  • Participants' kidney tissue was procured from a spectrum of tissue resources, including unaffected parts of tumor nephrectomy specimens (n=38), living donor preperfusion biopsies (n=3), diseased donor nephrectomies (n=5), and normal surveillance transplant (n=5) and native kidney biopsies (n=4).
  • Within each assay the authors generated lists of differentially expressed genes (DEGs), proteins (DEPs) and metabolites that describe those genes, proteins or metabolites that are upregulated or enriched in a particular single cell cluster, single nucleus cluster or kidney subsegment, if compared to all other clusters or subsegments.

Ranking of Differentially Expressed Genes and Proteins

  • In the case of the DEGs and DEPs that were used for dynamic enrichment analysis 6, module identification 21 and post-hoc power analysis, single-nucleus and single-cell DEGs were first ranked by p-value and then by decreasing fold change (i.e., fold changes were used as a tiebreaker).
  • The top 300 ranked entities were subjected to downstream analysis.
  • Similarly, DEGs and DEPs obtained for each kidney subsegment based on LMD bulk RNASeq, or LMD and NSC proteomics, were ranked first by p-value and then by decreasing fold change, and the top 300 ranked DEGs and DEPs were subjected to pathway enrichment analysis or module detection (see below).
  • Therefore, the authors could not calculate p-values for the LMD and NSC technologies.
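
The ranking rule described above (ascending p-value, with decreasing fold change as the tiebreaker, then a top-300 cutoff) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Entity` record, the function name `rank_entities` and the toy gene names and values are hypothetical and are not taken from the KPMP pipeline.

```python
from typing import NamedTuple


class Entity(NamedTuple):
    name: str           # gene or protein symbol (hypothetical in the toy data)
    p_value: float      # differential-expression p-value
    fold_change: float  # fold change versus all other clusters/subsegments


def rank_entities(entities, top_n=300):
    """Sort by ascending p-value; break ties by descending fold change."""
    ranked = sorted(entities, key=lambda e: (e.p_value, -e.fold_change))
    return ranked[:top_n]


# Toy input: all values are made up for illustration only.
degs = [
    Entity("GENE_A", 0.001, 2.0),
    Entity("GENE_B", 0.001, 3.5),   # same p-value as GENE_A, larger fold change
    Entity("GENE_C", 0.0001, 1.2),
    Entity("GENE_D", 0.05, 8.0),
]
top = rank_entities(degs, top_n=3)
print([e.name for e in top])
```

Sorting on the tuple `(p_value, -fold_change)` keeps the fold change strictly secondary: it only decides the order among entities that share the same p-value.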

Standard and Dynamic Enrichment Analysis

  • Top DEGs and DEPs for each podocyte cluster/glomerulus, proximal tubule cell cluster/tubulointerstitium and principal cell cluster/collecting duct subsegment were separately subjected to standard enrichment analysis using Gene Ontology Biological Processes (GO BPs) or the Molecular Biology of the Cell Ontology (MBCO) level-3 subcellular processes (SCPs) 6 and Fisher’s Exact Test.
  • Only genes/proteins that are detected by this method and statistically analyzed for differential expression can be identified as DEGs/DEPs and only these genes/proteins are considered as the background set for the Fisher’s Exact test.
  • Ontological background genes/proteins were all genes that are annotated to at least one pathway within that particular ontology.
  • Dynamic enrichment analysis uses these relationships to generate context-specific higher-level processes by merging functionally related SCPs that contain at least one DEG or DEP.
  • The top five predicted SCPs or merged SCPs are connected based on the inferred relationships, and all networks for a particular cell type/segment merged, whereby each SCP was color-coded according to the source assay(s) that initiated its dynamic enrichment.
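
As a rough illustration of the standard enrichment step, the one-sided Fisher's Exact Test for over-representation of a pathway among DEGs/DEPs can be computed from binomial coefficients. The sketch below uses only the Python standard library; the function name `enrichment_p_value`, the restriction of both gene lists to a single `background` set and the toy gene identifiers are assumptions made for this example, not the authors' implementation.

```python
from math import comb


def enrichment_p_value(degs, pathway_genes, background):
    """One-sided Fisher's exact test (over-representation) p-value.

    All three arguments are collections of gene/protein identifiers.
    DEGs and pathway genes outside the background set are ignored.
    """
    background = set(background)
    degs = set(degs) & background
    pathway = set(pathway_genes) & background
    k = len(degs & pathway)          # observed overlap
    n_bg, n_deg, n_pw = len(background), len(degs), len(pathway)

    total = comb(n_bg, n_deg)        # ways to draw n_deg genes from background
    # Probability of an overlap of k or more (hypergeometric upper tail).
    tail = sum(
        comb(n_pw, i) * comb(n_bg - n_pw, n_deg - i)
        for i in range(k, min(n_deg, n_pw) + 1)
    )
    return tail / total


# Toy example with hypothetical identifiers g0..g19.
background = [f"g{i}" for i in range(20)]
pathway = ["g0", "g1", "g2", "g3", "g4"]
degs = ["g0", "g1", "g2", "g10"]     # 3 of the 4 DEGs fall in the pathway
p = enrichment_p_value(degs, pathway, background)
print(round(p, 4))
```

The choice of background matters: a larger background set makes the same overlap look more surprising, which is why the text restricts it to entities that were actually detected and tested.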

Module Detection

  • In parallel to enrichment analyses, the authors also performed another network-based pathway enrichment technique, identifying modules of cell-type specific marker genes within the kidney-specific functional network using the HumanBase interface (hb.flatironinstitute.org).
  • DEPs from each proteomics dataset.
  • Module detection is a network-based approach described in Krishnan et al., and construction of the functional networks is described in Greene et al 20, 21.
  • Modules are detected using a community clustering algorithm based on connectivity between genes in the kidney-specific functional network, and enrichment analysis is subsequently performed to identify functional enrichments in each module.

Enrichment Analysis of Metabolites

  • All glomerular and nonglomerular metabolites that were identified for the three participants were merged and subjected to pathway enrichment analysis using MetaboAnalyst 25.
  • The top six predicted metabolic pathways were mapped onto MBCO pathways whenever possible; if they did not have a corresponding pathway, the original pathway names were preserved.

Integration of Single-Cell/Single-Nucleus Transcriptomics

  • In contrast to bulk mRNA sequencing, where the gene expression measurements reflect an average across all captured cell types, single-cell or single-nucleus mRNA sequencing allows the measurement and comparison of comprehensive gene sets obtained from individual cells.
  • Single-cell transcriptomic data was produced by PREMIERE (24 libraries from 22 participants) 8 and UCSF (10 libraries from 10 participants), whereas the single-nucleus data was made by UCSD (47 libraries from 15 participants).
  • Data from each site were first processed using the Seurat 3.0 R package 26.
  • These anchor genes were then used to harmonize the datasets.
  • The downstream process included scaling, principal component analysis, batch integration using harmony, dimensionality reduction using Uniform Manifold Approximation and Projection (UMAP), and unsupervised clustering.

Integration of Single-cell, Single-nucleus and Laser Capture Microdissection Bulk Transcriptomics

  • To integrate single-cell sequencing, single-nucleus sequencing, and LMD bulk transcriptomic datasets, the authors first determined the overlap between genes identified both in the LMD dataset and in the corresponding single-cell transcriptomic dataset.
  • From this set of [...]
  • The authors then computed the Pearson correlation between each individual cell in a scaled single-cell dataset and the LMD transcriptomic dataset for the same participant.
  • Using this approach, the authors can assign each cell to the appropriate LMD segment that shows the highest correlation value.
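
The correlation-based assignment described above can be illustrated with a small, dependency-free Python sketch. It assumes that a cell and each LMD segment are represented as expression vectors over the same ordered list of shared genes; the function names, segment labels and toy values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Raises ZeroDivisionError if either sequence has zero variance.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def assign_segment(cell_profile, segment_profiles):
    """Return (segment, r) for the segment with the highest correlation."""
    scores = {
        seg: pearson(cell_profile, profile)
        for seg, profile in segment_profiles.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy data over four shared genes; all values are made up for illustration.
segments = {
    "glomerulus": [5.0, 1.0, 0.0, 2.0],
    "proximal_tubule": [0.0, 4.0, 6.0, 1.0],
}
cell = [0.1, 3.0, 5.5, 0.8]
segment, r = assign_segment(cell, segments)
print(segment, round(r, 3))
```

In practice the same loop would run over every cell of a participant against that participant's LMD profiles, which is why restricting both datasets to the overlapping genes comes first.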

Post-hoc power analysis

  • The PREMIERE single-cell RNASeq 8 and the UCSD/WU single-nucleus RNASeq 9 datasets were obtained from 22 and 15 participants, respectively, whose samples were sequenced in 24 and 47 libraries.
  • The authors used jackstraw analysis to identify the last significant principal component (alpha = 0.01) among the top 20 components.
  • To document the reliability of that cell type assignment the authors compared its p-value to the p-value of the second prediction (that cell type whose essential genes had the second most significant enrichment among the DEGs of that cluster).
  • The authors progressively and randomly removed libraries from the full datasets to generate 100 non-overlapping downsampled datasets for each number of remaining participants.
  • Additionally, the top 300 significant DEPs of each subsegment were subjected to enrichment analysis and predicted pathways compared as described above.
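
A minimal sketch of the down-sampling step is given below. It assumes that "non-overlapping" means no two down-sampled datasets contain exactly the same set of libraries, which the summary does not spell out; the function name, the library labels and the choice to sample libraries directly (rather than participants) are likewise assumptions for illustration.

```python
import random
from math import comb


def downsample_libraries(libraries, n_keep, n_datasets=100, seed=0):
    """Draw up to `n_datasets` distinct random subsets of `n_keep` libraries."""
    rng = random.Random(seed)            # fixed seed for reproducibility
    libs = sorted(libraries)
    # Never ask for more distinct subsets than exist.
    target = min(n_datasets, comb(len(libs), n_keep))
    subsets, seen = [], set()
    while len(subsets) < target:
        pick = frozenset(rng.sample(libs, n_keep))
        if pick not in seen:             # skip subsets already drawn
            seen.add(pick)
            subsets.append(sorted(pick))
    return subsets


# Toy example: 24 hypothetical library labels, keeping 12 per dataset.
all_libs = [f"lib{i:02d}" for i in range(24)]
datasets = downsample_libraries(all_libs, n_keep=12, n_datasets=100, seed=1)
print(len(datasets), datasets[0][:3])
```

Each down-sampled dataset would then be passed through the same analysis pipeline as the full dataset, and the recovered clusters and pathways compared with the reference results.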

Proteomic-Transcriptomic Co-expression Analysis

  • LMD and NSC proteomic datasets identified protein expression in two kidney subsegments: glomeruli and tubulointerstitium for LMD and glomeruli and proximal tubule for NSC.
  • The authors identified technology- and participant-specific cluster gene expression using the “Average Expression” functionality embedded in the Seurat R package (RNA assay, counts slot) on the cells/nuclei assigned to the same clusters in the integrated PREMIERE, UCSF and UCSD/WU data analysis described above.
  • The intersection of all background sets was defined as the set of common genes.
  • Ratios were inverted to describe proximal tubule/tubulointerstitial specific gene expression.

Comparison of Cell Type-specific Imaging and Transcriptomic Expression Data

  • To integrate cell type-specific imaging and transcriptomic data, the authors first constructed matrices with average expression values for each gene in each cell type cluster for both the set of 16 normalized integrated transcriptomic clusters and the CODEX clusters.
  • The authors normalized each gene in both transcriptomic and CODEX matrices to have a mean of 0 and standard deviation of 1.
  • The authors then filtered both datasets to include only genes represented in both the transcriptomic and the imaging datasets and computed the average expression of each gene/protein in each cell type.
  • The authors next considered the problem of constructing a matrix to computationally map transcriptomic cell clusters to the imaging cell clusters.
  • Before visualizing matrix M as a heatmap, the authors first normalized each row to have mean of 0 and standard deviation of 1 in order to identify the transcriptomic cell types that are weighted most heavily in the mapping to each imaging cell type.
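
The filtering and per-gene normalization described above can be sketched as follows. This is an illustration only: it assumes each matrix is a dictionary mapping a gene/protein name to its average expression across cell-type clusters, and it uses the population standard deviation because the summary does not state which estimator was used. The construction of the mapping matrix M itself is not reproduced here.

```python
from statistics import mean, pstdev


def zscore(values):
    """Scale a list of numbers to mean 0 and standard deviation 1."""
    mu = mean(values)
    sd = pstdev(values)  # assumption: population SD; the source does not specify
    if sd == 0:
        return [0.0] * len(values)  # a constant row carries no contrast
    return [(v - mu) / sd for v in values]


def normalize_shared(rna, codex):
    """Keep genes present in both matrices, then z-score each gene's row."""
    shared = sorted(set(rna) & set(codex))
    rna_z = {g: zscore(rna[g]) for g in shared}
    codex_z = {g: zscore(codex[g]) for g in shared}
    return rna_z, codex_z


# Toy matrices with hypothetical marker names and made-up values.
rna = {"MARKER_A": [1.0, 2.0, 3.0], "MARKER_B": [4.0, 4.0, 4.0], "RNA_ONLY": [0.0, 1.0, 0.0]}
codex = {"MARKER_A": [10.0, 30.0], "MARKER_B": [5.0, 7.0], "CODEX_ONLY": [1.0, 2.0]}
rna_z, codex_z = normalize_shared(rna, codex)
print(sorted(rna_z), codex_z["MARKER_A"])
```

The same `zscore` helper could be applied to each row of M before plotting the heatmap, since the text describes an identical mean-0, standard-deviation-1 normalization for that step.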

Generating Pathway Maps for Beta-oxidation Network from Single-cell RNASeq Clusters

  • To better understand one of the most significantly enriched pathways in the authors' integrated analytics of proximal tubules, reactions involved in fatty acid beta oxidation were extracted from KEGG (www.genome.jp/kegg).
  • Datasets were subjected to an automated single-cell/nucleus and proteomic data analysis pipeline and results compared between the downsampled and complete reference datasets.
  • ‘Cluster count’ documents how many clusters were assigned to a particular cell type.

50% of all SCPs that were part of the top seven predictions based on dynamic enrichment

  • ‘Libraries’ gives the number of sequencing libraries used for each down-sampled dataset, and ‘cells’ gives the average total number of cells obtained from those libraries.
  • Labels describing podocyte/glomerular and proximal tubule/tubulointerstitium RNASeq and proteomic datasets are colored aquamarine and orange, respectively.
  • Curly brackets group samples obtained by the same technology: 1: LMD RNASeq, 2: NSC/LMD Proteomics, 3: SC RNASeq PREMIERE, 4: SC RNASeq UCSF, 5: SN RNASeq UCSD/WU.
  • Figure 7A (panel labels): prior knowledge; integration with multiomics and imaging data; models and predictions of tissue function; cell subtype-specific compartmental metabolic networks; dynamic models of metabolic pathways in different subtypes of proximal tubule cells; subcellular compartments of enzymes; metabolic reactions.

Single-nucleus RNASeq (UCSD/WashU) and Single-cell RNASeq (PREMIERE)

  • UMI count matrices and lists of differentially expressed genes were downloaded from published analyses for the PREMIERE TIS (composed of Michigan, Princeton, Broad) single-cell RNA sequencing 8 and UCSD/WashU TIS single-nucleus 9 datasets.
  • The authors excluded the proximal tubular cells-3 and principal cells-2 clusters from the single-nucleus RNASeq dataset, since these clusters showed an inflammatory or a stress response.

Subsegmental LMD Transcriptomics (IU/OSU)

  • A comprehensive Laser MicroDissection (LMD) protocol is published on protocols.io (https://www.protocols.io/view/laser-microdissection-8rkhv4w).
  • Briefly, 12 µm frozen sections are obtained from an Optimal Cutting Temperature (OCT) preserved tissue block and adhered to LMD membrane slides (Leica, Buffalo Grove, IL).
  • Slides undergo dissection with a Leica LMD6500 system with pulsed UV laser.
  • RNA quality is assessed by bioanalyzer, ribosomal RNA is depleted, and cDNA libraries are prepared using the SMARTer Universal Low Input RNA Kit (Takara, No. 634938).
  • Total read counts mapping to each gene were generated with edgeR, normalized, and converted to expression ratios.

Subsegmental LMD Proteomics (IU/OSU)

  • A comprehensive Laser MicroDissection (LMD) proteomics protocol is published on protocols.io.
  • The authors' LMD proteomic methods have also been previously published in detail 29, 30.
  • Glomerular gene expression was compared to the tubulointerstitial gene expression using an unpaired t-test with equal variance.
  • The entire 3-D fluorescence imaging and tissue cytometry protocol is published on protocols.io.
  • Images were acquired in up to 8 channels using a Leica SP8 Confocal Microscope.
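
For the glomerular versus tubulointerstitial comparison mentioned above, an unpaired t-test with equal variance uses a pooled variance estimate. The sketch below computes only the t statistic and its degrees of freedom with the Python standard library; the function name and the toy expression values are hypothetical, and converting the statistic to a p-value would require a t-distribution implementation that is not included here.

```python
from math import sqrt
from statistics import mean, variance


def students_t(a, b):
    """Two-sample t statistic assuming equal variances.

    Returns (t, degrees_of_freedom). Each sample needs at least two values.
    """
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pooled variance: sample variances weighted by their degrees of freedom.
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, df


# Toy expression values for one gene; made up for illustration only.
glomerular = [5.0, 6.0, 7.0]
tubulointerstitial = [1.0, 2.0, 3.0]
t_stat, dof = students_t(glomerular, tubulointerstitial)
print(round(t_stat, 3), dof)
```

A positive statistic here indicates higher mean expression in the first group, so swapping the argument order flips the sign without changing the magnitude.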

Spatial Metabolomics (UTHSA-PNNL-EMBL)

  • 10 µm thick renal cortical tissues were sectioned on a cryostat (Leica Microsystems) and prepared for matrix-assisted laser desorption/ionization imaging mass spectrometry by spraying 3 with the norharmane matrix using the TM-Sprayer automated spraying robot (HTX Technology).
  • For dynamic enrichment analysis all SCPs among the top 25 predictions were compared.
  • The top 300 DEGs or DEPs were subjected to pathway enrichment analysis, and (D) the top 50 GO BPs and (E) MBCO level-3 SCPs were subjected to hierarchical clustering based on pairwise correlation coefficients between -log10(p-values).
  • Supplementary Figure 4 (panels A and B).

12. van Swelm RPL, Wetzels JFM, Swinkels DW. The multifaceted role of iron in renal health and

  • Proximal tubule H-ferritin mediates iron trafficking in acute kidney injury.
  • Changes in membrane sphingolipid composition modulate dynamics and adhesion of integrin nanoclusters.
  • Differentiation of human neuroblastoma cell line IMR-32 by sildenafil and its newly discovered analogue IS00384.

27. McGinnis CS, Murrow LM, Gartner ZJ. DoubletFinder: Doublet Detection in Single-Cell RNA

  • Binder JX, Pletscher-Frankild S, Tsafou K, et al.
  • Unification and visualization of protein subcellular localization evidence, also known as COMPARTMENTS.
  • Characterization of glomerular diseases using proteomic analysis of laser capture microdissected glomeruli.
  • Modeling Kidney Disease Using Ontology: Perspectives from the KPMP.

Towards Building a Smart Kidney Atlas: Network-based integration of multimodal
transcriptomic, proteomic, metabolomic and imaging data in the Kidney Precision
Medicine Project
Jens Hansen (1,*), Rachel Sealfon (2,*), Rajasree Menon (3,*), Michael T. Eadon (4),
Blue B. Lake (5), Becky Steck (3), Dejan Dobi (6), Samir Parikh (7), Tara K. Sidgel (6),
Theodore Alexandrov (8), Andrew Schroeder (6), Edgar A. Otto (3),
Christopher R. Anderton (9,10), Daria Barwinska (4), Guanshi Zheng (10),
Michael P. Rose (3), John P. Shapiro (7), Dusan Velickovic (9), Annapurna Pamreddy (10),
Seth Winfree (4), Yongqun He (3), Ian H. de Boer (11), Jeffrey B. Hodgin (3),
Abhijit Nair (3), Kumar Sharma (10), Minnie Sarwal (6), Kun Zhang (5),
Jonathan Himmelfarb (11), Zoltan Laszik (6), Brad Rovin (7), Pierre C. Dagher (4),
John Cijiang He (1), Tarek M. El-Achkar (4), Sanjay Jain (12),
Olga G. Troyanskaya (2,#), Matthias Kretzler (3,#), Ravi Iyengar (1,#),
Evren U. Azeloglu (1,#) for the Kidney Precision Medicine Project Consortium
* Contributed equally, joint first authors
Affiliations:
1. Icahn School of Medicine at Mount Sinai, New York, New York
2. Princeton University, Princeton, New Jersey and Flatiron Institute, New York, New York
3. University of Michigan School of Medicine, Ann Arbor, Michigan
4. Indiana University School of Medicine, Indianapolis, Indiana
5. University of California San Diego, Jacobs School of Engineering, San Diego, California
6. University of California San Francisco School of Medicine, San Francisco, California
7. Ohio State University College of Medicine, Columbus, Ohio
8. European Molecular Biology Laboratory, Heidelberg, Germany
9. Pacific Northwest National Laboratory, Richland, Washington
10. UT-Health San Antonio School of Medicine, San Antonio, Texas
11. University of Washington, Schools of Medicine and Public Health, Seattle, Washington
12. Washington University in Saint Louis School of Medicine, St. Louis, Missouri
# Corresponding Authors, joint senior authors:
Evren U. Azeloglu, Ph.D.
Assistant Professor of Medicine, Nephrology
Icahn School of Medicine at Mount Sinai, New York, NY
Email: evren.azeloglu@mssm.edu
Twitter: @azeloglu
Ravi Iyengar, Ph.D.
Dorothy H and Lewis H Rosenstiel Professor of Pharmacological Sciences
Icahn School of Medicine at Mount Sinai, New York, NY
Email: ravi.iyengar@mssm.edu
Matthias Kretzler, M.D.
Professor of Medicine, Nephrology
University of Michigan School of Medicine, Ann Arbor, MI
Email: kretzler@med.umich.edu
Olga Troyanskaya, Ph.D.
Professor of Computer Science
Princeton University, Princeton, NJ
Email: ogt@genomics.princeton.edu
bioRxiv preprint doi: https://doi.org/10.1101/2020.07.23.216507; this version posted July 24, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

ABSTRACT
The Kidney Precision Medicine Project (KPMP) plans to construct a spatially specified
tissue atlas of the human kidney at a cellular resolution with near comprehensive molecular
details. The atlas will have maps of healthy, acute kidney injury and chronic kidney disease
tissues. To construct such maps, we integrate different data sets that profile mRNAs, proteins
and metabolites collected by five KPMP Tissue Interrogation Sites. Here, we describe a set of
hierarchical analytical methods to process, combine, and harmonize single-cell, single-nucleus
and subsegmental laser microdissection (LMD) transcriptomics with LMD and near single-cell
proteomics, 3-D nondestructive and immunofluorescence-based Codex imaging and spatial
metabolomics datasets. We use nephrectomy, healthy living donor and surveillance transplant
biopsy tissues to create a harmonized reference tissue map. Our results demonstrate that
different assays produce reliable and coherent identification of cell types and tissue
subsegments. They further show that the molecular profiles and pathways are partially
overlapping yet complementary for cell type-specific and subsegmental physiological
processes. Focusing on the proximal tubules, we find that our integrated systems biology-
based analyses identify different subtypes of tubular cells with potential for different levels of
lipid oxidation and energy generation. Integration of our omics data with pathways from the
literature enables us to construct predictive computational models to develop a smart kidney
atlas. These integrated models can describe physiological capabilities of the tissues based on
the underlying cell types and pathways in health and disease.

INTRODUCTION
The kidney is one of the most diverse organs in the human body in terms of its cellular
heterogeneity, and possibly second only to the brain in its spatial complexity. Accordingly,
decoding the functional and pathogenic mechanisms of kidney disease has been challenging;
as such, nephrology has consistently ranked behind all other subspecialties of medicine in
terms of the drug discovery pipeline 1. Delineating the cell types and subtypes in different
regions of the kidney during health and disease will help identify the tissue-level, cellular and
subcellular pathways and processes involved in disease initiation and progression, and aid in
drug discovery.
The Kidney Precision Medicine Project (KPMP) is a consortium funded by the National
Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) that aims to ethically and
safely obtain kidney biopsies from participants with chronic kidney disease (CKD) or acute
kidney injury (AKI); create a reference kidney atlas; characterize disease subgroups to stratify
patients based on molecular features of disease; and identify critical cells, pathways, and
targets for novel therapies and preventive strategies. The KPMP features an expanding,
complementary set of high-throughput assays for molecular entities, spanning transcriptomic,
proteomic and metabolomic profiles and spatial/structural properties of kidney tissue. These
assays, described here for the five initially funded Tissue Interrogation Sites (TISes), will be
integrated to create a comprehensive knowledge environment for the human kidney. This
knowledge environment will be compiled by the KPMP Central Hub to serve as a foundation
for a spatially specified interactive smart tissue atlas that will include molecular and
physiological information on healthy and diseased states of all individual cell types within the
adult human kidney.
The KPMP envisions that harmonization and integration of different types of molecular data
from omics assays, combined with state-of-the-art pathological and clinical descriptors, will
allow us to classify different disease subtypes and states for diagnostic and therapeutic
purposes. Numerous groups have proposed the use of integrated multiomics analysis to
characterize disease phenotypes using tools that include Bayesian, correlative, network-based
and machine learning-based clustering algorithms²⁻⁴. The goals of these approaches include
prediction of clinical outcomes, identification of underlying disease mechanisms and
stratification of patients⁵. KPMP further envisions that the final integrated analytical
environment will serve as a knowledge base for the entire field that will enable molecularly
anchored outcome prediction and the development of targeted treatments.
Here, we present an overview of KPMP’s strategies to harmonize and integrate multiple
data types through identification of subcellular pathways and functions that delineate cell-level
biochemical and physiological functions. Using reference kidney pilot tissue samples, we have
performed data harmonization and integration to investigate the complementarity of different
data types and develop a pipeline for the generation of tissue maps.
RESULTS
Outline of KPMP Data Types
In these analyses, there were four transcriptomic, two proteomic, one imaging-based, and
one spatial metabolomics tissue interrogation assays that consisted of 3 to 48 different
datasets obtained from 3 to 22 participants (Supplementary Table 1). These assays and their
detailed tissue pre-analytical, tissue processing, data acquisition and analytical data
processing pipelines are outlined in Figure 1. The upper right side of this descriptive map of
the KPMP data integration paradigm also summarizes the steps by which the datasets were
integrated and harmonized.
Pathway- and network-level integration of multiple molecular interrogation techniques
reveals cell- and tissue-specific biological processes that are critical for renal
physiology
To overcome the inherent challenges of multiomics integration and assay-dependent
divergence, we employed dynamic enrichment analysis⁶ and network mapping⁷. We
evaluated the convergence of subcellular processes (SCPs) and pathways that are over-represented
in different cell types or subsegments within the kidney (in comparison to the other
cell types or subsegments), using single-cell RNASeq data from the PREMIERE TIS (Michigan,
Princeton, Broad)⁸, single-nucleus RNASeq data from the UCSD/WU TIS⁹, laser microdissected
(LMD) bulk RNASeq (Supplementary Table 2) and LMD proteomics (Supplementary Table 3)
from the OSU/IU TIS, Near Single Cell (NSC) proteomics from the UCSF TIS (Supplementary
Table 4) and spatial metabolomics from the UTHSA-PNNL-EMBL TIS (Supplementary Table
5A/B/C from 3 different participants).
Single-cell⁸ and -nucleus⁹ RNASeq analysis resulted in the grouping of multiple cells or
nuclei into clusters that were assigned to a particular cell type based on the expression of
essential genes. The top 300 most significantly differentially expressed genes (DEGs) and
proteins (DEPs) of each cluster or subsegment, compared to all other clusters or subsegments,
as well as the metabolites assigned to glomerular and non-glomerular kidney regions
(Supplementary Table 6), were subjected to enrichment analysis to create pathway maps
(Supplementary Table 7) for three representative cell types that contribute diverse functions to
kidney physiology: proximal tubular epithelial cells (Figure 2A, Supplementary Figure 1A for
nonspecific pathways), podocytes (Supplementary Figure 1B) and principal cells of the
collecting ducts (Supplementary Figure 1C). The final maps revealed highly interrelated SCPs
that are intimately linked to the physiological function of the respective cell types. Furthermore,
these SCPs are highly overlapping between assays and datasets with up to 74% of them being
repeatedly enriched in two or more assays, confirming the inherent agreement among these
different assays. While the individual significant genes or gene products coming from multiple
assays were not necessarily the same, placement of these gene products into an
interconnected pathway map showed innate congruence between the assays. The key
subcellular processes (SCPs) for the different cell types differed significantly.
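As a rough illustration of the enrichment step, the sketch below runs a plain over-representation test (one-sided hypergeometric) on a list of top-ranked genes. It is not the dynamic enrichment analysis or the MBCO implementation used in the study; the function names, gene identifiers and pathway names are hypothetical placeholders.

```python
from math import comb


def hypergeom_enrichment_p(background_size, pathway_size, query_size, overlap):
    """One-sided p-value P(X >= overlap) when query_size genes are drawn
    from a background containing pathway_size pathway members."""
    max_overlap = min(pathway_size, query_size)
    total = comb(background_size, query_size)
    tail = sum(
        comb(pathway_size, k) * comb(background_size - pathway_size, query_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


def rank_pathways(query_genes, pathways, background):
    """Return (pathway, overlap, p-value) tuples sorted by ascending p-value."""
    background = set(background)
    query = set(query_genes) & background
    results = []
    for name, members in pathways.items():
        members = set(members) & background
        overlap = len(query & members)
        p = hypergeom_enrichment_p(len(background), len(members), len(query), overlap)
        results.append((name, overlap, p))
    return sorted(results, key=lambda r: r[2])


# Toy data (hypothetical): 20 background genes, two pathways of 5 genes each.
background = [f"g{i}" for i in range(20)]
pathways = {"SCP_A": background[:5], "SCP_B": background[10:15]}
top_genes = ["g0", "g1", "g2", "g10"]  # stands in for a top-300 DEG list

for name, overlap, p in rank_pathways(top_genes, pathways, background):
    print(f"{name}: overlap={overlap}, p={p:.4f}")
```

In practice the p-values would also need correction for multiple testing across all pathways, which this sketch omits.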
Cell-type specific SCP networks predict overlapping and complementary pathways that
accurately support each cell type’s whole cell function. Proximal tubule networks predict a high
metabolic activity and describe ion reabsorption and ion-triggered glucose reabsorption
pathways as well as ammonia metabolism and detoxification pathways (Figure 2A). The
predictions are in agreement with the energy-intensive ion, glucose and other small molecule
reabsorption by the proximal tubule cells¹⁰ and their predominant function in ammonium
excretion and renal drug clearance¹¹. The identification of cellular iron homeostasis pathways
documents the iron storage capacity of proximal tubule cells¹², which, among other functions,
also mitigates kidney damage during acute kidney injury¹³. Podocyte/glomerular networks
focus on cell-cell/cell-matrix adhesion, glomerular basement membrane/extracellular matrix
(ECM) and actin dynamics (Supplementary Figure 1B), all pathways fundamental for barrier
generation and consequently for glomerular filtration. Principal cell/collecting duct networks
concentrate on ion reabsorption (Supplementary Figure 1C), emphasizing the important role of
the collecting duct in fine-tuning these mechanisms, thereby regulating systemic electrolyte
and water balance.
These networks document that 13% (principal cells/collecting duct), 27% (proximal tubule
cells/tubulointerstitium) and 74% (podocytes/glomerulus) of all predicted SCPs were
discovered by at least two different technologies. A closer investigation of the SCPs further
highlights that the overlap is even higher if only the SCPs that describe cell type-specific
functions are considered. Furthermore, the different datasets describe complementary
subfunctions of the same physiological processes. For example, both proteomic datasets of
the proximal tubule subsegments describe fatty acid transport via carnitine shuttling into the
mitochondrial matrix, where the enzymes for mitochondrial beta oxidation are localized (Figure
2A). The PREMIERE SC RNASeq dataset predicts carnitine biosynthesis, i.e. synthesis of the
central molecule of the carnitine shuttle.
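The overlap percentages reported for each cell type amount to counting how many predicted SCPs are found by at least two technologies. A minimal sketch of that bookkeeping follows, assuming each assay's predictions are available as a set of SCP names; the function name, assay labels and SCP labels are hypothetical.

```python
from collections import Counter


def fraction_shared(scps_by_assay, min_assays=2):
    """Fraction of all predicted SCPs that appear in at least min_assays assays."""
    # Count each SCP once per assay, however often that assay lists it.
    counts = Counter(scp for scps in scps_by_assay.values() for scp in set(scps))
    if not counts:
        return 0.0
    shared = sum(1 for n in counts.values() if n >= min_assays)
    return shared / len(counts)


# Toy predictions (hypothetical labels) for a single cell type.
predictions = {
    "sc_rnaseq": {"SCP_A", "SCP_B", "SCP_C"},
    "sn_rnaseq": {"SCP_B", "SCP_C", "SCP_D"},
    "lmd_proteomics": {"SCP_C", "SCP_E"},
}
print(fraction_shared(predictions))  # 2 of 5 SCPs are shared -> 0.4
```

Restricting the input sets to SCPs that describe cell type-specific functions before calling the function would give the stricter overlap mentioned in the text.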
Integration of pathways that were predicted based on the tubulointerstitial metabolites, such
as ‘Glycolysis and Gluconeogenesis’ and ‘D-Arginine and D-ornithine metabolism’
(Supplementary Figure 1D), into the Molecular Biology of the Cell Ontology (MBCO) SCP-networks
(Figure 1A) further underlines the predicted high metabolic activity of the proximal
tubule cells. Glomerular metabolites enrich for pathways (Supplementary Figure 1C), such as
sphingolipid and arachidonic acid metabolism, that support cell-matrix/cell-cell adhesion and
gap junctions, respectively¹⁴. Dynamic enrichment analysis of both single-cell RNA-seq
datasets predicts the involvement of another metabolic pathway, i.e. retinol metabolism, in
podocyte function, in particular as a regulator of tight junctions (Supplementary Figure 1B).
Retinoic acid has a regulatory effect on tight junctions¹⁵,¹⁶ and plays a significant role in
mitigating podocyte apoptosis and dedifferentiation during podocyte injury¹⁷.
The enrichment results suggest that proximal tubular cells have the capacity to meet the
high energy demand by not only fueling the citric acid cycle via beta oxidation, but also via
glucose and glutamine catabolism. Nevertheless, beta oxidation is most consistently predicted,
in agreement with previous studies documenting lipid metabolism as the preferential energy
source in proximal tubule cells¹⁸,¹⁹. Investigation of the pathway components of these SCPs
documents that the different omics technologies identify different components of these
pathways that integrate into a comprehensive description of the relevant biochemical pathways
(Figure 2B). Each technology contributes genes, proteins and metabolites for a fuller
description of the pathways than would be obtained by a single technology. Tubulointerstitial
metabolites, for example, contain glucose, cofactors of the pyruvate dehydrogenase complex
and multiple adenosine nucleotides/nucleosides (i.e. metabolites of the energy carrier ATP). In
agreement with the results of the pathway predictions, network mapping⁷ revealed that cell-type
specific DEGs and DEPs lie within the same area of the human interactome
(Supplementary Figure 1E), indicative of close functional relationships.
In parallel, we identified modules in a kidney-specific functional network using the 300
top-ranked marker genes and proteins across all data types in order to detect sets of cell-type
specific, functionally related genes²⁰,²¹. The module detection algorithm finds groups of genes
that form tightly connected communities within a kidney-specific functional network, which is
constructed using a data-driven approach from gene-gene relationships across thousands of
experimental assays. After module detection, gene enrichment analysis is performed within
each module to understand the key functions of the genes in each module. As with dynamic
enrichment analysis, the modules display clear cell-type specific functional enrichments
(Supplementary Table 8). For example, the network of proximal tubule marker genes includes
modules enriched in anion transport and cellular response to metal ions (Figure 2C), the
network of podocyte marker genes includes modules enriched in glomerulus development and
cell-cell adhesion (Supplementary Figure 1F), and the network of principal cell marker genes
includes modules enriched in sodium ion transport (Supplementary Figure 1G).
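To make the idea of module detection concrete, the sketch below uses a deliberately simplified stand-in: it drops weak edges from a weighted gene network and treats the remaining connected components as modules. The text does not specify the community detection algorithm here, so this is not the authors' method; the function name, the weight threshold and the gene identifiers are all assumptions for illustration.

```python
from collections import defaultdict, deque


def detect_modules(weighted_edges, min_weight=0.5):
    """Group genes into modules: connected components of the network after
    dropping edges below min_weight (a crude stand-in for community detection)."""
    graph = defaultdict(set)
    for gene_a, gene_b, weight in weighted_edges:
        if weight >= min_weight:
            graph[gene_a].add(gene_b)
            graph[gene_b].add(gene_a)

    seen, modules = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        # Breadth-first search collects every gene reachable from `start`.
        module, queue = set(), deque([start])
        seen.add(start)
        while queue:
            gene = queue.popleft()
            module.add(gene)
            for neighbor in graph[gene]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        modules.append(sorted(module))
    return modules


# Toy functional network (hypothetical genes and edge confidences).
edges = [
    ("g1", "g2", 0.9),
    ("g2", "g3", 0.8),
    ("g3", "g4", 0.2),  # weak link, removed by the threshold
    ("g4", "g5", 0.7),
]
print(detect_modules(edges))  # [['g1', 'g2', 'g3'], ['g4', 'g5']]
```

Genes whose edges all fall below the threshold are left out of the result. Each returned module could then be passed to an enrichment test to summarize its key functions, as described above.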
Frequently Asked Questions (15)
Q1. What have the authors contributed in "Towards building a smart kidney atlas: network-based integration of multimodal transcriptomic, proteomic, metabolomic and imaging data in the kidney precision medicine project" ?

Towards Building a Smart Kidney Atlas: Network-based integration of multimodal transcriptomic, proteomic, metabolomic and imaging data in the Kidney Precision Medicine Project Jens Hansen, Rachel Sealfon, Rajasree Menon, Michael T. Eadon, Blue B. Lake, Becky Steck, Dejan Dobi, Samir Parikh, Tara K. Sidgel, Theodore Alexandrov, Andrew Schroeder, Edgar A. Otto, Christopher R. Anderton, Daria Barwinska, Guanshi Zheng, Michael P. Rose, John P. Shapiro, Dusan Velickovic, Annapurna Pamreddy, Seth Winfree, Yongqun He, Ian H. de Boer, Jeffrey B. Hodgin, Abhijit Nair, Kumar Sharma, Minnie Sarwal, Kun Zhang, Jonathan Himmelfarb, Zoltan Laszik, Brad Rovin, Pierre C. Dagher, John Cijiang He, Tarek M. El-Achkar, Sanjay Jain, Olga G. Troyanskaya, Matthias Kretzler, Ravi Iyengar, Evren U. Azeloglu for the Kidney Precision Medicine Project Consortium 

Their approach is amenable to future computational modeling studies that can further improve the proposed tissue atlas. In addition to the integrated analytics presented here, the KPMP is also building a community-based Kidney Tissue Atlas Ontology (KTAO), which will systematically integrate different types of information (such as clinical, pathological, cell and molecular) into a logically defined tissue atlas, which can then be further utilized to support various applications 34.

‘SCTransform’ was used for data normalization and scaling (based on top 2,000 features), followed by principal component analysis. 

Decrease in fatty acid oxidation, resulting in a loss of ATP generation, has been shown to be a significant contributor to tubulointerstitial fibrosis 19. 

On average 12 and 15 libraries (~3,100 and 3,835 nuclei) allowed reidentification of seven of the top 10 predicted podocyte and proximal tubule MBCO SCPs, respectively, while 21 libraries (~5,462 nuclei) were sufficient to reidentify five of the top ...

Subcellular localization of each gene was identified using the jensenlab human compartment database based on a jensenlab confidence of at least four (i.e. 80% of maximum confidence in the database) 28. 

Tubulointerstitial metabolites, for example, contain glucose, cofactors of the pyruvate dehydrogenase complex and multiple adenosine nucleotides/nucleosides (i.e. metabolites of the energy carrier ATP). 

Their results indicate that for a consistent detection of podocytes (i.e. in more than 95% of all down sampled datasets with the same library counts), at least 16 (~11,727 cells) or 7 libraries (1,837 nuclei) are needed if subjected to single-cell RNASeq (Figure 4A) or single-nucleus RNASeq (Figure 4B), respectively. 

Top 300 differentially expressed genes (DEGs) and proteins (DEPs) predicted by each assay for each analyzed cell type/tissue subsegment. 

Notice that the top seven predictions based on dynamic enrichment analysis can contain more than seven SCPs, since each prediction is either a single SCP or a unique combination of two or three SCPs. 

For the LMD proteomics dataset, six to eight samples were sufficient to reproduce the results obtained for the full datasets with only minor variations in the correlation of identified DEGs (Figure 4C) and SCPs (Supplementary Figure 2E) or SCP rankings (Figure 4C). 

An idealized integration scenario would combine these assays synergistically such that they could complement the shortcomings of each other, improve quality control metrics across technologies, and increase rigor and reproducibility of the overall study. 

The authors determined how many SCPs have to be considered in a down-sampled analysis to re-identify at least 70% (or 50%) of the top 10 or seven predictions obtained from standard or dynamic enrichment analysis with the full dataset, respectively.

Principal cell/collecting duct networks concentrate on ion reabsorption (Supplementary Figure 1C), emphasizing the important role of the collecting duct in fine-tuning these mechanisms, thereby regulating systemic electrolyte and water balance. 

To compute the Pearson correlation between the gene expression profiles of cells and LCM segments, the gene profiles were restricted to genes shared between the two datasets and showing variable expression in the single-cell dataset and correlations were computed between the logarithm of the mean ratio vector for each LCM segment and the scaled expression profile of each cell in the single cell dataset.
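The correlation described above can be sketched as follows, assuming a cell's scaled expression profile and an LCM segment's mean ratio vector are both available as gene-to-value dictionaries. The function names and toy values are hypothetical, and the natural logarithm is an assumption (the choice of base only rescales the vector and does not change a Pearson correlation).

```python
from math import log, sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlate_cell_to_segment(cell_scaled, segment_mean_ratio, variable_genes):
    """Correlate one cell with one LCM segment over genes that are shared by
    both datasets and variable in the single-cell dataset."""
    genes = sorted(set(cell_scaled) & set(segment_mean_ratio) & set(variable_genes))
    x = [cell_scaled[g] for g in genes]
    y = [log(segment_mean_ratio[g]) for g in genes]  # log of the mean ratio
    return pearson(x, y)


# Toy profiles (hypothetical): g4 and g5 are each missing from one dataset.
cell = {"g1": -1.0, "g2": 0.0, "g3": 1.0, "g4": 5.0}
segment = {"g1": 0.5, "g2": 1.0, "g3": 2.0, "g5": 3.0}
variable = {"g1", "g2", "g3", "g4"}
print(correlate_cell_to_segment(cell, segment, variable))
```

The sketch assumes strictly positive ratios and at least two shared genes with non-constant values; otherwise the logarithm or the division would fail.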