
A reference tissue atlas for the human kidney

Jens Hansen¹, Rachel Sealfon², Rajasree Menon³, Michael T. Eadon⁴, Blue B. Lake⁵, Becky Steck³, Dejan Dobi⁶, Samir M. Parikh⁷, Tara K. Sigdel⁶, Guanshi Zhang⁸, Dušan Veličković⁹, Daria Barwinska⁴, Theodore Alexandrov, Priyanka Rashmi⁶, Edgar A. Otto³, Michael Rose³, Christopher R. Anderton⁸,⁹, John P. Shapiro⁷, Annapurna Pamreddy⁸, Seth Winfree⁴, Yongqun He³, Ian H. de Boer¹⁰, Jeffrey B. Hodgin³, Laura Barisoni¹¹, Abhijit S. Naik³, Kumar Sharma⁸, Minnie M. Sarwal⁶, Kun Zhang⁵, Jonathan Himmelfarb¹⁰, Brad H. Rovin⁷, Tarek M. El-Achkar⁴, Zoltan Laszik⁶, John Cijiang He¹, Pierre C. Dagher⁴, M. Todd Valerius¹², Sanjay Jain¹³, Lisa M. Satlin¹, Olga G. Troyanskaya², Matthias Kretzler³, Ravi Iyengar¹, Evren U. Azeloglu¹

Icahn School of Medicine at Mount Sinai¹, Princeton University², University of Michigan³, Indiana University⁴, University of California, San Diego⁵, University of California, San Francisco⁶, Ohio State University⁷, University of Texas Health Science Center at San Antonio⁸, Pacific Northwest National Laboratory⁹, University of Washington¹⁰, Duke University¹¹, Brigham and Women's Hospital¹², Washington University in St. Louis¹³

15 Sep 2021 - bioRxiv (Cold Spring Harbor Laboratory)

TL;DR: In this article, the authors describe the construction of an integrated reference tissue map of cells, pathways and genes using unaffected regions of nephrectomy tissues and undiseased human biopsies from 55 subjects.


Abstract: The Kidney Precision Medicine Project (KPMP) is building a spatially specified human tissue atlas at single-cell resolution with molecular details of the kidney in health and disease. Here, we describe the construction of an integrated reference tissue map of cells, pathways and genes using unaffected regions of nephrectomy tissues and undiseased human biopsies from 55 subjects. We use single-cell and -nucleus transcriptomics, subsegmental laser microdissection bulk transcriptomics and proteomics, near-single-cell proteomics, 3-D nondestructive and CODEX imaging, and spatial metabolomics data to hierarchically identify genes, pathways and cells. Integrated data from these different technologies coherently describe cell types/subtypes within different nephron segments and interstitium. These spatial profiles identify cell-level functional organization of the kidney tissue as indicative of their physiological functions and map different cell subtypes to genes, proteins, metabolites and pathways. Comparison of transcellular sodium reabsorption along the nephron to levels of mRNAs encoding the different sodium transporter genes indicates that mRNA levels are largely congruent with physiological activity. This reference atlas provides an initial framework for molecular classification of kidney disease when multiple molecular mechanisms underlie convergent clinical phenotypes.


Summary (6 min read)


INTRODUCTION

The kidney is one of the most diverse organs in the human body in terms of its cellular heterogeneity, and possibly second only to the brain in its spatial complexity.
Delineating the cell types and subtypes in different regions of the kidney during health and disease will help identify the tissue-level, cellular and subcellular pathways and processes involved in disease initiation and progression, and aid in drug discovery.
The KPMP features an expanding set of complementary high-throughput assays for molecular entities that span transcriptomic, proteomic and metabolomic profiles as well as spatial/structural properties of kidney tissue.
The KPMP envisions that harmonization and integration of different types of molecular data from omics assays, combined with state-of-the-art pathological and clinical descriptors, will allow us to classify different disease subtypes and states for diagnostic and therapeutic purposes.
Numerous groups have proposed the use of integrated multiomics analysis to characterize disease phenotypes using tools that include Bayesian, correlative, network-based and machine learning-based clustering algorithms 2-4.

Outline of KPMP Data Types

In these analyses, there were four transcriptomic, two proteomic, one imaging-based, and one spatial metabolomics tissue interrogation assays that consisted of 3 to 48 different datasets obtained from 3 to 22 participants (Supplementary Table 1).
bioRxiv preprint; this version posted July 24, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder.
Hierarchical clustering of the correlation coefficients documented that the absolute gene and protein expression values are specific for a particular platform and not for their anatomical origin (Supplementary Figure 3A).
While imaging assays identify the spatial localization of individual cells together with their expression signatures for a limited number of proteins, single-cell RNASeq assays provide more extensive transcriptomic profiles for individual cells.
The pathways that are predicted for non-glomerular metabolites either overlapped with or were closely related to the pathways that are predicted for proximal tubule cells and subsegments based on the other datasets (Figure 2A).

DISCUSSION

The advances in transcriptomic technologies along with other omics and imaging assays offer unprecedented insights into the organization of tissues at cellular resolution and the molecular constituents of the different cell types and their subtypes.
The authors predict that these subtypes differ in their potential for lipid metabolism, which is critically important for the physiological function of proximal tubule cells, as cellular energetics have been shown to be critical for reabsorptive activity.
Nevertheless, their post-hoc power analysis can help to estimate the reliability of an identified cell subtype or predicted disease mechanism by documenting that it is consistently recovered using down-sampled datasets.
In addition to the integrated analytics presented here, the KPMP is also building a community-based Kidney Tissue Atlas Ontology (KTAO), which will systematically integrate different types of information (such as clinical, pathological, cellular and molecular) into a logically defined tissue atlas that can then be used to support various applications 34.

METHODS

Omics and imaging assays used within KPMP target different types of molecular components with different resolution, sensitivity and precision.
An important function of the KPMP Central Hub is to integrate the different types of data using a set of analytical techniques.
The pilot data presented for each assay comprises 3 to 48 different datasets that are obtained from 3 to 22 participants (Supplementary Table 1).
Participants' kidney tissue was procured from a spectrum of tissue resources, including unaffected parts of tumor nephrectomy specimens (n=38), living donor preperfusion biopsies (n=3), diseased donor nephrectomies (n=5), normal surveillance transplant biopsies (n=5) and native kidney biopsies (n=4).
Within each assay the authors generated lists of differentially expressed genes (DEGs), proteins (DEPs) and metabolites that describe those genes, proteins or metabolites that are upregulated or enriched in a particular single cell cluster, single nucleus cluster or kidney subsegment, if compared to all other clusters or subsegments.

Ranking of Differentially Expressed Genes and Proteins

In the case of the DEGs and DEPs that were used for dynamic enrichment analysis 6, module identification 21 and post-hoc power analysis, single nucleus and single cell DEGs were first ranked by p-value and then by decreasing fold changes (i.e., fold changes were used as a tiebreaker).
The top-ranked 300 entities were subjected to downstream analysis.
Similarly, DEGs and DEPs obtained for each kidney subsegment based on LMD bulk RNASeq, or LMD and NSC proteomics, were ranked first by p-value and then by decreasing fold changes, and the top-ranked 300 DEGs and DEPs were subjected to pathway enrichment analysis or module detection (see below).
Therefore, the authors could not calculate p-values for the LMD and NSC technologies.
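The ranking rule described above (p-value first, decreasing fold change as the tiebreaker, then keeping the top 300) can be sketched as follows. This is a minimal illustration rather than the authors' pipeline; the `DiffEntity` record, the `rank_entities` helper and the gene names are hypothetical.

```python
from typing import NamedTuple


class DiffEntity(NamedTuple):
    """One differentially expressed gene or protein (hypothetical record)."""
    name: str
    p_value: float
    fold_change: float


def rank_entities(entities, top_n=300):
    """Sort by ascending p-value; ties are broken by decreasing fold change."""
    ranked = sorted(entities, key=lambda e: (e.p_value, -e.fold_change))
    return ranked[:top_n]


# Made-up values for illustration only.
demo = [
    DiffEntity("GENE_A", 1e-8, 2.0),
    DiffEntity("GENE_B", 1e-12, 1.5),
    DiffEntity("GENE_C", 1e-8, 3.5),
    DiffEntity("GENE_D", 0.04, 9.0),
]
top = rank_entities(demo, top_n=3)
print([e.name for e in top])
```

In this toy input, GENE_A and GENE_C share a p-value, so the larger fold change of GENE_C places it first; GENE_D is dropped despite its large fold change because its p-value ranks last.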

Standard and Dynamic Enrichment Analysis

Top DEGs and DEPs for each podocyte cluster/glomerulus, proximal tubule cell cluster/tubulointerstitium and principal cell cluster/collecting duct subsegment were separately subjected to standard enrichment analysis using Gene Ontology Biological Processes (GO BPs) or the Molecular Biology of the Cell Ontology (MBCO) level-3 subcellular processes (SCPs) 6 and Fisher’s Exact Test.
Only genes/proteins that are detected by this method and statistically analyzed for differential expression can be identified as DEGs/DEPs and only these genes/proteins are considered as the background set for the Fisher’s Exact test.
Ontological background genes/proteins were all genes that are annotated to at least one pathway within that particular ontology.
Dynamic enrichment analysis uses these relationships to generate context-specific higher-level processes by merging functionally related SCPs that contain at least one DEG or DEP.
The top five predicted SCPs or merged SCPs are connected based on the inferred relationships, and all networks for a particular cell type/segment merged, whereby each SCP was color-coded according to the source assay(s) that initiated its dynamic enrichment.
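For readers unfamiliar with the standard enrichment step, the one-sided Fisher's Exact Test for over-representation can be written as a hypergeometric tail sum. The sketch below assumes that the effective background is the intersection of the experimentally detected genes and the genes annotated in the ontology, which is one reading of the description above; the function names and the toy gene identifiers are hypothetical.

```python
from math import comb


def enrichment_p_value(n_background, n_pathway, n_selected, n_overlap):
    """Right-tailed Fisher's exact test (hypergeometric tail).

    Probability of drawing at least n_overlap pathway genes when
    n_selected genes are drawn from n_background genes, of which
    n_pathway belong to the pathway.
    """
    upper = min(n_pathway, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_pathway, i) * comb(n_background - n_pathway, n_selected - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / total


def pathway_enrichment(degs, pathway_genes, background):
    """Restrict both gene sets to the background before testing."""
    degs = degs & background
    pathway = pathway_genes & background
    return enrichment_p_value(
        len(background), len(pathway), len(degs), len(degs & pathway)
    )


# Toy example: 20 background genes, a 5-gene pathway, 4 DEGs, 3 in the pathway.
background = {f"g{i}" for i in range(20)}
pathway = {"g0", "g1", "g2", "g3", "g4"}
degs = {"g0", "g1", "g2", "g10"}
p = pathway_enrichment(degs, pathway, background)
print(p)
```

Using exact integer binomial coefficients avoids floating-point underflow for large backgrounds; only the final ratio is converted to a float.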

Module Detection

In parallel to enrichment analyses, the authors also performed another network-based pathway enrichment technique, identifying modules of cell-type specific marker genes within the kidney-specific functional network using the HumanBase interface (hb.flatironinstitute.org).
... DEPs from each proteomics dataset.
Module detection is a network-based approach described in Krishnan et al., and construction of the functional networks is described in Greene et al. 20, 21.
Modules are detected using a community clustering algorithm based on connectivity between genes in the kidney-specific functional network, and enrichment analysis is subsequently performed to identify functional enrichments in each module.

Enrichment Analysis of Metabolites

All glomerular and nonglomerular metabolites that were identified for the three participants were merged and subjected to pathway enrichment analysis using MetaboAnalyst 25.
The top six predicted metabolic pathways were mapped onto MBCO pathways whenever possible; if they did not have a corresponding pathway, the original pathway names were preserved.

Integration of Single-Cell/Single-Nucleus Transcriptomics

In contrast to bulk mRNA sequencing, where the gene expression measurements reflect an average across all captured cell types, single-cell or single-nucleus mRNA sequencing allows the measurement and comparison of comprehensive gene sets obtained from individual cells.
Single-cell transcriptomic data was produced by PREMIERE (24 libraries from 22 participants) 8 and UCSF (10 libraries from 10 participants), whereas the single-nucleus data was made by UCSD (47 libraries from 15 participants).
Data from each site were first processed using the Seurat 3.0 R package 26.
These anchor genes were then used to harmonize the datasets.
The downstream process included scaling, principal component analysis, batch integration using Harmony, dimensionality reduction using Uniform Manifold Approximation and Projection (UMAP), and unsupervised clustering.

Integration of Single-cell, Single-nucleus and Laser Capture Microdissection Bulk Transcriptomics

To integrate single-cell sequencing, single-nucleus sequencing, and LMD bulk transcriptomic datasets, the authors first determined the overlap between genes identified both in the LMD dataset and in the corresponding single-cell transcriptomic dataset.
From this set of ...
The authors then computed the Pearson correlation between each individual cell in a scaled single-cell dataset and the LMD transcriptomic dataset for the same participant.
Using this approach, the authors can assign each cell to the appropriate LMD segment that shows the highest correlation value.
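The correlation-based assignment described above can be illustrated with a small sketch. It assumes that each cell and each LMD segment is represented by an expression vector over the same ordered list of shared genes; the helper names and the numbers are hypothetical and do not come from the KPMP data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant vector
    return cov / sqrt(var_x * var_y)


def assign_segment(cell_profile, segment_profiles):
    """Return the segment with the highest correlation, plus all scores."""
    scores = {
        segment: pearson(cell_profile, profile)
        for segment, profile in segment_profiles.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Made-up expression values over four shared genes.
segments = {
    "glomerulus": [5.0, 1.0, 0.0, 2.0],
    "tubulointerstitium": [0.0, 4.0, 6.0, 1.0],
}
cell = [4.0, 0.5, 0.2, 1.5]
best, scores = assign_segment(cell, segments)
print(best, scores)
```

Because Pearson correlation is invariant to shifting and scaling, the assignment depends on the shape of the expression profile rather than on absolute expression values, which matters when the values are platform-specific.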

Post-hoc power analysis

The PREMIERE single-cell RNASeq 8 and the UCSD/WU single-nucleus RNASeq 9 datasets were obtained from 22 and 15 participants, respectively, whose samples were sequenced in 24 and 47 libraries.
The authors used jackstraw analysis to identify the last significant principal component (alpha = 0.01) among the top 20 components.
To document the reliability of that cell type assignment the authors compared its p-value to the p-value of the second prediction (that cell type whose essential genes had the second most significant enrichment among the DEGs of that cluster).
The authors progressively and randomly removed libraries from the full datasets to generate 100 non-overlapping downsampled datasets for each number of remaining participants.
Additionally, the top 300 significant DEPs of each subsegment were subjected to enrichment analysis and predicted pathways compared as described above.
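The library down-sampling step can be sketched as below. The sketch makes two assumptions that the text does not confirm: "non-overlapping" is read as "no two down-sampled datasets contain exactly the same set of libraries", and sampling is done at the library level. The function name, the seed and the library labels are hypothetical.

```python
import random
from math import comb


def downsample_libraries(libraries, n_keep, n_datasets=100, seed=0):
    """Draw distinct random subsets of n_keep libraries.

    At most comb(len(libraries), n_keep) distinct subsets exist, so the
    number of returned datasets is capped at that value.
    """
    rng = random.Random(seed)
    target = min(n_datasets, comb(len(libraries), n_keep))
    seen = set()
    datasets = []
    while len(datasets) < target:
        subset = frozenset(rng.sample(libraries, n_keep))
        if subset not in seen:  # reject duplicates
            seen.add(subset)
            datasets.append(sorted(subset))
    return datasets


# 24 hypothetical library labels, mirroring the size of the single-cell dataset.
libraries = [f"lib{i:02d}" for i in range(24)]
kept_20 = downsample_libraries(libraries, n_keep=20)
kept_23 = downsample_libraries(libraries, n_keep=23)
print(len(kept_20), len(kept_23))
```

Each down-sampled dataset would then be passed through the same analysis pipeline as the full dataset, so that recovered cell types and pathways can be compared against the complete reference.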

Proteomic-Transcriptomic Co-expression Analysis

LMD and NSC proteomic datasets identified protein expression in two kidney subsegments: glomeruli and tubulointerstitium for LMD and glomeruli and proximal tubule for NSC.
The authors identified technology and participant specific cluster gene expression, using the “Average Expression” functionality embedded in Seurat R package (RNA assay, counts slot) on the cells/nuclei assigned to the same clusters in the integrated PREMIERE, UCSF and UCSD/WU data analysis described above.
The intersection of all background sets was defined as the set of common genes.
Ratios were inverted to describe proximal tubule/tubulointerstitial specific gene expression.

Comparison of Cell Type-specific Imaging and Transcriptomic Expression Data

To integrate cell type-specific imaging and transcriptomic data, the authors first constructed matrices with average expression values for each gene in each cell type cluster for both the set of 16 normalized integrated transcriptomic clusters and the CODEX clusters.
The authors normalized each gene in both transcriptomic and CODEX matrices to have a mean of 0 and standard deviation of 1.
The authors then filtered both datasets to include only genes represented in both the transcriptomic and the imaging datasets and computed the average expression of each gene/protein in each cell type.
The authors next considered the problem of constructing a matrix to computationally map transcriptomic cell clusters to the imaging cell clusters.
Before visualizing matrix M as a heatmap, the authors first normalized each row to have mean of 0 and standard deviation of 1 in order to identify the transcriptomic cell types that are weighted most heavily in the mapping to each imaging cell type.
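The row-wise normalization to mean 0 and standard deviation 1 used in this section is a z-score transform. A minimal sketch follows; it assumes the population standard deviation (the text does not say whether the population or sample form was used) and maps constant rows to zeros. The function name is hypothetical.

```python
from statistics import mean, pstdev


def zscore_rows(matrix):
    """Scale every row to mean 0 and (population) standard deviation 1."""
    normalized = []
    for row in matrix:
        mu = mean(row)
        sd = pstdev(row)
        # A constant row has no variation; return zeros instead of dividing by 0.
        normalized.append([(v - mu) / sd if sd > 0 else 0.0 for v in row])
    return normalized


# Rows could be genes (values across cell types) or rows of the mapping matrix M.
z = zscore_rows([[1.0, 2.0, 3.0], [4.0, 4.0, 4.0]])
print(z)
```

After this transform, the largest entries in a row mark the columns that stand out relative to that row's own average, which is what makes the heatmap comparable across rows.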

Generating Pathway Maps for Beta-oxidation Network from Single-cell RNASeq Clusters

To better understand one of the most significantly enriched pathways in their integrated analytics of proximal tubules, reactions involved in fatty acid beta oxidation were extracted from KEGG (www.genome.jp/kegg).
Datasets were subjected to an automated single-cell/nucleus and proteomic data analysis pipeline and results compared between the downsampled and complete reference datasets.
‘Cluster count’ documents how many clusters were assigned to a particular cell type.

50% of all SCPs that were part of the top seven predictions based on dynamic enrichment

‘Libraries’ denotes the number of sequencing libraries used for each down-sampled dataset, and ‘cells’ the average total number of cells obtained from those libraries.
Labels describing podocyte/glomerular and proximal tubule/tubulointerstitium RNASeq and proteomic datasets are colored aquamarine and orange, respectively.
Curly brackets group samples obtained by the same technology: 1: LMD RNASeq, 2: NSC/LMD Proteomics, 3: SC RNASeq PREMIERE, 4: SC RNASeq UCSF, 5: SN RNASeq UCSD/WU.
Figure 7. (A) Prior knowledge; integration with multiomics and imaging data; models and predictions of tissue function. Panel labels: cell subtype-specific compartmental metabolic networks; dynamic models of metabolic pathways in different subtypes of proximal tubule cells; subcellular compartments of enzymes; metabolic reactions.

Single-nucleus RNASeq (UCSD/WashU) and Single-cell RNASeq (PREMIERE)

UMI count matrices and lists of differentially expressed genes were downloaded from published analyses for the PREMIERE TIS (composed of Michigan, Princeton, Broad) single-cell RNA sequencing 8 and UCSD/WashU TIS single-nucleus 9 datasets.
The authors excluded the proximal tubular cells-3 and principal cells-2 clusters from the single-nucleus RNASeq dataset, since these clusters showed an inflammatory or a stress response.

Subsegmental LMD Transcriptomics (IU/OSU)

A comprehensive Laser MicroDissection (LMD) protocol is published on protocols.io (https://www.protocols.io/view/laser-microdissection-8rkhv4w).
Briefly, 12 μm frozen sections are obtained from an Optimal Cutting Temperature (OCT) preserved tissue block and adhered to LMD membrane slides (Leica, Buffalo Grove, IL).
Slides undergo dissection with a Leica LMD6500 system with pulsed UV laser.
RNA quality is assessed by bioanalyzer, ribosomal RNA is depleted, and cDNA libraries are prepared using the SMARTer Universal Low Input RNA Kit (Takara, No. 634938).
Total read counts mapping to each gene were generated with edgeR, normalized, and converted to expression ratios.

Subsegmental LMD Proteomics (IU/OSU)

A comprehensive Laser MicroDissection (LMD) proteomics protocol is published on protocols.io.
The authors' LMD proteomic methods have also been previously published in detail 29, 30.
Glomerular gene expression was compared to the tubulointerstitial gene expression using an unpaired t-test with equal variance.
The entire 3-D fluorescence imaging and tissue cytometry protocol is published on protocols.io.
Images were acquired in up to 8 channels using a Leica SP8 Confocal Microscope.
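The unpaired t-test with equal variance mentioned above (Student's t-test) uses a pooled variance estimate. The sketch below computes only the t statistic and the degrees of freedom with the standard library; the function name and the sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def student_t(sample_a, sample_b):
    """t statistic and degrees of freedom for an unpaired, equal-variance test."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Pooled variance: sample variances weighted by their degrees of freedom.
    pooled = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    t = (mean(sample_a) - mean(sample_b)) / sqrt(pooled * (1 / n_a + 1 / n_b))
    return t, df


# Made-up expression values for one gene or protein in two subsegments.
glomerular = [5.1, 4.9, 5.3]
tubulointerstitial = [3.0, 3.2, 2.8]
t_stat, dof = student_t(glomerular, tubulointerstitial)
print(t_stat, dof)
```

Turning the statistic into a p-value requires the t-distribution, which the standard library does not provide; if SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=True)` returns both the statistic and the p-value.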

Spatial Metabolomics (UTHSA-PNNL-EMBL)

10 μm thick renal cortical tissues were sectioned on a cryostat (Leica Microsystems) and prepared for matrix-assisted laser desorption/ionization (MALDI) imaging mass spectrometry by spraying 3 with the norharmane matrix using the TM-Sprayer automated spraying robot (HTX Technology).
For dynamic enrichment analysis all SCPs among the top 25 predictions were compared.
Top 300 DEGs or DEPs were subjected to pathway enrichment analysis and (D) the top-50 GO BPs and (E) MBCO level-3 SCPs subjected to hierarchical clustering based on pairwise correlation coefficients between -log10(p-values).
Supplementary Figure 4 (panels A and B).

12. van Swelm RPL, Wetzels JFM, Swinkels DW. The multifaceted role of iron in renal health and

Proximal tubule H-ferritin mediates iron trafficking in acute kidney injury.
Changes in membrane sphingolipid composition modulate dynamics and adhesion of integrin nanoclusters.
Differentiation of human neuroblastoma cell line IMR-32 by sildenafil and its newly discovered analogue IS00384.

27. McGinnis CS, Murrow LM, Gartner ZJ. DoubletFinder: Doublet Detection in Single-Cell RNA

Binder JX, Pletscher-Frankild S, Tsafou K, et al.
Unification and visualization of protein subcellular localization evidence, also known as COMPARTMENTS.
Characterization of glomerular diseases using proteomic analysis of laser capture microdissected glomeruli.
Modeling Kidney Disease Using Ontology: Perspectives from the KPMP.


Towards Building a Smart Kidney Atlas: Network-based integration of multimodal transcriptomic, proteomic, metabolomic and imaging data in the Kidney Precision Medicine Project

Jens Hansen¹,*, Rachel Sealfon²,*, Rajasree Menon³,*, Michael T. Eadon, Blue B. Lake, Becky Steck, Dejan Dobi, Samir Parikh, Tara K. Sigdel, Theodore Alexandrov, Andrew Schroeder, Edgar A. Otto, Christopher R. Anderton⁹,¹⁰, Daria Barwinska, Guanshi Zhang, Michael P. Rose, John P. Shapiro, Dusan Velickovic, Annapurna Pamreddy, Seth Winfree, Yongqun He, Ian H. de Boer, Jeffrey B. Hodgin, Abhijit Naik, Kumar Sharma, Minnie Sarwal, Kun Zhang, Jonathan Himmelfarb, Zoltan Laszik, Brad Rovin, Pierre C. Dagher, John Cijiang He, Tarek M. El-Achkar, Sanjay Jain, Olga G. Troyanskaya²,#, Matthias Kretzler³,#, Ravi Iyengar¹,#, Evren U. Azeloglu¹,# for the Kidney Precision Medicine Project Consortium

* Contributed equally, joint first authors

Affiliations:

1. Icahn School of Medicine at Mount Sinai, New York, New York

2. Princeton University, Princeton, New Jersey and Flatiron Institute, New York, New York

3. University of Michigan School of Medicine, Ann Arbor, Michigan

4. Indiana University School of Medicine, Indianapolis, Indiana

5. University of California San Diego, Jacobs School of Engineering, San Diego, California

6. University of California San Francisco School of Medicine, San Francisco, California

7. Ohio State University College of Medicine, Columbus, Ohio

8. European Molecular Biology Laboratory, Heidelberg, Germany

9. Pacific Northwest National Laboratory, Richland, Washington

10. UT-Health San Antonio School of Medicine, San Antonio, Texas

11. University of Washington, Schools of Medicine and Public Health, Seattle, Washington

12. Washington University in Saint Louis School of Medicine, St. Louis, Missouri

Corresponding Authors, joint senior authors:

Evren U. Azeloglu, Ph.D.

Assistant Professor of Medicine, Nephrology

Icahn School of Medicine at Mount Sinai, New York, NY

Email: evren.azeloglu@mssm.edu

Twitter: @azeloglu

Ravi Iyengar, Ph.D.

Dorothy H and Lewis H Rosenstiel Professor of Pharmacological Sciences

Icahn School of Medicine at Mount Sinai, New York, NY

Email: ravi.iyengar@mssm.edu

Matthias Kretzler, M.D.

Professor of Medicine, Nephrology

University of Michigan School of Medicine, Ann Arbor, MI

Email: kretzler@med.umich.edu

Olga Troyanskaya, Ph.D.

Professor of Computer Science

Princeton University, Princeton, NJ

Email: ogt@genomics.princeton.edu

bioRxiv preprint; this version posted July 24, 2020; https://doi.org/10.1101/2020.07.23.216507. The copyright holder for this preprint (which was not certified by peer review) is the author/funder.

ABSTRACT

The Kidney Precision Medicine Project (KPMP) plans to construct a spatially specified tissue atlas of the human kidney at a cellular resolution with near comprehensive molecular details. The atlas will have maps of healthy, acute kidney injury and chronic kidney disease tissues. To construct such maps, we integrate different data sets that profile mRNAs, proteins and metabolites collected by five KPMP Tissue Interrogation Sites. Here, we describe a set of hierarchical analytical methods to process, combine, and harmonize single-cell, single-nucleus and subsegmental laser microdissection (LMD) transcriptomics with LMD and near single-cell proteomics, 3-D nondestructive and immunofluorescence-based CODEX imaging and spatial metabolomics datasets. We use nephrectomy, healthy living donor and surveillance transplant biopsy tissues to create a harmonized reference tissue map. Our results demonstrate that different assays produce reliable and coherent identification of cell types and tissue subsegments. They further show that the molecular profiles and pathways are partially overlapping yet complementary for cell type-specific and subsegmental physiological processes. Focusing on the proximal tubules, we find that our integrated systems biology-based analyses identify different subtypes of tubular cells with potential for different levels of lipid oxidation and energy generation. Integration of our omics data with pathways from the literature enables us to construct predictive computational models to develop a smart kidney atlas. These integrated models can describe physiological capabilities of the tissues based on the underlying cell types and pathways in health and disease.


INTRODUCTION

The kidney is one of the most diverse organs in the human body in terms of its cellular heterogeneity, and possibly second only to the brain in its spatial complexity. Accordingly, decoding the functional and pathogenic mechanisms of kidney disease has been challenging; as such, nephrology has consistently ranked behind all other subspecialties of medicine in terms of the drug discovery pipeline. Delineating the cell types and subtypes in different regions of the kidney during health and disease will help identify the tissue-level, cellular and subcellular pathways and processes involved in disease initiation and progression, and aid in drug discovery.

The Kidney Precision Medicine Project (KPMP) is a consortium funded by the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) that aims to ethically and safely obtain kidney biopsies from participants with chronic kidney disease (CKD) or acute kidney injury (AKI); create a reference kidney atlas; characterize disease subgroups to stratify patients based on molecular features of disease; and identify critical cells, pathways, and targets for novel therapies and preventive strategies. The KPMP features an expanding set of complementary high-throughput assays for molecular entities that span transcriptomic, proteomic and metabolomic profiles as well as spatial/structural properties of kidney tissue. These assays, described here for the five initially funded Tissue Interrogation Sites (TISes), will be integrated to create a comprehensive knowledge environment for the human kidney. This knowledge environment will be compiled by the KPMP Central Hub to serve as a foundation for a spatially specified interactive smart tissue atlas that will include molecular and physiological information on healthy and diseased states of all individual cell types within the adult human kidney.

The KPMP envisions that harmonization and integration of different types of molecular data from omics assays, combined with state-of-the-art pathological and clinical descriptors, will allow us to classify different disease subtypes and states for diagnostic and therapeutic purposes. Numerous groups have proposed the use of integrated multiomics analysis to characterize disease phenotypes using tools that include Bayesian, correlative, network-based and machine learning-based clustering algorithms 2-4. The goals of these approaches include prediction of clinical outcomes, identification of underlying disease mechanisms and stratification of patients. KPMP further envisions that the final integrated analytical environment will serve as a knowledge base for the entire field that will empower molecularly anchored outcome prediction and development of targeted treatments.

Here, we present an overview of KPMP’s strategies to harmonize and integrate multiple data types through identification of subcellular pathways and functions that delineate cell-level biochemical and physiological functions. Using reference kidney pilot tissue samples, we have performed data harmonization and integration to investigate the complementarity of different data types and develop a pipeline for the generation of tissue maps.

RESULTS

Outline of KPMP Data Types

In these analyses, there were four transcriptomic, two proteomic, one imaging-based, and one spatial metabolomics tissue interrogation assays that consisted of 3 to 48 different datasets obtained from 3 to 22 participants (Supplementary Table 1). These assays and their detailed tissue pre-analytical, tissue processing, data acquisition and analytical data processing pipelines are outlined in Figure 1. We also summarize the steps whereby the data sets were integrated and harmonized in the upper right side of this descriptive map view of the KPMP data integration paradigm.


Pathway- and network-level integration of multiple molecular interrogation techniques reveals cell- and tissue-specific biological processes that are critical for renal physiology

To overcome the inherent challenges of multiomics integration and assay-dependent divergence, we employed dynamic enrichment analysis and network mapping. We evaluated the convergence of subcellular processes (SCPs) and pathways that are over-represented in different cell types or subsegments within the kidney (in comparison to the other cell types or subsegments), using single cell RNASeq data from PREMIERE TIS (Michigan, Princeton, Broad), single nucleus RNASeq data from UCSD/WU TIS, Laser microdissected (LMD) bulk RNASeq (Supplementary Table 2) and LMD proteomics (Supplementary Table 3) from the OSU/IU TIS, Near Single Cell (NSC) proteomics from the UCSF TIS (Supplementary Table 4) and spatial metabolomics from the UTHSA-PNNL-EMBL TIS (Supplementary Table 5A/B/C from 3 different participants).

Single-cell and -nucleus RNASeq analysis resulted in the grouping of multiple cells or nuclei into clusters that were assigned to a particular cell type based on the expression of essential genes. The top 300 most significantly differentially expressed genes (DEGs) and proteins (DEPs) of each cluster or subsegment compared to all other clusters or subsegments, as well as the metabolites assigned to glomerular and non-glomerular kidney regions (Supplementary Table 6), were subjected to enrichment analysis to create pathway maps (Supplementary Table 7) for the three representative cell types contributing diverse functions to kidney physiology: proximal tubular epithelial cells (Figure 2A, Supplementary Figure 1A for nonspecific pathways), podocytes (Supplementary Figure 1B) and principal cells of the collecting ducts (Supplementary Figure 1C). The final maps revealed highly interrelated SCPs that are intimately linked to the physiological function of the respective cell types. Furthermore, these SCPs overlap strongly between assays and datasets, with up to 74% of them being repeatedly enriched in two or more assays, confirming the inherent agreement among these different assays. While the individual significant genes or gene products coming from multiple assays were not necessarily the same, placement of these gene products into an interconnected pathway map showed innate congruence between the assays. The key SCPs for the different cell types differed significantly.
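
As a rough illustration of the over-representation step, the following Python sketch scores a marker list against pathway gene sets with a right-tailed hypergeometric test. The choice of test, the function names and the toy gene sets are assumptions made for illustration only; the enrichment procedures actually used are those described in the Methods.

```python
from math import comb


def enrichment_p_value(hits, list_size, pathway_size, background_size):
    """Right-tailed hypergeometric p-value, P(X >= hits)."""
    max_hits = min(list_size, pathway_size)
    total = comb(background_size, list_size)
    tail = sum(
        comb(pathway_size, k) * comb(background_size - pathway_size, list_size - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / total


def enrich(marker_genes, pathways, background):
    """Return (pathway, hits, p-value) tuples sorted by ascending p-value."""
    background = set(background)
    markers = set(marker_genes) & background
    results = []
    for name, members in pathways.items():
        members = set(members) & background
        hits = len(markers & members)
        p = enrichment_p_value(hits, len(markers), len(members), len(background))
        results.append((name, hits, p))
    return sorted(results, key=lambda r: r[2])


# Hypothetical toy data: 20 background genes, 5 markers, 2 pathways.
background = [f"gene{i}" for i in range(20)]
markers = [f"gene{i}" for i in range(5)]
pathways = {
    "toy_pathway_A": [f"gene{i}" for i in range(4)],      # fully covered by markers
    "toy_pathway_B": [f"gene{i}" for i in range(10, 14)],  # no overlap with markers
}
results = enrich(markers, pathways, background)
for name, hits, p in results:
    print(f"{name}: hits={hits}, p={p:.3g}")
```

In practice the top 300 DEGs or DEPs of a cluster would take the place of `markers`, and a multiple-testing correction would normally follow; both are omitted here to keep the sketch short.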

Cell-type specific SCP networks predict overlapping and complementary pathways that accurately support each cell type’s whole cell function. Proximal tubule networks predict a high metabolic activity and describe ion reabsorption and ion-triggered glucose reabsorption pathways as well as ammonia metabolism and detoxification pathways (Figure 2A). The predictions are in agreement with the energy-intensive ion, glucose and other small molecule reabsorption by the proximal tubule cells and their predominant function in ammonium excretion and renal drug clearance. The identification of cellular iron homeostasis pathways documents the iron storage capacity of proximal tubule cells that, among other functions, also mitigates kidney damage during acute kidney injury. Podocyte/glomerular networks focus on cell-cell/cell-matrix adhesion, glomerular basement membrane/extracellular matrix (ECM) and actin dynamics (Supplementary Figure 1B), all pathways fundamental for barrier generation and consequently for glomerular filtration. Principal cell/collecting duct networks concentrate on ion reabsorption (Supplementary Figure 1C), emphasizing the important role of the collecting duct in fine-tuning these mechanisms, thereby regulating systemic electrolyte and water balance.

These networks document that 13% (principal cells/collecting duct), 27% (proximal tubule cells/tubulointerstitium) and 74% (podocytes/glomerulus) of all predicted SCPs were discovered by at least two different technologies. A closer investigation of the SCPs further highlights that the overlap is even higher if only the SCPs that describe cell-type specific functions are considered. Furthermore, the different datasets describe complementary subfunctions of the same physiological processes. For example, both proteomic datasets of the proximal tubule subsegments describe fatty acid transport via carnitine shuttling into the mitochondrial matrix, where the enzymes for mitochondrial beta oxidation are localized (Figure 2A). The PREMIERE SC RNASeq dataset predicts carnitine biosynthesis, i.e. synthesis of the central molecule of the carnitine shuttle.
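
The share of SCPs found by at least two technologies can be computed as sketched below. This is a minimal Python illustration; the assay labels and SCP names are hypothetical placeholders, and the resulting number is unrelated to the percentages reported above.

```python
from collections import Counter


def fraction_shared(scps_by_assay, min_assays=2):
    """Fraction of all predicted SCPs that appear in at least `min_assays` assays."""
    counts = Counter(
        scp for scps in scps_by_assay.values() for scp in set(scps)
    )
    if not counts:
        return 0.0
    shared = sum(1 for n in counts.values() if n >= min_assays)
    return shared / len(counts)


# Hypothetical predictions for one cell type from three assays.
toy_predictions = {
    "sc_rnaseq": {"SCP_a", "SCP_b", "SCP_c"},
    "sn_rnaseq": {"SCP_a", "SCP_d"},
    "lmd_proteomics": {"SCP_b"},
}
shared_fraction = fraction_shared(toy_predictions)
print(f"{shared_fraction:.0%} of SCPs were found by two or more assays")
```

The denominator is the union of SCPs over all assays, which is one plausible reading of "all predicted SCPs"; a different denominator would change the reported fraction.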

Integration of pathways that were predicted based on the tubulointerstitial metabolites, such as ‘Glycolysis and Gluconeogenesis’ and ‘D-Arginine and D-ornithine metabolism’ (Supplementary Figure 1D), into the Molecular Biology of the Cell Ontology (MBCO) SCP-networks (Figure 1A) further underlines the predicted high metabolic activity of the proximal tubule cells. Glomerular metabolites enrich for pathways (Supplementary Figure 1C), such as sphingolipid and arachidonic acid metabolism, that support cell-matrix/cell-cell adhesion and gap junctions, respectively. Dynamic enrichment analysis of both single-cell RNASeq datasets predicts the involvement of another metabolic pathway, i.e. retinol metabolism, in podocyte function, in particular as a regulator of tight junctions (Supplementary Figure 1B). Retinoic acid has a regulatory effect on tight junctions 15, 16 and plays a significant role in mitigating podocyte apoptosis and dedifferentiation during podocyte injury.

The enrichment results suggest that proximal tubular cells have the capacity to meet the high energy demand by fueling the citric acid cycle not only via beta oxidation, but also via glucose and glutamine catabolism. Nevertheless, beta oxidation is most consistently predicted, in agreement with previous studies documenting lipid metabolism as the preferential energy source in proximal tubule cells 18, 19. Investigation of the pathway components of these SCPs documents that the different omics technologies identify different components of these pathways that integrate into a comprehensive description of the relevant biochemical pathways (Figure 2B). Each technology contributes genes, proteins and metabolites for a fuller description of the pathways than would be obtained by a single technology. Tubulointerstitial metabolites, for example, contain glucose, cofactors of the pyruvate dehydrogenase complex and multiple adenosine nucleotides/nucleosides (i.e. metabolites of the energy carrier ATP). In agreement with the results of the pathway predictions, network mapping revealed that cell-type specific DEGs and DEPs lie within the same area of the human interactome (Supplementary Figure 1E), indicative of close functional relationships.

In parallel, we identified modules in a kidney-specific functional network using the 300 top-ranked marker genes and proteins across all datatypes in order to detect sets of cell-type specific, functionally related genes 20, 21. The module detection algorithm finds groups of genes that form tightly connected communities within a kidney-specific functional network, which is constructed using a data-driven approach from gene-gene relationships across thousands of experimental assays. After module detection, gene enrichment analysis is performed within each module to understand the key functions of the genes in each module. As with dynamic enrichment analysis, the modules display clear cell-type specific functional enrichments (Supplementary Table 8). For example, the network of proximal tubule marker genes includes modules enriched in anion transport and cellular response to metal ions (Figure 2C), the network of podocyte marker genes includes modules enriched in glomerulus development and cell-cell adhesion (Supplementary Figure 1F), and the network of principal cell marker genes includes modules enriched in sodium ion transport (Supplementary Figure 1G).
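
The idea of grouping marker genes into tightly connected modules can be illustrated with a deliberately crude stand-in: keep only strong edges of a weighted gene-gene network and report the connected components. This Python sketch is not the module detection algorithm referred to above; the edge weights, the threshold `min_weight` and the gene labels are hypothetical.

```python
from collections import defaultdict


def detect_modules(edges, min_weight=0.5):
    """Connected components of the network restricted to edges >= min_weight."""
    adjacency = defaultdict(set)
    for gene_a, gene_b, weight in edges:
        if weight >= min_weight:
            adjacency[gene_a].add(gene_b)
            adjacency[gene_b].add(gene_a)

    modules, seen = [], set()
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Depth-first traversal collects every gene reachable from `start`.
        stack, module = [start], set()
        while stack:
            gene = stack.pop()
            if gene in module:
                continue
            module.add(gene)
            stack.extend(adjacency[gene] - module)
        seen |= module
        modules.append(module)
    return modules


# Hypothetical functional network: two dense groups joined by one weak edge.
toy_edges = [
    ("g1", "g2", 0.9), ("g2", "g3", 0.8), ("g1", "g3", 0.7),
    ("g3", "g4", 0.2),
    ("g4", "g5", 0.9), ("g5", "g6", 0.85), ("g4", "g6", 0.6),
]
modules = detect_modules(toy_edges)
print(modules)
```

Each resulting module could then be passed to an enrichment test to label its dominant function, mirroring the two-step procedure described in the text. A real community detection method would also split groups that remain connected through strong but sparse links, which thresholding alone cannot do.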

bioRxiv preprint doi: https://doi.org/10.1101/2020.07.23.216507; this version posted July 24, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder.


Frequently Asked Questions (15)

Q1. What have the authors contributed in "Towards building a smart kidney atlas: network-based integration of multimodal transcriptomic, proteomic, metabolomic and imaging data in the kidney precision medicine project"?

Towards Building a Smart Kidney Atlas: Network-based integration of multimodal transcriptomic, proteomic, metabolomic and imaging data in the Kidney Precision Medicine Project. Jens Hansen, Rachel Sealfon, Rajasree Menon, Michael T. Eadon, Blue B. Lake, Becky Steck, Dejan Dobi, Samir Parikh, Tara K. Sigdel, Theodore Alexandrov, Andrew Schroeder, Edgar A. Otto, Christopher R. Anderton, Daria Barwinska, Guanshi Zhang, Michael P. Rose, John P. Shapiro, Dusan Velickovic, Annapurna Pamreddy, Seth Winfree, Yongqun He, Ian H. de Boer, Jeffrey B. Hodgin, Abhijit Naik, Kumar Sharma, Minnie Sarwal, Kun Zhang, Jonathan Himmelfarb, Zoltan Laszik, Brad Rovin, Pierre C. Dagher, John Cijiang He, Tarek M. El-Achkar, Sanjay Jain, Olga G. Troyanskaya, Matthias Kretzler, Ravi Iyengar, Evren U. Azeloglu for the Kidney Precision Medicine Project Consortium.

Q2. What are the future works mentioned in the paper "Towards building a smart kidney atlas: network-based integration of multimodal transcriptomic, proteomic, metabolomic and imaging data in the kidney precision medicine project"?

Their approach is amenable to future computational modeling studies that can further improve the proposed tissue atlas. In addition to the integrated analytics presented here, the KPMP is also building a community-based Kidney Tissue Atlas Ontology (KTAO), which will systematically integrate different types of information (such as clinical, pathological, cell and molecular) into a logically defined tissue atlas, which can then be further utilized to support various applications 34.

Q3. What was used for the data normalization and scaling?

‘SCTransform’ was used for data normalization and scaling (based on top 2,000 features), followed by principal component analysis.

Q4. What is the role of fatty acid oxidation in tubulointerstitial fibrosis?

Decrease in fatty acid oxidation, resulting in a loss of ATP generation, has been shown to be a significant contributor to tubulointerstitial fibrosis 19.

Q5. How many libraries were needed to reidentify podocytes?

On average 12 and 15 libraries (~3,100 and 3,835 nuclei) allowed reidentification of seven of the top 10 predicted podocyte and proximal tubule MBCO SCPs, respectively, while 21 libraries (~5,462 nuclei) were sufficient to reidentify five of the top

Q6. What jensenlab confidence threshold was used to assign the subcellular localization of each gene?

Subcellular localization of each gene was identified using the jensenlab human compartment database based on a jensenlab confidence of at least four (i.e. 80% of maximum confidence in the database) 28.

Q7. What are the metabolites of the energy carrier ATP?

Tubulointerstitial metabolites, for example, contain glucose, cofactors of the pyruvate dehydrogenase complex and multiple adenosine nucleotides/nucleosides (i.e. metabolites of the energy carrier ATP).

Q8. How many libraries are needed for a consistent detection of podocytes?

Their results indicate that for a consistent detection of podocytes (i.e. in more than 95% of all down sampled datasets with the same library counts), at least 16 (~11,727 cells) or 7 libraries (1,837 nuclei) are needed if subjected to single-cell RNASeq (Figure 4A) or single-nucleus RNASeq (Figure 4B), respectively.

Q9. How many differentially expressed genes and proteins were predicted by each assay?

Top 300 differentially expressed genes (DEGs) and proteins (DEPs) predicted by each assay for each analyzed cell type/tissue subsegment.

Q10. How many SCPs can be included in the top seven predictions?

Notice that the top seven predictions based on dynamic enrichment analysis can contain more than seven SCPs, since each prediction is either a single SCP or a unique combination of two or three SCPs.

Q11. How many samples were sufficient to reproduce the results for the full dataset?

For the LMD proteomics dataset, six to eight samples were sufficient to reproduce the results obtained for the full datasets with only minor variations in the correlation of identified DEGs (Figure 4C) and SCPs (Supplementary Figure 2E) or SCP rankings (Figure 4C).

Q12. What is the way to integrate the three different assays?

An idealized integration scenario would combine these assays synergistically such that they could complement the shortcomings of each other, improve quality control metrics across technologies, and increase rigor and reproducibility of the overall study.

Q13. How many predictions are needed to re-identify the top 10 or seven predictions?

The authors determined how many SCPs have to be considered in a down-sampled analysis to re-identify at least 70% (or 50%) of the top 10 or seven predictions obtained from standard or dynamic enrichment analysis with the full dataset, respectively.

Q14. What is the role of the collecting duct in regulating systemic electrolyte and water balance?

Principal cell/collecting duct networks concentrate on ion reabsorption (Supplementary Figure 1C), emphasizing the important role of the collecting duct in fine-tuning these mechanisms, thereby regulating systemic electrolyte and water balance.

Q15. What is the correlation between the gene expression profiles of cells and LCM segments?

To compute the Pearson correlation between the gene expression profiles of cells and LCM segments, the gene profiles were restricted to genes shared between the two datasets and showing variable expression in the single-cell dataset and correlations were computed between the logarithm of the mean ratio vector for each LCM segment and the scaled expression profile of each cell in the single cell dataset.
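
The correlation described in this answer can be sketched in Python as follows. The function names, the natural logarithm and the toy values are assumptions for illustration; Pearson correlation does not depend on the base of the logarithm, and the selection of variable genes is taken here as a given input.

```python
from math import log, sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def segment_cell_correlation(segment_mean_ratio, cell_scaled_expr, variable_genes):
    """Correlate the log mean-ratio vector of one segment with one cell's scaled profile."""
    # Restrict to genes shared between both datasets and flagged as variable.
    genes = sorted(
        set(segment_mean_ratio) & set(cell_scaled_expr) & set(variable_genes)
    )
    x = [log(segment_mean_ratio[g]) for g in genes]
    y = [cell_scaled_expr[g] for g in genes]
    return pearson(x, y)


# Hypothetical toy values for one segment and one cell.
segment = {"gA": 4.0, "gB": 1.0, "gC": 0.25, "gD": 2.0}
cell = {"gA": 1.0, "gB": 0.0, "gC": -1.0, "gE": 3.0}
variable = {"gA", "gB", "gC", "gE"}
r = segment_cell_correlation(segment, cell, variable)
print(f"Pearson r = {r:.3f}")
```

Repeating the call for every cell and every segment yields a cell-by-segment correlation matrix from which the best-matching segment of each cell can be read off.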
