A reference tissue atlas for the human kidney
INTRODUCTION
- The kidney is one of the most diverse organs in the human body in terms of its cellular heterogeneity, and possibly second only to the brain in its spatial complexity.
- Delineating the cell types and subtypes in different regions of the kidney during health and disease will help identify the tissue-level, cellular and subcellular pathways and processes involved in disease initiation and progression, and aid in drug discovery.
- The KPMP features an expanding set of complementary high-throughput assays for molecular entities that span transcriptomic, proteomic and metabolomic profiles as well as spatial/structural properties of kidney tissue.
- The KPMP envisions that harmonization and integration of different types of molecular data from omics assays, combined with state-of-the-art pathological and clinical descriptors, will allow us to classify different disease subtypes and states for diagnostic and therapeutic purposes.
- Numerous groups have proposed the use of integrated multiomics analysis to characterize disease phenotypes using tools that include Bayesian, correlative, network-based and machine learning-based clustering algorithms 2-4.
Outline of KPMP Data Types
- In these analyses, there were four transcriptomic, two proteomic, one imaging-based, and one spatial metabolomics tissue interrogation assays that consisted of 3 to 48 different datasets obtained from 3 to 22 participants (Supplementary Table 1).
- The copyright holder for this preprint (which was not certified by peer review) is the author/funder. This version was posted July 24, 2020.
- Hierarchical clustering of the correlation coefficients documented that the absolute gene and protein expression values are specific for a particular platform and not for their anatomical origin (Supplementary Figure 3A).
- While imaging assays identify the spatial localization of individual cells together with their expression signatures for a limited number of proteins, single-cell RNASeq assays provide more extensive transcriptomic profiles for individual cells.
- The pathways that are predicted for non-glomerular metabolites either overlapped with or were closely related to the pathways that are predicted for proximal tubule cells and subsegments based on the other datasets (Figure 2A).
DISCUSSION
- The advances in transcriptomic technologies along with other omics and imaging assays offer unprecedented insights into the organization of tissues at cellular resolution and the molecular constituents of the different cell types and their subtypes.
- The authors predict that these subtypes differ in their potential for lipid metabolism, which is critically important for the physiological function of proximal tubule cells, as cellular energetics have been shown to be critical for reabsorptive activity.
- Nevertheless, their post-hoc power analysis can help to estimate the reliability of an identified cell subtype or predicted disease mechanism by documenting that it is consistently recovered using down-sampled datasets.
- In addition to the integrated analytics presented here, the KPMP is also building a community-based Kidney Tissue Atlas Ontology (KTAO), which will systematically integrate different types of information (such as clinical, pathological, cellular and molecular) into a logically defined tissue atlas that can then be used to support various applications 34.
METHODS
- Omics and imaging assays used within KPMP target different types of molecular components with different resolution, sensitivity and precision.
- An important function of the KPMP Central Hub is to integrate the different types of data using a set of analytical techniques.
- The pilot data presented for each assay comprises 3 to 48 different datasets that are obtained from 3 to 22 participants (Supplementary Table 1).
- Participants' kidney tissue was procured from a spectrum of tissue resources, including unaffected parts of tumor nephrectomy specimens (n=38), living donor preperfusion biopsies (n=3), diseased donor nephrectomies (n=5), normal surveillance transplant biopsies (n=5) and native kidney biopsies (n=4).
- Within each assay the authors generated lists of differentially expressed genes (DEGs), proteins (DEPs) and metabolites that describe those genes, proteins or metabolites that are upregulated or enriched in a particular single cell cluster, single nucleus cluster or kidney subsegment, if compared to all other clusters or subsegments.
Ranking of Differentially Expressed Genes and Proteins
- In the case of the DEGs and DEPs that were used for dynamic enrichment analysis 6, module identification 21 and post-hoc power analysis, single-nucleus and single-cell DEGs were first ranked by p-value and then by decreasing fold change (i.e., fold changes were used as a tiebreaker).
- The top-ranked 300 entities were subjected to downstream analysis.
- Similarly, DEGs and DEPs obtained for each kidney subsegment based on LMD bulk RNASeq, or LMD and NSC proteomics, were ranked first by p-value and then by decreasing fold change, and the top-ranked 300 DEGs and DEPs were subjected to pathway enrichment analysis or module detection (see below).
- Therefore, the authors could not calculate p-values for the LMD and NSC technologies.
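The ranking rule described above (ascending p-value, with decreasing fold change as the tiebreaker, then keeping the top 300) can be sketched as follows. This is a minimal illustration rather than the authors' pipeline; the `Entity` record, the function name `rank_top_entities` and the toy gene names are assumptions made for this example.

```python
from typing import NamedTuple


class Entity(NamedTuple):
    """One differentially expressed gene or protein (hypothetical record layout)."""
    name: str
    p_value: float
    fold_change: float


def rank_top_entities(entities, top_n=300):
    """Rank by ascending p-value, break ties by decreasing fold change, keep top_n."""
    ranked = sorted(entities, key=lambda e: (e.p_value, -e.fold_change))
    return ranked[:top_n]


# Toy input: GENE_A and GENE_B share a p-value, so fold change decides their order.
toy = [
    Entity("GENE_A", 0.001, 1.5),
    Entity("GENE_B", 0.001, 3.0),
    Entity("GENE_C", 0.0001, 1.2),
    Entity("GENE_D", 0.05, 8.0),
]
top = rank_top_entities(toy, top_n=3)
print([e.name for e in top])  # ['GENE_C', 'GENE_B', 'GENE_A']
```

Sorting on a tuple key keeps the tiebreak explicit: a large fold change never outranks a smaller p-value, which is why GENE_D is dropped despite having the largest fold change.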
Standard and Dynamic Enrichment Analysis
- Top DEGs and DEPs for each podocyte cluster/glomerulus, proximal tubule cell cluster/tubulointerstitium and principal cell cluster/collecting duct subsegment were separately subjected to standard enrichment analysis using Gene Ontology Biological Processes (GO BPs) or the Molecular Biology of the Cell Ontology (MBCO) level-3 subcellular processes (SCPs) 6 and Fisher’s Exact Test.
- Only genes/proteins that are detected by this method and statistically analyzed for differential expression can be identified as DEGs/DEPs and only these genes/proteins are considered as the background set for the Fisher’s Exact test.
- Ontological background genes/proteins were all genes that are annotated to at least one pathway within that particular ontology.
- Dynamic enrichment analysis uses these relationships to generate context-specific higher-level processes by merging functionally related SCPs that contain at least one DEG or DEP.
- The top five predicted SCPs or merged SCPs were connected based on the inferred relationships, and all networks for a particular cell type/segment were merged, with each SCP color-coded according to the source assay(s) that initiated its dynamic enrichment.
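The standard enrichment step can be illustrated with a one-sided Fisher's Exact Test computed directly from the hypergeometric distribution. This is a sketch under stated assumptions, not the authors' implementation: the function name is hypothetical, the test is taken to be right-tailed (over-representation), and the experimental and ontological backgrounds are assumed to have been combined by the caller into a single `background` set.

```python
from math import comb


def fisher_exact_enrichment(degs, pathway_genes, background):
    """One-sided (over-representation) Fisher's Exact Test p-value.

    DEGs and pathway genes are first restricted to the background set.
    """
    background = set(background)
    degs = set(degs) & background
    pathway = set(pathway_genes) & background

    n_total = len(background)
    n_degs = len(degs)
    n_pathway = len(pathway)
    n_overlap = len(degs & pathway)

    # Right tail of the hypergeometric distribution: P(overlap >= observed).
    max_overlap = min(n_degs, n_pathway)
    tail = sum(
        comb(n_pathway, k) * comb(n_total - n_pathway, n_degs - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / comb(n_total, n_degs)


# Toy example: 20 background genes, a 5-gene pathway, 4 DEGs of which 3 hit the pathway.
background = [f"g{i}" for i in range(20)]
pathway = ["g0", "g1", "g2", "g3", "g4"]
degs = ["g0", "g1", "g2", "g10"]
p_value = fisher_exact_enrichment(degs, pathway, background)
print(round(p_value, 4))  # 0.032
```

Restricting both gene lists to the background before counting matters: a gene that could never have been detected or annotated would otherwise inflate the apparent size of the pathway or the DEG list.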
Module Detection
- In parallel to the enrichment analyses, the authors also applied another network-based pathway enrichment technique, identifying modules of cell-type-specific marker genes within the kidney-specific functional network using the HumanBase interface (hb.flatironinstitute.org).
- ... DEPs from each proteomics dataset.
- Module detection is a network-based approach described in Krishnan et al., and construction of the functional networks is described in Greene et al. 20, 21.
- Modules are detected using a community clustering algorithm based on connectivity between genes in the kidney-specific functional network, and enrichment analysis is subsequently performed to identify functional enrichments in each module.
Enrichment Analysis of Metabolites
- All glomerular and non-glomerular metabolites that were identified for the three participants were merged and subjected to pathway enrichment analysis using MetaboAnalyst 25.
- The top six predicted metabolic pathways were mapped onto MBCO pathways whenever possible; if they did not have a corresponding pathway, the original pathway names were preserved.
Integration of Single-Cell/Single-Nucleus Transcriptomics
- In contrast to bulk mRNA sequencing, where the gene expression measurements reflect an average across all captured cell types, single-cell or single-nucleus mRNA sequencing allows the measurement and comparison of comprehensive gene sets obtained from individual cells.
- Single-cell transcriptomic data was produced by PREMIERE (24 libraries from 22 participants) 8 and UCSF (10 libraries from 10 participants), whereas the single-nucleus data was made by UCSD (47 libraries from 15 participants).
- Data from each site were first processed using the Seurat 3.0 R package 26.
- These anchor genes were then used to harmonize the datasets.
- The downstream process included scaling, principal component analysis, batch integration using harmony, dimensionality reduction using Uniform Manifold Approximation and Projection (UMAP), and unsupervised clustering.
Integration of Single-cell, Single-nucleus and Laser Capture Microdissection Bulk Transcriptomics
- To integrate single-cell sequencing, single-nucleus sequencing, and LMD bulk transcriptomic datasets, the authors first determined the overlap between genes identified both in the LMD dataset and in the corresponding single-cell transcriptomic dataset.
- From this set of shared genes, the authors retained those showing variable expression in the single-cell dataset.
- The authors then computed the Pearson correlation between each individual cell in a scaled single-cell dataset and the LMD transcriptomic dataset for the same participant.
- Using this approach, the authors can assign each cell to the appropriate LMD segment that shows the highest correlation value.
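The correlation-based assignment described above can be sketched as follows: each cell's scaled expression vector is correlated with every LMD segment profile over the shared genes, and the cell is assigned to the segment with the highest Pearson correlation. The function names, dictionary layout and toy values are assumptions made for this illustration, not taken from the authors' code.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length vectors with non-zero variance."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def assign_cells_to_segments(cell_profiles, segment_profiles):
    """Map each cell to the segment whose profile correlates best with it."""
    assignments = {}
    for cell_id, cell_vec in cell_profiles.items():
        scores = {
            seg: pearson(cell_vec, seg_vec)
            for seg, seg_vec in segment_profiles.items()
        }
        assignments[cell_id] = max(scores, key=scores.get)
    return assignments


# Toy data: values are listed in the same (shared) gene order for cells and segments.
segments = {
    "glomerulus": [2.0, 1.5, -1.0, -2.0],
    "tubulointerstitium": [-2.0, -1.0, 1.5, 2.0],
}
cells = {
    "cell_1": [1.2, 0.8, -0.5, -1.1],
    "cell_2": [-0.9, -0.4, 0.7, 1.3],
}
print(assign_cells_to_segments(cells, segments))
```

The sketch assumes every vector has non-zero variance; a real pipeline would need to drop or guard against constant profiles before dividing.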
Post-hoc power analysis
- The PREMIERE single-cell RNASeq 8 and the UCSD/WU single-nucleus RNASeq 9 datasets were obtained from 22 and 15 participants, respectively, whose samples were sequenced in 24 and 47 libraries.
- The authors used jackstraw analysis to identify the last significant principal component (alpha = 0.01) among the top 20 components.
- To document the reliability of that cell type assignment, the authors compared its p-value to the p-value of the second prediction (i.e., the cell type whose essential genes had the second most significant enrichment among the DEGs of that cluster).
- The authors progressively and randomly removed libraries from the full datasets to generate 100 non-overlapping downsampled datasets for each number of remaining participants.
- Additionally, the top 300 significant DEPs of each subsegment were subjected to enrichment analysis and predicted pathways compared as described above.
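The down-sampling scheme can be sketched as below. This is an illustration under assumptions rather than the authors' pipeline: "non-overlapping" is interpreted here as "no two down-sampled datasets contain the identical set of libraries", and both function names are hypothetical. The second helper computes the fraction of full-dataset top predictions that are recovered among the first few predictions of a down-sampled run.

```python
import random
from math import comb


def downsample_libraries(libraries, n_keep, n_datasets=100, seed=0):
    """Draw distinct random subsets of n_keep libraries (at most n_datasets)."""
    rng = random.Random(seed)
    libraries = sorted(libraries)
    # Fewer than n_datasets distinct subsets may exist for extreme n_keep values.
    target = min(n_datasets, comb(len(libraries), n_keep))
    subsets = set()
    while len(subsets) < target:
        subsets.add(tuple(sorted(rng.sample(libraries, n_keep))))
    return sorted(subsets)


def fraction_recovered(full_top, downsampled_ranked, n_considered):
    """Share of full-dataset top predictions found among the first
    n_considered predictions of a down-sampled analysis."""
    considered = set(downsampled_ranked[:n_considered])
    return len(set(full_top) & considered) / len(full_top)


# Toy example: 24 libraries, keep 20, generate 100 distinct down-sampled datasets.
all_libraries = [f"lib{i}" for i in range(24)]
datasets = downsample_libraries(all_libraries, n_keep=20)
print(len(datasets))  # 100

# Two of the four full-dataset predictions appear in the first three down-sampled ones.
print(fraction_recovered(["a", "b", "c", "d"], ["a", "x", "c", "y", "b"], 3))  # 0.5
```

Fixing the random seed makes the set of down-sampled datasets reproducible, which helps when the same subsets need to be re-analyzed with different downstream settings.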
Proteomic-Transcriptomic Co-expression Analysis
- LMD and NSC proteomic datasets identified protein expression in two kidney subsegments: glomeruli and tubulointerstitium for LMD and glomeruli and proximal tubule for NSC.
- The authors identified technology- and participant-specific cluster gene expression using the “Average Expression” functionality of the Seurat R package (RNA assay, counts slot) on the cells/nuclei assigned to the same clusters in the integrated PREMIERE, UCSF and UCSD/WU data analysis described above.
- The intersection of all background sets was defined as the set of common genes.
- Ratios were inverted to describe proximal tubule/tubulointerstitial specific gene expression.
Comparison of Cell Type-specific Imaging and Transcriptomic Expression Data
- To integrate cell type-specific imaging and transcriptomic data, the authors first constructed matrices with average expression values for each gene in each cell type cluster for both the set of 16 normalized integrated transcriptomic clusters and the CODEX clusters.
- The authors normalized each gene in both transcriptomic and CODEX matrices to have a mean of 0 and standard deviation of 1.
- The authors then filtered both datasets to include only genes represented in both the transcriptomic and the imaging datasets and computed the average expression of each gene/protein in each cell type.
- The authors next considered the problem of constructing a matrix to computationally map transcriptomic cell clusters to the imaging cell clusters.
- Before visualizing matrix M as a heatmap, the authors first normalized each row to have mean of 0 and standard deviation of 1 in order to identify the transcriptomic cell types that are weighted most heavily in the mapping to each imaging cell type.
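The row-wise normalization used in this section (mean 0, standard deviation 1 per gene or per row of M) can be sketched as follows. The source does not state whether the population or the sample standard deviation was used; this sketch assumes the population form, and the function name `zscore_rows` is hypothetical.

```python
from statistics import mean, pstdev


def zscore_rows(matrix):
    """Scale each row to mean 0 and (population) standard deviation 1.

    Constant rows carry no information and are mapped to all zeros.
    """
    normalized = []
    for row in matrix:
        mu = mean(row)
        sigma = pstdev(row)
        if sigma == 0:
            normalized.append([0.0 for _ in row])
        else:
            normalized.append([(value - mu) / sigma for value in row])
    return normalized


# Toy matrix: rows are genes (or imaging cell types), columns are cell type clusters.
toy_matrix = [
    [1.0, 2.0, 3.0],
    [5.0, 5.0, 5.0],
]
for row in zscore_rows(toy_matrix):
    print([round(value, 3) for value in row])
```

After this scaling, the largest entries in a row point to the columns that stand out relative to that row's own average, which is what makes a heatmap of the normalized matrix readable across rows with different absolute scales.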
Generating Pathway Maps for Beta-oxidation Network from Single-cell RNASeq Clusters
- To better understand one of the most significantly enriched pathways in the integrated analytics of proximal tubules, reactions involved in fatty acid beta-oxidation were extracted from KEGG (www.genome.jp/kegg).
- Datasets were subjected to an automated single-cell/nucleus and proteomic data analysis pipeline and results compared between the downsampled and complete reference datasets.
- ‘Cluster count’ documents how many clusters were assigned to a particular cell type.
50% of all SCPs that were part of the top seven predictions based on dynamic enrichment
- ‘Libraries’ gives the number of sequencing libraries used for each down-sampled dataset; ‘cells’ gives the average number of total cells obtained from those libraries.
- Labels describing podocyte/glomerular and proximal tubule/tubulointerstitium RNASeq and proteomic datasets are colored aquamarine and orange, respectively.
- Curly brackets group samples obtained by the same technology: 1: LMD RNASeq, 2: NSC/LMD Proteomics, 3: SC RNASeq PREMIERE, 4: SC RNASeq UCSF, 5: SN RNASeq UCSD/WU.
- Figure 7A (panel labels): prior knowledge; integration with multiomics and imaging data; models and predictions of tissue function; cell subtype-specific compartmental metabolic networks; dynamic models of metabolic pathways in different subtypes of proximal tubule cells; subcellular compartments of enzymes; metabolic reactions.
Single-nucleus RNASeq (UCSD/WashU) and Single-cell RNASeq (PREMIERE)
- UMI count matrices and lists of differentially expressed genes were downloaded from published analyses for the PREMIERE TIS (composed of Michigan, Princeton, Broad) single-cell RNA sequencing 8 and UCSD/WashU TIS single-nucleus 9 datasets.
- The authors excluded the proximal tubular cells-3 and principal cells-2 clusters from the single-nucleus RNASeq dataset, since these clusters showed an inflammatory or a stress response.
Subsegmental LMD Transcriptomics (IU/OSU)
- A comprehensive Laser MicroDissection (LMD) protocol is published on protocols.io (https://www.protocols.io/view/laser-microdissection-8rkhv4w).
- Briefly, 12 μm frozen sections are obtained from an Optimal Cutting Temperature (OCT) preserved tissue block and adhered to LMD membrane slides (Leica, Buffalo Grove, IL).
- Slides undergo dissection with a Leica LMD6500 system with pulsed UV laser.
- RNA quality is assessed by bioanalyzer, ribosomal RNA is depleted, and cDNA libraries are prepared using the SMARTer Universal Low Input RNA Kit (Takara, No. 634938).
- Total read counts mapping to each gene were generated with edgeR, normalized, and converted to expression ratios.
Subsegmental LMD Proteomics (IU/OSU)
- A comprehensive Laser MicroDissection (LMD) proteomics protocol is published on protocols.io.
- The authors' LMD proteomic methods have also been previously published in detail 29, 30.
- Glomerular gene expression was compared to the tubulointerstitial gene expression using an unpaired t-test with equal variance.
- The entire 3-D fluorescence imaging and tissue cytometry protocol is published on protocols.io.
- Images were acquired in up to 8 channels using a Leica SP8 Confocal Microscope.
Spatial Metabolomics (UTHSA-PNNL-EMBL)
- 10 μm thick renal cortical tissues were sectioned on a cryostat (Leica Microsystems) and prepared for matrix-assisted laser desorption/ionization imaging mass spectrometry by spraying with the norharmane matrix using the TM-Sprayer automated spraying robot (HTX Technology).
- For dynamic enrichment analysis all SCPs among the top 25 predictions were compared.
- Top 300 DEGs or DEPs were subjected to pathway enrichment analysis, and (D) the top 50 GO BPs and (E) MBCO level-3 SCPs were subjected to hierarchical clustering based on pairwise correlation coefficients between -log10(p-values).
- Supplementary Figure 4 (panels A and B).
12. van Swelm RPL, Wetzels JFM, Swinkels DW. The multifaceted role of iron in renal health and
- Proximal tubule H-ferritin mediates iron trafficking in acute kidney injury.
- Changes in membrane sphingolipid composition modulate dynamics and adhesion of integrin nanoclusters.
- Differentiation of human neuroblastoma cell line IMR-32 by sildenafil and its newly discovered analogue IS00384.
27. McGinnis CS, Murrow LM, Gartner ZJ. DoubletFinder: Doublet Detection in Single-Cell RNA
- Binder JX, Pletscher-Frankild S, Tsafou K, et al. COMPARTMENTS: unification and visualization of protein subcellular localization evidence.
- Characterization of glomerular diseases using proteomic analysis of laser capture microdissected glomeruli.
- Modeling Kidney Disease Using Ontology: Perspectives from the KPMP.
Frequently Asked Questions (15)
Q2. What are the future works mentioned in the paper "Towards building a smart kidney atlas: network-based integration of multimodal transcriptomic, proteomic, metabolomic and imaging data in the kidney precision medicine project" ?
Their approach is amenable to future computational modeling studies that can further improve the proposed tissue atlas. In addition to the integrated analytics presented here, the KPMP is also building a community-based Kidney Tissue Atlas Ontology (KTAO), which will systematically integrate different types of information (such as clinical, pathological, cellular and molecular) into a logically defined tissue atlas that can then be used to support various applications 34.
Q3. What was used for the data normalization and scaling?
‘SCTransform’ was used for data normalization and scaling (based on top 2,000 features), followed by principal component analysis.
Q4. What is the role of fatty acid oxidation in tubulointerstitial fibrosis?
Decrease in fatty acid oxidation, resulting in a loss of ATP generation, has been shown to be a significant contributor to tubulointerstitial fibrosis 19.
Q5. How many libraries were needed to reidentify podocytes?
On average 12 and 15 libraries (~3,100 and 3,835 nuclei) allowed reidentification of seven of the top 10 predicted podocyte and proximal tubule MBCO SCPs, respectively, while 21 libraries (~5,462 nuclei) were sufficient to reidentify five of the top ...
Q6. What jensenlab confidence threshold was used to assign a subcellular localization to each gene?
Subcellular localization of each gene was identified using the jensenlab human compartment database based on a jensenlab confidence of at least four (i.e. 80% of maximum confidence in the database) 28.
Q7. What are the metabolites of the energy carrier ATP?
Tubulointerstitial metabolites, for example, contain glucose, cofactors of the pyruvate dehydrogenase complex and multiple adenosine nucleotides/nucleosides (i.e. metabolites of the energy carrier ATP).
Q8. How many libraries are needed for a consistent detection of podocytes?
Their results indicate that for a consistent detection of podocytes (i.e. in more than 95% of all down sampled datasets with the same library counts), at least 16 (~11,727 cells) or 7 libraries (1,837 nuclei) are needed if subjected to single-cell RNASeq (Figure 4A) or single-nucleus RNASeq (Figure 4B), respectively.
Q9. How many differentially expressed genes and proteins were predicted by each assay?
Top 300 differentially expressed genes (DEGs) and proteins (DEPs) predicted by each assay for each analyzed cell type/tissue subsegment.
Q10. How many SCPs can be included in the top seven predictions?
Notice that the top seven predictions based on dynamic enrichment analysis can contain more than seven SCPs, since each prediction is either a single SCP or a unique combination of two or three SCPs.
Q11. How many samples were sufficient to reproduce the results for the full dataset?
For the LMD proteomics dataset, six to eight samples were sufficient to reproduce the results obtained for the full datasets with only minor variations in the correlation of identified DEGs (Figure 4C) and SCPs (Supplementary Figure 2E) or SCP rankings (Figure 4C).
Q12. What is the way to integrate the three different assays?
An idealized integration scenario would combine these assays synergistically such that they could complement the shortcomings of each other, improve quality control metrics across technologies, and increase rigor and reproducibility of the overall study.
Q13. How many predictions are needed to re-identify the top 10 or seven predictions?
The authors determined how many SCPs have to be considered in a down-sampled analysis to re-identify at least 70% (or 50%) of the top 10 or seven predictions obtained from standard or dynamic enrichment analysis with the full dataset, respectively.
Q14. What is the role of the collecting duct in regulating systemic electrolyte and water balance?
Principal cell/collecting duct networks concentrate on ion reabsorption (Supplementary Figure 1C), emphasizing the important role of the collecting duct in fine-tuning these mechanisms, thereby regulating systemic electrolyte and water balance.
Q15. What is the correlation between the gene expression profiles of cells and LCM segments?
To compute the Pearson correlation between the gene expression profiles of cells and LCM segments, the gene profiles were restricted to genes shared between the two datasets and showing variable expression in the single-cell dataset and correlations were computed between the logarithm of the mean ratio vector for each LCM segment and the scaled expression profile of each cell in the single cell dataset.