Whole-cell segmentation of tissue images with human-level performance using large-scale data
annotation and deep learning
Noah F. Greenwald^1,2*, Geneva Miller^3*, Erick Moen^3, Alex Kong^2, Adam Kagel^2, Christine Camacho Fullaway^2, Brianna J. McIntosh^1, Ke Leow^1,2, Morgan Sarah Schwartz^3, Thomas Dougherty^3, Cole Pavelchek^3,4, Sunny Cui^5,6, Isabella Camplisson^3, Omer Bar-Tal^7, Jaiveer Singh^2, Mara Fong^2, Gautam Chaudhry^2, Zion Abraham^2, Jackson Moseley^2, Shiri Warshawsky^2, Erin Soon^2, Shirley Greenbaum^2, Tyler Risom^2, Travis Hollmann^8, Leeat Keren^7, Will Graf^3, Michael Angelo^2†, David Van Valen^3†
1. Cancer Biology Program, Stanford University
2. Department of Pathology, Stanford University
3. Division of Biology and Bioengineering, California Institute of Technology
4. Present address: Washington University in St. Louis Medical School
5. Department of Electrical Engineering, California Institute of Technology
6. Present address: Department of Computer Science, Princeton University
7. Department of Molecular Cell Biology, Weizmann Institute of Science
8. Department of Pathology, Memorial Sloan Kettering Cancer Center
* These authors contributed equally to this work
† These authors jointly supervised this work
Abstract
Understanding the spatial organization of tissues is of critical importance for both basic and translational
research. While recent advances in tissue imaging are opening an exciting new window into the biology of
human tissues, interpreting the data that they create is a significant computational challenge. Cell
segmentation, the task of uniquely identifying each cell in an image, remains a substantial barrier for tissue
imaging, as existing approaches are inaccurate or require a substantial amount of manual curation to yield
useful results. Here, we addressed the problem of cell segmentation in tissue imaging data through large-
scale data annotation and deep learning. We constructed TissueNet, an image dataset containing >1 million
paired whole-cell and nuclear annotations for tissue images from nine organs and six imaging platforms.
We created Mesmer, a deep learning-enabled segmentation algorithm trained on TissueNet that performs
nuclear and whole-cell segmentation in tissue imaging data. We demonstrated that Mesmer has better speed
and accuracy than previous methods, generalizes to the full diversity of tissue types and imaging platforms
in TissueNet, and achieves human-level performance for whole-cell segmentation. Mesmer enabled the
automated extraction of key cellular features, such as subcellular localization of protein signal, which was
challenging with previous approaches. We further showed that Mesmer could be adapted to harness cell
lineage information present in highly multiplexed datasets. We used this enhanced version to quantify cell
morphology changes during human gestation. All underlying code and models are released with permissive
licenses as a community resource.
Introduction
Understanding the structural and functional relationships present within tissues is a challenge at the forefront of basic and translational research. Recent advances in multiplexed imaging have dramatically expanded the number of transcripts and proteins that can be quantified in a single tissue section while also improving the throughput of these platforms^1–12. These technological improvements have opened up exciting new frontiers for large-scale analysis of human tissue samples. Ambitious collaborative efforts such as the Human Tumor Atlas Network^13, the Human BioMolecular Atlas Program^14, and the Human Cell Atlas^15 are now using novel imaging techniques to comprehensively characterize the location, function, and phenotype of the cells in the human body. By generating high-quality, open-source datasets characterizing the full breadth of human tissues, these efforts promise to be as transformative as the Human Genome Project in unleashing the next era of biological discovery.
Despite this immense promise, the tools to facilitate the analysis and interpretation of these datasets at scale do not yet exist. The clearest example of this shortcoming is the lack of a generalized algorithm for locating single cells in images. Unlike flow cytometry or single-cell RNA sequencing methods, in which individual cells are dissociated and physically separated from one another prior to being analyzed, tissue imaging is performed with intact specimens. Thus, in order to extract single-cell information from images, each pixel must be assigned to a cell after image acquisition in a process known as cell segmentation. Since the features extracted through this process are the basis for downstream analyses like cell-type identification and tissue neighborhood analyses^16, inaccuracies at this stage have far-reaching consequences for interpreting image data.
Achieving accurate and automated cell segmentation for tissues remains a substantial challenge. Depending on the tissue, cells can be rare and dispersed within a large bed of extracellular matrix or densely packed such that contrast between adjacent neighbors is limited. Cell size in non-neuronal mammalian tissues can vary over two orders of magnitude^17, while cell morphology can vary widely, from small mature lymphocytes with little discernible cytoplasm, to elongated spindle-shaped fibroblasts, to large multinucleated osteoclasts and megakaryocytes^18. Achieving accurate cell segmentation has been a long-standing goal of the biological image analysis community, and a diverse array of software tools has been developed to meet this challenge^19–24. While these efforts have been crucial for advancing our understanding of biology across a wide range of domains, they fall short for tissue imaging data. A common shortcoming has been the need to perform manual, image-specific adjustments to produce useful segmentations. This lack of full automation poses a prohibitive barrier given the increasing scale of tissue imaging experiments.
Recent advances in deep learning have transformed the field of computer vision, and are increasingly being used for a variety of tasks in biological image analysis, including cell segmentation^25–31. These methods differ from conventional algorithms in that they learn how to perform tasks from annotated data. While the accuracy of these new, data-driven algorithms can render difficult analyses routine, using them in practice can be challenging: high accuracy requires a substantial amount of annotated data. Generating ground-truth data for cell segmentation is time intensive due to the need to generate pixel-level labels; as a result, existing datasets are of modest size (10^4–10^5 annotations). Moreover, most public datasets^26,27,32–38 annotate the location of cell nuclei rather than the whole cell. Deploying pre-trained models to the life science community is also difficult, and has been the focus of a number of recent works^39–42. Despite deep learning's potential, these challenges have caused whole-cell segmentation in tissue imaging data to remain an open problem.
Here, we sought to close these gaps by creating an automated, simple, and scalable algorithm for nuclear
and whole-cell segmentation that performs accurately across a diverse range of tissue types and imaging
platforms. Developing this algorithm required two innovations: (1) a scalable approach for generating large
volumes of pixel-level training data in tissue images and (2) an integrated deep learning pipeline that utilizes
these data to achieve human-level performance. To address the first challenge, we developed a
crowdsourced, human-in-the-loop approach for segmenting cells in tissue images where humans and
algorithms work in tandem to produce accurate annotations (Figure 1a). We used this pipeline to create
TissueNet, a comprehensive segmentation dataset of >1 million paired whole-cell and nuclear annotations.
These curated annotations were derived from images of nine different organs acquired from six distinct
imaging platforms. TissueNet is the largest cell-segmentation dataset assembled to date, containing twice
as many nuclear and 16 times as many whole-cell labels as all previously published datasets combined. To
address the second challenge, we developed Mesmer, a deep learning pipeline for scalable, user-friendly
segmentation of imaging data. Mesmer was trained on TissueNet and is the first algorithm to demonstrate
human-level performance on cell segmentation. To enable broad use by the scientific community, we
harnessed DeepCell, an open-source collection of software libraries, to create cloud-native software for
using Mesmer, including plugins for ImageJ and QuPath. We have made all code, data, and trained models
available under a permissive license as a community resource, setting the stage for application of these
modern, data-driven methods to a broad range of fundamental and translational research challenges.
A human-in-the-loop approach drives scalable construction of TissueNet
Existing annotated datasets for cell segmentation are limited in scope and scale (Figure 1b)^26,27,32–38. This limitation is largely due to the linear, time-intensive approach used to construct them, which requires the border of every cell in an image to be manually demarcated. This approach scales poorly, as the time required to label each image remains constant throughout the annotation effort. We therefore implemented a three-phase approach to create TissueNet. In the first phase, expert human annotators outlined the border of each cell in 80 images. The labeled images were then used to train a preliminary model (Figure 1a, left; Methods). Once the preliminary model reached a sufficient level of accuracy, correcting its mistakes required less time than labeling from scratch. Although the exact point at which this transition occurs depends on model quality and training data diversity, we found that roughly 10,000 annotated cells was a reasonable estimate for this threshold.
The process then moved to the second phase (Figure 1a, middle), where images were first passed through
the model to generate predicted annotations. These predictions were sent to crowdsourced annotators to
correct errors. The corrected annotations then underwent final inspection by an expert prior to being added
to the training dataset. When enough new data were compiled, a new model was trained and phase two was
repeated. Each iteration yielded more training data, which led to improved model accuracy and fewer errors
that needed to be manually corrected. This virtuous cycle continued until the model achieved human-level
performance. At this point, we transitioned to the third phase (Figure 1a, right), where the model was run
without human assistance to produce high-quality predictions. One advantage of this approach is that we
utilized annotators with different amounts of bandwidth and expertise: experts have experience but limited
bandwidth, while crowdsourced annotators have limited experience but higher bandwidth. Triaging each
task according to its difficulty and accessing a much larger pool of human annotators further reduced the
time and cost of dataset construction.
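The structure of this cycle is easy to state concretely. Below is a minimal Python sketch of the three-phase control flow; it captures only the logic described above, and every injected callable (train, predict, correct, is_human_level) is a hypothetical placeholder rather than part of the actual pipeline:

```python
from typing import Callable, List

def human_in_the_loop(
    train: Callable[[List], object],            # retrains a model on all labeled data
    predict: Callable[[object, List], List],    # model annotation of a raw batch
    correct: Callable[[List], List],            # crowdsourced + expert correction
    is_human_level: Callable[[object], bool],   # accuracy check on held-out data
    seed_labels: List,                          # phase 1: expert-annotated images
    unlabeled_batches: List[List],
) -> object:
    """Sketch of the three-phase annotation cycle described in the text."""
    labeled = list(seed_labels)
    model = train(labeled)                      # phase 1: preliminary model
    for batch in unlabeled_batches:             # phase 2: predict, correct, retrain
        if is_human_level(model):
            break                               # phase 3: run fully automated
        labeled.extend(correct(predict(model, batch)))
        model = train(labeled)
    return model
```

Because each pass through the loop both grows the training set and shrinks the per-image correction burden, annotation throughput compounds over time rather than staying constant as in fully manual labeling.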
Human-in-the-loop pipelines require specialized software that is optimized for the task and can be scalably
deployed. We therefore developed DeepCell Label^43, a browser-based graphical user interface optimized
for editing existing cell annotations in tissue images (Figure S1a, Methods). DeepCell Label is supported
by a scalable cloud backend that dynamically adjusts the number of servers according to demand (Figure
S1b). Using DeepCell Label, we trained annotators from multiple crowdsourcing platforms to identify
whole-cell and nuclear boundaries. To further simplify our annotation workflow, we integrated DeepCell
Label into a pipeline that allowed us to prepare and submit images for annotation, have users annotate those
images, and download the results. The images and resulting labels were used to train and update our model,
completing the loop (Figure S1c; Methods).
Our goal in creating TissueNet was to use it to power general-purpose tissue segmentation models. To ensure that models trained on TissueNet would serve as much of the imaging community as possible, we made two key choices. First, every image in TissueNet contains two channels: a nuclear channel (such as DAPI) and a membrane or cytoplasm channel (such as E-cadherin or Pan-Keratin). Although some highly multiplexed platforms are capable of imaging dozens of markers at once^1,2,4,6, restricting TissueNet to the minimum number of channels necessary for whole-cell segmentation maximizes the number of imaging platforms where the resulting models can be used. Second, the data in TissueNet are derived from a wide variety of tissue types, disease states, and imaging platforms. This diversity allows models trained on TissueNet to handle data from many different experimental setups and biological questions. The images included in TissueNet were acquired from the published and unpublished works of labs that routinely perform tissue imaging^44–51. Thus, while this first release of TissueNet encompasses the tissue types most commonly analyzed by the community, we expect that subsequent versions will be expanded to include less-studied organs.
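As a concrete illustration of this two-channel convention, the sketch below assembles a Mesmer-ready input from any platform's output. It follows the deepcell.applications interface of the released DeepCell library as we understand it; the file names are placeholders for your own data, and argument names such as image_mpp and compartment should be verified against the current documentation:

```python
import numpy as np
from skimage.io import imread
from deepcell.applications import Mesmer  # released DeepCell library

# Two single-channel images from any platform, e.g. DAPI plus E-cadherin,
# each of shape (height, width). File names here are placeholders.
nuclear = imread("dapi.tif")
membrane = imread("ecadherin.tif")

# Mesmer expects a (batch, height, width, 2) array:
# channel 0 = nuclear, channel 1 = membrane/cytoplasm.
image = np.stack([nuclear, membrane], axis=-1)[np.newaxis, ...]

app = Mesmer()
# image_mpp: image resolution in microns per pixel;
# compartment: "whole-cell", "nuclear", or "both".
labels = app.predict(image, image_mpp=0.5, compartment="whole-cell")
```

Reducing every platform to this lowest common denominator is what lets a single trained model serve CODEX, MIBI-TOF, Vectra, and the other platforms in TissueNet without per-platform retraining.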
Figure 1: A human-in-the-loop approach enables scalable, pixel-level annotation of large image collections. a, This approach has three phases. During phase 1, annotations are created from scratch to train a model. During phase 2, new data are fed through a preliminary model to generate predictions. These predictions are used as a starting point for annotators to correct. As more images are corrected, the model improves, which decreases the number of errors, increasing the speed with which new data can be annotated. During phase 3, an accurate model is run without human correction. b, TissueNet has more nuclear and whole-cell annotations than all previously published datasets. c, The number of cell annotations per platform in TissueNet. d, The number of cell annotations per tissue type in TissueNet. e, The number of hours of annotation time required to create TissueNet.

[Figure 1 panel content: a, phase 1–3 workflow (expert annotation → model annotation → crowdsourced correction → expert correction → retrain and update → fully automated final model); b, nuclear and whole-cell annotation counts for TissueNet versus all previously published datasets; c, annotations per platform (CODEX, CyCIF, Vectra, MIBI-TOF, MxIF, IMC); d, annotations per tissue type (pancreas, tonsil, breast, lung, colon, esophagus, lymph node, skin, spleen); e, annotation hours, crowd versus expert.]
As a result of the scalability of our human-in-the-loop approach to data labeling, TissueNet is larger than the sum total of all previously published datasets^26,27,32–38 (Figure 1b), with 1.3 million whole-cell annotations and 1.2 million nuclear annotations. TissueNet contains data from six imaging platforms (Figure 1c) and nine organs (Figure 1d), and includes both histologically normal and diseased tissue (e.g., tumor resections). TissueNet also encompasses three species, with images from human, mouse, and macaque. Constructing TissueNet required >4,000 person-hours, the equivalent of nearly 2 person-years of full-time effort (Figure 1e). At an average rate of $6 per hour, we anticipate that subsequent datasets of this size will cost around USD $25,000 to produce, a significant reduction compared with using highly trained ($30/h) or expert pathologist (>$150/h) annotators.
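For reference, the arithmetic behind that estimate, assuming all hours are billed at the crowdsourced rate:

>4,000 h × $6/h > $24,000 ≈ USD $25,000 per dataset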
Mesmer is a novel algorithm for accurate whole-cell segmentation of tissue data
An ideal deep learning model for cell segmentation has two specific requirements. First, a suitable model
must be accurate, which is challenging given the range of cell morphologies, tissue types, and imaging platforms present in TissueNet. A model capable of accurately performing whole-cell segmentation in this
setting needs sufficient representational capacity to understand and interpret these heterogeneous images.
Second, a suitable model needs to be fast. Image datasets are increasing rapidly in size, and a model with
high performance but poor inference speed would be of limited utility.
To satisfy these requirements, we developed the PanopticNet deep learning architecture. To ensure adequate model capacity, PanopticNets use a ResNet50 backbone coupled to a modified Feature Pyramid Network (FPN)^52–54 (Figure S2a; Methods). ResNet backbones are a popular architecture for extracting features from imaging data for a variety of tasks^54. FPNs aggregate features across length scales, producing representations that contain both low-level details and high-level semantics^52. To perform segmentation, two semantic heads are attached to the highest level of the FPN to create pixel-level predictions. These heads perform two separate prediction tasks. The first head predicts whether a pixel is inside a cell, at the cell boundary, or part of the image background^25,26. The second head predicts the distance of each pixel within a cell to the cell centroid (Figure S2a; Methods); we extended previous work^30,55 by explicitly accounting for cell size in this step.
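To make the head design concrete, here is a deliberately simplified Keras sketch. It is not the released PanopticNet: it uses a stock ResNet50 and a single bilinear upsampling path in place of the full FPN, and builds only two heads (the released model attaches four, two per compartment):

```python
import tensorflow as tf
from tensorflow.keras import layers

def panopticnet_sketch(input_shape=(256, 256, 2)):
    # Backbone: stock ResNet50 as the feature extractor. The paper couples
    # it to a full Feature Pyramid Network; a single upsampling path is
    # used here for brevity.
    inputs = layers.Input(shape=input_shape)
    backbone = tf.keras.applications.ResNet50(
        include_top=False, weights=None, input_tensor=inputs)
    features = backbone.output                   # (8, 8, 2048) for 256x256 input
    x = layers.Conv2D(256, 1, activation="relu")(features)
    x = layers.UpSampling2D(32, interpolation="bilinear")(x)  # back to input size

    # Head 1: per-pixel 3-class prediction (interior / boundary / background).
    pixelwise = layers.Conv2D(3, 1, activation="softmax", name="pixelwise")(x)
    # Head 2: per-pixel regression of the distance to the owning cell's centroid.
    inner_distance = layers.Conv2D(1, 1, activation="relu", name="inner_distance")(x)

    return tf.keras.Model(inputs, [pixelwise, inner_distance])
```

The two outputs play complementary roles downstream: the distance head supplies one seed per cell, while the pixelwise head defines cell extent and boundaries.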
We used the PanopticNet architecture and TissueNet to create Mesmer, a deep learning pipeline for accurate nuclear and whole-cell segmentation of tissue data. Mesmer's PanopticNet model contains four semantic heads (two for nuclear segmentation and two for whole-cell segmentation) that are attached to a common backbone and FPN. The input to Mesmer is a nuclear image (e.g., DAPI) to define the nucleus of each cell and a membrane or cytoplasm image (e.g., CD45 or E-cadherin) to define the shape of each cell (Figure 2a). These inputs are normalized^56 (to improve robustness), tiled into patches of fixed size (to allow processing of images with arbitrary dimensions), and then fed to the PanopticNet model. The model outputs are then untiled^57 to produce predictions for the centroid and boundary of every nucleus and cell in the image. The centroid and boundary predictions are used as inputs to a watershed algorithm^58 to create the final instance segmentation mask for each nucleus and each cell in the image (Methods).
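The final watershed step can be illustrated with scikit-image. This is a generic marker-based watershed following the logic just described, not the tuned post-processing from the Methods; the thresholds and min_distance below are illustrative placeholders:

```python
import numpy as np
from skimage.feature import peak_local_max
from skimage.segmentation import watershed

def predictions_to_mask(inner_distance, pixelwise_interior,
                        marker_threshold=0.1, interior_threshold=0.3):
    """Turn Mesmer-style pixel predictions into an instance segmentation mask.

    inner_distance:     (H, W) centroid-distance prediction; peaks mark cell centers.
    pixelwise_interior: (H, W) probability that a pixel lies inside a cell.
    """
    # Markers: one integer-labeled seed per local maximum of the distance map.
    peaks = peak_local_max(inner_distance, min_distance=5,
                           threshold_abs=marker_threshold)
    markers = np.zeros(inner_distance.shape, dtype=np.int32)
    markers[tuple(peaks.T)] = np.arange(1, len(peaks) + 1)

    # Flood from the markers over the inverted interior probability,
    # restricted to pixels confidently inside some cell.
    foreground = pixelwise_interior > interior_threshold
    return watershed(-pixelwise_interior, markers, mask=foreground)
```

Seeding from the distance-map peaks is what separates touching cells: each cell contributes its own basin, so adjacent cells with little boundary contrast are still split at the watershed line between their seeds.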
We used the newly created TissueNet dataset to train Mesmer's model. We randomly partitioned TissueNet into training (80%), validation (10%), and testing (10%) splits. The training split was used to directly update the model weights during training, with the validation split used to assess increases in model accuracy after each epoch. The test split was completely held out during training and used only to evaluate model performance after training. We used standard image augmentation during training to increase model robustness. To benchmark model accuracy, we built on our prior framework for classifying segmentation errors^37. In brief, we perform a linear assignment between predicted cells and ground-truth cells. Cells that map 1-to-1 with a ground-truth cell are marked as accurately segmented; all other cells are assigned to one of several error modes depending on their relationship with the ground-truth data. We use these assignments to calculate precision, recall, F1 score, and Jaccard index; see the Methods section for detailed descriptions.
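The core of this benchmarking procedure, matching predicted to ground-truth cells by intersection-over-union (IOU) and scoring the 1-to-1 matches, can be sketched as follows. This simplified version uses a single illustrative IOU threshold and scipy's Hungarian solver for the linear assignment; it omits the error-mode classification described in the text:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def matching_f1(true_mask, pred_mask, iou_threshold=0.4):
    """F1 score over 1-to-1 matches between ground-truth and predicted cells.

    true_mask, pred_mask: 2D integer label images where 0 is background and
    cells are labeled 1..N (consecutive labels assumed for simplicity).
    """
    n_true, n_pred = int(true_mask.max()), int(pred_mask.max())
    iou = np.zeros((n_true, n_pred))
    for t in range(1, n_true + 1):
        t_pixels = true_mask == t
        # Only predicted cells that overlap this ground-truth cell can match it.
        for p in np.unique(pred_mask[t_pixels]):
            if p == 0:
                continue
            p_pixels = pred_mask == p
            iou[t - 1, p - 1] = (t_pixels & p_pixels).sum() / (t_pixels | p_pixels).sum()

    # Linear assignment that maximizes total IOU, then keep confident matches.
    rows, cols = linear_sum_assignment(-iou)
    true_positives = int((iou[rows, cols] > iou_threshold).sum())
    precision = true_positives / max(n_pred, 1)
    recall = true_positives / max(n_true, 1)
    return 2 * precision * recall / max(precision + recall, 1e-9)
```

Because the assignment is globally optimal rather than greedy, a predicted cell that weakly overlaps two ground-truth cells cannot be double-counted, which keeps the error-mode bookkeeping (splits, merges, spurious and missed cells) well defined.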