LABKIT: Labeling and Segmentation Toolkit for Big Image Data

Matthias Arzt^1,2, Joran Deschamps^1,2,3, Christopher Schmied^3, Tobias Pietzsch^1,2, Deborah Schmidt^1,2,4, Robert Haase^1,2,5, and Florian Jug^1,2,3

1 Center for Systems Biology Dresden, Dresden, Germany
2 Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
3 Fondazione Human Technopole, Milan, Italy
4 Max Delbrück Center for Molecular Medicine, Berlin, Germany
5 DFG Cluster of Excellence "Physics of Life", TU Dresden, Dresden, Germany
We present LABKIT, a user-friendly Fiji plugin for the segmentation of microscopy image data. It offers easy-to-use manual
and automated image segmentation routines that can be rapidly
applied to single- and multi-channel images as well as to time-
lapse movies in 2D or 3D. LABKIT is specifically designed to
work efficiently on big image data and enables users of con-
sumer laptops to conveniently work with multiple-terabyte im-
ages. This efficiency is achieved by using ImgLib2 and Big-
DataViewer as the foundation of our software. Furthermore,
a memory-efficient and fast random-forest-based pixel classification inspired by the Waikato Environment for Knowledge Analysis (Weka) is implemented. Optionally, we harness the power of graphics processing units (GPUs) to gain additional runtime performance. LABKIT is easy to install on virtually all laptops and
workstations. Additionally, LABKIT is compatible with high
performance computing (HPC) clusters for distributed process-
ing of big image data. The ability to use pixel classifiers trained
in LABKIT via the ImageJ macro language enables our users
to integrate this functionality as a processing step in automated
image processing workflows. Last but not least, LABKIT comes
with rich online resources such as tutorials and examples that
will help users to familiarize themselves with available features
and how to best use LABKIT in a number of practical real-world
use-cases.
segmentation | labeling | machine learning | random forest | Fiji | open-source
Correspondence: florian.jug@fht.org
Introduction
In recent years, new and powerful microscopy and sam-
ple preparation techniques have emerged, such as light-
sheet (1), super-resolution microscopy (2–6), modern tis-
sue clearing (7, 8), or serial section scanning electron mi-
croscopy (9, 10), enabling researchers to observe biological tissues and their underlying cellular and molecular composition and dynamics in unprecedented detail. To localize ob-
jects of interest and exploit such rich datasets quantitatively,
scientists need to perform image segmentation, e.g. dividing
all pixels in an image into foreground pixels (part of objects
of interest) and background pixels.
The result of such a pixel classification is a binary mask,
or a (multi-)label image if more than one foreground class is needed to discriminate different objects. Masks or label images enable downstream analyses that extract biolog-
ically meaningful semantic quantities, such as the number
of objects in the data, morphological properties of these ob-
jects (shape, size, etc.), or tracks of object movements over
time. In most practical applications, image segmentation is
not an easy task to solve. It is often rendered difficult by the
sample’s biological variability, imperfect imaging conditions
(e.g. leading to noise, blur, or other distortions), or simply
by the complicated three-dimensional shape of the objects of
interest.
Current research in bio-image segmentation focuses primar-
ily on developing new deep learning approaches, with more
classical methods currently receiving little attention. Algorithms such as StarDist (11), DenoiSeg (12), PatchPerPix (13), PlantSeg (14), CellPose (15), or EmbedSeg (16) have continuously raised the state of the art and outperform classical methods in the quality and accuracy of the achieved auto-
mated segmentation. While these approaches are very pow-
erful indeed, deep learning does require some expert knowl-
edge, dedicated computational resources not everybody has
access to, and typically large quantities of densely annotated
ground-truth data to train on.
More classical approaches, on the other hand, can also yield
results that enable the required analysis, while often remain-
ing fast and easy to use on any laptop or workstation. Exam-
ples for such methods range from intensity thresholding and
seeded watershed, to shallow machine learning approaches
on manually chosen or designed features. One crucial prop-
erty of shallow techniques, such as random forests (17), is
that they require orders of magnitude less ground-truth train-
ing data than deep learning based methods. Hence, mul-
tiple software tools pair them with user-friendly interfaces,
e.g. CellProfiler (18), Ilastik (19), QuPath (20), and Train-
able Weka Segmentation (21). The latter specializes in ran-
dom forest classification and is available within Fiji (22), a
widely-used image analysis and processing platform based
on ImageJ (23) and ImageJ2 (24). It is, regrettably, not capa-
ble of processing very large datasets due to its excessive de-
mand for CPU memory, leaving the sizable Fiji community
with a lack of user-friendly pixel classification or segmenta-
tion tools that can operate on large multi-dimensional data.
The required foundations for such a software tool have in re-
cent years been built by the vibrant research software engi-
neering community around Fiji and ImageJ2. The problem of
handling large multi-dimensional images has been addressed
by a generic and powerful library called ImgLib2 (25). Ad-
ditionally, a fast, memory-efficient, and extensible image
viewer, the BigDataViewer (26), enables tool developers to
create intuitive and fast data handling interfaces.
Here, we present an image labeling and segmentation tool
called LABKIT. It combines the power of ImgLib2 and Big-
DataViewer with a new implementation of random forest
pixel classification. LABKIT features a user-friendly inter-
face allowing for rapid scribble labeling, training, and inter-
active curation of the segmented image. LABKIT also allows
users to fully manually label pixels or voxels in the loaded
images. It can be easily installed in Fiji via its updater, and
it can directly be called from Fiji’s macro programming lan-
guage. LABKIT additionally features GPU acceleration using
CLIJ (27), and can be used on high performance computing
(HPC) clusters thanks to a command-line interface.
Image Segmentation with LABKIT
LABKIT's user interface is built around the Big-
DataViewer (26), which allows interactive exploration
of image volumes of any size and dimension on consumer
computing hardware (Fig. 1A, B). Beyond the common
BigDataViewer features, users have access to a set of simple
drawing tools to manually paint or correct existing labels on
image pixels in 2D and voxels in 3D. Importantly, the raw
data is never modified by any such actions. Pixel and voxel
labels are grouped by classes in individual layers. Each class
is represented by a modifiable color, and can be used to
annotate different types of objects and structures of interest
in the image.
Thanks to the intuitive interface design, users can efficiently
segment their images by manually drawing dense labels on
the entire image (Fig. 1C). Labels that are generated with the
drawing tools can directly be saved as images or exported
to Fiji for downstream processing. Dense manual labelings
of complete images or volumes created with LABKIT can be
used to manually segment objects, as was done previously
to mask particles in cryo-electron tomograms of Chlamy-
domonas (28).
However, this process is very time-consuming and does not
scale well to large data. LABKIT is therefore often used
to densely and manually label a subset of the image data,
which is then used as ground-truth for supervised deep learn-
ing approaches. Published examples include the generation
of ground-truth training data for a mouse and a Platynereis
dataset in order to segment cell nuclei with EmbedSeg (16).
LABKIT is also suggested as a tool of choice for ground-truth
generation by other deep learning methods (11, 12, 29). Still,
manually generating a sufficient amount of ground-truth train-
ing labels for existing deep learning methods remains a cum-
bersome and tedious task.
In order to create high-quality segmentations while maintaining a manageable amount of user input, a core feature of LABKIT is a random forest pixel classification (17) inspired by Weka (30, 31), newly implemented and optimized
for speed. When using this feature, instead of annotating en-
tire objects, a random forest is trained on only a few pixel
annotations per class. Such sparse manual labels, or scrib-
bles (see Fig. 1D, left), are directly drawn by users over the
image. Naturally, the sparse labels must be drawn on cor-
rect and representative pixels from each pixel class, and are
then used to train the shallow random forest classifier. Once
trained, this classifier can then be used to generate a segmen-
tation (dense pixel classification, see Fig. 1D).
Two or more classes can be used to distinguish foreground
objects from background pixels. Fig. 2A & B showcase examples with a single foreground and a single background class. If desired, out-of-focus objects can even be discarded, for example
by making such pixels part of the background class (Fig. 2B,
arrowheads). For more complex segmentation tasks that need
to discriminate various visible structures (e.g. nucleus vs. cy-
toplasm vs. background) or cell types (as in Fig. 2C), two or
more foreground classes can be used (Fig. 2D).
As opposed to deep learning algorithms, random forests are
typically trained in a matter of seconds. Drawing scribbles
and computing the segmentation can therefore conveniently
be iterated due to the efficient parallelization we have imple-
mented, leading to live segmentation. Live results are com-
puted and displayed only on the currently visualized image
slice in BigDataViewer to increase the interactivity. Hence,
the effect of additional scribbles (sparse labels) is instantly
visible and users can stop once the automated output of the
pixel classifier reaches a similar quality to that of a fully man-
ual pixel annotation. This iterative workflow makes working
with LABKIT very efficient, even when truly large image data
are being processed. BigDataViewer’s bookmarking feature
can additionally be used to quickly jump between previously
defined image regions, thereby allowing validating the qual-
ity of the pixel classifier on multiple areas. Since we use
ImgLib2's caching infrastructure, all image blocks that have
once been computed are kept in memory and switching be-
tween bookmarks or browsing between parts of a huge vol-
ume is fast and visually pleasing. Once sufficiently trained,
the classifier can be saved for later use in interactive LABKIT
sessions or in Fiji/ImageJ macros. The entire dataset can be
directly segmented and the results saved to disk. Re-
cently, sparse labeling combined with random forest pixel
classification in LABKIT was used to segment mouse epidermal cells (32), as well as mRNA foci in neurons (33).
Once the image is fully segmented, the generated segmenta-
tion masks can be transferred to label layers and the drawing
tools can now be used to curate them. The goal of curation
is to resolve the remaining errors made by the trained pixel
classifier, such as drawing missing parts, filling holes, erasing
mislabeling and deleting spurious blobs (Fig. 3). Label cu-
ration is performed until the curated segmentation is deemed
satisfactory for downstream processing or analysis. LABKIT
can also be used to curate segmentation results obtained by
other methods that are not available within LABKIT, includ-
ing deep learning based methods (34).
Automated segmentation with LABKIT and the possibility
to quickly curate any automated segmentation result make
LABKIT a powerful tool that can considerably shorten the
time required to generate ground-truth data for training deep
learning approaches. For example, we compared automatic
and manual segmentation with LABKIT on a rather small sub-
set of images (N=26, see one example in Fig. 4A) made pub-
licly available by the 2018 Data Science Bowl (35). We seg-
mented all images within 5 minutes by iterative scribbling
and automated segmentation (see Fig. 4B). While many im-
ages consisted of homogeneous nuclei and led to high-quality results, images with heterogeneous nuclei resulted in seg-
mentation errors (see arrows in Fig. 4B). Such errors include
spurious instances that do not correlate with any object in
the original image, instances that correspond to the fusion
of multiple instances, instances with holes, or even instances
that split in two. Such errors are obviously undesirable and
negatively impact the overall average precision score (AP =
0.72, see Methods for the metrics definition). As described
above, such segmentation errors can easily be corrected within LABKIT, either by adding sparse labels in typical error-prone areas during the iterative training process or, when errors persist, by manually curating them in the final automated results (Fig. 4C). Curating all
26 images took an additional 10 minutes and raised the cor-
responding average precision to 0.76, a score very close to
the inter-observer distance (AP = 0.78), as shown in Fig. 4C
& D. In contrast, manually segmenting all images required
more than an hour (Fig. 4D), which is four times longer than
scribble-based pixel classification with LABKIT, followed by
full curation of the results to obtain images of comparable
quality.
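In numbers, the comparison reads (a simple check of the factor quoted above):

    5 min (scribbles + automated segmentation) + 10 min (curation) = 15 min
    vs. > 60 min (fully manual labeling)  =>  roughly a factor of 4.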
Hence, whenever LABKIT automated segmentation is by
itself not sufficient, manually curating the results yields
ground-truth data that can be used to train a deep learning
method, leading to higher segmentation quality with less la-
beling effort.
Software and workflow integration
LABKIT's automatic segmentation is not limited to the
dataset it was trained on. Because the trained classifier can
be saved for later use, it can be applied to new images. While
ensuring reproducibility of the results, this also helps maintain consistency in the image segmentation. Manually loading both images and a trained classifier in LABKIT for multiple sets of images is a repetitive task ill-suited for automated workflows. Therefore, to simplify the integration into exist-
ing workflows in Fiji, LABKIT can be easily called from the
ImageJ macro language. For instance, a simple macro script
can open multiple datasets and segment each of them using a
trained classifier.
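As an illustrative sketch, such a macro could look as follows. The command name "Segment Image With Labkit" and its segmenter_file/use_gpu parameters follow the Labkit documentation but should be treated as assumptions that may differ between versions; all paths are placeholders:

    // Batch-segment every TIFF in a folder with a previously trained
    // and saved Labkit pixel classifier (hypothetical paths).
    input = "/data/images/";
    files = getFileList(input);
    for (i = 0; i < files.length; i++) {
        if (endsWith(files[i], ".tif")) {
            open(input + files[i]);
            // Apply the saved classifier to the currently open image (CPU mode).
            run("Segment Image With Labkit", "segmenter_file=/data/nuclei.classifier use_gpu=false");
            saveAs("Tiff", input + "segmentation_" + files[i]);
            close("*");
        }
    }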
Image segmentation can be further accelerated by running
the process on GPUs thanks to CLIJ (27). Once CLIJ is properly set up, GPU acceleration is available for LABKIT in both the graphical interface and macro commands. GPU processing is
particularly beneficial for large images, where it shortens otherwise lengthy segmentation tasks. Perform-
ing GPU-accelerated segmentation in LABKIT is a matter of
activating a checkbox, and does not present additional com-
plexity to users.
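In macro code, the same (assumed) command from the sketch above would expose this checkbox as its use_gpu parameter:

    // Same call as before, but with CLIJ-based GPU acceleration enabled.
    run("Segment Image With Labkit", "segmenter_file=/data/nuclei.classifier use_gpu=true");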
Some images, however, are far too large to be processed on
a consumer machine in a reasonable amount of time, if they
can be stored at all on such a computer. For such data, mod-
ern workflows resort to HPC clusters, which are purpose-built for high computing performance with large amounts of available memory. LABKIT offers a command line tool (36)
allowing advanced users to segment images on HPC clusters.
The capability of extending LABKIT and re-using its com-
ponents is illustrated by its integration with the commercial
Imaris software (Oxford Instruments, UK) via the recently
released ImgLib2-Imaris compatibility bridge. In this con-
text, LABKIT operates directly on datasets that are trans-
parently shared (without duplication) between Imaris and
ImgLib2 (25). These datasets can be arbitrarily large, as
both Imaris and ImgLib2 implement sophisticated caching
schemes. In the same fashion, output segmentation masks
are transparently shared with the running Imaris application,
making additional file import/export steps unnecessary. Im-
portantly, this functionality can also be triggered and con-
trolled directly from Imaris to integrate it into streamlined
object segmentation workflows.
Performance of LABKIT
In order to process large images on consumer computers,
software packages must be able to load the data in mem-
ory, process it and save the results, all within the constraints
of the machine. In LABKIT, this is achieved by reading
only the portions of the image that are displayed to the user,
thanks to the use of the HDF5 format (37) and the Big-
DataViewer (26). The image is further processed in chunks
using a new ImgLib2 (25) implementation of the Trainable
Weka Segmentation algorithm. As a result, LABKIT is capa-
ble of processing arbitrarily large images and is compatible
with GPU acceleration and distributed computation on HPC
clusters.
To illustrate this, we segmented a 13.4 gigapixel image
(482x935x495x60 pixels, 25 GB) on a single laptop com-
puter, with and without GPU, and with different nodes of an
HPC cluster (see Table 1). The image was extracted and
2x down-sampled from the Fluo-N3DL-TRIF dataset made
available for the Cell Tracking Challenge (34, 38, 39) bench-
mark competition. Running the segmentation on the laptop
using GPU acceleration sped up the computation 7.5-fold,
illustrating the benefit of harnessing GPU power for process-
ing large images. While running computation on an HPC
cluster comes with overhead, increasing the number of CPU
nodes shortens the computation dramatically, reaching a 40-
fold improvement from 1 CPU node to 50. Finally, GPU
nodes on an HPC allow for more parallelization of the com-
putation and therefore even higher computational speed-up
on the segmentation task, with 10 GPU nodes processing the
data in slightly over a minute.
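As a quick sanity check of the quoted numbers (assuming 16-bit pixels, which the text does not state explicitly):

    482 × 935 × 495 × 60 ≈ 1.34 × 10^10 pixels ≈ 13.4 gigapixels,
    1.34 × 10^10 pixels × 2 bytes/pixel ≈ 26.8 GB ≈ 25 GiB.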
Furthermore, we trained and optimized a classifier on the
Fluo-N3DL-TRIF dataset (original sampling), the largest
dataset of the Cell Tracking Challenge (training dataset of
size 320 GB, evaluation dataset of size 467 GB), and submit-
ted it for evaluation against undisclosed ground-truth. The
segmentation of both training and evaluation datasets was
performed on an HPC cluster. LABKIT pixel classification ranked as the highest performing segmentation method on this dataset for all three evaluation metrics (OP_CSB, SEG and DET) (40). More specifically, LABKIT segmentation obtained the following scores: OP_CSB = 0.895 (0.886 for the second highest scoring entry), SEG = 0.793 (0.776) and DET = 0.997 (0.997), performing better than the other entries, including classical (bandpass segmentation) and deep learning (convolutional neural network) algorithms. As opposed to the deep learning algorithms to which it was compared, LABKIT only made use of a few hundred labeled pixels in total, distributed throughout a small fraction of the training dataset (7 frames). Finally, LABKIT's classifier was trained simply through the LABKIT graphical interface, illustrating its ease of use.
Discussion and conclusion
LABKIT is a labeling tool designed to be intuitive and sim-
ple to use. It features a robust pixel classification algorithm aimed at segmenting images into multiple classes with very little annotation required. Similar to other tools of the BigDataViewer family (26, 41–43), it integrates seamlessly into the SciJava and Fiji ecosystem. It can be easily installed through the Fiji updater and incorporated into established workflows using ImageJ's macro language. The results of LABKIT's segmentation can be further analysed in Fiji or exported to other software platforms, such as CellProfiler (18), QuPath (20) or Ilastik (19).
Manual labeling, in both 2D and 3D, is also made easy by
LABKIT. Alternatives exist, among them QuPath (20) (2D), Ilastik (19), napari (44), and Paintera (45). In particular, Paintera is specifically tailored to 3D labeling of crowded environments, but comes at the cost of a steeper learning curve.
LABKIT is compatible with a wide range of image formats
since image data can be loaded directly from Fiji using Bio-
Formats (46). Nonetheless, in order to fully benefit from
LABKIT optimizations for large images, users must first con-
vert their terabyte-sized images to a file format allowing high-speed access to arbitrarily located sub-regions of the image. This strategy is also employed by other software, such as Ilastik (19). One such format is HDF5 (37), and
LABKIT in particular uses the BigDataViewer XML+HDF5 version. In Fiji, images can easily be saved in this format
using BigStitcher (42) or Multiview-Reconstruction (47, 48).
In the Cell Tracking Challenge (39, 40), LABKIT segmenta-
tion outperformed the other entries on a particular dataset, among them two deep learning approaches. These methods were
designed as part of a cell segmentation and tracking pipeline
on various images, and it is likely that recent and more spe-
cialized segmentation algorithms, such as StarDist (11) or
CellPose (15), would perform overall better. Yet, the full
potential of deep learning algorithms is only reached when
a sufficient amount of ground-truth data is available, which
is too frequently the limiting factor. Generating ground-truth
data for a deep learning method is a tedious endeavour without any assurance of a perfect segmentation result. A safer
strategy is therefore to first try shallow learning for segmen-
tation tasks, before even thinking of moving to deep learn-
ing algorithms. In cases where higher segmentation quality
is truly necessary, curated results from shallow learning can
be used to generate the massive amount of ground-truth re-
quired to train a deep learning algorithm. As seen previously,
LABKIT is useful in all these scenarios since it can be used
to manually generate ground-truth annotations or to segment
the images with shallow learning before curating the results
in order to use them as ground-truth for other learning-based
algorithms (see Fig. 5).
In the future, we intend to extend LABKIT's functionalities to
improve manual and automated segmentation. For instance,
we will add a magic wand tool to select, fill, fuse or delete la-
bels based on the pixel classification. Furthermore, we aim to
add new pixel classifiers, such as the deep learning algorithm DenoiSeg (12), which is already available in Fiji. The LABKIT source code is open source and can be found online (49), together with its
command-line interface (36) and tutorials (50).
Methods
A. Timing instance segmentation generation. The
dataset consisted of all 256x256 images (N=26) in the test
sample of StarDist (11), originally published as part of the
2018 Data Science Bowl (35) (subset of stage1_train, acces-
sion number BBBC038, Broad Bioimage Benchmark Collec-
tion). The images were loaded in LABKIT as a stack and
sparsely labeled (scribbles). A classifier was then trained
with the default filter settings: "original image", "Gaussian
blur", "difference of Gaussians", "Gaussian gradient magni-
tude", "Laplacian of Gaussian" and "Hessian eigenvalues",
with sigmas: 1, 2, 4 and 8. The results were saved and
then manually curated using the brush and eraser tools. Fi-
nally, the same original image stack was densely manually
labeled afresh. The total time required to process all im-
ages was measured using a chronometer for i) LABKIT au-
tomated segmentation, including the sparse manual labeling,
ii) the previous step followed by a curation step and iii) dense
manual labeling. In order to evaluate the segmented images,
connected components were computed (4-connectivity) and
given unique pixel values (instance segmentation). Quality
metrics scores were calculated as the average precision with
threshold 0.5 as defined in StarDist (11). We used dense man-
ual labeling performed by another observer as reference im-
ages, and computed the metrics score for the results obtained
in i), ii) and iii). The average metric over the images was calculated as a weighted average over the individual images, where each weight was the number of instances in the corresponding reference image.
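For reference, the average precision at matching threshold τ, as defined in StarDist (11), counts a predicted instance as a true positive (TP) if it matches a reference instance with an intersection over union of at least τ; unmatched predictions are false positives (FP) and unmatched reference instances are false negatives (FN):

    AP_τ = TP_τ / (TP_τ + FP_τ + FN_τ),   here with τ = 0.5,

and the reported score is the instance-weighted mean over the N = 26 images,

    AP = ( Σ_k n_k · AP_τ,k ) / ( Σ_k n_k ),

where n_k is the number of instances in the k-th reference image.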
B. Speed benchmark. The dataset was downloaded from
the Cell Tracking Challenge (39) website, and consisted of
the first training dataset of the Fluo-N3DL-TRIF example.
The dataset was down-sampled by a factor of 2 in order to re-
duce its size and simplify the benchmarking. The dataset was
then saved in the BigDataViewer XML+HDF5 format using
BigStitcher (42). LABKIT was used to draw a few scribbles
on both background and nuclei areas, and to train a random
forest classifier using the default settings. The trained model
was then saved. The LABKIT command line tool was used to
run the benchmark experiment on a Dell XPS 15 laptop (32 GB RAM, Intel Core i7-6700HQ CPU with 8 cores, GeForce
GTX 960M GPU) and on an HPC cluster, with both CPU
(256 GB RAM, Intel Xeon CPU E5-2680 v3 with 2.5 GHz
and 24 cores) and GPU (512 GB RAM, Intel Xeon CPU
E5-2698 v4 with 2.2 GHz and 40 cores, with two GeForce
GTX 1080 GPUs) nodes. The segmentation results on the
HPC were saved in the N5 format to maximize writing speed.
Benchmarking included reading/writing the image data from/to disk, optional data transfer to the GPU, computation of the feature images, and the classification itself, all together.
C. Cell tracking challenge. As in the speed benchmark, all Fluo-N3DL-TRIF datasets (training and evalua-
tion) were converted to BigDataViewer XML+HDF5 format
using the BigStitcher Fiji plugin. This time, however, no
down-sampling was applied to the images. For training, only
frames 0, 1, 10, 20, 40, 50 and 59 from sequence “01” of the
training dataset were used. A few hundred pixels were an-
notated as foreground and background. Only the central pixels of the nuclei were labeled as foreground in order to force the classification algorithm to return segments smaller than the actual nuclei. Thus, segmented nuclei are unlikely to touch
and segmentation errors are minimized. We used the follow-
ing filters to train the random forest classifier: "original im-
age", "Gaussian blur", "Laplacian of Gaussian", and "Hes-
sian eigenvalues", with sigma values 1, 2, 4, 8 and 16. The
filters can be set in LABKIT's interface through the parame-
ters menu of the classifier. The trained classifier was saved
and the evaluation dataset was segmented using the LABKIT
command line tool on an HPC. Since the output of the pixel
classification is a binary mask, we performed a connected
component analysis to assign unique pixel values to the indi-
vidual segments. Finally, we dilated the segments to match
the size of the nuclei. The dilation was done in three steps:
the first two steps with a three-dimensional 6-neighborhood
dilation kernel, then with a 3x3x3 pixel cube kernel. The
combination of dilation kernels was chosen so as to optimize the
SEG score on the training dataset. All metrics scores were
computed by the Cell Tracking Challenge platform.
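As a side note on the three dilation steps described above: successive dilations compose via the Minkowski sum of their structuring elements (δ_B(δ_A(X)) = δ_(A⊕B)(X)), so the procedure is equivalent to a single dilation with the composite element

    S = N_6 ⊕ N_6 ⊕ C_3x3x3,

i.e. the city-block ball of radius 2 dilated by a 3×3×3 cube, which spans 7 voxels along each axis.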
Conflict of Interest Statement
The authors declare that the research was conducted in the ab-
sence of any commercial or financial relationships that could
be construed as a potential conflict of interest.
Author Contributions
F.J., M.A., T.P. and D.S. designed the project; M.A. imple-
mented the software with help from T.P., D.S. and R.H.; M.A.
and J.D. performed experiments; M.A., J.D., C.S. and F.J.
wrote the manuscript with inputs from all authors.
Funding
Funding was provided by the Max-Planck Society un-
der project code M.IF.A.MOZG8106, the core budget of the
Max-Planck Institute of Molecular Cell Biology and Genet-
ics (MPI-CBG), the Human Technopole, and the BMBF un-
der codes 031L0102 (de.NBI) and 01IS18026C (ScaDS2), as
well as by the Deutsche Forschungsgemeinschaft (DFG) un-
der code JU3110/1-1 (FiSS). R.H. acknowledges support by
the DFG under Germany’s Excellence Strategy EXC2068 -
Cluster of Excellence Physics of Life of TU Dresden.
Data Availability Statement
Publicly available datasets were analyzed in this study. This data can be found here: https://bbbc.broadinstitute.org/BBBC038 (accession number BBBC038, Broad Bioimage Benchmark Collection) and http://celltrackingchallenge.net/3d-datasets/ (Fluo-N3DL-TRIF dataset, Cell Tracking Challenge).
ACKNOWLEDGEMENTS
We thank Anne Wuttke (Zerial lab, MPI-CBG), Sascha Kuhn (Nadler lab, MPI-CBG), Maria Luisa Romero Romero (Toth-Petroczy lab, MPI-CBG), and Akanksha Jain & Anastasios (Tassos) Pavlopoulos (Tomancak lab, MPI-CBG) for sharing the experimental data. We also want to thank the Scientific Computing Facility at MPI-CBG for giving us access to HPC infrastructure.