IEEE TRANSACTIONS ON MEDICAL IMAGING, REVISED SUBMISSION, SEPT 2011 1
Supervoxel-Based Segmentation of Mitochondria in
EM Image Stacks with Learned Shape Features
Aurélien Lucchi, Kevin Smith, Radhakrishna Achanta, Graham Knott, and Pascal Fua
Abstract—It is becoming increasingly clear that mitochondria
play an important role in neural function. Recent studies show
mitochondrial morphology to be crucial to cellular physiology
and synaptic function and a link between mitochondrial defects
and neuro-degenerative diseases is strongly suspected. Electron
microscopy (EM), with its very high resolution in all three directions, is
one of the key tools to look more closely into these issues but
the huge amounts of data it produces make automated analysis
necessary.
State-of-the-art computer vision algorithms designed to oper-
ate on natural 2D images tend to perform poorly when applied
to EM data for a number of reasons. First, the sheer size of a
typical EM volume renders most modern segmentation schemes
intractable. Furthermore, most approaches ignore important
shape cues, relying only on local statistics that easily become
confused when confronted with noise and textures inherent in
the data. Finally, the conventional assumption that strong image
gradients always correspond to object boundaries is violated by
the clutter of distracting membranes.
In this work, we propose an automated graph partitioning
scheme that addresses these issues. It reduces the computational
complexity by operating on supervoxels instead of voxels, incor-
porates shape features capable of describing the 3D shape of the
target objects, and learns to recognize the distinctive appearance
of true boundaries.
Our experiments demonstrate that our approach is able to
segment mitochondria at a performance level close to that
of a human annotator, and outperforms a state-of-the-art 3D
segmentation technique.
Index Terms—Electron microscopy, segmentation, supervoxels,
mitochondria, shape features.
I. INTRODUCTION
In addition to providing energy to the cell, mitochondria
play an important role in many essential cellular func-
tions including signaling, differentiation, growth and death.
An increasing body of research suggests that regulation of
mitochondrial shape is crucial for cellular physiology [10].
Furthermore, localization and morphology of mitochondria
have been tightly linked to neural functionality. For example,
pre- and post-synaptic presence of mitochondria is known to
have an important role in synaptic function [34].
Mounting evidence also indicates that there is a close link
between mitochondrial function and many neuro-degenerative
A. Lucchi and K. Smith contributed equally to this work.
A. Lucchi, R. Achanta, K. Smith, and P. Fua are in the Computer,
Communication, and Information Sciences Department; G. Knott is with the
Interdisciplinary Center for Electron Microscopy, EPFL, Lausanne CH-1015
Switzerland. E-mail: firstname.lastname@epfl.ch.
Manuscript received December 23, 2010. Revised May 24, 2011.
Copyright (c) 2010 IEEE. Personal use of this material is permitted.
However, permission to use this material for any other purposes must be
obtained from the IEEE by sending a request to pubs-permissions@ieee.org.
diseases. Mutations in genes that control fusion and divi-
sion events have been found to cause neurodegenerative pro-
cesses [26]. For example, mutations of the gene coding for
a protein kinase called PINK1, which is known to regulate
mitochondrial division, have been linked to a type of early-
onset Parkinson’s disease [46].
Unfortunately, because mitochondria range from less than
0.5 to 10 µm in diameter [9], optical microscopy does not
provide sufficient resolution to reveal fine structures that are
critical to unlocking new insights into brain function. Recent
Electron Microscopy (EM) advances, however, have made it
possible to acquire much higher resolution images, and have
already provided new insights into mitochondrial structure and
function [39]. The data used in this work were acquired by a
focused ion beam scanning electron microscope (FIB-SEM,
Zeiss NVision40), which uses a focused beam of gallium
ions to mill the surface of a sample and an electron beam
to image the milled face [27]. The milling process removes
approximately 5nm of the surface, while the scanning beam
produces images with a pixel size of 5 × 5nm. Repeated
milling and imaging yielded nearly isotropic image stacks
containing billions of voxels, such as the ones appearing in
Figure 1.
Analyzing such an image stack by hand could require
months of tedious manual labor [40] and, without reliable
automated image-segmentation tools, much of this high quality
data would go unused. This situation arises in part from
the fact that most state-of-the-art EM segmentation algo-
rithms [25], [42] were designed for highly anisotropic EM
modalities, such as Transmission Electron Microscopy (TEM).
Such data tends to have a greatly reduced resolution in
the z-direction, and associated segmentation algorithms often
process slices individually to deal with the missing data.
Our approach processes large 3D volumes in a single step,
which is advantageous for isotropic FIB-SEM stacks. More
generic Computer Vision algorithms that perform well on
natural image benchmarking data sets such as the Pascal VOC
(Visual Object Classes) data set [13] perform poorly on EM
data, whether it is isotropic or not. There are several reasons
for this. The amount of data in a typical EM stack is a
major bottleneck, rendering these approaches intractable both
in terms of memory and computation time. Furthermore, these
approaches rarely account for important shape cues and often
rely only on local statistics which can easily become confused
when confronted with the noise and textures found in EM
data. Finally, the conventional assumption that strong image
gradients always correspond to significant boundaries does not
hold, as illustrated in Figure 1.
To overcome these limitations, we advocate a graph parti-

[Figure 1: CA1 Hippocampus stack (5 × 5 × 5 µm sample, 1024 × 1024 × 1000 voxels, 5 × 5 × 5 nm per voxel) and Striatum stack (9 × 5 × 2.5 µm sample, 1536 × 872 × 318 voxels, 6 × 6 × 7.8 nm per voxel), with training and testing sections marked.]
Fig. 1. FIB-SEM data sets. The top row contains 3D image stacks acquired using FIB-SEM microscopy. Details in the bottom row are taken from the blue boxes overlaid on the stacks. Mitochondria, which we wish to segment, are indicated by black arrows. The high resolution allows neuroscientists to see important details but poses unique challenges. FIB-SEM image stack dimensions are orders of magnitude larger than conventional images, which limits the usefulness of many state-of-the-art segmentation algorithms, as discussed in Sec. IV-D1. Further complicating the problem is the presence of numerous objects with distracting shapes and textures, including vesicles and various membranes. Finally, we cannot rely on strong contrasts to indicate object boundaries. Note that the Striatum data is split into training and testing sections, denoted by a dashed line. A separate training stack is used for the CA1 Hippocampus (not shown).
tioning approach that combines the following components.
Operating on supervoxels instead of voxels. We cluster
groups of similar voxels into regularly spaced supervoxels
of nearly uniform size, which are used to compute
robust local statistics. This reduces the computational and
memory costs by several orders of magnitude without
sacrificing accuracy because supervoxels naturally respect
image boundaries.
Including global shape cues. The supervoxels are con-
nected to their neighbors by edges and form a graph.
Most graph segmentation techniques rely only on local
statistics to partition the graph, ignoring important shape
information. We introduce features that capture non-local
shape properties and use them to evaluate how likely a
supervoxel is to be part of the target structure.
Learning boundary appearance. EM data is notori-
ously complex, violating the standard assumption that
strong image gradients always correspond to significant
boundaries. Spatial and textural cues must be considered
when determining where true object boundaries lay. We
therefore train a classifier to recognize which pairs of su-
pervoxels are most likely to straddle a relevant boundary.
This prediction determines which edges of the supervoxel
graph should most likely be cut during segmentation.
We demonstrate our approach for the purpose of segmenting
mitochondria in two large FIB-SEM image stacks taken from
the CA1 hippocampus and the striatum regions of the brain.
We show that our approach performs close to the level of a
human annotator and is much more accurate than a state-of-
the-art 3D segmentation approach [52].
II. RELATED WORK
In this section, we begin by examining previous attempts
to segment mitochondria. We then broaden our discussion to
include the use of machine learning techniques for other tasks
in EM imagery. Finally, we discuss methods that rely on a
graph partitioning approach to segmentation.
A. Mitochondria Segmentation
As discussed in the introduction, understanding the pro-
cesses that regulate mitochondrial shape and function is
important. Perhaps due to the difficulty in acquiring the
data, relatively few researchers have attempted to quantify
important mitochondria properties in recent years. In [59],
a Gentle-Boost classifier is trained to detect mitochondria
based on textural features. In [43], texton-based mitochondria
classification of melanoma cells is performed using a variety
of classifiers including k-NN, SVM, and Adaboost. While
these techniques achieve reasonable results, they consider
only textural cues while ignoring shape information. A recent
approach, described in [52], using state-of-the-art features
and a Random Forest learning approach for segmentation, has
been successfully applied to 3D EM data in [32]. We compare
our approach to [52] in Section IV.
In [44], shape-driven watersnakes that exploit prior knowl-
edge about the shape of membranes are used to segment
mitochondria from the liver. However, this approach is adapted
to anisotropic TEM data. Recently, new features have been
introduced to segment mitochondria in neural EM imagery.
Ray features, first introduced in [51], were applied to 2D
mitochondria segmentation in [36]. Inspired by Ray features,
Radon-like features were proposed in [33], but have been
shown in [55] to perform significantly worse than Ray features.
B. Machine Learning in EM Imagery
Besides mitochondria segmentation, machine learning tech-
niques have found their way into other tasks in EM imagery
including membrane detection and dendrite reconstruction. We
refer the reader to [23] for an excellent survey covering some
of these applications. EM data poses unique challenges for
machine learning algorithms. In addition to the large number
of voxels involved, a variety of sub-cellular structures exist
including mitochondria, vesicles, synapses, and membranes.
As seen in Fig. 1, these structures can be easily confused when

Fig. 2. Segmenting an image stack into supervoxels. (left) A cropped FIB-
SEM image stack containing a mitochondrion. (right) The cropped stack is
segmented using the SLIC algorithm into groups of similar voxels called
supervoxels. For visualization, supervoxels in the center of the image stack
have been removed, leaving supervoxels belonging to the mitochondrion
interior and on the caps of the volume. Boundaries between supervoxels are
marked in black. Notice that voxels with similar intensities are grouped while
respecting natural boundaries.
only local image statistics are considered, especially given the
often low signal-to-noise ratio of the data. This is one of the
reasons why algorithms that perform well on natural images
are far less successful on EM data.
While a large body of research is dedicated to segmenting
axons and dendrites from EM data, only a small fraction uses
a machine learning approach. In [22], a Convolutional Net-
work (CN) performs neuronal segmentation by binary image
restoration. This work is extended in [21] by incorporating
topological constraints. In [54], CNs are used to predict an
affinity graph that expresses which pixels should be grouped
together using the Rand index [49], a quantitative measure of
segmentation performance. In another recent approach [25], a
random forest classifier is used in a cost function that enforces
gap-completion constraints to segment TEM slices.
Machine learning techniques have also been applied to de-
tect membranes, a common preprocessing step in registration
and axon/dendrite reconstruction. In [24], Neural Networks
relying on feature vectors composed of intensities sampled
over stencil neighborhoods are trained to recognize membranes
in TEM image stacks. In [58], an Adaboost classifier is trained
to detect cell membranes based on eigenvalues and Hessian
features. A hierarchical random forest classification scheme is
used to detect boundaries and segment EM stacks in [5].
C. Segmentation by Graph-Partitioning
While active contours and level sets have been successfully
applied to many medical imaging problems [12], they suffer
from two important limitations: each object requires individual
initialization and each contour requires a shape prior that may
not generalize well to variations in the target objects. EM
image stacks contain hundreds of mitochondria, which vary
greatly in size and shape. Proper initialization and definition
of a shape prior for so many objects is problematic.
In recent years, graph partitioning approaches to segmen-
tation have become popular. They produce state-of-the-art
segmentations for 2D natural images [50], [14], generalize
well, and unlike level sets and active contours, their com-
plexity is not affected by the number of target objects. In
2010, the top two competitors [11], [16] in the VOC seg-
mentation challenge [13] relied on such techniques. Graph
Algorithm 1 SLIC Supervoxels
/* Initialization */
Initialize cluster centers C_k = [I_k, u_k, v_k, z_k]^T by sampling voxels at regular grid steps S.
Move cluster centers to the lowest gradient position in a 3 × 3 × 3 neighborhood.
Set label l(i) = −1 for each voxel i.
Set distance d(i) = ∞ for each voxel i.
repeat
    /* Assignment */
    for each cluster center C_k do
        for each voxel i in a 2S × 2S × 2S neighborhood surrounding C_k do
            Compute distance δ_ik between C_k and voxel i.
            if δ_ik < d(i) then
                set d(i) = δ_ik
                set l(i) = k
            end if
        end for
    end for
    /* Update */
    Compute new cluster centers.
    Compute residual error E.
until E ≤ threshold
/* Post-processing */
Enforce connectivity.
partitioning approaches minimize a global objective function
defined over an undirected graph whose nodes correspond
to pixels, voxels, superpixels, or supervoxels; and whose
edges connect these nodes [6], [8], [2]. The energy function
is typically composed of two terms: the unary term which
draws evidence from a given node, and the pairwise term
which enforces smoothness between neighboring nodes. Some
works introduce supplementary terms to the energy function,
including a term favoring cuts that maximize the object’s
surface gradient flux [28]. This alleviates the tendency to
pinch off long or convoluted shapes, which is important when
tracking elongated processes [42]. However, as noted in [25],
it cannot entirely compensate for weakly detected membranes
and further terms may have to be added.
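To make the two-term structure concrete, the following minimal Python sketch (toy values, hypothetical names; not the paper's implementation) evaluates such an energy over a small supervoxel graph: each node pays a unary cost for its label, and each edge whose endpoints disagree pays a Potts-style pairwise penalty.

```python
def graph_energy(labels, unary, edges, w=1.0):
    """Sum of unary potentials plus a Potts pairwise penalty on cut edges.

    labels: label assigned to each node
    unary:  unary[i][l] = cost of giving node i the label l
    edges:  list of (i, j) pairs of neighboring nodes
    w:      pairwise smoothness weight
    """
    e = sum(unary[i][l] for i, l in enumerate(labels))
    e += sum(w for i, j in edges if labels[i] != labels[j])
    return e

# Toy example: 3 supervoxels in a chain; node 0 prefers background (0),
# nodes 1 and 2 prefer foreground (1).
unary = [[0.0, 2.0], [2.0, 0.0], [2.0, 0.0]]
edges = [(0, 1), (1, 2)]
print(graph_energy([0, 1, 1], unary, edges))  # 1.0: one cut edge, no unary cost
print(graph_energy([1, 1, 1], unary, edges))  # 2.0: no cuts, one bad unary
```

A graph-cut solver then searches for the labeling that minimizes this energy; in this toy instance [0, 1, 1] is the cheapest labeling.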
A shortcoming of standard graph partitioning methods, as
we will discuss in Section III-C, is that most do not consider
the shape of the segmented objects.
III. METHOD
The first step of our approach is to over-segment the image
stack into supervoxels, small clusters of voxels with similar
intensities. All subsequent steps operate on supervoxels instead
of individual voxels, speeding up the algorithm by several
orders of magnitude. This step is described in Section III-A.
Next, a feature vector containing shape and intensity in-
formation is extracted for each supervoxel, as described in
Section III-B. The final segmentation is produced by feeding

the extracted feature vectors to classifiers that define the unary
and pairwise potentials of a graph cut segmentation step
described in Section III-C. The learning procedure and a list
of parameters are provided in Section IV.
A. Supervoxel Over-segmentation
Many popular graph-based segmentation approaches such
as graph cuts [6] become exponentially more complex as
nodes are added to the graph. In practice, this limits the
amount of data that can be processed. EM stacks can contain
billions of voxels, making such methods intractable both in
terms of memory and computation time. Even for moderately-
sized stacks, standard minimization techniques [29], [60],
[31] become intractable. By replacing the voxel-grid with a
graph defined over supervoxels, we reduce the complexity by
several orders of magnitude while sacrificing little in terms of
segmentation accuracy.
To efficiently generate high-quality supervoxels, we extend
our earlier superpixel algorithm, simple linear iterative clus-
tering (SLIC) [48], to produce 3D supervoxels such as those
depicted in Fig. 2. The approach used in SLIC is closely
related to k-means clustering, with two important distinctions.
First, the number of distance calculations in the optimization
is dramatically reduced by limiting the search space to a
region proportional to the supervoxel size. Second, a novel
distance measure combines intensity and spatial proximity,
while simultaneously providing control over the size and
compactness of the supervoxels.
The supervoxel clustering procedure is summarized in the
table marked Algorithm 1. Initial cluster centers are chosen
by sampling the image stack at regular intervals of length S
in all three dimensions. The number of supervoxels k and the
number of voxels N in the volume determine the grid interval,
S = (N/k)^(1/3). Next, the centers are moved to the nearest
gradient local minimum. The algorithm then assigns each
voxel to the nearest cluster center, recomputes the centers, and
iterates. After n iterations, the final cluster members define the
supervoxels.
SLIC is many times faster than standard k-means clustering thanks to a distance function measuring the spatial and intensity similarities of voxels within a limited 2S × 2S × 2S region,

  δ_ik = sqrt( (I_k − I_i)² / m² + ( (u_k − u_i)² + (v_k − v_i)² + (z_k − z_i)² ) / S² ),   (1)

where I is image intensity; u_i, v_i, and z_i are the spatial coordinates of voxel i; and u_k, v_k, and z_k are those of cluster center k. Normalizing the intensity and spatial proximity terms by m and S (the average expected intensity and spatial distances within a supervoxel, respectively) allows the distance measure to combine these quantities, which have very different ranges. Simply applying a Euclidean distance without normalization would result in clustering biased towards spatial proximity. Supervoxel compactness is regulated by m: as seen in Figure 3, higher m values produce more compact supervoxels, while lower m values produce less compact ones that more tightly fit the image boundaries.
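The assignment/update loop of Algorithm 1 with the distance of Eq. 1 can be sketched compactly in Python. This is an unoptimized illustration, not the authors' implementation; the gradient-based center perturbation and the connectivity post-processing step are omitted.

```python
import numpy as np

def slic_supervoxels(volume, step=4, m=10.0, n_iters=5):
    """Sketch of SLIC supervoxel clustering (Algorithm 1, Eq. 1).
    volume: 3D array of intensities; step: grid interval S; m: compactness."""
    U, V, Z = volume.shape
    # Initialize cluster centers [I_k, u_k, v_k, z_k] on a regular grid of spacing S.
    grid = np.stack(np.meshgrid(
        np.arange(step // 2, U, step),
        np.arange(step // 2, V, step),
        np.arange(step // 2, Z, step), indexing="ij"), -1).reshape(-1, 3)
    centers = np.array([[volume[u, v, z], u, v, z] for u, v, z in grid], float)

    labels = -np.ones(volume.shape, int)
    dists = np.full(volume.shape, np.inf)
    for _ in range(n_iters):
        dists.fill(np.inf)
        for k, (Ik, uk, vk, zk) in enumerate(centers):
            # Restrict the search to a 2S x 2S x 2S window around the center.
            u0, u1 = max(int(uk) - step, 0), min(int(uk) + step + 1, U)
            v0, v1 = max(int(vk) - step, 0), min(int(vk) + step + 1, V)
            z0, z1 = max(int(zk) - step, 0), min(int(zk) + step + 1, Z)
            uu, vv, zz = np.meshgrid(np.arange(u0, u1), np.arange(v0, v1),
                                     np.arange(z0, z1), indexing="ij")
            patch = volume[u0:u1, v0:v1, z0:z1]
            # Eq. 1: intensity term normalized by m, spatial term by S.
            d = np.sqrt((patch - Ik) ** 2 / m ** 2 +
                        ((uu - uk) ** 2 + (vv - vk) ** 2 + (zz - zk) ** 2) / step ** 2)
            closer = d < dists[u0:u1, v0:v1, z0:z1]
            dists[u0:u1, v0:v1, z0:z1][closer] = d[closer]
            labels[u0:u1, v0:v1, z0:z1][closer] = k
        # Update step: move each center to the mean of its members.
        for k in range(len(centers)):
            mask = labels == k
            if mask.any():
                u, v, z = np.nonzero(mask)
                centers[k] = [volume[mask].mean(), u.mean(), v.mean(), z.mean()]
    return labels
```

Because each center only examines a 2S × 2S × 2S window, the number of distance computations per iteration stays proportional to N rather than kN, which is the complexity argument made in the text.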
To ensure that the total number of distance calculations
remains constant in N, irrespective of k, the distance calcu-
lations are limited to a 2S × 2S × 2S volume around the
cluster centers. This makes the complexity O(N), whereas a
conventional k-means implementation would have complexity
O(kN).
A post-processing step enforces connectivity because the
clustering procedure does not guarantee that supervoxels will
be fully connected. Orphan voxels are assigned to the most
similar nearby supervoxels using a flood-fill algorithm. We
refer the interested reader to [4] for further details.
We found SLIC to be particularly well adapted to EM
segmentation as it delivers high quality supervoxels efficiently,
provides size and compactness control, and can operate on
large volumes. Besides SLIC, only a few algorithms are
designed to generate supervoxels. In [57], supervoxels are
obtained by stitching together overlapping patches followed
by optimizing an energy function using a graph cuts approach.
However, this approach performs worse than SLIC in terms of
segmentation quality using standard measures [4], consumes
too much memory, and is 20 times slower, with a worst-case
complexity of O(N²). A second alternative, used in [5],
applies the watershed algorithm [57] to generate supervoxels.
However, the size and quality of the watershed supervoxels
are unreliable. Finally, other popular superpixel methods could
potentially be extended to 3D, including Quickshift [35],
Turbopixels [56], and the method of [14]. However, these
methods all produce lower quality segmentations than SLIC
in 2D [4], and are 13, 164, and 5 times slower, respectively.
They also require much more
memory. These comparisons are documented in [4].
B. Feature Vector Extraction
After extracting supervoxels, the next step of the algorithm
is to extract feature vectors that capture local shape and texture
information. For each supervoxel i, we extract a feature vector f_i combining Ray descriptors and intensity histograms, written as

  f_i = [ (f_i^Ray)^T , (f_i^Hist)^T ]^T ,   (2)

where f_i^Ray represents a Ray descriptor and f_i^Hist represents an intensity histogram. For simplicity, we omit the subscript i in the remainder of the section.
1) Ray Descriptors: Rays are a class of image features
introduced in [51] that capture non-local shape information
around a given point. We extend Ray features to 3D in this
work, and propose a method for bundling a set of Ray fea-
tures into a rotationally invariant descriptor. Ray features are
attractive because they provide a description of the local shape
relative to a given location. This formulation fits naturally into
a graph partitioning framework because Rays can provide a
description of the local shape for locations corresponding to
every node in the graph. Descriptors commonly used for shape
retrieval that rely on skeletonization or contours, including

Fig. 3. Supervoxel size and compactness as a function of parameters m and S of Eq. 1. (top left) A cropped EM slice containing three mitochondria. (middle left) Typical supervoxel sizes for S = 10, S = 20, and S = 30. (bottom left) Standard deviation of supervoxel size as a function of varying m. (right) A matrix of supervoxel segmentations showing the effect of varying m and S. Increasing m produces more compact, regular supervoxels. Increasing S increases supervoxel size. Note that supervoxels are three-dimensional, yet the images above show only a two-dimensional slice of each supervoxel.
Fig. 4. Ray feature function r(I, c_i, θ_l, γ_l). All components of the Ray descriptor depend on this basic function. For a given location c_i, it returns the location r of the closest boundary point in the direction l defined by angles (θ_l, γ_l); d_l is the corresponding distance from c_i to the boundary.
distance sets [18] and Lipschitz embeddings [19], do not have
this property.
A Ray feature is computed by casting an imaginary ray in an arbitrary direction (θ_l, γ_l) from a point c_i, and measuring an image property at the distant point

  r = r(I, c_i, θ_l, γ_l)   (3)

where the ray encounters an edge (depicted in Figure 4). In our implementation, edges are found by applying a 3D extension of the Canny edge detection algorithm [20].
For supervoxel i, we construct a Ray descriptor by concatenating a set of 3L Ray features emanating from the supervoxel center c_i, where L is the number of fixed orientations. The L orientations are uniformly spaced over a geodesic sphere, as depicted in Figure 5, and defined by polar angles Θ = {θ_1, ..., θ_L} and Γ = {γ_1, ..., γ_L}. The Ray descriptor for supervoxel i in an image stack I at orientation (θ_l, γ_l) is written

  f^Ray(I, c_i, θ_l, γ_l) = [ f_ndist, f_norm, f_ori ]^T ,   (4)
where the individual Ray features are given by

  f_ndist(I, c_i, θ_l, γ_l) = ||r(I, c_i, θ_l, γ_l) − c_i|| / D ,
  f_norm(I, c_i, θ_l, γ_l) = ||∇I(r(I, c_i, θ_l, γ_l))|| ,   (5)
  f_ori(I, c_i, θ_l, γ_l) = [ ∇I(r(I, c_i, θ_l, γ_l)) / ||∇I(r(I, c_i, θ_l, γ_l))|| ] · [ (r − c_i) / ||r − c_i|| ] ,

and ∇I is the gradient of the image stack.
In other words, each descriptor f^Ray contains three Ray features that measure image characteristics at the nearest edge point r given by Eq. 3. The features in Eq. 5 are:
- f_ndist, the most basic feature, simply encodes the distance from c_i to the closest edge, d_l = ||r(I, c_i, θ_l, γ_l) − c_i||. It is made scale-invariant by normalizing by D, the mean distance over all L directions;
- f_norm, the gradient norm at r;
- f_ori, the orientation of the gradient at r, computed as the dot product of the unit Ray vector and a unit vector in the direction of the local gradient at r.
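As a concrete illustration, the following Python sketch computes the Eq. 5 features for a single direction. The helper names are hypothetical, edge detection is assumed to have been done beforehand (e.g. by the 3D Canny step), and the D normalization of f_ndist is omitted since it averages over all L directions.

```python
import numpy as np

def cast_ray(edges, c, direction, max_steps=200):
    """March from c along `direction` until hitting an edge voxel; returns r or None."""
    d = np.asarray(direction, float)
    d /= np.linalg.norm(d)
    p = np.asarray(c, float)
    for _ in range(max_steps):
        p = p + d
        idx = np.round(p).astype(int)
        if np.any(idx < 0) or np.any(idx >= np.array(edges.shape)):
            return None  # ray left the volume without meeting an edge
        if edges[tuple(idx)]:
            return idx
    return None

def ray_features(volume, edges, c, theta, gamma):
    """Eq. 5 features (distance, gradient norm, gradient orientation) for one ray."""
    direction = (np.cos(theta) * np.cos(gamma),
                 np.sin(theta) * np.cos(gamma),
                 np.sin(gamma))
    r = cast_ray(edges, c, direction)
    if r is None:
        return None
    grad = np.array(np.gradient(volume))[:, r[0], r[1], r[2]]  # ∇I at r
    dist = np.linalg.norm(r - np.asarray(c, float))            # f_dist (unnormalized)
    f_norm = np.linalg.norm(grad)                              # f_norm
    ray_unit = (r - np.asarray(c, float)) / max(dist, 1e-9)
    f_ori = float(grad @ ray_unit) / max(f_norm, 1e-9)         # unit gradient · unit ray
    return dist, f_norm, f_ori
```

Cast from the center of a bright sphere, for instance, the distance feature recovers the radius and f_ori is negative, since the intensity gradient at the boundary points opposite the outgoing ray.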
The final step is to align the descriptor to a canonical orientation, making it rotation invariant. It is important that the descriptor is the same no matter the orientation of the mitochondrion; otherwise, the learning step would have difficulty finding a good decision boundary. In Fig. 5(a), two perpendicular axes n_1 and n_2 define a canonical frame of reference for the descriptor. These axes are assigned specific locations in the feature vector shown in Fig. 5(b), and all other elements are ordered according to their angular offsets from n_1 and n_2. To achieve rotational invariance, we re-order the descriptor such that n_1 and n_2 align with an orientation estimate.
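One way to compute such an orientation estimate, mirroring the PCA procedure described next (this is a sketch under the assumption that the Ray terminal points are available as an N × 3 array, not the authors' code), is:

```python
import numpy as np

def principal_axes(endpoints):
    """Return e1, e2: the two directions of maximal variance of the
    Ray terminal points (eigenvectors of their scatter matrix)."""
    pts = np.asarray(endpoints, float)
    centered = pts - pts.mean(axis=0)
    # np.linalg.eigh returns eigenvalues in ascending order; re-sort descending.
    eigvals, eigvecs = np.linalg.eigh(centered.T @ centered)
    order = np.argsort(eigvals)[::-1]
    return eigvecs[:, order[0]], eigvecs[:, order[1]]

# Toy cloud elongated along x with slight spread in y:
pts = [[x, 0.5 * (x % 2), 0.0] for x in range(10)]
e1, e2 = principal_axes(pts)
```

The descriptor is then re-ordered so that n_1 and n_2 line up with e_1 and e_2, up to the sign ambiguity inherent to PCA.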
To obtain an orientation estimate, Principal Component Analysis (PCA) is applied to the set of Ray terminal points, yielding two orthogonal vectors e_1 and e_2 in the directions of maximal variance of the local shape. Because e_1 and e_2 do
