
Inter-foetus Membrane Segmentation for TTTS Using Adversarial Networks

Alessandro Casella¹,², Sara Moccia¹,³, Emanuele Frontoni³, Dario Paladini⁴, Elena De Momi², Leonardo S. Mattos¹

Istituto Italiano di Tecnologia¹, Polytechnic University of Milan², Marche Polytechnic University³, Istituto Giannina Gaslini⁴

01 Feb 2020-Annals of Biomedical Engineering (Springer US)-Vol. 48, Iss: 2, pp 848-859

TL;DR: An adversarial network consisting of two Fully-Convolutional Neural Networks that could be a valuable and robust solution to assist surgeons in providing membrane identification while performing fetoscopy.


Abstract: Twin-to-Twin Transfusion Syndrome is commonly treated with minimally invasive laser surgery in fetoscopy. The inter-foetal membrane is used as a reference to find abnormal anastomoses. Membrane identification is a challenging task due to small field of view of the camera, presence of amniotic liquid, foetus movement, illumination changes and noise. This paper aims at providing automatic and fast membrane segmentation in fetoscopic images. We implemented an adversarial network consisting of two Fully-Convolutional Neural Networks. The former (the segmentor) is a segmentation network inspired by U-Net and integrated with residual blocks, whereas the latter acts as critic and is made only of the encoding path of the segmentor. A dataset of 900 images acquired in 6 surgical cases was collected and labelled to validate the proposed approach. The adversarial networks achieved a median Dice similarity coefficient of 91.91% with Inter-Quartile Range (IQR) of 4.63%, overcoming approaches based on U-Net (82.98%-IQR: 14.41%) and U-Net with residual blocks (86.13%-IQR: 13.63%). Results proved that the proposed architecture could be a valuable and robust solution to assist surgeons in providing membrane identification while performing fetoscopic surgery.


Summary (3 min read)


1 Introduction

Twin-to-Twin Transfusion Syndrome (TTTS) is a pathology with deadly consequences that occurs in 15% of monochorionic pregnancies (75% of twin homozygous pregnancies) 3.
Fetoscopic Minimally Invasive Surgery (MIS) has largely decreased maternal and foetal morbidity or mortality 35, becoming the recommended technique for the first-line treatment of TTTS.
The surgery consists of a direct interruption of anastomoses that are responsible for TTTS via laser photo-coagulation.
The selection of the vessels to be treated relies on the location of abnormal vascular formations, at the small branches of normal blood vessels.
Additional challenges include large variability in the illumination level, which ranges from intense illumination (causing specular reflections) to dim lighting conditions.

2.1 SAN architecture

Similarly to the original generative framework, the Segmentation Adversarial Network (SAN) implemented in this work consists of two networks: the generator, which here acts as the segmentation network (S), and the discriminator, here the critic network (C). The two are trained alternately to minimise and maximise an objective function, respectively.
Figure 2 shows the overall diagram of the framework.
Given the improvements in network training speed and performance reported in the literature 40, Leaky ReLU is chosen over the standard ReLU.
The last step of the encoding path is a convolution layer with ReLU activation.
The architecture of the C network (Table 2) uses the same encoding path as the segmentor network for feature extraction.
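As a minimal illustration of the activation choice mentioned above, the sketch below contrasts the standard ReLU with Leaky ReLU on a few sample values. The negative slope of 0.2 is an assumption made for illustration only; the summary does not state the value used in the paper.

```python
def relu(x: float) -> float:
    """Standard rectified linear unit: negative inputs are clamped to zero."""
    return x if x > 0.0 else 0.0


def leaky_relu(x: float, negative_slope: float = 0.2) -> float:
    """Leaky ReLU: negative inputs are scaled by a small slope instead of zeroed.

    negative_slope=0.2 is an illustrative assumption, not a value from the paper.
    """
    return x if x > 0.0 else negative_slope * x


activations = [-2.0, -0.5, 0.0, 1.5]
relu_out = [relu(a) for a in activations]
leaky_out = [leaky_relu(a) for a in activations]
```

Because Leaky ReLU keeps a non-zero response for negative inputs, its gradient there is also non-zero, which is the usual argument for faster and more stable training.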

2.2 Training strategy

In the SAN framework there are two loss functions, one for the segmentor S and one for the critic C network.
The $L_{L1}$ loss term is computed from the differences between the high-level features that the critic network extracts from the predicted and the true segmentation.
The $L_{L1}$ loss function forces the segmentor to learn both global and local features that capture long- and short-range spatial relationships between pixels.
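The two-term segmentor loss described here can be sketched in plain Python. This is a toy under stated assumptions, not the authors' implementation: `toy_critic_features` is a hypothetical stand-in for the critic (block means at several scales on a square, power-of-two image), the masked images follow the pixel-wise multiplication described in the Figure 2 caption, and the unit weight `l1_weight` is assumed.

```python
from typing import List

Image = List[List[float]]


def dice_loss(pred: Image, truth: Image, eps: float = 1e-6) -> float:
    """1 - Dice similarity coefficient between two soft masks in [0, 1]."""
    inter = sum(p * t for pr, tr in zip(pred, truth) for p, t in zip(pr, tr))
    total = sum(map(sum, pred)) + sum(map(sum, truth))
    return 1.0 - (2.0 * inter + eps) / (total + eps)


def mask_image(image: Image, mask: Image) -> Image:
    """Pixel-wise multiplication of the input image and a mask."""
    return [[i * m for i, m in zip(ir, mr)] for ir, mr in zip(image, mask)]


def toy_critic_features(image: Image) -> List[List[float]]:
    """Hypothetical critic stand-in: block-mean features at doubling block sizes.

    Assumes a square image whose side is a power of two.
    """
    size, block, feats = len(image), 1, []
    while block <= size:
        level = []
        for r in range(0, size, block):
            for c in range(0, size, block):
                vals = [image[y][x]
                        for y in range(r, r + block)
                        for x in range(c, c + block)]
                level.append(sum(vals) / len(vals))
        feats.append(level)
        block *= 2
    return feats


def multiscale_l1(image: Image, pred: Image, truth: Image) -> float:
    """Mean absolute feature difference, averaged over all scales."""
    f_pred = toy_critic_features(mask_image(image, pred))
    f_true = toy_critic_features(mask_image(image, truth))
    per_level = [sum(abs(a - b) for a, b in zip(lp, lt)) / len(lp)
                 for lp, lt in zip(f_pred, f_true)]
    return sum(per_level) / len(per_level)


def segmentor_loss(image: Image, pred: Image, truth: Image,
                   l1_weight: float = 1.0) -> float:
    """Overlap term plus critic-derived term (the weighting is an assumption)."""
    return dice_loss(pred, truth) + l1_weight * multiscale_l1(image, pred, truth)


image = [[1.0] * 4 for _ in range(4)]
truth = [[1.0, 1.0, 0.0, 0.0] for _ in range(4)]
empty = [[0.0] * 4 for _ in range(4)]
perfect_loss = segmentor_loss(image, truth, truth)
empty_loss = segmentor_loss(image, empty, truth)
```

A perfect prediction drives both terms to zero, while an empty prediction is penalised by the overlap term and by the feature differences at every scale.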

3.1 Dataset

In order to train the proposed framework, the authors built a new dataset in collaboration with the Department of Foetal and Perinatal Medicine, Istituto Giannina Gaslini, Genoa.
The dataset consisted of 900 frames (frame size: 720 × 576 pixels) extracted from 6 videos (150 frames per video) acquired from patients during normal surgical practice.
The authors randomly assembled a dataset acquired from patients who received TTTS laser treatments at the same hospital.
The followed procedures were in accordance with the image data collection and retrospec-.
The black borders surrounding the FoV do not bring any additional information to segment the membrane but increase the GPU-memory and computational-cost requirements during training.
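One simple way to discard uninformative borders is to crop each frame to the bounding box of its non-black pixels. The sketch below assumes a grey-scale frame stored as a list of rows and a hypothetical intensity threshold of 10; the summary does not say how the authors actually performed the cropping.

```python
from typing import List

Frame = List[List[int]]


def crop_to_fov(frame: Frame, threshold: int = 10) -> Frame:
    """Crop a grey-scale frame to the bounding box of pixels above threshold.

    threshold=10 is an illustrative assumption for 'black' border pixels.
    """
    rows = [r for r, row in enumerate(frame) if any(p > threshold for p in row)]
    cols = [c for c in range(len(frame[0]))
            if any(row[c] > threshold for row in frame)]
    if not rows or not cols:
        return frame  # nothing brighter than the threshold: leave unchanged
    return [row[cols[0]:cols[-1] + 1] for row in frame[rows[0]:rows[-1] + 1]]


frame = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 120, 130, 0, 0],
    [0, 0, 125, 140, 0, 0],
    [0, 0, 110, 135, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
cropped = crop_to_fov(frame)
pixels_before = len(frame) * len(frame[0])
pixels_after = len(cropped) * len(cropped[0])
```

In this toy frame the crop keeps 6 of 30 pixels, which illustrates why removing the borders lowers memory and compute cost. A real fetoscopic FoV is roughly circular, so a bounding box still retains some dark corners.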

3.2 Training setting and Ablation Study

To limit memory requirements during training while still promoting gradient convergence, SAN was trained with mini-batches (batch size = 30 frames), minimising $L_{SAN}$ (Eq. 5) with Adam 19.
To initialise the weights, the segmentor was pre-trained without the critic for the first 25 epochs.
The best model was selected as the one that maximised the DSC on the validation set.
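The schedule and the model-selection rule can be sketched as follows. The function names and the validation scores are hypothetical, and "best" is read here as the highest validation DSC, since a higher DSC means better overlap.

```python
from typing import List


def training_phase(epoch: int, pretrain_epochs: int = 25) -> str:
    """Return which networks are updated in a given (0-based) epoch.

    The first 25 epochs train the segmentor alone, as stated in the text;
    afterwards segmentor and critic are trained adversarially.
    """
    return "segmentor_only" if epoch < pretrain_epochs else "adversarial"


def select_best_epoch(val_dsc_per_epoch: List[float]) -> int:
    """Index of the epoch with the highest validation DSC (first one on ties)."""
    return max(range(len(val_dsc_per_epoch)),
               key=lambda e: val_dsc_per_epoch[e])


# Made-up validation scores, used only to exercise the selection rule.
val_dsc = [0.71, 0.78, 0.83, 0.81, 0.83]
best_epoch = select_best_epoch(val_dsc)
phases = [training_phase(e) for e in (0, 24, 25, 40)]
```

Keeping the phase decision in one small function makes the switch from pre-training to adversarial training explicit and easy to test.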
The framework was originally proposed for skin lesion segmentation for ISBI International Skin Imaging Collaboration 2017 41 .
To evaluate inter-annotator variability the authors asked a second expert to annotate the fetoscopic video used as test set.

3.3 Performance metrics

The Lilliefors test was used to assess population normality on DSC.
The Kruskal-Wallis test on DSC and the Westenberg-Mood test on IQR, both with a significance level of 0.05, were used to assess whether significant differences existed between the tested architectures.
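For reference, the DSC of a single prediction and the median and inter-quartile range over a set of per-image scores can be computed as in the sketch below. The per-image scores are made up for illustration, and the inclusive quartile method is an assumption, since the summary does not specify which quartile definition the authors used.

```python
from statistics import median, quantiles
from typing import List, Tuple


def dice(pred: List[int], truth: List[int]) -> float:
    """Dice similarity coefficient between two binary masks (flat 0/1 lists)."""
    inter = sum(p & t for p, t in zip(pred, truth))
    size = sum(pred) + sum(truth)
    return 1.0 if size == 0 else 2.0 * inter / size


def median_and_iqr(values: List[float]) -> Tuple[float, float]:
    """Median and inter-quartile range (Q3 - Q1) of a list of scores."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), q3 - q1


single_dsc = dice([1, 1, 0, 0], [1, 0, 0, 0])  # 2 * 1 / (2 + 1)

# Made-up per-image scores, used only to exercise the summary statistics.
scores = [0.80, 0.85, 0.90, 0.92, 0.95]
med, iqr = median_and_iqr(scores)
```

The median and IQR are the natural summaries when the scores are not normally distributed, which is also why a rank-based test such as Kruskal-Wallis (available in SciPy as `scipy.stats.kruskal`) is used to compare the medians.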

4 Results

The processing time of images in the test set was less than a millisecond, on average.
This performance confirms the compatibility with real-time applications of this approach.
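A real-time claim of this kind can be checked by comparing the mean per-frame latency with the time budget of the video stream. The sketch below assumes a 25 frames-per-second stream (a 40 ms budget); the frame rate is an assumption, and `process_frame` is a hypothetical placeholder for the trained network.

```python
import time
from typing import Callable, Sequence


def mean_latency_ms(process_frame: Callable[[object], object],
                    frames: Sequence[object], repeats: int = 3) -> float:
    """Average wall-clock time per processed frame, in milliseconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        for frame in frames:
            process_frame(frame)
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / (repeats * len(frames))


def frame_budget_ms(fps: float) -> float:
    """Time available per frame at a given frame rate, in milliseconds."""
    return 1000.0 / fps


def is_real_time(latency_ms: float, fps: float = 25.0) -> bool:
    """True if per-frame latency fits the frame budget (25 fps is an assumption)."""
    return latency_ms <= frame_budget_ms(fps)


# Identity function as a stand-in for network inference.
latency = mean_latency_ms(lambda frame: frame, list(range(10)))
```

A latency around one millisecond leaves a wide margin under a 40 ms budget, so acquisition and display overheads would be the more likely bottlenecks in a deployed system.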
In case (i), all the networks achieved good results despite the presence of spots and specularities; in case (ii), all networks achieved good results, although the U-Net and the residual architecture produced some spurious spots in the lower area, where the texture could suggest the presence of the membrane.
This suggests that the critic network enables the segmentor network to better process poor-quality images (e.g., with laser pointer, light specularities or drops in light intensity).

5 Discussion

During TTTS surgery, the identification of the inter-foetal membrane helps the surgeon to remain oriented in the surgical site.
The complexity of the placental environment, especially in advanced pregnancies, makes this task very challenging even for expert clinicians when performing surgery.
The authors also compare this framework with state-of-the-art FCNNs for medical-image segmentation.
For this reason, some kinds of images (e.g., with a small portion of the membrane) are less numerous than others, limiting the network learning capability.
Further improvements will deal with the exploitation of temporal features, as suggested in 43,6, considering that the temporal information is naturally encoded in the surgical videos.

5.1 Conclusion

The authors proposed an adversarial framework for accurate and fast inter-foetal membrane segmentation in fetoscopic MIS images, achieving a median DSC of 91.91% on a new dataset of 150 images from intraoperative TTTS surgery videos.


Figures (7)

Figure 3: Comparison of results using different depths of the segmentor network for the RGB dataset. Blue and black asterisks highlight significant differences between the different architectures in terms of median DSC (Kruskal-Wallis) and inter-quartile range (Westenberg-Mood) (∗p < 0.05, ∗∗p < 0.01, ∗∗∗p < 0.001), respectively.

Figure 4: Dice similarity coefficient (DSC) obtained testing the state-of-the-art architectures (U-Net and residual architecture) and the SAN architecture. In the last boxplot, annotation performed by a second expert clinician is compared to consider inter-annotator variability (Inter-ann.). Performance metrics were calculated feeding the networks with the grey-scale dataset (in grey) and the RGB dataset (in blue). Asterisks indicate statistical difference in median DSC with Kruskal-Wallis test (∗p < 0.05, ∗∗p < 0.01).

Table 2: Specifications of the proposed critic network (C) architecture. Kernel size and stride (kernel height × kernel width), as well as output dimensions (height (H) × width (W) × N. channels) of each layer, are shown. The final output is a segmentation mask with the same dimensions as the input.

Table 1: Specifications of the segmentor network (S) architecture. Kernel size and stride (kernel height × kernel width), as well as output dimensions (height (H) × width (W) × N. channels) of each layer, are shown. The final output is a segmentation mask with the same dimensions as the input.

Figure 1: Examples of challenging cases for inter-foetus membrane identification. (A) The membrane covers a small portion of the field of view, (B) anterior placenta partially occludes the membrane, (C) image has low illumination level, (D) amniotic fluid turbidity makes the image blurred.

Figure 5: Sample segmentation results on the test set using (second column) U-Net, (third column) U-Net with the residual implementation and (last column) the proposed SAN, alongside the manual ground-truth from an expert clinician (first column). Each network was trained both with grey-scale and RGB fetoscopic images. The green, grey and blue contours refer to the ground-truth, grey-scale-based and RGB-based segmentation results, respectively.

Figure 2: The proposed Segmentation Adversarial Network (SAN) architecture. Dashed arrows refer to skip connections. Black thin arrows refer to 2D strided convolution (downscale). Green thin arrows refer to 2D strided deconvolution. Conv2D-BN-ReLU module: 2D convolution followed by batch normalization (BN) and rectified linear unit (ReLU) activation. Conv2D-BN-Leaky ReLU module: 2D convolution followed by batch normalization (BN) and leaky rectified linear unit (Leaky ReLU) activation. Only the first downscale (last upscaling) block does not include a batch normalization (BN) layer. Concatenate: join the two feature vectors with the same shape, from the critic network, to assemble a unique output. Masked images are calculated by pixel-wise multiplication (×) of the ground-truth (predicted) mask and the input image.


Inter-Foetus Membrane Segmentation for TTTS using Adversarial Networks

Alessandro Casella 1,2,∗, Sara Moccia 2,3,∗, Emanuele Frontoni, Dario Paladini, Elena De Momi, Leonardo S. Mattos

Affiliations:
Department of Electronics, Information and Bioengineering, Politecnico di Milano, Milan, Italy
Department of Advanced Robotics, Istituto Italiano di Tecnologia, Genoa, Italy
Department of Information Engineering, Università Politecnica delle Marche, Ancona, Italy
Department of Foetal and Perinatal Medicine, Istituto Giannina Gaslini, Genoa, Italy

∗ These authors equally contributed to the work

Abbreviated title: Adversarial Networks for Membrane Segmentation in TTTS

Correspondence: Alessandro Casella, Department of Advanced Robotics, Istituto Italiano di Tecnologia, Genoa, Italy. E-mail: alessandro.casella@polimi.it

Abstract

Twin-to-Twin Transfusion Syndrome (TTTS) is commonly treated with minimally invasive laser surgery in fetoscopy. The inter-foetal membrane is used as a reference to find abnormal anastomoses. Membrane identification is a challenging task due to small field of view of the camera, presence of amniotic liquid, foetus movement, illumination changes and noise. This paper aims at providing automatic and fast membrane segmentation in fetoscopic images. We implemented an adversarial network consisting of two Fully-Convolutional Neural Networks (FCNNs). The former (the segmentor) is a segmentation network inspired by U-Net and integrated with residual blocks, whereas the latter acts as critic and is made only of the encoding path of the segmentor. A dataset of 900 images acquired in 6 surgical cases was collected and labelled to validate the proposed approach. The adversarial networks achieved a median Dice similarity coefficient of 91.91% with Inter-Quartile Range (IQR) of 4.63%, overcoming approaches based on U-Net (82.98% - IQR: 14.41%) and U-Net with residual blocks (86.13% - IQR: 13.63%). Results proved that the proposed architecture could be a valuable and robust solution to assist surgeons in providing membrane identification while performing fetoscopic surgery.

Keywords: Deep learning, adversarial networks, fetoscopy, intraoperative-image segmentation

1 Introduction

Twin-to-Twin Transfusion Syndrome (TTTS) is a pathology with deadly consequences that occurs in 15% of monochorionic pregnancies (75% of twin homozygous pregnancies) 3. The aetiology of TTTS is correlated to the anomalous presence of unidirectional inter-placental anastomoses, which cause an imbalance in the blood flow between the foetuses. The risk of perinatal mortality of one or both foetuses can exceed 90% without any treatment, with an incidence of physical or neurological complications in 50% of the surviving foetuses 31,32. Fetoscopic Minimally Invasive Surgery (MIS) has largely decreased maternal and foetal morbidity or mortality 35, becoming the recommended technique for the first-line treatment of TTTS. The surgery consists of a direct interruption of anastomoses that are responsible for TTTS via laser photo-coagulation. The procedure is performed using a fetoscope and a fibre laser for ablation, which is driven through a small working channel in the fetoscope. Fetoscopic MIS is performed in a selective way (i.e., only communicating vessels among the foetuses should be coagulated, preserving all the others).

The selection of the vessels to be treated relies on the location of abnormal vascular formations, at the small branches of normal blood vessels. The first step for the surgeon, to find these abnormal vessels, is a visual inspection of the entire foetal environment (most of the time randomly moving the fetoscope) until he/she locates the inter-foetus membrane, which is used as a reference for the navigation of the vascular network. However, as described in the clinical literature 29,37,25, the identification of the membrane is a challenging task since the surgeon's ability to maintain orientation is hampered by several factors: (i) there is a limited Field of View (FoV) on the surgical scene, constraining the surgeon to view only a small portion of the placental surface 17,8; (ii) the fetoscope often goes out of focus due to dynamic changes in the foetal environment; (iii) foetuses can unpredictably move and often occlude the camera FoV, hiding the membrane; (iv) the surgical environment is immersed in the amniotic fluid, which is turbid. Additional challenges include large variability in the illumination level, which ranges from intense illumination (causing specular reflections) to dim lighting conditions. Some visual examples to highlight the challenges in identifying the membrane are shown in Fig. 1.

[Figure 1 about here.]

Computer-assisted solutions may be used to identify and segment the membrane in order to support surgeons during TTTS surgery. Such solutions may tackle the complexity of intraoperative images through learning-based segmentation, as highlighted by a review on medical-image segmentation. Recently, researchers in other medical fields (e.g., skin lesion segmentation) have shown the potential of adversarial training for further increasing segmentation performance.

Inspired by such considerations, this paper proposes a framework based on adversarial networks for the segmentation of the inter-foetal membrane from in-vivo fetoscopy images acquired during TTTS MIS. The additional $L_{L1}$ loss term computed by the critic network realises a multi-scale feature analysis during training, preserving high-level features connected to macro appearance. The proposed framework aims at supporting clinicians by automatically detecting the membrane on the fetoscope video stream and highlighting its borders. The integration of this framework in a smart fetoscope system may lead to a decrease in surgeon mental workload during the surgery, possibly reducing the duration of the intervention.

The paper is organised as follows: Sec. 1.1 surveys intraoperative medical image segmentation strategies, with a focus on learning algorithms and adversarial training; Sec. 2 presents the proposed segmentation framework and describes the experimental protocol for validating it. The obtained results are presented in Sec. 4 and discussed in Sec. 5. Conclusive remarks are presented in Sec. 5.1.

1.1 Related work on intraoperative tissue segmentation

[Figure 2 about here.]

In the past, intraoperative tissue segmentation approaches mostly dealt with filtering or deformable models. For instance, steerable filters and textural descriptors were used for gastric-lesion segmentation in capsule endoscopy. To tackle some of the limitations of these approaches (e.g., need for parameter tuning and long processing time), supervised machine learning algorithms have been proposed to provide fast and accurate segmentation 36. Supervised machine learning addresses the segmentation as a two-step problem: first, image features are extracted (e.g., intensity and textural features), then such features are classified (e.g., with support vector machines (SVMs) and decision trees). Applications include uterus segmentation from endoscopic images, where Gabor filtering is used for feature extraction; other examples are segmentation of Fallopian tubes from endoscopic images, obtained using tube-specific geometrical features, and segmentation of abdominal organs with textural features.

More recently, Fully-Convolutional Neural Networks (FCNNs) have emerged as a powerful supervised-learning tool for many visual recognition tasks such as segmentation of complex scenes from in-vivo endoscopic images. FCNNs allow for accurate segmentation when large annotated training datasets are available. The first layers of an FCNN are responsible for automatic image-feature extraction, while the last layer classifies the features and provides the segmentation mask. After their first implementation, FCNNs have been deployed in a variety of architectures, such as U-Net, SegNet and residual architectures (mainly based on the residual blocks proposed in ResNet architectures). In 39,13, SegNet is used for polyp segmentation. In both cases, SegNet is pre-trained on the ImageNet dataset and then fine-tuned to address the segmentation task. Similarly, several state-of-the-art FCNNs (i.e., AlexNet, GoogleNet, VGG and residual network) have been pre-trained on PASCAL VOC and fine-tuned for polyp segmentation.

FCNNs are trained by minimising an error metric between the ground-truth and the predicted segmentation. This error metric is commonly computed by measuring the overlap or by comparing the pixel-probability distributions between the ground-truth and the predicted segmentation. Following a different perspective, researchers have recently investigated the use of adversarial training. Adversarial training was initially proposed by Goodfellow et al. 15 as a generative framework for natural images (i.e., in the context of Generative Adversarial Networks (GANs)) made of a generator and a discriminator network.


Frequently Asked Questions (18)

Q1. What have the authors contributed in "Inter-foetus membrane segmentation for ttts using adversarial networks" ?

This paper aims at providing automatic and fast membrane segmentation in fetoscopic images.

Q2. What are the future works mentioned in the paper "Inter-foetus membrane segmentation for ttts using adversarial networks" ?

This problem will be addressed in the future by investigating extensions of this framework supported by a broader dataset and more advanced data augmentation techniques. Further improvements will deal with the exploitation of temporal features, as suggested in 43,6, considering that the temporal information is naturally encoded in the surgical videos.

Q3. What tests were used to assess the performance of the SAN?

The Kruskal-Wallis test on DSC and the Westenberg-Mood test on IQR, both with a significance level of 0.05, were used to assess whether significant differences existed between the tested architectures.

Q4. Why did the second expert annotate only the test set?

The authors asked the second expert to annotate only the test set (150 frames) due to the high time demand needed to perform manual annotation.

Q5. What is the risk of perinatal mortality of one or both foetuses?

The risk of perinatal mortality of one or both foetuses can exceed 90% without any treatment, with an incidence of physical or neurological complications in 50% of the surviving foetuses 31,32.

Q6. What could be exploited to avoid the processing of uninformative video portions?

Frame selection strategies 26 could also be exploited to avoid the processing of uninformative (e.g., blurred) video portions.

Q7. What does each step of the decoder consist of?

Each step of the decoder is made of a strided deconvolution layer with BN and a ReLU activation layer followed by a residual block.

Q8. What was the first proposed framework for natural images?

Adversarial training was initially proposed by Goodfellow et al. 15 as a generative framework for natural images (i.e., in the context of Generative Adversarial Networks (GANs)) made of a generator and a discriminator network.

Q9. How do the networks behave in the presence of low contrast and laser light?

In (iv), the presence of low contrast and laser light compromises the detection of the membrane in the U-Net and residual networks, while the proposed framework produces a good segmentation.

Q10. What performance did the proposed framework achieve?

In this paper, the authors proposed an adversarial framework for accurate and fast inter-foetal membrane segmentation in fetoscopic MIS images, achieving a median DSC of 91.91% on a new dataset of 150 images from intraoperative TTTS surgery videos.

Q11. What is the architecture of the segmentor network S?

The architecture of the segmentor network S (Table 1) is based on the U-Net 33 encoder-decoder structure, a fully convolutional network that naturally performs overlap-tile extraction, preserving spatial connectivity between tiles while speeding up network training.

Q12. What other work could be integrated with the proposed approach?

The proposed approach may also be integrated with recent work, which deals with vessel segmentation from placenta images 1,34, stitching of fetoscopy images to build a placental panoramic image 12,44 and classification of TTTS surgical phases 38.

Q13. What is the definition of the segmentor loss in the SAN framework?

The segmentor loss ($L^{S}_{SAN}$) (Eq. 5) in the proposed framework consists of two terms: a common overlap metric based on the Dice similarity coefficient ($L_{DSC}$) and an additional term derived from the critic ($L_{L1}$).

Q14. What are the main reasons why the fetoscopic images may look different?

The high level of noise, the blurred vision due to amniotic fluid with suspended particulate matter, the wide range of illumination and the variation of the fetoscope pose to the recorded tissues further increase the complexity of the structures segmentation.

Q15. Why is the dataset size a strong limitation of this study?

Despite the authors' efforts, due to the limited amount of available videos and the complexity of the task, the dataset size remains a strong limitation of this study.

Q16. Why was it difficult to collect a large TTTS dataset?

Collecting a large dataset, as recommended to avoid overfitting, was difficult because: (i) manual data annotation is a complex and time-consuming task, (ii) data availability is limited, since TTTS is a rare pathology.

Q17. What limitations of earlier approaches does supervised machine learning address?

To tackle some of the limitations of these approaches (e.g., need for parameter tuning and long processing time), supervised machine learning algorithms have been proposed to provide fast and accurate segmentation 36.

Q18. Which segmentor depth gave the best combination of segmentation performance and robustness?

An ablation study was performed, showing that the S network with 5 encoding-decoding layers was the best combination between segmentation performance and robustness.
