A shape-constraint adversarial framework with instance-normalized spatio-temporal features for inter-fetal membrane segmentation

doi:10.1016/J.MEDIA.2021.102008

Alessandro Casella¹,², Sara Moccia³, Dario Paladini⁴, Emanuele Frontoni⁵, Elena De Momi¹, Leonardo S. Mattos²

Polytechnic University of Milan¹, Istituto Italiano di Tecnologia², Sant'Anna School of Advanced Studies³, Istituto Giannina Gaslini⁴, Marche Polytechnic University⁵

19 Feb 2021, Medical Image Analysis (Elsevier), Vol. 70, article 102008

TL;DR: This paper presents a deep learning framework for inter-fetal membrane segmentation in in-vivo fetoscopic videos. It enhances existing architectures in two ways: it encodes a novel instance-normalized dense block, invariant to illumination changes, that extracts spatio-temporal features to enforce pixel connectivity in time, and it relies on adversarial training, which constrains the macro appearance of the segmentation.

About: This article is published in Medical Image Analysis. The article was published on 2021-02-19 and is currently open access. It has received 9 citations to date.

Summary

1. Introduction

Twin-to-twin transfusion syndrome (TTTS) may occur during identical twin pregnancies when abnormal vascular anastomoses in the monochorionic placenta result in uneven blood flow between the fetuses.
At the beginning of the surgical treatment, the surgeon identifies the interfetal membrane, which is used as a reference to explore the placenta vascular network and identify vessels to be treated.
As for placental vessel segmentation, the work in Almoussa et al. (2011) proposes a neural network trained on handcrafted features from ex-vivo placenta images.
The instance-normalized topology can tackle the illumination variability typical of fetoscopic videos acquired during TTTS surgery.
The spatio-temporal features can boost segmentation performance by enforcing the consistency of segmentation masks across sequential frames.

1.1. Contribution of the work

The authors address the problem of automatic inter-fetal membrane segmentation to enhance surgeon context awareness during TTTS surgery.
Specifically, the authors extend the adversarial framework presented in Casella et al. (2020) to process surgical video clips via spatio-temporal convolution.
This allows them to exploit the temporal information naturally encoded in videos.
The authors further design a dense block that encodes instance normalization, to account for illumination changes in the video clips.
The authors will make the dataset collected for this work publicly available, to foster further research in the field.
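To illustrate why instance normalization can help with illumination changes, the following minimal sketch normalizes a single feature channel of a single sample to zero mean and unit variance. It is not the authors' dense block: the function name `instance_normalize`, the `eps` constant, and the toy intensity values are assumptions made for illustration, and illumination change is modelled here only as a global gain and offset.

```python
import math


def instance_normalize(channel, eps=1e-5):
    """Normalize one channel of one sample to zero mean and unit variance.

    `channel` is a flat list of intensities (e.g. all pixels of all frames
    in a clip for one feature channel). `eps` is an assumed small constant
    added for numerical stability.
    """
    n = len(channel)
    mean = sum(channel) / n
    var = sum((v - mean) ** 2 for v in channel) / n
    scale = math.sqrt(var + eps)
    return [(v - mean) / scale for v in channel]


# Toy channel and a "brighter" copy with a global gain (2.0) and offset (0.3).
dark = [0.10, 0.20, 0.15, 0.40, 0.35, 0.05]
bright = [2.0 * v + 0.3 for v in dark]

norm_dark = instance_normalize(dark)
norm_bright = instance_normalize(bright)

# The two normalized channels are nearly identical despite the brightness gap.
max_diff = max(abs(a - b) for a, b in zip(norm_dark, norm_bright))
print(f"largest difference after normalization: {max_diff:.6f}")
```

Because the statistics are computed per sample and per channel, a global brightness or contrast shift in one clip is largely removed before the features reach the next layer, independently of the other clips in the batch.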

2. Methods

The proposed framework consists of a segmentor, described in Sec. 2.1, and a discriminator network (the critic), described in Sec. 2.2.
The segmentor and critic are trained in an adversarial fashion, following the strategy proposed in Casella et al. (2020) and described in Sec. 2.3.

2.1. Segmentor

The segmentor has a dense UNet-like architecture consisting of a downsampling and an upsampling path, linked via long-skip connections.
This process is repeated until no frames remain, and results in a collection of temporal clips.
Each dense block is followed by a transition down module for downscaling.
Building upon the dense module proposed in Huang et al. (2017), the authors propose a new dense module that uses two (leaky ReLU) pre-activated convolutions instead of a single one.

2.2. Critic

The critic is composed of two branches, as described in Table 1 and shown in Fig. 2, for extracting features from both the gold-standard segmentation and the segmentor output.
The authors decided to keep the critic architecture similar to its original implementation because the role of the critic is to provide a shape constraining mechanism for the segmentor output.
The use of dense blocks would have introduced unnecessary complexity with an increase in memory requirements.
The segmentor branch takes as input x masked by the output of the segmentor (S(x)).
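The masking step can be sketched as follows. This sketch assumes that "masked by" means an element-wise product between the input frame and the predicted mask, which is a common choice in critic-based segmentation but is not stated explicitly in this summary. The helper name `mask_frame` and the toy single-channel 2x3 frame are hypothetical.

```python
def mask_frame(frame, mask):
    """Return the element-wise product of a frame and a segmentation mask.

    Both arguments are 2D lists (rows of pixel values) of the same shape.
    Mask values are expected in [0, 1].
    """
    return [
        [pixel * m for pixel, m in zip(frame_row, mask_row)]
        for frame_row, mask_row in zip(frame, mask)
    ]


# Toy input frame x and a soft segmentor output S(x).
x = [[0.2, 0.8, 0.5],
     [0.9, 0.4, 0.1]]
s_x = [[0.0, 1.0, 1.0],
       [0.0, 0.5, 0.0]]

masked = mask_frame(x, s_x)
print(masked)
```

Under the same assumption, the other branch would receive x masked by the gold-standard segmentation, so that the critic compares two views of the same frame that differ only in the mask.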

2.3. Adversarial training strategy

The segmentor and critic layers are initialised using.
There is a possible risk of the loss diverging during training. Introducing hyperparameters may allow the action of the two terms in the loss function to be balanced, avoiding such divergence. However, divergence never occurred in the authors' experiments.

3.1. Dataset

To experimentally evaluate their two research hypotheses, the authors collected a dataset of 20 fetoscopic videos acquired during 20 different surgical procedures for treating TTTS in 20 women.
The membrane was manually annotated in each frame under the supervision of the surgeon.
This dataset, to the best of their knowledge, is the largest dataset currently available for inter-fetal membrane segmentation.
Each frame was cropped to contain only the FoV of the fetoscope and resized to 128×128 pixels, both to smooth noise and to limit memory usage.

3.2. Parameter setting

The authors used w_length = 4 due to the higher complexity of their framework, which required higher memory usage and computational power.
Validation and testing temporal clips were built using the same parameters but with ∆w = 4 (i.e., without overlap).
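As a rough illustration of how temporal clips could be built, the sketch below slides a window of w_length frames over a video with a step of ∆w frames. The function name `build_clips`, the policy of dropping trailing frames that do not fill a whole clip, and the stride of 2 used for the overlapping case are assumptions for illustration only; the summary does not give the stride used for training.

```python
def build_clips(frames, w_length=4, delta_w=4):
    """Split a frame sequence into clips of `w_length` frames.

    The window advances by `delta_w` frames each step, so
    delta_w == w_length yields non-overlapping clips and
    delta_w < w_length yields overlapping ones. Trailing frames that
    cannot fill a whole clip are dropped (an assumed policy).
    """
    clips = []
    start = 0
    while start + w_length <= len(frames):
        clips.append(frames[start:start + w_length])
        start += delta_w
    return clips


frames = list(range(10))  # stand-in for 10 video frames

no_overlap = build_clips(frames, w_length=4, delta_w=4)
overlapping = build_clips(frames, w_length=4, delta_w=2)

print(no_overlap)   # [[0, 1, 2, 3], [4, 5, 6, 7]]
print(overlapping)  # [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```

With ∆w equal to w_length, every frame appears in at most one clip, which matches the non-overlapping setting described for validation and testing.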
During training, at each iteration step, each batch was augmented with random rotation in the range (−25°, +25°), horizontal and vertical flips, and scaling with a scaling factor in the range (0.5, 1.5).
The Mann–Whitney–Wilcoxon test on Acc and DSC, with a significance level of 0.05 for both, was used to assess whether significant differences existed between the tested architectures.
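For reference, the two metrics can be computed from binary masks as sketched below. The sketch assumes the standard definitions, with Acc as pixel accuracy, (TP + TN) / (TP + TN + FP + FN), and DSC as the Dice similarity coefficient, 2·TP / (2·TP + FP + FN); the summary does not restate the authors' exact formulas. The function names and the toy masks are hypothetical.

```python
def confusion_counts(pred, gold):
    """Count TP, FP, FN, TN for two flat binary masks of equal length."""
    tp = fp = fn = tn = 0
    for p, g in zip(pred, gold):
        if p and g:
            tp += 1
        elif p and not g:
            fp += 1
        elif not p and g:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def dice(pred, gold):
    """Dice similarity coefficient; defined as 1.0 when both masks are empty."""
    tp, fp, fn, _ = confusion_counts(pred, gold)
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else 2 * tp / denom


def accuracy(pred, gold):
    """Fraction of pixels labelled correctly."""
    tp, fp, fn, tn = confusion_counts(pred, gold)
    return (tp + tn) / (tp + fp + fn + tn)


# Toy flattened masks: 1 = membrane, 0 = background.
pred = [1, 1, 0, 0, 1, 0, 0, 0]
gold = [1, 0, 0, 0, 1, 1, 0, 0]

dsc_value = dice(pred, gold)
acc_value = accuracy(pred, gold)
print(f"DSC = {dsc_value:.3f}, Acc = {acc_value:.3f}")  # DSC = 0.667, Acc = 0.750
```

Accuracy counts background pixels as correct, so on thin structures such as the membrane it tends to be higher than DSC, which is one reason to report both.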

3.4. Ablation studies

The authors compared the results of the proposed framework against those of the adversarial network presented in Casella et al. (2020), which is the closest work to theirs.
Considering that a comprehensive comparison with standard state-of-the-art approaches (e.g., UNet (Ronneberger et al., 2015) and ResNet (He et al., 2016)) is already provided in Casella et al. (2020), the authors focused here on the ablation studies.
For E6, the lowest performance was the one with ∆w = 4 (no overlap between temporal clips).
Visual samples for the tested models are shown in Fig.

5. Discussion and conclusions

This paper introduced a shape-constrained adversarial framework with instance-normalized spatio-temporal features to perform automatic inter-fetal membrane segmentation in fetoscopic video clips, while tackling the high illumination variability in fetoscopic videos.
The authors noticed that 3D convolution alone was not able to boost segmentation consistency, as the results are comparable with the 2D vanilla adversarial framework (E3).
In such cases, the temporal connectivity introduced to guarantee consistency across consecutive frames can affect the accuracy of segmentation negatively.
To conclude, the achieved results suggest that the proposed approach may be effective in supporting surgeons in the identification of the inter-fetal membrane in fetoscopic videos.
Data used for the analysis were acquired during actual surgical procedures and were then anonymized to allow researchers to conduct the study.
