A shape-constraint adversarial framework with instance-normalized spatio-temporal features for inter-fetal membrane segmentation
Abstract: Background and Objectives During Twin-to-Twin Transfusion Syndrome (TTTS), abnormal vascular anastomoses in the monochorionic placenta can produce uneven blood flow between the fetuses. In current practice, this syndrome is treated surgically by closing the abnormal connections with laser ablation. Surgeons commonly use the inter-fetal membrane as a reference. A limited field of view, low fetoscopic image quality and high inter-subject variability make membrane identification a challenging task. Moreover, currently available tools are not optimal for automatic membrane segmentation in fetoscopic videos, due to membrane texture homogeneity and high illumination variability. Methods To tackle these challenges, we present a new deep-learning framework for inter-fetal membrane segmentation on in-vivo fetoscopic videos. The framework enhances existing architectures by (i) encoding a novel (instance-normalized) dense block, invariant to illumination changes, that extracts spatio-temporal features to enforce pixel connectivity in time, and (ii) relying on adversarial training, which constrains macro appearance. Results We performed a comprehensive validation using 20 different videos (2000 frames) from 20 different surgeries, achieving a mean Dice Similarity Coefficient of 0.8780 ± 0.1383. Conclusions The proposed framework has great potential to positively impact current surgical practice for TTTS treatment, allowing the implementation of surgical guidance systems that can enhance context awareness and potentially lower the duration of the surgeries.
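The Dice Similarity Coefficient (DSC) measures the overlap between a predicted mask P and the gold-standard mask G as DSC = 2|P ∩ G| / (|P| + |G|). The minimal Python sketch below computes it per frame and summarizes a set of per-frame scores as mean ± standard deviation. It assumes binary masks stored as nested lists of 0/1; the helper names, the convention that two empty masks score 1.0, and the use of the population standard deviation are assumptions of this sketch, not details given by the authors.

```python
from statistics import mean, pstdev


def dice(pred, gold):
    """Dice Similarity Coefficient between two binary masks (nested lists of 0/1)."""
    p = [v for row in pred for v in row]
    g = [v for row in gold for v in row]
    if len(p) != len(g):
        raise ValueError("masks must contain the same number of pixels")
    intersection = sum(1 for a, b in zip(p, g) if a and b)
    total = sum(1 for a in p if a) + sum(1 for b in g if b)
    # Assumed convention: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * intersection / total


def summarize(scores):
    """Mean and (population) standard deviation of per-frame scores."""
    return mean(scores), pstdev(scores)
```

For example, two 2×3 masks with three foreground pixels each, two of them shared, give DSC = 4/6 ≈ 0.667.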
Summary
- Twin-to-twin transfusion syndrome (TTTS) may occur during identical twin pregnancies when abnormal vascular anastomoses in the monochorionic placenta result in uneven blood flow between the fetuses.
- At the beginning of the surgical treatment, the surgeon identifies the inter-fetal membrane, which is used as a reference to explore the placental vascular network and identify the vessels to be treated.
- As for placental vessel segmentation, Almoussa et al. (2011) propose a neural network trained on handcrafted features extracted from ex-vivo placenta images.
- The instance-normalized topology can tackle the illumination variability typical of fetoscopic videos acquired during TTTS surgery.
- The spatio-temporal features can boost segmentation performance by enforcing the consistency of segmentation masks across consecutive frames.
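Why instance normalization helps with illumination changes can be seen on a single feature map: each channel of each sample is shifted to zero mean and scaled to unit variance using its own statistics, so a global gain and offset applied to that feature map cancels out, up to the small constant eps. The plain-Python sketch below illustrates this simplification and is not the authors' layer; eps = 1e-5 is an assumed value, and the learnable scale and shift that deep-learning frameworks usually add are omitted.

```python
import math


def instance_norm(feature_map, eps=1e-5):
    """Normalize one feature map (flat list of floats) with its own statistics.

    In a network this is applied independently to every channel of every
    sample; for spatio-temporal features the statistics would span all
    positions of the clip (an assumption of this sketch).
    """
    n = len(feature_map)
    mu = sum(feature_map) / n
    var = sum((x - mu) ** 2 for x in feature_map) / n  # biased variance
    return [(x - mu) / math.sqrt(var + eps) for x in feature_map]


# A feature map and the same map under a global gain (2.0) and offset (0.5).
frame = [0.1, 0.4, 0.2, 0.3]
brighter = [2.0 * x + 0.5 for x in frame]

# Both normalize to almost the same values; the small residual gap comes from eps.
for a, b in zip(instance_norm(frame), instance_norm(brighter)):
    print(f"{a:+.4f}  {b:+.4f}")
```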
1.1. Contribution of the work
- The authors address the problem of automatic inter-fetal membrane segmentation to enhance surgeons' context awareness during TTTS surgery.
- Specifically, the authors extend the adversarial framework presented in Casella et al. (2020) to process surgical video clips via spatio-temporal convolution.
- This allows the framework to exploit the temporal information naturally encoded in videos.
- The authors further design a dense block that encodes instance normalization, to account for illumination changes in the video clips.
- The authors will make the dataset collected for this work publicly available, to foster further research in the field.
- The proposed framework consists of the segmentor, described in Sec. 2.1, and a discriminator network (the critic), described in Sec. 2.2.
- The segmentor and critic are trained in an adversarial fashion, following the strategy proposed in Casella et al. (2020) and described in Sec. 2.3.
- The segmentor has a dense UNet-like architecture consisting of a downsampling and an upsampling path, linked via long skip connections.
- The extraction of a window of consecutive frames is repeated, shifting the window by ∆w, as long as frames are available, and results in a collection of temporal clips.
- Each dense block is followed by a transition down module for downscaling.
- Building upon the dense module proposed by Huang et al. (2017), the authors propose a new dense module that uses two pre-activated convolutions (with leaky ReLU activations) instead of a single one.
- The critic is composed of two branches, as described in Table 1 and shown in Fig. 2, for extracting features from both the gold-standard segmentation and the segmentor output.
- The authors decided to keep the critic architecture similar to its original implementation because the role of the critic is to provide a shape-constraining mechanism for the segmentor output.
- Using dense blocks in the critic would have introduced unnecessary complexity and increased memory requirements.
- The segmentor branch takes as input x masked by the output of the segmentor (S(x)).
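The clip construction can be read as a sliding window over the frame sequence: a window of consecutive frames is extracted, shifted by a stride, and the extraction is repeated while enough frames remain. In the minimal Python sketch below, w_length and delta_w stand for the clip length and stride (wlength and ∆w in Sec. 3.2); discarding a trailing partial window is an assumption, since the handling of the last frames is not specified here.

```python
def temporal_clips(frames, w_length, delta_w):
    """Split a frame sequence into clips of w_length frames with stride delta_w.

    delta_w < w_length gives overlapping clips; delta_w == w_length gives
    non-overlapping clips. Frames at the end that cannot fill a whole clip
    are dropped (an assumption of this sketch).
    """
    if w_length < 1 or delta_w < 1:
        raise ValueError("w_length and delta_w must be positive")
    clips = []
    start = 0
    while start + w_length <= len(frames):
        clips.append(frames[start:start + w_length])
        start += delta_w
    return clips
```

With the validation and testing setting w_length = 4, ∆w = 4, a 10-frame sequence yields two non-overlapping clips (frames 0 to 3 and 4 to 7); a stride of 2 would instead give four overlapping clips.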
2.3. Adversarial training strategy
- The segmentor and critic layers are initialised using.
- Although the loss could diverge during training, hyperparameters can be introduced to balance the two terms of the loss function and avoid divergence; however, divergence never occurred in the authors' experiments.
- To experimentally evaluate their two research hypotheses, the authors collected a dataset of 20 fetoscopic videos acquired during 20 different surgical procedures for treating TTTS in 20 women.
- The membrane was manually annotated in each frame under the supervision of the surgeon.
- To the best of the authors' knowledge, this is the largest dataset currently available for inter-fetal membrane segmentation.
- Each frame was cropped to contain only the fetoscope field of view (FoV) and resized to 128×128 pixels, both to smooth noise and to limit memory usage.
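The preprocessing step can be sketched as cropping each frame to the bounding box of the FoV and then resizing it to 128×128. This sketch assumes a binary FoV mask is available for each frame (how the FoV was located is not described here), and all function names are hypothetical. It uses nearest-neighbour resampling only to stay dependency-free; nearest-neighbour does not smooth noise, so an averaging interpolation (e.g., area or bilinear) would be needed to obtain the smoothing effect mentioned in the text.

```python
def fov_bounding_box(fov_mask):
    """Inclusive (top, bottom, left, right) bounds of the nonzero FoV pixels."""
    rows = [i for i, row in enumerate(fov_mask) if any(row)]
    if not rows:
        raise ValueError("FoV mask is empty")
    cols = [j for j in range(len(fov_mask[0])) if any(row[j] for row in fov_mask)]
    return rows[0], rows[-1], cols[0], cols[-1]


def crop(image, box):
    """Crop a 2D list of pixels to an inclusive bounding box."""
    top, bottom, left, right = box
    return [row[left:right + 1] for row in image[top:bottom + 1]]


def resize_nearest(image, out_h, out_w):
    """Nearest-neighbour (index-flooring) resize of a 2D list to out_h x out_w."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[i * in_h // out_h][j * in_w // out_w] for j in range(out_w)]
        for i in range(out_h)
    ]


def preprocess(image, fov_mask, size=128):
    """Crop to the fetoscope FoV, then resize to size x size (128 x 128 in the text)."""
    return resize_nearest(crop(image, fov_bounding_box(fov_mask)), size, size)
```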
3.2. Parameter setting
- The authors used a clip length w_length = 4 frames, because the complexity of their framework required higher memory usage and computational power.
- Validation and testing temporal clips were built using the same parameters but with ∆w = 4 (i.e., without overlap).
- During training, at each iteration step, each batch was augmented with random rotation in the range (−25°, +25°), horizontal and vertical flips, and scaling with a scaling factor in the range (0.5, 1.5).
- Mann–Whitney–Wilcoxon tests on accuracy (Acc) and DSC, both with a significance level of 0.05, were used to assess whether significant differences existed between the tested architectures.
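The Mann–Whitney–Wilcoxon rank-sum test compares two independent sets of scores (for example, the DSC values of two architectures) without assuming normality. The sketch below computes the U statistic, assigning average ranks to ties, and a two-sided p-value from the large-sample normal approximation without tie or continuity correction. It illustrates the test and is not necessarily the variant the authors used; SciPy's `scipy.stats.mannwhitneyu` provides exact and corrected versions.

```python
import math


def ranks_with_ties(values):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """Return (U for sample a, two-sided p-value) via the normal approximation."""
    n1, n2 = len(a), len(b)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    ranks = ranks_with_ties(list(a) + list(b))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return u1, p
```

A difference is declared significant at the 0.05 level when the returned p-value is below 0.05.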
3.4. Ablation studies
- The authors compared the results of the proposed framework against those of the adversarial network presented in Casella et al. (2020), which is the closest work to theirs.
- Since a comprehensive comparison with standard state-of-the-art approaches (e.g., UNet (Ronneberger et al., 2015) and ResNet (He et al., 2016)) is already provided in Casella et al. (2020), the authors focused here on ablation studies.
- For E6, the lowest performance was obtained with ∆w = 4 (no overlap between temporal clips).
- Visual samples for the tested models are shown in Fig.
5. Discussion and conclusions
- This paper introduced a shape-constrained adversarial framework with instance-normalized spatio-temporal features to perform automatic inter-fetal membrane segmentation in fetoscopic video clips, while tackling the high illumination variability in fetoscopic videos.
- The authors noticed that 3D convolution alone was not able to boost segmentation consistency, as its results are comparable to those of the 2D vanilla adversarial framework (E3).
- In such cases, the temporal connectivity introduced to guarantee consistency across consecutive frames can negatively affect segmentation accuracy.
- To conclude, the achieved results suggest that the proposed approach may be effective in supporting surgeons in the identification of the inter-fetal membrane in fetoscopic videos.
- Data used for the analysis were acquired during actual surgical procedures and then anonymized to allow researchers to conduct the study.