Journal ArticleDOI

Inter-foetus Membrane Segmentation for TTTS Using Adversarial Networks

TL;DR: An adversarial network consisting of two Fully-Convolutional Neural Networks is proposed; it could be a valuable and robust solution to assist surgeons by providing membrane identification while performing fetoscopy.
Abstract: Twin-to-Twin Transfusion Syndrome is commonly treated with minimally invasive laser surgery in fetoscopy. The inter-foetal membrane is used as a reference to find abnormal anastomoses. Membrane identification is a challenging task due to small field of view of the camera, presence of amniotic liquid, foetus movement, illumination changes and noise. This paper aims at providing automatic and fast membrane segmentation in fetoscopic images. We implemented an adversarial network consisting of two Fully-Convolutional Neural Networks. The former (the segmentor) is a segmentation network inspired by U-Net and integrated with residual blocks, whereas the latter acts as critic and is made only of the encoding path of the segmentor. A dataset of 900 images acquired in 6 surgical cases was collected and labelled to validate the proposed approach. The adversarial networks achieved a median Dice similarity coefficient of 91.91% with Inter-Quartile Range (IQR) of 4.63%, overcoming approaches based on U-Net (82.98%-IQR: 14.41%) and U-Net with residual blocks (86.13%-IQR: 13.63%). Results proved that the proposed architecture could be a valuable and robust solution to assist surgeons in providing membrane identification while performing fetoscopic surgery.

Summary (3 min read)

1 Introduction

  • Twin-to-Twin Transfusion Syndrome (TTTS) is a pathology with deadly consequences that occurs in 15% of monochorionic pregnancies (75% of twin homozygous pregnancies) [3].
  • Fetoscopic Minimally Invasive Surgery (MIS) has largely decreased maternal and foetal morbidity or mortality [35], becoming the recommended technique for the first-line treatment of TTTS.
  • The surgery consists of a direct interruption of anastomoses that are responsible for TTTS via laser photo-coagulation.
  • The selection of the vessels to be treated relies on the location of abnormal vascular formations, at the small branches of normal blood vessels.
  • Additional challenges include large variability in the illumination level, which ranges from intense illumination (causing specular reflections) to dim lighting conditions.

2.1 SAN architecture

  • Similarly to the original generative framework, the Segmentation Adversarial Network (SAN) implemented in this work consists of two networks: the generator, which here acts as the segmentation network (S), and the discriminator, here the critic network (C). The two are alternately trained to minimise and maximise an objective function, respectively.
  • Figure 2 shows the overall diagram of the framework.
  • Considering the improvements in network training speed and performance reported in the literature [40], Leaky ReLU is chosen over the standard ReLU.
  • The last step of the encoding path is made of a convolution layer with ReLU activation.
  • The architecture of the C network (Table 2) contains the same encoding path as the segmentor network for feature extraction.
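
The two building blocks named above, Leaky ReLU activations and residual blocks, can be illustrated with a minimal sketch. It works on plain Python lists rather than image tensors, and the names `leaky_relu`, `residual_block` and the slope value `negative_slope=0.01` are assumptions made for illustration; the summary does not state the slope or the layer composition used by the authors.

```python
def relu(x):
    """Standard ReLU: negative inputs are clipped to zero."""
    return [max(0.0, v) for v in x]


def leaky_relu(x, negative_slope=0.01):
    """Leaky ReLU: negative inputs are scaled instead of zeroed.

    negative_slope=0.01 is an assumed value, not taken from the paper.
    """
    return [v if v > 0 else negative_slope * v for v in x]


def residual_block(x, transform, activation):
    """Generic residual block: activation(x + F(x)).

    `transform` stands in for the convolutional layers F; here it is
    any function mapping a list to a list of the same length.
    """
    fx = transform(x)
    return activation([a + b for a, b in zip(x, fx)])


def double(values):
    """Toy stand-in for the learned transform F."""
    return [2.0 * v for v in values]
```

With ReLU, a negative pre-activation yields a zero output and a zero gradient, whereas Leaky ReLU keeps a small non-zero response, which is the usual argument for preferring it.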

2.2 Training strategy

  • In the SAN framework there are two loss functions, one for the segmentor S and one for the critic C network.
  • The computation of the $L_{L1}$ loss term is based on differences in high-level features, extracted by the critic network, between the predicted and the true segmentation.
  • The $L_{L1}$ loss function forces the segmentor to learn both global and local features that capture long- and short-range spatial relationships between pixels.
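
A rough sketch of how the two losses could fit together is given below, assuming that the segmentor loss is the sum of a Dice-based term and an L1 term on critic features, and that the critic is trained on the negated L1 term. The exact weighting, the number of feature levels and the normalisation are not given in this summary, so the unweighted sum, the per-level mean and the function names are all assumptions. Feature maps are represented as flat Python lists, one list per critic level.

```python
def dice_loss(pred, target, eps=1e-7):
    """Soft Dice loss, 1 - DSC, on flat lists of values in [0, 1]."""
    intersection = sum(p * t for p, t in zip(pred, target))
    denominator = sum(pred) + sum(target)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)


def multiscale_l1(features_pred, features_true):
    """Mean absolute difference per feature level, averaged over levels.

    Each argument is a list of flat feature lists, one per critic level
    (hypothetical stand-ins for the critic's feature maps).
    """
    per_level = []
    for f_pred, f_true in zip(features_pred, features_true):
        diff = sum(abs(a - b) for a, b in zip(f_pred, f_true))
        per_level.append(diff / len(f_pred))
    return sum(per_level) / len(per_level)


def segmentor_loss(pred, target, features_pred, features_true):
    """Assumed form: Dice term plus critic-feature L1 term (minimised by S)."""
    return dice_loss(pred, target) + multiscale_l1(features_pred, features_true)


def critic_loss(features_pred, features_true):
    """Assumed form: the critic maximises the L1 term, i.e. minimises its negation."""
    return -multiscale_l1(features_pred, features_true)
```

Under these assumptions the segmentor is pushed to make the critic's features of its prediction match those of the ground truth, while the critic is pushed to find features on which they differ.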

3.1 Dataset

  • In order to train the proposed framework, the authors built a new dataset in collaboration with the Department of Foetal and Perinatal Medicine, Istituto Giannina Gaslini, Genoa.
  • The dataset consisted of 900 frames (frame size: 720 x 576 pixels) extracted from 6 videos (150 frames per video) of patients acquired during the normal surgical practice.
  • The authors randomly assembled a dataset acquired from patients who received TTTS laser treatments at the same hospital.
  • The followed procedures were in accordance with the image data collection and retrospec-.
  • The black borders surrounding the FoV do not bring any additional information to segment the membrane but increase the GPU-memory and computational-cost requirements during training.
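
Since the black borders carry no information, one plausible preprocessing step is to crop each frame to the bounding box of the field of view. The sketch below is an illustration only: the summary does not say how (or whether) the authors cropped the frames, and the intensity threshold of 10 and the function names are assumptions. A grey-scale frame is represented as a list of rows of pixel intensities.

```python
def fov_bounding_box(image, threshold=10):
    """Return (top, bottom, left, right) of the region brighter than threshold.

    `bottom` and `right` are exclusive. Returns None for an all-dark frame.
    threshold=10 is an assumed value for "black" on a 0-255 scale.
    """
    rows = [r for r, row in enumerate(image) if any(v > threshold for v in row)]
    width = len(image[0])
    cols = [c for c in range(width) if any(row[c] > threshold for row in image)]
    if not rows or not cols:
        return None
    return rows[0], rows[-1] + 1, cols[0], cols[-1] + 1


def crop(image, box):
    """Crop a frame (list of rows) to the given bounding box."""
    top, bottom, left, right = box
    return [row[left:right] for row in image[top:bottom]]
```

Cropping before training reduces the number of pixels per frame, which is where the saving in GPU memory and computation would come from.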

3.2 Training setting and Ablation Study

  • To limit memory requirements in the training phase while still promoting convergence of the gradient, SAN was trained with mini-batches (batch size = 30 frames), minimising $L_{SAN}$ (Eq. 5) with Adam [19].
  • To initialise the weights, the segmentor was first trained without the critic for the first 25 epochs.
  • The best model was selected as the one that maximised the DSC on the validation set.
  • The framework was originally proposed for skin lesion segmentation for ISBI International Skin Imaging Collaboration 2017 [41].
  • To evaluate inter-annotator variability the authors asked a second expert to annotate the fetoscopic video used as test set.
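
The training schedule described above (mini-batches of 30 frames, with the segmentor trained alone for the first 25 epochs) can be sketched as a simple planner. The zero-based epoch index, the critic-then-segmentor update order within a batch and the name `plan_epoch` are assumptions; the summary only states that the two networks are trained alternately.

```python
def plan_epoch(epoch, n_frames, batch_size=30, warmup_epochs=25):
    """List the mini-batches of one epoch and which networks are updated.

    Returns tuples (start, stop, updates) where frames[start:stop] form
    the batch. During the warm-up epochs only the segmentor is updated;
    afterwards critic and segmentor are updated in turn (assumed order).
    Epochs are counted from zero (an assumption).
    """
    if epoch < warmup_epochs:
        updates = ("segmentor",)
    else:
        updates = ("critic", "segmentor")
    steps = []
    for start in range(0, n_frames, batch_size):
        stop = min(start + batch_size, n_frames)
        steps.append((start, stop, updates))
    return steps
```

For example, `plan_epoch(0, 90)` yields three segmentor-only batches, while `plan_epoch(25, 90)` yields three batches in which both networks are updated.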

3.3 Performance metrics

  • The Lilliefors test was used to assess population normality on DSC.
  • The Kruskal-Wallis test on DSC and the Westenberg-Mood test on IQR, both with a significance level (p) of 0.05, were used to assess whether significant differences existed between the tested architectures.
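
The reported figures (median DSC and IQR) can be computed as in the following sketch, which uses only the Python standard library. Masks are flat lists of 0/1 labels. The convention that two empty masks score 1.0 and the choice of the "inclusive" quartile method are assumptions; the summary does not specify either.

```python
import statistics


def dice(pred, truth):
    """Dice similarity coefficient between two binary masks (flat lists)."""
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        return 1.0  # assumed convention when both masks are empty
    return 2.0 * intersection / total


def median_and_iqr(scores):
    """Median and inter-quartile range (Q3 - Q1) of per-image scores."""
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return statistics.median(scores), q3 - q1
```

In practice `dice` would be evaluated once per test frame and the resulting list passed to `median_and_iqr`, giving one median and one IQR per architecture.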

4 Results

  • The processing time of images in the test set was less than a millisecond, on average.
  • This performance confirms the compatibility with real-time applications of this approach.
  • In (i) all the networks achieved good results despite the presence of spots and specularities; (ii) all networks achieved good results despite the fact that the U-Net and the residual architecture produced some spots in the lower area where the texture could suggest the presence of the membrane.
  • This suggests that the critic network helps the segmentor network cope with poor-quality images (e.g., with laser pointer, light specularities, drops in light intensity, etc.).

5 Discussion

  • During TTTS surgery, the identification of the inter-foetal membrane helps the surgeon to remain oriented in the surgical site.
  • The complexity of the placental environment, especially in advanced pregnancies, makes this task very challenging also for expert clinicians when performing surgery.
  • The authors also compare this framework with state-of-the-art FCNNs for medical-image segmentation.
  • For this reason, some kinds of images (e.g., with a small portion of the membrane) are less numerous than others, limiting the network's learning capability.
  • Further improvements will deal with the exploitation of temporal features, as suggested in [43,6], considering that temporal information is naturally encoded in surgical videos.

5.1 Conclusion

  • The authors proposed an adversarial framework for accurate and fast inter-foetal membrane segmentation in fetoscopic MIS images, achieving a median DSC of 91.91% on a new dataset of 150 images from intraoperative TTTS surgery videos.
  • Figure caption: sample segmentation results on the test set using (second column) U-Net, (third column) U-Net with the residual implementation and (last column) the proposed SAN, alongside the manual expert-clinician ground truth (first column).
  • Each network was trained both with grey-scale and RGB fetoscopic images.
  • The green, grey and blue contours refer to the ground truth, grey-scale-based and RGB-based segmentation results, respectively.


Inter-Foetus Membrane Segmentation for TTTS using Adversarial Networks
Alessandro Casella 1,2, Sara Moccia 2,3, Emanuele Frontoni 3, Dario Paladini 4, Elena De Momi 1, Leonardo S. Mattos 2
affiliations:
1 Department of Electronics, Information and Bioengineering, Politecnico di Milano, Milan, Italy
2 Department of Advanced Robotics, Istituto Italiano di Tecnologia, Genoa, Italy
3 Department of Information Engineering, Università Politecnica delle Marche, Ancona, Italy
4 Department of Foetal and Perinatal Medicine, Istituto Giannina Gaslini, Genoa, Italy
These authors equally contributed to the work
abbreviated title: Adversarial Networks for Membrane Segmentation in TTTS
correspondence: Alessandro Casella, Department of Advanced Robotics, Istituto Italiano di Tecnologia, Genoa, Italy. e-mail: alessandro.casella@polimi.it

Abstract
Twin-to-Twin Transfusion Syndrome (TTTS) is commonly treated with minimally
invasive laser surgery in fetoscopy. The inter-foetal membrane is used as a reference to
find abnormal anastomoses. Membrane identification is a challenging task due to small
field of view of the camera, presence of amniotic liquid, foetus movement, illumination
changes and noise. This paper aims at providing automatic and fast membrane
segmentation in fetoscopic images. We implemented an adversarial network consisting of
two Fully-Convolutional Neural Networks (FCNNs). The former (the segmentor) is a
segmentation network inspired by U-Net and integrated with residual blocks, whereas
the latter acts as critic and is made only of the encoding path of the segmentor.
A dataset of 900 images acquired in 6 surgical cases was collected and labelled to
validate the proposed approach.
The adversarial networks achieved a median Dice similarity coefficient of 91.91%
with Inter-Quartile Range (IQR) of 4.63%, overcoming approaches based on U-Net
(82.98% - IQR : 14.41%) and U-Net with residual blocks (86.13% - IQR : 13.63%).
Results proved that the proposed architecture could be a valuable and robust solution
to assist surgeons in providing membrane identification while performing fetoscopic
surgery.
keywords: Deep learning, adversarial networks, fetoscopy, intraoperative-image segmentation

1 Introduction
Twin-to-Twin Transfusion Syndrome (TTTS) is a pathology with deadly consequences that occurs in 15% of monochorionic pregnancies (75% of twin homozygous pregnancies) [3]. The aetiology of TTTS is correlated to the anomalous presence of unidirectional inter-placental anastomoses, which cause an imbalance in the blood flow between the foetuses. The risk of perinatal mortality of one or both foetuses can exceed 90% without any treatment, with an incidence of physical or neurological complications in 50% of the surviving foetuses [31,32]. Fetoscopic Minimally Invasive Surgery (MIS) has largely decreased maternal and foetal morbidity or mortality [35], becoming the recommended technique for the first-line treatment of TTTS. The surgery consists of a direct interruption of the anastomoses that are responsible for TTTS via laser photo-coagulation. The procedure is performed using a fetoscope and a fibre laser for ablation, which is driven through a small working channel in the fetoscope. Fetoscopic MIS is performed in a selective way (i.e., only communicating vessels among the foetuses should be coagulated, preserving all the others).
The selection of the vessels to be treated relies on the location of abnormal vascular formations, at the small branches of normal blood vessels. The first step for the surgeon, to find these abnormal vessels, is a visual inspection of the entire foetal environment (most of the time randomly moving the fetoscope) until he/she locates the inter-foetus membrane, which is used as a reference for the navigation of the vascular network. However, as described in the clinical literature [29,37,25], the identification of the membrane is a challenging task since the surgeon's ability to maintain orientation is hampered by several factors: (i) there is a limited Field of View (FoV) on the surgical scene, constraining the surgeon to view only a small portion of the placental surface [17,8]; (ii) the fetoscope often goes out of focus due to dynamic changes in the foetal environment; (iii) foetuses can unpredictably move and often occlude the camera FoV, hiding the membrane; (iv) the surgical environment is immersed in the amniotic fluid, which is turbid. Additional challenges include large variability in the illumination level, which ranges from intense illumination (causing specular reflections) to dim lighting conditions. Some visual examples to highlight the challenges in identifying the membrane are shown in Fig. 1.
[Figure 1 about here.]
Computer-assisted solutions may be used to identify and segment the membrane in order to support surgeons during TTTS surgery. Such solutions may tackle the complexity of intraoperative images through learning-based segmentation, as highlighted by a review on medical-image segmentation [20]. Recently, researchers in other medical fields (e.g., skin lesion segmentation) have shown the potential of adversarial training for further increasing segmentation performance [42].
Inspired by such considerations, this paper proposes a framework based on adversarial networks for the segmentation of the inter-foetal membrane from in-vivo fetoscopy images acquired during TTTS MIS. The additional $L_1$ loss term computed by the critic network realises a multi-scale feature analysis during training, preserving high-level features connected to macro appearance. The proposed framework aims at supporting clinicians by automatically detecting the membrane on the fetoscope video stream and highlighting its borders. The integration of this framework in a smart fetoscope system may lead to a decrease in surgeon mental workload during the surgery, possibly reducing the duration of the intervention.
The paper is organised as follows: Sec. 1.1 surveys intraoperative medical image segmentation strategies, with a focus on learning algorithms and adversarial training; Sec. 2 presents the proposed segmentation framework and describes the experimental protocol for validating it. The obtained results are presented in Sec. 4 and discussed in Sec. 5. Concluding remarks are presented in Sec. 5.1.
1.1 Related work on intraoperative tissue segmentation
[Figure 2 about here.]
In the past, intraoperative tissue segmentation approaches mostly dealt with filtering or deformable models. For instance, in [24] steerable filters and textural descriptors were used for gastric-lesion segmentation in capsule endoscopy. To tackle some of the limitations of these approaches (e.g., the need for parameter tuning and long processing time), supervised machine learning algorithms have been proposed to provide fast and accurate segmentation [36]. Supervised machine learning addresses segmentation as a two-step problem: first, image features are extracted (e.g., intensity and textural features), then such features are classified (e.g., with support vector machines (SVMs) and decision trees). Applications include uterus segmentation from endoscopic images, where Gabor filtering is used for feature extraction [5]; other examples are segmentation of Fallopian tubes from endoscopic images, obtained using tube-specific geometrical features [30], and segmentation of abdominal organs with textural features [27]. More recently, Fully-Convolutional Neural Networks (FCNNs) have emerged as a powerful supervised-learning tool for many visual recognition tasks, such as segmentation of complex scenes from in-vivo endoscopic images. FCNNs allow for accurate segmentation when large annotated training datasets are available. The first FCNN layers are responsible for automatic image-feature extraction, while the last layer classifies the features and provides the segmentation mask [14]. After their first implementation [21], FCNNs have been deployed in a variety of architectures, such as U-Net [33], SegNet [2] and residual architectures [9] (mainly based on the residual blocks proposed in ResNet architectures [16]). In [39,13], SegNet is used for polyp segmentation. In both cases, SegNet is pre-trained on the ImageNet dataset [7] and then fine-tuned to address the segmentation task. Similarly, in [4] several state-of-the-art FCNNs (i.e., AlexNet, GoogLeNet, VGG and residual network) are pre-trained on the PASCAL VOC [11] and fine-tuned for polyp segmentation.
FCNNs are trained by minimising an error metric between the ground-truth and the predicted segmentation. This error metric is commonly computed by measuring the overlap or by comparing the pixel-probability distributions between the ground-truth and the predicted segmentation [14]. Following a different perspective, researchers have recently investigated the use of adversarial training. Adversarial training was initially proposed by Goodfellow et al. [15] as a generative framework for natural images (i.e., in the context of Generative Adversarial Networks (GANs)) made of a generator and a discriminator network [15]. This framework


References
Proceedings ArticleDOI
27 Jun 2016
TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won the 1st place on the ILSVRC 2015 classification task.
Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers—8× deeper than VGG nets [40] but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions1, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.

123,388 citations

Proceedings Article
01 Jan 2015
TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
Abstract: We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method is straightforward to implement, is computationally efficient, has little memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms, on which Adam was inspired, are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods. Finally, we discuss AdaMax, a variant of Adam based on the infinity norm.

111,197 citations

Proceedings ArticleDOI
Jia Deng1, Wei Dong1, Richard Socher1, Li-Jia Li1, Kai Li1, Li Fei-Fei1 
20 Jun 2009
TL;DR: A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.
Abstract: The explosion of image data on the Internet has the potential to foster more sophisticated and robust models and algorithms to index, retrieve, organize and interact with images and multimedia data. But exactly how such data can be harnessed and organized remains a critical problem. We introduce here a new database called “ImageNet”, a large-scale ontology of images built upon the backbone of the WordNet structure. ImageNet aims to populate the majority of the 80,000 synsets of WordNet with an average of 500-1000 clean and full resolution images. This will result in tens of millions of annotated images organized by the semantic hierarchy of WordNet. This paper offers a detailed analysis of ImageNet in its current state: 12 subtrees with 5247 synsets and 3.2 million images in total. We show that ImageNet is much larger in scale and diversity and much more accurate than the current image datasets. Constructing such a large-scale database is a challenging task. We describe the data collection scheme with Amazon Mechanical Turk. Lastly, we illustrate the usefulness of ImageNet through three simple applications in object recognition, image classification and automatic object clustering. We hope that the scale, accuracy, diversity and hierarchical structure of ImageNet can offer unparalleled opportunities to researchers in the computer vision community and beyond.

49,639 citations

Book ChapterDOI
05 Oct 2015
TL;DR: Ronneberger et al. proposed a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently; the network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks.
Abstract: There is large consent that successful training of deep networks requires many thousand annotated training samples. In this paper, we present a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently. The architecture consists of a contracting path to capture context and a symmetric expanding path that enables precise localization. We show that such a network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks. Using the same network trained on transmitted light microscopy images (phase contrast and DIC) we won the ISBI cell tracking challenge 2015 in these categories by a large margin. Moreover, the network is fast. Segmentation of a 512x512 image takes less than a second on a recent GPU. The full implementation (based on Caffe) and the trained networks are available at http://lmb.informatik.uni-freiburg.de/people/ronneber/u-net .

49,590 citations

Journal ArticleDOI
08 Dec 2014
TL;DR: A new framework for estimating generative models via an adversarial process, in which two models are simultaneously trained: a generative model G that captures the data distribution and a discriminative model D that estimates the probability that a sample came from the training data rather than G.
Abstract: We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making a mistake. This framework corresponds to a minimax two-player game. In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to ½ everywhere. In the case where G and D are defined by multilayer perceptrons, the entire system can be trained with backpropagation. There is no need for any Markov chains or unrolled approximate inference networks during either training or generation of samples. Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of the generated samples.

38,211 citations

Frequently Asked Questions (18)
Q1. What have the authors contributed in "Inter-foetus membrane segmentation for ttts using adversarial networks" ?

This paper aims at providing automatic and fast membrane segmentation in fetoscopic images. 

This problem will be addressed in the future by investigating extensions of this framework supported by a broader dataset and more advanced data augmentation techniques. Further improvements will deal with the exploitation of temporal features, as suggested in [43,6], considering that the temporal information is naturally encoded in the surgical videos.

The Kruskal-Wallis on DSC and Westenberg-Mood test on IQR, both imposing a significance level (p) equal to 0.05, were used to assess whether or not remarkable differences existed between the tested architectures. 

The authors asked the second expert to annotate only the test set (150 frames) due to the high time demand needed to perform manual annotation.

The risk of perinatal mortality of one or both foetuses can exceed 90% without any treatment, with an incidence of physical or neurological complications in 50% of the surviving foetuses [31,32].

Frame selection strategies [26] could be exploited too, so as to avoid the processing of uninformative (e.g., blurred) video portions.

Each step of the decoder is made of a strided deconvolution layer with BN and a ReLU activation layer followed by a residual block. 

Adversarial training was initially proposed by Goodfellow et al. [15] as a generative framework for natural images (i.e., in the context of Generative Adversarial Networks (GANs)) made of a generator and a discriminator network [15].

In (iv), the presence of low contrast and laser light compromises the detection of the membrane in the U-Net and residual networks, while the proposed framework produces a good segmentation.

In this paper, the authors proposed an adversarial framework for accurate and fast inter-foetal membrane segmentation in fetoscopic MIS images achieving a median DSC of 91.91% on a new dataset of 150 images from intraoperative TTTS surgery videos. 

The architecture of the segmentor network S (Table 1) is based on the U-Net [33] encoder-decoder structure, a fully convolutional network that naturally performs overlap-tile extraction, preserving spatial connectivity between tiles while speeding up network training.

The proposed approach may also be integrated with recent work, which deals with vessel segmentation from placenta images [1,34], stitching of fetoscopy images to build a placental panoramic image [12,44] and classification of TTTS surgical phases [38].

The segmentor loss ($L^{S}_{SAN}$) (Eq. 5) in the proposed framework consists of two terms: a common overlap metric based on the Dice similarity coefficient ($L_{DSC}$) and an additional term derived from the critic ($L_{L1}$).

The high level of noise, the blurred vision due to amniotic fluid with suspended particulate matter, the wide range of illumination and the variation of the fetoscope pose to the recorded tissues further increase the complexity of the structures segmentation. 

Despite their efforts, due to the limited amount of available videos and the complexity of the task, the dataset size remains a strong limitation of this study. 

The achievement of such a large dataset, as recommended to avoid overfitting, was difficult because: (i) data manual annotation is a complex and time-consuming task, (ii) the data availability is limited, since TTTS is a rare pathology. 

To tackle some of the limitations of these approaches (e.g., the need for parameter tuning and long processing time), supervised machine learning algorithms have been proposed to provide fast and accurate segmentation [36].

An ablation study was performed, showing that the S network with 5 encoding-decoding layers was the best combination between segmentation performance and robustness.