Journal ArticleDOI

A shape-constraint adversarial framework with instance-normalized spatio-temporal features for inter-fetal membrane segmentation

TL;DR: In this paper, a new deep-learning framework for inter-fetal membrane segmentation on in-vivo fetoscopic videos is presented. It enhances existing architectures by (i) encoding a novel instance-normalized dense block, invariant to illumination changes, that extracts spatio-temporal features to enforce pixel connectivity in time, and (ii) relying on adversarial training, which constrains the macro appearance of the segmentation.
About: This article is published in Medical Image Analysis. The article was published on 2021-02-19 and is currently open access. It has received 9 citations to date.

Summary (3 min read)

1. Introduction

  • Twin-to-twin transfusion syndrome (TTTS) may occur, during identical twin pregnancies, when abnormal vascular anastomoses in the monochorionic placenta result in uneven blood flow between the fetuses.
  • At the beginning of the surgical treatment, the surgeon identifies the interfetal membrane, which is used as a reference to explore the placenta vascular network and identify vessels to be treated.
  • As for placental vessel segmentation, the work in Almoussa et al. (2011) proposes a neural network trained on manually handcrafted features from ex-vivo placenta images.
  • The instance-normalized topology can tackle the illumination variability typical of fetoscopic videos acquired during TTTS surgery.
  • The spatio-temporal features can boost segmentation performance by enforcing the consistency of segmentation masks across sequential frames.
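To illustrate why instance normalization helps against illumination changes, the following minimal pure-Python sketch (function names are illustrative, not the paper's) standardizes each channel of a feature map over its spatial positions; a uniformly brightened frame then normalizes to the same values as the original:

```python
from statistics import mean, pstdev

def instance_norm(feature_map, eps=1e-5):
    # feature_map: list of channels, each a 2-D list (H x W) of floats.
    # Instance normalization standardizes every channel of every sample
    # independently, using statistics computed over spatial positions only,
    # which cancels per-frame illumination (brightness/contrast) shifts.
    out = []
    for channel in feature_map:
        pixels = [p for row in channel for p in row]
        mu, sigma = mean(pixels), pstdev(pixels)
        out.append([[(p - mu) / (sigma + eps) for p in row] for row in channel])
    return out

# A brighter copy of the same pattern normalizes to the same values:
base = instance_norm([[[1.0, 2.0], [3.0, 4.0]]])[0]
bright = instance_norm([[[11.0, 12.0], [13.0, 14.0]]])[0]  # +10 offset
assert all(abs(a - b) < 1e-6
           for ra, rb in zip(base, bright) for a, b in zip(ra, rb))
```

Because the per-channel mean is subtracted before scaling, an additive illumination offset disappears entirely, which is the invariance the instance-normalized topology exploits.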

1.1. Contribution of the work

  • The authors address the problem of automatic inter-fetal membrane segmentation to enhance surgeon context awareness during TTTS surgery.
  • Specifically, the authors extend the adversarial framework presented in Casella et al. (2020) to process, via spatio-temporal convolution, surgical video clips.
  • This allows the authors to exploit the temporal information naturally encoded in videos.
  • The authors further design a dense block that encodes instance normalization, to account for illumination changes in the video clips.
  • The authors will make the dataset collected for this work publicly available, to foster further research in the field.

2. Methods

  • The proposed framework consists of the segmentor, described in Sec. 2.1, and a discriminator network (critic), described in Sec. 2.2.
  • The segmentor and critic are trained in an adversarial fashion, following the strategy proposed in Casella et al. (2020) and described in Sec. 2.3.

2.1. Segmentor

  • The segmentor has a dense UNet-like architecture consisting of a downsampling and an upsampling path, linked via long skip connections.
  • This process is repeated while frames are available, resulting in a collection of temporal clips.
  • Each dense block is followed by a transition down module for downscaling.
  • Building upon the dense module proposed in Huang et al. (2017), the authors propose a new dense module that uses two (Leaky ReLU) pre-activated convolutions instead of a single one.
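The dense-layer variant described above can be sketched in 1-D (illustrative only: the paper's blocks use 2-D/3-D convolutions plus instance normalization, and all names here are hypothetical):

```python
def leaky_relu(x, slope=0.01):
    # Leaky ReLU pre-activation: applied *before* each convolution.
    return x if x >= 0 else slope * x

def conv1d_valid(signal, kernel):
    # Plain 'valid' 1-D convolution, enough to illustrate the wiring.
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def dense_layer(features, kernel1, kernel2):
    # Two pre-activated convolutions instead of the single one in the
    # original dense layer of Huang et al. (2017).
    h = conv1d_valid([leaky_relu(v) for v in features], kernel1)
    h = conv1d_valid([leaky_relu(v) for v in h], kernel2)
    # Dense connectivity: the layer's output is concatenated with its
    # input, so later layers see the features of all preceding layers.
    return features + h
```

The concatenation at the end is what makes the block "dense"; the two pre-activations are the modification claimed by the authors.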

2.2. Critic

  • It is composed of two branches, as described in Table 1 and shown in Fig. 2, for extracting features from both the gold-standard segmentation and the segmentor output.
  • The authors decided to keep the critic architecture similar to its original implementation because the role of the critic is to provide a shape constraining mechanism for the segmentor output.
  • The use of dense blocks would have introduced unnecessary complexity with an increase in memory requirements.
  • The segmentor branch takes as input the frame x masked by the segmentor output S(x).
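The masking step can be sketched as an element-wise product (a minimal sketch; the function name is hypothetical):

```python
def mask_input(frame, soft_mask):
    # Element-wise product x * S(x): the critic's segmentor branch receives
    # the input frame weighted by the predicted (soft) segmentation mask,
    # so its features come only from regions the segmentor marks as membrane.
    return [[p * m for p, m in zip(row_f, row_m)]
            for row_f, row_m in zip(frame, soft_mask)]
```

For example, `mask_input([[2.0, 4.0]], [[0.5, 0.0]])` keeps half the first pixel and suppresses the second, returning `[[1.0, 0.0]]`.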

2.3. Adversarial training strategy

  • The segmentor and critic layers are initialised using.
  • While there is a possible risk of the loss diverging during training, introducing hyperparameters allows the two terms of the loss function to be balanced, avoiding possible divergences. However, this never occurred in the authors' experiments.
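The balancing the authors mention can be sketched as a weighted two-term objective (a hedged sketch: alpha and beta are hypothetical weight names, not the paper's notation):

```python
def total_loss(seg_loss, adv_loss, alpha=1.0, beta=1.0):
    # alpha weighs the pixel-wise segmentation term and beta the adversarial
    # (shape-constraint) term; tuning them keeps either term from dominating
    # and driving the training loss to diverge.
    return alpha * seg_loss + beta * adv_loss
```

In an adversarial scheme like this one, the segmentor minimizes the combined objective while the critic is updated in alternation to sharpen the shape constraint.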

3.1. Dataset

  • To experimentally evaluate their two research hypotheses, the authors collected a dataset of 20 fetoscopic videos acquired during 20 different surgical procedures for treating TTTS in 20 women.
  • The membrane was manually annotated in each frame under the supervision of the surgeon.
  • This dataset is, to the best of the authors' knowledge, the largest currently available for inter-fetal membrane segmentation.
  • Each frame was cropped to contain only the FoV of the fetoscope and resized to 128×128 pixels, both to smooth noise and to limit memory usage.
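The resize step can be sketched with nearest-neighbour sampling (illustrative only: the paper does not state the interpolation used, and real pipelines typically prefer area or bilinear interpolation, which also averages out noise):

```python
def resize_nearest(img, out_h=128, out_w=128):
    # Nearest-neighbour rescaling: each output pixel samples the source
    # pixel whose coordinates map onto it. Downscaling to a fixed 128x128
    # grid bounds the memory footprint of the network input.
    h, w = len(img), len(img[0])
    return [[img[(y * h) // out_h][(x * w) // out_w] for x in range(out_w)]
            for y in range(out_h)]
```

For example, upscaling a 2×2 image to 4×4 simply repeats each source pixel in a 2×2 block.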

3.2. Parameter setting

  • The authors used w_length = 4 due to the higher complexity of their framework, which required higher memory usage and computational power.
  • Validation and testing temporal clips were built using the same parameters but with ∆w = 4 (i.e., without overlap).
  • During training, at each iteration step, each batch was augmented with random rotation in the range (−25°, +25°), horizontal and vertical flips, and scaling with a factor in the range (0.5, 1.5).
  • The Mann–Whitney–Wilcoxon test on Acc and DSC, imposing a significance level (p) of 0.05, was used to assess whether remarkable differences existed between the tested architectures.
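The temporal-clip construction described above can be sketched as a sliding window over frame indices (hypothetical names: w_length is the clip length and delta_w the stride between clip starts):

```python
def build_clips(num_frames, w_length=4, delta_w=1):
    # Each clip collects w_length consecutive frame indices; consecutive
    # clips start delta_w frames apart. delta_w < w_length yields
    # overlapping training clips, while delta_w = w_length (as used for
    # validation and testing) yields non-overlapping clips.
    return [list(range(start, start + w_length))
            for start in range(0, num_frames - w_length + 1, delta_w)]
```

For example, `build_clips(8, 4, 4)` returns the two non-overlapping clips `[[0, 1, 2, 3], [4, 5, 6, 7]]`, whereas `build_clips(8, 4, 1)` returns five overlapping ones.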

3.4. Ablation studies

  • The authors compared the results of the proposed framework against those of the adversarial network presented in Casella et al. (2020), which is the closest work to theirs.
  • Considering that a comprehensive comparison with standard state-of-the-art approaches (e.g., UNet (Ronneberger et al., 2015) and ResNet (He et al., 2016)) is already provided in Casella et al. (2020), the authors here focused on the ablation studies.
  • For E6, the lowest performance was the one with ∆w = 4 (no overlap between temporal clips).
  • Visual samples for the tested models are shown in Fig.

5. Discussion and conclusions

  • This paper introduced a shape-constrained adversarial framework with instance-normalized spatio-temporal features to perform automatic inter-fetal membrane segmentation in fetoscopic video clips, while tackling the high illumination variability in fetoscopic videos.
  • The authors noticed that 3D convolution alone was not able to boost segmentation consistency, as the results are comparable with the 2D vanilla adversarial framework (E3 ).
  • In such cases, the temporal connectivity introduced to guarantee consistency across consecutive frames can affect the accuracy of segmentation negatively.
  • To conclude, the achieved results suggest that the proposed approach may be effective in supporting surgeons in the identification of the inter-fetal membrane in fetoscopic videos.
  • Data used for the analysis were acquired during actual surgery procedures and then were anonymized to allow researchers to conduct the study.


Figures (14)
Citations
Journal ArticleDOI
TL;DR: A detailed survey of the most recent work in the field can be found in this paper, covering a total of 145 research papers published after 2017; each paper is analyzed and commented on from both the methodology and the application perspective.

15 citations

Journal ArticleDOI
TL;DR: A literature search on the use of AI in the diagnosis of NEC yielded 118 publications, reduced to 8 after screening and eligibility checks; most publications showed promising results, but none demonstrated evident clinical benefits.

3 citations

Journal ArticleDOI
TL;DR: Wang et al., as discussed by the authors, used a Mask R-CNN with two additional transposed layers at the segmentation head to accurately segment the median nerve directly on transverse US images, achieving good performance in both median nerve detection and segmentation in terms of Precision (Prec), Recall (Rec), mean Average Precision (mAP), and Dice Similarity Coefficient (DSC).
Abstract: Ultrasound (US) imaging is recognized as a useful support for Carpal Tunnel Syndrome (CTS) assessment through the evaluation of median nerve morphology. However, US is still far from being systematically adopted to evaluate this common entrapment neuropathy, due to US intrinsic challenges, such as its operator dependency and the lack of standard protocols. To support sonographers, the present study proposes a fully-automatic deep learning approach to median nerve segmentation from US images. We collected and annotated a dataset of 246 images acquired in clinical practice involving 103 rheumatic patients, regardless of anatomical variants (bifid nerve, closed vessels). We developed a Mask R-CNN with two additional transposed layers at the segmentation head to accurately segment the median nerve directly on transverse US images. We calculated the cross-sectional area (CSA) of the predicted median nerve. The proposed model achieved good performance both in median nerve detection and segmentation: Precision (Prec), Recall (Rec), Mean Average Precision (mAP) and Dice Similarity Coefficient (DSC) values are 0.916 ± 0.245, 0.938 ± 0.233, 0.936 ± 0.235 and 0.868 ± 0.201, respectively. The CSA values measured on true positive predictions were comparable with the sonographer manual measurements with a mean absolute error (MAE) of 0.918 mm2. Experimental results showed the potential of the proposed model, which identified and segmented the median nerve section in normal anatomy images, while still struggling when dealing with infrequent anatomical variants. Future research will expand the dataset including a wider spectrum of normal anatomy and pathology to support sonographers in daily practice.

2 citations

Journal ArticleDOI
TL;DR: In this paper , a fully-unsupervised approach for binary Surgical Instrument Segmentation is proposed, which uses shape-priors as realistic segmentation masks of the instruments, not necessarily coming from the same dataset/domain as the videos.

2 citations

Journal ArticleDOI
TL;DR: This review uncovers the literature on computer‐assisted software solutions focused on TTTS and evaluates the current maturity of technologies by the technology readiness level and enumerates the necessary aspects to bring these new technologies to clinical practice.
Abstract: Fetal laser surgery has emerged as the preferred treatment of twin‐to‐twin transfusion syndrome (TTTS). However, the limited field of view of the fetoscope and the complexity of the procedure make the treatment challenging. Therefore, preoperative planning and intraoperative guidance solutions have been proposed to cope with these challenges. This review uncovers the literature on computer‐assisted software solutions focused on TTTS. These solutions are classified by the pre‐ or intraoperative phase of the procedure and further categorized by discussed hardware and software approaches. In addition, it evaluates the current maturity of technologies by the technology readiness level and enumerates the necessary aspects to bring these new technologies to clinical practice.

1 citations

References
Proceedings ArticleDOI
27 Jun 2016
TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won the 1st place on the ILSVRC 2015 classification task.
Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers—8× deeper than VGG nets [40] but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions1, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.

123,388 citations

Book ChapterDOI
05 Oct 2015
TL;DR: Ronneberger et al., as discussed by the authors, proposed a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently; the network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopy stacks.
Abstract: There is large consent that successful training of deep networks requires many thousand annotated training samples. In this paper, we present a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently. The architecture consists of a contracting path to capture context and a symmetric expanding path that enables precise localization. We show that such a network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks. Using the same network trained on transmitted light microscopy images (phase contrast and DIC) we won the ISBI cell tracking challenge 2015 in these categories by a large margin. Moreover, the network is fast. Segmentation of a 512x512 image takes less than a second on a recent GPU. The full implementation (based on Caffe) and the trained networks are available at http://lmb.informatik.uni-freiburg.de/people/ronneber/u-net .

49,590 citations

Proceedings ArticleDOI
21 Jul 2017
TL;DR: DenseNet as mentioned in this paper proposes to connect each layer to every other layer in a feed-forward fashion, which can alleviate the vanishing gradient problem, strengthen feature propagation, encourage feature reuse, and substantially reduce the number of parameters.
Abstract: Recent work has shown that convolutional networks can be substantially deeper, more accurate, and efficient to train if they contain shorter connections between layers close to the input and those close to the output. In this paper, we embrace this observation and introduce the Dense Convolutional Network (DenseNet), which connects each layer to every other layer in a feed-forward fashion. Whereas traditional convolutional networks with L layers have L connections—one between each layer and its subsequent layer—our network has L(L+1)/2 direct connections. For each layer, the feature-maps of all preceding layers are used as inputs, and its own feature-maps are used as inputs into all subsequent layers. DenseNets have several compelling advantages: they alleviate the vanishing-gradient problem, strengthen feature propagation, encourage feature reuse, and substantially reduce the number of parameters. We evaluate our proposed architecture on four highly competitive object recognition benchmark tasks (CIFAR-10, CIFAR-100, SVHN, and ImageNet). DenseNets obtain significant improvements over the state-of-the-art on most of them, whilst requiring less memory and computation to achieve high performance. Code and pre-trained models are available at https://github.com/liuzhuang13/DenseNet.

27,821 citations

Posted Content
TL;DR: This work proposes a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit and derives a robust initialization method that particularly considers the rectifier nonlinearities.
Abstract: Rectified activation units (rectifiers) are essential for state-of-the-art neural networks. In this work, we study rectifier neural networks for image classification from two aspects. First, we propose a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit. PReLU improves model fitting with nearly zero extra computational cost and little overfitting risk. Second, we derive a robust initialization method that particularly considers the rectifier nonlinearities. This method enables us to train extremely deep rectified models directly from scratch and to investigate deeper or wider network architectures. Based on our PReLU networks (PReLU-nets), we achieve 4.94% top-5 test error on the ImageNet 2012 classification dataset. This is a 26% relative improvement over the ILSVRC 2014 winner (GoogLeNet, 6.66%). To our knowledge, our result is the first to surpass human-level performance (5.1%, Russakovsky et al.) on this visual recognition challenge.

11,866 citations

Posted Content
TL;DR: A small change in the stylization architecture results in a significant qualitative improvement in the generated images, and can be used to train high-performance architectures for real-time image generation.
Abstract: In this paper we revisit the fast stylization method introduced in Ulyanov et al. (2016). We show how a small change in the stylization architecture results in a significant qualitative improvement in the generated images. The change is limited to swapping batch normalization with instance normalization, and applying the latter both at training and testing times. The resulting method can be used to train high-performance architectures for real-time image generation. The code is made available on github at this https URL. Full paper can be found at arXiv:1701.02096.

3,118 citations