Proceedings ArticleDOI

Instance Tumor Segmentation using Multitask Convolutional Neural Network

08 Jul 2018 - pp 1-8
TL;DR: Introduces a new technique for tumor detection that applies the Hough transform to high-level features extracted by convolutional neural networks (CNNs).
Abstract: Automatic tumor segmentation is an important and challenging clinical task because tumors vary in size, shape, contrast, and location. In this paper, we present an automatic instance semantic segmentation method based on deep neural networks (DNNs). The proposed networks are tailored to tumors pictured in magnetic resonance imaging (MRI) and computed tomography (CT) images. We present an end-to-end multitask learning architecture comprising three stages: detection, segmentation, and classification. This paper introduces a new technique for tumor detection that applies the Hough transform to high-level features extracted by convolutional neural networks (CNNs). The detected tumor(s) are segmented with a set of fully connected (FC) layers, and the segmented mask is classified through FC layers. The proposed architecture gives promising results on popular medical image benchmarks. Our framework is generalized in the sense that it can be used on different types of medical images of varied sizes, such as the Liver Tumor Segmentation (LiTS-2017) challenge and the Brain Tumor Segmentation (BraTS-2016) benchmark.
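The abstract describes a three-stage multitask pipeline (detection, segmentation, classification) over shared CNN features. As a rough illustration only, the PyTorch sketch below wires a shared backbone into three heads; all module names and sizes are hypothetical, and the paper's Hough-transform detection step is stood in for by a plain convolutional head.

```python
import torch
import torch.nn as nn

class MultitaskTumorNet(nn.Module):
    """Hypothetical sketch of a detect -> segment -> classify pipeline."""
    def __init__(self, in_ch=1, num_classes=2):
        super().__init__()
        # Shared CNN backbone extracting high-level features.
        self.backbone = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        # Detection head (stand-in for the paper's Hough-transform step).
        self.detect = nn.Conv2d(64, 1, 1)
        # Segmentation head producing a tumor mask.
        self.segment = nn.Conv2d(64, 1, 1)
        # Classification head applied to the masked features.
        self.classify = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, num_classes)
        )

    def forward(self, x):
        feats = self.backbone(x)
        heat = torch.sigmoid(self.detect(feats))   # per-pixel tumor evidence
        mask = torch.sigmoid(self.segment(feats))  # tumor mask
        logits = self.classify(feats * mask)       # classify the masked region
        return heat, mask, logits
```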
Citations
Book ChapterDOI
16 Sep 2018
TL;DR: A new adversarial network, named voxel-GAN, is proposed to mitigate the imbalanced-data problem in brain tumor semantic segmentation, where the majority of voxels belong to the healthy region and few to the tumor (non-healthy) region.
Abstract: We propose a new adversarial network, named voxel-GAN, to mitigate the imbalanced-data problem in brain tumor semantic segmentation, where the majority of voxels belong to the healthy region and few to the tumor (non-healthy) region. We introduce a 3D conditional generative adversarial network (cGAN) comprising two components: a segmentor and a discriminator. The segmentor is trained on 3D brain MR or CT images to learn segmentation labels at the voxel level, while the discriminator is trained to distinguish whether a segmentor output comes from the ground truth or was generated artificially. The segmentor and discriminator networks are trained simultaneously with a new weighted adversarial loss to mitigate the imbalanced-training-data issue. We show evidence that the proposed framework is applicable to different types of brain images of varied sizes. In our experiments on the BraTS-2018 and ISLES-2018 benchmarks, we find improved results, demonstrating the efficacy of our approach.
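The weighting scheme of voxel-GAN's adversarial loss is not specified in the abstract; the sketch below shows one plausible form, combining an inverse-class-frequency-weighted voxel loss with a standard non-saturating adversarial term. The weighting and the function name are assumptions, not the paper's definition.

```python
import torch
import torch.nn.functional as F

def weighted_adversarial_loss(seg_logits, target, disc_fake, lam=1.0):
    """Hypothetical weighted loss: a class-frequency-weighted voxel loss
    plus an adversarial term rewarding masks the discriminator scores as
    real. The weighting scheme is an assumption, not the paper's."""
    # Up-weight the rare tumor voxels by inverse class frequency.
    pos_frac = target.float().mean().clamp(min=1e-6)
    pos_weight = (1.0 - pos_frac) / pos_frac
    seg_loss = F.binary_cross_entropy_with_logits(
        seg_logits, target.float(), pos_weight=pos_weight)
    # Non-saturating GAN loss on the discriminator's output for fakes.
    adv_loss = F.binary_cross_entropy_with_logits(
        disc_fake, torch.ones_like(disc_fake))
    return seg_loss + lam * adv_loss
```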

20 citations


Cites background from "Instance Tumor Segmentation using M..."

  • ...Examples are cascade training [8,33,36], training with a cost-sensitive function [40], such as Dice coefficient loss [12,35,38], and asymmetric similarity loss [16] that modify the training data distribution with regard to the misclassification cost....


Journal ArticleDOI
TL;DR: The experimental results show that, with the multi-task representation learning framework, StyleNet achieves better classification accuracy, that the optimized loss function also improves the performance of deep learning models, and that StyleNet's classification performance improves as the dataset grows.
Abstract: Clothing images vary in style, and everyone has a different understanding of style. Even with currently popular deep learning methods, it is difficult to classify style labels accurately. A style representation learning model based on deep neural networks, called StyleNet, is proposed in this paper. We adopt a multi-task learning framework to build the model and make full use of various types of label information to represent clothing images in a finer-grained manner. Because of the semantic abstraction of image labels in the fashion field, a simple transfer learning method cannot fully meet the requirements of clothing image classification. An objective function optimization method is put forward that combines a distance confusion loss with the traditional cross-entropy loss to further improve the accuracy of StyleNet. The experimental results show that, with the multi-task representation learning framework, StyleNet achieves better classification accuracy, that the optimized loss function also improves the performance of deep learning models, and that StyleNet's classification performance improves as the dataset grows. To verify the robustness and effectiveness of the deep learning method in StyleNet, we also apply a Faster R-CNN module to pre-process the clothing images and use the result as the input of StyleNet. The classifier gains only a limited performance improvement, which is negligible compared with the methods proposed in this paper: increasing the depth of the neural network and optimizing the loss function.
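The exact form of the distance confusion loss is not given above, so the sketch below only illustrates the general pattern of combining an auxiliary distance-based term with cross entropy; the confusion term used here (mean pairwise feature distance) and the weight `lam` are hypothetical placeholders.

```python
import torch
import torch.nn.functional as F

def combined_loss(logits, labels, features, lam=0.1):
    """Sketch of a combined objective: standard cross entropy plus a
    hypothetical distance-based confusion term over the feature space.
    The confusion term below is an assumption, not the paper's exact
    definition of the distance confusion loss."""
    ce = F.cross_entropy(logits, labels)
    # Placeholder confusion term: pull the batch's feature vectors
    # toward each other to soften over-confident style boundaries.
    dists = torch.cdist(features, features, p=2)
    confusion = dists.mean()
    return ce + lam * confusion
```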

9 citations


Cites background from "Instance Tumor Segmentation using M..."

  • ...[22] presented an end-to-end multitask learning architecture tailored to tumors pictured in magnetic resonance imaging (MRI) and computed tomography (CT) images....


Book ChapterDOI
20 Jan 2020
TL;DR: In this article, the authors review the performance of different 2D and 3D CNN models for liver image segmentation and analyze studies that used variants of ResNet, FCN, U-Net, and 3D U-Net along with various evaluation metrics.
Abstract: Image segmentation is one of the most popular methods in automated computational medical image analysis. Precise and meaningful semantic segmentation of abdominal Magnetic Resonance Imaging (MRI) and Computed Tomography (CT) volume images, specifically liver segmentation, contributes substantially to clinical decision making for patient treatment. Alongside the many state-of-the-art methods, new cutting-edge deep learning architectures are being developed rapidly, achieving better segmentation while outperforming other state-of-the-art models. Different deep learning models perform differently depending on cell types, organ shapes, and the type of medical imaging (i.e., CT, MRI). Starting from 2D convolutional neural networks (CNNs), many variations of 3D CNN architectures have achieved significant results in segmentation tasks on MRI and CT images. In this paper, we review the performance of different 2D and 3D CNN models for liver image segmentation. We also analyze studies that used variants of ResNet, FCN, U-Net, and 3D U-Net along with various evaluation metrics. How these variants of 2D and 3D CNN models enhance performance over state-of-the-art architectures is demonstrated in the results section. Besides architectural developments, new segmentation and other biomedical challenges are offered each year, each with its own dataset; further datasets are provided and supported by various organizations. The use of such datasets is documented in this study. This review of reported results, along with the associated datasets and architectures, should help future researchers in liver semantic segmentation tasks. Furthermore, our listing of results gives insight into how different metrics behave for the same organ as performance changes.
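Since the review compares segmentation models through standard overlap metrics, here is a minimal numpy sketch of the Dice score, the metric most commonly reported for liver segmentation (the formula is standard; nothing model-specific from the review is implied).

```python
import numpy as np

def dice_score(pred, gt, eps=1e-7):
    """Dice overlap between two binary masks: 2|A n B| / (|A| + |B|)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)
```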

7 citations

Journal ArticleDOI
TL;DR: A convolutional neural network with multi‐task learning, trained with SLR transformation pairs, that is capable of simultaneously generating RF and two channels of gradient waveforms, given the desired spatially two‐dimensional excitation profiles is presented.
Abstract: Modern MRI systems usually load predesigned RF pulses and the accompanying gradients during clinical scans, with minimal adaptation to the specific requirements of each scan. Here, we describe a neural network-based method for real-time design of excitation RF pulses and the accompanying gradient waveforms to achieve spatially two-dimensional selectivity. Nine thousand sets of radio frequency (RF) and gradient waveforms with two-dimensional spatial selectivity were generated as the training dataset using the Shinnar-Le Roux (SLR) method. Neural networks were created and trained with five strategies (TS-1 to TS-5). The neural network-designed RF pulses and gradients were compared with their SLR-designed counterparts and underwent Bloch simulation and phantom imaging to investigate their performance in spin manipulation. We demonstrate a convolutional neural network (TS-5) with multi-task learning that yields both the RF pulses and the accompanying two channels of gradient waveforms complying with the SLR design; these designs also provide excitation spatial profiles comparable with SLR pulses in both simulation (normalized root mean square error [NRMSE] of 0.0075 ± 0.0038 over the 400 sets of testing data between TS-5 and SLR) and phantom imaging. The output RF and gradient waveforms of the neural network and SLR methods were also compared, and the joint NRMSE, with both the RF and the two channels of gradient waveforms considered, was 0.0098 ± 0.0024 between TS-5 and SLR. The RF and gradients were generated on a commercially available workstation, which took ~130 ms for TS-5. In conclusion, we present a convolutional neural network with multi-task learning, trained with SLR transformation pairs, that is capable of simultaneously generating RF and two channels of gradient waveforms, given the desired spatially two-dimensional excitation profiles.
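The abstract reports agreement as NRMSE between network-designed and SLR-designed waveforms. A minimal sketch of that metric follows; normalizing by the reference's peak-to-peak range is an assumption, as the paper's exact normalization is not stated here.

```python
import numpy as np

def nrmse(designed, reference):
    """NRMSE between a network-designed waveform and its SLR reference.
    Normalizing by the reference's peak-to-peak range is an assumption;
    the paper may normalize differently (e.g., by the mean or max)."""
    rmse = np.sqrt(np.mean((designed - reference) ** 2))
    span = reference.max() - reference.min()
    return rmse / span
```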

5 citations

Proceedings ArticleDOI
13 Mar 2019
TL;DR: The proposed 3DJoinGANs is able to mitigate imbalanced-data problems and improve segmentation results thanks to oversampling and training through a joint distribution of cross-domain images.
Abstract: Inspired by the recent success of generative adversarial networks (GANs), we propose a multi-agent GAN, named 3DJoinGANs, for handling imbalanced training data in semantic segmentation. Our proposed method comprises two conditional GANs with four agents: two segmentors and two discriminators. The framework learns a joint distribution of magnetic resonance (MRI) and computed tomography (CT) images from different brain diseases by enforcing a weight-sharing constraint. The first segmentor is trained on 3D multi-modal MRI to learn semantic segmentation of brain tumor(s), while the first discriminator classifies whether the segmentor's predicted output is real or fake. The second segmentor takes 3D multi-modal CT images to learn segmentation of brain stroke lesions, and the second discriminator classifies between the segmentor's output and ground truth annotated by an expert. We show that 3DJoinGANs is able to mitigate imbalanced-data problems and improve segmentation results thanks to oversampling and training through a joint distribution of cross-domain images. The proposed architecture has shown promising performance on the ISLES-2018 benchmark for segmentation of 3D multi-modal ischemic stroke lesions and on semantic segmentation of 3D multi-modal brain tumors from the BraTS-2018 challenge.
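One simple way to realize the weight-sharing constraint described above is to literally share an encoder module between the two segmentors, as in the hypothetical PyTorch sketch below (layer sizes and channel counts are illustrative, not the paper's).

```python
import torch.nn as nn

# Hypothetical sketch: two segmentors sharing one encoder, a direct way
# to enforce a weight-sharing constraint across the MRI and CT domains.
shared_encoder = nn.Sequential(
    nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv3d(16, 32, 3, padding=1), nn.ReLU(),
)
mri_segmentor = nn.Sequential(shared_encoder, nn.Conv3d(32, 1, 1))
ct_segmentor = nn.Sequential(shared_encoder, nn.Conv3d(32, 1, 1))
# Gradients from both domains now update the same encoder weights.
```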

3 citations

References
Book ChapterDOI
05 Oct 2015
TL;DR: Ronneberger et al. as discussed by the authors proposed a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently, which can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks.
Abstract: There is broad consensus that successful training of deep networks requires many thousands of annotated training samples. In this paper, we present a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently. The architecture consists of a contracting path to capture context and a symmetric expanding path that enables precise localization. We show that such a network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks. Using the same network trained on transmitted light microscopy images (phase contrast and DIC) we won the ISBI cell tracking challenge 2015 in these categories by a large margin. Moreover, the network is fast. Segmentation of a 512x512 image takes less than a second on a recent GPU. The full implementation (based on Caffe) and the trained networks are available at http://lmb.informatik.uni-freiburg.de/people/ronneber/u-net .
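A minimal PyTorch sketch of the U-Net idea described above: a contracting path, an expanding path, and a skip connection that concatenates encoder features for precise localization. Depths and widths here are toy values, far smaller than the published architecture.

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Toy U-Net: one contraction, one expansion, one skip connection."""
    def __init__(self, in_ch=1, out_ch=2):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU())
        self.down = nn.MaxPool2d(2)
        self.mid = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.dec = nn.Sequential(nn.Conv2d(32, 16, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(16, out_ch, 1))

    def forward(self, x):
        e = self.enc(x)                             # contracting path
        m = self.mid(self.down(e))                  # bottleneck
        u = self.up(m)                              # expanding path
        return self.dec(torch.cat([u, e], dim=1))   # skip connection
```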

49,590 citations

Proceedings ArticleDOI
07 Jun 2015
TL;DR: Inception as mentioned in this paper is a deep convolutional neural network architecture that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).
Abstract: We propose a deep convolutional neural network architecture codenamed Inception that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14). The main hallmark of this architecture is the improved utilization of the computing resources inside the network. By a carefully crafted design, we increased the depth and width of the network while keeping the computational budget constant. To optimize quality, the architectural decisions were based on the Hebbian principle and the intuition of multi-scale processing. One particular incarnation used in our submission for ILSVRC14 is called GoogLeNet, a 22 layers deep network, the quality of which is assessed in the context of classification and detection.
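The multi-scale intuition behind Inception can be sketched as a block of parallel 1x1, 3x3, and 5x5 convolutions plus pooling, concatenated along the channel dimension. Branch widths below are illustrative, not GoogLeNet's exact configuration.

```python
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    """Sketch of an Inception-style module: parallel multi-scale branches
    whose outputs are concatenated along the channel dimension."""
    def __init__(self, in_ch):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, 16, 1)
        self.b3 = nn.Sequential(nn.Conv2d(in_ch, 8, 1),    # 1x1 bottleneck
                                nn.Conv2d(8, 16, 3, padding=1))
        self.b5 = nn.Sequential(nn.Conv2d(in_ch, 8, 1),
                                nn.Conv2d(8, 16, 5, padding=2))
        self.bp = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                nn.Conv2d(in_ch, 16, 1))

    def forward(self, x):
        return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.bp(x)], 1)
```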

40,257 citations

Proceedings Article
Sergey Ioffe1, Christian Szegedy1
06 Jul 2015
TL;DR: Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
Abstract: Training Deep Neural Networks is complicated by the fact that the distribution of each layer's inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch. Batch Normalization allows us to use much higher learning rates and be less careful about initialization, and in some cases eliminates the need for Dropout. Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin. Using an ensemble of batch-normalized networks, we improve upon the best published result on ImageNet classification: reaching 4.82% top-5 test error, exceeding the accuracy of human raters.
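The normalization step itself is compact. A minimal sketch of training-time Batch Normalization for an (N, C, H, W) mini-batch follows; at inference the batch statistics are replaced by running averages, which this sketch omits.

```python
import torch

def batch_norm(x, gamma, beta, eps=1e-5):
    """Training-time Batch Normalization on an (N, C, H, W) mini-batch:
    normalize each channel by its batch statistics, then scale and shift
    with the learned parameters gamma and beta, each of shape (C,)."""
    mean = x.mean(dim=(0, 2, 3), keepdim=True)
    var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)
    x_hat = (x - mean) / torch.sqrt(var + eps)
    return gamma.view(1, -1, 1, 1) * x_hat + beta.view(1, -1, 1, 1)
```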

30,843 citations


"Instance Tumor Segmentation using M..." refers background or methods in this paper

  • ...[21], batch normalization has improved overall optimization and gradient issues....


  • ...Of late, several popular techniques have been developed for normalization such as batch normalization [21] and max norm...


Journal ArticleDOI
TL;DR: There is a natural uncertainty principle between detection and localization performance, which are the two main goals, and with this principle a single operator shape is derived which is optimal at any scale.
Abstract: This paper describes a computational approach to edge detection. The success of the approach depends on the definition of a comprehensive set of goals for the computation of edge points. These goals must be precise enough to delimit the desired behavior of the detector while making minimal assumptions about the form of the solution. We define detection and localization criteria for a class of edges, and present mathematical forms for these criteria as functionals on the operator impulse response. A third criterion is then added to ensure that the detector has only one response to a single edge. We use the criteria in numerical optimization to derive detectors for several common image features, including step edges. On specializing the analysis to step edges, we find that there is a natural uncertainty principle between detection and localization performance, which are the two main goals. With this principle we derive a single operator shape which is optimal at any scale. The optimal detector has a simple approximate implementation in which edges are marked at maxima in gradient magnitude of a Gaussian-smoothed image. We extend this simple detector using operators of several widths to cope with different signal-to-noise ratios in the image. We present a general method, called feature synthesis, for the fine-to-coarse integration of information from operators at different scales. Finally we show that step edge detector performance improves considerably as the operator point spread function is extended along the edge.
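The detector derived in the abstract reduces, in its approximate form, to marking maxima in the gradient magnitude of a Gaussian-smoothed image. The sketch below computes that first stage with scipy; full Canny additionally applies non-maximum suppression and hysteresis thresholding.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def gradient_magnitude(image, sigma=1.5):
    """First stage of the Canny-style detector: Gaussian smoothing
    followed by gradient magnitude. Edges would then be marked at the
    maxima of this map (non-maximum suppression, not shown here)."""
    smoothed = gaussian_filter(image.astype(float), sigma)
    gx = sobel(smoothed, axis=1)   # horizontal derivative
    gy = sobel(smoothed, axis=0)   # vertical derivative
    return np.hypot(gx, gy)
```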

28,073 citations


Additional excerpts

  • ...by applying the Canny edge detector [25])....


Posted Content
TL;DR: Faster R-CNN as discussed by the authors proposes a Region Proposal Network (RPN) to generate high-quality region proposals, which are used by Fast R-CNN for detection.
Abstract: State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet and Fast R-CNN have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals. An RPN is a fully convolutional network that simultaneously predicts object bounds and objectness scores at each position. The RPN is trained end-to-end to generate high-quality region proposals, which are used by Fast R-CNN for detection. We further merge RPN and Fast R-CNN into a single network by sharing their convolutional features---using the recently popular terminology of neural networks with 'attention' mechanisms, the RPN component tells the unified network where to look. For the very deep VGG-16 model, our detection system has a frame rate of 5fps (including all steps) on a GPU, while achieving state-of-the-art object detection accuracy on PASCAL VOC 2007, 2012, and MS COCO datasets with only 300 proposals per image. In ILSVRC and COCO 2015 competitions, Faster R-CNN and RPN are the foundations of the 1st-place winning entries in several tracks. Code has been made publicly available.
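The RPN head itself is small: a 3x3 convolution followed by two sibling 1x1 convolutions predicting, at every feature-map position, an objectness score and four box-regression deltas per anchor. The sketch below follows that layout; the channel widths and the single-score (sigmoid-style) objectness are simplifications of the published design.

```python
import torch
import torch.nn as nn

class RPNHead(nn.Module):
    """Sketch of a Region Proposal Network head: at each feature-map
    position, predict objectness and box deltas for each of k anchors."""
    def __init__(self, in_ch=512, k=9):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, 512, 3, padding=1)
        self.cls = nn.Conv2d(512, k, 1)      # objectness score per anchor
        self.reg = nn.Conv2d(512, 4 * k, 1)  # 4 box deltas per anchor

    def forward(self, feats):
        h = torch.relu(self.conv(feats))
        return self.cls(h), self.reg(h)
```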

23,183 citations