scispace - formally typeset
Search or ask a question
Proceedings ArticleDOI

Joint Sequence Learning and Cross-Modality Convolution for 3D Biomedical Segmentation

TL;DR: This work proposes a deep convolution encoder-decoder structure with fusion layers to incorporate different modalities of MRI data and exploits convolutional LSTM to model a sequence of 2D slices, and jointly learn the multi-modalities and convLSTM in an end-to-end manner.
Abstract: Deep learning models such as convolutional neural network have been widely used in 3D biomedical segmentation and achieve state-of-the-art performance. However, most of them often adapt a single modality or stack multiple modalities as different input channels, which ignores the correlations among them. To leverage the multi-modalities, we propose a deep convolution encoder-decoder structure with fusion layers to incorporate different modalities of MRI data. In addition, we exploit convolutional LSTM (convLSTM) to model a sequence of 2D slices, and jointly learn the multi-modalities and convLSTM in an end-to-end manner. To avoid converging to the certain labels, we adopt a re-weighting scheme and two phase training to handle the label imbalance. Experimental results on BRATS-2015 [13] show that our method outperforms state-of-the-art biomedical segmentation approaches.

Content maybe subject to copyright    Report

Citations
More filters
Journal ArticleDOI
TL;DR: This work proposes a novel hybrid densely connected UNet (H-DenseUNet), which consists of a 2-D Dense UNet for efficiently extracting intra-slice features and a 3-D counterpart for hierarchically aggregating volumetric contexts under the spirit of the auto-context algorithm for liver and tumor segmentation.
Abstract: Liver cancer is one of the leading causes of cancer death. To assist doctors in hepatocellular carcinoma diagnosis and treatment planning, an accurate and automatic liver and tumor segmentation method is highly demanded in the clinical practice. Recently, fully convolutional neural networks (FCNs), including 2-D and 3-D FCNs, serve as the backbone in many volumetric image segmentation. However, 2-D convolutions cannot fully leverage the spatial information along the third dimension while 3-D convolutions suffer from high computational cost and GPU memory consumption. To address these issues, we propose a novel hybrid densely connected UNet (H-DenseUNet), which consists of a 2-D DenseUNet for efficiently extracting intra-slice features and a 3-D counterpart for hierarchically aggregating volumetric contexts under the spirit of the auto-context algorithm for liver and tumor segmentation. We formulate the learning process of the H-DenseUNet in an end-to-end manner, where the intra-slice representations and inter-slice features can be jointly optimized through a hybrid feature fusion layer. We extensively evaluated our method on the data set of the MICCAI 2017 Liver Tumor Segmentation Challenge and 3DIRCADb data set. Our method outperformed other state-of-the-arts on the segmentation results of tumors and achieved very competitive performance for liver segmentation even with a single model.

1,561 citations


Cites methods from "Joint Sequence Learning and Cross-M..."

  • ...To solve this problem, some researchers proposed to use tri-planar schemes or RNN to probe the 3D contexts [4], [20], [21]....

    [...]

Journal ArticleDOI
TL;DR: A comprehensive review of the current state-of-the-art in medical image analysis using deep convolutional networks is presented and the challenges and potential of these techniques are also highlighted.
Abstract: The science of solving clinical problems by analyzing images generated in clinical practice is known as medical image analysis. The aim is to extract information in an effective and efficient manner for improved clinical diagnosis. The recent advances in the field of biomedical engineering has made medical image analysis one of the top research and development area. One of the reason for this advancement is the application of machine learning techniques for the analysis of medical images. Deep learning is successfully used as a tool for machine learning, where a neural network is capable of automatically learning features. This is in contrast to those methods where traditionally hand crafted features are used. The selection and calculation of these features is a challenging task. Among deep learning techniques, deep convolutional networks are actively used for the purpose of medical image analysis. This include application areas such as segmentation, abnormality detection, disease classification, computer aided diagnosis and retrieval. In this study, a comprehensive review of the current state-of-the-art in medical image analysis using deep convolutional networks is presented. The challenges and potential of these techniques are also highlighted.

277 citations

Journal ArticleDOI
TL;DR: A novel fully automatic segmentation method from MRI data containing in vivo brain gliomas that can not only localize the entire tumor region but can also accurately segment the intratumor structure is presented.
Abstract: Brain tumors can appear anywhere in the brain and have vastly different sizes and morphology. Additionally, these tumors are often diffused and poorly contrasted. Consequently, the segmentation of brain tumor and intratumor subregions using magnetic resonance imaging (MRI) data with minimal human interventions remains a challenging task. In this paper, we present a novel fully automatic segmentation method from MRI data containing in vivo brain gliomas. This approach can not only localize the entire tumor region but can also accurately segment the intratumor structure. The proposed work was based on a cascaded deep learning convolutional neural network consisting of two subnetworks: (1) a tumor localization network (TLN) and (2) an intratumor classification network (ITCN). The TLN, a fully convolutional network (FCN) in conjunction with the transfer learning technology, was used to first process MRI data. The goal of the first subnetwork was to define the tumor region from an MRI slice. Then, the ITCN was used to label the defined tumor region into multiple subregions. Particularly, ITCN exploited a convolutional neural network (CNN) with deeper architecture and smaller kernel. The proposed approach was validated on multimodal brain tumor segmentation (BRATS 2015) datasets, which contain 220 high-grade glioma (HGG) and 54 low-grade glioma (LGG) cases. Dice similarity coefficient (DSC), positive predictive value (PPV), and sensitivity were used as evaluation metrics. Our experimental results indicated that our method could obtain the promising segmentation results and had a faster segmentation speed. More specifically, the proposed method obtained comparable and overall better DSC values (0.89, 0.77, and 0.80) on the combined (HGG + LGG) testing set, as compared to other methods reported in the literature. Additionally, the proposed approach was able to complete a segmentation task at a rate of 1.54 seconds per slice.

138 citations


Cites methods from "Joint Sequence Learning and Cross-M..."

  • ...For each type of regions, we compute DSC [34], PPV, and sensitivity [35] as quantitative evaluation metrics....

    [...]

Journal ArticleDOI
TL;DR: A new supervised convolutional neural network (CNN) that learns to fuse complementary information for multi-modality medical image analysis with significantly higher foreground detection accuracy and a significantly higher Dice score than the recent PET-CT tumor segmentation methods.
Abstract: The analysis of multi-modality positron emission tomography and computed tomography (PET-CT) images for computer-aided diagnosis applications (e.g., detection and segmentation) requires combining the sensitivity of PET to detect abnormal regions with anatomical localization from CT. Current methods for PET-CT image analysis either process the modalities separately or fuse information from each modality based on knowledge about the image analysis task. These methods generally do not consider the spatially varying visual characteristics that encode different information across different modalities, which have different priorities at different locations. For example, a high abnormal PET uptake in the lungs is more meaningful for tumor detection than physiological PET uptake in the heart. Our aim is to improve the fusion of the complementary information in multi-modality PET-CT with a new supervised convolutional neural network (CNN) that learns to fuse complementary information for multi-modality medical image analysis. Our CNN first encodes modality-specific features and then uses them to derive a spatially varying fusion map that quantifies the relative importance of each modality’s feature across different spatial locations. These fusion maps are then multiplied with the modality-specific feature maps to obtain a representation of the complementary multi-modality information at different locations, which can then be used for image analysis. We evaluated the ability of our CNN to detect and segment multiple regions (lungs, mediastinum, and tumors) with different fusion requirements using a dataset of PET-CT images of lung cancer. We compared our method to baseline techniques for multi-modality image fusion (fused inputs (FSs), multi-branch (MB) techniques, and multi-channel (MC) techniques) and segmentation. Our findings show that our CNN had a significantly higher foreground detection accuracy (99.29%, ${p} ) than the fusion baselines (FS: 99.00%, MB: 99.08%, and TC: 98.92%) and a significantly higher Dice score (63.85%) than the recent PET-CT tumor segmentation methods.

136 citations


Cites background from "Joint Sequence Learning and Cross-M..."

  • ...and other multi-modality CNN research [10], [47], [48], [50]....

    [...]

Journal ArticleDOI
Chen Hao, Zhiguang Qin, Yi Ding, Lan Tian, Zhen Qin 
TL;DR: A novel deep convolutional neural network which combines symmetry have been proposed to automatically segment brain tumors, called Deep Convolutional Symmetric Neural Network (DCSNN), extends DCNN based segmentation networks by adding symmetric masks in several layers.

105 citations

References
More filters
Journal ArticleDOI
TL;DR: A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
Abstract: Learning to store information over extended time intervals by recurrent backpropagation takes a very long time, mostly because of insufficient, decaying error backflow. We briefly review Hochreiter's (1991) analysis of this problem, then address it by introducing a novel, efficient, gradient based method called long short-term memory (LSTM). Truncating the gradient where this does not do harm, LSTM can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units. Multiplicative gate units learn to open and close access to the constant error flow. LSTM is local in space and time; its computational complexity per time step and weight is O. 1. Our experiments with artificial data involve local, distributed, real-valued, and noisy pattern representations. In comparisons with real-time recurrent learning, back propagation through time, recurrent cascade correlation, Elman nets, and neural sequence chunking, LSTM leads to many more successful runs, and learns much faster. LSTM also solves complex, artificial long-time-lag tasks that have never been solved by previous recurrent network algorithms.

72,897 citations


Additional excerpts

  • ...Different from traditional LSTM [11], convLSTM replaces the matrix multiplication by the convolution operators in state-to-state and input-to-state transitions, which preservers the spatial information for long-term sequences....

    [...]

Book ChapterDOI
05 Oct 2015
TL;DR: Neber et al. as discussed by the authors proposed a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently, which can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks.
Abstract: There is large consent that successful training of deep networks requires many thousand annotated training samples. In this paper, we present a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently. The architecture consists of a contracting path to capture context and a symmetric expanding path that enables precise localization. We show that such a network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks. Using the same network trained on transmitted light microscopy images (phase contrast and DIC) we won the ISBI cell tracking challenge 2015 in these categories by a large margin. Moreover, the network is fast. Segmentation of a 512x512 image takes less than a second on a recent GPU. The full implementation (based on Caffe) and the trained networks are available at http://lmb.informatik.uni-freiburg.de/people/ronneber/u-net .

49,590 citations

Proceedings Article
Sergey Ioffe1, Christian Szegedy1
06 Jul 2015
TL;DR: Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
Abstract: Training Deep Neural Networks is complicated by the fact that the distribution of each layer's inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch. Batch Normalization allows us to use much higher learning rates and be less careful about initialization, and in some cases eliminates the need for Dropout. Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin. Using an ensemble of batch-normalized networks, we improve upon the best published result on ImageNet classification: reaching 4.82% top-5 test error, exceeding the accuracy of human raters.

30,843 citations


"Joint Sequence Learning and Cross-M..." refers methods in this paper

  • ...Each convolution layer uses the kernel size 3 × 3 to produce a set of feature maps, which are further applied by a batch normalization layer [12] and an element-wise rectified-linear non-linearity (ReLU)....

    [...]

Proceedings ArticleDOI
07 Jun 2015
TL;DR: The key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning.
Abstract: Convolutional networks are powerful visual models that yield hierarchies of features. We show that convolutional networks by themselves, trained end-to-end, pixels-to-pixels, exceed the state-of-the-art in semantic segmentation. Our key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning. We define and detail the space of fully convolutional networks, explain their application to spatially dense prediction tasks, and draw connections to prior models. We adapt contemporary classification networks (AlexNet [20], the VGG net [31], and GoogLeNet [32]) into fully convolutional networks and transfer their learned representations by fine-tuning [3] to the segmentation task. We then define a skip architecture that combines semantic information from a deep, coarse layer with appearance information from a shallow, fine layer to produce accurate and detailed segmentations. Our fully convolutional network achieves state-of-the-art segmentation of PASCAL VOC (20% relative improvement to 62.2% mean IU on 2012), NYUDv2, and SIFT Flow, while inference takes less than one fifth of a second for a typical image.

28,225 citations

Posted Content
TL;DR: It is shown that such a network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks.
Abstract: There is large consent that successful training of deep networks requires many thousand annotated training samples. In this paper, we present a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently. The architecture consists of a contracting path to capture context and a symmetric expanding path that enables precise localization. We show that such a network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks. Using the same network trained on transmitted light microscopy images (phase contrast and DIC) we won the ISBI cell tracking challenge 2015 in these categories by a large margin. Moreover, the network is fast. Segmentation of a 512x512 image takes less than a second on a recent GPU. The full implementation (based on Caffe) and the trained networks are available at this http URL .

19,534 citations


"Joint Sequence Learning and Cross-M..." refers background or methods or result in this paper

  • ...U-Net [20] consists of a contracting path that contains multiple convolutions for downsampling, and a expansive path that has several deconvolution layers to up-sample the features and concatenate the cropped feature maps from the contracting path....

    [...]

  • ...Recent image segmentation models [8, 15, 9, 20] fuse multi-resolution feature maps with the concatenation....

    [...]

  • ...Some methods apply 2D segmentation to 3D biomedical data [20] [5]....

    [...]

  • ...The results show that the proposed model: MME + MRF + CMC and MME + MRF + CMC + convLSTM achieve the best performance and outperform the baseline method U-Net [20]....

    [...]

  • ...Segmentation results of variants of our method and U-Net [20] baseline....

    [...]