U-Net: Convolutional Networks for Biomedical Image Segmentation

Home
/
Papers
/
U-Net: Convolutional Networks for Biomedical Image Segmentation

Posted Content•

U-Net: Convolutional Networks for Biomedical Image Segmentation

Olaf Ronneberger¹, Philipp Fischer¹, Thomas Brox¹•Institutions (1)

18 May 2015-arXiv: Computer Vision and Pattern Recognition-

TL;DR: It is shown that such a network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks.

read less

Abstract: There is large consent that successful training of deep networks requires many thousand annotated training samples. In this paper, we present a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently. The architecture consists of a contracting path to capture context and a symmetric expanding path that enables precise localization. We show that such a network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks. Using the same network trained on transmitted light microscopy images (phase contrast and DIC) we won the ISBI cell tracking challenge 2015 in these categories by a large margin. Moreover, the network is fast. Segmentation of a 512x512 image takes less than a second on a recent GPU. The full implementation (based on Caffe) and the trained networks are available at this http URL .

...read moreread less

Citations

PDF

Open Access

More filters

Journal Article•DOI•

SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation

[...]

Vijay Badrinarayanan¹, Alex Kendall¹, Roberto Cipolla¹•Institutions (1)

University of Cambridge¹

01 Dec 2017-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: Quantitative assessments show that SegNet provides good performance with competitive inference time and most efficient inference memory-wise as compared to other architectures, including FCN and DeconvNet.

...read moreread less

Abstract: We present a novel and practical deep fully convolutional neural network architecture for semantic pixel-wise segmentation termed SegNet. This core trainable segmentation engine consists of an encoder network, a corresponding decoder network followed by a pixel-wise classification layer. The architecture of the encoder network is topologically identical to the 13 convolutional layers in the VGG16 network [1] . The role of the decoder network is to map the low resolution encoder feature maps to full input resolution feature maps for pixel-wise classification. The novelty of SegNet lies is in the manner in which the decoder upsamples its lower resolution input feature map(s). Specifically, the decoder uses pooling indices computed in the max-pooling step of the corresponding encoder to perform non-linear upsampling. This eliminates the need for learning to upsample. The upsampled maps are sparse and are then convolved with trainable filters to produce dense feature maps. We compare our proposed architecture with the widely adopted FCN [2] and also with the well known DeepLab-LargeFOV [3] , DeconvNet [4] architectures. This comparison reveals the memory versus accuracy trade-off involved in achieving good segmentation performance. SegNet was primarily motivated by scene understanding applications. Hence, it is designed to be efficient both in terms of memory and computational time during inference. It is also significantly smaller in the number of trainable parameters than other competing architectures and can be trained end-to-end using stochastic gradient descent. We also performed a controlled benchmark of SegNet and other architectures on both road scenes and SUN RGB-D indoor scene segmentation tasks. These quantitative assessments show that SegNet provides good performance with competitive inference time and most efficient inference memory-wise as compared to other architectures. We also provide a Caffe implementation of SegNet and a web demo at http://mi.eng.cam.ac.uk/projects/segnet/ .

...read moreread less

13,468 citations

Posted Content•

Image-to-Image Translation with Conditional Adversarial Networks

[...]

Phillip Isola¹, Jun-Yan Zhu¹, Tinghui Zhou¹, Alexei A. Efros¹•Institutions (1)

University of California, Berkeley¹

21 Nov 2016-arXiv: Computer Vision and Pattern Recognition

TL;DR: Conditional Adversarial Network (CA) as discussed by the authors is a general-purpose solution to image-to-image translation problems, which can be used to synthesize photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks.

...read moreread less

Abstract: We investigate conditional adversarial networks as a general-purpose solution to image-to-image translation problems. These networks not only learn the mapping from input image to output image, but also learn a loss function to train this mapping. This makes it possible to apply the same generic approach to problems that traditionally would require very different loss formulations. We demonstrate that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks. Indeed, since the release of the pix2pix software associated with this paper, a large number of internet users (many of them artists) have posted their own experiments with our system, further demonstrating its wide applicability and ease of adoption without the need for parameter tweaking. As a community, we no longer hand-engineer our mapping functions, and this work suggests we can achieve reasonable results without hand-engineering our loss functions either.

...read moreread less

11,127 citations

Posted Content•

DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs

[...]

Liang-Chieh Chen¹, George Papandreou¹, Iasonas Kokkinos², Kevin Murphy¹, Alan L. Yuille³ - Show less +1 more•Institutions (3)

Google¹, University College London², Johns Hopkins University³

02 Jun 2016-arXiv: Computer Vision and Pattern Recognition

TL;DR: DeepLab as discussed by the authors proposes atrous spatial pyramid pooling (ASPP) to segment objects at multiple scales by probing an incoming convolutional feature layer with filters at multiple sampling rates and effective fields-of-views.

...read moreread less

Abstract: In this work we address the task of semantic image segmentation with Deep Learning and make three main contributions that are experimentally shown to have substantial practical merit. First, we highlight convolution with upsampled filters, or 'atrous convolution', as a powerful tool in dense prediction tasks. Atrous convolution allows us to explicitly control the resolution at which feature responses are computed within Deep Convolutional Neural Networks. It also allows us to effectively enlarge the field of view of filters to incorporate larger context without increasing the number of parameters or the amount of computation. Second, we propose atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales. ASPP probes an incoming convolutional feature layer with filters at multiple sampling rates and effective fields-of-views, thus capturing objects as well as image context at multiple scales. Third, we improve the localization of object boundaries by combining methods from DCNNs and probabilistic graphical models. The commonly deployed combination of max-pooling and downsampling in DCNNs achieves invariance but has a toll on localization accuracy. We overcome this by combining the responses at the final DCNN layer with a fully connected Conditional Random Field (CRF), which is shown both qualitatively and quantitatively to improve localization performance. Our proposed "DeepLab" system sets the new state-of-art at the PASCAL VOC-2012 semantic image segmentation task, reaching 79.7% mIOU in the test set, and advances the results on three other datasets: PASCAL-Context, PASCAL-Person-Part, and Cityscapes. All of our code is made publicly available online.

...read moreread less

10,120 citations

Posted Content•

Fully Convolutional Networks for Semantic Segmentation

[...]

Jonathan Long¹, Evan Shelhamer¹, Trevor Darrell¹•Institutions (1)

University of California, Berkeley¹

14 Nov 2014-arXiv: Computer Vision and Pattern Recognition

TL;DR: It is shown that convolutional networks by themselves, trained end- to-end, pixels-to-pixels, improve on the previous best result in semantic segmentation.

...read moreread less

Abstract: Convolutional networks are powerful visual models that yield hierarchies of features. We show that convolutional networks by themselves, trained end-to-end, pixels-to-pixels, exceed the state-of-the-art in semantic segmentation. Our key insight is to build "fully convolutional" networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning. We define and detail the space of fully convolutional networks, explain their application to spatially dense prediction tasks, and draw connections to prior models. We adapt contemporary classification networks (AlexNet, the VGG net, and GoogLeNet) into fully convolutional networks and transfer their learned representations by fine-tuning to the segmentation task. We then define a novel architecture that combines semantic information from a deep, coarse layer with appearance information from a shallow, fine layer to produce accurate and detailed segmentations. Our fully convolutional network achieves state-of-the-art segmentation of PASCAL VOC (20% relative improvement to 62.2% mean IU on 2012), NYUDv2, and SIFT Flow, while inference takes one third of a second for a typical image.

...read moreread less

9,803 citations

Cites background from "U-Net: Convolutional Networks for B..."

...es features for fusion and captures context with global pooling. The “deconvolutional network” approach of [42] restores resolution by proposals, stacks of learned deconvolution, and unpooling. U-Net [43] combines skip layers and learned deconvolution for pixel labeling of microscopy images. The dilation architecture of [44] makes 3 thorough use of dilated convolution for pixel-precise output without ...
[...]

Journal Article•DOI•

A survey on deep learning in medical image analysis

[...]

Geert Litjens¹, Thijs Kooi¹, Babak Ehteshami Bejnordi¹, Arnaud Arindra Adiyoso Setio¹, Francesco Ciompi¹, Mohsen Ghafoorian¹, Jeroen van der Laak¹, Bram van Ginneken¹, Clara I. Sánchez¹ - Show less +5 more•Institutions (1)

Radboud University Nijmegen¹

01 Dec 2017-Medical Image Analysis

TL;DR: This paper reviews the major deep learning concepts pertinent to medical image analysis and summarizes over 300 contributions to the field, most of which appeared in the last year, to survey the use of deep learning for image classification, object detection, segmentation, registration, and other tasks.

...read moreread less

8,730 citations

Cites background or methods from "U-Net: Convolutional Networks for B..."

...An example is the lastic deformations that were applied in the original U-Net paper Ronneberger et al., 2015 )....
[...]
...Ronneberger et al. (2015) took the idea of the fCNN one step further and proposed the U-net architecture, comprising a ‘regu- lar’ fCNN followed by an upsampling part where ‘up’-convolutions are used to increase the image size, coined contractive and expan- sive paths....
[...]
...The most well-known, in medical image analysis, of these ovel CNN architectures is U-net, published by Ronneberger et al. 2015) ( Section 2.4.3 )....
[...]
...The most well-known of these novel CNN architectures is U-net, published by Ronneberger et al. (2015) (section 2....
[...]

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse

References

PDF

Open Access

More filters

Posted Content•

Caffe: Convolutional Architecture for Fast Feature Embedding

[...]

Yangqing Jia¹, Evan Shelhamer², Jeff Donahue², Sergey Karayev², Jonathan Long², Ross Girshick², Sergio Guadarrama², Trevor Darrell² - Show less +4 more•Institutions (2)

Google¹, University of California, Berkeley²

20 Jun 2014-arXiv: Computer Vision and Pattern Recognition

TL;DR: Caffe as discussed by the authors is a BSD-licensed C++ library with Python and MATLAB bindings for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures.

...read moreread less

Abstract: Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models. The framework is a BSD-licensed C++ library with Python and MATLAB bindings for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures. Caffe fits industry and internet-scale media needs by CUDA GPU computation, processing over 40 million images a day on a single K40 or Titan GPU ($\approx$ 2.5 ms per image). By separating model representation from actual implementation, Caffe allows experimentation and seamless switching among platforms for ease of development and deployment from prototyping machines to cloud environments. Caffe is maintained and developed by the Berkeley Vision and Learning Center (BVLC) with the help of an active community of contributors on GitHub. It powers ongoing research projects, large-scale industrial applications, and startup prototypes in vision, speech, and multimedia.

...read moreread less

12,531 citations

Posted Content•

Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification

[...]

Kaiming He¹, Xiangyu Zhang², Shaoqing Ren¹, Jian Sun¹•Institutions (2)

Microsoft¹, Xi'an Jiaotong University²

06 Feb 2015-arXiv: Computer Vision and Pattern Recognition

TL;DR: This work proposes a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit and derives a robust initialization method that particularly considers the rectifier nonlinearities.

...read moreread less

Abstract: Rectified activation units (rectifiers) are essential for state-of-the-art neural networks. In this work, we study rectifier neural networks for image classification from two aspects. First, we propose a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit. PReLU improves model fitting with nearly zero extra computational cost and little overfitting risk. Second, we derive a robust initialization method that particularly considers the rectifier nonlinearities. This method enables us to train extremely deep rectified models directly from scratch and to investigate deeper or wider network architectures. Based on our PReLU networks (PReLU-nets), we achieve 4.94% top-5 test error on the ImageNet 2012 classification dataset. This is a 26% relative improvement over the ILSVRC 2014 winner (GoogLeNet, 6.66%). To our knowledge, our result is the first to surpass human-level performance (5.1%, Russakovsky et al.) on this visual recognition challenge.

...read moreread less

11,866 citations

Proceedings Article•DOI•

Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification

[...]

Kaiming He¹, Xiangyu Zhang², Shaoqing Ren¹, Jian Sun¹•Institutions (2)

Microsoft¹, Xi'an Jiaotong University²

07 Dec 2015

TL;DR: In this paper, a Parametric Rectified Linear Unit (PReLU) was proposed to improve model fitting with nearly zero extra computational cost and little overfitting risk, which achieved a 4.94% top-5 test error on ImageNet 2012 classification dataset.

...read moreread less

Abstract: Rectified activation units (rectifiers) are essential for state-of-the-art neural networks. In this work, we study rectifier neural networks for image classification from two aspects. First, we propose a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit. PReLU improves model fitting with nearly zero extra computational cost and little overfitting risk. Second, we derive a robust initialization method that particularly considers the rectifier nonlinearities. This method enables us to train extremely deep rectified models directly from scratch and to investigate deeper or wider network architectures. Based on the learnable activation and advanced initialization, we achieve 4.94% top-5 test error on the ImageNet 2012 classification dataset. This is a 26% relative improvement over the ILSVRC 2014 winner (GoogLeNet, 6.66% [33]). To our knowledge, our result is the first to surpass the reported human-level performance (5.1%, [26]) on this dataset.

...read moreread less

11,732 citations

Proceedings Article•DOI•

Caffe: Convolutional Architecture for Fast Feature Embedding

[...]

Yangqing Jia¹, Evan Shelhamer², Jeff Donahue², Sergey Karayev², Jonathan Long², Ross Girshick², Sergio Guadarrama², Trevor Darrell² - Show less +4 more•Institutions (2)

Google¹, University of California, Berkeley²

03 Nov 2014

TL;DR: Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures.

...read moreread less

Abstract: Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models. The framework is a BSD-licensed C++ library with Python and MATLAB bindings for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures. Caffe fits industry and internet-scale media needs by CUDA GPU computation, processing over 40 million images a day on a single K40 or Titan GPU (approx 2 ms per image). By separating model representation from actual implementation, Caffe allows experimentation and seamless switching among platforms for ease of development and deployment from prototyping machines to cloud environments.Caffe is maintained and developed by the Berkeley Vision and Learning Center (BVLC) with the help of an active community of contributors on GitHub. It powers ongoing research projects, large-scale industrial applications, and startup prototypes in vision, speech, and multimedia.

...read moreread less

10,161 citations

"U-Net: Convolutional Networks for B..." refers methods in this paper

...We provide the full Caffe[6]-based implementation and the trained networks4....
[...]
...We provide the full Caffe[6]-based implementation and the trained networks(4)....
[...]
...The input images and their corresponding segmentation maps are used to train the network with the stochastic gradient descent implementation of Caffe [6]....
[...]

Posted Content•

Fully Convolutional Networks for Semantic Segmentation

[...]

Jonathan Long¹, Evan Shelhamer¹, Trevor Darrell¹•Institutions (1)

University of California, Berkeley¹

14 Nov 2014-arXiv: Computer Vision and Pattern Recognition

TL;DR: It is shown that convolutional networks by themselves, trained end- to-end, pixels-to-pixels, improve on the previous best result in semantic segmentation.

...read moreread less

9,803 citations

"U-Net: Convolutional Networks for B..." refers background or methods in this paper

...In this paper, we build upon a more elegant architecture, the so-called “fully convolutional network” [9]....
[...]
...The main idea in [9] is to supplement a usual contracting network by successive layers, where pooling operators are replaced by upsampling operators....
[...]