
Convolutional Neural Networks for Inverse Problems in Imaging: A Review

Michael T. McCann, Kyong Hwan Jin, and Michael Unser
IEEE Signal Processing Magazine (Deep Learning for Visual Understanding), November 2017
Digital Object Identifier 10.1109/MSP.2017.2739299 | Date of publication: 13 November 2017
In this article, we review recent uses of convolutional neural networks (CNNs) to solve inverse problems in imaging. It has recently become feasible to train deep CNNs on large databases of images, and they have shown outstanding performance on object classification and segmentation tasks. Motivated by these successes, researchers have begun to apply CNNs to the resolution of inverse problems such as denoising, deconvolution, superresolution, and medical image reconstruction, and they have started to report improvements over state-of-the-art methods, including sparsity-based techniques such as compressed sensing. Here, we review the recent experimental work in these areas, with a focus on the critical design decisions:
■ From where do the training data come?
■ What is the architecture of the CNN?
■ How is the learning problem formulated and solved?
We also mention a few key theoretical papers that offer perspectives on why CNNs are appropriate for inverse problems, and we point to some next steps in the field.
Introduction
The basic ideas underlying the use of CNNs (also known as
ConvNets) for inverse problems are not new. Here, we give a
condensed history of CNNs to provide context to what fol-
lows. For further historical perspective, see [1]; for an acces-
sible introduction to deep neural networks and a summary of
their recent history, see [2]. The CNN architecture was pro-
posed in 1986 [3], and neural networks were developed for
solving inverse imaging problems as early as 1988 [4]. These
approaches, which used networks with few parameters and did
not always include learning, were largely superseded by com-
pressed sensing (or, broadly, convex optimization with regulariza-
tion) approaches in the 2000s. As computer hardware improved,
it became feasible to train larger neural networks, until, in 2012,
Krizhevsky et al. [5] achieved a significant improvement over the
state of the art on the ImageNet classification challenge by using
a graphics processing unit (GPU) to train a CNN with five con-
volutional layers and 60 million parameters on a set of 1.3 mil-
lion images. This work spurred a resurgence of interest in neural
networks and, specifically, CNNs—not only for computer vision
tasks but also for inverse problems.
The purpose of this article is to summarize the recent works
using CNNs for inverse problems in imaging, i.e., in problems
most naturally formulated as recovering an image from a set
of noisy measurements. This criterion excludes detection, seg-
mentation, classification, quality assessment, etc. We also focus
on CNNs, avoiding other architectures such as recurrent neu-
ral networks, fully connected networks, and stacked denoising
autoencoders. We organized our literature search by application,
selecting topics of broad interest where we could find at least
three peer-reviewed papers from the last ten years. (Much of the
work on the theory and practice of CNNs is posted on the pre-
print server arXiv.org before eventually appearing in traditional
journals. Because of the lack of peer review on arXiv.org, we
have preferred not to cite these papers, except in cases where we
are trying to illustrate a very recent trend or future direction for
the field.) The resulting applications and references are summa-
rized in Table 1. The aim of this constrained scope is to allow
us to draw meaningful generalizations from the surveyed works.
Background
We begin by introducing inverse problems and contrasting the
traditional approach to solving them with a learning-based
approach. For a textbook treatment of inverse problems, see
[28]. Throughout the section, we use X-ray computed tomogra-
phy (CT) as a running example, and Figure 1 shows images of
the various mathematical quantities we mention.
Learning for inverse problems in imaging
Mathematically speaking, an imaging system is an operator $H: X \to Y$ that acts on an image $x \in X$ to create a vector of measurements $y \in Y$, with $H\{x\} = y$. The underlying function/vector spaces are
■ the space, $X$, of acceptable images, which can be two-dimensional (2-D), three-dimensional (3-D), or even 3-D+time, with its values representing a physical quantity of interest, such as X-ray attenuation or concentration of fluorophores
■ the space, $Y$, of measurement vectors, which depends on the imaging operator and could include images (discrete arrays of pixels), Fourier samples, line integrals, etc.
We typically consider $x$ to be a continuous object (function of space), while $y$ is usually discrete: $Y = \mathbb{R}^M$. For example, in X-ray CT, $x$ is an image representing X-ray attenuations, $H$ represents the physics of the X-ray source and detector, and $y$ is the measured sinogram (see Figure 1).
In an inverse imaging problem, we aim to develop a reconstruction algorithm (which is also an operator), $R: Y \to X$, to recover the original image, $x$, from the measurements, $y$. The dominant approach for reconstruction, which we call the objective function approach, is to model $H$ and recover an estimate of $x$ from $y$ by

$$R_{\mathrm{obj}}\{y\} = \arg\min_{x \in X} f(H\{x\}, y), \quad (1)$$

where $H: X \to Y$ is the system model, which is usually linear, and $f: Y \times Y \to \mathbb{R}^+$ is an appropriate measure of error.
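To make (1) concrete: when $f$ is the squared Euclidean distance and $H$ is a small, discretized linear operator, the minimization reduces to ordinary least squares. The following Python sketch is our own illustration (the random matrix, sizes, and noise level are arbitrary), not code from the article:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy discretized linear system model H: 120 measurements of a 100-pixel image.
H = rng.standard_normal((120, 100))
x_true = rng.standard_normal(100)
y = H @ x_true + 0.01 * rng.standard_normal(120)

# Objective function approach, eq. (1), with f = squared Euclidean distance:
#   R_obj{y} = argmin_x ||H x - y||^2, i.e., ordinary least squares.
x_hat, *_ = np.linalg.lstsq(H, y, rcond=None)

# With fewer measurements than pixels, H loses rank and (1) becomes
# ill posed; that regime is what motivates the regularized form (2) below.
print(np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```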
FIGURE 1. A block diagram of image reconstruction methods, using images from X-ray CT as examples. An image, $x$, creates measurements, $y$, that can be used to estimate $x$ in a variety of ways. The traditional approach is to apply a direct inversion, $\tilde{H}^{-1}$, which is artifact prone in the sparse-measurement case (note the stripes in the reconstruction). The current state of the art is a regularized reconstruction, $R_{\mathrm{reg}}$, written, in general, in (2). Several recent works [25], [27] apply CNNs to the result of the direct inversion, $\tilde{H}^{-1}\{y\}$, or an iterative reconstruction, but it might also be reasonable to use as input the measurements $y$ themselves or the back projected measurements, $H^T\{y\}$.
Table 1. Reviewed applications and associated references.
Denoising: [6]–[11]
Deconvolution: [10], [12]–[14]
Superresolution: [9], [15]–[20]
MRI: [21]–[23]
CT: [24]–[27]
Continuing the CT example, $H$ would be a discretization of the X-ray transform (such as MATLAB's radon), and $f$ could be the Euclidean distance, $\|H\{x\} - y\|_2$. For many applications, decades of engineering have gone into developing a fast and reasonably accurate inverse operator, $\tilde{H}^{-1}$, so (1) is easily approximated by $R_{\mathrm{obj}}\{y\} = \tilde{H}^{-1}\{y\}$; for CT, $\tilde{H}^{-1}$ is the filtered back projection (FBP) algorithm. An important, related operator is the back projection, $H^T: Y \to X$, which can be interpreted as the simplest way to put measurements back into the image domain (see Figure 1).
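These CT operators are easy to experiment with. The sketch below uses scikit-image (our choice of library; the article itself mentions only MATLAB's radon) to apply $H$, the FBP inverse $\tilde{H}^{-1}$, and an unfiltered back projection in the spirit of $H^T$:

```python
import numpy as np
from skimage.data import shepp_logan_phantom
from skimage.transform import radon, iradon, rescale

x = rescale(shepp_logan_phantom(), 0.5)              # a test image x
theta = np.linspace(0.0, 180.0, 50, endpoint=False)  # sparse: only 50 views

y = radon(x, theta=theta)                        # H{x}: the sinogram
x_fbp = iradon(y, theta=theta)                   # ~H^-1{y}: filtered back projection
x_bp = iradon(y, theta=theta, filter_name=None)  # unfiltered back projection, like H^T{y}

# With only 50 views, x_fbp exhibits the streak artifacts discussed next.
```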
These direct inverses begin to show significant artifacts when the number or quality of the measurements decreases, either because the underlying discretization breaks down or because the inversion of (1) becomes ill posed (lacking a solution, lacking a unique solution, or being unstable with respect to the measurements). Unfortunately, in many real-world problems, measurements are costly (in terms of time or, e.g., X-ray damage to the patient), which motivates us to collect as few as possible. To reconstruct from sparse or noisy measurements, it is often better to use a regularized formulation,

$$R_{\mathrm{reg}}\{y\} = \arg\min_{x \in X} f(H\{x\}, y) + g(x), \quad (2)$$

where $g: X \to \mathbb{R}^+$ is a regularization functional that promotes solutions that match our prior knowledge of $x$ and, simultaneously, makes the problem well posed. For CT, $g$ could be the total variation (TV) regularization, which penalizes large gradients in $x$.
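As a minimal sketch of (2), the loop below denoises an image (the simplest case, $H$ = identity) by gradient descent on a smoothed TV penalty. This is our illustration only: the smoothing constant, step size, and $\lambda$ are arbitrary, and practical solvers use more sophisticated algorithms.

```python
import numpy as np

def tv_smooth_grad(x, eps=1e-3):
    """Gradient of the smoothed TV penalty sum(sqrt(|grad x|^2 + eps))."""
    dx = np.diff(x, axis=0, append=x[-1:, :])   # forward differences,
    dy = np.diff(x, axis=1, append=x[:, -1:])   # zero at the image boundary
    mag = np.sqrt(dx**2 + dy**2 + eps)
    px, py = dx / mag, dy / mag
    # Gradient of TV = -div(p), via backward differences.
    return -((px - np.roll(px, 1, axis=0)) + (py - np.roll(py, 1, axis=1)))

rng = np.random.default_rng(0)
x_true = np.zeros((64, 64))
x_true[16:48, 16:48] = 1.0                            # piecewise-constant image
y = x_true + 0.2 * rng.standard_normal(x_true.shape)  # noisy measurements, H = I

lam, step = 0.15, 0.2
x = y.copy()
for _ in range(200):
    # Descend on f(H{x}, y) + g(x) = 0.5 * ||x - y||^2 + lam * TV(x).
    x -= step * ((x - y) + lam * tv_smooth_grad(x))
```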
From this perspective, the challenge of solving an inverse
problem is designing and implementing (2) for a specific appli-
cation. Much effort has gone into designing general-purpose
regularizers and minimization algorithms. For example, com-
pressed sensing [29] provides sparsity-promoting regularizers.
Nonetheless, in the worst case, a new application necessitates developing accurate and efficient $H$, $g$, and $f$, along with a minimization algorithm.
An alternative to the objective function approach is called the learning approach, where a training set of ground-truth images and their corresponding measurements, $\{(x_n, y_n)\}_{n=1}^N$, is known. A parametric reconstruction algorithm, $R_{\mathrm{learn}}$, is then learned by solving

$$R_{\mathrm{learn}} = \arg\min_{R_\theta,\, \theta \in \Theta} \sum_{n=1}^{N} f(x_n, R_\theta\{y_n\}) + g(\theta), \quad (3)$$

where $\Theta$ is the set of all possible parameters, $f: X \times X \to \mathbb{R}^+$ is a measure of error, and $g: \Theta \to \mathbb{R}^+$ is a regularizer on the parameters with the aim of avoiding overfitting. Once the learning step is complete, $R_{\mathrm{learn}}$ can then be used to reconstruct a new image from its measurements.
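In code, (3) is an ordinary training loop. A minimal PyTorch sketch (ours, not the authors'): mean-squared error plays the role of $f$, weight decay plays the role of $g(\theta)$, and `pairs` is assumed to be an iterable of $(y_n, x_n)$ tensor batches.

```python
import torch

def learn(model, pairs, epochs=10, lr=1e-3, weight_decay=1e-5):
    """Approximately solve (3): model plays R_theta, and the loop
    minimizes sum_n f(x_n, R_theta{y_n}) + g(theta) by gradient descent."""
    # weight_decay implements the parameter regularizer g(theta) ~ ||theta||^2.
    opt = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)
    f = torch.nn.MSELoss()  # the error measure f: X x X -> R+
    for _ in range(epochs):
        for y_n, x_n in pairs:
            opt.zero_grad()
            loss = f(model(y_n), x_n)
            loss.backward()  # gradients via the chain rule (backpropagation)
            opt.step()
    return model             # the trained model is R_learn
```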
To summarize, in the objective function approach, the reconstruction function is itself a regularized minimization problem, while in the learning approach, the solution of a regularized minimization problem is a parametric function that can be used to solve the inverse problem. The learning formulation is attractive because it overcomes many of the limitations of the objective function approach: there is no need to handcraft the forward model, cost function, regularizer, and optimizer from (2). On the other hand, the learning approach requires a training set, and the minimization (3) is typically more difficult than (2) and requires a problem-dependent choice of $f$, $g$, and the class of functions described by $R_\theta$ and $\Theta$.
Finally, we note that the learning and objective function approaches describe a spectrum rather than a dichotomy. In fact, the learning formulation is strictly more general, including the objective function formulation as a special case. As we will discuss further in the section "Network Architecture," which (if any) aspects of the objective function approach to retain is a critical choice in the design of learning-based approaches to inverse problems in imaging.
CNNs
Our focus here is the formulation of (3) using CNNs. Using a CNN means, roughly, fixing the set of functions, $R_\theta$, to be a sequence of (linear) filtering operations alternating with simple nonlinear operations. This class of functions is parametrized by the values of the filters used (also known as filter weights), and these filter weights are the parameters over which the minimization occurs. For illustration, Figure 2 shows a typical CNN architecture; a code sketch of the same architecture follows.
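A PyTorch rendering of the three-layer architecture of Figure 2 (the channel counts follow the figure; the kernel sizes and padding are our assumptions):

```python
import torch.nn as nn

# R_theta{x} = c3 ∘ T(c2 ∘ T(c1 ∘ x + b1) + b2) + b3, with T = ReLU and each
# c_k a four-dimensional tensor of filter weights (a stack of 3-D filters).
model = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),   # c1, b1: RGB in, 64 channels out
    nn.ReLU(),                                    # T
    nn.Conv2d(64, 64, kernel_size=3, padding=1),  # c2, b2
    nn.ReLU(),                                    # T
    nn.Conv2d(64, 3, kernel_size=3, padding=1),   # c3, b3: back to 3 channels
)
```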
We will discuss the theoretical motivations for using CNNs
as the learning architecture for inverse problems in the sec-
tion “Theory,” but we mention some practical advantages
here. First, the forward operation of a CNN consists of (usu-
ally small) convolutions and simple, pointwise nonlinear func-
tions. This means that, once training is complete, the execution
of $R_{\mathrm{learn}}$ is very fast and amenable to hardware acceleration
on GPUs. Second, the gradient of (3) is computable via the
chain rule, and these gradients again involve small convolu-
tions, meaning that the parameters can be learned efficiently
via gradient descent.
When the first CNN-based method entered the ImageNet
Large-Scale Visual Recognition Challenge in 2012 [5], its
error rate on the object localization and classification task was
15.3%, as compared to an error rate of 26.2% for the next closest
method and 25.8% for the 2011 winner. In subsequent com-
petitions (2013–2016), the majority of the entries (and all of
the winners) were CNN based and continued to improve sub-
stantially, with the 2016 winner achieving an error rate of just
2.99%. Can we expect such large gains in inverse problems?
That is, can we expect denoising results to improve by an order
of magnitude (20 dB) in the next few years? Next, we answer
this question by surveying the results reported by recent CNN-
based approaches to image reconstruction.
Current state of performance
Of the inverse problems we review here, denoising provides the
best look at recent trends in results because there are standard
experiments that appear in most papers. Work on CNN-based
denoising from 2009 [6] showed an average peak signal-to-noise
ratio (PSNR) of 28.5 dB on the Berkeley segmentation data set, a
less than 1-dB improvement over contemporary wavelet and
Markov random field-based approaches. For comparison, one
very recent denoising work [11] reported a 0.7-dB improvement
on a similar experiment, which remains less than 1 dB better than
contemporary non-CNN methods (including block-matching and
3-D filtering, which had remained the state of the art for years).
As another point of reference, in 2012, one CNN approach [7]
reported an average PSNR of 30.2 dB on a set of standard test
images (Lena, peppers, etc.), less than 0.1 dB better than com-
parisons, and another [8] reported an average of 30.5 dB on the
same experiment. Recently, [11] achieved an average of 30.4 dB
under the same conditions. One important perspective on these
denoising results is that the CNN is learning the distribution of
natural images (or, equivalently, is learning a regularization).
Such a CNN could be reused inside an iterative optimization as a
proximal operator to enforce this learned regularization for any
inverse problem.
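That reuse can be sketched as a plug-and-play proximal gradient iteration: alternate a gradient step on the data term with an application of the learned denoiser in place of the proximal operator. In the sketch below, `denoise` stands in for a trained CNN (a Gaussian filter is substituted so the code runs), and `H`/`Ht` are assumed to be callables implementing the forward model and its adjoint; the step size and iteration count are illustrative.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def pnp_reconstruct(y, H, Ht, step=1e-3, iters=50,
                    denoise=lambda x: gaussian_filter(x, sigma=1.0)):
    """Plug-and-play proximal gradient descent on 0.5 * ||H{x} - y||^2,
    with the denoiser acting as the proximal operator of a learned prior."""
    x = Ht(y)                        # initialize from the back projection
    for _ in range(iters):
        x = x - step * Ht(H(x) - y)  # gradient step on the data term
        x = denoise(x)               # "proximal" step: apply the denoiser
    return x
```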
The trends are similar in deblurring and superresolution,
although experiments are more varied and therefore harder to
compare. For deblurring, [12] showed around a 1-dB PSNR
improvement over comparison methods, and [13] showed a
further improvement of approximately 1 dB. For superresolu-
tion, work from 2014 [15] reported a less than 0.5-dB improve-
ment in PSNR over comparisons. During the next two years,
[16] and [19] both reported a 0.5-dB PSNR increase over this
baseline. Even more recent work, [30], improves on the 2014
work by around 1.5 dB in PSNR. For video superresolution,
[18] improves on non-CNN-based methods by about 0.5 dB
PSNR and [20] improves upon that result by another 0.5 dB.
For inverse problems in medical imaging, direct com-
parison between works is impossible due to the wide vari-
ety of experimental setups. A 2013 CNN-based work [24]
shows improvement in limited-view CT reconstruction over
direct methods and unregularized iterative methods but does
not compare to regularized iterative methods. In 2015, [25]
showed (in full-view CT) an improvement of several decibels
in signal-to-noise ratio (SNR) over direct reconstruction and
around 1-dB improvement over regularized iterative recon-
struction. Recently, [26] showed about 0.5-dB improvement in
PSNR over TV-regularized reconstruction, while [27] showed
a larger (1–4 dB) improvement in SNR over a different TV-
regularized method (Figure 3). In magnetic resonance imaging
(MRI), [22] demonstrates performance equal to the state of the
art, with advantages in running time.
FIGURE 2. An illustration of a typical CNN architecture for $256^2$-pixel RGB images, including the objective function used for training:

$$R_{\mathrm{learn}}\{x\} = c_3 \circ T(c_2 \circ T(c_1 \circ x + b_1) + b_2) + b_3, \qquad f(\cdot) = \|\cdot\|_2^2,$$

where $T(\cdot)$ is the rectified linear unit function (a pointwise nonlinear function), $\circ$ denotes a 2-D convolution, the $c_k$ are filters, and the $b_k$ are biases. The convolutions in each layer are described by a four-dimensional tensor representing a stack of 3-D filters.

Do these improvements matter? CNN-based methods have not, so far, had the profound impact on inverse problems that they have had for object classification. The difference between 30 and 30.5 dB is impossible to see by eye. On the other hand,
these improvements occur in heavily studied fields: we have
been denoising the Lena image since the 1970s. Furthermore,
CNNs offer some unique advantages over many traditional
methods. The design of the CNN architecture can be more or
less decoupled from the application at hand and reused from
problem to problem. They can also be expanded in straightfor-
ward ways as computer memory grows, and there is some evi-
dence that larger networks lead to better performance. Finally,
once trained, running the model is fast (dozens of convolutions
per image, usually less than 1 s). This means that CNN-based
methods can be attractive in terms of running time even if they
do not improve upon state-of-the-art performance.
Designing CNNs for inverse problems
In this section, we survey the design decisions needed to devel-
op CNN-based approaches for inverse problems in imaging.
We organize the section around the learning equation as sum-
marized in Figure 4, first describing how the training set is
created, then how the network architecture is designed, and,
finally, how the learning problem is formulated and solved.
Training set
Learning requires a suitable training set, i.e., the (input, out-
put) pairs from which the CNN will learn. In a typical learning
problem, training outputs are provided by some oracle label-
ing a set of inputs. For example, in object classification, a set
of human graders might view a large number of images and
provide annotations for each. In the inverse problem setting,
this is considerably more difficult because no such oracle exists.
For example, in X-ray CT, to generate a training set, we would
need to image a large number of physical phantoms for which
wehave exact 3-D models, which is not feasible in practice. The
choice of the training set also constrains the network architec-
ture because the input and output of the network must match the
dimensions of $y_n$ and $x_n$, respectively.
Generating training data
In some cases, generating training data is straightforward
because the forward model we aim to invert is known exactly
and easily computable. In denoising, training data are generated
by corrupting images with noise; the noisy image then serves as
training input and the clean image as the training output, as in,
e.g., [6] and [7]. Or, the noise itself can serve as the oracle
output, in a scheme called residual learning [11], [23]. Super-
resolution follows the same pattern, where training pairs are eas-
ily generated by downsampling, as in, e.g., [19]. The same is true
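A sketch of this kind of training-pair generation, including the residual-learning variant in which the oracle output is the noise itself (our own illustration; the noise level and downsampling factor are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

def denoising_pair(clean, sigma=25.0 / 255.0):
    """Input is the corrupted image; output is the clean image (or the residual)."""
    noisy = clean + sigma * rng.standard_normal(clean.shape)
    residual = noisy - clean  # residual-learning target: the noise itself
    return noisy, clean, residual

def superres_pair(clean, factor=2):
    """Input is a downsampled image; output is the original (naive decimation here)."""
    return clean[::factor, ::factor], clean
```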
FIGURE 3. An example of X-ray CT reconstructions: (a) ground truth, from an FBP reconstruction using 1,000 views; (b) FBP from just 50 views (SNR 13.43 dB); (c) TV-regularized reconstruction from 50 views (SNR 24.89 dB); (d) CNN-based reconstruction (FBP ConvNet) from 50 views (SNR 28.53 dB). The CNN-based reconstruction preserves more of the texture present in the ground truth and results in a significant increase in SNR. (Images are reproduced with permission from [27].)
FIGURE 4. The learning equation, which we use to organize the parts of the section "Designing CNNs for inverse problems":

$$R_{\mathrm{learn}} = \arg\min_{R_\theta,\, \theta \in \Theta} \sum_{n=1}^{N} f(x_n, R_\theta(y_n)) + g(\theta),$$

with its four ingredients: (A) the training set $\{(x_n, y_n)\}$, (B) the network architecture $R_\theta$, (C) the cost function and regularization $f$ and $g$, and (D) the optimization (the arg min).