
Convolutional Neural Networks for Inverse Problems in Imaging: A Review

Michael T. McCann, Kyong Hwan Jin, and Michael Unser
IEEE Signal Processing Magazine (Deep Learning for Visual Understanding), November 2017
Digital Object Identifier 10.1109/MSP.2017.2739299 | Date of publication: 13 November 2017
In this article, we review recent uses of convolutional neural networks (CNNs) to solve inverse problems in imaging. It has recently become feasible to train deep CNNs on large databases of images, and they have shown outstanding performance on object classification and segmentation tasks. Motivated by these successes, researchers have begun to apply CNNs to the resolution of inverse problems such as denoising, deconvolution, superresolution, and medical image reconstruction, and they have started to report improvements over state-of-the-art methods, including sparsity-based techniques such as compressed sensing. Here, we review the recent experimental work in these areas, with a focus on the critical design decisions:
■ From where do the training data come?
■ What is the architecture of the CNN?
■ How is the learning problem formulated and solved?
We also mention a few key theoretical papers that offer perspectives on why CNNs are appropriate for inverse problems, and we point to some next steps in the field.
Introduction
The basic ideas underlying the use of CNNs (also known as
ConvNets) for inverse problems are not new. Here, we give a
condensed history of CNNs to provide context to what fol-
lows. For further historical perspective, see [1]; for an acces-
sible introduction to deep neural networks and a summary of
their recent history, see [2]. The CNN architecture was pro-
posed in 1986 [3], and neural networks were developed for
solving inverse imaging problems as early as 1988 [4]. These
approaches, which used networks with few parameters and did
not always include learning, were largely superseded by com-
pressed sensing (or, broadly, convex optimization with regulariza-
tion) approaches in the 2000s. As computer hardware improved,
it became feasible to train larger neural networks, until, in 2012,
Krizhevsky et al. [5] achieved a significant improvement over the
state of the art on the ImageNet classification challenge by using
a graphics processing unit (GPU) to train a CNN with five con-
volutional layers and 60 million parameters on a set of 1.3 mil-
lion images. This work spurred a resurgence of interest in neural
networks and, specifically, CNNs—not only for computer vision
tasks but also for inverse problems.
The purpose of this article is to summarize the recent works
using CNNs for inverse problems in imaging, i.e., in problems
most naturally formulated as recovering an image from a set
of noisy measurements. This criterion excludes detection, seg-
mentation, classification, quality assessment, etc. We also focus
on CNNs, avoiding other architectures such as recurrent neu-
ral networks, fully connected networks, and stacked denoising
autoencoders. We organized our literature search by application,
selecting topics of broad interest where we could find at least
three peer-reviewed papers from the last ten years. (Much of the
work on the theory and practice of CNNs is posted on the pre-
print server arXiv.org before eventually appearing in traditional
journals. Because of the lack of peer review on arXiv.org, we
have preferred not to cite these papers, except in cases where we
are trying to illustrate a very recent trend or future direction for
the field.) The resulting applications and references are summa-
rized in Table 1. The aim of this constrained scope is to allow
us to draw meaningful generalizations from the surveyed works.
Background
We begin by introducing inverse problems and contrasting the
traditional approach to solving them with a learning-based
approach. For a textbook treatment of inverse problems, see
[28]. Throughout the section, we use X-ray computed tomogra-
phy (CT) as a running example, and Figure 1 shows images of
the various mathematical quantities we mention.
Learning for inverse problems in imaging
Mathematically speaking, an imaging system is an operator $H: X \to Y$ that acts on an image $x \in X$ to create a vector of measurements $y \in Y$, with $H\{x\} = y$. The underlying function/vector spaces are
■ the space, $X$, of acceptable images, which can be two-dimensional (2-D), three-dimensional (3-D), or even 3-D+time, with its values representing a physical quantity of interest, such as X-ray attenuation or concentration of fluorophores
■ the space, $Y$, of measurement vectors, which depends on the imaging operator and could include images (discrete arrays of pixels), Fourier samples, line integrals, etc.
We typically consider $x$ to be a continuous object (function of space), while $y$ is usually discrete: $Y = \mathbb{R}^M$. For example, in X-ray CT, $x$ is an image representing X-ray attenuations, $H$ represents the physics of the X-ray source and detector, and $y$ is the measured sinogram (see Figure 1).
In an inverse imaging problem, we aim to develop a reconstruction algorithm (which is also an operator), $R: Y \to X$, to recover the original image, $x$, from the measurements, $y$. The dominant approach for reconstruction, which we call the objective function approach, is to model $H$ and recover an estimate of $x$ from $y$ by

$$R_{\mathrm{obj}}\{y\} = \arg\min_{x \in X} f(H\{x\}, y), \quad (1)$$

where $H: X \to Y$ is the system model, which is usually linear, and $f: Y \times Y \to \mathbb{R}^+$ is an appropriate measure of error.
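To make (1) concrete: when $f$ is the squared Euclidean distance and $H$ is a small, discretized linear operator, the minimization reduces to ordinary least squares. The following Python sketch is our own illustration (the random matrix, sizes, and noise level are arbitrary), not code from the article:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy discretized linear system model H: 120 measurements of a 100-pixel image.
H = rng.standard_normal((120, 100))
x_true = rng.standard_normal(100)
y = H @ x_true + 0.01 * rng.standard_normal(120)

# Objective function approach, eq. (1), with f = squared Euclidean distance:
#   R_obj{y} = argmin_x ||H x - y||^2, i.e., ordinary least squares.
x_hat, *_ = np.linalg.lstsq(H, y, rcond=None)

# With fewer measurements than pixels, H loses rank and (1) becomes
# ill posed; that regime is what motivates the regularized form (2) below.
print(np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```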
FIGURE 1. A block diagram of image reconstruction methods, using images from X-ray CT as examples. An image, $x$, creates measurements, $y$, that can be used to estimate $x$ in a variety of ways. The traditional approach is to apply a direct inversion, $\tilde{H}^{-1}$, which is artifact prone in the sparse-measurement case (note the stripes in the reconstruction). The current state of the art is a regularized reconstruction, $R_{\mathrm{reg}}$, written, in general, in (2). Several recent works [25], [27] apply CNNs to the result of the direct inversion, $\tilde{H}^{-1}\{y\}$, or an iterative reconstruction, but it might also be reasonable to use as input the measurements $y$ themselves or the back projected measurements, $H^T\{y\}$.
Table 1. Reviewed applications and associated references.
Denoising: [6]–[11]
Deconvolution: [10], [12]–[14]
Superresolution: [9], [15]–[20]
MRI: [21]–[23]
CT: [24]–[27]
Continuing the CT example, $H$ would be a discretization of the X-ray transform (such as MATLAB's radon), and $f$ could be the Euclidean distance, $\|H\{x\} - y\|_2$. For many applications, decades of engineering have gone into developing a fast and reasonably accurate inverse operator, $\tilde{H}^{-1}$, so (1) is easily approximated by $R_{\mathrm{obj}}\{y\} = \tilde{H}^{-1}\{y\}$; for CT, $\tilde{H}^{-1}$ is the filtered back projection (FBP) algorithm. An important, related operator is the back projection, $H^T: Y \to X$, which can be interpreted as the simplest way to put measurements back into the image domain (see Figure 1).
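These CT operators are easy to experiment with. The sketch below uses scikit-image (our choice of library; the article itself mentions only MATLAB's radon) to apply $H$, the FBP inverse $\tilde{H}^{-1}$, and an unfiltered back projection in the spirit of $H^T$:

```python
import numpy as np
from skimage.data import shepp_logan_phantom
from skimage.transform import radon, iradon, rescale

x = rescale(shepp_logan_phantom(), 0.5)              # a test image x
theta = np.linspace(0.0, 180.0, 50, endpoint=False)  # sparse: only 50 views

y = radon(x, theta=theta)                        # H{x}: the sinogram
x_fbp = iradon(y, theta=theta)                   # ~H^-1{y}: filtered back projection
x_bp = iradon(y, theta=theta, filter_name=None)  # unfiltered back projection, like H^T{y}

# With only 50 views, x_fbp exhibits the streak artifacts discussed next.
```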
These direct inverses begin to show significant artifacts when the number or quality of the measurements decreases, either because the underlying discretization breaks down or because the inversion of (1) becomes ill posed (lacking a solution, lacking a unique solution, or being unstable with respect to the measurements). Unfortunately, in many real-world problems, measurements are costly (in terms of time or, e.g., X-ray damage to the patient), which motivates us to collect as few as possible. To reconstruct from sparse or noisy measurements, it is often better to use a regularized formulation,

$$R_{\mathrm{reg}}\{y\} = \arg\min_{x \in X} f(H\{x\}, y) + g(x), \quad (2)$$

where $g: X \to \mathbb{R}^+$ is a regularization functional that promotes solutions that match our prior knowledge of $x$ and, simultaneously, makes the problem well posed. For CT, $g$ could be the total variation (TV) regularization, which penalizes large gradients in $x$.
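As a minimal sketch of (2), the loop below denoises an image (the simplest case, $H$ = identity) by gradient descent on a smoothed TV penalty. This is our illustration only: the smoothing constant, step size, and $\lambda$ are arbitrary, and practical solvers use more sophisticated algorithms.

```python
import numpy as np

def tv_smooth_grad(x, eps=1e-3):
    """Gradient of the smoothed TV penalty sum(sqrt(|grad x|^2 + eps))."""
    dx = np.diff(x, axis=0, append=x[-1:, :])   # forward differences,
    dy = np.diff(x, axis=1, append=x[:, -1:])   # zero at the image boundary
    mag = np.sqrt(dx**2 + dy**2 + eps)
    px, py = dx / mag, dy / mag
    # Gradient of TV = -div(p), via backward differences.
    return -((px - np.roll(px, 1, axis=0)) + (py - np.roll(py, 1, axis=1)))

rng = np.random.default_rng(0)
x_true = np.zeros((64, 64))
x_true[16:48, 16:48] = 1.0                            # piecewise-constant image
y = x_true + 0.2 * rng.standard_normal(x_true.shape)  # noisy measurements, H = I

lam, step = 0.15, 0.2
x = y.copy()
for _ in range(200):
    # Descend on f(H{x}, y) + g(x) = 0.5 * ||x - y||^2 + lam * TV(x).
    x -= step * ((x - y) + lam * tv_smooth_grad(x))
```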
From this perspective, the challenge of solving an inverse
problem is designing and implementing (2) for a specific appli-
cation. Much effort has gone into designing general-purpose
regularizers and minimization algorithms. For example, com-
pressed sensing [29] provides sparsity-promoting regularizers.
Nonetheless, in the worst case, a new application necessitates developing accurate and efficient $H$, $g$, and $f$, along with a minimization algorithm.
An alternative to the objective function approach is called the learning approach, where a training set of ground-truth images and their corresponding measurements, $\{(x_n, y_n)\}_{n=1}^N$, is known. A parametric reconstruction algorithm, $R_{\mathrm{learn}}$, is then learned by solving

$$R_{\mathrm{learn}} = \arg\min_{R_\theta,\, \theta \in \Theta} \sum_{n=1}^{N} f(x_n, R_\theta\{y_n\}) + g(\theta), \quad (3)$$

where $\Theta$ is the set of all possible parameters, $f: X \times X \to \mathbb{R}^+$ is a measure of error, and $g: \Theta \to \mathbb{R}^+$ is a regularizer on the parameters with the aim of avoiding overfitting. Once the learning step is complete, $R_{\mathrm{learn}}$ can then be used to reconstruct a new image from its measurements.
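In code, (3) is an ordinary training loop. A minimal PyTorch sketch (ours, not the authors'): mean-squared error plays the role of $f$, weight decay plays the role of $g(\theta)$, and `pairs` is assumed to be an iterable of $(y_n, x_n)$ tensor batches.

```python
import torch

def learn(model, pairs, epochs=10, lr=1e-3, weight_decay=1e-5):
    """Approximately solve (3): model plays R_theta, and the loop
    minimizes sum_n f(x_n, R_theta{y_n}) + g(theta) by gradient descent."""
    # weight_decay implements the parameter regularizer g(theta) ~ ||theta||^2.
    opt = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)
    f = torch.nn.MSELoss()  # the error measure f: X x X -> R+
    for _ in range(epochs):
        for y_n, x_n in pairs:
            opt.zero_grad()
            loss = f(model(y_n), x_n)
            loss.backward()  # gradients via the chain rule (backpropagation)
            opt.step()
    return model             # the trained model is R_learn
```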
To summarize, in the objective function approach, the reconstruction function is itself a regularized minimization problem, while in the learning approach, the solution of a regularized minimization problem is a parametric function that can be used to solve the inverse problem. The learning formulation is attractive because it overcomes many of the limitations of the objective function approach: there is no need to handcraft the forward model, cost function, regularizer, and optimizer from (2). On the other hand, the learning approach requires a training set, and the minimization (3) is typically more difficult than (2) and requires a problem-dependent choice of $f$, $g$, and the class of functions described by $R_\theta$ and $\Theta$.
Finally, we note that the learning and objective function approaches describe a spectrum rather than a dichotomy. In fact, the learning formulation is strictly more general, including the objective function formulation as a special case. As we will discuss further in the section "Network Architecture," which (if any) aspects of the objective function approach to retain is a critical choice in the design of learning-based approaches to inverse problems in imaging.
CNNs
Our focus here is the formulation of (3) using CNNs. Using a CNN means, roughly, fixing the set of functions, $R_\theta$, to be a sequence of (linear) filtering operations alternating with simple nonlinear operations. This class of functions is parametrized by the values of the filters used (also known as filter weights), and these filter weights are the parameters over which the minimization occurs. For illustration, Figure 2 shows a typical CNN architecture; a code sketch of the same architecture follows.
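A PyTorch rendering of the three-layer architecture of Figure 2 (the channel counts follow the figure; the kernel sizes and padding are our assumptions):

```python
import torch.nn as nn

# R_theta{x} = c3 ∘ T(c2 ∘ T(c1 ∘ x + b1) + b2) + b3, with T = ReLU and each
# c_k a four-dimensional tensor of filter weights (a stack of 3-D filters).
model = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),   # c1, b1: RGB in, 64 channels out
    nn.ReLU(),                                    # T
    nn.Conv2d(64, 64, kernel_size=3, padding=1),  # c2, b2
    nn.ReLU(),                                    # T
    nn.Conv2d(64, 3, kernel_size=3, padding=1),   # c3, b3: back to 3 channels
)
```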
We will discuss the theoretical motivations for using CNNs
as the learning architecture for inverse problems in the sec-
tion “Theory,” but we mention some practical advantages
here. First, the forward operation of a CNN consists of (usu-
ally small) convolutions and simple, pointwise nonlinear func-
tions. This means that, once training is complete, the execution
of $R_{\mathrm{learn}}$ is very fast and amenable to hardware acceleration
on GPUs. Second, the gradient of (3) is computable via the
chain rule, and these gradients again involve small convolu-
tions, meaning that the parameters can be learned efficiently
via gradient descent.
When the first CNN-based method entered the ImageNet
Large-Scale Visual Recognition Challenge in 2012 [5], its
error rate on the object localization and classification task was
15.3%, as compared to an error rate of 26.2% for the next closest
method and 25.8% for the 2011 winner. In subsequent com-
petitions (2013–2016), the majority of the entries (and all of
the winners) were CNN based and continued to improve sub-
stantially, with the 2016 winner achieving an error rate of just
2.99%. Can we expect such large gains in inverse problems?
That is, can we expect denoising results to improve by an order
of magnitude (20 dB) in the next few years? Next, we answer
this question by surveying the results reported by recent CNN-
based approaches to image reconstruction.
Current state of performance
Of the inverse problems we review here, denoising provides the
best look at recent trends in results because there are standard
experiments that appear in most papers. Work on CNN-based
denoising from 2009 [6] showed an average peak signal-to-noise
ratio (PSNR) of 28.5 dB on the Berkeley segmentation data set, a
less than 1-dB improvement over contemporary wavelet and
Markov random field-based approaches. For comparison, one
very recent denoising work [11] reported a 0.7-dB improvement
on a similar experiment, which remains less than 1 dB better than
contemporary non-CNN methods (including block-matching and
3-D filtering, which had remained the state of the art for years).
As another point of reference, in 2012, one CNN approach [7]
reported an average PSNR of 30.2 dB on a set of standard test
images (Lena, peppers, etc.), less than 0.1 dB better than com-
parisons, and another [8] reported an average of 30.5 dB on the
same experiment. Recently, [11] achieved an average of 30.4 dB
under the same conditions. One important perspective on these
denoising results is that the CNN is learning the distribution of
natural images (or, equivalently, is learning a regularization).
Such a CNN could be reused inside an iterative optimization as a
proximal operator to enforce this learned regularization for any
inverse problem.
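That reuse can be sketched as a plug-and-play proximal gradient iteration: alternate a gradient step on the data term with an application of the learned denoiser in place of the proximal operator. In the sketch below, `denoise` stands in for a trained CNN (a Gaussian filter is substituted so the code runs), and `H`/`Ht` are assumed to be callables implementing the forward model and its adjoint; the step size and iteration count are illustrative.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def pnp_reconstruct(y, H, Ht, step=1e-3, iters=50,
                    denoise=lambda x: gaussian_filter(x, sigma=1.0)):
    """Plug-and-play proximal gradient descent on 0.5 * ||H{x} - y||^2,
    with the denoiser acting as the proximal operator of a learned prior."""
    x = Ht(y)                        # initialize from the back projection
    for _ in range(iters):
        x = x - step * Ht(H(x) - y)  # gradient step on the data term
        x = denoise(x)               # "proximal" step: apply the denoiser
    return x
```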
The trends are similar in deblurring and superresolution,
although experiments are more varied and therefore harder to
compare. For deblurring, [12] showed around a 1-dB PSNR
improvement over comparison methods, and [13] showed a
further improvement of approximately 1 dB. For superresolu-
tion, work from 2014 [15] reported a less than 0.5-dB improve-
ment in PSNR over comparisons. During the next two years,
[16] and [19] both reported a 0.5-dB PSNR increase over this
baseline. Even more recent work, [30], improves on the 2014
work by around 1.5 dB in PSNR. For video superresolution,
[18] improves on non-CNN-based methods by about 0.5 dB
PSNR and [20] improves upon that result by another 0.5 dB.
For inverse problems in medical imaging, direct com-
parison between works is impossible due to the wide vari-
ety of experimental setups. A 2013 CNN-based work [24]
shows improvement in limited-view CT reconstruction over
direct methods and unregularized iterative methods but does
not compare to regularized iterative methods. In 2015, [25]
showed (in full-view CT) an improvement of several decibels
in signal-to-noise ratio (SNR) over direct reconstruction and
around 1-dB improvement over regularized iterative recon-
struction. Recently, [26] showed about 0.5-dB improvement in
PSNR over TV-regularized reconstruction, while [27] showed
a larger (1–4 dB) improvement in SNR over a different TV-
regularized method (Figure 3). In magnetic resonance imaging
(MRI), [22] demonstrates performance equal to the state of the
art, with advantages in running time.
FIGURE 2. An illustration of a typical CNN architecture for $256^2$-pixel RGB images, including the objective function used for training:

$$R_{\mathrm{learn}}\{x\} = c_3 \circ T(c_2 \circ T(c_1 \circ x + b_1) + b_2) + b_3, \qquad f(\cdot) = \|\cdot\|_2^2,$$

where $T(\cdot)$ is the rectified linear unit function (a pointwise nonlinear function), $\circ$ denotes a 2-D convolution, the $c_k$ are filters, and the $b_k$ are biases. The convolutions in each layer are described by a four-dimensional tensor representing a stack of 3-D filters.

Do these improvements matter? CNN-based methods have not, so far, had the profound impact on inverse problems that they have had for object classification. The difference between 30 and 30.5 dB is impossible to see by eye. On the other hand,
these improvements occur in heavily studied fields: we have
been denoising the Lena image since the 1970s. Furthermore,
CNNs offer some unique advantages over many traditional
methods. The design of the CNN architecture can be more or
less decoupled from the application at hand and reused from
problem to problem. They can also be expanded in straightfor-
ward ways as computer memory grows, and there is some evi-
dence that larger networks lead to better performance. Finally,
once trained, running the model is fast (dozens of convolutions
per image, usually less than 1 s). This means that CNN-based
methods can be attractive in terms of running time even if they
do not improve upon state-of-the-art performance.
Designing CNNs for inverse problems
In this section, we survey the design decisions needed to devel-
op CNN-based approaches for inverse problems in imaging.
We organize the section around the learning equation as sum-
marized in Figure 4, first describing how the training set is
created, then how the network architecture is designed, and,
finally, how the learning problem is formulated and solved.
Training set
Learning requires a suitable training set, i.e., the (input, out-
put) pairs from which the CNN will learn. In a typical learning
problem, training outputs are provided by some oracle label-
ing a set of inputs. For example, in object classification, a set
of human graders might view a large number of images and
provide annotations for each. In the inverse problem setting,
this is considerably more difficult because no such oracle exists.
For example, in X-ray CT, to generate a training set, we would
need to image a large number of physical phantoms for which
wehave exact 3-D models, which is not feasible in practice. The
choice of the training set also constrains the network architec-
ture because the input and output of the network must match the
dimensions of $y_n$ and $x_n$, respectively.
Generating training data
In some cases, generating training data is straightforward
because the forward model we aim to invert is known exactly
and easily computable. In denoising, training data are generated
by corrupting images with noise; the noisy image then serves as
training input and the clean image as the training output, as in,
e.g., [6] and [7]. Or, the noise itself can serve as the oracle
output, in a scheme called residual learning [11], [23]. Super-
resolution follows the same pattern, where training pairs are eas-
ily generated by downsampling, as in, e.g., [19]. The same is true
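A sketch of this kind of training-pair generation, including the residual-learning variant in which the oracle output is the noise itself (our own illustration; the noise level and downsampling factor are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

def denoising_pair(clean, sigma=25.0 / 255.0):
    """Input is the corrupted image; output is the clean image (or the residual)."""
    noisy = clean + sigma * rng.standard_normal(clean.shape)
    residual = noisy - clean  # residual-learning target: the noise itself
    return noisy, clean, residual

def superres_pair(clean, factor=2):
    """Input is a downsampled image; output is the original (naive decimation here)."""
    return clean[::factor, ::factor], clean
```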
FIGURE 3. An example of X-ray CT reconstructions: (a) ground truth, from an FBP reconstruction using 1,000 views; (b) FBP from just 50 views (SNR 13.43 dB); (c) TV-regularized reconstruction from 50 views (SNR 24.89 dB); (d) CNN-based reconstruction (FBP ConvNet) from 50 views (SNR 28.53 dB). The CNN-based reconstruction preserves more of the texture present in the ground truth and results in a significant increase in SNR. (Images are reproduced with permission from [27].)
FIGURE 4. The learning equation, which we use to organize the parts of the section "Designing CNNs for inverse problems":

$$R_{\mathrm{learn}} = \arg\min_{R_\theta,\, \theta \in \Theta} \sum_{n=1}^{N} f(x_n, R_\theta(y_n)) + g(\theta),$$

with its four ingredients: (A) the training set $\{(x_n, y_n)\}$, (B) the network architecture $R_\theta$, (C) the cost function and regularization $f$ and $g$, and (D) the optimization (the arg min).