
A Tour of Modern Image Filtering: New Insights and Methods, Both Practical and Theoretical

TLDR
A practical and accessible framework is presented to understand some of the basic underpinnings of algorithms in wide use such as block-matching and three-dimensional filtering (BM3D), and methods for their iterative improvement (or nonexistence thereof) are discussed.
Abstract
In this article, the author presents a practical and accessible framework to understand some of the basic underpinnings of these methods, with the intention of leading the reader to a broad understanding of how they interrelate. The author also illustrates connections between these techniques and more classical (empirical) Bayesian approaches. The proposed framework is used to arrive at new insights and methods, both practical and theoretical. In particular, several novel optimality properties of algorithms in wide use such as block-matching and three-dimensional (3-D) filtering (BM3D), and methods for their iterative improvement (or nonexistence thereof) are discussed. A general approach is laid out to enable the performance analysis and subsequent improvement of many existing filtering algorithms. While much of the material discussed is applicable to the wider class of linear degradation models beyond noise (e.g., blur), to keep matters focused, we consider the problem of denoising here.



Recent developments in computational imaging and restoration have heralded the arrival and convergence of several powerful methods for adaptive processing of multidimensional data. Examples include moving least squares (from graphics), the bilateral filter (BF) and anisotropic diffusion (from computer vision), boosting, kernel, and spectral methods (from machine learning), nonlocal means (NLM) and its variants (from signal processing), Bregman iterations (from applied math), and kernel regression and iterative scaling (from statistics). While these approaches found their inspirations in diverse fields of nascence, they are deeply connected.
In this article, I present a practical and accessible framework to understand some of the basic underpinnings of these methods, with the intention of leading the reader to a broad understanding of how they interrelate. I also illustrate connections between these techniques and more classical (empirical) Bayesian approaches.

The proposed framework is used to arrive at new insights and methods, both practical and theoretical. In particular, several novel optimality properties of algorithms in wide use such as block-matching and three-dimensional (3-D) filtering (BM3D), and methods for their iterative improvement (or nonexistence thereof) are discussed.
A general approach is laid out to enable the performance analysis and subsequent improvement of many existing filtering algorithms. While much of the material discussed is applicable to the wider class of linear degradation models beyond noise (e.g., blur), to keep matters focused, we consider the problem of denoising here.

Peyman Milanfar

Digital Object Identifier 10.1109/MSP.2011.2179329
Date of publication: 5 December 2012
INTRODUCTION
Multidimensional filtering is the most fundamental operation in image and video processing, and low-level computer vision. In particular, the most widely used canonical filtering operation is one that removes or attenuates the effect of noise. As such, the basic design and analysis of image filtering operations form a very large part of the image processing literature, with the resulting techniques often quickly spreading to the wider range of restoration and reconstruction problems in imaging. Over the years, many approaches have been tried, but only recently, in the last decade or so, has a great leap forward in performance been realized. While largely unacknowledged in our community, this phenomenal progress has been mostly thanks to the adoption and development of nonparametric point estimation procedures adapted to the local structure of the given multidimensional data. Viewed through the lens of the denoising application, here we develop a general framework for understanding the basic science and engineering behind these techniques and their generalizations. Surely this is not the first article to attempt such an ambitious overview, and it will likely not be the last; but the aim here is to provide a self-contained presentation that distills, generalizes, and puts into proper context many other excellent earlier works such as [1]–[5], and, more recently, [6]. It is fair to say that this article is, by necessity, not completely tutorial. Indeed it does contain several novel results; yet these are largely novel interpretations, formalizations, or generalizations of ideas already known or empirically familiar to the community. Hence, I hope that the enterprising reader will find this article not only a good overview, but, as should be the case with any useful presentation, a source of new insights and food for thought.
So to begin, let us consider the measurement model for the denoising problem

$$y_i = z_i + e_i, \quad \text{for } i = 1, \ldots, n, \qquad (1)$$

where $z_i = z(x_i)$ is the underlying latent signal of interest at a position $x_i = [x_{1,i}, x_{2,i}]^T$, $y_i$ is the noisy measured signal (pixel value), and $e_i$ is zero-mean, white noise with variance $\sigma^2$. We make no other distributional assumptions for the noise. The problem of interest then is to recover the complete set of samples of $z(x)$, which we denote vectorially as $z = [z(x_1), z(x_2), \ldots, z(x_n)]^T$, from the corresponding data set $y$. To restate the problem more concisely, the complete measurement model in vector notation is given by (surely a similar analysis to what follows can and should be carried out for more general inverse problems such as deconvolution, interpolation, etc.)

$$y = z + e. \qquad (2)$$
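To make the setup concrete, here is a minimal NumPy sketch of the model in (1) and (2) on a one-dimensional signal; the sinusoidal signal and noise level are illustrative choices of mine, not taken from the article.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 256
x = np.arange(n)                      # sample positions x_i
z = np.sin(2 * np.pi * x / 64.0)      # latent signal z(x_i) (illustrative choice)
sigma = 0.3                           # noise standard deviation
e = sigma * rng.standard_normal(n)    # zero-mean white noise with variance sigma^2
y = z + e                             # measurement model (2): y = z + e
```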
It has been realized for some time now that effective restoration of signals will require methods that either model the signal a priori (i.e., are Bayesian) or learn the underlying characteristics of the signal from the given data (i.e., learning, nonparametric, or empirical Bayes methods). Most recently, the latter category of approaches has become exceedingly popular. Perhaps the most striking recent example is the popularity of patch-based methods [7]–[10]. This new generation of algorithms exploits both local and nonlocal redundancies or “self-similarities” in the signals being treated. Earlier on, the BF [8] was developed with very much the same idea in mind, as were its spiritually close predecessors: the SUSAN filter [11], normalized convolution [12], and the filters of Yaroslavsky [13]. The common philosophy among these and related techniques is the notion of measuring and making use of affinities between a given data point (or, more generally, patch or region) of interest and others in the given measured signal y. These similarities are then used in a filtering context to give higher weights to contributions from more similar data values, and to properly discount data points that are less similar. The pattern recognition literature has also been a source of parallel ideas. In particular, the celebrated mean-shift algorithm [14], [15] is in principle an iterated version of point-wise regression, as also described in [1] and [2]. In the machine learning community, the general regression problem has been carefully studied, and deep connections between regularization, least-squares regression, and the support vector formalism have also been established [16]–[19].
Despite the voluminous recent literature on techniques based on these ideas, simply put, the key differences between the resulting practical filtering methods have been relatively minor, yet rather poorly understood. In particular, the underlying framework for each of these methods is distinct only to the extent that the weights assigned to different data points are decided upon differently. To be more concrete and mathematically precise, let us consider the denoising problem (2) again. The estimate of the signal $z(x)$ at the position $x$ is found using a (nonparametric) point estimation framework; specifically, the weighted least squares problem

$$\hat{z}(x_j) = \arg\min_{z(x_j)} \sum_{i=1}^{n} [y_i - z(x_j)]^2\, K(x_i, x_j, y_i, y_j), \qquad (3)$$

where the weight (or kernel) function $K(\cdot)$ is a symmetric function with respect to the indices i and j. $K(\cdot)$ is also a positive-valued and unimodal function that measures the “similarity” between the samples $y_i$ and $y_j$ at respective positions $x_i$ and $x_j$. If the kernel function is restricted to be only a function of the spatial locations $x_i$ and $x_j$, then the resulting formulation is what is known as (classical, or not data-adaptive) kernel regression in the nonparametric statistics literature [20], [21]. Perhaps more importantly, the key difference between local and nonlocal patch-based methods lies essentially in the definition of the range of the sum in (3). Specifically, indices covering a small spatial region around a pixel of interest define local methods, and vice versa.
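Because (3) is quadratic in the scalar $z(x_j)$, its minimizer is simply a kernel-weighted average of the data, as derived formally in (9)–(11) below. The following is a minimal NumPy sketch of that estimator, with the similarity kernel left as a plug-in; the function names are mine, not the article's.

```python
import numpy as np

def kernel_denoise(y, x, kernel):
    """Pointwise weighted least squares (3): for each position x_j,
    z_hat(x_j) = sum_i K(x_i, x_j, y_i, y_j) * y_i / sum_i K(x_i, x_j, y_i, y_j)."""
    n = len(y)
    z_hat = np.empty(n)
    for j in range(n):
        # similarity of every sample i to the sample of interest j
        w = np.array([kernel(x[i], x[j], y[i], y[j]) for i in range(n)])
        z_hat[j] = w @ y / w.sum()   # normalized weights sum to one
    return z_hat
```

Restricting the inner loop to a window around j yields the local methods; letting it range over the whole signal yields the nonlocal ones.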
Interestingly, in the early 1980s, the essentially identical concept of moving least squares emerged independently [22], [23] in the graphics community. This idea has since been widely adopted in computer graphics [24] as a very effective tool for smoothing and interpolation of data in three dimensions. Surprisingly, despite the obvious connections between moving least squares and the adaptive filters based on similarity, their kinship has remained largely hidden so far.
EXISTING ALGORITHMS
Over the years, the measure of similarity $K(x_i, x_j, y_i, y_j)$ has been defined in a number of different ways, leading to a cacophony of filters, including some of the most well-known recent approaches to image denoising [7]–[9]. Figure 1 gives a graphical illustration of how different choices of similarity kernels lead to different classes of filters, some of which we discuss next.

[FIG1] Similarity metrics and the resulting filters.
CLASSICAL REGRESSION FILTERS
Naturally, the most naive way to measure the “distance” between two pixels is to simply consider their spatial Euclidean distance; specifically, using a Gaussian kernel,

$$K(x_i, x_j, y_i, y_j) = \exp\left( \frac{-\| x_i - x_j \|^2}{h_x^2} \right).$$

Such filters essentially lead to (possibly space-varying) Gaussian filters, which are quite familiar from traditional image processing [13], [20], [21], [25]. It is possible to adapt the variance (or bandwidth) parameter $h_x$ to the local image statistics, and obtain a relatively modest improvement in performance. But the lack of stronger adaptivity to the underlying structure of the signal of interest is a major drawback of these classical approaches.
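As an illustrative plug-in for the sketch above (one-dimensional positions for simplicity; the bandwidth value is an assumption of mine), the classical kernel depends only on spatial distance:

```python
import numpy as np

def classic_kernel(xi, xj, yi, yj, hx=2.0):
    # classical (non-data-adaptive) kernel: similarity depends only on |x_i - x_j|
    return np.exp(-np.abs(xi - xj) ** 2 / hx ** 2)
```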
THE BILATERAL FILTER
Another manifestation of the formulation in (3) is the BF [8], [13], where the spatial and photometric distances between two pixels are taken into account in separable fashion as follows:

$$K(x_i, x_j, y_i, y_j) = \exp\left( \frac{-\| x_i - x_j \|^2}{h_x^2} \right) \exp\left( \frac{-(y_i - y_j)^2}{h_y^2} \right) = \exp\left\{ \frac{-\| x_i - x_j \|^2}{h_x^2} + \frac{-(y_i - y_j)^2}{h_y^2} \right\}. \qquad (4)$$

As can be observed in the exponent on the right-hand side, and in Figure 1, the similarity metric here is a weighted Euclidean distance between the vectors $(x_i, y_i)$ and $(x_j, y_j)$. This approach has several advantages. Specifically, while the kernel is easy to construct and computationally simple to calculate, it yields useful local adaptivity to the given data. In addition, it has only two control parameters, $(h_x, h_y)$, which make it very convenient to use. However, as is well known, this filter does not provide effective performance in low signal-to-noise scenarios [3].
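A corresponding sketch of the bilateral similarity kernel (4), again for one-dimensional positions; the bandwidth values are illustrative assumptions:

```python
import numpy as np

def bilateral_kernel(xi, xj, yi, yj, hx=2.0, hy=0.5):
    # separable product of spatial and photometric Gaussian terms, as in (4)
    spatial = np.exp(-np.abs(xi - xj) ** 2 / hx ** 2)
    photometric = np.exp(-(yi - yj) ** 2 / hy ** 2)
    return spatial * photometric
```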
NONLOCAL MEANS
The NLM algorithm [7], [26], [27], originally proposed in [28] and [29], has stirred a great deal of interest in the community in recent years. At its core, however, it is a relatively simple generalization of the BF; specifically, the photometric term in the bilateral similarity kernel, which is measured point-wise, is simply replaced with one that is patch-wise. A second difference is that (at least in theory) the geometric distance between the patches (corresponding to the first term in the bilateral similarity kernel) is essentially ignored, leading to strong contributions from patches that may not be physically near the pixel of interest (hence, the name nonlocal). To summarize, the NLM kernel is

$$K(x_i, x_j, y_i, y_j) = \exp\left( \frac{-\| x_i - x_j \|^2}{h_x^2} \right) \exp\left( \frac{-\| y_i - y_j \|^2}{h_y^2} \right) \qquad (5)$$

with $h_x \to \infty$, where $y_i$ and $y_j$ now refer to patches of pixels centered at positions $x_i$ and $x_j$, respectively. In practice, two implementation details should be observed. First, the patch-wise photometric distance $\| y_i - y_j \|^2$ in the above is in fact measured as $(y_i - y_j)^T G (y_i - y_j)$, where G is a fixed diagonal matrix containing Gaussian weights, which give higher importance to the center of the respective patches. Second, it is rather computationally impractical to compare all the patches $y_i$ to $y_j$, so although the NLM approach in Buades et al. [28] theoretically forces $h_x$ to be infinite, in practice the search is typically limited to a reasonable spatial neighborhood of $y_j$. Consequently, in effect, the NLM filter too is more or less local; or, said another way, $h_x$ is never infinite in practice. The method in Awate et al. [29], on the other hand, proposes a Gaussian-distributed sample that comes closer to the exponential weighting on Euclidean distances in (5).
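Below is a rough one-dimensional sketch of the NLM weights in (5) incorporating the two implementation details just mentioned: a Gaussian-weighted patch distance G and a limited search window. The patch handling and parameter values are illustrative assumptions, not the reference implementation.

```python
import numpy as np

def nlm_weights(y, j, half=3, hy=0.5, search=10):
    """Illustrative 1-D NLM weights for position j, per (5):
    patch-wise photometric distance (y_i - y_j)^T G (y_i - y_j),
    with G a fixed diagonal matrix of Gaussian weights."""
    n = len(y)
    pad = np.pad(y, half, mode='edge')
    patch = lambda i: pad[i:i + 2 * half + 1]         # patch centered at sample i
    g = np.exp(-np.arange(-half, half + 1) ** 2 / (2.0 * half ** 2))
    G = g / g.sum()                                   # diagonal entries of G
    # in practice the search is limited to a spatial neighborhood of j
    idx = range(max(0, j - search), min(n, j + search + 1))
    w = np.zeros(n)
    pj = patch(j)
    for i in idx:
        d2 = (patch(i) - pj) @ (G * (patch(i) - pj))  # (y_i - y_j)^T G (y_i - y_j)
        w[i] = np.exp(-d2 / hy ** 2)
    return w / w.sum()                                # normalized weights
```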
Despite its popularity, the performance of the NLM filter leaves much to be desired. The true potential of this filtering scheme was demonstrated only later with the optimal spatial adaptation (OSA) approach of Boulanger and Kervrann [26]. In their approach, the photometric distance was refined to include estimates of the local noise variances within each patch. Specifically, they computed a local diagonal covariance matrix $V_j$, and defined the locally adaptive photometric distance as $(y_i - y_j)^T V_j^{-1} (y_i - y_j)$, in such a way as to minimize an estimate of the local mean squared error (MSE). Furthermore, they considered iterative application of the filter, as discussed in the section “Improving the Estimate by Iteration.”
LOCALLY ADAPTIVE REGRESSION (STEERING) KERNELS
The key idea behind this measure of similarity, originally proposed in [9], is to robustly obtain the local structure of images by analyzing the photometric (pixel value) differences based on estimated gradients, and to use this structure information to adapt the shape and size of a canonical kernel. The locally adaptive regression kernel (LARK) is defined as follows:

$$K(x_i, x_j, y_i, y_j) = \exp\{ -(x_i - x_j)^T C_i (x_i - x_j) \}, \qquad (6)$$

where the matrix $C_i = C(y_i, y_j)$ is estimated from the given data as

$$C_i = \sum_{x_j} \begin{bmatrix} z_{x_1}^2(x_j) & z_{x_1}(x_j)\, z_{x_2}(x_j) \\ z_{x_1}(x_j)\, z_{x_2}(x_j) & z_{x_2}^2(x_j) \end{bmatrix}.$$

Specifically, $z_{x_1}(x_j)$ and $z_{x_2}(x_j)$ are the estimated gradients of the underlying signal at point $x_i$, computed from the given measurements $y_j$ in a patch around the point of interest. In particular, the gradients used in the above expression can be estimated from the given noisy image by applying classical (i.e., nonadaptive) locally linear kernel regression. Details of this estimation procedure are given in [30]. The reader may recognize the above matrix as the well-studied “structure tensor” [31]. The advantage of the LARK descriptor is that it is exceedingly robust to noise and perturbations of the data. The formulation is also theoretically well motivated, since the quadratic exponent in (6) essentially encodes the local geodesic distance between the points $(x_i, y_i)$ and $(x_j, y_j)$ on the graph of the function $z(x, y)$, thought of as a two-dimensional (2-D) surface (a manifold) embedded in three dimensions. The geodesic distance was also used in the context of the Beltrami-flow kernel in [32] and [33] in an analogous fashion.
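A minimal sketch of the structure tensor computation behind (6). For simplicity it uses finite-difference gradients, whereas the article recommends estimating the gradients by classical kernel regression [30]; the patch size is an illustrative choice, and the smoothing and regularization details of [9] are omitted.

```python
import numpy as np

def structure_tensors(z, half=2):
    """Per-pixel structure tensor C_i of (6), summed over a local patch.
    Gradients here come from simple finite differences (illustrative only)."""
    zx1, zx2 = np.gradient(z)                 # gradient estimates z_x1, z_x2
    H, W = z.shape
    C = np.zeros((H, W, 2, 2))
    for i1 in range(H):
        for i2 in range(W):
            s1 = slice(max(0, i1 - half), min(H, i1 + half + 1))
            s2 = slice(max(0, i2 - half), min(W, i2 + half + 1))
            g1, g2 = zx1[s1, s2].ravel(), zx2[s1, s2].ravel()
            # sum over the patch of [zx1^2, zx1*zx2; zx1*zx2, zx2^2]
            C[i1, i2] = np.array([[g1 @ g1, g1 @ g2],
                                  [g1 @ g2, g2 @ g2]])
    return C

def lark_kernel(xi, xj, Ci):
    # LARK kernel (6): exp{ -(x_i - x_j)^T C_i (x_i - x_j) }
    d = np.asarray(xi, float) - np.asarray(xj, float)
    return np.exp(-d @ Ci @ d)
```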
GENERALIZATIONS AND PROPERTIES
The above discussion can be naturally generalized by defining the augmented data variable $t_i = [x_i^T, y_i^T]^T$ and a general Gaussian kernel as follows:

$$K(t_i, t_j) = \exp\{ -(t_i - t_j)^T Q (t_i - t_j) \}, \qquad (7)$$

$$Q = \begin{bmatrix} Q_x & 0 \\ 0 & Q_y \end{bmatrix}, \qquad (8)$$

where Q is symmetric positive definite (SPD). Setting $Q_x = \frac{1}{h_x^2} I$ and $Q_y = 0$, we have classical kernel regression, whereas one obtains the BF framework when $Q_x = \frac{1}{h_x^2} I$ and $Q_y = \frac{1}{h_y^2} \mathrm{diag}[0, \ldots, 0, 1, 0, \ldots, 0]$. The latter diagonal matrix picks out the center pixel in the element-wise difference of patches $t_i - t_j$. When $Q_x = 0$ and $Q_y = \frac{1}{h_y^2} G$, we have the NLM filter and its variants. Finally, the LARK kernel in (6) is obtained when $Q_x = C_i$ and $Q_y = 0$. More generally, the matrix Q can be selected so that it has nonzero off-diagonal blocks. However, no practical algorithms with this choice have been proposed so far. As detailed below, with an SPD Q, this general approach results in valid SPD kernels, a property that is used throughout the rest of our discussion. The definition of t given here is only one of many possible choices. Our treatment in this article is equally valid when, for instance, $t = T(x, y)$ is any feature derived from a convenient linear or nonlinear transformation of the original data.
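The sketch below evaluates the unified kernel (7) for a block-diagonal Q as in (8), and spells out the block choices that recover the filters above. The patch-layout conventions (patch length p, center index c) and parameter values are my own assumptions for illustration.

```python
import numpy as np

def general_kernel(ti, tj, Qx, Qy):
    """Unified Gaussian kernel (7) on t = [x^T, y^T]^T with
    block-diagonal Q = diag(Qx, Qy) as in (8)."""
    dx = ti[0] - tj[0]        # spatial part of t_i - t_j
    dy = ti[1] - tj[1]        # photometric (patch) part of t_i - t_j
    return np.exp(-(dx @ Qx @ dx + dy @ Qy @ dy))

# Example block choices (p = patch length, center index c = p // 2):
p, c, hx, hy = 7, 3, 2.0, 0.5
Q_classic = (np.eye(2) / hx**2, np.zeros((p, p)))          # classical kernel regression
e_c = np.zeros(p); e_c[c] = 1.0
Q_bilateral = (np.eye(2) / hx**2, np.diag(e_c) / hy**2)    # BF: center pixel only
G = np.diag(np.exp(-np.arange(-c, c + 1)**2 / (2.0 * c**2)))
Q_nlm = (np.zeros((2, 2)), G / hy**2)                      # NLM: whole patch, Qx = 0
# LARK: Qx = C_i (the structure tensor), Qy = 0
```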
The above concepts can be further extended using the powerful theory of reproducing kernels, originated in functional analysis (and later successfully adopted in machine learning [34], [35]), to present a significantly more general framework for selecting the similarity functions. This will help us identify the wider class of admissible similarity kernels more formally, and to understand how to produce new kernels from ones already defined [35]. Formally, a scalar-valued function $K(t, s)$ over a compact region of its domain $\mathbb{R}^n$ is called an admissible kernel if
• K is symmetric: $K(t, s) = K(s, t)$
• K is positive definite; that is, for any collection of points $t_i$, $i = 1, \ldots, n$, the Gram matrix with elements $K_{ij} = K(t_i, t_j)$ is positive definite.
Such kernels satisfy some useful properties, such as positivity, $K(t, t) \ge 0$, and the Cauchy–Schwartz inequality $K^2(t, s) \le K(t, t)\, K(s, s)$.
With the above definitions in place, there are numerous ways to construct new valid kernels from existing ones. We list some of the most useful ones below, without proof [35]; a small numerical sanity check follows the list. Given two valid kernels $K_1(t, s)$ and $K_2(t, s)$, the following constructions yield admissible kernels:
1) $K(t, s) = \alpha K_1(t, s) + \beta K_2(t, s)$ for any pair $\alpha, \beta \ge 0$
2) $K(t, s) = K_1(t, s)\, K_2(t, s)$
3) $K(t, s) = k(t)\, k(s)$, where $k(\cdot)$ is a scalar-valued function
4) $K(t, s) = p(K_1(t, s))$, where $p(\cdot)$ is a polynomial with positive coefficients
5) $K(t, s) = \exp(K_1(t, s))$.
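As a quick numerical sanity check (not a proof) of these construction rules, one can form Gram matrices from sample points and verify that their smallest eigenvalues are nonnegative up to floating-point error. The base kernels chosen here are standard admissible examples of mine:

```python
import numpy as np

rng = np.random.default_rng(1)
t = rng.standard_normal((8, 3))                      # 8 sample points t_i in R^3

def gram(kernel):
    return np.array([[kernel(a, b) for b in t] for a in t])

k1 = lambda a, b: np.exp(-np.sum((a - b) ** 2))      # Gaussian kernel: admissible
k2 = lambda a, b: (1.0 + a @ b) ** 2                 # polynomial kernel: admissible

candidates = {
    "sum":     lambda a, b: 2.0 * k1(a, b) + 3.0 * k2(a, b),   # rule 1
    "product": lambda a, b: k1(a, b) * k2(a, b),               # rule 2
    "outer":   lambda a, b: np.tanh(a[0]) * np.tanh(b[0]),     # rule 3: k(t)k(s)
    "poly":    lambda a, b: k1(a, b) ** 3 + 2.0 * k1(a, b),    # rule 4
    "exp":     lambda a, b: np.exp(k1(a, b)),                  # rule 5
}
for name, k in candidates.items():
    eigmin = np.linalg.eigvalsh(gram(k)).min()
    print(f"{name}: smallest Gram eigenvalue = {eigmin:.2e}")  # >= 0 up to round-off
```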
Regardless of the choice of the kernel function, the weighted least-squares optimization problem (3) has a simple solution. In matrix notation, we can write

$$\hat{z}(x_j) = \arg\min_{z(x_j)} [y - 1_n z(x_j)]^T K_j [y - 1_n z(x_j)], \qquad (9)$$

where $1_n = [1, 1, \ldots, 1]^T$ and $K_j = \mathrm{diag}[K(x_1, x_j, y_1, y_j), K(x_2, x_j, y_2, y_j), \ldots, K(x_n, x_j, y_n, y_j)]$. The closed-form solution to the above is

$$\hat{z}(x_j) = (1_n^T K_j 1_n)^{-1}\, 1_n^T K_j y \qquad (10)$$

$$= \left( \sum_i K(x_i, x_j, y_i, y_j) \right)^{-1} \sum_i K(x_i, x_j, y_i, y_j)\, y_i = \sum_i \frac{K_{ij}}{\sum_i K_{ij}}\, y_i = \sum_i W_{ij}\, y_i = w_j^T y. \qquad (11)$$
So in general, the estimate $\hat{z}(x_j)$ of the signal at position $x_j$ is given by a weighted sum of all the given data points $y(x_i)$, each contributing a weight commensurate with its similarity, as indicated by $K(\cdot)$, with the measurement $y(x_j)$ at the position of interest. Furthermore, as should be apparent in (10), the weights sum to one. To control computational complexity, or to design local versus nonlocal filters, we may choose to set the weight for some “sufficiently far-away” pixels to zero or a small value, leading to a weighted sum involving a relatively small number of data points in a properly defined vicinity of the sample of interest. This is essentially the only distinction between locally adaptive processing methods (such as the BF and LARK) and so-called nonlocal methods such as NLM. It is worth noting that, in the formulation above, despite the simple form of (10), in general we have a nonlinear estimator, since the weights $W_{ij} = W(x_i, x_j, y_i, y_j)$ depend on the noisy data. The nonparametric approach in (3) can be further extended to include a more general expansion of the signal z(x) in a desired basis. We briefly discuss this case in “Generalization to Arbitrary Bases,” but leave its full treatment for future research.
To summarize the discussion so far, we have presented a general framework that absorbs many existing algorithms as special cases. This was done in several ways, including a general description of the set of admissible similarity kernels, which allows the construction of a wide variety of new kernels not considered before in the image processing literature. Next, we turn our attention to the matrix formulation of the nonparametric filtering approach. As we shall see, this provides a framework for more in-depth and intuitive understanding of the resulting filters, their subsequent improvement, and their respective asymptotic and numerical properties.

Before we end this section, it is worth saying a few words about computational complexity. In general, patch-based methods are quite computationally intensive. Recently, many works have aimed at both efficiently searching for similar patches and more cleverly computing the resulting weights. Among these, notable recent work appears in [36] and [37].
THE MATRIX FORMULATION AND ITS PROPERTIES
In this section, we analyze the filtering problems posed earlier in the language of linear algebra and make several theoretical and practical observations. In particular, we are able not only to study the numerical/algebraic properties of the resulting filters, but also to analyze some of their fundamental statistical properties.

To begin, recall the convenient vector form of the filters:

$$\hat{z}(x_j) = w_j^T y, \qquad (12)$$

where $w_j = [W(x_1, x_j, y_1, y_j), W(x_2, x_j, y_2, y_j), \ldots, W(x_n, x_j, y_n, y_j)]^T$ is a vector of weights for each j. Writing the above at once for all j, we have

$$\hat{z} = \begin{bmatrix} w_1^T \\ w_2^T \\ \vdots \\ w_n^T \end{bmatrix} y = W y. \qquad (13)$$
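A short sketch assembling the row-stochastic weight matrix W of (13); it simply stacks the normalized weight vectors $w_j^T$ of (10) as rows. The kernel argument can be any of the similarity functions sketched earlier (e.g., the illustrative bilateral_kernel above); the function name is mine.

```python
import numpy as np

def weight_matrix(y, x, kernel):
    """Filter matrix W of (13): row j holds the normalized
    weights w_j^T, so that z_hat = W @ y."""
    n = len(y)
    W = np.empty((n, n))
    for j in range(n):
        k = np.array([kernel(x[i], x[j], y[i], y[j]) for i in range(n)])
        W[j] = k / k.sum()          # weights in each row sum to one, per (10)
    return W

# usage (with a kernel defined as above):
#   z_hat = weight_matrix(y, x, bilateral_kernel) @ y
#   np.allclose(weight_matrix(y, x, bilateral_kernel).sum(axis=1), 1.0)  # True
```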
As such, the filters defined by the above process can be analyzed as the product of a (square, $n \times n$) matrix of weights W with the vector of the given data y. First, a notational matter: W is in general a function of the data, so, strictly speaking, the notation W(y) would be more descriptive. But as we will describe later in more detail, the typical process for computing these weights in practice involves first computing a preliminary denoised “pilot,” or “prefiltered,” version of the image, from which the weights are then calculated. This preprocessing, done only for the purposes of computing the parameters

Citations
Journal ArticleDOI

Graph Signal Processing: Overview, Challenges, and Applications

TL;DR: An overview of core ideas in GSP and their connection to conventional digital signal processing is provided, along with a brief historical perspective to highlight how concepts recently developed build on top of prior research in other areas.
Proceedings ArticleDOI

MemNet: A Persistent Memory Network for Image Restoration

TL;DR: A very deep persistent memory network (MemNet) is proposed that introduces a memory block, consisting of a recursive unit and a gate unit, to explicitly mine persistent memory through an adaptive learning process.
Journal ArticleDOI

Trainable Nonlinear Reaction Diffusion: A Flexible Framework for Fast and Effective Image Restoration

TL;DR: This work proposes a dynamic nonlinear reaction diffusion model with time-dependent parameters, which preserves the structural simplicity of diffusion models and takes only a small number of diffusion steps, making the inference procedure extremely fast.
Proceedings Article

Image restoration using very deep convolutional encoder-decoder networks with symmetric skip connections

TL;DR: This paper proposes to symmetrically link convolutional and de-convolutional layers with skip-layer connections, with which the training converges much faster and attains a higher-quality local optimum, making training deep networks easier and achieving restoration performance gains consequently.
Proceedings ArticleDOI

Plug-and-Play priors for model based reconstruction

TL;DR: This paper demonstrates with some simple examples how Plug-and-Play priors can be used to mix and match a wide variety of existing denoising models with a tomographic forward model, thus greatly expanding the range of possible problem solutions.
References
Journal ArticleDOI

Scale-space and edge detection using anisotropic diffusion

TL;DR: A new definition of scale-space is suggested, and a class of algorithms used to realize a diffusion process is introduced, chosen to vary spatially in such a way as to encourage intraregion smoothing rather than interregion smoothing.
Journal ArticleDOI

Mean shift: a robust approach toward feature space analysis

TL;DR: It is proved the convergence of a recursive mean shift procedure to the nearest stationary point of the underlying density function and, thus, its utility in detecting the modes of the density.
Journal ArticleDOI

A tutorial on support vector regression

TL;DR: This tutorial gives an overview of the basic ideas underlying Support Vector (SV) machines for function estimation, and includes a summary of currently used algorithms for training SV machines, covering both the quadratic programming part and advanced methods for dealing with large datasets.
Journal ArticleDOI

Matching pursuits with time-frequency dictionaries

TL;DR: The authors introduce an algorithm, called matching pursuit, that decomposes any signal into a linear expansion of waveforms that are selected from a redundant dictionary of functions, chosen in order to best match the signal structures.
Journal ArticleDOI

K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation

TL;DR: A novel algorithm for adapting dictionaries in order to achieve sparse signal representations, the K-SVD algorithm, an iterative method that alternates between sparse coding of the examples based on the current dictionary and a process of updating the dictionary atoms to better fit the data.
Frequently Asked Questions (9)
Q1. What are the contributions in this paper?

In this article, the author presents a practical and accessible framework to understand some of the basic underpinnings of these methods, with the intention of leading the reader to a broad understanding of how they interrelate. In particular, several novel optimality properties of algorithms in wide use such as block-matching and three-dimensional (3-D) filtering (BM3D), and methods for their iterative improvement (or nonexistence thereof) are discussed.

Other excerpts surfaced from the paper:

If the orthonormal basis V contains a constant vector $v_1 = 1_n/\sqrt{n}$, one can easily make W doubly stochastic by setting its corresponding shrinkage factor $\lambda_1 = 1$.

One way to remedy the lack of optimality of the choice of kernel is to apply the resulting filters iteratively, and that is the subject of this section.

Expanding the regression function $z(x)$ in a desired basis $\varphi_l$, we can formulate the following optimization problem:

$$\hat{\beta}(x_j) = \arg\min_{\beta} \sum_{i=1}^{n} \left[ y_i - \sum_{l=0}^{N} \beta_l\, \varphi_l(x_i, x_j) \right]^2 K(x_i, x_j, y_i, y_j), \qquad (S1)$$

where N is the model (or regression) order.

From a practical point of view, and insofar as the computation of the matrix W is concerned, it is always reasonable to assume that the noise variance is relatively small, because in practice the authors typically compute W on a “prefiltered” version of the noisy image y anyway.

On the other hand, one may legitimately worry that the effect of noise on the computation of these weights may be dramatic, resulting in too much sensitivity for the resulting filters to be effective.

Their symmetrized versions, computed by Monte Carlo simulations, are also shown, where in each simulation 100 independent noise realizations are averaged.