Analysis K-SVD: A Dictionary-Learning Algorithm
for the Analysis Sparse Model
Ron Rubinstein, Member, IEEE, Tomer Peleg, Student Member, IEEE and Michael Elad, Fellow, IEEE
Abstract—The synthesis-based sparse representation model for signals has drawn considerable interest in the past decade. Such a model assumes that the signal of interest can be decomposed as a linear combination of a few atoms from a given dictionary. In this paper we concentrate on an alternative, analysis-based model, where an analysis operator, hereafter referred to as the analysis dictionary, multiplies the signal, leading to a sparse outcome. Our goal is to learn the analysis dictionary from a set of examples. The approach taken is parallel and similar to the one adopted by the K-SVD algorithm that serves the corresponding problem in the synthesis model. We present the development of the algorithm steps: this includes tailored pursuit algorithms (the Backward Greedy and the Optimized Backward Greedy algorithms) and a penalty function that defines the objective for the dictionary update stage. We demonstrate the effectiveness of the proposed dictionary learning in several experiments, treating synthetic data and real images, and showing a successful and meaningful recovery of the analysis dictionary.
Index Terms—Sparse Representations, Synthesis Model, Analysis Model, Backward Greedy (BG) Pursuit, Optimized Backward Greedy (OBG) Pursuit, Dictionary Learning, K-SVD, Image Denoising.
I. INTRODUCTION
A. Synthesis and Analysis Signal Models
Signal models are fundamental for handling various processing tasks, such as denoising, solving inverse problems, compression, interpolation, sampling, and more. Among the many ways we have to model signals, one approach that has found great popularity in the past decade is the synthesis-based sparse representation model. In this model, a signal x ∈ R^d is modeled as being the outcome of the multiplication x = Dα, where D ∈ R^{d×n} is a dictionary; its columns are signal prototypes (atoms) that are used to build the signal. We typically consider a redundant dictionary with n > d. The vector α ∈ R^n is the redundant signal's representation, and a fundamental feature in this model is the expectation that this vector is sparse, i.e. ‖α‖_0 = k ≪ d. This implies that the signals we work on are assumed to be composed as linear combinations of a few atoms from the dictionary [1], [2].
Vast work on the synthesis model during the past decade has been invested in an attempt to better understand it, and build practical tools for its use. The main activity concentrated on problems such as methods to estimate or approximate the sparse representation from the possibly corrupted signal, deriving theoretical success guarantees for such algorithms, and techniques to learn the dictionary D from signal examples. Referring specifically to the last point of dictionary learning, two popular techniques for this task are the MOD and K-SVD algorithms [3]-[5], whose deployment has led to state-of-the-art results in various image processing applications [2].

Copyright (c) 2012 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to pubs-permissions@ieee.org.
R. Rubinstein and M. Elad are with the Computer Science Department, Technion - Israel Institute of Technology, Haifa 32000, Israel (e-mail: {ronrubin,elad}@cs.technion.ac.il). T. Peleg is with the Department of Electrical Engineering, Technion - Israel Institute of Technology, Haifa 32000, Israel (e-mail: tomerfa@tx.technion.ac.il).
This work was supported by the European Commission's FP7-FET program, SMALL project (grant agreement no. 225913).
While the synthesis model has been extensively studied, there is a dual analysis viewpoint to sparse representations that has been left aside almost untouched [6]. The analysis model relies on a linear operator (a matrix) Ω ∈ R^{p×d}, which we will refer to as the analysis dictionary, and whose rows constitute analysis atoms. The key property of this model is our expectation that the analysis representation vector Ωx ∈ R^p should be sparse, with ℓ zeros. These zeros carve out the low-dimensional subspace that this signal belongs to. We shall assume that the dimension of this subspace, denoted by r, is indeed small, namely r ≪ d.

While this description may seem similar to the synthesis counterpart approach, it is in fact very different when dealing with a redundant dictionary, p > d. More on this model will be given below, contrasting it with the synthesis alternative. Until recently, relatively little was known about the analysis model, and little attention has been given to it in the literature, compared to the synthesis counterpart model. In the past few years there has been growing interest in the analysis model, as we gain more understanding of and insight into its interesting viewpoint. See [7]-[16] for some work that has already commenced on this model.
In this paper we focus on the analysis model and, more specifically, on the development of an algorithm that would learn the analysis dictionary Ω from a set of signal examples X = [x_1, x_2, ..., x_R]. The objective is to find a suitable dictionary Ω so that the analysis coefficients ΩX are sparse. We note that when dealing with a square (and invertible) matrix Ω, the analysis model is completely equivalent to the synthesis one with Ω^{-1} = D [6], and in such a case, the synthesis-dictionary-learning methods can be used to build Ω. In this work, though, we concentrate on the redundant case (p > d), where the two models depart, and where the analysis model becomes more interesting and powerful. This case of analysis dictionary training is a challenging problem, which has recently started to attract attention [13]-[16].

B. Related Work
One of the first attempts to train an analysis model was the pioneering work of Roth and Black, who adopted a very different point of view in their quest for Ω [17], [18]. Roth and Black trained an image prior, termed Field-of-Experts (FoE), for the purpose of regularizing inverse problems in image processing. Their work originates from the area of probabilistic image modeling [19], [20], which aims at representing image patches through the notion of constraint violation and filter responses. When applying the analysis model on image patches, each row in Ω can be viewed as a local filter operating on a patch, and once included in the co-support of the patch, this row serves as a constraint. Reviewing this area of research is not the intention of this paper. Instead we focus on the FoE approach and refer the readers to [18] for a comprehensive review.
The FoE prior derives its value from the sparsity of the analysis representations computed for overlapping image patches. The measure of sparsity used is a relaxed ℓ_0-norm. Our approach will be based on a different sparsity measure, the co-rank, which in turn will define a different objective for dictionary learning. Roth and Black use contrastive divergence for learning the analysis atoms, which approximates the maximum-likelihood estimator and requires a high computational load, as it relies heavily on Monte Carlo sampling. As such, this learning framework differs substantially from our work, which will take a block-coordinate relaxation approach, alternating between an analysis pursuit stage for each signal example and a simple update rule for each of the learned atoms.
The training set used by the FoE approach is a large
database of image regions (each consisting of a set of over-
lapping patches) and the learning algorithm runs “offline”
resulting in one generic prior that will be suitable for any
natural image. In the context of image denoising, previous
work on the synthesis model [21] has shown that adapting
the dictionary to a given noisy image can lead to improved
image denoising performance with respect to denoising with
a “global” dictionary that was trained “offline”. The approach
we are about to suggest in this paper is capable of learning
an adaptive analysis dictionary from a given noisy set of
examples.
More recently, three interesting attempts to learn a redun-
dant analysis dictionary have been proposed [13]–[16]. The
first work, reported in [13], suggests to incrementally learn
one row at a time, exploiting the fact that a considerable
set of examples is expected to be orthogonal to such a row.
Assuming knowledge of this set of signals, the eigenvector
that corresponds to the smallest eigenvalue of these examples’
autocorrelation matrix is the desired row. For each row, the
proposed algorithm thus alternates between the computation
of this row from the current subset of chosen examples, and
an update of this subset to reject outlier signals. This algorithm
relies heavily on a randomized initialization strategy, both for
enabling the detection of a variety of rows, and for surpassing
deadlock situations in the iterative process. As the dimension of the signal d grows (and with it p, the number of rows in Ω), this approach loses its efficiency rapidly, requiring (too) many rounds of attempts before a new row is detected.
Moreover, this method might suffer from a poor detection
quality, since a row that was admitted to the accumulated set
cannot be replaced. These two limitations will be demonstrated
in Section V-A when comparing our approach with [13] in the
task of recovering the dictionary in a synthetic setup.
The work reported in [14], [15] takes a different route towards the task of learning Ω, posing it as a constrained optimization problem. The goal of sparsifying the representations ΩX is formulated by an ℓ_1-norm penalty function on these representations. In order to avoid the trivial solution Ω = 0 and solutions like an orthonormal basis in the first d rows followed by zero rows, this work proposes to constrain the dictionary to be a uniform normalized tight frame. However, this choice limits the possible Ω to be learned, and puts a rather arbitrary constraint for regularizing the learning problem. In our work we aim at handling the most general setup of redundant analysis dictionaries, and therefore we would like to be as unrestrictive as possible with respect to the learned atoms.
The work in [16] proposes to learn Ω such that it optimizes the denoising performance on a given set of example pairs (clean and noisy versions of example signals). The learning is achieved by formulating the learning task as a bilevel-programming optimization problem, which in turn is handled using gradient descent. The main focus of [16] is on learning an analysis operator that takes the form of a convolution, which is equivalent to learning one filter (convolution kernel). This is very different from our main goal, which is learning a set of p analysis atoms that can be viewed as local filters operating on image patches.
In the process of preparing this paper for publication,
another relevant work on analysis dictionary learning was
brought to our attention [22]. The approach suggested in this
paper shares some basic ideas with the FoE approach, such
as learning a "global" dictionary for natural image patches and inserting it into a regularization term of an image recovery
formulation. However, the authors of [22] take a very different
route towards the dictionary learning problem, posing it in
terms of an optimization over manifolds. This allows them
to update the analysis dictionary as a whole, in contrast to
the separate atom updates practiced in our approach, thus
explicitly enforcing basic dictionary properties, such as having
distinct rows and full row rank into the learning procedure.
Using these optimization tools, they learn a redundant analysis
dictionary that obtains competitive results with respect to
the synthesis counterpart model for various image processing
applications.
C. This Work
In this paper we adopt a different approach to the anal-
ysis training problem, based on a co-rank measure which
determines the dimension of the analysis subspace. This co-
rank measure allows us to develop a novel training algorithm,
whose uniqueness is in the relations it exhibits with the
synthesis formulation. Specifically, the proposed dictionary-
training method is parallel to the synthesis-model K-SVD in
its rationale and computational steps. Similar to the work in [14], [15], we consider the learning process as a solution of a constrained optimization task. However, as we shall show next, the constraint we employ uses our knowledge of the signals and their relation to Ω in a more direct way. The atom update rule in our proposed approach will be similar to the one suggested in [13]. However, it will be better justified by deriving it directly from the constrained optimization problem. Moreover, the set of signals orthogonal to each row in Ω will be determined in a more effective fashion, resulting in a more efficient algorithm.
The analysis model gives rise to a series of research
questions, which are far from being solved: (i) What are the
desired properties of an analysis dictionary? (ii) How can
a signal and its sparse analysis representation be recovered,
given the dictionary and a noisy version of the signal? (iii)
Can the analysis dictionary be learned from a given data-set of
examples? In this work we aim at providing some answers to
these questions. Our main contribution is an efficient algorithm
for obtaining the analysis dictionary from a given data-set in
a K-SVD-like manner. We demonstrate the potential of this
approach in a series of experiments on synthetic and real data
(images), showing the ability of the algorithm to recover a
meaningful result in all cases. We note that our main goal in
this work is to highlight the potential and capability of the co-
rank analysis approach, and we do not focus here on specific
applications.
This paper is organized as follows: In Section II we present
the core concept of the co-rank analysis model, and character-
ize the signals that belong to it. In Section III we consider
the analysis pursuit problem of denoising a signal using
the analysis model, which serves as an important building-
block in the Analysis K-SVD algorithm described in detail in
Section IV. Finally, Section V provides several experiments
that demonstrate the performance of the Analysis K-SVD
algorithm.
II. A CLOSER LOOK AT THE ANALYSIS MODEL
In this section we briefly review the co-rank analysis model,
and characterize the signals that belong to it, and which our
learning algorithm is to operate on. The content of this section
relies in part on explanations found in [8], [9], [13].
The analysis model for the signal x ∈ R^d uses the possibly redundant analysis dictionary Ω ∈ R^{p×d} (redundancy here implies p ≥ d), and assumes that the analysis representation vector Ωx should be sparse. In this work we consider specifically ℓ_0 sparsity, which implies that Ωx contains many zeros. The co-sparsity ℓ of the analysis model is defined as the number of zeros in the vector Ωx,

    ‖Ωx‖_0 = p − ℓ.    (1)
In the synthesis model the representation α is obtained by a complex and non-linear pursuit process that seeks (or approximates) the sparsest solution to the linear system of equations Dα = x. This representation can be arbitrarily sparse, ‖α‖_0 = k ≪ d. The signal x is characterized by the k non-zero indices in the representation vector α, and their associated atoms define the subspace this signal belongs to. The dimension of this subspace equals k and, as we mentioned before, it is small with respect to the signal dimension d.
In contrast, in the analysis model the computation of the representation is trivial, obtained by the multiplication Ωx. In this model we put an emphasis on the zeros of Ωx, and define the co-support Λ of x as the set of ℓ = |Λ| rows that are orthogonal to it. In other words, Ω_Λ x = 0, where Ω_Λ is a sub-matrix of Ω that contains only the rows indexed in Λ. For a given analysis dictionary Ω, we define the co-rank of a signal x with co-support Λ as the rank of Ω_Λ. The signal x is thus characterized by its co-support, which determines the subspace it is orthogonal to, and consequently the complement space to which it belongs. Just like in the synthesis model, we assume that the dimension of the subspace the signal belongs to, denoted by r, is small, namely r ≪ d. The co-rank of such an analysis signal is d − r.
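To make these definitions concrete, the following small NumPy sketch computes the co-support and the co-rank of a signal for a given analysis dictionary; the helper names (`cosupport`, `corank`) are ours, not the paper's, and a numerical tolerance stands in for exact zeros.

```python
import numpy as np

def cosupport(Omega, x, tol=1e-10):
    """Indices of the analysis atoms (rows of Omega) orthogonal to x."""
    return np.flatnonzero(np.abs(Omega @ x) < tol)

def corank(Omega, x, tol=1e-10):
    """Rank of the sub-dictionary Omega_Lambda restricted to the co-support of x."""
    Lam = cosupport(Omega, x, tol)
    if Lam.size == 0:
        return 0
    return np.linalg.matrix_rank(Omega[Lam])
```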
How sparse can the analysis representation vector be? Let us first assume that the rows in Ω are in general position, implying that every subset of d or fewer rows is necessarily linearly independent. This is equivalent to the claim that the spark of Ω^T is full [2]. Naturally, for this case, ℓ < d, since otherwise there would be d independent rows orthogonal to x, implying x = 0. Thus, in this case the analysis model leads necessarily to a mild sparsity, ‖Ωx‖_0 > p − d, and for a highly redundant analysis operator, the cardinality of the analysis representation vector Ωx is expected to be quite high. In this case, the dimension of the subspace the signal belongs to is r = d − ℓ.
A more interesting case is when Ω^T has non-full spark, implying that linear dependencies exist between the dictionary atoms. The immediate implication is that ℓ could go beyond d, and yet the signal would not necessarily be nulled. An example of such a dictionary is the set of cyclic horizontal and vertical one-sided derivatives, applied on a 2D signal of size √d × √d. The corresponding analysis dictionary, denoted Ω_DIF, is of size 2d × d, thus twice redundant. Figure 1 shows this dictionary for d = 25. In [9] this dictionary is discussed in detail, showing that its rows exhibit strong linear dependencies.
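To illustrate what such a dictionary looks like, here is a small NumPy sketch of our reading of the Ω_DIF construction (cyclic one-sided horizontal and vertical differences on √d × √d patches); the function name and the patch ordering are our own assumptions.

```python
import numpy as np

def omega_dif(side):
    """Cyclic one-sided horizontal and vertical derivatives on side x side patches.

    Returns a (2*d) x d matrix, d = side**2, acting on vectorized patches.
    """
    d = side * side
    idx = np.arange(d).reshape(side, side)
    rows = []
    for shifted in (np.roll(idx, -1, axis=1),   # right (horizontal) cyclic neighbor
                    np.roll(idx, -1, axis=0)):  # down (vertical) cyclic neighbor
        for i, j in zip(idx.ravel(), shifted.ravel()):
            w = np.zeros(d)
            w[i], w[j] = 1.0, -1.0
            rows.append(w)
    return np.vstack(rows)

Omega_DIF = omega_dif(5)                        # 50 x 25, as in Figure 1
print(np.linalg.matrix_rank(Omega_DIF))         # d - 1 = 24: the rows are strongly dependent
```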
Generating a random analysis signal amounts to the following process: choose a set of row indices Λ ⊆ {1, ..., p}; this will be the signal's co-support. Starting with a random vector u, project it onto the subspace orthogonal to Ω_Λ:

    x = (I − Ω_Λ^† Ω_Λ) u,    (2)

and x is an analysis signal that satisfies our sparsity assumption. For a general-positioned Ω we choose ℓ rows from Ω at random. Otherwise we choose d − r linearly independent rows from Ω. This choice is still done in a random fashion, but is naturally more restricted. In the experiments that follow we shall use such randomly generated signals, when dealing with synthetic experiments.
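The generation process of Equation (2) can be sketched in a few lines of NumPy. The helper below is an illustration under stated assumptions (a given dictionary `Omega`, rows accumulated at random until rank d − r is reached, projection via the pseudo-inverse); it is not the authors' code.

```python
import numpy as np

def random_analysis_signal(Omega, r, rng=np.random.default_rng(0)):
    """Draw a unit-norm signal residing in an r-dimensional analysis subspace of Omega.

    Sketch of Eq. (2): pick d - r linearly independent rows at random, then project
    a random vector onto their orthogonal complement.
    """
    p, d = Omega.shape
    Lam = []
    for k in rng.permutation(p):                # accumulate rows until rank d - r is reached
        cand = Lam + [int(k)]
        if np.linalg.matrix_rank(Omega[cand]) == len(cand):
            Lam = cand
        if len(Lam) == d - r:
            break
    O_Lam = Omega[Lam]
    u = rng.standard_normal(d)
    x = u - np.linalg.pinv(O_Lam) @ (O_Lam @ u)  # x = (I - pinv(O_Lam) O_Lam) u
    return x / np.linalg.norm(x), np.array(Lam)
```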
As mentioned above, when the rows in Ω are not in general position, the co-sparsity ℓ can be greater than d. In this case, once a signal x has been generated using the process (2), computation of its analysis representation Ωx could reveal additional rows that are orthogonal to the signal, due to linear dependence on the chosen subset Λ. To demonstrate this behavior, we generated 1000 unit-norm analysis signals residing in 4-dimensional subspaces related to Ω_DIF ∈ R^{50×25}. Figure 1 presents a set of such randomly created signals. For these signals, Figure 2 presents a histogram of the effective co-sparsities. As can be seen, though the signals are each orthogonal to subspaces of rank 21, their actual co-sparsities are much higher, varying in the range 23 to 40. Thus, we see that by allowing linear dependencies between the rows in Ω, co-sparsities much higher than the signal dimension can be achieved.

Figure 1. Left: The dictionary Ω_DIF of size 50 × 25, corresponding to horizontal and vertical cyclic one-sided derivatives of image patches of size 5 × 5 pixels. Right: Examples of sparse analysis signals (5 × 5 patches) residing in 4-dimensional subspaces related to the dictionary Ω_DIF, i.e., orthogonal to 21 linearly-independent atoms each.

Figure 2. A histogram of the effective co-sparsities of the 1000 analysis signals generated from Ω_DIF of size 50 × 25 (x-axis: co-sparsity ℓ; y-axis: number of signals). The reference value of ℓ = 21 is indicated by the thin vertical line. As can be seen, the effective co-sparsities are all strictly higher.
An equivalent way to interpret the co-rank analysis and ℓ_0-synthesis models is as Unions of Subspaces (UoS) signal models [23]. As we have seen, in both cases the sparse signals reside within some UoS defined by the dictionary atoms. In the synthesis case, these subspaces are formed by the spans of all sets of atoms with rank k, for some choice of k. In contrast, in the analysis case these subspaces are the orthogonal complements of the sets of atoms with rank d − r. We note that when the dictionaries are in general position, the number of such subsets is C(n, k) and C(p, d − r) (binomial coefficients), respectively. In general, the UoS's associated with the two models will be very different. For example, if p = n = 2d, k = r ≪ d and the rows in Ω are in general position, the subspaces united by the two models are of the same dimension (k or r), but their number is entirely different, with many more subspaces included in the analysis model.
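To get a feel for the gap in the number of subspaces, a quick illustrative computation of the two binomial coefficients for the setting above, say d = 25 and k = r = 4, can be done as follows (the specific numbers are our example, not taken from the paper).

```python
from math import comb

d, k = 25, 4                       # signal dimension and subspace dimension
n = p = 2 * d                      # twice-redundant synthesis and analysis dictionaries
print("synthesis subspaces:", comb(n, k))        # choose k atoms to span the subspace
print("analysis subspaces: ", comb(p, d - k))    # choose d - r rows to annihilate it
# The analysis count is larger by many orders of magnitude.
```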
III. ANALYSIS SPARSE-CODING
A. Defining the Pursuit Problem
Before we study the problem of learning the analysis dictionary Ω, we have to consider a simpler task called analysis sparse-coding or analysis pursuit. As we shall see in the next section, this is an important building-block in the overall dictionary-learning procedure.
A convenient property of the analysis approach is that given a signal x, we can readily compute its analysis coefficients Ωx, and thus determine the cardinality of its analysis representation. However, if we assume an additive contamination of the signal, y = x + v, then computation of the analysis representation Ωx is no longer simple. We shall assume that v is a zero-mean white-Gaussian additive noise vector. Recovering the correct signal x from its noisy version y (and thereby computing the analysis representation) requires solving a problem of the form

    {x̂, Λ̂} = Argmin_{x,Λ} ‖x − y‖_2    Subject To    Ω_Λ x = 0,  Rank(Ω_Λ) = d − r,    (3)

or

    {x̂, Λ̂} = Argmin_{x,Λ} Rank(Ω_Λ)    Subject To    Ω_Λ x = 0,  ‖x − y‖_2 ≤ ε.    (4)
In Equation (3) we require a co-rank of d − r for the obtained solution, while in Equation (4) we constrain the solution to be ε-close to the given noisy signal, where this error tolerance is derived from the noise power. The above problems can be considered as denoising schemes, as x̂ is an attempt to estimate the true noiseless signal x. The two problems (3) and (4) are equivalent, of course, given the correct correspondence between r and ε, and the choice between them depends on the available information regarding the process that generated y. We refer to these problems as the analysis sparse-coding or analysis-pursuit problems.
In principle, denoising is possible with the analysis model
because, once the co-support has been detected, projection
on the complement subspace attenuates the additive noise in
the co-support subspace, thus cleaning the signal. Indeed, the
higher the dimension of the true co-support, the better the
denoising is expected to perform.
In an oracle setup, the true co-support Λ is known, and thus can be used for obtaining a signal recovery,

    x̂ = (I − Ω_Λ^† Ω_Λ) y.    (5)

The mean denoising error in the oracle setup is given by

    E‖x − x̂‖_2^2 = tr(I − Ω_Λ^† Ω_Λ) σ^2 = r σ^2,    (6)

where tr(·) denotes the trace of a matrix. In the first equality we used the fact that the noise is white and Gaussian and that Ω_Λ x = 0. The last equality holds since I − Ω_Λ^† Ω_Λ is a projection matrix onto an r-dimensional space, so that it has two eigenvalues: a zero eigenvalue of multiplicity d − r and an eigenvalue 1 with multiplicity r. This should remind the reader of the oracle error in the synthesis case, as described in [2].
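The oracle estimator (5) and the error expression (6) are easy to check numerically. The sketch below, with hypothetical helper names and arbitrarily chosen dimensions, draws noisy copies of an analysis signal and compares the empirical error to rσ².

```python
import numpy as np

def oracle_denoise(Omega, Lam, y):
    """Oracle estimate of Eq. (5): project y onto the null space of Omega_Lam."""
    O_Lam = Omega[Lam]
    return y - np.linalg.pinv(O_Lam) @ (O_Lam @ y)

# Empirical check of Eq. (6): the mean squared error should be close to r * sigma^2.
rng = np.random.default_rng(0)
d, p, r, sigma = 25, 50, 4, 0.1
Omega = rng.standard_normal((p, d))          # rows in general position (with probability 1)
Lam = rng.permutation(p)[: d - r]            # a random co-support of co-rank d - r
x = oracle_denoise(Omega, Lam, rng.standard_normal(d))   # a clean analysis signal
errs = []
for _ in range(2000):
    y = x + sigma * rng.standard_normal(d)
    errs.append(np.sum((x - oracle_denoise(Omega, Lam, y)) ** 2))
print(np.mean(errs), r * sigma ** 2)         # the two numbers should roughly agree
```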
Similar to the synthesis sparse approximation problem, the problems posed in Equations (3) and (4) are combinatorial in nature and can thus only be approximated in general. One approach to approximating the solution is to relax the ℓ_0-norm and replace it with an ℓ_1 penalty function, producing

    x̂ = Argmin_x ‖x − y‖_2    Subject To    ‖Ωx‖_1 ≤ T.    (7)

This approach is parallel to the basis-pursuit approach for synthesis approximation [24], and the resulting problem may be solved via an iterated re-weighted least squares (IRLS) method [25], or using standard quadratic or conic optimization methods.
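As a rough illustration of the IRLS route, the sketch below addresses a Lagrangian (penalized) variant of (7), min_x ‖x − y‖_2^2 + λ‖Ωx‖_1, by repeatedly solving a re-weighted least-squares problem; the function name, the parameter λ and the fixed iteration count are our own choices, not the paper's.

```python
import numpy as np

def analysis_irls(Omega, y, lam=0.1, iters=50, eps=1e-8):
    """IRLS sketch for a penalized variant of Eq. (7):
    minimize ||x - y||_2^2 + lam * ||Omega x||_1.

    Each iteration solves a weighted least-squares problem, with weights
    1 / (|Omega x| + eps) approximating the l1 penalty on the analysis coefficients.
    """
    d = Omega.shape[1]
    x = y.copy()
    for _ in range(iters):
        w = 1.0 / (np.abs(Omega @ x) + eps)           # re-weighting of the analysis coefficients
        A = np.eye(d) + lam * (Omega.T * w) @ Omega   # normal equations of the weighted problem
        x = np.linalg.solve(A, y)
    return x
```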
B. The Backward-Greedy Algorithm
A second approach parallels the synthesis greedy pursuit algorithms [26], [27], and is the one we shall use in this work. It suggests selecting rows from Ω one-by-one in a greedy fashion. The solution can be built by either detecting the rows that correspond to the non-zeros in Ωx, or by detecting the zeros. The first approach is the one taken by the Greedy-Analysis-Pursuit (GAP) algorithm, described in [9]. We shall take the alternative (and simpler) approach of finding the co-support Λ one element at a time. We refer to this algorithm as the Backward-Greedy (BG) Algorithm, as it is gathering the zeros in the representation. A detailed description of this algorithm is given below in Algorithm 1. Note that this algorithm takes as input the co-rank d − r of the desired co-support rather than the exact number of zeros, and thus the actual number of vanishing coefficients in the output representation may be larger than d − r.
The process begins by setting x̂ = y and initializing the co-support to be an empty set of rows. In each iteration, the inner-products Ωx̂ are computed, and the row with the smallest non-zero inner-product is selected and added to the set. The solution x̂ is then updated by projecting y onto the space orthogonal to the selected rows. Finally, the co-support is refined by recalculating the representation vector Ωx̂ and finding the additional coefficients that fall below some small threshold ε_0. This can reveal additional rows that are orthogonal to the current estimate of the signal, namely the rows that are spanned by the existing set of rows Ω_{Λ_i}. The process described above repeats until the target subspace dimension is achieved.
Algorithm 1 BACKWARD-GREEDY
1: Input: Analysis dictionary Ω ∈ R^{p×d}, signal y ∈ R^d, and target co-rank d − r
2: Output: Signal x̂ ∈ R^d with co-rank d − r and minimizing ‖y − x̂‖_2
3: Initialization: Set i := 0, Λ_0 := ∅, x̂_0 := y
4: for i = 1 ... d − r do
5:   Sweep: k̂_i := Argmin_{k ∉ Λ_{i−1}} |w_k^T x̂_{i−1}|
6:   Update Co-Support: Λ_i := Λ_{i−1} ∪ {k̂_i}
7:   Project: x̂_i := (I − Ω_{Λ_i}^† Ω_{Λ_i}) y
8:   Refine Co-Support: Λ_i := {k | 1 ≤ k ≤ p, |w_k^T x̂_i| < ε_0}
9: end for
10: return x̂ = x̂_{d−r}
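A plain NumPy transcription of Algorithm 1 might look as follows. It applies the pseudo-inverse projection of step 7 directly (the more efficient Gram-Schmidt variant is described next); the function name and the tolerance ε_0 default are our own choices.

```python
import numpy as np

def backward_greedy(Omega, y, corank, eps0=1e-10):
    """NumPy sketch of Algorithm 1 (Backward-Greedy).

    Omega  : p x d analysis dictionary
    y      : noisy signal
    corank : target co-rank d - r
    """
    p, d = Omega.shape
    Lam = np.zeros(p, dtype=bool)        # current co-support, as a boolean mask
    x_hat = y.copy()
    for _ in range(corank):
        # Sweep: smallest inner product among rows not yet in the co-support
        scores = np.abs(Omega @ x_hat)
        scores[Lam] = np.inf
        k = int(np.argmin(scores))
        Lam[k] = True
        # Project: x_hat = (I - pinv(Omega_Lam) Omega_Lam) y
        O_Lam = Omega[Lam]
        x_hat = y - np.linalg.pinv(O_Lam) @ (O_Lam @ y)
        # Refine co-support: pick up rows nulled through linear dependencies
        Lam = np.abs(Omega @ x_hat) < eps0
    return x_hat, np.flatnonzero(Lam)
```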
In practice, the above algorithm can be implemented efficiently by accumulating an orthogonalized set of the co-support rows. This means that once k̂_i has been found and the row w_{k̂_i}^T is about to join the co-support, it is first orthogonalized with respect to the already accumulated rows using a modified Gram-Schmidt process. Denoting by {q_j}_{j=1}^{i−1} the orthogonal set accumulated so far (as column vectors), the orthogonalization of w_{k̂_i}^T is obtained by

    q_i = w_{k̂_i} − Σ_{j=1}^{i−1} (q_j^T w_{k̂_i}) q_j.    (8)

This should be followed by a normalization of this vector, q_i = q_i / ‖q_i‖_2.

The above-described orthogonalization process is done for one purpose: avoiding the matrix inversion in the update of x̂_i. The "Projection" step in Algorithm 1 translates comfortably to

    x̂_i = (I − Ω_{Λ_i}^† Ω_{Λ_i}) y = (I − Σ_{j=1}^{i} q_j q_j^T) y.    (9)
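Putting Equations (8) and (9) together, the projection in the Backward-Greedy iteration can be maintained incrementally, without any matrix (pseudo-)inversion. The following sketch is our own illustration of that idea, not the authors' code; it replaces the pseudo-inverse of the previous snippet with the accumulated orthonormal set {q_j}.

```python
import numpy as np

def backward_greedy_gs(Omega, y, corank, eps0=1e-10):
    """Backward-Greedy with the Gram-Schmidt acceleration of Eqs. (8)-(9)."""
    p, d = Omega.shape
    Q = np.zeros((0, d))                  # orthonormalized co-support rows, stored as rows of Q
    x_hat = y.copy()
    in_cosupport = np.zeros(p, dtype=bool)
    for _ in range(corank):
        scores = np.abs(Omega @ x_hat)
        scores[in_cosupport] = np.inf
        k = int(np.argmin(scores))
        # Eq. (8): orthogonalize the new row against the accumulated set, then normalize
        q = Omega[k] - Q.T @ (Q @ Omega[k])
        q /= np.linalg.norm(q)
        Q = np.vstack([Q, q])
        # Eq. (9): x_hat = (I - sum_j q_j q_j^T) y, updated incrementally
        x_hat = x_hat - q * (q @ y)
        # Refine the co-support by thresholding the analysis coefficients
        in_cosupport = np.abs(Omega @ x_hat) < eps0
    return x_hat
```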
