Analysis K-SVD: A Dictionary-Learning Algorithm
for the Analysis Sparse Model
Ron Rubinstein, Member, IEEE, Tomer Peleg, Student Member, IEEE and Michael Elad, Fellow, IEEE
Abstract—The synthesis-based sparse representation model for signals has drawn considerable interest in the past decade. Such a model assumes that the signal of interest can be decomposed as a linear combination of a few atoms from a given dictionary. In this paper we concentrate on an alternative, analysis-based model, where an analysis operator, hereafter referred to as the analysis dictionary, multiplies the signal, leading to a sparse outcome. Our goal is to learn the analysis dictionary from a set of examples. The approach taken is parallel and similar to the one adopted by the K-SVD algorithm that serves the corresponding problem in the synthesis model. We present the development of the algorithm steps: this includes tailored pursuit algorithms (the Backward Greedy and the Optimized Backward Greedy algorithms) and a penalty function that defines the objective for the dictionary update stage. We demonstrate the effectiveness of the proposed dictionary learning in several experiments, treating synthetic data and real images, and showing a successful and meaningful recovery of the analysis dictionary.
Index Terms—Sparse Representations, Synthesis Model, Analysis Model, Backward Greedy (BG) Pursuit, Optimized Backward Greedy (OBG) Pursuit, Dictionary Learning, K-SVD, Image Denoising.
I. INTRODUCTION
A. Synthesis and Analysis Signal Models
Signal models are fundamental for handling various processing tasks, such as denoising, solving inverse problems, compression, interpolation, sampling, and more. Among the many ways we have to model signals, one approach that has found great popularity in the past decade is the synthesis-based sparse representation model. In this model, a signal x ∈ R^d is modeled as being the outcome of the multiplication x = Dα, where D ∈ R^{d×n} is a dictionary; its columns are signal prototypes (atoms) that are used to build the signal. We typically consider a redundant dictionary with n > d. The vector α ∈ R^n is the redundant signal's representation, and a fundamental feature in this model is the expectation that this vector is sparse, i.e. ‖α‖_0 = k ≪ d. This implies that the signals we work on are assumed to be composed as linear combinations of a few atoms from the dictionary [1], [2].
Vast work on the synthesis model during the past decade has been invested in an attempt to better understand it, and build practical tools for its use. The main activity concentrated on problems such as methods to estimate or approximate the sparse representation from the possibly corrupted signal, deriving theoretical success guarantees for such algorithms, and techniques to learn the dictionary D from signal examples. Referring specifically to the last point of dictionary learning, two popular techniques for this task are the MOD and K-SVD algorithms [3]-[5], whose deployment has led to state-of-the-art results in various image processing applications [2].

Copyright (c) 2012 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to pubs-permissions@ieee.org.
R. Rubinstein and M. Elad are with the Computer Science Department, Technion - Israel Institute of Technology, Haifa 32000, Israel (e-mail: {ronrubin,elad}@cs.technion.ac.il). T. Peleg is with the Department of Electrical Engineering, Technion - Israel Institute of Technology, Haifa 32000, Israel (e-mail: tomerfa@tx.technion.ac.il).
This work was supported by the European Commission's FP7-FET program, SMALL project (grant agreement no. 225913).
While the synthesis model has been extensively studied, there is a dual analysis viewpoint to sparse representations that has been left aside almost untouched [6]. The analysis model relies on a linear operator (a matrix) Ω ∈ R^{p×d}, which we will refer to as the analysis dictionary, and whose rows constitute analysis atoms. The key property of this model is our expectation that the analysis representation vector Ωx ∈ R^p should be sparse, with ℓ zeros. These zeros carve out the low-dimensional subspace that this signal belongs to. We shall assume that the dimension of this subspace, denoted by r, is indeed small, namely r ≪ d.

While this description may seem similar to the synthesis counterpart approach, it is in fact very different when dealing with a redundant dictionary, p > d. More on this model will be given below, contrasting it with the synthesis alternative. Until recently, relatively little was known about the analysis model, and little attention has been given to it in the literature, compared to the synthesis counterpart model. In the past few years there has been growing interest in the analysis model, as we gain more understanding of and insight into its interesting viewpoint. See [7]-[16] for some work that has already commenced on this model.
In this paper we focus on the analysis model and, more specifically, on the development of an algorithm that would learn the analysis dictionary Ω from a set of signal examples X = [x_1, x_2, ..., x_R]. The objective is to find a suitable dictionary Ω so that the analysis coefficients ΩX are sparse. We note that when dealing with a square (and invertible) matrix Ω, the analysis model is completely equivalent to the synthesis one with Ω^{-1} = D [6], and in such a case, the synthesis-dictionary-learning methods can be used to build Ω. In this work, though, we concentrate on the redundant case (p > d), where the two models depart, and where the analysis model becomes more interesting and powerful. This case of analysis dictionary training is a challenging problem, which has recently started to attract attention [13]-[16].

B. Related Work
One of the first attempts to train an analysis model was the pioneering work of Roth and Black, who adopted a very different point of view in their quest for Ω [17], [18]. Roth and Black trained an image prior, termed Field-of-Experts (FoE), for the purpose of regularizing inverse problems in image processing. Their work originates from the area of probabilistic image modeling [19], [20], which aims at representing image patches through the notion of constraint violation and filter responses. When applying the analysis model on image patches, each row in Ω can be viewed as a local filter operating on a patch, and once included in the co-support of the patch, this row serves as a constraint. Reviewing this area of research is not the intention of this paper. Instead we focus on the FoE approach and refer the readers to [18] for a comprehensive review.
The FoE prior derives its value from the sparsity of the analysis representations computed for overlapping image patches. The measure of sparsity used is a relaxed ℓ_0-norm. Our approach will be based on a different sparsity measure, the co-rank, which in turn will define a different objective for dictionary learning. Roth and Black use contrastive divergence for learning the analysis atoms, which approximates the maximum-likelihood estimator and requires a high computational load, as it relies heavily on Monte Carlo sampling. As such, this learning framework differs substantially from our work, which will take a block-coordinate relaxation approach, alternating between an analysis pursuit stage for each signal example and a simple update rule for each of the learned atoms.
The training set used by the FoE approach is a large
database of image regions (each consisting of a set of over-
lapping patches) and the learning algorithm runs “offline”
resulting in one generic prior that will be suitable for any
natural image. In the context of image denoising, previous
work on the synthesis model [21] has shown that adapting
the dictionary to a given noisy image can lead to improved
image denoising performance with respect to denoising with
a “global” dictionary that was trained “offline”. The approach
we are about to suggest in this paper is capable of learning
an adaptive analysis dictionary from a given noisy set of
examples.
More recently, three interesting attempts to learn a redun-
dant analysis dictionary have been proposed [13]–[16]. The
first work, reported in [13], suggests to incrementally learn
one row at a time, exploiting the fact that a considerable
set of examples is expected to be orthogonal to such a row.
Assuming knowledge of this set of signals, the eigenvector
that corresponds to the smallest eigenvalue of these examples’
autocorrelation matrix is the desired row. For each row, the
proposed algorithm thus alternates between the computation
of this row from the current subset of chosen examples, and
an update of this subset to reject outlier signals. This algorithm
relies heavily on a randomized initialization strategy, both for
enabling the detection of a variety of rows, and for surpassing
deadlock situations in the iterative process. As the dimension of the signal d grows (and with it p, the number of rows in Ω), this approach loses its efficiency rapidly, requiring (too) many rounds of attempts before a new row is detected.
Moreover, this method might suffer from a poor detection
quality, since a row that was admitted to the accumulated set
cannot be replaced. These two limitations will be demonstrated
in Section V-A when comparing our approach with [13] in the
task of recovering the dictionary in a synthetic setup.
The work reported in [14], [15] takes a different route towards the task of learning Ω, posing it as a constrained optimization problem. The goal of sparsifying the representations ΩX is formulated by an ℓ_1-norm penalty function on these representations. In order to avoid the trivial solution Ω = 0 and solutions like an orthonormal basis in the first d rows followed by zero rows, this work proposes to constrain the dictionary to be a uniform normalized tight frame. However, this choice limits the possible Ω to be learned, and puts a rather arbitrary constraint for regularizing the learning problem. In our work we aim at handling the most general setup of redundant analysis dictionaries, and therefore we would like to be as unrestrictive as possible with respect to the learned atoms.
The work in [16] proposes to learn Ω such that it optimizes the denoising performance on a given set of example pairs (clean and noisy versions of example signals). The learning is achieved by formulating the learning task as a bilevel-programming optimization problem, which in turn is handled using gradient descent. The main focus of [16] is on learning an analysis operator that takes the form of a convolution, which is equivalent to learning one filter (convolution kernel). This is very different from our main goal, which is learning a set of p analysis atoms that can be viewed as local filters operating on image patches.
In the process of preparing this paper for publication,
another relevant work on analysis dictionary learning was
brought to our attention [22]. The approach suggested in this
paper shares some basic ideas with the FoE approach, such
as learning a "global" dictionary for natural image patches and inserting it into a regularization term of an image recovery
formulation. However, the authors of [22] take a very different
route towards the dictionary learning problem, posing it in
terms of an optimization over manifolds. This allows them
to update the analysis dictionary as a whole, in contrast to
the separate atom updates practiced in our approach, thus
explicitly enforcing basic dictionary properties, such as having
distinct rows and full row rank into the learning procedure.
Using these optimization tools, they learn a redundant analysis
dictionary that obtains competitive results with respect to
the synthesis counterpart model for various image processing
applications.
C. This Work
In this paper we adopt a different approach to the anal-
ysis training problem, based on a co-rank measure which
determines the dimension of the analysis subspace. This co-
rank measure allows us to develop a novel training algorithm,
whose uniqueness is in the relations it exhibits with the
synthesis formulation. Specifically, the proposed dictionary-
training method is parallel to the synthesis-model K-SVD in
its rationale and computational steps. Similar to the work in [14], [15], we consider the learning process as a solution of a constrained optimization task. However, as we shall show next, the constraint we employ uses our knowledge of the signals and their relation to Ω in a more direct way. The atom update rule in our proposed approach will be similar to the one suggested in [13]. However, it will be better justified by deriving it directly from the constrained optimization problem. Moreover, the set of signals orthogonal to each row in Ω will be determined in a more effective fashion, resulting in a more efficient algorithm.
The analysis model gives rise to a series of research
questions, which are far from being solved: (i) What are the
desired properties of an analysis dictionary? (ii) How can
a signal and its sparse analysis representation be recovered,
given the dictionary and a noisy version of the signal? (iii)
Can the analysis dictionary be learned from a given data-set of
examples? In this work we aim at providing some answers to
these questions. Our main contribution is an efficient algorithm
for obtaining the analysis dictionary from a given data-set in
a K-SVD-like manner. We demonstrate the potential of this
approach in a series of experiments on synthetic and real data
(images), showing the ability of the algorithm to recover a
meaningful result in all cases. We note that our main goal in
this work is to highlight the potential and capability of the co-
rank analysis approach, and we do not focus here on specific
applications.
This paper is organized as follows: In Section II we present
the core concept of the co-rank analysis model, and character-
ize the signals that belong to it. In Section III we consider
the analysis pursuit problem of denoising a signal using
the analysis model, which serves as an important building-
block in the Analysis K-SVD algorithm described in detail in
Section IV. Finally, Section V provides several experiments
that demonstrate the performance of the Analysis K-SVD
algorithm.
II. A CLOSER LOOK AT THE ANALYSIS MODEL
In this section we briefly review the co-rank analysis model,
and characterize the signals that belong to it, and which our
learning algorithm is to operate on. The content of this section
relies in part on explanations found in [8], [9], [13].
The analysis model for the signal x ∈ R^d uses the possibly redundant analysis dictionary Ω ∈ R^{p×d} (redundancy here implies p ≥ d), and assumes that the analysis representation vector Ωx should be sparse. In this work we consider specifically ℓ_0 sparsity, which implies that Ωx contains many zeros. The co-sparsity ℓ of the analysis model is defined as the number of zeros in the vector Ωx,

    ‖Ωx‖_0 = p − ℓ.    (1)
In the synthesis model the representation α is obtained by a complex and non-linear pursuit process that seeks (or approximates) the sparsest solution to the linear system of equations Dα = x. This representation can be arbitrarily sparse, ‖α‖_0 = k ≪ d. The signal x is characterized by the k non-zero indices in the representation vector α, and their associated atoms define the subspace this signal belongs to. The dimension of this subspace equals k and, as we mentioned before, it is small with respect to the signal dimension d.
In contrast, in the analysis model the computation of the representation is trivial, obtained by the multiplication Ωx. In this model we put an emphasis on the zeros of Ωx, and define the co-support Λ of x as the set of ℓ = |Λ| rows that are orthogonal to it. In other words, Ω_Λ x = 0, where Ω_Λ is a sub-matrix of Ω that contains only the rows indexed in Λ. For a given analysis dictionary Ω, we define the co-rank of a signal x with co-support Λ as the rank of Ω_Λ. The signal x is thus characterized by its co-support, which determines the subspace it is orthogonal to, and consequently the complement space to which it belongs. Just like in the synthesis model, we assume that the dimension of the subspace the signal belongs to, denoted by r, is small, namely r ≪ d. The co-rank of such an analysis signal is d − r.
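To make these definitions concrete, the following small NumPy sketch computes the co-support and the co-rank of a signal for a given analysis dictionary; the helper names (`cosupport`, `corank`) are ours, not the paper's, and a numerical tolerance stands in for exact zeros.

```python
import numpy as np

def cosupport(Omega, x, tol=1e-10):
    """Indices of the analysis atoms (rows of Omega) orthogonal to x."""
    return np.flatnonzero(np.abs(Omega @ x) < tol)

def corank(Omega, x, tol=1e-10):
    """Rank of the sub-dictionary Omega_Lambda restricted to the co-support of x."""
    Lam = cosupport(Omega, x, tol)
    if Lam.size == 0:
        return 0
    return np.linalg.matrix_rank(Omega[Lam])
```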
How sparse can the analysis representation vector be? Let us first assume that the rows in Ω are in general position, implying that every subset of d or fewer rows is necessarily linearly independent. This is equivalent to the claim that the spark of Ω^T is full [2]. Naturally, for this case, ℓ < d, since otherwise there would be d independent rows orthogonal to x, implying x = 0. Thus, in this case the analysis model leads necessarily to a mild sparsity, ‖Ωx‖_0 > p − d, and for a highly redundant analysis operator, the cardinality of the analysis representation vector Ωx is expected to be quite high. In this case, the dimension of the subspace the signal belongs to is r = d − ℓ.
A more interesting case is when Ω^T has non-full spark, implying that linear dependencies exist between the dictionary atoms. The immediate implication is that ℓ could go beyond d, and yet the signal would not necessarily be nulled. An example of such a dictionary is the set of cyclic horizontal and vertical one-sided derivatives, applied on a 2D signal of size √d × √d. The corresponding analysis dictionary, denoted Ω_DIF, is of size 2d × d, thus twice redundant. Figure 1 shows this dictionary for d = 25. In [9] this dictionary is discussed in detail, showing that its rows exhibit strong linear dependencies.
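To illustrate what such a dictionary looks like, here is a small NumPy sketch of our reading of the Ω_DIF construction (cyclic one-sided horizontal and vertical differences on √d × √d patches); the function name and the patch ordering are our own assumptions.

```python
import numpy as np

def omega_dif(side):
    """Cyclic one-sided horizontal and vertical derivatives on side x side patches.

    Returns a (2*d) x d matrix, d = side**2, acting on vectorized patches.
    """
    d = side * side
    idx = np.arange(d).reshape(side, side)
    rows = []
    for shifted in (np.roll(idx, -1, axis=1),   # right (horizontal) cyclic neighbor
                    np.roll(idx, -1, axis=0)):  # down (vertical) cyclic neighbor
        for i, j in zip(idx.ravel(), shifted.ravel()):
            w = np.zeros(d)
            w[i], w[j] = 1.0, -1.0
            rows.append(w)
    return np.vstack(rows)

Omega_DIF = omega_dif(5)                        # 50 x 25, as in Figure 1
print(np.linalg.matrix_rank(Omega_DIF))         # d - 1 = 24: the rows are strongly dependent
```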
Generating a random analysis signal amounts to the following process: choose a set of row indices Λ ⊆ {1, ..., p}; this will be the signal's co-support. Starting with a random vector u, project it onto the subspace orthogonal to Ω_Λ:

    x = (I − Ω_Λ^† Ω_Λ) u,    (2)

and x is an analysis signal that satisfies our sparsity assumption. For a general-positioned Ω we choose ℓ rows from Ω at random. Otherwise we choose d − r linearly independent rows from Ω. This choice is still done in a random fashion, but is naturally more restricted. In the experiments that follow we shall use such randomly generated signals, when dealing with synthetic experiments.
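The generation process of Equation (2) can be sketched in a few lines of NumPy. The helper below is an illustration under stated assumptions (a given dictionary `Omega`, rows accumulated at random until rank d − r is reached, projection via the pseudo-inverse); it is not the authors' code.

```python
import numpy as np

def random_analysis_signal(Omega, r, rng=np.random.default_rng(0)):
    """Draw a unit-norm signal residing in an r-dimensional analysis subspace of Omega.

    Sketch of Eq. (2): pick d - r linearly independent rows at random, then project
    a random vector onto their orthogonal complement.
    """
    p, d = Omega.shape
    Lam = []
    for k in rng.permutation(p):                # accumulate rows until rank d - r is reached
        cand = Lam + [int(k)]
        if np.linalg.matrix_rank(Omega[cand]) == len(cand):
            Lam = cand
        if len(Lam) == d - r:
            break
    O_Lam = Omega[Lam]
    u = rng.standard_normal(d)
    x = u - np.linalg.pinv(O_Lam) @ (O_Lam @ u)  # x = (I - pinv(O_Lam) O_Lam) u
    return x / np.linalg.norm(x), np.array(Lam)
```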
As mentioned above, when the rows in Ω are not in general position, the co-sparsity ℓ can be greater than d. In this case, once a signal x has been generated using the process (2), computation of its analysis representation Ωx could reveal additional rows that are orthogonal to the signal, due to linear dependence on the chosen subset Λ. To demonstrate this behavior, we generated 1000 unit-norm analysis signals residing in 4-dimensional subspaces related to Ω_DIF ∈ R^{50×25}. Figure 1 presents a set of such randomly created signals. For these signals, Figure 2 presents a histogram of the effective co-sparsities. As can be seen, though the signals are each orthogonal to subspaces of rank 21, their actual co-sparsities are much higher, varying in the range 23 to 40. Thus, we see that by allowing linear dependencies between the rows in Ω, co-sparsities much higher than the signal dimension can be achieved.

Figure 1. Left: The dictionary Ω_DIF of size 50 × 25, corresponding to horizontal and vertical cyclic one-sided derivatives of image patches of size 5 × 5 pixels. Right: Examples of sparse analysis signals (5 × 5 patches) residing in 4-dimensional subspaces related to the dictionary Ω_DIF, i.e., orthogonal to 21 linearly-independent atoms each.

Figure 2. A histogram of the effective co-sparsities of the 1000 analysis signals generated from Ω_DIF of size 50 × 25 (x-axis: co-sparsity ℓ; y-axis: number of signals). The reference value of ℓ = 21 is indicated by the thin vertical line. As can be seen, the effective co-sparsities are all strictly higher.
An equivalent way to interpret the co-rank analysis and ℓ_0-synthesis models is as Unions of Subspaces (UoS) signal models [23]. As we have seen, in both cases the sparse signals reside within some UoS defined by the dictionary atoms. In the synthesis case, these subspaces are formed by the spans of all sets of atoms with rank k, for some choice of k. In contrast, in the analysis case these subspaces are the orthogonal complements of the sets of atoms with rank d − r. We note that when the dictionaries are in general position, the number of such subsets is C(n, k) and C(p, d − r) (binomial coefficients), respectively. In general, the UoS's associated with the two models will be very different. For example, if p = n = 2d, k = r ≪ d and the rows in Ω are in general position, the subspaces united by the two models are of the same dimension (k or r), but their number is entirely different, with many more subspaces included in the analysis model.
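To get a feel for the gap in the number of subspaces, a quick illustrative computation of the two binomial coefficients for the setting above, say d = 25 and k = r = 4, can be done as follows (the specific numbers are our example, not taken from the paper).

```python
from math import comb

d, k = 25, 4                       # signal dimension and subspace dimension
n = p = 2 * d                      # twice-redundant synthesis and analysis dictionaries
print("synthesis subspaces:", comb(n, k))        # choose k atoms to span the subspace
print("analysis subspaces: ", comb(p, d - k))    # choose d - r rows to annihilate it
# The analysis count is larger by many orders of magnitude.
```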
III. ANALYSIS SPARSE-CODING
A. Defining the Pursuit Problem
Before we study the problem of learning the analysis dictionary Ω, we have to consider a simpler task called analysis sparse-coding or analysis pursuit. As we shall see in the next section, this is an important building-block in the overall dictionary-learning procedure.
A convenient property of the analysis approach is that given a signal x, we can readily compute its analysis coefficients Ωx, and thus determine the cardinality of its analysis representation. However, if we assume an additive contamination of the signal, y = x + v, then computation of the analysis representation Ωx is no longer simple. We shall assume that v is a zero-mean white-Gaussian additive noise vector. Recovering the correct signal x from its noisy version y (and thereby computing the analysis representation) requires solving a problem of the form

    {x̂, Λ̂} = Argmin_{x,Λ} ‖x − y‖_2    Subject To    Ω_Λ x = 0,  Rank(Ω_Λ) = d − r,    (3)

or

    {x̂, Λ̂} = Argmin_{x,Λ} Rank(Ω_Λ)    Subject To    Ω_Λ x = 0,  ‖x − y‖_2 ≤ ε.    (4)
In Equation (3) we require a co-rank of d − r for the obtained solution, while in Equation (4) we constrain the solution to be ε-close to the given noisy signal, where this error tolerance is derived from the noise power. The above problems can be considered as denoising schemes, as x̂ is an attempt to estimate the true noiseless signal x. The two problems (3) and (4) are equivalent, of course, given the correct correspondence between r and ε, and the choice between them depends on the available information regarding the process that generated y. We refer to these problems as the analysis sparse-coding or analysis-pursuit problems.
In principle, denoising is possible with the analysis model
because, once the co-support has been detected, projection
on the complement subspace attenuates the additive noise in
the co-support subspace, thus cleaning the signal. Indeed, the
higher the dimension of the true co-support, the better the
denoising is expected to perform.
In an oracle setup, the true co-support Λ is known, and thus can be used for obtaining a signal recovery,

    x̂ = (I − Ω_Λ^† Ω_Λ) y.    (5)

The mean denoising error in the oracle setup is given by

    E‖x − x̂‖_2^2 = tr(I − Ω_Λ^† Ω_Λ) σ^2 = r σ^2,    (6)

where tr(·) denotes the trace of a matrix. In the first equality we used the fact that the noise is white and Gaussian and that Ω_Λ x = 0. The last equality holds since I − Ω_Λ^† Ω_Λ is a projection matrix onto an r-dimensional space, so that it has two eigenvalues: a zero eigenvalue of multiplicity d − r and an eigenvalue 1 with multiplicity r. This should remind the reader of the oracle error in the synthesis case, as described in [2].
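The oracle estimator (5) and the error expression (6) are easy to check numerically. The sketch below, with hypothetical helper names and arbitrarily chosen dimensions, draws noisy copies of an analysis signal and compares the empirical error to rσ².

```python
import numpy as np

def oracle_denoise(Omega, Lam, y):
    """Oracle estimate of Eq. (5): project y onto the null space of Omega_Lam."""
    O_Lam = Omega[Lam]
    return y - np.linalg.pinv(O_Lam) @ (O_Lam @ y)

# Empirical check of Eq. (6): the mean squared error should be close to r * sigma^2.
rng = np.random.default_rng(0)
d, p, r, sigma = 25, 50, 4, 0.1
Omega = rng.standard_normal((p, d))          # rows in general position (with probability 1)
Lam = rng.permutation(p)[: d - r]            # a random co-support of co-rank d - r
x = oracle_denoise(Omega, Lam, rng.standard_normal(d))   # a clean analysis signal
errs = []
for _ in range(2000):
    y = x + sigma * rng.standard_normal(d)
    errs.append(np.sum((x - oracle_denoise(Omega, Lam, y)) ** 2))
print(np.mean(errs), r * sigma ** 2)         # the two numbers should roughly agree
```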
Similar to the synthesis sparse approximation problem, the problems posed in Equations (3) and (4) are combinatorial in nature and can thus only be approximated in general. One approach to approximating the solution is to relax the ℓ_0-norm and replace it with an ℓ_1 penalty function, producing

    x̂ = Argmin_x ‖x − y‖_2    Subject To    ‖Ωx‖_1 ≤ T.    (7)

This approach is parallel to the basis-pursuit approach for synthesis approximation [24], and the resulting problem may be solved via an iterated re-weighted least squares (IRLS) method [25], or using standard quadratic or conic optimization methods.
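As a rough illustration of the IRLS route, the sketch below addresses a Lagrangian (penalized) variant of (7), min_x ‖x − y‖_2^2 + λ‖Ωx‖_1, by repeatedly solving a re-weighted least-squares problem; the function name, the parameter λ and the fixed iteration count are our own choices, not the paper's.

```python
import numpy as np

def analysis_irls(Omega, y, lam=0.1, iters=50, eps=1e-8):
    """IRLS sketch for a penalized variant of Eq. (7):
    minimize ||x - y||_2^2 + lam * ||Omega x||_1.

    Each iteration solves a weighted least-squares problem, with weights
    1 / (|Omega x| + eps) approximating the l1 penalty on the analysis coefficients.
    """
    d = Omega.shape[1]
    x = y.copy()
    for _ in range(iters):
        w = 1.0 / (np.abs(Omega @ x) + eps)           # re-weighting of the analysis coefficients
        A = np.eye(d) + lam * (Omega.T * w) @ Omega   # normal equations of the weighted problem
        x = np.linalg.solve(A, y)
    return x
```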
B. The Backward-Greedy Algorithm
A second approach parallels the synthesis greedy pursuit algorithms [26], [27], and is the one we shall use in this work. It suggests selecting rows from Ω one-by-one in a greedy fashion. The solution can be built by either detecting the rows that correspond to the non-zeros in Ωx, or by detecting the zeros. The first approach is the one taken by the Greedy-Analysis-Pursuit (GAP) algorithm, described in [9]. We shall take the alternative (and simpler) approach of finding the co-support Λ one element at a time. We refer to this algorithm as the Backward-Greedy (BG) Algorithm, as it is gathering the zeros in the representation. A detailed description of this algorithm is given below in Algorithm 1. Note that this algorithm takes as input the co-rank d − r of the desired co-support rather than the exact number of zeros, and thus the actual number of vanishing coefficients in the output representation may be larger than d − r.
The process begins by setting x̂ = y and initializing the co-support to be an empty set of rows. In each iteration, the inner-products Ωx̂ are computed, and the row with the smallest non-zero inner-product is selected and added to the set. The solution x̂ is then updated by projecting y onto the space orthogonal to the selected rows. Finally, the co-support is refined by recalculating the representation vector Ωx̂ and finding the additional coefficients that fall below some small threshold ε_0. This can reveal additional rows that are orthogonal to the current estimate of the signal, namely the rows that are spanned by the existing set of rows Ω_{Λ_i}. The process described above repeats until the target subspace dimension is achieved.
Algorithm 1 BACKWARD-GREEDY
1: Input: Analysis dictionary Ω ∈ R^{p×d}, signal y ∈ R^d, and target co-rank d − r
2: Output: Signal x̂ ∈ R^d with co-rank d − r and minimizing ‖y − x̂‖_2
3: Initialization: Set i := 0, Λ_0 := ∅, x̂_0 := y
4: for i = 1 ... d − r do
5:   Sweep: k̂_i := Argmin_{k ∉ Λ_{i−1}} |w_k^T x̂_{i−1}|
6:   Update Co-Support: Λ_i := Λ_{i−1} ∪ {k̂_i}
7:   Project: x̂_i := (I − Ω_{Λ_i}^† Ω_{Λ_i}) y
8:   Refine Co-Support: Λ_i := {k | 1 ≤ k ≤ p, |w_k^T x̂_i| < ε_0}
9: end for
10: return x̂ = x̂_{d−r}
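A plain NumPy transcription of Algorithm 1 might look as follows. It applies the pseudo-inverse projection of step 7 directly (the more efficient Gram-Schmidt variant is described next); the function name and the tolerance ε_0 default are our own choices.

```python
import numpy as np

def backward_greedy(Omega, y, corank, eps0=1e-10):
    """NumPy sketch of Algorithm 1 (Backward-Greedy).

    Omega  : p x d analysis dictionary
    y      : noisy signal
    corank : target co-rank d - r
    """
    p, d = Omega.shape
    Lam = np.zeros(p, dtype=bool)        # current co-support, as a boolean mask
    x_hat = y.copy()
    for _ in range(corank):
        # Sweep: smallest inner product among rows not yet in the co-support
        scores = np.abs(Omega @ x_hat)
        scores[Lam] = np.inf
        k = int(np.argmin(scores))
        Lam[k] = True
        # Project: x_hat = (I - pinv(Omega_Lam) Omega_Lam) y
        O_Lam = Omega[Lam]
        x_hat = y - np.linalg.pinv(O_Lam) @ (O_Lam @ y)
        # Refine co-support: pick up rows nulled through linear dependencies
        Lam = np.abs(Omega @ x_hat) < eps0
    return x_hat, np.flatnonzero(Lam)
```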
In practice, the above algorithm can be implemented efficiently by accumulating an orthogonalized set of the co-support rows. This means that once k̂_i has been found and the row w_{k̂_i}^T is about to join the co-support, it is first orthogonalized with respect to the already accumulated rows using a modified Gram-Schmidt process. Denoting by {q_j}_{j=1}^{i−1} the orthogonal set accumulated so far (as column vectors), the orthogonalization of w_{k̂_i}^T is obtained by

    q_i = w_{k̂_i} − Σ_{j=1}^{i−1} (q_j^T w_{k̂_i}) q_j.    (8)

This should be followed by a normalization of this vector, q_i = q_i / ‖q_i‖_2.

The above-described orthogonalization process is done for one purpose: avoiding the matrix inversion in the update of x̂_i. The "Projection" step in Algorithm 1 translates comfortably to

    x̂_i = (I − Ω_{Λ_i}^† Ω_{Λ_i}) y = (I − Σ_{j=1}^{i} q_j q_j^T) y.    (9)
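Putting Equations (8) and (9) together, the projection in the Backward-Greedy iteration can be maintained incrementally, without any matrix (pseudo-)inversion. The following sketch is our own illustration of that idea, not the authors' code; it replaces the pseudo-inverse of the previous snippet with the accumulated orthonormal set {q_j}.

```python
import numpy as np

def backward_greedy_gs(Omega, y, corank, eps0=1e-10):
    """Backward-Greedy with the Gram-Schmidt acceleration of Eqs. (8)-(9)."""
    p, d = Omega.shape
    Q = np.zeros((0, d))                  # orthonormalized co-support rows, stored as rows of Q
    x_hat = y.copy()
    in_cosupport = np.zeros(p, dtype=bool)
    for _ in range(corank):
        scores = np.abs(Omega @ x_hat)
        scores[in_cosupport] = np.inf
        k = int(np.argmin(scores))
        # Eq. (8): orthogonalize the new row against the accumulated set, then normalize
        q = Omega[k] - Q.T @ (Q @ Omega[k])
        q /= np.linalg.norm(q)
        Q = np.vstack([Q, q])
        # Eq. (9): x_hat = (I - sum_j q_j q_j^T) y, updated incrementally
        x_hat = x_hat - q * (q @ y)
        # Refine the co-support by thresholding the analysis coefficients
        in_cosupport = np.abs(Omega @ x_hat) < eps0
    return x_hat
```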
