Image Super-Resolution as Sparse Representation of Raw Image Patches

Jianchao Yang, John Wright, Yi Ma, Thomas Huang
University of Illinois at Urbana-Champaign
Beckman Institute and Coordinated Science Laboratory
{jyang29, jnwright, yima, huang}@uiuc.edu
Abstract

This paper addresses the problem of generating a super-resolution (SR) image from a single low-resolution input image. We approach this problem from the perspective of compressed sensing. The low-resolution image is viewed as a downsampled version of a high-resolution image, whose patches are assumed to have a sparse representation with respect to an over-complete dictionary of prototype signal-atoms. The principle of compressed sensing ensures that under mild conditions, the sparse representation can be correctly recovered from the downsampled signal. We will demonstrate the effectiveness of sparsity as a prior for regularizing the otherwise ill-posed super-resolution problem. We further show that a small set of randomly chosen raw patches from training images of similar statistical nature to the input image generally serves as a good dictionary, in the sense that the computed representation is sparse and the recovered high-resolution image is competitive or even superior in quality to images produced by other SR methods.
1. Introduction
Conventional approaches to generating a super-resolution (SR) image require multiple low-resolution images of the same scene, typically aligned with sub-pixel accuracy. The SR task is cast as the inverse problem of recovering the original high-resolution image by fusing the low-resolution images, based on assumptions or prior knowledge about the generation model from the high-resolution image to the low-resolution images. The basic reconstruction constraint is that applying the image formation model to the recovered image should produce the same low-resolution images. However, because much information is lost in the high-to-low generation process, the reconstruction problem is severely underdetermined, and the solution is not unique. Various methods have been proposed to further regularize the problem. For instance, one can choose a MAP (maximum a posteriori) solution under generic image priors such as the Huber MRF (Markov Random Field) and Bilateral Total Variation [14, 11, 25].

However, the performance of these reconstruction-based super-resolution algorithms degrades rapidly if the magnification factor is large or if there are not enough low-resolution images to constrain the solution, as in the extreme case of only a single low-resolution input image [2]. Another class of super-resolution methods that can overcome this difficulty is learning-based approaches, which use a learned co-occurrence prior to predict the correspondence between low-resolution and high-resolution image patches [12, 26, 16, 5, 20].
In [12], the authors propose an example-based learning strategy that applies to generic images, where the low-resolution to high-resolution prediction is learned via a Markov Random Field (MRF) solved by belief propagation. [23] extends this approach by using Primal Sketch priors to enhance blurred edges, ridges, and corners. Nevertheless, the above methods typically require enormous databases of millions of high-resolution and low-resolution patch pairs to make the databases expressive enough. In [5], the authors adopt the philosophy of LLE [22] from manifold learning, assuming similarity between the two manifolds in the high-resolution patch space and the low-resolution patch space. Their algorithm maps the local geometry of the low-resolution patch space to the high-resolution patch space, generating a high-resolution patch as a linear combination of neighbors. Using this strategy, more patch patterns can be represented using a smaller training database. However, using a fixed number K of neighbors for reconstruction often results in blurring effects, due to over- or under-fitting.
In this paper, we focus on the problem of recovering the super-resolution version of a given low-resolution image. Although our method can be readily extended to handle multiple input images, we mostly deal with a single input image. Like the aforementioned learning-based methods, we will rely on patches from example images. Our method does not require any learning on the high-resolution patches, instead working directly with the low-resolution training patches or their features. Our approach is motivated by recent results in sparse signal representation, which ensure that linear relationships among high-resolution signals can be precisely recovered from their low-dimensional projections [3, 9].

Figure 1. Reconstruction of a raccoon face with magnification factor 2. Left: result by our method. Right: the original image. There is little noticeable difference.
To be more precise, let $D \in \mathbb{R}^{n \times K}$ be an overcomplete dictionary of $K$ prototype signal-atoms, and suppose a signal $x \in \mathbb{R}^n$ can be represented as a sparse linear combination of these atoms. That is, the signal vector $x$ can be written as $x = D\alpha_0$, where $\alpha_0 \in \mathbb{R}^K$ is a vector with very few ($\ll K$) nonzero entries. In practice, we might observe only a small set of measurements $y$ of $x$:

$$y \doteq Lx = LD\alpha_0, \qquad (1)$$

where $L \in \mathbb{R}^{k \times n}$ with $k < n$. In the super-resolution context, $x$ is a high-resolution image (patch), while $y$ is its low-resolution version (or features extracted from it). If the dictionary $D$ is overcomplete, the equation $x = D\alpha$ is underdetermined for the unknown coefficients $\alpha$. The equation $y = LD\alpha$ is even more dramatically underdetermined. Nevertheless, under mild conditions, the sparsest solution $\alpha_0$ to this equation is unique. Furthermore, if $D$ satisfies an appropriate near-isometry condition, then for a wide variety of matrices $L$, any sufficiently sparse linear representation of a high-resolution image $x$ in terms of $D$ can be recovered (almost) perfectly from the low-resolution image [9, 21]. Figure 1 shows an example that demonstrates the capabilities of our method derived from this principle. Even for this complicated texture, sparse representation recovers a visually appealing reconstruction of the original signal.
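As a concrete numerical illustration of this recovery principle (our toy sketch, not from the paper: the Gaussian measurement matrix and all dimensions are assumptions chosen for demonstration), the following Python snippet recovers a sparse coefficient vector from compressed measurements of the form (1) by $\ell^1$ minimization, posed as a linear program:

```python
# Toy demonstration: a signal sparse in an overcomplete dictionary D can be
# recovered from few random measurements y = L x by l1 minimization.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, K, k, s = 64, 128, 32, 5          # signal dim, atoms, measurements, sparsity

D = rng.standard_normal((n, K))      # overcomplete dictionary (K > n)
D /= np.linalg.norm(D, axis=0)       # unit-norm atoms
alpha0 = np.zeros(K)
alpha0[rng.choice(K, s, replace=False)] = rng.standard_normal(s)
x = D @ alpha0                       # high-resolution signal
L = rng.standard_normal((k, n)) / np.sqrt(k)
y = L @ x                            # low-dimensional measurements, Eq. (1)

# min ||alpha||_1  s.t.  L D alpha = y, as an LP with alpha = u - v, u, v >= 0
A = L @ D
res = linprog(c=np.ones(2 * K), A_eq=np.hstack([A, -A]), b_eq=y,
              bounds=[(0, None)] * (2 * K), method="highs")
alpha_hat = res.x[:K] - res.x[K:]
print("recovery error:", np.linalg.norm(alpha_hat - alpha0))  # small on success
```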
Recently sparse representation has been applied to many other related inverse problems in image processing, such as compression, denoising [10], and restoration [17], often improving on the state-of-the-art. For example, in [10] the authors use the K-SVD algorithm [1] to learn an overcomplete dictionary from natural image patches and successfully apply it to the image denoising problem. In our setting, we do not directly compute the sparse representation of the high-resolution patch. Instead, we will work with two coupled dictionaries: $D_h$ for high-resolution patches, and $D_\ell = L D_h$ for low-resolution patches. The sparse representation of a low-resolution patch in terms of $D_\ell$ will be directly used to recover the corresponding high-resolution patch from $D_h$. We obtain a locally consistent solution by allowing patches to overlap and demanding that the reconstructed high-resolution patches agree on the overlapped areas. Finally, we apply global optimization to eliminate the reconstruction errors in the recovered high-resolution image from local sparse representation, suppressing noise and ensuring consistency with the low-resolution input.

Compared to the aforementioned learning-based methods, our algorithm requires a much smaller database. The online recovery of the sparse representation uses the low-resolution dictionary only; the high-resolution dictionary is used only to calculate the final high-resolution image. The computation, mainly based on linear programming, is reasonably efficient and scalable. In addition, the computed sparse representation adaptively selects the most relevant patches in the dictionary to best represent each patch of the given low-resolution image. This leads to superior performance, both qualitatively and quantitatively, compared to methods [5] that use a fixed number of nearest neighbors, generating sharper edges and clearer textures.
The remainder of this paper is organized as follows. Section 2 details our formulation and solution to the image super-resolution problem based on sparse representation. In Section 3, we discuss how to prepare a dictionary from sample images and what features to use. Various experimental results in Section 4 demonstrate the efficacy of sparsity as a prior for image super-resolution.
2. Super-resolution from Sparsity
The single-image super-resolution problem asks: given a low-resolution image $Y$, recover a higher-resolution image $X$ of the same scene. The fundamental constraint is that the recovered $X$ should be consistent with the input $Y$:

Reconstruction constraint. The observed low-resolution image $Y$ is a blurred and downsampled version of the solution $X$:

$$Y = DHX \qquad (2)$$

Here, $H$ represents a blurring filter, and $D$ the downsampling operator.
Super-resolution remains extremely ill-posed, since for a given low-resolution input $Y$, infinitely many high-resolution images $X$ satisfy the above reconstruction constraint. We regularize the problem via the following prior on small patches $x$ of $X$:

Sparse representation prior. The patches $x$ of the high-resolution image $X$ can be represented as a sparse linear combination in a dictionary $D_h$ of high-resolution patches sampled from training images:¹

$$x \approx D_h \alpha \quad \text{for some } \alpha \in \mathbb{R}^K \text{ with } \|\alpha\|_0 \ll K. \qquad (3)$$

¹ Similar mechanisms (sparse coding with an overcomplete dictionary) are also believed to be employed by the human visual system [19].

To address the super-resolution problem using the sparse representation prior, we divide the problem into two steps. First, using the sparse prior (3), we find the sparse representation for each local patch, respecting spatial compatibility between neighbors. Next, using the result from this local sparse representation, we further regularize and refine the entire image using the reconstruction constraint (2). In this strategy, a local model from the sparse prior is used to recover lost high-frequency content for local details. The global model from the reconstruction constraint is then applied to remove possible artifacts from the first step and make the image more consistent and natural.
2.1. Local Model from Sparse Representation
As in the patch-based methods mentioned previously, we try to infer the high-resolution patch for each low-resolution patch from the input. For this local model, we have two dictionaries $D_\ell$ and $D_h$: $D_h$ is composed of high-resolution patches and $D_\ell$ is composed of corresponding low-resolution patches. We subtract the mean pixel value for each patch, so that the dictionary represents image textures rather than absolute intensities.
For each input low-resolution patch $y$, we find a sparse representation with respect to $D_\ell$. The corresponding high-resolution patches from $D_h$ will be combined according to these coefficients to generate the output high-resolution patch $x$. The problem of finding the sparsest representation of $y$ can be formulated as:

$$\min \|\alpha\|_0 \quad \text{s.t.} \quad \|F D_\ell \alpha - F y\|_2^2 \le \epsilon, \qquad (4)$$

where $F$ is a (linear) feature extraction operator. The main role of $F$ in (4) is to provide a perceptually meaningful constraint² on how closely the coefficients $\alpha$ must approximate $y$. We will discuss the choice of $F$ in Section 3.
Although the optimization problem (4) is NP-hard in general, recent results [7, 8] indicate that as long as the desired coefficients $\alpha$ are sufficiently sparse, they can be efficiently recovered by instead minimizing the $\ell^1$-norm, as follows:

$$\min \|\alpha\|_1 \quad \text{s.t.} \quad \|F D_\ell \alpha - F y\|_2^2 \le \epsilon. \qquad (5)$$
Lagrange multipliers offer an equivalent formulation

$$\min \lambda \|\alpha\|_1 + \frac{1}{2} \|F D_\ell \alpha - F y\|_2^2, \qquad (6)$$

where the parameter $\lambda$ balances sparsity of the solution and fidelity of the approximation to $y$. Notice that this is essentially a linear regression regularized with the $\ell^1$-norm on the coefficients, known in the statistical literature as the Lasso [24].
² Traditionally, one would seek the sparsest $\alpha$ s.t. $\|D_\ell \alpha - y\|_2 \le \epsilon$. For super-resolution, it is more appropriate to replace this 2-norm with a quadratic norm $\|\cdot\|_{F^T F}$ that penalizes visually salient high-frequency errors.
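Problem (6) can be solved with any standard Lasso solver (the paper itself relies on linear programming). As one possibility, here is a minimal iterative soft-thresholding (ISTA) sketch, where `A` stands for $F D_\ell$ and `b` for $F y$; this is our illustration, not the authors' implementation:

```python
# Minimal ISTA sketch for the Lasso problem (6).
import numpy as np

def soft_threshold(z, t):
    """Elementwise soft-thresholding: the proximal operator of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(A, b, lam, n_iter=500):
    """Minimize lam * ||alpha||_1 + 0.5 * ||A alpha - b||_2^2 by ISTA."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1/L, L = Lipschitz const of gradient
    alpha = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ alpha - b)         # gradient of the quadratic term
        alpha = soft_threshold(alpha - step * grad, step * lam)
    return alpha
```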
Solving (6) individually for each patch does not guarantee compatibility between adjacent patches. We enforce compatibility between adjacent patches using a one-pass algorithm similar to that of [13].³ The patches are processed in raster-scan order in the image, from left to right and top to bottom. We modify (5) so that the super-resolution reconstruction $D_h \alpha$ of patch $y$ is constrained to closely agree with the previously computed adjacent high-resolution patches. The resulting optimization problem is

$$\min \|\alpha\|_1 \quad \text{s.t.} \quad \|F D_\ell \alpha - F y\|_2^2 \le \epsilon_1, \quad \|P D_h \alpha - w\|_2^2 \le \epsilon_2, \qquad (7)$$

where the matrix $P$ extracts the region of overlap between the current target patch and the previously reconstructed high-resolution image, and $w$ contains the values of the previously reconstructed high-resolution image on the overlap.
The constrained optimization (7) can be similarly reformulated as:

$$\min \lambda \|\alpha\|_1 + \frac{1}{2} \|\tilde{D} \alpha - \tilde{y}\|_2^2, \qquad (8)$$

where $\tilde{D} = \begin{bmatrix} F D_\ell \\ \beta P D_h \end{bmatrix}$ and $\tilde{y} = \begin{bmatrix} F y \\ \beta w \end{bmatrix}$. The parameter $\beta$ controls the tradeoff between matching the low-resolution input and finding a high-resolution patch that is compatible with its neighbors. In all our experiments, we simply set $\beta = 1$. Given the optimal solution $\alpha^*$ to (8), the high-resolution patch can be reconstructed as $x = D_h \alpha^*$.
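A hypothetical sketch of this stacked reformulation, reusing the `lasso_ista` solver above; the matrices `D_l`, `D_h`, `F`, `P` and the vectors `w`, `y` are assumed to be set up as in the text:

```python
# Sketch of the per-patch solve for Eq. (8): stack the feature-domain fit
# against the low-resolution patch with the overlap constraint from
# previously reconstructed neighbors, then solve one Lasso problem.
import numpy as np

def solve_patch(D_l, D_h, F, P, w, y, lam, beta=1.0):
    """Sparse code for one patch under Eq. (8); returns the HR patch D_h @ alpha."""
    D_tilde = np.vstack([F @ D_l, beta * (P @ D_h)])
    y_tilde = np.concatenate([F @ y, beta * w])
    alpha = lasso_ista(D_tilde, y_tilde, lam)   # solver from the earlier sketch
    return D_h @ alpha
```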
2.2. Enforcing Global Reconstruction Constraint
Notice that (5) and (7) do not demand exact equality between the low-resolution patch $y$ and its reconstruction $D_\ell \alpha$. Because of this, and also because of noise, the high-resolution image $X_0$ produced by the sparse representation approach of the previous section may not satisfy the reconstruction constraint (2) exactly. We eliminate this discrepancy by projecting $X_0$ onto the solution space of $DHX = Y$, computing

$$X^* = \arg\min_X \|X - X_0\| \quad \text{s.t.} \quad DHX = Y. \qquad (9)$$
The solution to this optimization problem can be efficiently computed using the back-projection method, originally developed in computed tomography and applied to super-resolution in [15, 4]. The update equation for this iterative method is

$$X_{t+1} = X_t + ((Y - DHX_t)\uparrow s) * p, \qquad (10)$$

where $X_t$ is the estimate of the high-resolution image after the $t$-th iteration, $p$ is a "backprojection" filter, and $\uparrow s$ denotes upsampling by a factor of $s$.
³ There are different ways to enforce compatibility. In [5], the values in the overlapped regions are simply averaged, which will result in blurring effects. The one-pass algorithm [13] is shown to work almost as well as the use of a full MRF model [12].

Algorithm 1 (Super-Resolution via Sparse Representation).
1: Input: training dictionaries $D_h$ and $D_\ell$, a low-resolution image $Y$.
2: For each $3 \times 3$ patch $y$ of $Y$, taken starting from the upper-left corner with 1 pixel overlap in each direction:
   - Solve the optimization problem with $\tilde{D}$ and $\tilde{y}$ defined in (8): $\min \lambda \|\alpha\|_1 + \frac{1}{2} \|\tilde{D} \alpha - \tilde{y}\|_2^2$.
   - Generate the high-resolution patch $x = D_h \alpha^*$.
   - Put the patch $x$ into a high-resolution image $X_0$.
3: end
4: Using back-projection, find the closest image to $X_0$ which satisfies the reconstruction constraint: $X^* = \arg\min_X \|X - X_0\|$ s.t. $DHX = Y$.
5: Output: super-resolution image $X^*$.
We take the result $X^*$ from back-projection as our final estimate of the high-resolution image. This image is as close as possible to the initial super-resolution image $X_0$ given by sparsity, while satisfying the reconstruction constraint. The entire super-resolution process is summarized as Algorithm 1.
2.3. Global Optimization Interpretation
The simple SR algorithm outlined above can be viewed as a special case of a general sparse representation framework for inverse problems in image processing. Related ideas have been profitably applied in image compression, denoising [10], and restoration [17]. These connections provide context for understanding our work, and also suggest means of further improving the performance, at the cost of increased computational complexity.

Given sufficient computational resources, one could in principle solve for the coefficients associated with all patches simultaneously. Moreover, the entire high-resolution image $X$ itself can be treated as a variable. Rather than demanding that $X$ be perfectly reproduced by the sparse coefficients $\alpha$, we can penalize the difference between $X$ and the high-resolution image given by these coefficients, allowing solutions that are not perfectly sparse, but better satisfy the reconstruction constraints. This leads to a large optimization problem:
to a large optimization problem:
X
= arg min
X,{α
ij
}
kDHX Y k
2
2
+ η
X
i,j
kα
ij
k
0
+ γ
X
i,j
kD
~
α
ij
P
ij
Xk
2
2
+ τ ρ(X)
.
(11)
Here, $\alpha_{ij}$ denotes the representation coefficients for the $(i,j)$-th patch of $X$, and $P_{ij}$ is a projection matrix that selects the $(i,j)$-th patch from $X$. $\rho(X)$ is a penalty function that encodes prior knowledge about the high-resolution image. This function may depend on the image category, or may take the form of a generic regularization term (e.g., Huber MRF, Total Variation, Bilateral Total Variation).
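To make the roles of the four terms concrete, here is a sketch that merely evaluates the objective (11) for a candidate $X$ and a set of coefficients, using total variation as an example $\rho$. The function `blur_down` (playing the role of $DH$), the patch slices (playing the role of $P_{ij}$), and all names are illustrative assumptions:

```python
# Evaluate the joint objective (11): data fit + sparsity + HR fidelity + prior.
import numpy as np

def objective(X, Y, alphas, D_h, patches, blur_down, eta, gamma, tau):
    """alphas: dict (i, j) -> coefficient vector; patches: (i, j) -> 2-D slices."""
    data = np.sum((blur_down(X) - Y) ** 2)                      # ||DHX - Y||^2
    sparsity = eta * sum(np.count_nonzero(a) for a in alphas.values())
    fidelity = gamma * sum(
        np.sum((D_h @ alphas[ij] - X[patches[ij]].ravel()) ** 2)
        for ij in alphas)                                       # ||D_h a - P X||^2
    tv = tau * (np.abs(np.diff(X, axis=0)).sum()
                + np.abs(np.diff(X, axis=1)).sum())             # example rho(X)
    return data + sparsity + fidelity + tv
```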
Algorithm 1 can be interpreted as a computationally efficient approximation to (11). The sparse representation step recovers the coefficients $\alpha$ by approximately minimizing the sum of the second and third terms of (11). The sparsity term $\|\alpha_{ij}\|_0$ is relaxed to $\|\alpha_{ij}\|_1$, while the high-resolution fidelity term $\|D_h \alpha_{ij} - P_{ij} X\|^2$ is approximated by its low-resolution version $\|F D_\ell \alpha_{ij} - F y_{ij}\|^2$.
Notice that if the sparse coefficients $\alpha$ are fixed, the third term of (11) essentially penalizes the difference between the super-resolution image $X$ and the reconstruction given by the coefficients: $\sum_{i,j} \|D_h \alpha_{ij} - P_{ij} X\|_2^2 \approx \|X_0 - X\|_2^2$. Hence, for small $\gamma$, the back-projection step of Algorithm 1 approximately minimizes the sum of the first and third terms of (11).
Algorithm 1 does not, however, incorporate any prior besides sparsity of the representation coefficients: the term $\rho(X)$ is absent in our approximation. In Section 4 we will see that sparsity in a relevant dictionary is a strong enough prior that we can already achieve good super-resolution performance. Nevertheless, in settings where further assumptions on the high-resolution signal are available, these priors can be incorporated into the global reconstruction step of our algorithm.
3. Dictionary Preparation
3.1. Random Raw Patches from Training Images
Learning an over-complete dictionary capable of optimally representing broad classes of image patches is a difficult problem. Rather than trying to learn such a dictionary [19, 1] or using a generic set of basis vectors [21] (e.g., Fourier, Haar, curvelets, etc.), we generate dictionaries by simply sampling raw patches at random from training images of similar statistical nature. We will demonstrate that such simply prepared dictionaries are already capable of generating high-quality reconstructions,⁴ when used together with the sparse representation prior.
Figure 2 shows several training images and the patches sampled from them. For our experiments, we prepared two dictionaries: one sampled from flowers (Figure 2, top), which will be applied to generic images with relatively simple textures, and one sampled from animal images (Figure 2, bottom), with fine furry or fractal textures. For each high-resolution training image $X$, we generate the corresponding low-resolution image $Y$ by blurring and downsampling. For each category of images, we sample only about 100,000 patches from about 30 training images to form each dictionary, which is considerably smaller than that needed by other learning-based methods [12, 23]. Empirically, we find such a small dictionary is more than sufficient.

⁴ The competitiveness of such random patches has also been noticed empirically in the context of content-based image classification [18].

Figure 2. Left: three out of the 30 training images we use in our experiments. Right: the training patches extracted from them.
3.2. Derivative Features
In (4), we use a feature transformation $F$ to ensure that the computed coefficients fit the most relevant part of the low-resolution signal. Typically, $F$ is chosen as some kind of high-pass filter. This is reasonable from a perceptual viewpoint, since people are more sensitive to the high-frequency content of the image. The high-frequency components of the low-resolution image are also arguably the most important for predicting the lost high-frequency content in the target high-resolution image.

Freeman et al. [12] use a high-pass filter to extract the edge information from the low-resolution input patches as the feature. Sun et al. [23] use a set of Gaussian derivative filters to extract the contours in the low-resolution patches. Chang et al. [5] use the first-order and second-order gradients of the patches as the representation. For our algorithm, we also use the first-order and second-order derivatives as the feature for the low-resolution patch. While simple, these features turn out to work very well. To be precise, the four 1-D filters used to extract the derivatives are:

$$f_1 = [-1, 0, 1], \quad f_2 = f_1^T, \quad f_3 = [1, 0, -2, 0, 1], \quad f_4 = f_3^T, \qquad (12)$$

where the superscript $T$ means transpose. Applying these four filters, we get four description feature vectors for each patch, which are concatenated as one vector as the final representation of the low-resolution patch.
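A short sketch of this feature extraction (our illustration; per Section 4, the paper applies the filters to a bicubic-upsampled version of the low-resolution patch):

```python
# Apply the four derivative filters of Eq. (12) to a 2-D patch and
# concatenate the responses into one feature vector.
import numpy as np
from scipy.ndimage import correlate

def patch_features(patch):
    """First- and second-order derivatives of a 2-D patch, concatenated."""
    f1 = np.array([[-1, 0, 1]])           # horizontal first derivative
    f2 = f1.T                             # vertical first derivative
    f3 = np.array([[1, 0, -2, 0, 1]])     # horizontal second derivative
    f4 = f3.T                             # vertical second derivative
    return np.concatenate(
        [correlate(patch.astype(float), f).ravel() for f in (f1, f2, f3, f4)])
```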
4. Experiments
Experimental settings: In our experiments, we will mostly magnify the input image by a factor of 3. In the low-resolution images, we always use $3 \times 3$ low-resolution patches, with overlap of 1 pixel between adjacent patches, corresponding to $9 \times 9$ patches with overlap of 3 pixels for the high-resolution patches. The features are not extracted directly from the $3 \times 3$ low-resolution patch, but rather from an upsampled version produced by bicubic interpolation. For color images, we apply our algorithm to the illuminance component only, since humans are more sensitive to illuminance changes. Our algorithm has only one free parameter, $\lambda$, which balances sparsity of the solution with fidelity to the reconstruction constraint. In our experience, the reconstruction quality is stable over a large range of $\lambda$. The rule of thumb, $\lambda = 50 \times \dim(\text{patch feature})$, gives good results for all the test cases in this paper.

Figure 3. Number of nonzero coefficients in the sparse representation computed for 300 typical patches in a test image (patch index vs. number of supports).
One advantage of our approach over methods such as neighbor embedding [5] is that it selects the number of relevant dictionary elements adaptively for each patch. Figure 3 demonstrates this for 300 typical patches in one test image. Notice that the recovered coefficients are always sparse (fewer than 35 nonzero entries), but the level of sparsity varies depending on the complexity of each test patch. Empirically, we find that the support of the recovered coefficients is typically neither a superset nor a subset of the $K$ nearest neighbors [5]. The chosen patches are more informative for recovering the high-resolution patch, leading to more faithful texture reconstruction in the experiments below.
Experimental results: We first apply our algorithm to generic images including flowers, human faces, and architecture, all using the same dictionary sampled from training images of flowers (first row of Figure 2). We will further demonstrate our algorithm's ability to handle complicated textures in animal images, with the second dictionary sampled from training animal images (second row of Figure 2).

Figure 4 compares our results with neighbor embedding [5]⁵ on two test images of a flower and a girl. In both cases, our method gives sharper edges and reconstructs more clearly the details of the scene. There are noticeable differences in the texture of the leaves, the fuzz on the leafstalk, and also the freckles on the face of the girl.
In Figure 5, we compare our method with several other methods on an image of the Parthenon used in [6], including back projection, neighbor embedding [5], and the recently proposed soft edge prior method [6].
⁵ Our implementation of the neighbor embedding method [5] differs slightly from the original: the feature for the low-resolution patch is extracted not from the original $3 \times 3$ patch (which would give smoother results) but from the upsampled low-resolution patch. We find that setting $K = 15$ gives the best performance. This is approximately the average number of coefficients recovered by sparse representation (see Figure 3).

References
[1] M. Aharon, M. Elad, and A. Bruckstein. K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Transactions on Signal Processing, 54, 2006.
[9] D. Donoho. Compressed sensing. IEEE Transactions on Information Theory, 52, 2006.
[10] M. Elad and M. Aharon. Image denoising via sparse and redundant representations over learned dictionaries. IEEE Transactions on Image Processing, 15, 2006.
[22] S. Roweis and L. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290, 2000.
[24] R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58, 1996.