Fast Direct Super-Resolution by Simple Functions
Chih-Yuan Yang and Ming-Hsuan Yang
Electrical Engineering and Computer Science, University of California at Merced
{cyang35,mhyang}@ucmerced.edu
Abstract
The goal of single-image super-resolution is to generate a high-quality high-resolution image based on a given low-resolution input. It is an ill-posed problem which requires exemplars or priors to better reconstruct the missing high-resolution image details. In this paper, we propose to split the feature space into numerous subspaces and collect exemplars to learn priors for each subspace, thereby creating effective mapping functions. The use of split input space facilitates both feasibility of using simple functions for super-resolution, and efficiency of generating high-resolution results. High-quality high-resolution images are reconstructed based on the effective learned priors. Experimental results demonstrate that the proposed algorithm performs efficiently and effectively over state-of-the-art methods.
1. Introduction
Single-image super-resolution (SISR) aims to generate a visually pleasing high-resolution (HR) image from a given low-resolution (LR) input. It is a challenging and ill-posed problem because numerous pixel intensities need to be predicted from limited input data. To alleviate this ill-posed problem, most SISR algorithms must exploit additional information such as exemplar images or statistical priors. Exemplar images contain abundant visual information which can be exploited to enrich the super-resolution (SR) image details [4, 1, 5, 19, 6, 17, 3, 18]. However, numerous challenging factors make it difficult to generate SR images efficiently and robustly. First, there exist fundamental ambiguities between the LR and HR data, as significantly different HR image patches may generate very similar LR patches as a result of the downsampling process. That is, the mapping from HR to LR data is many-to-one, and the reverse process from a single LR image patch alone is inherently ambiguous. Second, the success of this approach hinges on the assumption that a high-fidelity HR patch can be found from the LR one (aside from ambiguity, which can be alleviated with statistical priors), thereby requiring a large and adequate dataset at our disposal. Third, the ensuing problem with a large dataset is how to determine similar patches efficiently.
In contrast, statistical SISR approaches [2, 16, 15, 9, 24, 20] have the marked advantage of performance stability and low computational cost. Since the priors are learned from numerous examples, they are statistically effective at representing the majority of the training data. The computational load of these algorithms is relatively low, as it is not necessary to search for exemplars. Although the process of learning statistical priors is time consuming, it can be carried out offline and only once for SR applications. However, statistical SISR algorithms are limited to the specific image structures modeled by their priors (e.g., edges) and are ineffective at reconstructing other details (e.g., textures). In addition, it is not clear what statistical models or features best suit this learning task from a large number of training examples.
In this paper, we propose a divide-and-conquer approach [25, 23] to learn statistical priors directly from exemplar patches using a large number of simple functions. We show that when a sufficient amount of data is collected, the ambiguity problem of the source HR patches is alleviated. When the LR feature space is properly divided, simple linear functions are sufficient to map LR patches to HR effectively. The use of simple functions also makes it possible to generate high-quality HR images efficiently.
The contributions of this work are summarized as follows. First, we demonstrate that a direct single-image super-resolution algorithm can be simple and fast when effective exemplars are available in the training phase. Second, we effectively split the input domain of low-resolution patches based on exemplar images, thereby facilitating the learning of simple functions for effective mapping. Third, the proposed algorithm generates favorable results with low computational load compared with existing methods. We demonstrate the merits of the proposed algorithm in terms of image quality and computational load by numerous qualitative and quantitative comparisons with the state-of-the-art methods.
2. Related Work and Problem Context
The SISR problem has been intensively studied in computer vision, image processing, and computer graphics. Classic methods render HR images from LR ones through certain mathematical formulations [13, 11] such as bicubic interpolation and back-projection [8]. While these algorithms can be executed efficiently, they are less effective at reconstructing high-frequency details, which are not modeled in the mathematical formulations.
2013 IEEE International Conference on Computer Vision
1550-5499/13 $31.00 © 2013 IEEE
DOI 10.1109/ICCV.2013.75
Recent methods exploit rich visual information contained in a set of exemplar images. However, there are many challenges in exploiting exemplar images properly, and many methods have been proposed to address them. To reduce the ambiguity between LR and HR patches, spatial correlation is exploited to minimize the difference of overlapping HR patches [4, 1, 19]. To improve the effectiveness of exemplar images, user guidance is required to prepare precise ones [19, 6]. To increase the efficiency of reconstructing HR edges, small scaling factors and a compact exemplar patch set generated from the input frame are proposed [5, 3]. To increase the chance of retrieving effective patches, segments are introduced for multiple-level patch searching [17, 6].
Statistical SISR algorithms learn priors from numerous feature vectors to generate a function mapping features from LR to HR. A significant advantage of this approach is the low computational complexity as the load of searching exemplars is alleviated. Global distributions of gradients are used [15] to regularize a deconvolution process for generating HR images. Edge-specific priors focus on reconstructing sharp edges because they are important visual cues for image quality [2, 16]. In addition, priors of patch mapping from LR to HR are developed based on dictionaries via sparse representation [24, 21], support vector regression [12], or kernel ridge regression [9].
Notwithstanding the demonstrated success of the algorithms in the literature, existing methods require computationally expensive processes in either searching exemplars [4, 1, 5] or extracting complex features [12, 16, 17, 6]. In contrast, we present a fast algorithm based on simple features. Instead of using one or a few mapping functions, we learn a large number of them. We show that this divide-and-conquer algorithm is effective and efficient for SISR when the right components are properly integrated.
3. Proposed Algorithm
One motivation of this work is to generate SR images efficiently while also handling the ambiguity problem. To achieve efficiency, we adopt the approach of statistical SISR methods, i.e., we do not search for a large exemplar set at the test phase. Compared with existing statistical methods using complicated features [16, 24, 21], we propose simple features to reduce computational load. To handle the ambiguity problem using simple features, we spend intensive computational load during the training phase. We collect a large set of LR patches and their corresponding HR source patches. We divide the input space into a large set of subspaces from which simple functions are capable of mapping LR features to HR effectively. Although the proposed algorithm entails processing a large set of training images, it is only carried out offline in batch mode.
Figure 1. Training LR and HR pairs (four corner pixels are discarded). A set of functions is learned to map a LR patch to a set of pixels at the central (shaded) region of the corresponding HR patch (instead of the entire HR patch).
We generate a LR image $I_l$ from a HR one $I_h$ by
$$I_l = (I_h \otimes G) \downarrow_s, \qquad (1)$$
where $\otimes$ is a convolution operator, $G$ is a Gaussian kernel, $\downarrow$ is a downsampling operator and $s$ is the scaling factor.
From each $I_h$ and the corresponding $I_l$ image, a large set of corresponding HR and LR patch pairs can be cropped. Let $P_h$ and $P_l$ be two paired patches. We compute the patch mean of $P_l$ as $\mu$, and extract the features of $P_h$ and $P_l$ as the intensities minus $\mu$ to represent the high-frequency signals. For HR patch $P_h$, we only extract features for pixels at the central region (e.g., the shaded region in Figure 1) and discard boundary pixels. We do not learn mapping functions to predict the HR boundary pixels as the LR patch $P_l$ does not carry sufficient information to predict those pixels.
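The degradation model in Eq. 1 and the mean-subtracted features can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function names are hypothetical, the Gaussian width is interpreted as a standard deviation, the kernel is truncated at three standard deviations, image borders are replicated, and downsampling keeps every s-th pixel starting at pixel 0. The paper does not specify these details.

```python
import numpy as np

def gaussian_kernel(sigma, radius=None):
    """Normalized 2-D Gaussian kernel G (truncation radius is an assumption)."""
    if radius is None:
        radius = int(np.ceil(3 * sigma))
    ax = np.arange(-radius, radius + 1)
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()

def degrade(hr, sigma, s):
    """Eq. 1 sketch: blur the HR image with G, then keep every s-th pixel."""
    k = gaussian_kernel(sigma)
    r = k.shape[0] // 2
    # Border replication is an assumption; the kernel is symmetric, so this
    # sliding-window sum equals a convolution.
    padded = np.pad(hr.astype(float), r, mode="edge")
    blurred = np.zeros(hr.shape, dtype=float)
    for dy in range(k.shape[0]):
        for dx in range(k.shape[1]):
            blurred += k[dy, dx] * padded[dy:dy + hr.shape[0], dx:dx + hr.shape[1]]
    return blurred[::s, ::s]

def patch_features(lr_patch, hr_patch):
    """Subtract the LR patch mean mu from both patches of a training pair."""
    mu = lr_patch.mean()
    return lr_patch - mu, hr_patch - mu, mu
```

Note that the same LR mean is subtracted from both patches, so adding it back at test time requires only the LR input.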
We collect a large set of LR patches from natural images to learn K cluster centers of their extracted features. Figure 2 shows 4096 cluster centers learned from 2.2 million natural patches. Similar to the heavy-tailed gradient distribution in natural images [7], more populous cluster centers correspond to smoother patches as shown in Figure 3. These K cluster centers can be viewed as anchor points to represent the feature space of natural image patches.
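As a rough illustration of learning cluster centers and ranking them by population, the sketch below uses plain Lloyd-style k-means in NumPy. The choice of k-means, the iteration count, the random initialization, and the function name are all assumptions made for illustration; the text above only states that K cluster centers are learned. The all-pairs distance matrix is also only practical for small inputs, not for millions of patches.

```python
import numpy as np

def kmeans(features, k, iters=20, seed=0):
    """Cluster row-vector features; return centers and counts, most populous first."""
    rng = np.random.default_rng(seed)
    # Initialize centers from k distinct training features (an assumption).
    centers = features[rng.choice(len(features), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared distance between every feature and every center.
        d2 = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    d2 = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    counts = np.bincount(labels, minlength=k)
    order = np.argsort(-counts)  # sort by population, as in Figures 2 and 3
    return centers[order], counts[order]
```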
For some regions in the feature space where natural patches appear fairly rarely, it is unnecessary to learn mapping functions to predict patches of HR from LR. Since each cluster represents a subspace, we collect a certain number of exemplar patches in the segmented space to train a mapping function. Since natural images are abundant and easily acquired, we can assume that there are sufficient exemplar patches available for each cluster center.
Figure 2. A set of 4096 cluster centers learned from 2.2 million natural patches. As the features for clustering are the intensities subtracting patch means, we show the intensities by adding their mean values for visualization purpose. The order of cluster centers is sorted by the amounts of clustered patches, as shown in Figure 3. Patches with more high-frequency details appear less frequently in natural images.
Figure 3. Histogram of clustered patches from a set of 2.2 million natural patches with cluster centers shown in Figure 2. While the most populous cluster consists of 18489 patches, the 40 least populous clusters only have one patch. A cluster has 537 patches on average.
Suppose there are $l$ LR exemplar patches belonging to the same cluster. Let $v_i$ and $w_i$ ($i = 1, \ldots, l$) be vectorized features of the LR and HR patches respectively, in dimensions $m$ and $n$. We propose to learn a set of $n$ linear regression functions to individually predict the $n$ feature values in HR. Let $V \in \mathbb{R}^{m \times l}$ and $W \in \mathbb{R}^{n \times l}$ be the matrices of $v_i$ and $w_i$. We compute the regression coefficients $C^* \in \mathbb{R}^{n \times (m+1)}$ by
$$C^* = \arg\min_{C} \left\| W - C \begin{pmatrix} V \\ \mathbf{1} \end{pmatrix} \right\|^2, \qquad (2)$$
where $\mathbf{1}$ is a $1 \times l$ vector with all values as 1. This linear least-squares problem is easily solved.
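A minimal sketch of solving Eq. 2 for one cluster is given below, assuming NumPy's general least-squares solver. The function name is hypothetical, and the paper does not say which solver is used; any standard least-squares routine would serve.

```python
import numpy as np

def fit_cluster_regressor(V, W):
    """Solve Eq. 2 for one cluster.

    V: (m, l) LR features, W: (n, l) HR features.
    Returns C of shape (n, m + 1); the last column is the bias term.
    """
    l = V.shape[1]
    A = np.vstack([V, np.ones((1, l))])  # stack the row of ones under V
    # min_C ||W - C A||^2 is equivalent to the transposed system A^T C^T = W^T.
    Ct, *_ = np.linalg.lstsq(A.T, W.T, rcond=None)
    return Ct.T
```

Each row of the returned matrix is an independent linear regressor for one HR feature value, which matches the description of n separate regression functions.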
Given a LR test image, we crop each LR patch to compute the LR features and search for the closest cluster center. According to the cluster center, we apply the learned coefficients to compute the HR features by
$$w = C^* \begin{pmatrix} v \\ 1 \end{pmatrix}. \qquad (3)$$
The predicted HR patch intensity is then reconstructed by adding the LR patch mean to the HR features.
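The test-time step for a single patch (nearest cluster center, Eq. 3, then adding the mean back) could look like the following sketch. The function name, the array layouts, and the use of squared Euclidean distance with an exhaustive search over centers are assumptions for illustration.

```python
import numpy as np

def upsample_patch(lr_patch_vec, centers, coeffs):
    """Predict HR intensities for one vectorized LR patch.

    centers: (K, m) cluster centers of LR features.
    coeffs:  (K, n, m + 1) regression coefficients, one matrix per cluster.
    """
    mu = lr_patch_vec.mean()
    v = lr_patch_vec - mu                          # LR feature
    k = int(((centers - v) ** 2).sum(axis=1).argmin())  # closest cluster center
    w = coeffs[k] @ np.append(v, 1.0)              # Eq. 3
    return w + mu, k                               # add the LR patch mean back
```

For example, with two hypothetical centers `[-1, 1]` and `[1, -1]`, the patch `[2, 4]` has feature `[-1, 1]` and is routed to the first cluster.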
The proposed method generates effective HR patches because each test LR patch and its exemplar LR patches are highly similar, as they belong to the same compact feature subspace. The computational load for generating a HR image is low, as each HR patch can be generated from a LR patch through a few additions and multiplications. The algorithm can easily be executed in parallel because all LR patches are upsampled individually. In addition, the proposed method is suitable for hardware implementations as only a few lines of code are required.
4. Experimental Results
Implementation: For color images, we apply the proposed algorithm to the brightness channel (Y) and upsample the color channels (UV) by bicubic interpolation, as human vision is more sensitive to brightness change. For a scaling factor of 4, we set the Gaussian kernel width in Eq. 1 to 1.6, as commonly used in the literature [16]. The LR patch size is set as 7 × 7 pixels, and the LR feature dimension is 45 since four corner pixels are discarded. The central region of a HR patch is set as 12 × 12 pixels (as illustrated in Figure 1). Since the central region in LR is 3 × 3 pixels, a pixel in HR is covered by 9 LR patches and the output intensity is generated by averaging 9 predicted values, as commonly used in the literature [5, 24, 3, 21]. We prepare a training set containing 6152 HR natural images collected from the Berkeley segmentation and LabelMe datasets [10, 14] to generate a LR training image set containing 679 million patches.
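The two counts quoted above (45 LR features per patch and 9 overlapping predictions per HR pixel) can be checked with a short sketch. It assumes that LR patches are cropped at every pixel position (stride 1), which is implied by the coverage of 9 but not stated explicitly; the function names are hypothetical.

```python
import numpy as np

def lr_feature_mask(size=7):
    """Boolean mask of a size x size LR patch with the four corners discarded."""
    mask = np.ones((size, size), dtype=bool)
    mask[0, 0] = mask[0, -1] = mask[-1, 0] = mask[-1, -1] = False
    return mask

def coverage(lr_h, lr_w, patch=7, core=3):
    """Count how many patches' central core x core regions cover each LR pixel."""
    off = (patch - core) // 2
    count = np.zeros((lr_h, lr_w), dtype=int)
    for y in range(lr_h - patch + 1):        # stride-1 patch positions (assumption)
        for x in range(lr_w - patch + 1):
            count[y + off:y + off + core, x + off:x + off + core] += 1
    return count

print(lr_feature_mask().sum())       # number of LR features per patch
print(coverage(20, 20)[10, 10])      # predictions averaged at an interior pixel
```

Because the 12 × 12 HR central region corresponds to the 3 × 3 LR central region at a scaling factor of 4, the coverage count of an LR pixel carries over to the HR pixels it maps to.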
Number of clusters: Due to the memory limitation of our machine (24 GB), we randomly select 2.2 million patches to learn a set of 4096 cluster centers, and use the learned cluster centers to label all LR patches in the training image set. As the proposed function regresses features from 45 dimensions to one dimension only (each row of $C^*$ in Eq. 2 is assumed to be independent) and most training features are highly similar, a huge set of training instances is unnecessary. We empirically choose a large value, e.g., 1000, as the number of training instances for each cluster center and collect training instances randomly from the labeled patches. Figure 4 shows the actual numbers of training patches. Since some patches are rarely observed in natural images, there are fewer than 1000 patches in a few clusters. For such cases we still compute the regression coefficients if there is no rank deficiency in Eq. 2, i.e., at least 46 linearly independent training vectors are available. Otherwise, we use bilinear interpolation to map LR patches for such clusters.
Figure 4. Numbers of patches used to train regression coefficients in our experiments. Since some patches are rarely observed in natural images, there are fewer than 1000 patches in some clusters.
Figure 5. Super resolution results using different cluster numbers: (a) 512 clusters, (b) 4096 clusters, (c) difference map. Images best viewed on a high-resolution display where each image is shown with at least 512 × 512 pixels (full resolution).
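The rank condition stated for Eq. 2 (at least 46 linearly independent training vectors for 45-dimensional features plus the bias row) can be tested before fitting a cluster. The sketch below is one possible check using NumPy's numerical rank estimate; the function name is hypothetical and the tolerance is NumPy's default, not something the paper specifies.

```python
import numpy as np

def has_full_rank(V):
    """True if the augmented matrix [V; 1] has full row rank (m + 1).

    V: (m, l) LR features of one cluster. When this returns False, Eq. 2 is
    rank deficient and a fallback such as bilinear interpolation is needed.
    """
    A = np.vstack([V, np.ones((1, V.shape[1]))])
    return bool(np.linalg.matrix_rank(A) == A.shape[0])
```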
The number of clusters is a trade-off between image quality and computational load. Figure 5 shows the results generated by 512 and 4096 clusters with otherwise identical settings. While the low-frequency regions are almost the same, the high-frequency regions of the image generated with more clusters are better, showing fewer jaggy artifacts along the face contours. With more clusters, the input feature space can be divided into more compact subspaces from which the linear mapping functions can be learned more effectively.
In addition to linear regressors, we also evaluate the image quality generated by support vector regressors (SVRs) with a Radial Basis Function kernel or a linear kernel. With the same setup, the images generated by SVRs and linear regressors are visually similar (see the supplementary material for examples). However, the computational load of SVRs is much higher due to the cost of computing the similarity between each support vector and the test vector. While linear regressors take 14 seconds to generate an image, SVRs take 1.5 hours.
Evaluation and analysis: We implement the proposed algorithm in MATLAB, which takes 14 seconds to upsample an image of 128 × 128 pixels with a scaling factor of 4 on a 2.7 GHz Quad Core machine. The execution time can be further reduced by other implementations and GPU. We use the released code from the authors [15, 24, 21] to generate HR images, and implement other state-of-the-art algorithms [8, 16, 5, 3] as the source code is not available. Our code and dataset are available at the project web page https://eng.ucmerced.edu/people/cyang35.
Table 1. Average evaluated values of 200 images from the Berkeley segmentation dataset [10]. While the generated SR images by the proposed method are comparable to those by the self-exemplar SR algorithm [5], the required computational load is much lower (14 seconds vs. 10 minutes).
Algorithm                PSNR     SSIM [22]
Bicubic Interpolation    24.27    0.6555
Back Projection [8]      25.01    0.7036
Sun [16]                 24.54    0.6695
Shan [15]                23.47    0.6367
Yang [24]                24.31    0.6205
Kim [9]                  25.12    0.6970
Wang [21]                24.32    0.6505
Freedman [3]             22.22    0.6173
Glasner [5]              25.20    0.7064
Proposed                 25.18    0.7081
Figures 6-11 show SR results of the proposed algorithm and the state-of-the-art methods. More results are available in the supplementary material. We evaluate the method numerically in terms of PSNR and SSIM index [22] when the ground truth images are available. Table 1 shows averaged results for a set of 200 natural images. The evaluations are presented from four perspectives, with comparisons to SR methods using statistical priors [9, 16], fast SR algorithms [8, 15], self-exemplar SR algorithms [5, 3], and SR approaches with dictionary learning [24, 21].
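For readers unfamiliar with the PSNR metric reported in Table 1, a standard definition can be written as below. This is the textbook formula rather than the authors' evaluation script: the peak value of 255 (8-bit images) and the function name are assumptions, and the paper does not state which channel or image region is evaluated.

```python
import numpy as np

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    diff = reference.astype(float) - estimate.astype(float)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```

Higher values indicate a smaller mean squared error with respect to the ground truth image.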
SR methods based on statistical priors: As shown in Figure 6(b)(c), Figure 8(a), Figure 10(c), and Figure 11(b)(c), the proposed algorithm generates textures with better contrast than existing methods using statistical priors [9, 16]. While a kernel ridge regression function is learned in [9] and a gradient profile prior is trained in [16] to restore the edge sharpness based on an intermediate bicubic interpolated image, the high-frequency texture details are not generated due to the use of the bicubic interpolated intermediate image. Furthermore, a post-processing filter is used in [9] to suppress median gradients in order to reduce noise generated by the regression function along edges. However, mid-frequency details at textures may be wrongly reduced and the filtered textures appear unrealistic. There are several differences between the proposed method and the existing methods based on statistical priors. First, the proposed method upsamples the LR patches directly rather than using an intermediate image generated by bicubic interpolation, and thus there is no loss of texture details. Second, the proposed regressed features can be applied to any type of patches, while existing methods focus only on edges. Third, no post-processing filter is required in the proposed method to refine the generated HR images. Fourth, existing methods learn a single regressor for the whole feature space, but the proposed method learns numerous regressors (one for each subspace), thereby making the prediction more effective.
Figure 6. Child. PSNR / SSIM per panel: (a) Bicubic Interpolation 29.8 / 0.9043; (b) Kim [9] 31.3 / 0.9321; (c) Sun [16] 30.4 / 0.9142; (d) Proposed 31.6 / 0.9422; (e) Back Projection [8] 31.1 / 0.9391; (f) Shan [15] 27.8 / 0.8554; (g) Yang [24] 30.1 / 0.9152; (h) Wang [21] 29.5 / 0.8859. Results best viewed on a high-resolution display with adequate zoom level where each image is shown with at least 512 × 512 pixels (full resolution).
Fast SR methods: Compared with existing fast SR methods [8, 15] and bicubic interpolation, Figure 6(a)(e)(f), Figure 7, and Figure 11(a)(b) show that the proposed method generates better edges and textures. Although bicubic interpolation is the fastest method, the generated edges and textures are always over-smoothed. While back-projection [8] boosts contrast in SR images, displeasing jaggy artifacts are also generated. Those problems are caused by the fixed back-projection kernel, which is assumed to be isotropic. However, the image structures along sharp edges are highly anisotropic, and thus an isotropic kernel wrongly compensates the intensities. A global gradient distribution is exploited as a constraint in [15] to achieve fast SR. However, although the global gradient distribution is reconstructed by [15] in Figure 6(f) and Figure 7(c), the local gradients are not constrained. Thus, over-smoothed textures and jaggy edges are generated by this method. The proposed method generates better edges and textures as each LR patch is upsampled by a specific prior learned from a compact subspace of similar patches. Thus, the contrast and local structures are better preserved with fewer artifacts.
SR methods based on self exemplars: Figure 8(b)(d), Figure 9(a)(d), Figure 10(b)(d), and Figure 11(c)(d) show the results generated by self-exemplar SR methods and the proposed algorithm. Self-exemplar SR algorithms [5, 3] iteratively upsample images with a small scaling factor (e.g., 1.25). Such an approach has an advantage of generating sharp and clear edges because it is easy to find similar edge
References
Image quality assessment: from error visibility to structural similarity (journal article)
A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics (proceedings article)
Image Super-Resolution Via Sparse Representation (journal article)
LabelMe: A Database and Web-Based Tool for Image Annotation (journal article)
Super-resolution image reconstruction: a technical overview (journal article)