Fast Direct Super-Resolution by Simple Functions

doi:10.1109/ICCV.2013.75

Chih-Yuan Yang and Ming-Hsuan Yang

Electrical Engineering and Computer Science, University of California at Merced

{cyang35,mhyang}@ucmerced.edu

Abstract

The goal of single-image super-resolution is to generate a high-quality high-resolution image based on a given low-resolution input. It is an ill-posed problem which requires exemplars or priors to better reconstruct the missing high-resolution image details. In this paper, we propose to split the feature space into numerous subspaces and collect exemplars to learn priors for each subspace, thereby creating effective mapping functions. The use of split input space facilitates both the feasibility of using simple functions for super-resolution and the efficiency of generating high-resolution results. High-quality high-resolution images are reconstructed based on the effective learned priors. Experimental results demonstrate that the proposed algorithm performs efficiently and effectively compared with state-of-the-art methods.

1. Introduction

Single-image super-resolution (SISR) aims to generate a visually pleasing high-resolution (HR) image from a given low-resolution (LR) input. It is a challenging and ill-posed problem because numerous pixel intensities need to be predicted from limited input data. To alleviate this ill-posed problem, it is imperative for most SISR algorithms to exploit additional information such as exemplar images or statistical priors. Exemplar images contain abundant visual information which can be exploited to enrich the super-resolution (SR) image details [4, 1, 5, 19, 6, 17, 3, 18]. However, numerous challenging factors make it difficult to generate SR images efficiently and robustly. First, there exist fundamental ambiguities between the LR and HR data, as significantly different HR image patches may generate very similar LR patches as a result of the downsampling process. That is, the mapping between HR and LR data is many to one, and the reverse process from one single LR image patch alone is inherently ambiguous. Second, the success of this approach hinges on the assumption that a high-fidelity HR patch can be found from the LR one (aside from ambiguity which can be alleviated with statistical priors), thereby requiring a large and adequate dataset at our disposal. Third, the ensuing problem with a large dataset is how to determine similar patches efficiently.

In contrast, statistical SISR approaches [2, 16, 15, 9, 24, 20] have the marked advantage of performance stability and low computational cost. Since the priors are learned from numerous examples, they are statistically effective to represent the majority of the training data. The computational load of these algorithms is relatively low, as it is not necessary to search exemplars. Although the process of learning statistical priors is time consuming, it can be computed offline and only once for SR applications. However, statistical SISR algorithms are limited by the specific image structures modeled by their priors (e.g., edges) and are ineffective at reconstructing other details (e.g., textures). In addition, it is not clear what statistical models or features best suit this learning task from a large number of training examples.

In this paper, we propose a divide-and-conquer approach [25, 23] to learn statistical priors directly from exemplar patches using a large number of simple functions. We show that when a sufficient amount of data is collected, the ambiguity problem of the source HR patches is alleviated. When the LR feature space is properly divided, simple linear functions are sufficient to map LR patches to HR effectively. The use of simple functions also facilitates the process to generate high-quality HR images efficiently.

The contributions of this work are summarized as follows. First, we demonstrate that a direct single-image super-resolution algorithm can be simple and fast when effective exemplars are available in the training phase. Second, we effectively split the input domain of low-resolution patches based on exemplar images, thereby facilitating learning simple functions for effective mapping. Third, the proposed algorithm generates favorable results with low computational load against existing methods. We demonstrate the merits of the proposed algorithm in terms of image quality and computational load by numerous qualitative and quantitative comparisons with the state-of-the-art methods.

2. Related Work and Problem Context

The SISR problem has been intensively studied in computer vision, image processing, and computer graphics. Classic methods render HR images from LR ones through certain mathematical formulations [13, 11] such as bicubic interpolation and back-projection [8]. While these algorithms can be executed efficiently, they are less effective to reconstruct high-frequency details, which are not modeled in the mathematical formulations.

2013 IEEE International Conference on Computer Vision

Recent methods exploit rich visual information contained in a set of exemplar images. However, there are many challenges to exploit exemplar images properly, and many methods have been proposed to address them. To reduce the ambiguity between LR and HR patches, spatial correlation is exploited to minimize the difference of overlapping HR patches [4, 1, 19]. For improving the effectiveness of exemplar images, user guidance is required to prepare precise ones [19, 6]. In order to increase the efficiency of reconstructing HR edges, small scaling factors and a compact exemplar patch set generated from the input frame are proposed [5, 3]. For increasing the chance to retrieve effective patches, segments are introduced for multiple-level patch searching [17, 6].

Statistical SISR algorithms learn priors from numerous feature vectors to generate a function mapping features from LR to HR. A significant advantage of this approach is the low computational complexity as the load of searching exemplars is alleviated. Global distributions of gradients are used [15] to regularize a deconvolution process for generating HR images. Edge-specific priors focus on reconstructing sharp edges because they are important visual cues for image quality [2, 16]. In addition, priors of patch mapping from LR to HR are developed based on dictionaries via sparse representation [24, 21], support vector regression [12], or kernel ridge regression [9].

Notwithstanding the demonstrated success of the algorithms in the literature, existing methods require computationally expensive processes in either searching exemplars [4, 1, 5] or extracting complex features [12, 16, 17, 6]. In contrast, we present a fast algorithm based on simple features. Instead of using one or a few mapping functions, we learn a large number of them. We show that this divide-and-conquer algorithm is effective and efficient for SISR when the right components are properly integrated.

3. Proposed Algorithm

One motivation of this work is to generate SR images efficiently while also handling the ambiguity problem. To achieve efficiency, we adopt the approach of statistical SISR methods, i.e., we do not search a large exemplar set at the test phase. Compared with existing statistical methods using complicated features [16, 24, 21], we propose simple features to reduce computational load. To handle the ambiguity problem using simple features, we spend intensive computational load during the training phase. We collect a large set of LR patches and their corresponding HR source patches. We divide the input space into a large set of subspaces from which simple functions are capable of mapping LR features to HR effectively. Although the proposed algorithm entails processing a large set of training images, it is only carried out offline in batch mode.

Figure 1. Training LR and HR pairs (four corner pixels are discarded). A set of functions is learned to map a LR patch to a set of pixels at the central (shaded) region of the corresponding HR patch (instead of the entire HR patch).

We generate a LR image $I_l$ from a HR one $I_h$ by

$$I_l = (I_h \otimes G) \downarrow_s, \tag{1}$$

where $\otimes$ is a convolution operator, $G$ is a Gaussian kernel, $\downarrow$ is a downsampling operator and $s$ is the scaling factor. From each $I_h$ and the corresponding $I_l$ image, a large set of corresponding HR and LR patch pairs can be cropped. Let $P_h$ and $P_l$ be two paired patches. We compute the patch mean of $P_l$ as $\mu$, and extract the features of $P_h$ and $P_l$ as the intensities minus $\mu$ to represent the high-frequency signals. For HR patch $P_h$, we only extract features for pixels at the central region (e.g., the shaded region in Figure 1) and discard boundary pixels. We do not learn mapping functions to predict the HR boundary pixels as the LR patch $P_l$ does not carry sufficient information to predict those pixels.
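The degradation model of Eq. 1 and the mean-subtracted features can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions: the function names are hypothetical, the Gaussian "width" is interpreted as the standard deviation, the kernel is truncated at three standard deviations, borders are handled by edge replication, and downsampling is plain decimation. The source does not specify these details, and corner-pixel removal is omitted here.

```python
import numpy as np

def gaussian_kernel_1d(sigma):
    """Normalized 1-D Gaussian kernel truncated at 3 sigma (an assumption)."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def blur(image, sigma):
    """Separable Gaussian blur with edge replication at the borders."""
    k = gaussian_kernel_1d(sigma)
    r = len(k) // 2
    padded = np.pad(image.astype(np.float64), r, mode="edge")
    # Filter along rows, then along columns; "valid" removes the padding again.
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)

def make_lr(hr, sigma, s):
    """Eq. 1: blur the HR image with G, then keep every s-th pixel."""
    return blur(hr, sigma)[::s, ::s]

def patch_features(lr_patch, hr_center):
    """Subtract the LR patch mean from both the LR patch and the HR central region."""
    mu = lr_patch.mean()
    return (lr_patch - mu).ravel(), (hr_center - mu).ravel(), mu
```

Note that the same LR mean is subtracted from the HR region, which is what allows the HR intensities to be restored later by adding that mean back.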

We collect a large set of LR patches from natural images to learn K cluster centers of their extracted features. Figure 2 shows 4096 cluster centers learned from 2.2 million natural patches. Similar to the heavy-tailed gradient distribution in natural images [7], more populous cluster centers correspond to smoother patches as shown in Figure 3. These K cluster centers can be viewed as anchor points to represent the feature space of natural image patches.
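The text does not name the clustering algorithm. As one plausible reading, the following sketch uses plain k-means (Lloyd's iterations) on the mean-subtracted LR features; the function names, the random initialization, and the fixed iteration count are all assumptions made for illustration.

```python
import numpy as np

def assign(features, centers):
    """Index of the nearest center for each feature row (squared Euclidean distance)."""
    d2 = ((features ** 2).sum(axis=1)[:, None]
          - 2.0 * features @ centers.T
          + (centers ** 2).sum(axis=1)[None, :])
    return d2.argmin(axis=1)

def kmeans(features, k, iters=20, seed=0):
    """Plain Lloyd iterations; features is a float array of shape (N, m)."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = assign(features, centers)
        for j in range(k):
            members = features[labels == j]
            if len(members) > 0:          # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return centers, assign(features, centers)
```

Cluster populations such as those plotted in Figure 3 could then be obtained with `np.bincount(labels, minlength=k)`.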

For some regions in the feature space where natural patches appear fairly rarely, it is unnecessary to learn mapping functions to predict patches of HR from LR. Since each cluster represents a subspace, we collect a certain number of exemplar patches in the segmented space to train a mapping function. Since natural images are abundant and easily acquired, we can assume that there are sufficient exemplar patches available for each cluster center.

Figure 2. A set of 4096 cluster centers learned from 2.2 million natural patches. As the features for clustering are the intensities subtracting patch means, we show the intensities by adding their mean values for visualization purpose. The order of cluster centers is sorted by the amounts of clustered patches, as shown in Figure 3. Patches with more high-frequency details appear less frequently in natural images.

Figure 3. Histogram of clustered patches from a set of 2.2 million natural patches with cluster centers shown in Figure 2. While the most populous cluster consists of 18489 patches, the 40 least populous clusters only have one patch. A cluster has 537 patches on average.

Suppose there are $l$ LR exemplar patches belonging to the same cluster. Let $v_i$ and $w_i$ ($i = 1, \ldots, l$) be vectorized features of the LR and HR patches respectively, in dimensions $m$ and $n$. We propose to learn a set of $n$ linear regression functions to individually predict the $n$ feature values in HR. Let $V \in \mathbb{R}^{m \times l}$ and $W \in \mathbb{R}^{n \times l}$ be the matrices of $v_i$ and $w_i$. We compute the regression coefficients $C^* \in \mathbb{R}^{n \times (m+1)}$ by

$$C^* = \operatorname*{argmin}_{C} \left\| W - C \begin{pmatrix} V \\ \mathbf{1} \end{pmatrix} \right\|^2, \tag{2}$$

where $\mathbf{1}$ is a $1 \times l$ vector with all values as 1. This linear least-squares problem is easily solved.
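As an illustration of Eq. 2, the per-cluster fit can be written with a standard least-squares solver. The function name `fit_cluster_regressor` is hypothetical, and the use of `numpy.linalg.lstsq` is an assumption; the source only states that the problem is easily solved.

```python
import numpy as np

def fit_cluster_regressor(V, W):
    """Solve Eq. 2 for one cluster.

    V: (m, l) LR features, W: (n, l) HR features.
    Returns C of shape (n, m + 1); the last column is the bias term.
    """
    l = V.shape[1]
    A = np.vstack([V, np.ones((1, l))])            # augmented matrix [V; 1], shape (m+1, l)
    # min_C ||W - C A||^2 is equivalent to min_C ||A^T C^T - W^T||^2.
    C_T, *_ = np.linalg.lstsq(A.T, W.T, rcond=None)
    return C_T.T
```

Because each row of C is solved independently, this is the same as fitting n separate linear regressors, one per HR feature value.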

Given a LR test image, we crop each LR patch to compute the LR features and search for the closest cluster center. According to the cluster center, we apply the learned coefficients to compute the HR features by

$$w = C^* \begin{pmatrix} v \\ 1 \end{pmatrix}. \tag{3}$$

The predicted HR patch intensity is then reconstructed by adding the LR patch mean to the HR features.
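The test-time procedure for a single patch (mean subtraction, nearest-center lookup, Eq. 3, mean restoration) can be sketched as follows. The function name and the array layout of `centers` and `coeffs` are assumptions for illustration, and the nearest center is found by brute-force Euclidean search.

```python
import numpy as np

def predict_hr_patch(lr_patch_vec, centers, coeffs):
    """Predict the HR central-region intensities for one vectorized LR patch.

    lr_patch_vec: (m,) LR intensities
    centers:      (K, m) cluster centers in feature space
    coeffs:       (K, n, m + 1) regression coefficients, one C* per cluster
    """
    mu = lr_patch_vec.mean()
    v = lr_patch_vec - mu                                   # LR feature
    k = np.argmin(((centers - v) ** 2).sum(axis=1))         # closest cluster center
    w = coeffs[k] @ np.append(v, 1.0)                       # Eq. 3
    return w + mu                                           # add the LR mean back
```

Each patch costs one nearest-center search plus one small matrix-vector product, and patches are independent, which is why the procedure parallelizes easily.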

The proposed method generates effective HR patches because each test LR patch and its exemplar LR patches are highly similar as they belong to the same compact feature subspace. The computational load for generating a HR image is low as each HR patch can be generated from a LR patch through a few additions and multiplications. The algorithm can easily be executed in parallel because all LR patches are upsampled individually. In addition, the proposed method is suitable for hardware implementations as only a few lines of code are required.

4. Experimental Results

Implementation: For color images, we apply the proposed algorithm on the brightness channel (Y) and upsample the color channels (UV) by bicubic interpolation as human vision is more sensitive to brightness change. For a scaling factor 4, we set the Gaussian kernel width in Eq. 1 to 1.6 as commonly used in the literature [16]. The LR patch size is set as 7 × 7 pixels, and the LR feature dimension is 45 since four corner pixels are discarded. The central region of a HR patch is set as 12 × 12 pixels (as illustrated in Figure 1). Since the central region in LR is 3 × 3 pixels, a pixel in HR is covered by 9 LR patches and the output intensity is generated by averaging 9 predicted values, as commonly used in the literature [5, 24, 3, 21]. We prepare a training set containing 6152 HR natural images collected from the Berkeley segmentation and LabelMe datasets [10, 14] to generate a LR training image set containing 679 million patches.
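The patch geometry above can be checked with a small sketch: a 7 × 7 patch without its four corners has 45 pixels, and sliding the patch one LR pixel at a time makes each interior HR pixel fall inside 3 × 3 = 9 predicted 12 × 12 regions. The function names and the alignment of the HR region with the central 3 × 3 LR pixels are assumptions made for illustration.

```python
import numpy as np

def corner_mask(size=7):
    """Boolean mask of a size x size patch with the four corner pixels removed."""
    mask = np.ones((size, size), dtype=bool)
    mask[[0, 0, -1, -1], [0, -1, 0, -1]] = False
    return mask

def coverage_count(lr_shape, s=4, lr_patch=7, hr_center=12):
    """Count how many predicted HR central regions cover each HR pixel
    when the LR patch slides one LR pixel at a time."""
    H, W = lr_shape
    count = np.zeros((H * s, W * s), dtype=int)
    offset = (lr_patch - hr_center // s) // 2       # 2 LR pixels on each side of the 3 x 3 center
    for i in range(H - lr_patch + 1):
        for j in range(W - lr_patch + 1):
            top, left = (i + offset) * s, (j + offset) * s
            count[top:top + hr_center, left:left + hr_center] += 1
    return count
```

In a full implementation the predicted values would be accumulated in the same way and divided by this count to obtain the averaged output intensity.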

Number of clusters: Due to the memory limitation on a machine (24 GB), we randomly select 2.2 million patches to learn a set of 4096 cluster centers, and use the learned cluster centers to label all LR patches in the training image set. As the proposed function regresses features from 45 dimensions to one dimension only (each row of $C^*$ in Eq. 2 is assumed to be independent) and most training features are highly similar, a huge set of training instances is unnecessary. We empirically choose a large value, e.g., 1000, as the size of training instances for each cluster center and collect training instances randomly from the labeled patches. Figure 4 shows the actual numbers of training patches. Since some patches are rarely observed in natural images, there are fewer than 1000 patches in a few clusters. For such cases we still compute the regression coefficients if there is no rank deficiency in Eq. 2, i.e., at least 46 linearly independent training vectors are available. Otherwise, we use bilinear interpolation to map LR patches for such clusters.

Figure 4. Numbers of patches used to train regression coefficients in our experiments. Since some patches are rarely observed in natural images, there are fewer than 1000 patches in some clusters.

Figure 5. Super resolution results using different cluster numbers: (a) 512 clusters, (b) 4096 clusters, (c) difference map. Images best viewed on a high-resolution display where each image is shown with at least 512 × 512 pixels (full resolution).
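The rank condition for sparsely populated clusters can be tested directly on the augmented matrix of Eq. 2. The following is a minimal sketch, assuming a numerical rank computed by `numpy.linalg.matrix_rank`; the function name is hypothetical and the source does not say how rank deficiency was detected.

```python
import numpy as np

def has_full_rank(V):
    """True if the augmented matrix [V; 1] has rank m + 1, i.e. the cluster
    provides at least m + 1 linearly independent training vectors."""
    A = np.vstack([V, np.ones((1, V.shape[1]))])
    return bool(np.linalg.matrix_rank(A) == A.shape[0])
```

With m = 45 this corresponds to the 46 linearly independent vectors mentioned above; clusters that fail the check would fall back to bilinear interpolation.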

The number of clusters is a trade-off between image quality and computational load. Figure 5 shows the results generated by 512 and 4096 clusters with an otherwise identical setup. While the low-frequency regions are almost the same, the high-frequency regions of the image generated by more clusters are better in terms of fewer jaggy artifacts along the face contours. With more clusters, the input feature space can be divided into more compact subspaces from which the linear mapping functions can be learned more effectively.

In addition to linear regressors, we also evaluate the image quality generated by a support vector regressor (SVR) with a Radial Basis Function kernel or a linear kernel. With the same setup, the images generated by SVRs and linear regressors are visually similar (see the supplementary material for examples). However, the computational load of SVRs is much higher due to the cost of computing the similarity between each support vector and the test vector. While linear regressors take 14 seconds to generate an image, SVRs take 1.5 hours.

Evaluation and analysis: We implement the proposed algorithm in MATLAB, which takes 14 seconds to upsample an image of 128 × 128 pixels with a scaling factor 4 on a 2.7 GHz Quad Core machine. The execution time can be further reduced by other implementations and GPU. We use the released code from the authors [15, 24, 21] to generate HR images, and implement other state-of-the-art algorithms [8, 16, 5, 3] as the source code is not available. Our code and dataset are available at the project web page https://eng.ucmerced.edu/people/cyang35.

Table 1. Average evaluated values of 200 images from the Berkeley segmentation dataset [10]. While the generated SR images by the proposed method are comparable to those by the self-exemplar SR algorithm [5], the required computational load is much lower (14 seconds vs. 10 minutes).

Algorithm                 PSNR     SSIM [22]
Bicubic Interpolation     24.27    0.6555
Back Projection [8]       25.01    0.7036
Sun [16]                  24.54    0.6695
Shan [15]                 23.47    0.6367
Yang [24]                 24.31    0.6205
Kim [9]                   25.12    0.6970
Wang [21]                 24.32    0.6505
Freedman [3]              22.22    0.6173
Glasner [5]               25.20    0.7064
Proposed                  25.18    0.7081

Figures 6-11 show SR results of the proposed algorithm and the state-of-the-art methods. More results are available in the supplementary material. We evaluate the method numerically in terms of PSNR and SSIM index [22] when the ground truth images are available. Table 1 shows averaged results for a set of 200 natural images. The evaluations are presented from four perspectives, with comparisons to SR methods using statistical priors [9, 16], fast SR algorithms [8, 15], self-exemplar SR algorithms [5, 3], and SR approaches with dictionary learning [24, 21].
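For reference, PSNR follows the standard definition based on the mean squared error between the ground truth and the SR result. The sketch below assumes 8-bit intensities (peak value 255); the source does not state the peak value or whether the metric is computed on the brightness channel only, so both are assumptions here, as is the function name.

```python
import numpy as np

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    diff = reference.astype(np.float64) - estimate.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")                 # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```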

SR methods based on statistical priors: As shown in Figure 6(b)(c), Figure 8(a), Figure 10(c), and Figure 11(b)(c), the proposed algorithm generates textures with better contrast than existing methods using statistical priors [9, 16]. While a kernel ridge regression function is learned in [9] and a gradient profile prior is trained in [16] to restore the edge sharpness based on an intermediate bicubic interpolated image, the high-frequency texture details are not generated due to the use of the bicubic interpolated intermediate image. Furthermore, a post-processing filter is used in [9] to suppress median gradients in order to reduce noise generated by the regression function along edges. However, mid-frequency details at textures may be wrongly reduced and the filtered textures appear unrealistic. There are several differences between the proposed method and the existing methods based on statistical priors. First, the proposed method upsamples the LR patches directly rather than using an intermediate image generated by bicubic interpolation, and thus there is no loss of texture details. Second, the proposed regressed features can be applied to any type of patches, while existing methods focus only on edges. Third, no post-processing filter is required in the proposed method to refine the generated HR images. Fourth, existing methods learn a single regressor for the whole feature space, but the proposed method learns numerous regressors (one for each subspace), thereby making the prediction more effective.

Figure 6. Child. Results best viewed on a high-resolution display with adequate zoom level where each image is shown with at least 512 × 512 pixels (full resolution). PSNR / SSIM per panel: (a) Bicubic Interpolation: 29.8 / 0.9043; (b) Kim [9]: 31.3 / 0.9321; (c) Sun [16]: 30.4 / 0.9142; (d) Proposed: 31.6 / 0.9422; (e) Back Projection [8]: 31.1 / 0.9391; (f) Shan [15]: 27.8 / 0.8554; (g) Yang [24]: 30.1 / 0.9152; (h) Wang [21]: 29.5 / 0.8859.

Fast SR methods: Compared with existing fast SR methods [8, 15] and bicubic interpolation, Figure 6(a)(e)(f), Figure 7, and Figure 11(a)(b) show that the proposed method generates better edges and textures. Although bicubic interpolation is the fastest method, the generated edges and textures are always over-smoothed. While back-projection [8] boosts contrast in SR images, displeasing jaggy artifacts are also generated. Those problems are caused by the fixed back-projection kernel, which is assumed isotropic. However, the image structures along sharp edges are highly anisotropic, and thus an isotropic kernel wrongly compensates the intensities. A global gradient distribution is exploited as constraints in [15] to achieve fast SR. However, although the global gradient distribution is reconstructed by [15] in Figure 6(f) and Figure 7(c), the local gradients are not constrained. Thus, over-smoothed textures and jaggy edges are generated by this method. The proposed method generates better edges and textures as each LR patch is upsampled by a specific prior learned from a compact subspace of similar patches. Thus, the contrast and local structures are better preserved with fewer artifacts.

SR methods based on self exemplars: Figure 8(b)(d), Figure 9(a)(d), Figure 10(b)(d), and Figure 11(c)(d) show the results generated by self-exemplar SR methods and the proposed algorithm. Self-exemplar SR algorithms [5, 3] iteratively upsample images with a small scaling factor (e.g., 1.25). Such an approach has an advantage of generating sharp and clear edges because it is easy to find similar edge


References

Image quality assessment: from error visibility to structural similarity

A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics

Image Super-Resolution Via Sparse Representation

LabelMe: A Database and Web-Based Tool for Image Annotation

Super-resolution image reconstruction: a technical overview
