
Content-Based Photo Quality Assessment

Abstract
Automatically assessing photo quality from the perspective of visual aesthetics is of great interest in high-level vision research and has drawn much attention in recent years. In this paper, we propose content-based photo quality assessment using both regional and global features. Under this framework, subject areas, which draw the most attention of human eyes, are first extracted. Then regional features extracted from both subject areas and background regions are combined with global features to assess photo quality. Since professional photographers adopt different photographic techniques and have different aesthetic criteria in mind when taking different types of photos (e.g., landscape versus portrait), we propose to segment subject areas and extract visual features in different ways according to the variety of photo content. We divide the photos into seven categories based on their visual content and develop a set of new subject area extraction methods and new visual features specially designed for different categories. The effectiveness of this framework is supported by extensive experimental comparisons of existing photo quality assessment approaches as well as our new features on different categories of photos. In addition, we propose an approach of online training an adaptive classifier to combine the proposed features according to the visual content of a test photo without knowing its category. Another contribution of this work is to construct a large and diversified benchmark dataset for the research of photo quality assessment. It includes 17,673 photos with manually labeled ground truth. This new benchmark dataset can be downloaded at http://mmlab.ie.cuhk.edu.hk/CUHKPQ/Dataset.htm.


Content-Based Photo Quality Assessment
Wei Luo¹, Xiaogang Wang²,³, and Xiaoou Tang¹,³
¹ Department of Information Engineering, The Chinese University of Hong Kong
² Department of Electronic Engineering, The Chinese University of Hong Kong
³ Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, China
lw010@ie.cuhk.edu.hk xgwang@ee.cuhk.edu.hk xtang@ie.cuhk.edu.hk
Abstract
Automatically assessing photo quality from the perspective of visual aesthetics is of great interest in high-level vision research and has drawn much attention in recent years. In this paper, we propose content-based photo quality assessment using regional and global features. Under this framework, subject areas, which draw the most attention of human eyes, are first extracted. Then regional features extracted from subject areas and the background regions are combined with global features to assess photo quality. Since professional photographers may adopt different photographic techniques and may have different aesthetic criteria in mind when taking different types of photos (e.g., landscape versus portrait), we propose to segment regions and extract visual features in different ways according to the categorization of photo content. Therefore we divide the photos into seven categories based on their content and develop a set of new subject area extraction methods and new visual features, which are specially designed for different categories. This argument is supported by extensive experimental comparisons of existing photo quality assessment approaches as well as our new regional and global features over different categories of photos. Our new features significantly outperform the state-of-the-art methods. Another contribution of this work is to construct a large and diversified benchmark database for the research of photo quality assessment. It includes 17,613 photos with manually labeled ground truth.
1. Introduction
Automatic assessment of photo quality based on aesthetic perception has gained increasing interest in the computer vision community. It has important applications. For example, when users search images on the web, they expect
(This work is partially supported by the Research Grants Council of Hong Kong SAR, Grant No. 416510.)
(a) (b) (c)
Figure 1. Subject areas of photos. (a) Close-up for a bird. (b)
Architecture. (c) Human portrait.
the search engine to rank the retrieved images according to their relevance to the queries as well as their quality. Various methods of automatic photo quality assessment have been proposed in recent years [16, 18, 11, 5, 12, 20, 10]. In early works, only global visual features, such as global edge distribution and exposure, were used [11]. However, later studies [5, 12, 20] showed that regional features lead to better performance, since human beings perceive subject areas differently from the background (see examples in Figure 1). After extracting the subject areas, which draw the most attention of human eyes, regional features are extracted from the subject areas and the background separately and are used for assessing photo quality. Both regional and global features will be used in our work.
One major problem with the existing methods is that they treat all photos equally without considering the diversity in photo content. It is known that professional photographers adopt different photographic techniques and have different aesthetic criteria in mind when taking different types of photos [2, 19]. For example, for close-up photographs (e.g., Figure 1 (a)), viewers appreciate the high contrast between the foreground and background regions. In human portrait photography (e.g., Figure 1 (c)), professional photographers use special lighting settings [6] to create aesthetically pleasing patterns on human faces. For landscape photos, well balanced spatial structure, professional hue composition, and proper lighting are considered as traits of professional photography.
Also, the subject areas of different types of photos should

Figure 2. Photos divided into seven categories according to content (landscape, plant, animal, night, human, static, architecture). First row: high quality photos; second row: low quality photos.
be extracted in different ways. In a close-up photo, the subject area is emphasized using the low depth of field technique, which leads to a blurred background and clear foreground. However, in human portrait photos, the background does not have to be blurred, since the attention of viewers is automatically attracted by the presence of human faces. Their subject areas can be better detected by a face detector. In landscape photos, it is usually the case that the entire scene is clear and tidy. Their subject areas, such as mountains, houses, and plants, are often vertical standing objects. This can be used as a cue to extract subject areas in this type of photo.
1.1. Our Approach
Motivated by these considerations, we propose content-based photo quality assessment. Photos are manually divided into seven categories based on photo content: “animal”, “plant”, “static”, “architecture”, “landscape”, “human”, and “night”. See examples in Figure 2. Regional and global features are selected and combined in different ways when assessing photos in different categories. More specifically, we propose three methods of extracting subject areas.
- Clarity based region detection combines blur kernel estimation with image segmentation to accurately extract the clear region as the subject area.
- Layout based region detection analyzes the layout structure of a photo and extracts vertical standing objects.
- Human based detection locates faces in the photo with a face detector or a human detector.
Based on the extracted subject areas, three types of new regional features are proposed.
- Dark channel feature measures the clearness and the colorfulness of the subject areas.
- Complexity features use the numbers of segmentations to measure the spatial complexity of the subject area and the background.
- Human based features capture the clarity, brightness, and lighting effects of human faces.
In addition, two types of new global features are proposed.
- Hue composition feature fits photos with color composition schemes.
- Scene composition features capture the spatial structures of photos from semantic lines.
These new methods and features are introduced in Sections 3-5, which emphasize the dark channel feature, hue composition feature, and human based features, since they lead to the best performance in most categories. Through extensive experiments on a large and diverse benchmark database, the effectiveness of different subject area extraction methods and different features on different photo categories is summarized in Table 1. These features are combined by an SVM trained on each of the categories separately. Experimental comparisons show that our proposed new features significantly outperform existing features. To the best of our knowledge, this is the first systematic study of photo quality features on different photo categories.
2. Related Work
Existing methods of assessing photo quality from the aesthetic point of view can be generally classified into those using global features and those using regional features. Tong et al. [18] used boosting to combine global low-level features for the classification of professional and amateurish photos. However, these features were not specially designed for photo quality assessment. To better mimic human aesthetic perception, Ke et al. [11] designed a set of high-level semantic features based on rules of thumb of photography. They measured the global distributions of edges, blurriness, hue, and brightness.
Some approaches employed regional features by detecting subject areas, since human beings perceive subject areas differently from the background. Datta et al. [5] divided a photo into 3 × 3 blocks and assumed that the central block

Figure 3. (a1) and (b1) are input photos. (a2) is the subject area (green rectangle) extracted by the method in [12]. The green rectangle cannot accurately represent the subject area. (b2) is the saliency map with the subject area (red regions) extracted by the method in [20]. Because of the very high brightness in the red regions, other subject areas are ignored. (c1) and (c2) are the subject areas (white regions) extracted by our clarity based region detection method described in Section 4.1.
is the subject area. Luo et al. [12] assumed that in a high quality photo the subject area has a higher clarity than the background. Therefore, clarity based criteria were used to detect the subject area, which was fitted by a rectangle. Visual features of clarity contrast, lighting contrast, and geometry composition extracted from the subject areas and the background were used as regional features. Although it worked well on some types of photos, such as “animal”, “plant”, and “static”, it might fail on photos of “architecture” and “landscape”, whose subject areas and background both have high clarity. Also, a rectangle is not an accurate representation of the subject area and may decrease the performance. Wong et al. [20] and Nishiyama et al. [14] used saliency maps to extract the subject areas, which were assumed to have higher brightness and contrast than other regions. However, if a certain part of the subject area has very high brightness and contrast, other parts will be ignored by this method. See examples in Figure 3.
3. Global Features
Professionals follow certain rules of color composition and scene composition to produce aesthetically pleasing photographs. For example, photographers focus on artistic color combinations and properly place color accents to create unique compositions and to invoke certain feelings among the viewers of their artworks. They also try to arrange objects in the scene according to empirical guidelines like the “rule of thirds”. Based on these techniques of photography composition, we propose two global features to measure the quality of hue composition and scene composition.
3.1. Hue Composition Feature
Proper arrangement of colors engages viewers and creates an inner sense of order and balance. Major color templates [13, 17] can be classified as subordination and coordination. Subordination requires the photographer to set a dominant color spot and to arrange the rest of the colors to correlate with it in harmony or contrast. It includes certain color schemes, such as the 90° color scheme and the Complementary color scheme, which lead to aesthetically pleasing images. With coordination, the color composition is created with the help of different gradations of one single color. It includes the Monochromatic color scheme and the Analogous color scheme. See examples in Figure 4.
Color templates can be mathematically approximated on the color wheel as shown in Figure 4. A coordination color scheme can be approximated by a single sector with center α1 and width w1 (Figure 4 (a)). A subordination color scheme can be approximated by two sectors with centers (α1, α2) and widths (w1, w2) (Figure 4 (d)). Although it is possible to assess photo quality by fitting the color distribution of a photo to some manually defined color templates, our experimental results show that such an approach is suboptimal. It cannot automatically adapt to different types of photos either. We choose to learn the models of hue composition from training data. The models of hue composition for high- and low-quality photos are learned separately. The learning steps are described below.
Given an image I, we first decide whether it should be fitted by a color template with a single sector (T_1) or two sectors (T_2) by computing the following metric,

E_k(I) = \min_{T_k} \sum_{i \in I} D(H(i), T_k) \cdot S(i) + \lambda A(T_k)

where k = 1, 2. i is a pixel on I. H(i) and S(i) are the hue and saturation of pixel i. D(H(i), T_k) is zero if H(i) falls in a sector of the template; otherwise it is calculated as the arc-length distance of H(i) to the closest sector border. A(T_k) is the total width of the sectors (A(T_1) = w1 and A(T_2) = w1 + w2). λ is empirically set to 0.03. E_k(I) is calculated by fitting the template T_k, which has adjustable parameters, to image I. T_1 is controlled by parameters (α1, w1) and T_2 is controlled by parameters (α1, w1, α2, w2). This metric is inspired by the color harmony function [3]. However, we assume that the width of each sector is changeable and add a penalty on it. The single-sector template is chosen if E_1(I) < E_2(I), and vice versa.
If I is fitted with a single-sector template, the average saturation s1 of the pixels inside this sector is computed. s1 and α1, the hue center of the fitting sector, are used as the hue composition features of this photo. If I is fitted with a two-sector template, a four-dimensional feature vector (α1, s1, α2, s2), which includes the hue centers and average saturations, is extracted from the two sectors. Based on the

Figure 4. Harmonic templates on the hue wheel used in [3] (Monochromatic, Analogous, Complementary, and 90 degree schemes; each sector is parameterized by a center α and a width ω). An image is considered harmonic if most of its hues fall within the gray sector(s) on the template. The shapes of the templates are fixed, but a template may be rotated by an arbitrary angle. The templates correspond to different color schemes.
extracted hue composition features, two Gaussian mixture models are trained separately for the two types of templates. Examples of training results for high-quality photos in the category “landscape” are shown in Figure 5. Among 410 training photos, 83 are fitted with single-sector templates and 327 are fitted with two-sector templates. Three Gaussian mixture components are used to model the hue composition features of photos belonging to single-sector templates. Two Gaussian mixture components are used to model the hue composition features of photos belonging to two-sector templates. The photo best fitting each of the mixture components is shown in Figure 5. We find some interesting correlations between the learned components and the color schemes. For example, the components in Figure 5 (a) and (b) correlate more with the monochromatic schemes centered at red and yellow. The components in Figure 5 (c) and (e) correlate more with the analogous color scheme and the complementary color scheme.
The likelihood ratio P(I|high)/P(I|low) of a photo being high-quality or low-quality can be computed from the Gaussian mixture models and is used for classification.
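The classification step can be sketched with a small numpy-only diagonal-covariance GMM scorer. The paper trains its mixtures on the hue composition features; the toy parameters in the usage below are purely illustrative:

```python
import numpy as np

def gmm_loglik(x, weights, means, variances):
    """Log-likelihood of a feature vector x under a diagonal-covariance
    Gaussian mixture model given as (weights, means, variances)."""
    x = np.asarray(x, dtype=float)
    logps = []
    for w, mu, var in zip(weights, means, variances):
        mu = np.asarray(mu, dtype=float)
        var = np.asarray(var, dtype=float)
        ll = -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mu) ** 2 / var)
        logps.append(np.log(w) + ll)
    m = max(logps)  # log-sum-exp for numerical stability
    return m + np.log(sum(np.exp(p - m) for p in logps))

def classify(x, gmm_high, gmm_low):
    """True if P(x | high) > P(x | low), i.e. the likelihood ratio
    P(x|high)/P(x|low) exceeds 1."""
    return gmm_loglik(x, *gmm_high) > gmm_loglik(x, *gmm_low)
```

In practice the two mixtures would be fitted with EM (e.g., scikit-learn's `GaussianMixture`) on the hue composition features of high- and low-quality training photos separately.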
3.2. Scene Composition Feature
High quality photos show well-arranged spatial composition to hold the viewer's attention. In such photos, long continuous lines often bear semantic meanings, such as the horizon or the surface of water. They can be used to compute scene composition features. For example, the location of the horizon in outdoor photos was used by Bhattacharya et al. [1] to assess visual balance. We characterize scene composition by analyzing the locations and orientations of semantic lines. The prominent lines in photos are extracted by the Hough transform and are classified into horizontal lines and vertical lines. Our scene composition features include the average orientations of horizontal lines and vertical lines, the average vertical position of horizontal lines, and the average horizontal position of vertical lines.
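A minimal sketch of these four features, assuming the prominent lines have already been detected (e.g., by OpenCV's Hough transform) and are passed in as endpoint pairs. The 45° split between horizontal and vertical lines and the normalization of positions to [0, 1] are our assumptions:

```python
import numpy as np

def scene_composition_features(lines, width, height):
    """lines: iterable of (x1, y1, x2, y2) prominent line segments.
    Returns (mean horizontal-line angle, mean vertical-line angle,
             mean y of horizontal lines, mean x of vertical lines),
    with positions normalized to [0, 1]; NaN if a group is empty."""
    h_ang, v_ang, h_y, v_x = [], [], [], []
    for x1, y1, x2, y2 in lines:
        ang = np.degrees(np.arctan2(y2 - y1, x2 - x1)) % 180.0
        if ang < 45.0 or ang > 135.0:        # near-horizontal line
            h_ang.append(min(ang, 180.0 - ang))
            h_y.append((y1 + y2) / 2.0 / height)
        else:                                 # near-vertical line
            v_ang.append(abs(90.0 - ang))
            v_x.append((x1 + x2) / 2.0 / width)
    return tuple(float(np.mean(v)) if v else float("nan")
                 for v in (h_ang, v_ang, h_y, v_x))
```

For instance, a horizon line sitting at 60% of the image height yields a mean vertical position of 0.6, which a downstream classifier can compare against compositional guidelines such as the rule of thirds.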
Figure 5. (a), (b), (c): Mixture components for images best fitted with single-sector templates. The color wheels on the top right show the mixture components; the center and width of each gray sector are set to the mean and standard deviation of the corresponding mixture component. The color wheels on the bottom right show the hue histograms of the images. (d), (e): Mixture components for images best fitted with two-sector templates.
4. Subject Area Extraction Methods
The way to detect subject areas in photos depends on photo content. When taking close-up photos of animals, plants, and static objects, photographers often use a macro lens to focus on the main subjects, such that photos are clear on the main subjects and blurred in other areas. For human portraits, viewers' attention is often attracted by human faces. In outdoor photography, architecture, mountains, and trees are often the main subjects.
We propose a clarity based method to find clear regions in low depth of field images, which make up the majority of high-quality photographs in the categories of “animal”, “plant”, and “static”. We adopt a layout based method [9] to segment vertical standing objects, which we treat as subject areas, in photos from the categories of “landscape” and “architecture”. For photos in the category of “human”, we use a human detector and a face detector to locate faces.
4.1. Clarity based region detection
A clarity based subject area detection method was proposed in [12]. Since it used a rectangle to represent the subject area and fitted it to pixels with high clarity, the detection results were not accurate. We improve the accuracy by oversegmentation. We first obtain a mask U0 of the clear area using a method proposed in [12], which labels each pixel as clear or blurred. The mask is improved by an iterative

Figure 6. (a): From top downwards: the input photo; the result of the clarity based detector (white region); the result of the layout based detector (red region). (b), (c): First row: face and human detection results. Second row: clarity based detection results.
procedure. A pixel is labeled as clear if it falls in the convex hull of its neighboring pixels labeled as clear. This step repeats until convergence. Then the photo is segmented into super-pixels [15]. A super-pixel is labeled as clear if more than half of its pixels are labeled as clear. A comparison of the method in [12] and ours can be found in Figure 3.
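The refinement can be sketched as follows. For simplicity, this sketch replaces the per-pixel convex-hull test with a separable approximation (a pixel turns clear when clear pixels flank it both horizontally and vertically within a small radius), and it takes the super-pixel segmentation as a given integer label map rather than computing it with [15]:

```python
import numpy as np

def refine_clear_mask(mask, radius=2, max_iter=20):
    """Iteratively grow the clear-pixel mask U0. A pixel becomes clear
    when clear pixels flank it both horizontally and vertically within
    `radius` pixels -- a separable approximation of the convex-hull
    test in the paper. Repeats until convergence."""
    m = mask.astype(bool).copy()
    for _ in range(max_iter):
        left = np.zeros_like(m)
        right = np.zeros_like(m)
        up = np.zeros_like(m)
        down = np.zeros_like(m)
        for r in range(1, radius + 1):
            left[:, r:] |= m[:, :-r]
            right[:, :-r] |= m[:, r:]
            up[r:, :] |= m[:-r, :]
            down[:-r, :] |= m[r:, :]
        new = m | (left & right & up & down)
        if np.array_equal(new, m):
            break
        m = new
    return m

def label_superpixels(mask, segments):
    """A super-pixel is labeled clear if more than half of its pixels
    are labeled clear in `mask`. `segments` assigns a super-pixel id
    to every pixel."""
    out = np.zeros_like(mask, dtype=bool)
    for s in np.unique(segments):
        sel = segments == s
        out[sel] = mask[sel].mean() > 0.5
    return out
```

The majority vote over super-pixels is what snaps the noisy per-pixel mask to segment boundaries, replacing the coarse rectangle of [12].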
4.2. Layout based region detection
Hoiem et al. [9] proposed a method to recover the surface layout from an outdoor image. The scene is segmented into sky regions, ground regions, and vertical standing objects, as shown in Figure 6. We take the vertical standing objects as subject areas.
4.3. Human based region detection
We employ face detection [21] to extract faces from human photos. For images where face detection fails, we use human detection [4] to roughly estimate the locations of faces.
faces. See examples in Figure 6.
5. Regional Features
We have developed new regional features to work together with our proposed subject area detectors. We propose a new dark channel feature to measure both the clarity and the colorfulness of the subject areas. We also specially design a set of features for “human” photos to measure the clarity, brightness, and lighting effects of faces. New features are also proposed to measure the complexities of the subject areas and the background.
5.1. Dark Channel Feature
Dark channel was introduced by He et al. [7, 8] for haze removal. The dark channel of an image I is defined as:

I^{dark}(i) = \min_{c \in \{R,G,B\}} \Big( \min_{i' \in \Omega(i)} I^c(i') \Big)
Figure 7. (a) A close-up of a plant and its dark channel. (b) Landscape photographs with different color compositions. (c) Average dark channel value of the input photo from (a) blurred by Gaussian kernels of increasing size. (d) For each point on the circle: its hue is indicated by the hue wheel, its saturation is equal to the radius, and the normalized dark channel value is represented by the pixel intensity.
where I^c is a color channel of I and Ω(i) is the neighborhood of pixel i. We choose Ω(i) as a 10 × 10 local patch. We normalize the dark channel value by the sum of the RGB channels to reduce the effect of brightness. The dark channel feature of a photo I is computed as the average of the normalized dark channel values in the subject area:

\frac{1}{\|S\|} \sum_{i \in S} \frac{I^{dark}(i)}{\sum_{c \in \{R,G,B\}} I^c(i)}

with S the subject area of I.
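Following the two formulas above, the feature can be sketched directly in numpy. The 10 × 10 patch follows the paper; the small epsilon guarding against division by zero is our own addition:

```python
import numpy as np

def dark_channel(img, patch=10):
    """I_dark(i): minimum over the RGB channels and over a
    patch x patch neighborhood Omega(i). img is H x W x 3, float."""
    per_pixel_min = img.min(axis=2)
    h, w = per_pixel_min.shape
    pad = patch // 2
    padded = np.pad(per_pixel_min, pad, mode="edge")
    out = np.empty_like(per_pixel_min)
    for y in range(h):
        for x in range(w):
            out[y, x] = padded[y:y + patch, x:x + patch].min()
    return out

def dark_channel_feature(img, subject_mask, patch=10):
    """Average dark channel value over the subject area S, with each
    pixel normalized by the sum of its RGB channels to reduce the
    influence of brightness."""
    dc = dark_channel(img, patch)
    rgb_sum = img.sum(axis=2) + 1e-8  # guard against division by zero
    vals = (dc / rgb_sum)[subject_mask.astype(bool)]
    return float(vals.mean())
```

A saturated subject (at least one near-zero channel per pixel) yields a feature near 0, while a dull gray subject yields a higher value, matching the behavior described below.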
The dark channel feature is a combined measurement of clarity, saturation, and hue composition. Since the dark channel is essentially a minimum filter on the RGB channels, blurring the image averages the channel values locally and thus increases the response of the minimum filter. Figure 7 (c) shows that the dark channel value of an image increases with the degree to which it is blurred. The subject areas of low depth of field images show lower dark channel values than the background, as shown in Figure 7 (a). For pixels of the same hue value, those with higher saturation give lower dark channel values (Figure 7 (d)). As shown in Figure 7 (b), low-quality photographs with dull colors give higher average dark channel values. In addition, different hue values give different dark channel values (Figure 7 (d)). So the dark channel feature also incorporates hue composition information.
5.2. Human based Feature
Faces in high-quality human portraits usually occupy a reasonable portion of the photo, have high clarity, and show professional use of lighting. Therefore, we extract the ratio of face areas, the average lighting of faces, the ratio of shadow areas, and the face clarity as features to assess the quality of human photos.
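The face clarity term is described elsewhere in the paper as the ratio of the area of high frequency components to that of all frequency components, computed through the Fourier transform. A sketch of that measure follows; the frequency cut-off and the magnitude threshold used to decide which frequency bins count as "present" are our own assumptions:

```python
import numpy as np

def face_clarity(gray_face, freq_thresh=0.25, mag_ratio=0.01):
    """Ratio of the high-frequency area to the total active frequency
    area of a grayscale face region, via the 2-D FFT. Bins whose
    magnitude exceeds mag_ratio * max count as active; active bins at
    a normalized radius above freq_thresh count as high frequency."""
    f = np.fft.fftshift(np.fft.fft2(gray_face))
    mag = np.abs(f)
    h, w = mag.shape
    yy, xx = np.meshgrid(np.linspace(-1, 1, h), np.linspace(-1, 1, w),
                         indexing="ij")
    radius = np.sqrt(yy ** 2 + xx ** 2)   # normalized spatial frequency
    active = mag > mag_ratio * mag.max()
    high = active & (radius > freq_thresh)
    return float(high.sum()) / float(active.sum())
```

A sharp face spreads energy into high spatial frequencies and scores high; a flat or defocused face concentrates energy near the DC component and scores near zero.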

Citations
Proceedings ArticleDOI

AVA: A large-scale database for aesthetic visual analysis

TL;DR: A new large-scale database for conducting Aesthetic Visual Analysis: AVA, which contains over 250,000 images along with a rich variety of meta-data including a large number of aesthetic scores for each image, semantic labels for over 60 categories as well as labels related to photographic style is introduced.
Journal ArticleDOI

Learning a No-Reference Quality Assessment Model of Enhanced Images With Big Data

TL;DR: A new no-reference (NR) IQA model is developed and a robust image enhancement framework is established based on quality optimization, which can well enhance natural images, low-contrast images,Low-light images, and dehazed images.
Journal ArticleDOI

No-Reference Image Sharpness Assessment in Autoregressive Parameter Space

TL;DR: A new no-reference (NR)/ blind sharpness metric in the autoregressive (AR) parameter space is established via the analysis of AR model parameters, first calculating the energy- and contrast-differences in the locally estimated AR coefficients in a pointwise way, and then quantifying the image sharpness with percentile pooling to predict the overall score.
Journal ArticleDOI

A Deep Network Solution for Attention and Aesthetics Aware Photo Cropping

TL;DR: A neural network that has two branches for attention box prediction (ABP) and aesthetics assessment (AA) that produces high-quality cropping results, even with the limited availability of training data for photo cropping.
Proceedings ArticleDOI

Content-based photo quality assessment

TL;DR: This paper divides the photos into seven categories based on their content and develops a set of new subject area extraction methods and new visual features, which are specially designed for different categories, which significantly outperform the state-of-the-art methods.
References
Proceedings ArticleDOI

Histograms of oriented gradients for human detection

TL;DR: It is shown experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection, and the influence of each stage of the computation on performance is studied.
Journal ArticleDOI

The Pascal Visual Object Classes (VOC) Challenge

TL;DR: The state-of-the-art in evaluated methods for both classification and detection are reviewed, whether the methods are statistically different, what they are learning from the images, and what the methods find easy or confuse.
Journal ArticleDOI

A model of saliency-based visual attention for rapid scene analysis

TL;DR: In this article, a visual attention system inspired by the behavior and the neuronal architecture of the early primate visual system is presented, where multiscale image features are combined into a single topographical saliency map.

Journal ArticleDOI

Single Image Haze Removal Using Dark Channel Prior

TL;DR: A simple but effective image prior - dark channel prior to remove haze from a single input image is proposed, based on a key observation - most local patches in haze-free outdoor images contain some pixels which have very low intensities in at least one color channel.
Frequently Asked Questions (13)
Q1. What contributions have the authors mentioned in the paper "Content-based photo quality assessment"?

In this paper, the authors propose content-based photo quality assessment using regional and global features. Under this framework, subject areas, which draw the most attention of human eyes, are first extracted. Since professional photographers may adopt different photographic techniques and may have different aesthetic criteria in mind when taking different types of photos (e.g., landscape versus portrait), the authors propose to segment regions and extract visual features in different ways according to the categorization of photo content. Therefore the authors divide the photos into seven categories based on their content and develop a set of new subject area extraction methods and new visual features, which are specially designed for different categories. Another contribution of this work is to construct a large and diversified benchmark database for the research of photo quality assessment.

The authors will leave the integration of automatic photo categorization and quality assessment as the future work. 

For landscape photos, well balanced spatial structure, professional hue composition, and proper lighting are considered as traits of professional photography. 

The authors adopt a layout based method [9] to segment vertical standing objects, which are treated as subject areas by us, in photos from the categories of “landscape” and “architecture”. 

Photos are manually divided into seven categories based on photo content: “animal”, “plant”, “static”, “architecture”, “landscape”, “human”, and “night”. 

The clarity of face regions is computed through the Fourier transform by measuring the ratio of the area of high frequency components to that of all frequency components.

Wong et al. [20] and Nishiyama et al. [14] used saliency map to extract the subject areas, which were assumed to have higher brightness and contrast than other regions. 

Based on the extracted hue composition features, two Gaussian mixture models are trained separately for the two types of templates.

Existing methods of assessing photo quality from the aesthetic point of view can be generally classified into using global features and using regional features. 

In this paper, the authors propose content based photo quality assessment together with a set of new subject area detection methods, new global and regional features. 

The likelihood ratio P (I|high)/P (I|low) of a photo being high-quality or low-quality can be computed from the Gaussian mixture models and is used for classification. 

Since dark channel is essentially a minimum filter on RGB channels, blurring the image would average the channel values locally and thus increase the response of the minimum filter. 

Their proposed face features are very effective for “human” photos and improved the best performance achieved by previous features (0.78) to 0.95.