
Content-Based Photo Quality Assessment

Abstract
Automatically assessing photo quality from the perspective of visual aesthetics is of great interest in high-level vision research and has drawn much attention in recent years. In this paper, we propose content-based photo quality assessment using both regional and global features. Under this framework, subject areas, which draw the most attention of human eyes, are first extracted. Then regional features extracted from both subject areas and background regions are combined with global features to assess photo quality. Since professional photographers adopt different photographic techniques and have different aesthetic criteria in mind when taking different types of photos (e.g., landscape versus portrait), we propose to segment subject areas and extract visual features in different ways according to the variety of photo content. We divide the photos into seven categories based on their visual content and develop a set of new subject area extraction methods and new visual features specially designed for different categories. The effectiveness of this framework is supported by extensive experimental comparisons of existing photo quality assessment approaches as well as our new features on different categories of photos. In addition, we propose an approach of online training an adaptive classifier to combine the proposed features according to the visual content of a test photo without knowing its category. Another contribution of this work is to construct a large and diversified benchmark dataset for the research of photo quality assessment. It includes 17,673 photos with manually labeled ground truth. This new benchmark dataset can be downloaded at http://mmlab.ie.cuhk.edu.hk/CUHKPQ/Dataset.htm.


Content-Based Photo Quality Assessment
Wei Luo¹, Xiaogang Wang²,³, and Xiaoou Tang¹,³
¹ Department of Information Engineering, The Chinese University of Hong Kong
² Department of Electronic Engineering, The Chinese University of Hong Kong
³ Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, China
lw010@ie.cuhk.edu.hk xgwang@ee.cuhk.edu.hk xtang@ie.cuhk.edu.hk
Abstract
Automatically assessing photo quality from the perspective of visual aesthetics is of great interest in high-level vision research and has drawn much attention in recent years. In this paper, we propose content-based photo quality assessment using regional and global features. Under this framework, subject areas, which draw the most attention of human eyes, are first extracted. Then regional features extracted from subject areas and the background regions are combined with global features to assess photo quality. Since professional photographers may adopt different photographic techniques and may have different aesthetic criteria in mind when taking different types of photos (e.g., landscape versus portrait), we propose to segment regions and extract visual features in different ways according to the categorization of photo content. Therefore we divide the photos into seven categories based on their content and develop a set of new subject area extraction methods and new visual features, which are specially designed for different categories. This argument is supported by extensive experimental comparisons of existing photo quality assessment approaches as well as our new regional and global features over different categories of photos. Our new features significantly outperform the state-of-the-art methods. Another contribution of this work is to construct a large and diversified benchmark database for the research of photo quality assessment. It includes 17,613 photos with manually labeled ground truth.
1. Introduction
Automatic assessment of photo quality based on aesthetic perception has gained increasing interest in the computer vision community. It has important applications. For example, when users search images on the web, they expect
(This work is partially supported by the Research Grants Council of Hong Kong SAR, Grant No. 416510.)
(a) (b) (c)
Figure 1. Subject areas of photos. (a) Close-up for a bird. (b)
Architecture. (c) Human portrait.
the search engine to rank the retrieved images according to their relevance to the queries as well as their quality. Various methods of automatic photo quality assessment have been proposed in recent years [16, 18, 11, 5, 12, 20, 10]. In early works, only global visual features, such as global edge distribution and exposure, were used [11]. However, later studies [5, 12, 20] showed that regional features lead to better performance, since human beings perceive subject areas differently from the background (see examples in Figure 1). After extracting the subject areas, which draw the most attention of human eyes, regional features are extracted from the subject areas and the background separately and are used for assessing photo quality. Both regional and global features will be used in our work.
One major problem with the existing methods is that they treat all photos equally without considering the diversity in photo content. It is known that professional photographers adopt different photographic techniques and have different aesthetic criteria in mind when taking different types of photos [2, 19]. For example, for close-up photographs (e.g., Figure 1 (a)), viewers appreciate the high contrast between the foreground and background regions. In human portrait photography (e.g., Figure 1 (c)), professional photographers use special lighting settings [6] to create aesthetically pleasing patterns on human faces. For landscape photos, well balanced spatial structure, professional hue composition, and proper lighting are considered as traits of professional photography.
Also, the subject areas of different types of photos should

Figure 2. Photos divided into seven categories according to content (landscape, plant, animal, night, human, static, architecture). First row: high quality photos; second row: low quality photos.
be extracted in different ways. In a close-up photo, the subject area is emphasized using the low depth of field technique, which leads to a blurred background and clear foreground. However, in human portrait photos, the background does not have to be blurred, since the attention of viewers is automatically attracted by the presence of human faces. Their subject areas can be better detected by a face detector. In landscape photos, it is usually the case that the entire scene is clear and tidy. Their subject areas, such as mountains, houses, and plants, are often vertical standing objects. This can be used as a cue to extract subject areas in this type of photo.
1.1. Our Approach
Motivated by these considerations, we propose content-based photo quality assessment. Photos are manually divided into seven categories based on photo content: “animal”, “plant”, “static”, “architecture”, “landscape”, “human”, and “night”. See examples in Figure 2. Regional and global features are selected and combined in different ways when assessing photos in different categories. More specifically, we propose three methods of extracting subject areas.
- Clarity based region detection combines blur kernel estimation with image segmentation to accurately extract the clear region as the subject area.
- Layout based region detection analyzes the layout structure of a photo and extracts vertical standing objects.
- Human based detection locates faces in the photo with a face detector or a human detector.
Based on the extracted subject areas, three types of new regional features are proposed.
- Dark channel feature measures the clearness and the colorfulness of the subject areas.
- Complexity features use the numbers of segmentations to measure the spatial complexity of the subject area and the background.
- Human based features capture the clarity, brightness, and lighting effects of human faces.
In addition, two types of new global features are proposed.
- Hue composition feature fits photos with color composition schemes.
- Scene composition features capture the spatial structures of photos from semantic lines.
These new methods and features are introduced in Sections 3-5, which emphasize the dark channel feature, hue composition feature, and human based features, since they lead to the best performance in most categories. Through extensive experiments on a large and diverse benchmark database, the effectiveness of different subject area extraction methods and different features on different photo categories is summarized in Table 1. These features are combined by an SVM trained on each of the categories separately. Experimental comparisons show that our proposed new features significantly outperform existing features. To the best of our knowledge, this is the first systematic study of photo quality features on different photo categories.
2. Related Work
Existing methods of assessing photo quality from the aesthetic point of view can be generally classified into those using global features and those using regional features. Tong et al. [18] used boosting to combine global low-level features for the classification of professional and amateurish photos. However, these features were not specially designed for photo quality assessment. To better mimic human aesthetic perception, Ke et al. [11] designed a set of high-level semantic features based on rules of thumb of photography. They measured the global distributions of edges, blurriness, hue, and brightness.
Some approaches employed regional features by detecting subject areas, since human beings perceive subject areas differently from the background. Datta et al. [5] divided a photo into 3 × 3 blocks and assumed that the central block

Figure 3. (a1) and (b1) are input photos. (a2) is the subject area (green rectangle) extracted by the method in [12]. The green rectangle cannot accurately represent the subject area. (b2) is the saliency map with the subject area (red regions) extracted by the method in [20]. Because of the very high brightness in the red regions, other subject areas are ignored. (c1) and (c2) are the subject areas (white regions) extracted by our clarity based region detection method described in Section 4.1.
is the subject area. Luo et al. [12] assumed that in a high quality photo the subject area has a higher clarity than the background. Therefore, clarity based criteria were used to detect the subject area, which was fitted by a rectangle. Visual features of clarity contrast, lighting contrast, and geometry composition extracted from the subject areas and the background were used as regional features. Although it worked well on some types of photos, such as “animal”, “plant”, and “static”, it might fail on photos of “architecture” and “landscape”, whose subject areas and background both have high clarity. Also, a rectangle is not an accurate representation of the subject area and may decrease the performance. Wong et al. [20] and Nishiyama et al. [14] used saliency maps to extract the subject areas, which were assumed to have higher brightness and contrast than other regions. However, if a certain part of the subject area has very high brightness and contrast, other parts will be ignored by this method. See examples in Figure 3.
3. Global Features
Professionals follow certain rules of color composition and scene composition to produce aesthetically pleasing photographs. For example, photographers focus on artistic color combinations and properly place color accents to create unique compositions and to invoke certain feelings among the viewers of their artworks. They also try to arrange objects in the scene according to empirical guidelines like the “rule of thirds”. Based on these techniques of photography composition, we propose two global features to measure the quality of hue composition and scene composition.
3.1. Hue Composition Feature
Proper arrangement of colors engages viewers and creates an inner sense of order and balance. Major color templates [13, 17] can be classified as subordination and coordination. Subordination requires the photographer to set a dominant color spot and to arrange the rest of the colors to correlate with it in harmony or contrast. It includes certain color schemes, such as the 90° color scheme and the Complementary color scheme, which lead to aesthetically pleasing images. With coordination, the color composition is created with the help of different gradations of one single color. It includes the Monochromatic color scheme and the Analogous color scheme. See examples in Figure 4.
Color templates can be mathematically approximated on the color wheel as shown in Figure 4. A coordination color scheme can be approximated by a single sector with center α1 and width w1 (Figure 4 (a)). A subordination color scheme can be approximated by two sectors with centers (α1, α2) and widths (w1, w2) (Figure 4 (d)). Although it is possible to assess photo quality by fitting the color distribution of a photo to some manually defined color templates, our experimental results show that such an approach is suboptimal. It cannot automatically adapt to different types of photos either. We choose to learn the models of hue composition from training data. The models of hue composition for high- and low-quality photos are learned separately. The learning steps are described below.
Given an image I, we first decide whether it should be fitted by a color template with a single sector (T_1) or two sectors (T_2) by computing the following metric,

E_k(I) = \min_{T_k} \sum_{i \in I} D(H(i), T_k) \cdot S(i) + \lambda A(T_k)

where k = 1, 2. i is a pixel on I. H(i) and S(i) are the hue and saturation of pixel i. D(H(i), T_k) is zero if H(i) falls in a sector of the template; otherwise it is calculated as the arc-length distance of H(i) to the closest sector border. A(T_k) is the total width of the sectors (A(T_1) = w1 and A(T_2) = w1 + w2). λ is empirically set to 0.03. E_k(I) is calculated by fitting the template T_k, which has adjustable parameters, to image I. T_1 is controlled by parameters (α1, w1) and T_2 is controlled by parameters (α1, w1, α2, w2). This metric is inspired by the color harmony function [3]. However, we assume that the width of each sector is changeable and add a penalty on it. The single-sector template is chosen if E_1(I) < E_2(I), and vice versa.
If I is fitted with a single-sector template, the average saturation s1 of the pixels inside this sector is computed. s1 and α1, the hue center of the fitting sector, are used as the hue composition features of this photo. If I is fitted with a two-sector template, a four-dimensional feature vector (α1, s1, α2, s2), which includes the hue centers and average saturations, is extracted from the two sectors. Based on the

Figure 4. Harmonic templates on the hue wheel used in [3] (Monochromatic, Analogous, Complementary, and 90 degree schemes; each sector is parameterized by a center α and a width ω). An image is considered harmonic if most of its hues fall within the gray sector(s) on the template. The shapes of the templates are fixed, but a template may be rotated by an arbitrary angle. The templates correspond to different color schemes.
extracted hue composition features, two Gaussian mixture models are trained separately for the two types of templates. Examples of training results for high-quality photos in the category “landscape” are shown in Figure 5. Among 410 training photos, 83 are fitted with single-sector templates and 327 are fitted with two-sector templates. Three Gaussian mixture components are used to model the hue composition features of photos belonging to single-sector templates. Two Gaussian mixture components are used to model the hue composition features of photos belonging to two-sector templates. The photo best fitting each of the mixture components is shown in Figure 5. We find some interesting correlations between the learned components and the color schemes. For example, the components in Figure 5 (a) and (b) correlate more with the monochromatic schemes centered at red and yellow. The components in Figure 5 (c) and (e) correlate more with the analogous color scheme and the complementary color scheme.
The likelihood ratio P(I|high)/P(I|low) of a photo being high-quality or low-quality can be computed from the Gaussian mixture models and is used for classification.
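The classification step can be sketched with a small numpy-only diagonal-covariance GMM scorer. The paper trains its mixtures on the hue composition features; the toy parameters in the usage below are purely illustrative:

```python
import numpy as np

def gmm_loglik(x, weights, means, variances):
    """Log-likelihood of a feature vector x under a diagonal-covariance
    Gaussian mixture model given as (weights, means, variances)."""
    x = np.asarray(x, dtype=float)
    logps = []
    for w, mu, var in zip(weights, means, variances):
        mu = np.asarray(mu, dtype=float)
        var = np.asarray(var, dtype=float)
        ll = -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mu) ** 2 / var)
        logps.append(np.log(w) + ll)
    m = max(logps)  # log-sum-exp for numerical stability
    return m + np.log(sum(np.exp(p - m) for p in logps))

def classify(x, gmm_high, gmm_low):
    """True if P(x | high) > P(x | low), i.e. the likelihood ratio
    P(x|high)/P(x|low) exceeds 1."""
    return gmm_loglik(x, *gmm_high) > gmm_loglik(x, *gmm_low)
```

In practice the two mixtures would be fitted with EM (e.g., scikit-learn's `GaussianMixture`) on the hue composition features of high- and low-quality training photos separately.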
3.2. Scene Composition Feature
High quality photos show well-arranged spatial composition to hold the viewer's attention. In such photos, long continuous lines often bear semantic meanings, such as the horizon or the surface of water. They can be used to compute scene composition features. For example, the location of the horizon in outdoor photos was used by Bhattacharya et al. [1] to assess visual balance. We characterize scene composition by analyzing the locations and orientations of semantic lines. The prominent lines in photos are extracted by the Hough transform and are classified into horizontal lines and vertical lines. Our scene composition features include the average orientations of horizontal lines and vertical lines, the average vertical position of horizontal lines, and the average horizontal position of vertical lines.
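A minimal sketch of these four features, assuming the prominent lines have already been detected (e.g., by OpenCV's Hough transform) and are passed in as endpoint pairs. The 45° split between horizontal and vertical lines and the normalization of positions to [0, 1] are our assumptions:

```python
import numpy as np

def scene_composition_features(lines, width, height):
    """lines: iterable of (x1, y1, x2, y2) prominent line segments.
    Returns (mean horizontal-line angle, mean vertical-line angle,
             mean y of horizontal lines, mean x of vertical lines),
    with positions normalized to [0, 1]; NaN if a group is empty."""
    h_ang, v_ang, h_y, v_x = [], [], [], []
    for x1, y1, x2, y2 in lines:
        ang = np.degrees(np.arctan2(y2 - y1, x2 - x1)) % 180.0
        if ang < 45.0 or ang > 135.0:        # near-horizontal line
            h_ang.append(min(ang, 180.0 - ang))
            h_y.append((y1 + y2) / 2.0 / height)
        else:                                 # near-vertical line
            v_ang.append(abs(90.0 - ang))
            v_x.append((x1 + x2) / 2.0 / width)
    return tuple(float(np.mean(v)) if v else float("nan")
                 for v in (h_ang, v_ang, h_y, v_x))
```

For instance, a horizon line sitting at 60% of the image height yields a mean vertical position of 0.6, which a downstream classifier can compare against compositional guidelines such as the rule of thirds.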
Figure 5. (a), (b), (c): Mixture components for images best fitted with single-sector templates. The color wheels on the top right show the mixture components; the center and width of each gray sector are set to the mean and standard deviation of the corresponding mixture component. The color wheels on the bottom right show the hue histograms of the images. (d), (e): Mixture components for images best fitted with two-sector templates.
4. Subject Area Extraction Methods
The way to detect subject areas in photos depends on photo content. When taking close-up photos of animals, plants, and static objects, photographers often use a macro lens to focus on the main subjects, such that photos are clear on the main subjects and blurred in other areas. For human portraits, viewers' attention is often attracted by human faces. In outdoor photography, architecture, mountains, and trees are often the main subjects.
We propose a clarity based method to find clear regions in low depth of field images, which make up the majority of high-quality photographs in the categories of “animal”, “plant”, and “static”. We adopt a layout based method [9] to segment vertical standing objects, which we treat as subject areas, in photos from the categories of “landscape” and “architecture”. For photos in the category of “human”, we use a human detector and a face detector to locate faces.
4.1. Clarity based region detection
A clarity based subject area detection method was proposed in [12]. Since it used a rectangle to represent the subject area and fitted it to pixels with high clarity, the detection results were not accurate. We improve the accuracy by oversegmentation. We first obtain a mask U0 of the clear area using a method proposed in [12], which labels each pixel as clear or blurred. The mask is improved by an iterative

Figure 6. (a): From top downwards: the input photo; the result of the clarity based detector (white region); the result of the layout based detector (red region). (b), (c): First row: face and human detection results. Second row: clarity based detection results.
procedure. A pixel is labeled as clear if it falls in the convex hull of its neighboring pixels labeled as clear. This step repeats until convergence. Then the photo is segmented into super-pixels [15]. A super-pixel is labeled as clear if more than half of its pixels are labeled as clear. A comparison of the method in [12] and ours can be found in Figure 3.
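The refinement can be sketched as follows. For simplicity, this sketch replaces the per-pixel convex-hull test with a separable approximation (a pixel turns clear when clear pixels flank it both horizontally and vertically within a small radius), and it takes the super-pixel segmentation as a given integer label map rather than computing it with [15]:

```python
import numpy as np

def refine_clear_mask(mask, radius=2, max_iter=20):
    """Iteratively grow the clear-pixel mask U0. A pixel becomes clear
    when clear pixels flank it both horizontally and vertically within
    `radius` pixels -- a separable approximation of the convex-hull
    test in the paper. Repeats until convergence."""
    m = mask.astype(bool).copy()
    for _ in range(max_iter):
        left = np.zeros_like(m)
        right = np.zeros_like(m)
        up = np.zeros_like(m)
        down = np.zeros_like(m)
        for r in range(1, radius + 1):
            left[:, r:] |= m[:, :-r]
            right[:, :-r] |= m[:, r:]
            up[r:, :] |= m[:-r, :]
            down[:-r, :] |= m[r:, :]
        new = m | (left & right & up & down)
        if np.array_equal(new, m):
            break
        m = new
    return m

def label_superpixels(mask, segments):
    """A super-pixel is labeled clear if more than half of its pixels
    are labeled clear in `mask`. `segments` assigns a super-pixel id
    to every pixel."""
    out = np.zeros_like(mask, dtype=bool)
    for s in np.unique(segments):
        sel = segments == s
        out[sel] = mask[sel].mean() > 0.5
    return out
```

The majority vote over super-pixels is what snaps the noisy per-pixel mask to segment boundaries, replacing the coarse rectangle of [12].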
4.2. Layout based region detection
Hoiem et al. [9] proposed a method to recover the surface layout from an outdoor image. The scene is segmented into sky regions, ground regions, and vertical standing objects, as shown in Figure 6. We take the vertical standing objects as subject areas.
4.3. Human based region detection
We employ face detection [21] to extract faces from human photos. For images where face detection fails, we use human detection [4] to roughly estimate the locations of faces.
faces. See examples in Figure 6.
5. Regional Features
We have developed new regional features to work together with our proposed subject area detectors. We propose a new dark channel feature to measure both the clarity and the colorfulness of the subject areas. We also specially design a set of features for “human” photos to measure the clarity, brightness, and lighting effects of faces. New features are also proposed to measure the complexities of the subject areas and the background.
5.1. Dark Channel Feature
Dark channel was introduced by He et al. [7, 8] for haze removal. The dark channel of an image I is defined as:

I^{dark}(i) = \min_{c \in \{R,G,B\}} \Big( \min_{i' \in \Omega(i)} I^c(i') \Big)
Figure 7. (a) A close-up of a plant and its dark channel. (b) Landscape photographs with different color compositions. (c) Average dark channel value of the input photo from (a) blurred by Gaussian kernels of increasing size. (d) For each point on the circle: its hue is indicated by the hue wheel, its saturation is equal to the radius, and the normalized dark channel value is represented by the pixel intensity.
where I^c is a color channel of I and Ω(i) is the neighborhood of pixel i. We choose Ω(i) as a 10 × 10 local patch. We normalize the dark channel value by the sum of the RGB channels to reduce the effect of brightness. The dark channel feature of a photo I is computed as the average of the normalized dark channel values in the subject area:

\frac{1}{\|S\|} \sum_{i \in S} \frac{I^{dark}(i)}{\sum_{c \in \{R,G,B\}} I^c(i)}

with S the subject area of I.
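Following the two formulas above, the feature can be sketched directly in numpy. The 10 × 10 patch follows the paper; the small epsilon guarding against division by zero is our own addition:

```python
import numpy as np

def dark_channel(img, patch=10):
    """I_dark(i): minimum over the RGB channels and over a
    patch x patch neighborhood Omega(i). img is H x W x 3, float."""
    per_pixel_min = img.min(axis=2)
    h, w = per_pixel_min.shape
    pad = patch // 2
    padded = np.pad(per_pixel_min, pad, mode="edge")
    out = np.empty_like(per_pixel_min)
    for y in range(h):
        for x in range(w):
            out[y, x] = padded[y:y + patch, x:x + patch].min()
    return out

def dark_channel_feature(img, subject_mask, patch=10):
    """Average dark channel value over the subject area S, with each
    pixel normalized by the sum of its RGB channels to reduce the
    influence of brightness."""
    dc = dark_channel(img, patch)
    rgb_sum = img.sum(axis=2) + 1e-8  # guard against division by zero
    vals = (dc / rgb_sum)[subject_mask.astype(bool)]
    return float(vals.mean())
```

A saturated subject (at least one near-zero channel per pixel) yields a feature near 0, while a dull gray subject yields a higher value, matching the behavior described below.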
The dark channel feature is a combined measurement of clarity, saturation, and hue composition. Since the dark channel is essentially a minimum filter on the RGB channels, blurring the image averages the channel values locally and thus increases the response of the minimum filter. Figure 7 (c) shows that the dark channel value of an image increases with the degree to which it is blurred. The subject areas of low depth of field images show lower dark channel values than the background, as shown in Figure 7 (a). For pixels of the same hue value, those with higher saturation give lower dark channel values (Figure 7 (d)). As shown in Figure 7 (b), low-quality photographs with dull colors give higher average dark channel values. In addition, different hue values give different dark channel values (Figure 7 (d)). So the dark channel feature also incorporates hue composition information.
5.2. Human based Feature
Faces in high-quality human portraits usually occupy a reasonable portion of the photo, have high clarity, and show professional use of lighting. Therefore, we extract the ratio of face areas, the average lighting of faces, the ratio of shadow areas, and the face clarity as features to assess the quality of human photos.
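The face clarity term is described elsewhere in the paper as the ratio of the area of high frequency components to that of all frequency components, computed through the Fourier transform. A sketch of that measure follows; the frequency cut-off and the magnitude threshold used to decide which frequency bins count as "present" are our own assumptions:

```python
import numpy as np

def face_clarity(gray_face, freq_thresh=0.25, mag_ratio=0.01):
    """Ratio of the high-frequency area to the total active frequency
    area of a grayscale face region, via the 2-D FFT. Bins whose
    magnitude exceeds mag_ratio * max count as active; active bins at
    a normalized radius above freq_thresh count as high frequency."""
    f = np.fft.fftshift(np.fft.fft2(gray_face))
    mag = np.abs(f)
    h, w = mag.shape
    yy, xx = np.meshgrid(np.linspace(-1, 1, h), np.linspace(-1, 1, w),
                         indexing="ij")
    radius = np.sqrt(yy ** 2 + xx ** 2)   # normalized spatial frequency
    active = mag > mag_ratio * mag.max()
    high = active & (radius > freq_thresh)
    return float(high.sum()) / float(active.sum())
```

A sharp face spreads energy into high spatial frequencies and scores high; a flat or defocused face concentrates energy near the DC component and scores near zero.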

Citations
Proceedings ArticleDOI

AVA: A large-scale database for aesthetic visual analysis

TL;DR: A new large-scale database for conducting Aesthetic Visual Analysis: AVA, which contains over 250,000 images along with a rich variety of meta-data including a large number of aesthetic scores for each image, semantic labels for over 60 categories as well as labels related to photographic style is introduced.
Journal ArticleDOI

Learning a No-Reference Quality Assessment Model of Enhanced Images With Big Data

TL;DR: A new no-reference (NR) IQA model is developed and a robust image enhancement framework is established based on quality optimization, which can well enhance natural images, low-contrast images,Low-light images, and dehazed images.
Journal ArticleDOI

No-Reference Image Sharpness Assessment in Autoregressive Parameter Space

TL;DR: A new no-reference (NR)/ blind sharpness metric in the autoregressive (AR) parameter space is established via the analysis of AR model parameters, first calculating the energy- and contrast-differences in the locally estimated AR coefficients in a pointwise way, and then quantifying the image sharpness with percentile pooling to predict the overall score.
Journal ArticleDOI

A Deep Network Solution for Attention and Aesthetics Aware Photo Cropping

TL;DR: A neural network that has two branches for attention box prediction (ABP) and aesthetics assessment (AA) that produces high-quality cropping results, even with the limited availability of training data for photo cropping.
Proceedings ArticleDOI

Content-based photo quality assessment

TL;DR: This paper divides the photos into seven categories based on their content and develops a set of new subject area extraction methods and new visual features, which are specially designed for different categories, which significantly outperform the state-of-the-art methods.
References
Proceedings ArticleDOI

Histograms of oriented gradients for human detection

TL;DR: It is shown experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection, and the influence of each stage of the computation on performance is studied.
Journal ArticleDOI

The Pascal Visual Object Classes (VOC) Challenge

TL;DR: The state-of-the-art in evaluated methods for both classification and detection are reviewed, whether the methods are statistically different, what they are learning from the images, and what the methods find easy or confuse.
Journal ArticleDOI

A model of saliency-based visual attention for rapid scene analysis

TL;DR: In this article, a visual attention system inspired by the behavior and the neuronal architecture of the early primate visual system is presented, where multiscale image features are combined into a single topographical saliency map.

Journal ArticleDOI

Single Image Haze Removal Using Dark Channel Prior

TL;DR: A simple but effective image prior - dark channel prior to remove haze from a single input image is proposed, based on a key observation - most local patches in haze-free outdoor images contain some pixels which have very low intensities in at least one color channel.
Frequently Asked Questions (13)
Q1. What contributions have the authors mentioned in the paper "Content-based photo quality assessment"?

In this paper, the authors propose content-based photo quality assessment using regional and global features. Under this framework, subject areas, which draw the most attention of human eyes, are first extracted. Since professional photographers may adopt different photographic techniques and may have different aesthetic criteria in mind when taking different types of photos (e.g., landscape versus portrait), the authors propose to segment regions and extract visual features in different ways according to the categorization of photo content. Therefore the authors divide the photos into seven categories based on their content and develop a set of new subject area extraction methods and new visual features, which are specially designed for different categories. Another contribution of this work is to construct a large and diversified benchmark database for the research of photo quality assessment.

The authors will leave the integration of automatic photo categorization and quality assessment as the future work. 

For landscape photos, well balanced spatial structure, professional hue composition, and proper lighting are considered as traits of professional photography. 

The authors adopt a layout based method [9] to segment vertical standing objects, which are treated as subject areas by us, in photos from the categories of “landscape” and “architecture”. 

Photos are manually divided into seven categories based on photo content: “animal”, “plant”, “static”, “architecture”, “landscape”, “human”, and “night”. 

The clarity of face regions is computed through the Fourier transform by measuring the ratio of the area of high frequency components to that of all frequency components.

Wong et al. [20] and Nishiyama et al. [14] used saliency map to extract the subject areas, which were assumed to have higher brightness and contrast than other regions. 

Based on the extracted hue composition features, two Gaussian mixture models are trained separately for the two types of templates.

Existing methods of assessing photo quality from the aesthetic point of view can be generally classified into using global features and using regional features. 

In this paper, the authors propose content based photo quality assessment together with a set of new subject area detection methods, new global and regional features. 

The likelihood ratio P (I|high)/P (I|low) of a photo being high-quality or low-quality can be computed from the Gaussian mixture models and is used for classification. 

Since dark channel is essentially a minimum filter on RGB channels, blurring the image would average the channel values locally and thus increase the response of the minimum filter. 

Their proposed face features are very effective for “human” photos and improved the best performance achieved by previous features (0.78) to 0.95.