
High level describable attributes for predicting aesthetics and interestingness

TLDR
This paper demonstrates a simple, yet powerful method to automatically select high aesthetic quality images from large image collections, and shows that an aesthetics classifier trained on describable attributes provides a significant improvement over baseline methods for predicting human quality judgments.
Abstract
With the rise in popularity of digital cameras, the amount of visual data available on the web is growing exponentially. Some of these pictures are extremely beautiful and aesthetically pleasing, but the vast majority are uninteresting or of low quality. This paper demonstrates a simple, yet powerful method to automatically select high aesthetic quality images from large image collections. Our aesthetic quality estimation method explicitly predicts some of the possible image cues that a human might use to evaluate an image and then uses them in a discriminative approach. These cues or high level describable image attributes fall into three broad types: 1) compositional attributes related to image layout or configuration, 2) content attributes related to the objects or scene types depicted, and 3) sky-illumination attributes related to the natural lighting conditions. We demonstrate that an aesthetics classifier trained on these describable attributes can provide a significant improvement over baseline methods for predicting human quality judgments. We also demonstrate our method for predicting the “interestingness” of Flickr photos, and introduce a novel problem of estimating query specific “interestingness”.


Stony Brook University
The official electronic file of this thesis or dissertation is maintained by the University
Libraries on behalf of The Graduate School at Stony Brook University.
© All Rights Reserved by Author.

High Level Describable Attributes for Predicting Aesthetics and
Interestingness
A Thesis Presented
by
Sagnik Dhar
to
The Graduate School
in Partial Fulfillment of the
Requirements
for the Degree of
Master of Science
in
Computer Science
Stony Brook University
December 2010

Stony Brook University
The Graduate School
Sagnik Dhar
We, the thesis committee for the above candidate for the
Master of Science degree,
hereby recommend acceptance of this thesis.
Tamara Berg - Advisor
Assistant Professor, Department of Computer Science
Dimitris Samaras - Chairperson of Defense
Associate Professor, Department of Computer Science
Alexander Berg
Assistant Professor, Department of Computer Science
This thesis is accepted by the Graduate School
Lawrence Martin
Dean of the Graduate School

Abstract of the Thesis
High Level Describable Attributes for Predicting Aesthetics and
Interestingness
by
Sagnik Dhar
Master of Science
in
Computer Science
Stony Brook University
2010
With the rise in popularity of digital cameras, the amount of visual data available
on the web is growing exponentially. Some of these pictures are extremely beautiful
and aesthetically pleasing. Unfortunately, the vast majority are uninteresting or of
low quality. This thesis demonstrates a simple, yet powerful method to automatically
select high aesthetic quality images from large image collections with performance
significantly better than the state of the art. We also show significantly better results
on predicting the interestingness of Flickr images, and on a novel problem of predicting
query specific interestingness. Our aesthetic quality estimation method explicitly
predicts some of the possible image cues that a human might use to evaluate an
image and then uses them in a discriminative approach. These cues or high level
describable image attributes fall into three broad types: 1) compositional attributes
related to image layout or configuration, 2) content attributes related to the objects or
scene types depicted, and 3) sky-illumination attributes related to the natural lighting
conditions. We demonstrate that an aesthetics classifier trained on these describable
attributes can provide a significant improvement over state of the art methods for
predicting human quality judgments.
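The two-stage pipeline the abstract describes, predicting high-level attribute values for an image and then feeding them to a discriminative classifier, can be sketched as follows. This is a hypothetical illustration: the attribute names and training data are placeholders, and a simple logistic-regression scorer stands in for the SVM classifiers used in the thesis.

```python
import math

# Illustrative attribute set: placeholders for the three broad attribute
# types named in the abstract (compositional, content, sky-illumination).
ATTRIBUTES = [
    "rule_of_thirds",      # compositional: subject near a thirds line
    "low_depth_of_field",  # compositional: sharp subject, blurred background
    "contains_people",     # content: people depicted
    "outdoor_scene",       # content: scene type
    "clear_sky",           # sky-illumination
    "sunset_sky",          # sky-illumination
]

def attribute_vector(scores):
    """Order predicted attribute scores (a dict of values in [0, 1])
    into a fixed-length feature vector."""
    return [scores.get(name, 0.0) for name in ATTRIBUTES]

def train_aesthetics_classifier(vectors, labels, lr=0.1, epochs=200):
    """Fit a logistic-regression scorer on attribute vectors.

    A stand-in for the discriminative (SVM) stage: labels are 1 for
    high aesthetic quality, 0 for low."""
    w, b = [0.0] * len(ATTRIBUTES), 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))  # predicted quality
            g = p - y                       # log-loss gradient
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def aesthetics_score(w, b, x):
    """Probability-like aesthetic quality score in (0, 1)."""
    z = b + sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))
```

In the full system, the entries of the attribute vector would come from per-attribute classifiers run on the image; here any dict of scores stands in for those predictions.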

Dedicated to my mother, whose forays into the world of Architecture instilled in me
my earliest notions of aesthetics.

Citations
Proceedings ArticleDOI

AVA: A large-scale database for aesthetic visual analysis

TL;DR: Introduces AVA, a new large-scale database for conducting Aesthetic Visual Analysis, containing over 250,000 images together with rich meta-data: a large number of aesthetic scores for each image, semantic labels for over 60 categories, and labels related to photographic style.
Journal ArticleDOI

Salient Object Detection: A Survey

TL;DR: Provides a comprehensive review of recent progress in salient object detection and situates the field among closely related areas such as generic scene segmentation, object proposal generation, and saliency for fixation prediction.
Proceedings ArticleDOI

RAPID: Rating Pictorial Aesthetics using Deep Learning

TL;DR: Presents the RAPID (RAting PIctorial aesthetics using Deep learning) system, which adopts a novel deep neural network approach for automatic feature learning and uses style attributes of images to improve aesthetic quality categorization accuracy.
Journal ArticleDOI

Transient attributes for high-level understanding and editing of outdoor scenes

TL;DR: This work studies "transient scene attributes" -- high level properties which affect scene appearance, such as "snow", "autumn", "dusk", "fog", and defines 40 transient attributes and uses crowdsourcing to annotate thousands of images from 101 webcams to train regressors that can predict the presence of attributes in novel images.
Proceedings ArticleDOI

Streetscore -- Predicting the Perceived Safety of One Million Streetscapes

TL;DR: Studies the predictive power of commonly used image features using support vector regression, finding that Geometric Texton and Color Histograms, along with GIST, are the best performers at predicting the perceived safety of a streetscape.
References
Journal ArticleDOI

Distinctive Image Features from Scale-Invariant Keypoints

TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Proceedings ArticleDOI

Robust real-time face detection

TL;DR: A new image representation called the “Integral Image” is introduced which allows the features used by the detector to be computed very quickly and a method for combining classifiers in a “cascade” which allows background regions of the image to be quickly discarded while spending more computation on promising face-like regions.
Proceedings ArticleDOI

Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories

TL;DR: This paper presents a method for recognizing scene categories based on approximate global geometric correspondence that exceeds the state of the art on the Caltech-101 database and achieves high accuracy on a large database of fifteen natural scene categories.
Journal ArticleDOI

Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope

TL;DR: The performance of the spatial envelope model shows that specific information about object shape or identity is not a requirement for scene categorization and that modeling a holistic representation of the scene informs about its probable semantic category.
Journal ArticleDOI

Computational modelling of visual attention.

TL;DR: Five important trends have emerged from recent work on computational models of focal visual attention that emphasize the bottom-up, image-based control of attentional deployment, providing a framework for a computational and neurobiological understanding of visual attention.
Frequently Asked Questions (4)
Q1. What are the contributions in this paper?

This paper demonstrates a simple, yet powerful method to automatically select high aesthetic quality images from large image collections with performance significantly better than the state of the art. The authors also show significantly better results on predicting the interestingness of Flickr images, and on a novel problem of predicting query specific interestingness. Their aesthetic quality estimation method explicitly predicts some of the possible image cues that a human might use to evaluate an image and then uses them in a discriminative approach. The authors demonstrate that an aesthetics classifier trained on these describable attributes can provide a significant improvement over state of the art methods for predicting human quality judgments. 

In the future, the authors plan to expand their set of attributes to extract other describable image features, and to apply these attributes to related tasks such as image emotion estimation. The authors also plan to more thoroughly explore ideas of query specific interestingness, including methods for query specific attribute selection, and methods for interestingness transfer. The following application could potentially be very useful: classification of aesthetic quality is done using a measure of how much a test image deviates from the ideal image, and this measure of deviation could be used to suggest potential changes to the user/photographer to help improve the aesthetic quality of the photograph.
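The deviation-based suggestion idea can be sketched with a hypothetical linear model: treat the signed margin from a decision boundary as the deviation from the ideal image, and surface the attribute whose improvement would shrink that deviation the most. The attribute names, weights, and boundary below are made up for illustration, not taken from the thesis.

```python
# Hypothetical linear aesthetics model; all weights are illustrative.
WEIGHTS = {
    "rule_of_thirds": 1.4,
    "low_depth_of_field": 0.9,
    "clear_sky": 0.7,
    "opposing_colors": 0.5,
}
BIAS = -2.0  # places the decision boundary at a weighted score of 2.0

def deviation(scores):
    """Signed margin from the boundary: negative means the image falls on
    the low-aesthetic-quality side, and its magnitude measures how far."""
    return BIAS + sum(WEIGHTS[a] * scores.get(a, 0.0) for a in WEIGHTS)

def suggest_improvement(scores):
    """Return the attribute whose improvement would raise the margin most:
    weight times remaining headroom (1 - current score)."""
    return max(WEIGHTS, key=lambda a: WEIGHTS[a] * (1.0 - scores.get(a, 0.0)))
```

For example, a photo with good lighting but a dead-centered subject would get `rule_of_thirds` back as its suggestion, i.e. "recompose the shot".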

