A Survey on Representation, Composition and Application
of Preferences in Database Systems
KOSTAS STEFANIDIS, Chinese University of Hong Kong
GEORGIA KOUTRIKA, IBM Almaden Research Center
EVAGGELIA PITOURA, University of Ioannina
Preferences have been traditionally studied in philosophy, psychology, and economics and applied to decision
making problems. Recently, they have attracted the attention of researchers in other fields, such as databases
where they capture soft criteria for queries. Databases bring a whole fresh perspective to the study of
preferences, both computational and representational. From a representational perspective, the central
question is how we can effectively represent preferences and incorporate them in database querying. From
a computational perspective, we can look at how we can efficiently process preferences in the context of
database queries. Several approaches have been proposed but a systematic study of these works is missing.
The purpose of this survey is to provide a framework for placing existing works in perspective and highlight
critical open challenges to serve as a springboard for researchers in database systems. We organize our study
around three axes: preference representation, preference composition, and preference query processing.
Categories and Subject Descriptors: H.2.4 [Database Management]: Systems—Relational databases
General Terms: Algorithms, Design, Languages
Additional Key Words and Phrases: Preference modeling, preference queries
ACM Reference Format:
Stefanidis, K., Koutrika, G., and Pitoura, E. 2011. A survey on representation, composition and application
of preferences in database systems. ACM Trans. Datab. Syst. 36, 3, Article 19 (August 2011), 45 pages.
DOI = 10.1145/2000824.2000829 http://doi.acm.org/10.1145/2000824.2000829
1. INTRODUCTION
Preferences guide human decision making from early childhood (e.g., “which ice cream
flavor do you prefer?”) up to complex professional and organizational decisions (e.g.,
“which investment funds to choose?”). Preferences have traditionally been studied in
philosophy, psychology, and economics and applied to decision making problems. For
instance, in philosophy, they are used to reason about values and desires [Hansson
2001]. In mathematical decision theory, preferences (or utilities) model economic be-
havior [Fishburn 1999]. The notion of preference has in recent years drawn new atten-
tion from researchers in other fields, such as artificial intelligence, where they capture
agents’ goals [Boutilier et al. 1999; Delgrande et al. 2003; Wellman and Doyle 1991],
and databases, where they capture soft criteria for database queries. Explicit pref-
erence modeling provides a declarative way to choose among alternatives, whether
these are solutions of problems to solve, answers of database queries, decisions of a
computational agent and so on. In this article, we focus on preferences in the context
of database queries.
Authors’ addresses: K. Stefanidis, Department of Computer Science and Engineering, Chinese University
of Hong Kong, Hong Kong; email: kstef@cs.uoi.gr; G. Koutrika, IBM Almaden Research Center; E. Pitoura,
Department of Computer Science, University of Ioannina, Ioannina, Greece.
ACM Transactions on Database Systems, Vol. 36, No. 3, Article 19, Publication date: August 2011.
In databases, interest in preferences was triggered by observing the limitations of
the Boolean database answer model, where query criteria are considered as hard by
default and a nonempty answer is returned only if it satisfies all the query criteria.
In this context, a user can face either of two problems: (i) the empty-answer problem,
where the conditions are too restrictive or the data cannot exactly match the query or
(ii) the too-many-answers problem, where too many results match the query. It is hard
to cope with these problems, especially if a user is not familiar with a structured query
language in order to formulate accurate queries and when accessing Web databases,
whose schema and contents are unknown.
Incorporating soft criteria or preferences in a query can help cope with these problems.
The empty-answer problem can be tackled by relaxing some of the hard constraints in the
query, that is, by treating them as soft constraints (user wishes), or by replacing them
with constraints that capture preferences related to the given query; the results are then
ranked according to how well they match the modified query. The too-many-answers problem
can be tackled by strengthening the query with additional preferences to rank and possibly
focus the query results.
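As a rough illustration of the relaxation idea described above, the following Python sketch contrasts the Boolean answer model with a relaxed one that ranks rows by the number of conditions they satisfy. The rows, attribute names, and the counting-based ranking are assumptions made for this example; they are not taken from the surveyed systems.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Condition = Callable[[Row], bool]

def boolean_answer(rows: List[Row], hard: List[Condition]) -> List[Row]:
    """Boolean model: keep only the rows that satisfy every condition."""
    return [r for r in rows if all(c(r) for c in hard)]

def relaxed_answer(rows: List[Row], soft: List[Condition]) -> List[Row]:
    """Treat every condition as soft and rank rows by how many they satisfy."""
    scored = [(sum(c(r) for c in soft), r) for r in rows]
    matching = [(s, r) for s, r in scored if s > 0]
    # sorted() is stable, so rows with equal counts keep their input order.
    return [r for s, r in sorted(matching, key=lambda p: -p[0])]

# Hypothetical rows; this is not the database instance of Figure 2.
movies = [
    {"title": "M1", "genre": "western", "year": 1995, "director": "D2"},
    {"title": "M2", "genre": "comedy", "year": 2005, "director": "D1"},
    {"title": "M3", "genre": "western", "year": 2008, "director": "D2"},
]
conditions = [
    lambda r: r["genre"] == "western",
    lambda r: r["year"] > 2000,
    lambda r: r["director"] == "D1",
]

strict = boolean_answer(movies, conditions)    # empty: no row meets all three
relaxed = relaxed_answer(movies, conditions)   # every row, best matches first
```

Under the Boolean model the query returns nothing, while the relaxed version returns all three rows, with the two rows that satisfy two conditions ahead of the row that satisfies one.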
The study of preference queries in databases originated with Lacroix and Lavency
[1987], who proposed a simple extension of the relational calculus in which preferences
for tuples satisfying given logical conditions can be expressed. For instance, one
could say: pick the tuples of $R$ satisfying $Q \wedge P_1 \wedge P_2$; if the result is empty, pick
the tuples satisfying $Q \wedge P_1 \wedge \neg P_2$; if the result is empty, pick the tuples satisfying
$Q \wedge \neg P_1 \wedge P_2$. Gaasterland and Lobo [1994] introduced a simple formalism, where a
user provides a lattice of domain-independent values that define preferences and a set
of domain-specific user constraints qualified with lattice values. The constraints are
automatically incorporated into a relational or deductive database through a series of
syntactic transformations that produces an annotated deductive database. Query an-
swering procedures for deductive databases are then used, with minor modifications,
to obtain annotated answers to queries. Almost a decade later, the Web has made in-
formation easily accessible and renewed interest in preferences was triggered by the
need to make (Web) databases more user-friendly.
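The cascading selection attributed above to Lacroix and Lavency can be sketched in Python as follows. This is only a minimal reading of the quoted example, not their calculus; the function name `cascade`, the rows, and the concrete predicates are hypothetical.

```python
def cascade(rows, q, p1, p2):
    """Return the first nonempty answer among the three preference levels."""
    levels = [
        lambda r: q(r) and p1(r) and p2(r),        # Q and P1 and P2
        lambda r: q(r) and p1(r) and not p2(r),    # Q and P1 and not P2
        lambda r: q(r) and not p1(r) and p2(r),    # Q and not P1 and P2
    ]
    for level in levels:
        answer = [r for r in rows if level(r)]
        if answer:
            return answer
    return []

# Hypothetical rows and predicates, chosen only to exercise the levels.
movies = [
    {"title": "M1", "genre": "western", "year": 2003, "lang": "fr"},
    {"title": "M2", "genre": "comedy", "year": 2005, "lang": "en"},
    {"title": "M3", "genre": "western", "year": 1990, "lang": "en"},
]
q = lambda r: r["year"] >= 2000          # hard condition Q
p1 = lambda r: r["genre"] == "western"   # preferred condition P1
p2 = lambda r: r["lang"] == "en"         # preferred condition P2

first = cascade(movies, q, p1, p2)        # level 1 is empty, level 2 matches M1
second = cascade(movies[1:], q, p1, p2)   # without M1, level 3 matches M2
```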
Several approaches have been proposed since then but a systematic study of them
is missing. It is the purpose of this article to provide a framework for studying various
approaches that deal with preferences in databases. In particular, our objective is to
survey in a holistic way approaches that: (i) define preferences, (ii) combine preferences,
and (iii) apply preferences to query processing. We organize our study around these
main axes as follows:
Preference Representation. Preferences naturally come in different flavors and people
may have a mix of different preferences. Works on preference modeling have
focused on different aspects of the problem but two main philosophies can be
distinguished on the basis of how preferences are formulated: qualitative approaches,
where preferences are expressed by comparing items (“I like westerns better than
comedies”) and quantitative ones, where a preference for a specific item is expressed
as a degree of interest in this item (“my interest in westerns is 0.8 and in
comedies 0.4”). We categorize preference representation approaches using the following
dimensions.
(1) Formulation. Preferences are formulated qualitatively or quantitatively.
(2) Granularity. Preferences can be expressed at different levels, that is, for tuples,
relations, relationships, and attributes.
(3) Context. Preferences can be context-free or can hold under specific conditions.
(4) Aspects. Preferences may vary based on their intensity, elasticity, complexity, and
other aspects.
Fig. 1. Database schema.
Fig. 2. Database instance example.
Preference Composition. Given a set of preferences over a set of tuples, different
composition mechanisms can be applied to infer (e.g., implicit preferences), combine
(e.g., through combining scoring functions), or override preferences (e.g., in prioritized
composition) and finally, derive a ranking of the tuples on the basis of how they match
these preferences. In this survey, we group preference composition mechanisms into
the following categories.
(1) Qualitative composition. These mechanisms combine preferences resulting in a
relative (i.e., qualitative) ordering of the tuples.
(2) Quantitative composition. These mechanisms combine preferences by assigning
final scores to the tuples, which are thus ordered in a quantitative way.
(3) Heterogeneous composition. These mechanisms are used to combine preferences
of different granularity, for example, preferences for relationships between tuples
with preferences for tuple attributes.
Preference Query Processing. Preferences are used in query processing to provide
users with customized results typically through ranking. There are roughly two dif-
ferent lines of work on using preferences in query processing. Namely, preferences are
exploited through the following.
(1) Expanding database queries. These methods assume the existence of a number of
user preferences and appropriately rewrite regular database queries to incorporate
them. This process is often referred to as query personalization.
(2) Employing preference operators. These methods use special database operators
(such as top-k or skyline) to explicitly express preferences within queries.
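To make the two operators named above more concrete, here is a small Python sketch using their usual definitions, which this part of the survey does not spell out: top-k returns the k highest-scoring rows, and the skyline returns the rows that no other row dominates. The rows, the attributes, and the "larger is better" convention are assumptions for illustration.

```python
import heapq

def top_k(rows, score, k):
    """The k rows with the highest score, best first."""
    return heapq.nlargest(k, rows, key=score)

def dominates(a, b, attrs):
    """a dominates b: at least as good on every attribute, strictly better on one.
    Larger values are assumed to be better on every attribute."""
    return all(a[x] >= b[x] for x in attrs) and any(a[x] > b[x] for x in attrs)

def skyline(rows, attrs):
    """Rows that are not dominated by any other row (quadratic-time sketch)."""
    return [r for r in rows if not any(dominates(o, r, attrs) for o in rows)]

# Hypothetical rows; this is not the database instance of Figure 2.
movies = [
    {"title": "M1", "rating": 8.0, "year": 1995},
    {"title": "M2", "rating": 7.0, "year": 2005},
    {"title": "M3", "rating": 6.5, "year": 2001},
    {"title": "M4", "rating": 8.5, "year": 1990},
]

best_two = top_k(movies, lambda r: r["rating"], 2)
undominated = skyline(movies, ["rating", "year"])
```

Here M3 is dropped from the skyline because M2 is both better rated and more recent, while top-k needs a single scoring function (the rating alone in this sketch).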
Our survey covers both approaches. We shall also discuss methods for improving
the performance of preferential query processing, for instance, by performing offline
preprocessing steps to construct rankings of database tuples based on preferences.
There is a large number of algorithms for the implementation of preference queries
(especially for top-k and skyline queries). We do not intend to provide an exhaustive
review for special classes of preference queries. We consider that drilling down to the
specifics of different implementations and algorithms is the subject of separate surveys
focusing on algorithms for a specific class of preference queries, such as the survey on
algorithms for top-k queries by Ilyas et al. [2008]. Instead, we aim at providing an
overview of the main approaches for different types of preference queries.
As a running example, we consider a simple database that stores information about
movies, consisting of three relations: movie, play, actor. Figure 1 depicts the schema of
this database. We shall also use the database instance shown in Figure 2.
This survey is organized as follows. We present existing approaches to prefer-
ence representation (Section 2) followed by mechanisms for preference composition
(Section 3). Then, we study preferential query processing methods (Section 4). In the
final section (Section 5), we discuss other issues such as preference learning and revis-
ing, nonrelational preference models, other preference applications, and connections to
other disciplines that deal with preferences. We conclude with a discussion on critical
open challenges.
2. PREFERENCE REPRESENTATION
Understanding user preferences and finding appropriate representations for them is a
real challenge. There are quite a few approaches (preference models) in the literature
that deal with preference representation and composition and try to reach meaningful
conclusions regarding the desired answers of a database query from different perspec-
tives. In this section, we focus on representing individual preferences and in Section 3,
on mechanisms for preference composition.
We present preference representation based on how preferences are formulated (for-
mulation—Section 2.1), at what level they are expressed (granularity—Section 2.2),
when they hold (context—Section 2.3), and what they express (aspects—Section 2.4).
2.1. Preference Formulation
In general, preferences can be expressed either qualitatively or quantitatively. In the
qualitative approach, preferences between database tuples are specified directly, typ-
ically using binary preference relations. Preference relations may be specified using
logical formulas [Chomicki 2003] or special preference constructors [Kießling 2002]. In
the quantitative approach, preferences are expressed by assigning numerical scores to
database tuples. In this case, a tuple $t_i$ is preferred over a tuple $t_j$ if and only if its
score is higher than the score of $t_j$. Scores may be assigned through preference functions
(e.g., Agrawal and Wimmers [2000]) or as degrees of interest associated with specific
conditions that must be satisfied (e.g., Koutrika and Ioannidis [2004]).
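The quantitative formulation can be illustrated with a short Python sketch. The degrees of interest (0.8 for westerns, 0.4 for comedies) come from the example in the introduction; the rows, the function names, and the default score of 0.0 for genres without a stated degree are assumptions.

```python
# Degrees of interest from the example in the text; everything else is hypothetical.
DEGREE_OF_INTEREST = {"western": 0.8, "comedy": 0.4}

def score(row):
    """Score of a tuple; genres without a stated degree get 0.0 (an assumption)."""
    return DEGREE_OF_INTEREST.get(row["genre"], 0.0)

def preferred(ti, tj):
    """ti is preferred over tj if and only if its score is strictly higher."""
    return score(ti) > score(tj)

movies = [
    {"title": "M1", "genre": "comedy"},
    {"title": "M2", "genre": "western"},
    {"title": "M3", "genre": "drama"},
]

# Scores induce a ranking directly: sort by score, highest first.
ranking = sorted(movies, key=score, reverse=True)
```

Note that two tuples with equal scores are not preferred over each other in either direction, so the induced relation leaves ties unordered.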
In the following, we shall use $R(A_1, \ldots, A_d)$ to denote a relational schema with $d$
attributes $A_i$, $1 \le i \le d$, where each attribute $A_i$ takes values from a domain $dom(A_i)$.
Let $A = \{A_1, A_2, \ldots, A_d\}$ be the attribute set of $R$ and $dom(A) = dom(A_1) \times \ldots \times dom(A_d)$
be its value domain. We use $t$ to denote a tuple $(u_1, u_2, \ldots, u_d) \in dom(A)$ of $R$ and $r$ to
denote an instance (i.e., tuple set) of $R$. Let $B \subseteq A$ be a subset of the attribute set; $t[B]$
stands for the projection of $t$ on $B$. Finally, $P$ denotes a preference.
2.1.1. Qualitative Preferences.
In the qualitative approach, preferences are defined as
binary relations between two tuples. Given a set $S$, a binary relation $B$ over $S$ is a
subset of the Cartesian product $S \times S$. For a pair $(a, b)$ of $B$, we use the notation $a \, B \, b$,
whereas for a pair $(a, b)$ that does not belong to $B$, we use the notation $\neg(a \, B \, b)$. A
preference relation is defined as follows.
Definition 1. Let $R(A_1, \ldots, A_d)$ be a relational schema and $dom(A_i)$ be the domain
of attribute $A_i$, $1 \le i \le d$. A preference relation $\succ_P$ over $R$ is a subset of
$(dom(A_1) \times \ldots \times dom(A_d)) \times (dom(A_1) \times \ldots \times dom(A_d))$.
The interpretation of a preference relation $t_i \succ_P t_j$ between two tuples $t_i$ and $t_j$ of $R$
is that $t_i$ is preferred over $t_j$ under $\succ_P$. We shall also say that $t_i$ is better than $t_j$ or that
$t_i$ dominates $t_j$ under $\succ_P$.
Next, we list several typical properties of binary relations that are useful in classi-
fying preference relations. A binary relation B over a set S is called:
—reflexive, if $\forall a \in S$, $a \, B \, a$,
—irreflexive, if $\forall a \in S$, $\neg(a \, B \, a)$,
—symmetric, if $\forall a, b \in S$, $a \, B \, b \Rightarrow b \, B \, a$,
—asymmetric, if $\forall a, b \in S$, $a \, B \, b \Rightarrow \neg(b \, B \, a)$,
—antisymmetric, if $\forall a, b \in S$, $(a \, B \, b \wedge b \, B \, a) \Rightarrow a = b$,
—transitive, if $\forall a, b, c \in S$, $(a \, B \, b \wedge b \, B \, c) \Rightarrow a \, B \, c$,
—negatively transitive, if $\forall a, b, c \in S$, $(\neg(a \, B \, b) \wedge \neg(b \, B \, c)) \Rightarrow \neg(a \, B \, c)$,
—connected (strongly complete or total), if $\forall a, b \in S$, $(a \, B \, b) \vee (b \, B \, a) \vee (a = b)$.
Fig. 3. Examples of preference graphs: (a) total order, (b) weak order, (c) strict partial order.
The preceding properties are not independent. For instance, asymmetry implies ir-
reflexivity, while irreflexivity and transitivity imply asymmetry. In terms of a prefer-
ence relation over a relational schema R, there is a subtle point regarding the set S
over which the conditions of each property are tested. Typically, we should consider as
$S$ the set of all tuples $t = (u_1, u_2, \ldots, u_d)$, $u_i \in dom(A_i)$, of $R(A_1, A_2, \ldots, A_d)$. However,
in the presence of integrity constraints, we could apply the conditions only amongst
tuples that all belong to a valid instance r of R, that is, to an instance r of R that does
not violate any integrity constraints.
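The properties listed above translate directly into checks over a finite set. The following Python sketch models a relation $B$ over $S$ as a set of ordered pairs; this encoding, the function names, and the tuple identifiers `t1`, `t2`, `t3` are assumptions made for illustration.

```python
from itertools import product

# A relation B over a finite set S is modelled as a set of pairs (a, b), read "a B b".

def is_reflexive(S, B):
    return all((a, a) in B for a in S)

def is_irreflexive(S, B):
    return all((a, a) not in B for a in S)

def is_symmetric(S, B):
    return all((b, a) in B for (a, b) in B)

def is_asymmetric(S, B):
    return all((b, a) not in B for (a, b) in B)

def is_antisymmetric(S, B):
    return all(a == b for (a, b) in B if (b, a) in B)

def is_transitive(S, B):
    return all((a, c) in B for (a, b) in B for (b2, c) in B if b == b2)

def is_negatively_transitive(S, B):
    return all((a, c) not in B
               for a, b, c in product(S, repeat=3)
               if (a, b) not in B and (b, c) not in B)

def is_connected(S, B):
    return all((a, b) in B or (b, a) in B or a == b
               for a, b in product(S, repeat=2))

S = {"t1", "t2", "t3"}
chain = {("t1", "t2"), ("t2", "t3"), ("t1", "t3")}   # t1 over t2 over t3
loop = {("t1", "t1")}                                # a single self-pair
```

On these two relations the dependencies noted in the text can be observed: `chain` is irreflexive and transitive and therefore asymmetric, while `loop` is not irreflexive and hence cannot be asymmetric.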
Based on its properties a preference relation $\succ_P$ is characterized as follows.
—A binary relation is a preorder or quasiorder if it is reflexive and transitive. If in
addition, it is antisymmetric then it is a partial order.
—A binary relation is a strict partial order (or irreflexive partial order) if it is irreflexive,
asymmetric, and transitive. A preference relation $\succ_P$ over a relational schema $R$ is
usually a strict partial order.
—A binary relation is a total order if it is a strict partial order and it is also connected.
If a preference relation $\succ_P$ is a total order, any two tuples in any instance $r$ of $R$ are
mutually comparable under $\succ_P$.
—A binary relation is a weak order if it is a negatively transitive strict partial order.
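The characterization above can be sketched as a small classifier over finite relations. As before, a relation is modelled as a set of ordered pairs over a finite set; the function name `classify` and the example relations are hypothetical.

```python
from itertools import product

def classify(S, B):
    """Return the order types, as named in the text, that relation B over S satisfies."""
    pairs = list(product(S, repeat=2))
    triples = list(product(S, repeat=3))
    reflexive = all((a, a) in B for a in S)
    irreflexive = all((a, a) not in B for a in S)
    asymmetric = all((b, a) not in B for (a, b) in B)
    antisymmetric = all(a == b for (a, b) in B if (b, a) in B)
    transitive = all((a, c) in B for a, b, c in triples
                     if (a, b) in B and (b, c) in B)
    neg_transitive = all((a, c) not in B for a, b, c in triples
                         if (a, b) not in B and (b, c) not in B)
    connected = all((a, b) in B or (b, a) in B or a == b for a, b in pairs)

    kinds = set()
    if reflexive and transitive:
        kinds.add("preorder")
        if antisymmetric:
            kinds.add("partial order")
    if irreflexive and asymmetric and transitive:
        kinds.add("strict partial order")
        if connected:
            kinds.add("total order")
        if neg_transitive:
            kinds.add("weak order")
    return kinds

S = {"t1", "t2", "t3"}
chain = {("t1", "t2"), ("t2", "t3"), ("t1", "t3")}   # every pair is comparable
tie = {("t1", "t3"), ("t2", "t3")}                   # t1 and t2 both beat t3
single = {("t1", "t2")}                              # t3 is incomparable to both
leq = {("t1", "t1"), ("t2", "t2"), ("t3", "t3"), ("t1", "t2")}
```

In this sketch `chain` is a total order (and hence also a weak order), `tie` is a weak order that is not total, and `single` is a strict partial order that is not a weak order, because `t3` is incomparable to both `t1` and `t2` although `t1` is preferred over `t2`.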
A preference relation over an instance r of R can be represented through a directed
graph that we call a preference graph. In the preference graph, there is one node for
each tuple $t$ in $r$ and there is a directed edge from the node representing tuple $t_i$ to
the node representing tuple $t_j$ if and only if $t_i \succ_P t_j$. Some properties of the preference
relation have a counterpart graph property.
If the preference relation is transitive, it is common to represent the transitive
reduction of the relation. In particular, there is an edge from $t_i$ to $t_j$ if and only if $t_i \succ_P t_j$
and $\nexists\, t_k$ such that $t_i \succ_P t_k$ and $t_k \succ_P t_j$. The graph for a partially ordered set is also
known as the Hasse diagram. In the following, we assume that preference relations
are transitive and use the preference graph of their transitive reduction to represent
them, unless stated otherwise. Examples of preference graphs for different types of
preference relations are depicted in Figure 3.
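The edge condition for the transitive reduction given above can be sketched in Python as follows. The sketch assumes a transitive strict preference relation over a finite instance, given as a set of pairs; the function names and tuple identifiers are hypothetical.

```python
def transitive_reduction(S, B):
    """Edges of the preference graph after dropping pairs implied by transitivity.

    Assumes B is a transitive strict preference relation over the finite set S,
    given as a set of pairs (ti, tj) meaning "ti is preferred over tj".
    """
    return {(a, b) for (a, b) in B
            if not any((a, k) in B and (k, b) in B for k in S)}

def adjacency(S, edges):
    """Adjacency-list view of the graph: node -> sorted list of successors."""
    return {a: sorted(b for (x, b) in edges if x == a) for a in sorted(S)}

S = {"t1", "t2", "t3", "t4"}
# t1 over t2 over t3 (with the implied pair t1 over t3), and t1 over t4.
B = {("t1", "t2"), ("t2", "t3"), ("t1", "t3"), ("t1", "t4")}

edges = transitive_reduction(S, B)
graph = adjacency(S, edges)
```

The pair (`t1`, `t3`) is dropped because `t2` lies between the two tuples; the remaining edges are the ones a Hasse diagram would draw.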
Besides the explicit listing of preference relations between tuples, a convenient
way to express preferences between tuples is by using logical formulas to express
the constraints that two tuples must satisfy so that one is preferred over the other
[Chomicki 2003].
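As a rough illustration of defining a preference by a formula rather than by listing pairs, the Python sketch below uses a predicate over two tuples and derives the pairs it induces on a given instance. The particular formula (same genre, more recent year), the rows, and the function names are assumptions for this example and are not taken from Chomicki [2003].

```python
def prefers(ti, tj):
    """Hypothetical preference formula: among movies of the same genre,
    the more recent one is preferred."""
    return ti["genre"] == tj["genre"] and ti["year"] > tj["year"]

def induced_relation(rows, formula):
    """All pairs (ti, tj) of titles in the instance for which the formula holds."""
    return {(a["title"], b["title"]) for a in rows for b in rows if formula(a, b)}

# Hypothetical rows; this is not the database instance of Figure 2.
movies = [
    {"title": "M1", "genre": "western", "year": 1995},
    {"title": "M2", "genre": "comedy", "year": 2005},
    {"title": "M3", "genre": "western", "year": 2008},
]

relation = induced_relation(movies, prefers)
```

Only the two westerns are comparable under this formula, so the induced relation contains a single pair and leaves the comedy unordered with respect to both.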