What have the authors stated for future works in "A survey on representation, composition and application of preferences in database systems" ?

Moving forward, the authors highlight critical open research challenges and directions for future work. While qualitative preferences can express more types of relations than quantitative preferences, with qualitative preferences one cannot distinguish how much better one query answer is than another. Users will be able to query the social graph and be presented with a diversified subgraph relevant to their interests. As time passes, results will be adapted to the new context of the user.

What types of external context are used in CP-nets?

Common types of external context include the computing context (e.g., network connectivity, nearby resources), the user context (e.g., profile, location), the physical context (e.g., noise levels, temperature), and time [Chen and Kotz 2000].

What are the types of preferences that are stored in atomic query elements?

User preferences are stored as degrees of interest in atomic query elements that can be individual selection or join conditions (called selection and join preferences, respectively).

How do Koutrika and Ioannidis support extrinsic preferences?

Koutrika and Ioannidis [2004] support extrinsic preferences by allowing preferences for tuples in a relation R to be formulated based on values of attributes in different relations that join to R.

What is the condition for a scoring function fP?

It has been shown that when the set over which the preference relation is defined is countable, a necessary and sufficient condition for a scoring function fP such that ti ≻P tj ⇔ fP(ti) > fP(tj) to exist is that ≻P is a weak order [Fishburn 1999].

What is the common way to define a contextual preference?

Stefanidis et al. [2006] propose using context parameters that take values from hierarchical domains thus allowing the definition of contextual preferences at various levels of detail, for example preferences that hold at the level of a day or a month.

What is the order of a subset of k tuples?

A profile of a subset of k tuples is defined as a tuple of features where each feature corresponds to a quantity of interest (i.e., the number of comedies or distinct directors in their example).


A Survey on Representation, Composition and Application of Preferences in Database Systems

KOSTAS STEFANIDIS, Chinese University of Hong Kong

GEORGIA KOUTRIKA, IBM Almaden Research Center

EVAGGELIA PITOURA, University of Ioannina

Preferences have been traditionally studied in philosophy, psychology, and economics and applied to decision making problems. Recently, they have attracted the attention of researchers in other fields, such as databases where they capture soft criteria for queries. Databases bring a whole fresh perspective to the study of preferences, both computational and representational. From a representational perspective, the central question is how we can effectively represent preferences and incorporate them in database querying. From a computational perspective, we can look at how we can efficiently process preferences in the context of database queries. Several approaches have been proposed but a systematic study of these works is missing. The purpose of this survey is to provide a framework for placing existing works in perspective and highlight critical open challenges to serve as a springboard for researchers in database systems. We organize our study around three axes: preference representation, preference composition, and preference query processing.

Categories and Subject Descriptors: H.2.4 [Database Management]: Systems—Relational databases

General Terms: Algorithms, Design, Languages

Additional Key Words and Phrases: Preference modeling, preference queries

ACM Reference Format:

Stefanidis, K., Koutrika, G., and Pitoura, E. 2011. A survey on representation, composition and application of preferences in database systems. ACM Trans. Datab. Syst. 36, 3, Article 19 (August 2011), 45 pages.

DOI = 10.1145/2000824.2000829 http://doi.acm.org/10.1145/2000824.2000829

1. INTRODUCTION

Preferences guide human decision making from early childhood (e.g., “which ice cream flavor do you prefer?”) up to complex professional and organizational decisions (e.g., “which investment funds to choose?”). Preferences have traditionally been studied in philosophy, psychology, and economics and applied to decision making problems. For instance, in philosophy, they are used to reason about values and desires [Hansson 2001]. In mathematical decision theory, preferences (or utilities) model economic behavior [Fishburn 1999]. The notion of preference has in recent years drawn new attention from researchers in other fields, such as artificial intelligence, where they capture agents’ goals [Boutilier et al. 1999; Delgrande et al. 2003; Wellman and Doyle 1991], and databases, where they capture soft criteria for database queries. Explicit preference modeling provides a declarative way to choose among alternatives, whether these are solutions of problems to solve, answers of database queries, decisions of a computational agent and so on. In this article, we focus on preferences in the context of database queries.

Authors’ addresses: K. Stefanidis, Department of Computer Science and Engineering, Chinese University of Hong Kong, Hong Kong; email: kstef@cs.uoi.gr; G. Koutrika, IBM Almaden Research Center; E. Pitoura, Department of Computer Science, University of Ioannina, Ioannina, Greece.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works requires prior specific permission and/or a fee. Permissions may be requested from Publications Dept., ACM, Inc., 2 Penn Plaza, Suite 701, New York, NY 10121-0701 USA, fax +1 (212) 869-0481, or permissions@acm.org.

© 2011 ACM 0362-5915/2011/08-ART19 $10.00

ACM Transactions on Database Systems, Vol. 36, No. 3, Article 19, Publication date: August 2011.

In databases, interest in preferences was triggered by observing the limitations of the Boolean database answer model, where query criteria are considered as hard by default and a nonempty answer is returned only if it satisfies all the query criteria. In this context, a user can face either of two problems: (i) the empty-answer problem, where the conditions are too restrictive or the data cannot exactly match the query or (ii) the too-many-answers problem, where too many results match the query. It is hard to cope with these problems, especially if a user is not familiar with a structured query language in order to formulate accurate queries and when accessing Web databases, whose schema and contents are unknown.

Incorporating soft criteria or preferences in a query can help cope with these problems. The empty-answer problem can be tackled by relaxing some of the hard constraints in the query, that is, considering them as soft or as user wishes or by replacing them by constraints that capture preferences related to the given query and returning results that are ranked according to how well they match the modified query. The too-many-answers problem can be tackled by strengthening the query with additional preferences to rank and possibly focus the query results.
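The relaxation idea can be illustrated with a minimal sketch. It assumes a simple ranking rule, counting how many criteria a tuple satisfies, which is one possible choice and is not prescribed by the text. The movie tuples, attribute names, and criteria are hypothetical.

```python
# Hypothetical movie tuples; attribute names are assumptions for illustration.
movies = [
    {"title": "M1", "genre": "western", "year": 1995, "duration": 130},
    {"title": "M2", "genre": "comedy", "year": 2005, "duration": 95},
    {"title": "M3", "genre": "western", "year": 2008, "duration": 150},
]

# Query criteria expressed as named predicates over a tuple (hypothetical).
criteria = {
    "is_western": lambda t: t["genre"] == "western",
    "after_2000": lambda t: t["year"] > 2000,
    "under_2h": lambda t: t["duration"] < 120,
}


def boolean_answer(tuples, criteria):
    """Hard semantics: keep only tuples that satisfy every criterion."""
    return [t for t in tuples if all(p(t) for p in criteria.values())]


def relaxed_answer(tuples, criteria):
    """Soft semantics: rank all tuples by how many criteria they satisfy."""
    def matched(t):
        return sum(1 for p in criteria.values() if p(t))
    # sorted() is stable, so tuples with equal counts keep their input order.
    return sorted(tuples, key=matched, reverse=True)


hard = boolean_answer(movies, criteria)   # empty: no tuple meets all three
soft = relaxed_answer(movies, criteria)   # every tuple, best matches first
```

Under the hard semantics this instance yields an empty answer, while the relaxed version still returns every tuple, ordered by how well it matches.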

The study of preference queries in databases originated with Lacroix and Lavency [1987], who proposed a simple extension of the relational calculus in which preferences for tuples satisfying given logical conditions can be expressed. For instance, one could say: pick the tuples of R satisfying Q ∧ P1 ∧ P2; if the result is empty, pick the tuples satisfying Q ∧ P1 ∧ ¬P2; if the result is empty, pick the tuples satisfying Q ∧ ¬P1 ∧ P2. Gaasterland and Lobo [1994] introduced a simple formalism, where a user provides a lattice of domain-independent values that define preferences and a set of domain-specific user constraints qualified with lattice values. The constraints are automatically incorporated into a relational or deductive database through a series of syntactic transformations that produces an annotated deductive database. Query answering procedures for deductive databases are then used, with minor modifications, to obtain annotated answers to queries. Almost a decade later, the Web has made information easily accessible and renewed interest in preferences was triggered by the need to make (Web) databases more user-friendly.
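The cascading selection in the Lacroix and Lavency example can be sketched as follows. This is only an illustration of the three-step fallback quoted above, not their relational calculus extension; the function name `cascade`, the sample relation, and the predicates are hypothetical.

```python
def cascade(tuples, q, p1, p2):
    """Return the first non-empty answer among the three cascading conditions."""
    levels = [
        lambda t: q(t) and p1(t) and p2(t),        # Q and P1 and P2
        lambda t: q(t) and p1(t) and not p2(t),    # Q and P1 and not P2
        lambda t: q(t) and not p1(t) and p2(t),    # Q and not P1 and P2
    ]
    for condition in levels:
        answer = [t for t in tuples if condition(t)]
        if answer:
            return answer
    return []  # none of the three levels produced a result


# Hypothetical relation R and conditions, for illustration only.
R = [
    {"title": "M2", "genre": "comedy", "year": 2005, "duration": 95},
    {"title": "M3", "genre": "western", "year": 2008, "duration": 150},
]
Q = lambda t: t["year"] > 2000           # hard condition
P1 = lambda t: t["genre"] == "western"   # first preference
P2 = lambda t: t["duration"] < 120       # second preference

result = cascade(R, Q, P1, P2)
```

Here no tuple satisfies all three conditions, so the answer falls back to the second level and contains only the western.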

Several approaches have been proposed since then but a systematic study of them is missing. It is the purpose of this article to provide a framework for studying various approaches that deal with preferences in databases. In particular, our objective is to survey in a holistic way approaches that: (i) define preferences, (ii) combine preferences, and (iii) apply preferences to query processing. We organize our study around these main axes as follows:

Preference Representation. Preferences naturally come into different flavors and people may have a mix of different preferences. Works on preference modeling have focused on different aspects of the problem but two main philosophies can be distinguished on the basis of how preferences are formulated: qualitative approaches, where preferences are expressed by comparing items (“I like westerns better than comedies”) and quantitative ones, where a preference for a specific item is expressed as a degree of interest in this item (“my interest in westerns is 0.8 and in comedies 0.4”). We categorize preference representation approaches using the following dimensions.

(1) Formulation. Preferences are formulated qualitatively or quantitatively.

(2) Granularity. Preferences can be expressed at different levels, that is, for tuples, relations, relationships, and attributes.

(3) Context. Preferences can be context-free or can hold under specific conditions.

(4) Aspects. Preferences may vary based on their intensity, elasticity, complexity, and other aspects.

Fig. 1. Database schema.

Fig. 2. Database instance example.

Preference Composition. Given a set of preferences over a set of tuples, different composition mechanisms can be applied to infer (e.g., implicit preferences), combine (e.g., through combining scoring functions), or override preferences (e.g., in prioritized composition) and finally, derive a ranking of the tuples on the basis of how they match these preferences. In this survey, we group preference composition mechanisms into the following categories.

(1) Qualitative composition. These mechanisms combine preferences resulting in a relative (i.e., qualitative) ordering of the tuples.

(2) Quantitative composition. These mechanisms combine preferences by assigning final scores to the tuples, which are thus ordered in a quantitative way.

(3) Heterogeneous composition. These mechanisms are used to combine preferences of different granularity, for example, preferences for relationships between tuples with preferences for tuple attributes.

Preference Query Processing. Preferences are used in query processing to provide users with customized results typically through ranking. There are roughly two different lines of work on using preferences in query processing. Namely, preferences are exploited through the following.

(1) Expanding database queries. These methods assume the existence of a number of user preferences and appropriately rewrite regular database queries to incorporate them. This process is often referred to as query personalization.

(2) Employing preference operators. These methods use special database operators (such as top-k or skyline) to explicitly express preferences within queries.

Our survey covers both approaches. We shall also discuss methods for improving the performance of preferential query processing, for instance, by performing offline preprocessing steps to construct rankings of database tuples based on preferences. There is a large number of algorithms for the implementation of preference queries (especially for top-k and skyline queries). We do not intend to provide an exhaustive review for special classes of preference queries. We consider that drilling down to the specifics of different implementations and algorithms is the subject of separate surveys focusing on algorithms for a specific class of preference queries, such as the survey on algorithms for top-k queries by Ilyas et al. [2008]. Instead, we aim at providing an overview of the main approaches for different types of preference queries.

As a running example, we consider a simple database that stores information about movies, consisting of three relations: movie, play, actor. Figure 1 depicts the schema of this database. We shall also use the database instance shown in Figure 2.

This survey is organized as follows. We present existing approaches to preference representation (Section 2) followed by mechanisms for preference composition (Section 3). Then, we study preferential query processing methods (Section 4). In the final section (Section 5), we discuss other issues such as preference learning and revising, nonrelational preference models, other preference applications, and connections to other disciplines that deal with preferences. We conclude with a discussion on critical open challenges.

2. PREFERENCE REPRESENTATION

Understanding user preferences and finding appropriate representations for them is a real challenge. There are quite a few approaches (preference models) in the literature that deal with preference representation and composition and try to reach meaningful conclusions regarding the desired answers of a database query from different perspectives. In this section, we focus on representing individual preferences and in Section 3, on mechanisms for preference composition.

We present preference representation based on how preferences are formulated (formulation, Section 2.1), at what level they are expressed (granularity, Section 2.2), when they hold (context, Section 2.3), and what they express (aspects, Section 2.4).

2.1. Preference Formulation

In general, preferences can be expressed either qualitatively or quantitatively. In the qualitative approach, preferences between database tuples are specified directly, typically using binary preference relations. Preference relations may be specified using logical formulas [Chomicki 2003] or special preference constructors [Kießling 2002]. In the quantitative approach, preferences are expressed by assigning numerical scores to database tuples. In this case, a tuple t_i is preferred over a tuple t_j if and only if its score is higher than the score of t_j. Scores may be assigned through preference functions (e.g., Agrawal and Wimmers [2000]) or as degrees of interest associated with specific conditions that must be satisfied (e.g., Koutrika and Ioannidis [2004]).
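A minimal sketch of the quantitative approach follows. It reuses the degrees of interest from the earlier example (0.8 for westerns, 0.4 for comedies); the tuple layout, the function names, and the default score of 0.0 for genres with no stated interest are assumptions made here for illustration.

```python
# Degrees of interest taken from the example in the text.
interest = {"western": 0.8, "comedy": 0.4}


def score(t):
    """Score a tuple by the degree of interest in its genre.

    Genres without a stated interest get 0.0 (an assumption of this sketch).
    """
    return interest.get(t["genre"], 0.0)


def prefers(ti, tj):
    """ti is preferred over tj if and only if its score is strictly higher."""
    return score(ti) > score(tj)


# Hypothetical tuples.
t1 = {"title": "M1", "genre": "western"}
t2 = {"title": "M2", "genre": "comedy"}
t3 = {"title": "M3", "genre": "drama"}

ranking = sorted([t3, t2, t1], key=score, reverse=True)
```

Because the comparison is derived from numbers, two tuples with equal scores are simply not preferred over one another.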

In the following, we shall use R(A_1, ..., A_d) to denote a relational schema with d attributes A_i, 1 ≤ i ≤ d, where each attribute A_i takes values from a domain dom(A_i). Let A = {A_1, A_2, ..., A_d} be the attribute set of R and dom(A) = dom(A_1) × ... × dom(A_d) be its value domain. We use t to denote a tuple (u_1, u_2, ..., u_d) ∈ dom(A) of R and r to denote an instance (i.e., tuple set) of R. Let B ⊆ A be a subset of the attribute set; t[B] stands for the projection of t on B. Finally, P denotes a preference.

2.1.1. Qualitative Preferences.

In the qualitative approach, preferences are defined as binary relations between two tuples. Given a set S, a binary relation B over S is a subset of the Cartesian product S × S. For a pair (a, b) of B, we use the notation a B b, whereas for a pair (a, b) that does not belong to B, we use the notation ¬(a B b). A preference relation is defined as follows.

Definition 1. Let R(A_1, ..., A_d) be a relational schema and dom(A_i) be the domain of attribute A_i, 1 ≤ i ≤ d. A preference relation ≻_P over R is a subset of (dom(A_1) × ... × dom(A_d)) × (dom(A_1) × ... × dom(A_d)).

The interpretation of a preference relation t_i ≻_P t_j between two tuples t_i and t_j of R is that t_i is preferred over t_j under ≻_P. We shall also say that t_i is better than t_j or that t_i dominates t_j under ≻_P.

Next, we list several typical properties of binary relations that are useful in classifying preference relations. A binary relation B over a set S is called:

- reflexive, if ∀a ∈ S, a B a,

- irreflexive, if ∀a ∈ S, ¬(a B a),

- symmetric, if ∀a, b ∈ S, a B b ⇒ b B a,

- asymmetric, if ∀a, b ∈ S, a B b ⇒ ¬(b B a),

- antisymmetric, if ∀a, b ∈ S, (a B b ∧ b B a) ⇒ a = b,

- transitive, if ∀a, b, c ∈ S, (a B b ∧ b B c) ⇒ a B c,

- negatively transitive, if ∀a, b, c ∈ S, (¬(a B b) ∧ ¬(b B c)) ⇒ ¬(a B c),

- connected (strongly complete or total), if ∀a, b ∈ S, (a B b) ∨ (b B a) ∨ (a = b).

Fig. 3. Examples of preference graphs: (a) total order, (b) weak order, (c) strict partial order.
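For a finite set, the properties listed above can be checked directly from their definitions. The sketch below assumes S is a Python set and B is a set of pairs (a, b) drawn from S × S, read as "a B b"; the function names are chosen here for illustration.

```python
from itertools import product


def is_reflexive(S, B):
    return all((a, a) in B for a in S)


def is_irreflexive(S, B):
    return all((a, a) not in B for a in S)


def is_symmetric(S, B):
    return all((b, a) in B for (a, b) in B)


def is_asymmetric(S, B):
    return all((b, a) not in B for (a, b) in B)


def is_antisymmetric(S, B):
    return all(a == b for (a, b) in B if (b, a) in B)


def is_transitive(S, B):
    return all((a, c) in B for (a, b) in B for (b2, c) in B if b == b2)


def is_negatively_transitive(S, B):
    return all(
        (a, c) not in B
        for a, b, c in product(S, repeat=3)
        if (a, b) not in B and (b, c) not in B
    )


def is_connected(S, B):
    return all(
        (a, b) in B or (b, a) in B or a == b
        for a, b in product(S, repeat=2)
    )


# Example: the "greater than" relation over a small set of integers.
S = {1, 2, 3}
greater_than = {(a, b) for a, b in product(S, repeat=2) if a > b}
```

On this example the relation is irreflexive, asymmetric, transitive, negatively transitive, and connected, but neither reflexive nor symmetric.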

The preceding properties are not independent. For instance, asymmetry implies irreflexivity, while irreflexivity and transitivity imply asymmetry. In terms of a preference relation over a relational schema R, there is a subtle point regarding the set S over which the conditions of each property are tested. Typically, we should consider as S the set of all tuples t = (u_1, u_2, ..., u_d), u_i ∈ dom(A_i), of R(A_1, A_2, ..., A_d). However, in the presence of integrity constraints, we could apply the conditions only amongst tuples that all belong to a valid instance r of R, that is, to an instance r of R that does not violate any integrity constraints.

Based on its properties a preference relation ≻_P is characterized as follows.

- A binary relation is a preorder or quasiorder if it is reflexive and transitive. If in addition, it is antisymmetric then it is a partial order.

- A binary relation is a strict partial order (or irreflexive partial order) if it is irreflexive, asymmetric, and transitive. A preference relation ≻_P over a relational schema R is usually a strict partial order.

- A binary relation is a total order if it is a strict partial order and it is also connected. If a preference relation ≻_P is a total order, any two tuples in any instance r of R are mutually comparable under ≻_P.

- A binary relation is a weak order if it is a negatively transitive strict partial order.
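The characterization above can be turned into a small classifier for finite relations. This is a sketch under the assumption that S is a finite set and B a set of pairs over S; the function name `classify` and the returned labels are chosen here. Since every total order is also a weak order, the function reports the most specific label that applies.

```python
from itertools import product


def classify(S, B):
    """Label a finite binary relation B over S using the definitions above."""
    irreflexive = all((a, a) not in B for a in S)
    asymmetric = all((b, a) not in B for (a, b) in B)
    transitive = all((a, d) in B for (a, b) in B for (c, d) in B if b == c)
    if not (irreflexive and asymmetric and transitive):
        return "not a strict partial order"

    connected = all(
        a == b or (a, b) in B or (b, a) in B
        for a, b in product(S, repeat=2)
    )
    if connected:
        return "total order"

    negatively_transitive = all(
        (a, c) not in B
        for a, b, c in product(S, repeat=3)
        if (a, b) not in B and (b, c) not in B
    )
    if negatively_transitive:
        return "weak order"
    return "strict partial order"


# Hypothetical tuples identified by name.
S = {"t1", "t2", "t3"}
total = {("t1", "t2"), ("t2", "t3"), ("t1", "t3")}
weak = {("t1", "t2"), ("t1", "t3")}   # t2 and t3 are tied below t1
strict = {("t1", "t2")}               # t3 is incomparable to both
```

The third example fails negative transitivity because t1 is not preferred over t3 and t3 is not preferred over t2, yet t1 is preferred over t2.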

A preference relation over an instance r of R can be represented through a directed graph that we call a preference graph. In the preference graph, there is one node for each tuple t in r and there is a directed edge from the node representing tuple t_i to the node representing tuple t_j if and only if t_i ≻_P t_j. Some properties of the preference relation have a counterpart graph property.

If the preference relation is transitive, it is common to represent the transitive reduction of the relation. In particular, there is an edge from t_i to t_j if and only if t_i ≻_P t_j and ∄ t_k such that t_i ≻_P t_k and t_k ≻_P t_j. The graph for a partially ordered set is also known as the Hasse diagram. In the following, we assume that preference relations are transitive and use the preference graph of their transitive reduction to represent them, unless stated otherwise. Examples of preference graphs for different types of preference relations are depicted in Figure 3.
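The edge rule for the transitive reduction can be sketched directly from its definition. The code assumes the input relation is a transitive strict partial order over a finite set of tuple identifiers, given as a set of pairs; the function name and the sample relation are hypothetical.

```python
def transitive_reduction(S, B):
    """Keep the edge (a, b) only if no intermediate k has a B k and k B b."""
    return {
        (a, b)
        for (a, b) in B
        if not any((a, k) in B and (k, b) in B for k in S)
    }


# Hypothetical total order t1 > t2 > t3, listed with its transitive edge.
S = {"t1", "t2", "t3"}
B = {("t1", "t2"), ("t2", "t3"), ("t1", "t3")}

edges = transitive_reduction(S, B)
```

The edge from t1 to t3 is dropped because it can be inferred through t2, leaving a chain of two edges.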

Besides the explicit listing of preference relations between tuples, a convenient way to express preferences between tuples is by using logical formulas to express the constraints that two tuples must satisfy so that one is preferred over the other [Chomicki 2003].
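As an illustration only, a preference formula can be written as a predicate over a pair of tuples, from which the preference relation over an instance follows. The formula used here (same genre, more recent year) and the attribute names are hypothetical and are not taken from Chomicki [2003].

```python
def prefers(ti, tj):
    """Hypothetical formula: same genre and ti is more recent than tj."""
    return ti["genre"] == tj["genre"] and ti["year"] > tj["year"]


def induced_relation(r):
    """Materialize the preference relation that the formula induces on instance r."""
    return {
        (ti["title"], tj["title"])
        for ti in r
        for tj in r
        if prefers(ti, tj)
    }


# Hypothetical instance of the movie relation.
r = [
    {"title": "M1", "genre": "western", "year": 1995},
    {"title": "M2", "genre": "comedy", "year": 2005},
    {"title": "M3", "genre": "western", "year": 2008},
]

relation = induced_relation(r)
```

Only the two westerns are comparable under this formula, so the induced relation holds a single pair and the comedy remains incomparable to both.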
