Mining Preferences from OLAP Query Logs for Proactive Personalization

Julien Aligon¹, Matteo Golfarelli², Patrick Marcel¹, Stefano Rizzi², and Elisa Turricchia²

¹ Laboratoire d’Informatique, Université François Rabelais Tours, France
{julien.aligon,patrick.marcel}@univ-tours.fr
² DEIS, University of Bologna, Italy
{matteo.golfarelli,stefano.rizzi,elisa.turricchia2}@unibo.it
Abstract. The goal of personalization is to deliver information that is
relevant to an individual or a group of individuals in the most appropriate
format and layout. In the OLAP context personalization is quite beneficial,
because queries can be very complex and they may return huge
amounts of data. Aimed at making the user’s experience with OLAP as
plain as possible, in this paper we propose a proactive approach that
couples an MDX-based language for expressing OLAP preferences to a
mining technique for automatically deriving preferences. First, the log of
past MDX queries issued by that user is mined to extract a set of association
rules that relate sets of frequent query fragments; then, given a
specific query, a subset of pertinent and effective rules is selected; finally,
the selected rules are translated into a preference that is used to annotate
the user’s query. A set of experimental results proves the effectiveness
and efficiency of our approach.
1 Introduction and Motivation
Personalization has attracted a lot of attention in the database community
during the last few years, and has also raised plenty of interest in the OLAP
area. The goal of personalization is to deliver information that is relevant to
an individual or a group of individuals in the most appropriate format and
layout, and in the OLAP area it has been pursued using different approaches:
- Query recommendation: Based on the current query and on the past sessions,
  the system suggests further queries to help users navigate the cube [1].
- Personalized visualization: Users specify a set of constraints that are used
  to determine a preferred visualization [2].
- Result ranking: Query results are organized in a total or partial order so
  that the user visualizes the most relevant data first [3].
- Query contextualization: The query is enhanced by adding preference
  predicates that depend on the query context [4].
These approaches differ in several respects, in particular:
J. Eder, M. Bielikova, and A.M. Tjoa (Eds.): ADBIS 2011, LNCS 6909, pp. 84–97, 2011.
© Springer-Verlag Berlin Heidelberg 2011

- Formulation effort: personalization criteria for queries may be either
  manually specified by users, or transparently inferred from the context and
  from the user profile.
- Prescriptiveness: personalization criteria may either be used as “hard”
  constraints that are added to queries, or be meant as “soft” constraints,
  i.e., preferences.
- Proactiveness: some approaches propose new queries to the user based on
  the query log and on the context, while others change the current query or
  post-process its results before returning them to the user.
With reference to the above, the user’s experience with OLAP can be made as
plain as possible by decreasing the formulation effort (i.e., having query
personalization criteria inferred), providing low prescriptiveness (i.e.,
annotating queries with preferences rather than constraints), and enhancing
proactiveness (i.e., transparently changing the current query). The result
ranking approach we propose in this paper goes in this direction by coupling an
MDX-based language for expressing OLAP preferences to a mining technique for
automatically deriving a set of preferences for a user’s query from the log of
past MDX queries issued by that user. This is done in four steps:
1. The user’s query log is mined off-line to extract a set of association rules
that relate sets of frequent query fragments (such as group-by attributes,
returned measures, selection predicates).
2. When the user formulates a query q, among the rules whose antecedent
matches q, a subset of rules is selected whose cardinality depends on a
parameter set by the user to express the desired personalization degree, i.e.,
the complexity of the preference that will be formulated.
3. The selected rules are translated into an OLAP preference p concerning the
group-by set for aggregating data, the measures to be returned, and the
values of levels or measures.
4. Query q is annotated with p and executed. The results returned are ranked
according to p, so that the user can more effectively explore them by focusing
on the most relevant data first.
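Steps 1 and 2 can be illustrated with a minimal Python sketch. It is not the mining procedure of this paper (which is detailed in Section 5): it assumes a naive support/confidence miner, represents each logged query as a set of fragment names, and selects rules by confidence only. The function names `mine_rules` and `select_rules`, the thresholds, and the sample log are all hypothetical.

```python
from itertools import combinations


def mine_rules(log, min_support, min_confidence, max_size=3):
    """Naively mine rules (antecedent, consequent, confidence) from a log.

    `log` is a non-empty list of qf-sets (frozensets of query fragments).
    """
    if not log or min_support <= 0:
        raise ValueError("need a non-empty log and a positive min_support")

    def support(fragments):
        # Fraction of logged queries that contain all the given fragments.
        return sum(1 for q in log if fragments <= q) / len(log)

    all_fragments = sorted(set().union(*log))
    frequent = [
        frozenset(c)
        for size in range(1, max_size + 1)
        for c in combinations(all_fragments, size)
        if support(frozenset(c)) >= min_support
    ]

    rules = []
    for itemset in frequent:
        for size in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(sorted(itemset), size)):
                confidence = support(itemset) / support(antecedent)
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules


def select_rules(rules, query, degree):
    """Keep at most `degree` rules whose antecedent is contained in `query`."""
    matching = [rule for rule in rules if rule[0] <= query]
    # Assumption: rules are ranked by confidence; ties keep mining order.
    matching.sort(key=lambda rule: rule[2], reverse=True)
    return matching[:degree]


# Hypothetical log: levels and measures only, no selection predicates.
log = [
    frozenset({"Region", "Year", "AvgIncome"}),
    frozenset({"Region", "Year", "AvgIncome"}),
    frozenset({"Region", "Occ", "AvgIncome"}),
    frozenset({"City", "Year", "AvgCostGas"}),
]
rules = mine_rules(log, min_support=0.5, min_confidence=0.8)
query = frozenset({"Region", "Year", "AvgCostGas"})
selected = select_rules(rules, query, degree=1)
```

Here `degree` plays the role of the user-set personalization degree: a larger value keeps more rules and would therefore yield a more complex preference in step 3.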
Remarkably, as in other result ranking approaches, the overall set of tuples
returned by q annotated with p is the same set of tuples that would be returned
by q without annotation, because p expresses a soft constraint. This guarantees
that the user’s intentions are preserved, and makes our approach non-invasive.
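The difference between a soft and a hard constraint can be sketched in a few lines of Python. This is a deliberately simplified illustration: it assumes a preference that splits tuples into only two classes (preferred and not preferred), whereas the preferences used in this paper may induce richer orders. The function names and sample tuples are hypothetical.

```python
def rank_by_preference(tuples, is_preferred):
    """Soft constraint: preferred tuples come first, none is discarded."""
    # sorted() is stable, so ties keep their original relative order.
    return sorted(tuples, key=lambda t: 0 if is_preferred(t) else 1)


def filter_by_constraint(tuples, satisfies):
    """Hard constraint: tuples that do not satisfy the predicate are dropped."""
    return [t for t in tuples if satisfies(t)]


def high_income(t):
    # Hypothetical predicate on the second component (an income value).
    return t[1] >= 600


results = [("Mountain", 450), ("Pacific", 600), ("Atlantic", 700)]
ranked = rank_by_preference(results, high_income)
filtered = filter_by_constraint(results, high_income)
```

Ranking returns the same three tuples in a different order, while filtering would silently drop the ("Mountain", 450) tuple; this is the sense in which annotating a query with a preference is non-invasive.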
The paper outline is as follows. After summarizing the related work in Section
2, we introduce a formal setting to manipulate multidimensional data in Section
3. In Section 4 we describe the main features of the myMDX language we adopt
to express OLAP preferences, while Section 5 describes in detail our approach.
Section 6 shows an implementation and reports the results of the experimental
tests we performed to assess the effectiveness and efficiency of our approach.
2 Related Work
Several approaches to personalization were devised in the OLAP context.

In the field of profile-based personalization, we mention [2], which presents a
framework for providing personalized visualization of OLAP results based on
user profiles in the form of constraints, and [4], which achieves OLAP
personalization by dynamically enhancing queries with context-aware user
preferences. Both approaches are proactive and demand low formulation effort,
but in both cases the user profile is given, and nothing is said about its
construction. A recommendation framework for OLAP systems is presented in [5];
new queries are suggested to users based on the current analysis context and on
the user’s profile. Though the authors mention that the profile could be mined
from the user’s previous behavior, no specific suggestion is given to this end.
A non-prescriptive approach is presented in [3,6], where the myOLAP algebra for
formulating and evaluating OLAP preferences is introduced; the proposed algebra
is very expressive, but at the cost of a substantial formulation effort.
The term history-based personalization is borrowed from [7], and refers to
approaches that suggest a new database query based on the past actions recorded
in a log file. The following approaches fall into this category and do not rely
on a user profile; like our approach, they are proactive and demand no
formulation effort, but they are prescriptive. The approaches in [1,8] are aimed
at suggesting OLAP queries based on a comparison between the current session and
former sessions stored in a query log. Also [9] has a similar goal in the context
of SPJ queries; here, recommendations are computed based on the presence of
tuples in sessions. This approach is further improved in [10] by relying on query
fragments instead of tuples. A query log is exploited in [11] to support users in
writing new SQL queries; the log is transformed into a graph of query fragments,
where edges are labelled with the conditional probability of having one fragment
given another fragment. Noticeably, all these works generally assume that history
is taken from a query log shared by all users.
To the best of our knowledge, our work is the first that proposes to extract
preferences from database query logs. However, the same idea has been used in
other contexts. In the context of information retrieval, [12] presents algorithms
to extract association rules at query time from a set of documents. These rules
are used to associate the documents retrieved by a query to a relevance class and
eventually to rank them. In the context of the web, [13] introduces algorithms
for preference extraction from web logs, with a targeted preference language.
Extraction is based on the frequency of the terms appearing in the log, and
clustering is used for identifying preference constructs. A comprehensive overview of
the techniques using data mining for personalization can be found in [14].
3 Preliminaries
3.1 Schemata and Instances
Our datacube formalization involves hierarchies; however, to keep the formalism
simpler, and without actually restricting the validity of our approach, we will
consider hierarchies without branches, i.e., consisting of chains of levels.

[Figure 1: roll-up orders, listed from finest to coarsest level. RESIDENCE: City, State, Region, AllCities. RACE: Race, RaceGroup, Mrn, AllRaces. TIME: Year, AllYears. OCCUPATION: Occ, AllOccs. SEX: Sex, AllSexes.]
Fig. 1. Roll-up orders for the five hierarchies in the CENSUS schema (Mrn stands for
MajorRacesNumber)
Definition 1 (Multidimensional Schema). A multidimensional schema (or,
briefly, a schema) is a triple $\mathcal{M} = \langle A, H, M \rangle$ where:
- $A$ is a finite set of levels, each defined on a categorical domain $Dom(a)$;
- $H = \{h_1, \ldots, h_n\}$ is a finite set of hierarchies, each characterized
  by (1) a subset $Lev(h_i) \subseteq A$ of levels (such that the $Lev(h_i)$’s
  for $i = 1, \ldots, n$ define a partition of $A$); (2) a roll-up total order
  $\succeq_{h_i}$ of $Lev(h_i)$;
- a finite set of measures $M$, each defined on a numerical domain $Dom(m)$.
For each hierarchy $h_i$, the top level of the order determines the finest
aggregation level for the hierarchy. Conversely, the bottom level has a single
possible value and determines the coarsest aggregation level.
A group-by set includes one level for each hierarchy, and defines a possible way
to aggregate data. A coordinate of a group-by set is a point in the n-dimensional
space defined by the levels in that group-by set.
Definition 2 (Group-by Set). Given schema $\mathcal{M} = \langle A, H, M \rangle$,
let $Dom(H) = Lev(h_1) \times \ldots \times Lev(h_n)$; each $G \in Dom(H)$ is
called a group-by set of $\mathcal{M}$. Let $G = \langle a_{k_1}, \ldots, a_{k_n} \rangle$
and $Dom(G) = Dom(a_{k_1}) \times \ldots \times Dom(a_{k_n})$; each
$g \in Dom(G)$ is called a coordinate of $G$.
Example 1. The CENSUS schema includes the five hierarchies whose roll-up orders
are shown in Figure 1, and measures AvgIncome, AvgCostGas, and AvgCostElect.
It is City $\succeq_{RESIDENCE}$ State; examples of group-by sets are:
$G_0 = \langle$City, Race, Year, Occ, Sex$\rangle$
$G_1 = \langle$Region, Mrn, Year, Occ, Sex$\rangle$
$G_2 = \langle$AllCities, AllRaces, AllYears, AllOccs, AllSexes$\rangle$
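As an illustration of Definition 2 on the CENSUS schema, the following Python sketch enumerates $Dom(H)$ as a Cartesian product of the levels of each hierarchy. The dictionary layout and the function names are assumptions made for this sketch, and the level order within each hierarchy (finest first) is read from Figure 1.

```python
from itertools import product

# Levels of each CENSUS hierarchy, from finest to coarsest (per Figure 1).
CENSUS_LEVELS = {
    "RESIDENCE": ["City", "State", "Region", "AllCities"],
    "RACE": ["Race", "RaceGroup", "Mrn", "AllRaces"],
    "TIME": ["Year", "AllYears"],
    "OCCUPATION": ["Occ", "AllOccs"],
    "SEX": ["Sex", "AllSexes"],
}


def group_by_sets(levels_by_hierarchy):
    """Enumerate Dom(H): one level per hierarchy, in hierarchy order."""
    return list(product(*levels_by_hierarchy.values()))


def precedes(hierarchy, finer, coarser, levels_by_hierarchy=CENSUS_LEVELS):
    """True if `finer` comes no later than `coarser` in the roll-up order."""
    order = levels_by_hierarchy[hierarchy]
    return order.index(finer) <= order.index(coarser)


all_group_by_sets = group_by_sets(CENSUS_LEVELS)
G0 = ("City", "Race", "Year", "Occ", "Sex")
G1 = ("Region", "Mrn", "Year", "Occ", "Sex")
G2 = ("AllCities", "AllRaces", "AllYears", "AllOccs", "AllSexes")
```

Under these assumptions the schema admits 4 × 4 × 2 × 2 × 2 = 128 group-by sets, among which are $G_0$, $G_1$, and $G_2$.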
A schema is populated with facts, each recording a piece of information that is
useful for the decision-making process. A fact is characterized by a group-by set
G that defines its aggregation level, by a coordinate of G, and by a value for
one measure.

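Definition 3 (Fact). Given schema $\mathcal{M} = \langle A, H, M \rangle$, a
group-by set $G \in Dom(H)$, and a measure $m \in M$, a fact is a couple
$f_{G,m} = \langle g, v \rangle$, where $g \in Dom(G)$ and $v \in Dom(m)$. The
space of all facts for $\mathcal{M}$ is
$$\mathcal{F}_{\mathcal{M}} = \bigcup_{G \in Dom(H),\, m \in M} \left( Dom(G) \times Dom(m) \right)$$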
Example 2. An example of fact is $f_{G_1,\mathrm{AvgIncome}} = \langle \langle$’Pacific’, ’White’, ’2008’,
’Dentist’, ’Male’$\rangle$, 600$\rangle$.
Finally, an instance of a schema (datacube) is a set of facts
$D \subseteq \mathcal{F}_{\mathcal{M}}$ such that no two facts characterized by
the same coordinate and measure exist in $D$.
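The uniqueness condition on a datacube can be sketched as a simple check in Python. The `Fact` record and the `is_valid_instance` function are hypothetical names introduced for this sketch; it also assumes that a coordinate is always compared together with the group-by set it belongs to.

```python
from typing import NamedTuple


class Fact(NamedTuple):
    group_by: tuple    # the group-by set G, one level per hierarchy
    measure: str       # the measure m
    coordinate: tuple  # the coordinate g, one value per level of G
    value: float       # the measure value v


def is_valid_instance(facts):
    """Check that no two facts share group-by set, coordinate, and measure."""
    seen = set()
    for fact in facts:
        if len(fact.coordinate) != len(fact.group_by):
            return False  # a coordinate needs one value per level of G
        key = (fact.group_by, fact.coordinate, fact.measure)
        if key in seen:
            return False
        seen.add(key)
    return True


G1 = ("Region", "Mrn", "Year", "Occ", "Sex")
fact = Fact(G1, "AvgIncome", ("Pacific", "White", "2008", "Dentist", "Male"), 600)
other_measure = fact._replace(measure="AvgCostGas", value=80)
conflicting = fact._replace(value=650)
```

The first fact mirrors Example 2; the values 80 and 650 are made up to show one compatible and one conflicting fact.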
3.2 Queries
The MDX (MultiDimensional eXpressions) language is a de-facto standard for
querying multidimensional databases [15]. Some of its distinguishing features are
the possibility of returning query results that contain data with different
aggregation levels and the possibility of specifying how the results should be
visually arranged into a multidimensional representation. In this paper we
consider MDX queries that aggregate data at one or more group-by sets, optionally
select them using a predicate in CNF, and return one or more measures. The
semantics of such an MDX query is that of a union of GPSJ queries¹ whose group-by
sets are the cross product of n sets of levels, one for each hierarchy. This
semantics corresponds to the following subset of MDX:
- Clauses SELECT, FROM, WHERE are supported.
- All functions for navigating hierarchies are supported: AllMembers, Ancestor,
  Ascendants, Children, etc.
- All functions for manipulating sets of members or tuples are supported
  (Crossjoin, Except, Exists, Extract, Filter, Intersect, etc.) except the union.
- All functions for manipulating members/tuples are supported.
To effectively use association rules for modeling frequent portions of queries, we
formally split MDX queries into fragments as explained below.
Definition 4 (Query Fragment, Query, Log). Given schema
$\mathcal{M} = \langle A, H, M \rangle$, a query fragment is either a level in
$A$, a measure in $M$, or a simple Boolean predicate involving a level and/or a
measure. A qf-set is a set of query fragments. A multidimensional query (briefly,
query) is represented by a qf-set that includes at least one level for each
hierarchy in $H$ and at least one measure in $M$. A log is a set of
multidimensional queries.
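Definition 4 can be illustrated with a short Python sketch that checks whether a qf-set represents a query. It assumes that fragments are encoded as plain strings (levels, measures, and predicates such as "Year = 2008"); this encoding, the reduced two-hierarchy schema, and the function name `is_query` are hypothetical.

```python
# A reduced schema with two hierarchies, for illustration only.
LEVELS_BY_HIERARCHY = {
    "RESIDENCE": {"City", "State", "Region", "AllCities"},
    "TIME": {"Year", "AllYears"},
}
MEASURES = {"AvgIncome", "AvgCostGas", "AvgCostElect"}


def is_query(qf_set, levels_by_hierarchy=LEVELS_BY_HIERARCHY, measures=MEASURES):
    """True if the qf-set has a level for every hierarchy and a measure."""
    has_level_per_hierarchy = all(
        qf_set & levels for levels in levels_by_hierarchy.values()
    )
    has_measure = bool(qf_set & measures)
    return has_level_per_hierarchy and has_measure


complete = frozenset({"Region", "Year", "AvgIncome", "Year = 2008"})
no_time_level = frozenset({"Region", "AvgIncome"})
no_measure = frozenset({"Region", "Year"})
log = {q for q in (complete, no_time_level, no_measure) if is_query(q)}
```

Representing queries as sets of fragments is what makes them usable as transactions for association rule mining: only `complete` qualifies as a query here, so it is the only element of the sample log.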
¹ A GPSJ query takes the form $\pi_{a_{k_1}, \ldots, a_{k_n}, Aggr}\, \sigma_p(\chi)$
where, in our context: $\chi$ is the star join between the fact table and the
n dimension tables; $p$ is a selection formula in CNF;
$\{a_{k_1}, \ldots, a_{k_n}\}$ is a group-by set; and $Aggr$ is a list of
aggregations of the form $\alpha_j(m_j)$, where $m_j$ is a measure and
$\alpha_j$ is an aggregation operator.
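To make the shape of a GPSJ query concrete, here is a minimal Python sketch that evaluates selection, grouping, and aggregation over rows assumed to be already star-joined. The function `gpsj`, the row layout, and the sample values are hypothetical; a real OLAP engine would evaluate this in the database.

```python
from collections import defaultdict


def avg(values):
    return sum(values) / len(values)


def gpsj(rows, group_by, predicate, aggregations):
    """Select rows with `predicate`, group them, then aggregate each group.

    `rows` stands for the star join chi (a list of dicts), `group_by` lists
    the levels a_k1..a_kn, and `aggregations` holds (name, measure, function)
    triples standing for the alpha_j(m_j) terms.
    """
    groups = defaultdict(list)
    for row in rows:
        if predicate(row):  # sigma_p
            groups[tuple(row[level] for level in group_by)].append(row)
    return {
        key: {
            f"{name}({measure})": fn([r[measure] for r in members])
            for name, measure, fn in aggregations
        }
        for key, members in groups.items()
    }


rows = [
    {"Region": "Pacific", "Year": "2008", "Income": 500},
    {"Region": "Pacific", "Year": "2008", "Income": 700},
    {"Region": "Mountain", "Year": "2008", "Income": 450},
    {"Region": "Pacific", "Year": "2007", "Income": 300},
]
result = gpsj(
    rows,
    group_by=["Region", "Year"],
    predicate=lambda r: r["Year"] == "2008",
    aggregations=[("avg", "Income", avg)],
)
```

An MDX query in the subset considered here would correspond to a union of such evaluations, one per group-by set.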
