Relational link-based ranking
Floris Geerts
Laboratory for Foundations of Computer Science, School of Informatics
University of Edinburgh, UK
fgeerts@inf.ed.ac.uk

Heikki Mannila, Evimaria Terzi
Basic Research Unit, Helsinki Institute for Information Technology
Department of Computer Science, University of Helsinki, Finland
{mannila,terzi}@cs.helsinki.fi
Abstract
Link analysis methods show that the interconnections between web pages carry a lot of valuable information. These methods are, however, inherently oriented towards analyzing binary relations.
We consider the question of generalizing link analysis methods to relational databases. To this end, we provide a generalized ranking framework and address its practical implications.
More specifically, we associate with each relational database and set of queries a unique weighted directed graph, which we call the database graph. We explore the properties of database graphs. In analogy to link analysis algorithms, which use the Web graph to rank web pages, we use the database graph to rank partial tuples. In this way we can, e.g., extend the PageRank link analysis algorithm to relational databases and give this extension a random querier interpretation.
Similarly, we extend the HITS link analysis algorithm to relational databases. We conclude with some preliminary experimental results.
Work done while at the Basic Research Unit, Helsinki Institute for Information Technology, Department of Computer Science, University of Helsinki.
Permission to copy without fee all or part of this material is
granted provided that the copies are not made or distributed for
direct commercial advantage, the VLDB copyright notice and
the title of the publication and its date appear, and notice is
given that copying is by permission of the Very Large Data Base
Endowment. To copy otherwise, or to republish, requires a fee
and/or special permission from the Endowment.
Proceedings of the 30th VLDB Conference,
Toronto, Canada, 2004
1 Introduction
Methods for ranking elements have been widely dis-
cussed in a variety of settings. In the context of
database systems the motivation for ranking has in-
creased along with the size of databases. In huge
databases the users that pose a query would like to see
the top-k partial tuples that satisfy their query rather
than thousands of tuples ordered in a completely unin-
formative way. Additionally, the necessity of ranking
the query results goes far beyond the functionality of
the existing ORDER BY operator, which sorts the results
only according to the values in the specified attributes.
A variety of algorithms that efficiently handle the top-
k selection [15, 19] and top-k join queries [20, 24] have
been proposed.
Ranking has also appeared in the context of Web search applications. The natural need in this context is to rank the web pages returned in response to a user query. The pages are ranked such that the more relevant a page is to the query, the higher it is ranked. Furthermore, among web pages that are equally relevant, those that are more “important” should precede the less “important” ones. Many ranking algorithms for web pages have been developed [11, 6, 22, 9, 25], the most popular among them being the HITS algorithm proposed by Kleinberg [22] and the PageRank algorithm proposed by Brin et al. [11]. The latter has led to the popular Google search engine.
Web pages are categorical data, and thus the problem of ranking them as such is not trivial, since they do not have an intrinsic numerical value on which a ranking could be based. However, all the ranking algorithms developed for them exploit the hyperlink information, i.e., the structure of the Web graph, in order to assign to each web page a rank value and obtain a ranking based on these values. In contrast to web pages, the assignment of rank values to categorical data in relational databases has not yet been investigated much. In this paper, we do exactly this. More specifically, we address the problem of automatically assigning rank values to categorical partial tuples. Based on this assignment we produce useful rankings of partial tuples. We construct database graphs using queries and exploit their structure to obtain rank values.
These rank values can be used in a variety of
database applications: First, one can get ranked an-
swers to queries. Second, they can serve as input to
the existing top-k algorithms mentioned above. Un-
til now, the top-k algorithms are mainly applied to
databases with non-categorical attributes and the top-
k algorithms use these values as input. The rank val-
ues we obtain for categorical data can be used in a
similar way. Finally, the obtained rank values can be
helpful in providing ranked keyword search results in
relational databases. How exactly the obtained values
are going to be used is beyond the scope of this paper.
Here we only consider how such rank values can be
obtained.
More specifically, we present a general framework
for obtaining such rank values for partial tuples of re-
lational databases. The goal is to define those rank
scores and find the algorithms to calculate them. For
this we exploit information about the interconnections
of the partial tuples in the database, as these can be
discovered using relational algebra queries.
To obtain rankings for partial tuples we mimic the principles of link analysis algorithms. The well-studied algorithms for the Web [11, 6, 22, 9, 25] show that the structure of the interconnections of web pages carries a lot of valuable information. For example, Kleinberg’s HITS algorithm [22] suggests that each page should have a separate “authority” rating (based on the links going to the page) and “hub” rating (based on the links going from the page). The intuition behind the algorithm is that important hubs have links to important authorities and important authorities are linked to by important hubs. The PageRank algorithm of Brin et al. [11], on the other hand, calculates the PageRank of a web page globally by considering a random walk on the Web graph and computing its stationary distribution. The PageRank algorithm can also be seen as a model of user behavior in which a hypothetical web surfer clicks on hyperlinks at random with no regard to content. More specifically, when the random surfer is on a web page, the probability that he clicks on a given hyperlink of the page depends solely on the number of outgoing links of that page. However, sometimes the surfer gets bored and jumps to a random web page on the Web. The PageRank of a web page is the long-run frequency with which the random surfer visits that page when he keeps clicking indefinitely. Important web pages are the ones that are visited very often by the random surfer.
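The random surfer description above can be made concrete with a small power iteration. The following is only a minimal sketch under stated assumptions: the function name `pagerank`, the toy graph `toy_web`, the damping value 0.85 (the probability of following a link rather than jumping) and the uniform treatment of pages without outgoing links are all illustrative choices that do not come from this paper.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power iteration for the random surfer on a dict {page: [targets]}.

    damping is the (assumed) probability of following a hyperlink; with
    probability 1 - damping the surfer gets bored and jumps to a random page.
    """
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Share received by every page from the "bored surfer" jump.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # Each outgoing link is chosen with equal probability.
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Assumption: a page without links behaves like a random jump.
                for t in pages:
                    new_rank[t] += damping * rank[p] / n
        rank = new_rank
    return rank


# Hypothetical four-page web; "b" is linked to by both "a" and "c".
toy_web = {"a": ["b"], "b": ["a", "c", "d"], "c": ["b"], "d": []}
ranks = pagerank(toy_web)
```

In this toy graph the page `b` collects the largest rank value, which matches the intuition that pages visited often by the surfer are the important ones.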
We now rephrase the random surfer in the relational database setting. Consider the fragment of the Web shown as the binary table $W$ in Figure 1. In the same figure we have shown the surf trail $b \to a \to b \to d$ of the random surfer. In order to walk along the partial tuples (pages) in $W$, the random surfer needs only two kinds of queries: The first is simply the query which returns all pages present in $W$. This can be expressed by the expression $\pi_1 W \cup \pi_2 W$. The second kind are queries expressed by $\pi_2 \sigma_{1=v} W$, in which $v$ is a page present in $W$. In other words, these queries ask for all pages reachable from a certain page $v$. After the random surfer has evaluated one of these queries, he selects a random tuple out of the query result and repeats this procedure again. An important restriction is that while $\pi_1 W \cup \pi_2 W$ may be asked by the random surfer independent of the current page, $\pi_2 \sigma_{1=v} W$ may only be asked when the surfer is at page $v$. In Figure 1 we have shown which queries are asked in order to obtain the shown surf trail.
Figure 1: Random walk of random surfer using only 2 kinds of queries. (The figure shows the binary table $W$ and the queries $\pi_2 \sigma_{1=b} W$, $\pi_2 \sigma_{1=a} W$ and $\pi_1 W \cup \pi_2 W$ asked along the surf trail.)
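The two kinds of queries can be sketched over a list of pairs. This is an illustrative sketch only: the table `W` below is hypothetical (it is not the table of Figure 1), the helper names are invented here, the union is assumed to be additive under bag semantics, and the jump probability 0.15 is an assumed parameter.

```python
import random

# Hypothetical binary relation W(source, target); not the table of Figure 1.
W = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "b"), ("d", "b")]


def all_pages(W):
    """pi_1 W union pi_2 W, assuming additive bag union (one entry per occurrence)."""
    return [s for s, _ in W] + [t for _, t in W]


def successors(W, v):
    """pi_2 sigma_{1=v} W: all pages reachable from page v in one step."""
    return [t for s, t in W if s == v]


def surf(W, start, steps, jump_prob=0.15, seed=0):
    """Generate a surf trail; jump_prob is an assumed parameter."""
    rng = random.Random(seed)
    trail = [start]
    for _ in range(steps):
        current = trail[-1]
        out = successors(W, current)
        # The second kind of query may only be asked at the current page;
        # the first kind may be asked anywhere (used here for jumps and
        # for pages without outgoing links).
        if not out or rng.random() < jump_prob:
            trail.append(rng.choice(all_pages(W)))
        else:
            trail.append(rng.choice(out))
    return trail


trail = surf(W, "b", 10)
```

Choosing uniformly from a bag means that a page occurring more often in a query result is selected more often, which is the role frequency plays later in the paper.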
We use this observation to extend the random surfer model to the random querier, which generalizes random-walk based link analysis algorithms by putting a different set of queries at the random surfer's disposal. Additionally, the model facilitates extensions that allow it to be used for ranking partial tuples.
Seeing the Web as a database allows us to view a hyperlink between two web pages as existing due to queries that connect the two web pages. E.g., in Figure 1 the link between page $b$ and page $a$ can be seen to exist due to the fact that $a$ is in the query result $\pi_2 \sigma_{1=b} W$. This idea generalizes to arbitrary databases $D$ and any finite set of queries $\{q_1, \ldots, q_n\}$: there exists a link between two partial tuples $\vec{s}$ and $\vec{t}$ of a database $D$ if there exists an $i \in \{1, \ldots, n\}$ such that $\vec{t}$ is in the query result of $q_i$ when evaluated on $D$ with the selection parameters of $q_i$ instantiated with constants in $\vec{s}$.
We augment these links with weights relative to
some preference function on the queries and frequency
information of tuples in the query results. In this way
we obtain a weighted directed graph which we call the
database graph. The database graph is a natural gen-
eralization of the graphs used in link analysis.
The database graph enables any graph-based link
analysis method to be used for ranking partial tuples.
For example, both the PageRank and HITS algorithms
can be generalized to operate on the database graph;
the generalizations provide tuple ranking algorithms
for relational databases.
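To illustrate what operating on a weighted directed graph can look like, here is a hedged sketch of the standard HITS iteration applied to weighted edges. It is not the extension defined later in the paper, which may differ; the function name `hits`, the edge dictionary `toy` and the normalization by Euclidean length are assumptions made for this example.

```python
def hits(weights, iterations=50):
    """Standard HITS iteration on weighted edges.

    weights: dict {(v, w): positive weight} for directed edges v -> w.
    Returns (hub, authority) score dictionaries with unit Euclidean length.
    """
    nodes = sorted({v for v, _ in weights} | {w for _, w in weights})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority of w: weighted sum of the hub scores pointing to w.
        auth = {n: 0.0 for n in nodes}
        for (v, w), x in weights.items():
            auth[w] += x * hub[v]
        norm = sum(a * a for a in auth.values()) ** 0.5
        auth = {n: a / norm for n, a in auth.items()}
        # Hub of v: weighted sum of the authority scores v points to.
        new_hub = {n: 0.0 for n in nodes}
        for (v, w), x in weights.items():
            new_hub[v] += x * auth[w]
        norm = sum(h * h for h in new_hub.values()) ** 0.5
        hub = {n: h / norm for n, h in new_hub.items()}
    return hub, auth


# Hypothetical weighted graph with two hubs and two authorities.
toy = {("h1", "a1"): 1.0, ("h1", "a2"): 0.5, ("h2", "a1"): 1.0}
hub, auth = hits(toy)
```

In this toy input `a1` receives the higher authority score because both hubs point to it, and `h1` receives the higher hub score because it points to both authorities.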
The contributions of this paper are the following:
• We define the database graph for a given database, set of queries and preference function and explore its theoretical properties.
• We study random walks on the database graph and show that they can be interpreted as the walks of a random querier. We use the stationary distribution of the random walks to assign rank values to partial tuples.
• We show that the random querier generalizes many well-known link analysis algorithms.
• As a second application of the database graph, we extend the HITS algorithm to relational databases and use it to assign rank values as well.
• We experimentally evaluate the use of the obtained rank values to rank query results.
Related work
The problem of assigning rank values to partial tuples
in the relational framework is related to the problem
of ranking web pages. The latter has been extensively
investigated and several link analysis algorithms have
been developed for this [11, 6, 22, 9, 25]. Even some
unifying frameworks for link analysis algorithms ex-
ist [12].
Interesting work on ranking elements in relational databases based on measures from Information Retrieval (IR) is described in [3]; however, the notion of “link” provided by the queries has not been considered there.
The representation of a database as a graph and
link-based ranking appears in the context of keyword
search in [5, 18, 2]. The nodes in the graph are the
database tuples and the directed relationships between
the nodes are induced by foreign key or other con-
straints. The ranking values are related to the inverse
of the path distance between nodes.
Graph representations of databases and random walks on them are also considered in the context of similarity of categorical attributes. Both [26] and [21] construct a graph where the nodes are the constants in the database and two nodes are linked when they appear in the same tuple. They perform different random walks on these graphs in order to obtain a similarity measure for the values. A related iterative approach is the idea of hyperedges connecting tuples based on values [17]. The main difference is that we use partial tuples instead of tuples and that we use queries to connect them.
A random walk approach to ranking on (semi-
)structured data is proposed in [4]. Although the ap-
proach to ranking is very similar to ours, the graph
construction is heavily dependent on the presence of
(semi-)structured data.
Organization
The rest of this paper is organized as follows. In Section 2 we define databases and query languages. In Section 3, we formally define the database graph and prove some of its properties. We then define the random walk on the database graph and the random querier in Section 4. In Section 5 we extend the PageRank and HITS algorithms to relational databases using the database graph. Section 6 describes some experimental results. We conclude the paper in Section 7.
2 Preliminaries
We refer to [1, 16] for a more detailed description of ba-
sic database notions. For simplicity of exposition, we
assume that the database schema S consists of a single
relation name R of arity n. However, all definitions
and results generalize directly to arbitrary database
schemas.
Let $D$ be a database instance over $S$. The active domain of $D$, denoted by $\mathrm{adom}(D)$, consists of all constants in $D$. For a tuple $\vec{t} \in D$ of size $n$, we denote the value of its $i$-th attribute by $t_i \in \mathrm{adom}(D)$. The active domain of a tuple $\vec{t} \in D$, denoted by $\mathrm{adom}(\vec{t})$, is the set $\{t_1, \ldots, t_n\}$.
The standard query language is the relational al-
gebra, or equivalently the relational calculus, over the
database schema S. We denote this query language
by RA. Relations and queries are interpreted using
the bag semantics, i.e., duplicate tuples are allowed.
The reason for this is that we need the notion of fre-
quency which disappears if we do not allow for dupli-
cates. We will not distinguish between queries and the
RA expressions expressing them. We denote the query
result of q on D by q(D).
Let $q \in \mathrm{RA}$ be an $n$-ary query and denote the set of attributes in the query result by $I$. We partition $I$ into source attributes $\vec{x}$ and target attributes $\vec{y}$. We always assume that this partition is specified for each query $q$ we encounter. We make this explicit by writing $q(\vec{y}|\vec{x})$ instead of simply $q$.
Let $\vec{s} \in \mathrm{adom}(D)^k$ where $k = |\vec{x}|$ and let $\ell = |\vec{y}|$. Then we define the RA expression
$$q(\vec{y}|\vec{s}) \equiv \pi_{y_1,\ldots,y_\ell}\, \sigma_{x_1=s_1,\ldots,x_k=s_k}\, q(\vec{y}|\vec{x}).$$
We will denote the query result of $q(\vec{y}|\vec{s})$ on $D$ by $q(D, \vec{s})$.
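The instantiated query amounts to a selection on the source attributes followed by a projection onto the target attributes, with duplicates kept. A minimal sketch, assuming a query result is represented as a Python list of tuples and attributes are addressed by position; the helper name `instantiate` and the sample data are hypothetical.

```python
def instantiate(rows, source_idx, target_idx, s):
    """Return q(D, s) as a bag (list).

    rows:       the bag q(D), a list of tuples
    source_idx: positions of the source attributes x
    target_idx: positions of the target attributes y
    s:          constants substituted for the source attributes
    """
    return [
        tuple(row[j] for j in target_idx)      # projection onto y
        for row in rows
        if tuple(row[i] for i in source_idx) == tuple(s)  # selection x = s
    ]


# Hypothetical result of a ternary query q(y1, y2 | x) on some database.
q_result = [("a", "b", "c"), ("a", "b", "c"), ("a", "d", "c"), ("e", "b", "c")]
answers = instantiate(q_result, source_idx=[0], target_idx=[1, 2], s=("a",))
```

Here `answers` keeps the pair `("b", "c")` twice, because the bag semantics does not eliminate duplicates.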
We extend the RA with the duplicate elimination operator $\delta$ for transforming bags into sets if necessary.
Given a tuple $\vec{s}$ and a query $q \in \mathrm{RA}$, the support of $\vec{s}$ in $q(D)$, denoted by $\mathrm{supp}(\vec{s}, q(D))$, is the number of times $\vec{s}$ appears in $q(D)$. The frequency of $\vec{s}$ in $q(D)$ is defined as
$$\mathrm{freq}(\vec{s}, q(D)) = \frac{\mathrm{supp}(\vec{s}, q(D))}{|q(D)|},$$
where $|q(D)|$ denotes the size of $q(D)$.
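Support and frequency translate directly into counting over a bag. The sketch below assumes a bag is a Python list of tuples and uses exact fractions; returning 0 for an empty bag is a convention chosen here, since the text does not define that case.

```python
from collections import Counter
from fractions import Fraction


def supp(t, bag):
    """Number of times tuple t appears in the bag."""
    return Counter(bag)[tuple(t)]


def freq(t, bag):
    """supp(t, bag) / |bag|; assumed to be 0 for an empty bag."""
    if not bag:
        return Fraction(0)
    return Fraction(supp(t, bag), len(bag))


# Hypothetical query result with one duplicate.
bag = [("b", "c"), ("b", "c"), ("d", "c")]
```

For this bag, `freq(("b", "c"), bag)` is 2/3, and the frequencies of the distinct tuples sum to 1.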
3 The database graph
As already mentioned in the Introduction, one can consider the Web as a database $D$ over a binary relation $W$. Then following a hyperlink from a page $v$ can be seen as first querying the database using the query $q(y|x) \equiv W(x, y)$, and then selecting a page out of $q(D, v)$. Two web pages $v$ and $w$ are now linked by the query $q$ iff $w \in q(D, v)$. We generalize this idea to arbitrary databases and queries.
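Definition 1 (Link). For a given database $D$ and query language $L \subseteq \mathrm{RA}$, a tuple $\vec{s} \in \mathrm{adom}(D)^k$ is $L$-linked to a tuple $\vec{t} \in \mathrm{adom}(D)^\ell$ iff there exists a query $q(\vec{y}|\vec{x}) \in L$ such that $|\vec{x}| = k$, $|\vec{y}| = \ell$, and $\vec{t} \in q(D, \vec{s})$.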
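Definition 1 can be checked mechanically for a finite language. In this sketch each query is represented by a triple consisting of its result bag q(D) and the positions of its source and target attributes; this representation and the names `q_of`, `is_linked` and `web_language` are assumptions made for the example.

```python
def q_of(rows, source_idx, target_idx, s):
    """q(D, s): select on the source attributes, project onto the targets."""
    return [tuple(r[j] for j in target_idx) for r in rows
            if tuple(r[i] for i in source_idx) == tuple(s)]


def is_linked(language, s, t):
    """True iff s is L-linked to t.

    language: list of (rows, source_idx, target_idx), where rows is q(D).
    """
    s, t = tuple(s), tuple(t)
    return any(
        len(src) == len(s) and len(tgt) == len(t)
        and t in q_of(rows, src, tgt, s)
        for rows, src, tgt in language
    )


# Web example: q(y|x) = W(x, y) on a hypothetical link table W.
W = [("v", "w"), ("w", "u")]
web_language = [(W, [0], [1])]
```

With this language, `("v",)` is linked to `("w",)` but not to `("u",)`, mirroring the fact that only direct hyperlinks are links under the query q(y|x).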
From now on we assume that $L$ consists of a finite number of queries.
Let $M = \langle D, L, f \rangle$ where $D$ is a database, $L \subseteq \mathrm{RA}$, and $f$ is some preference function $f : L \to \mathbb{Q}^+$. Here, $\mathbb{Q}^+$ denotes the set of strictly positive rational numbers.
We now define the database graph. The definition
is rather technical but the intuition behind it is very
natural. Indeed, the vertices of the database graph
correspond to the active domain of tuples in the an-
swers to queries in L. The reason why we work with
the active domains instead of the tuples themselves is
that a constant appearing in some attribute can pos-
sibly be used in other attributes as well. So instead
of storing a constant for each possible attribute sepa-
rately, we store it only once. This slightly complicates
the formal definition (see below) of database graph
since many different tuples can correspond to the same
vertex. The edge relation is based on Definition 1. Fi-
nally, we will assign weights to the edges corresponding
to the preferences of the queries establishing this edge
(or link) and the support of the tuples consistent with
the target vertex in the query results. More formally,
Definition 2 (Database graph). Given $M = \langle D, L, f \rangle$, the corresponding database graph is the weighted directed graph $G_M = (V_M, E_M, \lambda_M)$ where,
• The set of vertices $V_M$ is constructed as follows: For each query $q(\vec{y}|\vec{x}) \in L$ we instantiate the parameters $\vec{x}$ with tuples $\vec{s} \in \mathrm{adom}(D)^k$, where $k = |\vec{x}|$. For each $\vec{t} \in q(D, \vec{s})$, we add the vertex $v = \mathrm{adom}(\vec{t})$ to $V_M$, if not already included. Note that $v$ is a set of constants. Thus, $V_M$ is
$$\bigl\{ \mathrm{adom}(\vec{t}) \mid q(\vec{y}|\vec{x}) \in L,\ |\vec{x}| = k,\ \vec{s} \in \mathrm{adom}(D)^k,\ \vec{t} \in q(D, \vec{s}) \bigr\}.$$
For a vertex $v \in V_M$, we denote by $v^k$ the set of all $k$-tuples formed from constants in $v$.
• The set of edges $E_M$ is equal to all ordered pairs of vertices $(v, w)$ such that there exists a tuple $\vec{s} \in v^k$ which is $L$-linked to a tuple $\vec{t} \in \mathrm{adom}(D)^\ell$ such that $w = \mathrm{adom}(\vec{t})$; and
• The weight function $\lambda_M : E_M \to \mathbb{Q}^+$ is defined as
$$\lambda_M(v, w) = \sum_{q(\vec{y}|\vec{x}) \in L} f(q) \Bigl( \sum_{\substack{\vec{s} \in v^k,\ k = |\vec{x}| \\ \vec{t} \in w^\ell,\ \ell = |\vec{y}|}} \mathrm{freq}(\vec{t}, q(D, \vec{s})) \Bigr).$$
We illustrate the concept of database graph by the
following examples.
Example 1. Let $D$ be the database given by the table in Figure 2. The language $L$ consists of the queries $q_1(y|x) \equiv \pi_{1,2} R(x, y, z)$ and $q_2(y, z|x) \equiv R(x, y, z)$. Then for any constant $a$ appearing in the first attribute, $q_1(D, a)$ equals $\{b \mid (a, b) \in q_1(D)\}$. Similarly, for any constant $a$, $q_2(D, a)$ consists of the pairs $\{(b, c) \mid (a, b, c) \in q_2(D)\}$. This shows that $q_1$ will link the first attribute to the second one, while $q_2$ links the first attribute to the second and third one, as can be seen in Figure 2. We define the preference function as $f(q_1) = f(q_2) = 1$. The complete database graph is shown in Figure 2. E.g., the weight on the edge from $\{v_2\}$ to $\{t_2, v_3\}$ is equal to $f(q_2)\,\mathrm{freq}((t_2, v_3), q_2(D, v_2)) = 1$.
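The weights of Example 1 can be recomputed with a short script. This is a sketch under one explicit assumption: a source tuple and a target tuple contribute only to the edge between their own active domains, which is the reading under which the weight quoted in Example 1 comes out as 1. The names `database_graph`, `D` and `L` and the encoding of queries by attribute positions are illustrative.

```python
from collections import defaultdict
from fractions import Fraction

# Table D of Figure 2, relation R(x, y, z).
D = [("v1", "v2", "t1"), ("v1", "v3", "t1"), ("v1", "v4", "t1"),
     ("v2", "v3", "t2"), ("v4", "v1", "t2"), ("v4", "v3", "t2")]

# Each query: (source positions, target positions, preference f(q)).
# q1(y|x) = pi_{1,2} R(x, y, z) and q2(y, z|x) = R(x, y, z), both with f = 1.
L = {"q1": ([0], [1], 1), "q2": ([0], [1, 2], 1)}


def database_graph(rows, language):
    """Return {(v, w): weight} with vertices as frozensets of constants."""
    weights = defaultdict(Fraction)
    for src, tgt, pref in language.values():
        # Only source tuples that occur in the data give non-empty results.
        sources = {tuple(r[i] for i in src) for r in rows}
        for s in sources:
            answers = [tuple(r[j] for j in tgt) for r in rows
                       if tuple(r[i] for i in src) == s]
            for t in set(answers):
                # Assumption: s and t contribute only to adom(s) -> adom(t).
                edge = (frozenset(s), frozenset(t))
                weights[edge] += pref * Fraction(answers.count(t), len(answers))
    return dict(weights)


G = database_graph(D, L)
```

Under this reading the script yields twelve edges, and the edge from `{v2}` to `{t2, v3}` has weight 1 as stated in the example.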
When we disregard the weights, another example is
the Gaifman graph of finite model theory [13].
Example 2. Let $D$ be a database over an $n$-ary relation $R$. Consider the language $L$ consisting of $q_{i,j}(x_j|x_i) \equiv \pi_{i,j} R(x_1, \ldots, x_n)$ and $f(q_{i,j}) = 1$ for all $i, j = 1, \ldots, n$. The database graph has as vertices the constants in $\mathrm{adom}(D)$ and there is an edge between two constants iff they appear in the same tuple in $D$.
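The unweighted graph of Example 2 can be sketched as a co-occurrence relation on constants. The function name `cooccurrence_edges` and the sample relation are hypothetical, and the case i = j is taken here to produce self-loops.

```python
from itertools import product


def cooccurrence_edges(rows):
    """Unweighted edges for L = {q_ij}: a -> b iff some tuple has a in
    attribute i and b in attribute j for some i, j."""
    edges = set()
    for row in rows:
        for a, b in product(row, repeat=2):
            edges.add((a, b))
    return edges


# Hypothetical ternary relation.
rows = [("a", "b", "c"), ("c", "d", "d")]
edges = cooccurrence_edges(rows)
```

In this sample, `a` and `c` are adjacent because they share the first tuple, while `a` and `d` never occur together and are therefore not connected.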
The database graph is a well-defined object. Indeed, we call $\langle D, L, f \rangle$ and $\langle D', L, f \rangle$ isomorphic, denoted by $\langle D, L, f \rangle \cong \langle D', L, f \rangle$, if there exists a bijection $b : \mathrm{adom}(D) \to \mathrm{adom}(D')$ such that for all $q(\vec{y}|\vec{x}) \in L$ and $\vec{s} \in \mathrm{adom}(D)^k$ for $k = |\vec{x}|$, we have for any $\vec{t} \in q(D, \vec{s})$ that
$$\mathrm{freq}(\vec{t}, q(D, \vec{s})) = \mathrm{freq}(b(\vec{t}), q(D', b(\vec{s}))),$$
where $b$ is extended to tuples $\vec{x}$ as $b(\vec{x}) = (b(x_1), \ldots, b(x_k))$.
Theorem 1. If $M = \langle D, L, f \rangle$ and $N = \langle D', L, f \rangle$ such that $M \cong N$, then $G_M$ is isomorphic to $G_N$.
D =
    v_1  v_2  t_1
    v_1  v_3  t_1
    v_1  v_4  t_1
    v_2  v_3  t_2
    v_4  v_1  t_2
    v_4  v_3  t_2
Figure 2: The database $D$ (left) and the database graph $G_M$ of $M = \langle D, L, f \rangle$ of Example 1 (right). (The graph has the vertices $\{v_1\}$, $\{v_2\}$, $\{v_3\}$, $\{v_4\}$, $\{t_1, v_2\}$, $\{t_1, v_3\}$, $\{t_1, v_4\}$, $\{t_2, v_1\}$ and $\{t_2, v_3\}$; its edge weights take the values 1/3, 1/2 and 1.)
Proof. See the full paper for the proof.
We also have a monotonicity property with respect
to taking sub-languages.
Theorem 2. If $M = \langle D, L, f \rangle$ and $N = \langle D, L', f' \rangle$ such that $L' \subseteq L$ and $f(q) = f'(q)$ for any $q \in L'$, then $G_N$ is isomorphic to a subgraph of $G_M$.
Proof. The proof is analogous to the proof of Theo-
rem 1.
In general, the converse of Theorem 1 is not true, as can be seen from the following example.
Example 3. Consider the databases $D$ and $D'$ shown in Figure 3. Here, different symbols denote different constants. Let $L$ consist of the queries
$$\begin{aligned}
q_0(x|) &\equiv \pi_1 R(x, y, z, u, v), \\
q_1(y|x) &\equiv \pi_{1,2}\, \sigma_{3=5} R(x, y, z, u, v), \\
q_2(y|x) &\equiv \pi_{1,2}\, \sigma_{3=4} R(x, y, z, u, v), \\
q_3(y|x) &\equiv \pi_{1,2}\, \sigma_{3 \neq 4} R(x, y, z, u, v).
\end{aligned}$$
The preference function $f$ assigns weight 1 to each query. It is easily verified that the graphs $G_M$ and $G_N$ are isomorphic and correspond to the graph shown in Figure 3. However, there is no bijection making $\langle D, L, f \rangle$ and $\langle D', L, f \rangle$ isomorphic. Indeed, from $q_1(D, s)$ and $q_1(D', s)$ it follows that the bijection $b$ should map $b(t_1) = t_1$, while from $q_2(D, s)$, $q_2(D', s)$, $q_3(D, s)$ and $q_3(D', s)$ it follows that $b(t_1) = t_2$.
The database graph is defined without taking into
account any semantic relationships between attributes
or additional schema constraints. However, this can be
easily incorporated in the queries used in the language
L.
In the next section we define a random walk on the
database graph. In order for the random walk to have
nice convergence properties (see the next section), the
underlying graph should be strongly connected and
non-bipartite. This property turns out to be undecid-
able.
Theorem 3. Given a query language L, it is undecid-
able whether the database graph is strongly connected
and non-bipartite for all D and preference functions f.
Proof. First, we remark that the topology of the graph is independent of $f$. So, we can disregard the preference function in what follows. We use a reduction from the satisfiability problem for relational algebra expressions on binary relations, which is undecidable [8]. We construct for each $q(x_1, \ldots, x_k) \in \mathrm{RA}$ the language $L = \{q_1, q_2, q_3\}$ where,
$$\begin{aligned}
q_1(u, z|x, y) &\equiv \text{if } (\exists x_1 \cdots \exists x_k\, q(x_1, \ldots, x_k|)) \text{ then } \sigma_{1 \neq 2 \,\wedge\, 1=3 \,\wedge\, 2=4}\, R(x, y) \times R(u, z), \\
q_2(y|) &\equiv \text{if } (\exists x_1 \cdots \exists x_k\, q(x_1, \ldots, x_k|)) \text{ then } \pi_2 R(x, y), \\
q_3(z|) &\equiv \text{if not } (\exists x_1 \cdots \exists x_k\, q(x_1, \ldots, x_k|)) \text{ then } \pi_1 R(z, u) \cup \pi_2 R(u, z).
\end{aligned}$$
By construction, for any $D$ and $f$, the database graph associated with $\langle D, L, f \rangle$ will be connected and non-bipartite iff $q$ is not satisfiable.
Indeed, if $q$ is not satisfiable then $L$ collapses to $q_3$. For any $D$ and $f$, the database graph associated with $\langle D, q_3, f \rangle$ is the complete graph with vertex set $\mathrm{adom}(D)$. This is clearly always a strongly connected and non-bipartite graph.
For the other direction, suppose that there exist $D$ and $f$ such that the graph associated with $\langle D, L, f \rangle$ is disconnected or bipartite. We need to show that this implies that on $D$ the query $q$ is satisfiable. Therefore, we show that for any $D$ and $f$, the database graph associated with $\langle D, \{q_1, q_2\}, f \rangle$ is disconnected. W.l.o.g., we may assume that $D$ only consists of tuples $(s, t)$ such that $s \neq t$. Indeed, if $|\mathrm{adom}(D)| > 1$ (the case $|\mathrm{adom}(D)| = 1$ can be disregarded), first applying the query $\pi_{1,4}(R \times R)$ ensures that $D$ always contains $(s, t)$ with $s \neq t$. We then select only those pairs $(s, t)$ from $D$ such that $s \neq t$. So, the database graph will not be connected, because there is an edge in the database graph from vertex $\{s, t\}$ to vertex $\{t\}$ by $q_2$, but no edge exists from $\{t\}$ to $\{s, t\}$. This is because $q_1$ only links $\{t\}$ to vertex $\{t, t\}$, which is by construction not in $D$.
4 Random walks on databases
Let G = (V, E, λ) be a weighted directed graph. We
next define the random walk on this graph, and then
show how the concept applies to database graphs.
References
The anatomy of a large-scale hypertextual Web search engine. Sergey Brin et al. Journal article, 1998.
Authoritative sources in a hyperlinked environment. Journal article.
Theory of Linear and Integer Programming. Book.
Randomized Algorithms. Book.