Ranking Document Clusters Using Markov Random Fields
Fiana Raiber
fiana@tx.technion.ac.il
Oren Kurland
kurland@ie.technion.ac.il
Faculty of Industrial Engineering and Management, Technion
Haifa 32000, Israel
ABSTRACT
An important challenge in cluster-based document retrieval is ranking document clusters by their relevance to the query. We present a novel cluster ranking approach that utilizes Markov Random Fields (MRFs). MRFs enable the integration of various types of cluster-relevance evidence; e.g., the query-similarity values of the cluster's documents and query-independent measures of the cluster. We use our method to re-rank an initially retrieved document list by ranking clusters that are created from the documents most highly ranked in the list. The resultant retrieval effectiveness is substantially better than that of the initial list for several lists that are produced by effective retrieval methods. Furthermore, our cluster ranking approach significantly outperforms state-of-the-art cluster ranking methods. We also show that our method can be used to improve the performance of (state-of-the-art) results-diversification methods.
Categories and Subject Descriptors: H.3.3 [Information Search
and Retrieval]: Retrieval models
General Terms: Algorithms, Experimentation
Keywords: ad hoc retrieval, cluster ranking, query-specific clus-
ters, markov random fields
1. INTRODUCTION
The cluster hypothesis [33] gave rise to a large body of work on using query-specific document clusters [35] for improving retrieval effectiveness. These clusters are created from documents that are the most highly ranked by an initial search performed in response to the query.

For many queries there are query-specific clusters that contain a very high percentage of relevant documents [8, 32, 25, 14]. Furthermore, positioning the constituent documents of these clusters at the top of the result list yields highly effective retrieval performance; specifically, much better than that of state-of-the-art retrieval methods that rank documents directly [8, 32, 25, 14, 10].
As a result of these findings, there has been much work on
ranking query-specific clusters by their presumed relevance
to the query (e.g., [35, 22, 24, 25, 26, 14, 15]). Most previous approaches to cluster ranking compare a representation of the cluster with that of the query. A few methods integrate additional types of information such as inter-cluster and cluster-document similarities [18, 14, 15]. However, there are no reports of fundamental cluster ranking frameworks that enable the effective integration of the various types of information that might attest to the relevance of a cluster to a query.
We present a novel cluster ranking approach that uses Markov Random Fields. The approach is based on integrating various types of cluster-relevance evidence in a principled manner. These include the query-similarity values of the cluster's documents, inter-document similarities within the cluster, and measures of query-independent properties of the cluster, or more precisely, of its documents.

A large array of experiments conducted with a variety of TREC datasets demonstrates the high effectiveness of using our cluster ranking method to re-rank an initially retrieved document list. The resultant retrieval performance is substantially better than that of the initial ranking for several effective rankings. Furthermore, our method significantly outperforms state-of-the-art cluster ranking methods. Although the method ranks clusters of similar documents, we show that using it to induce document ranking can help to substantially improve the effectiveness of (state-of-the-art) retrieval methods that diversify search results.
2. RETRIEVAL FRAMEWORK
Suppose that some search algorithm was employed over a corpus of documents in response to a query. Let D_init be the list of the initially highest ranked documents. Our goal is to re-rank D_init so as to improve retrieval effectiveness.

To that end, we employ a standard cluster-based retrieval paradigm [34, 24, 18, 26, 15]. We first apply some clustering method upon the documents in D_init; Cl(D_init) is the set of resultant clusters. Then, the clusters in Cl(D_init) are ranked by their presumed relevance to the query. Finally, the clusters' ranking is transformed to a ranking of the documents in D_init by replacing each cluster with its constituent documents and omitting repeats in case the clusters overlap. Documents in a cluster are ordered by their query similarity.

The motivation for employing the cluster-based approach just described follows the cluster hypothesis [33]. That is, letting similar documents provide relevance-status support to each other by virtue of being members of the same clusters. The challenge that we address here is devising a (novel) cluster ranking method; i.e., we tackle the second step of the cluster-based retrieval paradigm.
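The following is a minimal sketch of this re-ranking paradigm, assuming placeholder callables for the clustering method, the cluster ranking method, and the document-query similarity measure (these names do not appear in the paper):

```python
# Sketch of the cluster-based re-ranking paradigm described above.
# cluster(), rank_clusters(), and sim() are placeholders for the clustering
# method, the cluster ranking method (e.g., ClustMRF), and the document-query
# similarity measure, respectively.

def rerank(d_init, query, cluster, rank_clusters, sim):
    """Re-rank the initially retrieved list d_init via cluster ranking."""
    clusters = cluster(d_init)                        # Cl(D_init)
    ranked_clusters = rank_clusters(clusters, query)  # by presumed relevance to the query
    ranking, seen = [], set()
    for c in ranked_clusters:
        # Replace each cluster with its documents, most query-similar first,
        # omitting documents already emitted by a higher-ranked cluster.
        for doc in sorted(c, key=lambda d: sim(query, d), reverse=True):
            if doc not in seen:
                seen.add(doc)
                ranking.append(doc)
    return ranking
```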

Figure 1: The three types of cliques considered for graph G. G is composed of a query node (Q) and three (for the sake of the example) nodes (d_1, d_2, and d_3) that correspond to the documents in cluster C. (i) l_QD contains the query and a single document from C; (ii) l_QC contains all nodes in G; and, (iii) l_C contains only the documents in C.
Formally, let C and Q denote random variables that take as values document clusters and queries, respectively. The cluster ranking task amounts to estimating the probability that a cluster is relevant to a query, p(C|Q):

    p(C|Q) = p(C, Q) / p(Q)  =^rank  p(C, Q).   (1)

The rank equivalence holds as clusters are ranked with respect to a fixed query.
To estimate p(C, Q), we use Markov Random Fields (MRFs).
As we discuss below, MRFs are a convenient framework for
integrating various types of cluster-relevance evidence.
2.1 Using MRFs to rank document clusters
An MRF is defined over a graph G. Nodes represent random variables and edges represent dependencies between these variables. Two nodes that are not connected with an edge correspond to random variables that are independent of each other given all other random variables. The set of nodes in the graph we construct is composed of a node representing the query and nodes representing the cluster's constituent documents. The joint probability over G's nodes, p(C, Q), can be expressed as follows:

    p(C, Q) = (1/Z) ∏_{l ∈ L(G)} ψ_l(l);   (2)

L(G) is the set of cliques in G and l is a clique; ψ_l(l) is a potential (i.e., positive function) defined over l; Z = ∑_{C,Q} ∏_{l ∈ L(G)} ψ_l(l) is the normalization factor that serves to ensure that p(C, Q) is a probability distribution. The normalizer need not be computed here as we rank clusters with respect to a fixed query.

A common instantiation of potential functions is [28]: ψ_l(l) def= exp(λ_l f_l(l)), where f_l(l) is a feature function defined over the clique l and λ_l is the weight associated with this function. Accordingly, omitting the normalizer from Equation 2, applying the rank-preserving log transformation, and substituting the potentials with the corresponding feature functions results in our ClustMRF cluster ranking method:

    p(C|Q)  =^rank  ∑_{l ∈ L(G)} λ_l f_l(l).   (3)

This is a generic linear (in feature functions) cluster ranking function that depends on the graph G. To instantiate a specific ranking method, we need to (i) determine G's structure, specifically, its clique set L(G); and, (ii) associate feature functions with the cliques. We next address these two tasks.
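To make Equation 3 concrete, here is a minimal sketch of the resulting linear scoring rule; the dictionary-based representation of feature values and weights is an illustrative assumption, not notation from the paper:

```python
# Minimal sketch of the linear ClustMRF scoring rule in Equation 3:
# score(C, Q) = sum over cliques l of lambda_l * f_l(l).

def clustmrf_score(feature_values, weights):
    """feature_values: {feature_name: f_l(l)}; weights: {feature_name: lambda_l}."""
    return sum(weights[name] * value for name, value in feature_values.items())

# Hypothetical usage with two feature functions:
# score = clustmrf_score({"geo-qsim": -3.2, "stdv-qsim": -1.1},
#                        {"geo-qsim": 0.8, "stdv-qsim": 0.4})
```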
2.1.1 Cliques and feature functions
We consider three types of cliques in the graph G. These are depicted in Figure 1. In what follows we write d ∈ C to indicate that document d is a member of cluster C.

The first clique (type), l_QD, contains the query and a single document in the cluster. This clique serves for making inferences based on the query similarities of the cluster's constituent documents when considered independently. The second clique, l_QC, contains all nodes of the graph; that is, the query Q and all of C's constituent documents. This clique is used for inducing information from the relations between the query-similarity values of the cluster's constituent documents. The third clique, l_C, contains only the cluster's constituent documents. It is used to induce information based on query-independent properties of the cluster's documents.

In what follows we describe the feature functions defined over the cliques. In some cases a few feature functions are defined for the same clique, and these are used in the summation in Equation 3. Note that the sum of feature functions is also a feature function. The weights associated with the feature functions are set using a train set of queries. (Details are provided in Section 4.1.)
The l_QD clique. High query similarity exhibited by C's constituent documents can potentially attest to C's relevance [26]. Accordingly, let d (∈ C) be the document in l_QD. We define f_{geo-qsim;l_QD}(l_QD) def= log sim(Q, d)^{1/|C|}, where |C| is the number of documents in C, and sim(·, ·) is some inter-text similarity measure, details of which are provided in Section 4.1. Using this feature function in Equation 3 for all the l_QD cliques of G amounts to using the geometric mean of the query-similarity values of C's constituent documents. All feature functions that we consider use logs so as to have a conjunction semantics for the integration of their assigned values when using Equation 3.¹

¹Before applying the log function we employ add-ε (= 10^{-10}) smoothing.
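A minimal sketch of the geo-qsim feature, assuming a sim(query, doc) callable and the add-ε smoothing mentioned in footnote 1:

```python
import math

EPSILON = 1e-10  # add-epsilon smoothing applied before taking logs (footnote 1)

def geo_qsim(query, cluster, sim):
    """Log of the geometric mean of the query-similarity values of the cluster's documents."""
    return sum(math.log(sim(query, d) + EPSILON) for d in cluster) / len(cluster)
```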
The l_QC clique. Using the l_QD clique from above results in considering the query-similarity values of the cluster's documents independently of each other. In contrast, the l_QC clique provides grounds for utilizing the relations between these similarity values. Specifically, we use the log of the minimal, maximal, and standard deviation² of the {sim(Q, d)}_{d ∈ C} values as feature functions for l_QC, denoted min-qsim, max-qsim, and stdv-qsim, respectively.
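A corresponding sketch of the l_QC feature functions, under the same assumptions; smoothing the standard deviation before taking the log is an added guard, not something specified in the paper:

```python
import math
import statistics

EPSILON = 1e-10  # as in the geo-qsim sketch above

def qc_features(query, cluster, sim):
    """min-qsim, max-qsim, and stdv-qsim feature functions for the l_QC clique."""
    sims = [sim(query, d) + EPSILON for d in cluster]
    return {
        "min-qsim": math.log(min(sims)),
        "max-qsim": math.log(max(sims)),
        "stdv-qsim": math.log(statistics.pstdev(sims) + EPSILON),
    }
```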
The l_C clique. Heretofore, the l_QD and l_QC cliques served for inducing information from the query-similarity values of C's documents. We now consider query-independent properties of C that can potentially attest to its relevance. Doing so amounts to defining feature functions over the l_C clique that contains C's documents but not the query. All the feature functions that we define for l_C are constructed as follows. We first define a query-independent document measure, P, and apply it to document d (∈ C), yielding the value P(d). Then, we use log A({P(d)}_{d ∈ C}) where A is an aggregator function: minimum, maximum, and geometric mean. The resultant feature functions are referred to as min-P, max-P, and geo-P, respectively. We next describe the document measures that serve as the basis for the feature functions.
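A minimal sketch of this min-P/max-P/geo-P construction, assuming the query-independent measure is supplied as a callable:

```python
import math

EPSILON = 1e-10  # as in the earlier sketches

def lc_features(cluster, measure, name):
    """min-P, max-P, and geo-P feature functions for a query-independent measure P."""
    values = [measure(d) + EPSILON for d in cluster]
    geo_mean = math.exp(sum(math.log(v) for v in values) / len(values))
    return {
        f"min-{name}": math.log(min(values)),
        f"max-{name}": math.log(max(values)),
        f"geo-{name}": math.log(geo_mean),
    }
```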
The cluster hypothesis [33] implies that relevant documents should be similar to each other. Accordingly, we measure for document d in C its similarity with all documents in C: P_dsim(d) def= (1/|C|) ∑_{d_i ∈ C} sim(d, d_i).
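A minimal sketch of P_dsim, again assuming a sim(·, ·) callable:

```python
def p_dsim(d, cluster, sim):
    """Mean similarity of document d to all documents in its cluster (P_dsim)."""
    return sum(sim(d, d_i) for d_i in cluster) / len(cluster)
```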
The next few query-independent document measures are based on the following premise. The higher the breadth of content in a document, the higher the probability it is relevant to some query. Thus, a cluster containing documents with broad content should be assigned a relatively high probability of being relevant to some query.

High entropy of the term distribution in a document is a potential indicator for content breadth [17, 3]. This is because the distribution is "spread" over many terms rather than focused on a few. Accordingly, we define P_entropy(d) def= −∑_{w ∈ d} p(w|d) log p(w|d), where w is a term and p(w|d) is the probability assigned to w by an unsmoothed unigram language model (i.e., maximum likelihood estimate) induced from d.
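A sketch of P_entropy over a document given as a list of terms:

```python
import math
from collections import Counter

def p_entropy(terms):
    """Entropy of the maximum-likelihood unigram distribution of a document's terms."""
    counts = Counter(terms)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())
```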
Inspired by work on Web spam classification [9], we use the inverse compression ratio of document d, P_icompress(d), as an additional measure. (Gzip is used for compression.) High compression ratio presumably attests to reduced content breadth [9].
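A sketch of P_icompress; the exact normalization is not spelled out here, so compressed size divided by original size is one plausible reading:

```python
import gzip

def p_icompress(text):
    """Inverse compression ratio of a document: gzip-compressed size over original size."""
    raw = text.encode("utf-8")
    if not raw:
        return 0.0
    return len(gzip.compress(raw)) / len(raw)
```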
Two additional content-breadth measures that were proposed in work on Web retrieval [3] are the ratio between the number of stopwords and non-stopwords in the document, P_sw1(d); and the fraction of stopwords in a stopword list that appear in the document, P_sw2(d). We use INQUERY's stopword list [2]. A document containing many stopwords is presumably of richer language (and hence content) than a document that does not contain many of these; e.g., a document containing a table composed only of keywords [3].
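A sketch of the two stopword-based measures, assuming the stopword list is given as a set; the guard against division by zero is an added assumption:

```python
def p_sw1(terms, stopwords):
    """Ratio of stopwords to non-stopwords in the document (P_sw1)."""
    n_stop = sum(1 for t in terms if t in stopwords)
    n_non = len(terms) - n_stop
    return n_stop / max(n_non, 1)

def p_sw2(terms, stopwords):
    """Fraction of the stopword list that appears in the document (P_sw2)."""
    return len(stopwords & set(terms)) / len(stopwords)
```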
For some of the Web collections used for evaluation in Section 4, we also use the PageRank score [4] of the document, P_pr(d), and the confidence level that the document is not spam, P_spam(d). The details of the spam classifier are provided in Section 4.1.
We note that using the feature functions that result from applying the geometric mean aggregator upon the query-independent document measures just described, except for dsim, could have been described in an alternative way. That is, using log P(d)^{1/|C|} as a feature function over a clique containing a single document; then, using these feature functions in Equation 3 amounts to using the geometric mean.³

²It was recently argued that high variance of the query-similarity values of the cluster's documents might be an indicator for the cluster's relevance, as it presumably attests to a low level of "query drift" [19].

³Similarly, we could have used the geometric mean of the query-similarity values of the cluster's constituent documents as a feature function defined over the l_QC clique rather than constructing it using the l_QD cliques as we did above.
3. RELATED WORK
The work most related to ours is that on devising cluster ranking methods. The standard approach is based on measuring the similarity between a cluster representation and that of the query [7, 34, 35, 16, 24, 25, 26]. Specifically, a geometric-mean-based cluster representation was shown to be highly effective [26, 30, 15]. Indeed, ranking clusters by the geometric mean of the query-similarity values of their constituent documents is a state-of-the-art cluster ranking approach [15]. This approach arises as an integration of feature functions used in ClustMRF, and is shown in Section 4 to substantially underperform ClustMRF.

Clusters were also ranked by the highest query similarity exhibited by their constituent documents [22, 31] and by the variance of these similarities [25, 19]. ClustMRF incorporates these methods as feature functions and is shown to outperform each.

Some cluster ranking methods use inter-cluster and cluster-document similarities [14, 15]. While ClustMRF does not utilize such similarities, it is shown to substantially outperform one such state-of-the-art method [15].

A different use of clusters in past work on cluster-based retrieval is for "smoothing" (enriching) the representation of documents [20, 16, 24, 13]. ClustMRF is shown to substantially outperform one such state-of-the-art method [13].

To the best of our knowledge, our work is the first to use MRFs for cluster ranking. In the context of retrieval tasks, MRFs were first introduced for ranking documents directly [28]. We show that using ClustMRF to produce a document ranking substantially outperforms this retrieval approach, as well as the one that augments the standard MRF retrieval model with query-independent document measures [3]. MRFs were also used, for example, for query expansion, passage-based document retrieval, and weighted concept expansion [27].
4. EVALUATION
4.1 Experimental setup
corpus          # of docs     data                     queries
AP              242,918       Disks 1-3                51-150
ROBUST          528,155       Disks 4-5 (-CR)          301-450, 600-700
WT10G           1,692,096     WT10g                    451-550
GOV2            25,205,179    GOV2                     701-850
ClueA, ClueAF   503,903,810   ClueWeb09 (Category A)   1-150
ClueB, ClueBF   50,220,423    ClueWeb09 (Category B)   1-150

Table 1: Datasets used for experiments.
The TREC datasets specified in Table 1 were used for experiments. AP and ROBUST are small collections, composed mostly of news articles. WT10G and GOV2 are Web

collections; the latter is a crawl of the .gov domain. For the ClueWeb Web collection, both the English part of Category A (ClueA) and the Category B subset (ClueB) were used. ClueAF and ClueBF are two additional experimental settings created from ClueWeb following previous work [6]. Specifically, documents assigned by Waterloo's spam classifier [6] with a score below 70 and 50 for ClueA and ClueB, respectively, were filtered out from the initial corpus ranking described below. The score indicates the percentage of all documents in ClueWeb Category A that are presumably "spammier" than the document at hand. The ranking of the residual corpus was used to create the document list upon which the various methods operate. Waterloo's spam score is also used for the P_spam(·) measure that was described in Section 2.1. The P_spam(·) and P_pr(·) (PageRank score) measures are used only for the ClueWeb-based settings as these information types are not available for the other settings.

The titles of TREC topics served as queries. All data was stemmed using the Krovetz stemmer. Stopwords on the INQUERY list were removed from queries but not from documents. The Indri toolkit (www.lemurproject.org/indri) was used for experiments.
Initial retrieval and clustering. As described in Section 2, we use the ClustMRF cluster ranking method to re-rank an initially retrieved document list D_init. Recall that after ClustMRF ranks the clusters created from D_init, these are "replaced" by their constituent documents while omitting repeats. Documents within a cluster are ranked by their query similarity, the measure of which is detailed below. This cluster-based re-ranking approach is employed by all the reference comparison methods that we use and that rely on cluster ranking. Furthermore, ClustMRF and all reference comparison approaches re-rank a list D_init that is composed of the 50 documents that are the most highly ranked by some retrieval method specified below. D_init is relatively short following recommendations in previous work on cluster-based re-ranking [18, 25, 26, 13]. In Section 4.2.7 we study the effect of varying the list size on the performance of ClustMRF and the reference comparisons.
We let all methods re-rank three different initial lists D_init. The first, denoted MRF, is used unless otherwise specified. This list contains the documents in the corpus that are the most highly ranked in response to the query when using the state-of-the-art Markov Random Field approach with the sequential dependence model (SDM) [28]. The free parameters that control the use of term proximity information in SDM, λ_T, λ_O, and λ_U, are set to 0.85, 0.1, and 0.05, respectively, following previous recommendations [28]. We also use MRF's SDM with its free parameters set using cross validation as one of the re-ranking reference comparisons. (Details provided below.) All methods operating on the MRF initial list use the exponent of the document score assigned by SDM (which is a rank-equivalent estimate to that of log p(Q, d)) as sim_MRF(Q, d), the document-query similarity measure. This measure was used to induce the initial ranking using which D_init was created. More generally, for a fair performance comparison we maintain in all the experiments the invariant that the scoring function used to create an initially retrieved list is rank equivalent to the document-query similarity measure used in methods operating on the list. Furthermore, the document-query similarity measure is used in all methods that are based on cluster ranking (including ClustMRF) to order documents within the clusters.
The second initial list used for re-ranking, DocMRF (discussed in Section 4.2.4), is created by enriching MRF's SDM with query-independent document measures [3].

The third initial list, LM, is addressed in Section 4.2.5. The list is created using unigram language models. In contrast, the MRF and DocMRF lists were created using retrieval methods that use term proximity information. Let p_z^{Dir[µ]}(·) be the Dirichlet-smoothed unigram language model induced from text z; µ is the smoothing parameter. The LM similarity between texts x and y is sim_LM(x, y) def= exp(−CE(p_x^{Dir[0]}(·) || p_y^{Dir[µ]}(·))) [37, 17], where CE is the cross entropy measure; µ is set to 1000.⁴ Accordingly, the LM initial list is created by using sim_LM(Q, d) to rank the entire corpus.⁵ This measure serves as the document-query similarity measure for all methods operating over the LM list, and for the inter-document similarity measure used by the dsim feature function.
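A minimal sketch of sim_LM, assuming term-count dictionaries for the two texts and a collection_prob(term) function supplying the collection language model used for Dirichlet smoothing:

```python
import math

def sim_lm(x_terms, y_terms, collection_prob, mu=1000.0):
    """sim_LM(x, y) = exp(-CE(p_x^{Dir[0]} || p_y^{Dir[mu]})), sketched over term counts."""
    x_total = sum(x_terms.values())
    y_total = sum(y_terms.values())
    ce = 0.0
    for w, count in x_terms.items():
        p_x = count / x_total  # unsmoothed (Dir[0]) model of x
        p_y = (y_terms.get(w, 0) + mu * collection_prob(w)) / (y_total + mu)  # Dirichlet-smoothed model of y
        ce += -p_x * math.log(p_y)
    return math.exp(-ce)
```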
Unless otherwise stated, to cluster any of the three initial lists D_init, we use a simple nearest-neighbor clustering approach [18, 25, 14, 26, 13, 15]. For each document d (∈ D_init), a cluster is created from d and the k−1 documents d_i in D_init (d_i ≠ d) with the highest sim_LM(d, d_i); k is set to a value in {5, 10, 20} using cross validation as described below. Using such small overlapping clusters (all of which contain k documents) was shown to be highly effective for cluster-based document retrieval [18, 25, 14, 26, 13, 15]. In Section 4.2.6 we also study the performance of ClustMRF when using hierarchical agglomerative clustering.
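A minimal sketch of this nearest-neighbor clustering, assuming sim_LM is available as a callable:

```python
def nearest_neighbor_clusters(d_init, sim_lm, k):
    """One overlapping cluster per document: the document plus its k-1 nearest neighbors."""
    clusters = []
    for d in d_init:
        neighbors = sorted((x for x in d_init if x != d),
                           key=lambda x: sim_lm(d, x), reverse=True)[:k - 1]
        clusters.append([d] + neighbors)
    return clusters
```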
Evaluation metrics and free parameters. We use MAP (computed at cutoff 50, the size of the list D_init that is re-ranked), the precision of the top 5 documents (p@5), and their NDCG (NDCG@5) as evaluation measures.⁶ The free parameters of our ClustMRF method, as well as those of all reference comparison methods, are set using 10-fold cross validation performed over the queries in an experimental setting. Query IDs are the basis for creating the folds. The two-tailed paired t-test with p ≤ 0.05 was used for testing statistical significance of performance differences.

For our ClustMRF method, the free-parameter values are set in two steps. First, SVM^rank [12] is used to learn the values of the λ_l weights associated with the feature functions. The NDCG@k of the k constituent documents of a cluster serves as the cluster score used for ranking clusters in the learning phase.⁷ (Recall from above that documents in a cluster are ordered based on their query similarity.)
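A minimal sketch of preparing such training data in SVM^rank's input format; the helper names and the grouping of clusters per query are illustrative assumptions:

```python
def build_svmrank_training_data(queries, clusters_per_query, feature_fn, ndcg_at_k):
    """Each cluster becomes one training example whose target is the NDCG@k of its
    k constituent documents (higher target = cluster should be ranked higher)."""
    lines = []
    for qid, query in enumerate(queries, start=1):
        for cluster in clusters_per_query[query]:
            target = ndcg_at_k(cluster, query)      # cluster "relevance" label
            features = feature_fn(cluster, query)   # {feature_name: value}
            feats = " ".join(f"{i}:{v}" for i, (_, v) in
                             enumerate(sorted(features.items()), start=1))
            lines.append(f"{target} qid:{qid} {feats}")
    return lines
```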
⁴The MRF SDM used above also uses Dirichlet-smoothed unigram language models with µ = 1000.

⁵Queries for which there was not a single relevant document in the MRF or LM initial lists were removed from the evaluation. For the ClueWeb settings, the same query set was used for ClueX and ClueXF.

⁶We note that statAP, rather than AP, was the official TREC evaluation metric in 2009 for ClueWeb with queries 1-50. For consistency with the other queries for ClueWeb, and following previous work [3], we use AP for all ClueWeb queries by treating prel files as qrel files. We hasten to point out that evaluation using statAP for the ClueWeb collections with queries 1-50 yielded relative performance patterns that are highly similar to those attained when using AP.

⁷Using MAP@k as the cluster score resulted in slightly less effective performance. We also note that learning-to-rank methods [23] other than SVM^rank, which proved to result in highly effective performance as shown below, can also be used for setting the values of the λ_l weights.

                   Init    TunedMRF   ClustMRF
AP       MAP       10.1     9.9        10.8
         p@5       50.7    48.7        53.0
         NDCG@5    50.6    49.4        54.4^t
ROBUST   MAP       19.9    20.0        21.0^it
         p@5       51.0    51.0        52.4
         NDCG@5    52.5    52.7        54.7
WT10G    MAP       15.8    15.4        18.0^it
         p@5       37.5    36.9        44.9^it
         NDCG@5    37.2    35.3^i      42.8^it
GOV2     MAP       12.7    12.7        14.2^it
         p@5       59.3    60.8        70.1^it
         NDCG@5    48.6    49.5        56.2^it
ClueA    MAP        4.5     4.9^i       6.3^it
         p@5       19.1    21.1        44.6^it
         NDCG@5    12.6    15.6^i      29.4^it
ClueAF   MAP        8.6     8.7         8.9
         p@5       46.3    47.8        50.2
         NDCG@5    32.4    33.1        33.9
ClueB    MAP       12.5    13.5^i      16.1^it
         p@5       33.1    35.5        48.7^it
         NDCG@5    24.4    27.0        37.4^it
ClueBF   MAP       15.8    16.3^i      17.0
         p@5       44.8    46.8        48.5
         NDCG@5    33.2    34.3        36.9

Table 2: The performance of ClustMRF and a tuned MRF (TunedMRF) when re-ranking the MRF initial list (Init). Boldface in the original marks the best result in a row. 'i' and 't' mark statistically significant differences with Init and TunedMRF, respectively.
A ranking of documents in D_init is created from the cluster ranking, which is performed for each cluster size k (∈ {5, 10, 20}), using the approach described above; k is then also set using cross validation by optimizing the MAP performance of the resulting document ranking. The train/test splits for the first and second steps are the same; i.e., the same train set used for learning the λ_l's is the one used for setting the cluster size. As is the case for ClustMRF, the final document ranking induced by any reference comparison method is based on using cross validation to set free-parameter values; and, MAP serves as the optimization criterion in the training (learning) phase.

Finally, we note that the main computational overhead, on top of the initial ranking, incurred by using ClustMRF is the clustering. That is, the feature functions used are either query-independent, and therefore can be computed offline; or use mainly document-query similarity values that have already been computed to create the initial ranking. Clustering of a few dozen documents can be computed efficiently; e.g., based on document snippets.
4.2 Experimental results
4.2.1 Main result
Table 2 presents our main result; namely, the performance of ClustMRF when used to re-rank the MRF initial list. Recall that the initial ranking was induced using MRF's SDM with free-parameter values set following previous recommendations [28]. Thus, we also present for reference the re-ranking performance of using MRF's SDM with its three free parameters set using cross validation, as is the case for the free parameters of ClustMRF; TunedMRF denotes this method.
                   ClustMRF   stdv-qsim   max-sw2   geo-qsim   min-sw2
AP       MAP        10.8       9.4         9.7       10.6       9.6
         p@5        53.0      43.7^c      44.6^c     50.9      49.1
         NDCG@5     54.4      45.0^c      45.8^c     52.0      50.4
ROBUST   MAP        21.0      19.0^c      17.7^c     20.6      16.8^c
         p@5        52.4      50.7        46.9^c     50.4      44.7^c
         NDCG@5     54.7      52.4        49.1^c     52.4      45.9^c
WT10G    MAP        18.0      15.4^c      12.2^c     16.3^c    14.2^c
         p@5        44.9      38.4^c      31.7^c     39.3^c    33.9^c
         NDCG@5     42.8      37.8^c      28.6^c     39.0^c    32.4^c
GOV2     MAP        14.2      12.7^c      12.9^c     13.2^c    14.2
         p@5        70.1      59.3^c      62.3^c     58.0^c    66.3
         NDCG@5     56.2      48.2^c      48.8^c     46.6^c    52.3

                   ClustMRF   max-sw2     max-sw1   max-qsim   geo-qsim
ClueA    MAP         6.3       5.4^c       5.3^c      4.5^c     4.8^c
         p@5        44.6      28.7^c      29.3^c     18.7^c    20.9^c
         NDCG@5     29.4      20.3^c      20.5^c     12.4^c    14.0^c
ClueAF   MAP         8.9       8.6         7.8^c      8.3       8.6
         p@5        50.2      47.2        40.4^c     49.3      48.7
         NDCG@5     33.9      32.5        28.9^c     34.3      33.9
ClueB    MAP        16.1      14.2^c      15.4       12.8^c    12.9^c
         p@5        48.7      41.9^c      42.9^c     33.9^c    34.2^c
         NDCG@5     37.4      30.1^c      32.5^c     25.5^c    25.6^c
ClueBF   MAP        17.0      16.3        15.7^c     14.8^c    15.9
         p@5        48.5      45.0        42.3^c     42.9^c    43.2
         NDCG@5     36.9      35.5        32.8       32.8      33.6

Table 3: Using each of ClustMRF's top-4 feature functions by itself for ranking the clusters so as to re-rank the MRF initial list. Boldface in the original marks the best performance per row. 'c' marks a statistically significant difference with ClustMRF.
We found that using exhaustive search for finding SDM's optimal parameter values in the training phase yields better performance (on the test set) than using SVM^rank [12] and SVM^map [36]. Specifically, λ_T, λ_O, and λ_U were set to values in {0, 0.05, ..., 1} with λ_T + λ_O + λ_U = 1.

We first see in Table 2 that while TunedMRF outperforms the initial MRF ranking in most relevant comparisons (experimental setting × evaluation measure), there are cases (e.g., for AP and WT10G) for which the reverse holds. The latter finding implies that optimal free-parameter values of MRF's SDM do not necessarily generalize across queries. More importantly, we see in Table 2 that ClustMRF outperforms both the initial ranking and TunedMRF in all relevant comparisons. Many of the improvements are substantial and statistically significant. These findings attest to the high effectiveness of using ClustMRF for re-ranking.
4.2.2 Analysis of feature functions
We now turn to analyze the relative importance attributed to the different feature functions used in ClustMRF; i.e., the λ_l weights assigned to these functions in the training phase by SVM^rank. We first average, per experimental setting and cluster size, the weights assigned to a feature function using the different training folds. Then, the feature function is assigned a score that is the reciprocal rank of its corresponding (average) weight. Finally, the feature functions are ordered by averaging their scores across experimental settings and cluster sizes. Two feature functions, pr and spam, are only used for the ClueWeb-based settings. Hence, we perform the analysis separately for the ClueWeb and non-ClueWeb (AP, ROBUST, WT10G, and GOV2) settings.
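A minimal sketch of this reciprocal-rank aggregation, assuming the per-setting average weights are supplied as dictionaries:

```python
from collections import defaultdict

def feature_importance(avg_weights_per_setting):
    """Order feature functions by the average reciprocal rank of their (averaged) weights.
    avg_weights_per_setting: list of {feature_name: average weight} dicts, one per
    experimental setting and cluster size."""
    scores = defaultdict(list)
    for weights in avg_weights_per_setting:
        ranked = sorted(weights, key=lambda f: weights[f], reverse=True)
        for rank, feature in enumerate(ranked, start=1):
            scores[feature].append(1.0 / rank)
    return sorted(scores, key=lambda f: sum(scores[f]) / len(scores[f]), reverse=True)
```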

Frequently Asked Questions (11)
Q1. What are the contributions mentioned in the paper "Ranking document clusters using markov random fields" ?

The authors present a novel cluster ranking approach that utilizes Markov Random Fields (MRFs). The authors use their method to re-rank an initially retrieved document list by ranking clusters that are created from the documents most highly ranked in the list. Furthermore, their cluster ranking approach significantly outperforms state-of-the-art cluster ranking methods. The authors also show that their method can be used to improve the performance of (state-of-the-art) results-diversification methods.

The free parameters that control the use of term proximity information in SDM, λ_T, λ_O, and λ_U, are set to 0.85, 0.1, and 0.05, respectively, following previous recommendations [28].

The second initial list used for re-ranking, DocMRF (discussed in Section 4.2.4), is created by enriching MRF’s SDM with query-independent document measures [3]. 

For the ClueWeb settings, the feature functions defined over the l_C clique and which are based on query-independent document measures (e.g., max-sw1, max-sw2, max-spam) are attributed with high importance.

the authors maintain the invariant mentioned above that the scoring function used to induce the ranking upon which ClustMRF operates is rank equivalent to the document-query similarity measure used in ClustMRF. 

each of the three types of cliques used in Section 2.1 for defining the MRF has at least one associated feature function that is assigned with a relatively high weight. 

the authors define P_entropy(d) def= −∑_{w ∈ d} p(w|d) log p(w|d), where w is a term and p(w|d) is the probability assigned to w by an unsmoothed unigram language model (i.e., maximum likelihood estimate) induced from d. Inspired by work on Web spam classification [9], the authors use the inverse compression ratio of document d, P_icompress(d), as an additional measure.

More generally, the best performance for each diversification method (MMR and xQuAD) is almost always attained by ClustMRF, which often outperforms the other methods in a substantial and statistically significant manner. 

ClustMRF and all reference comparison approaches re-rank a list D_init that is composed of the 50 documents that are the most highly ranked by some retrieval method specified below.

The LM similarity between texts x and y is sim_LM(x, y) def= exp(−CE(p_x^{Dir[0]}(·) || p_y^{Dir[µ]}(·))) [37, 17], where CE is the cross entropy measure; µ is set to 1000.

The graph out-degree and the damping factor used by CRank are set to values in {4, 9, 19, 29, 39, 49} and {0.05, 0.1, ..., 0.9, 0.95}, respectively.