# Top-K nearest keyword search on large graphs

01 Aug 2013-Vol. 6, Iss: 10, pp 901-912

Abstract: It is quite common for networks emerging nowadays to have labels or textual contents on the nodes. On such networks, we study the problem of top-k nearest keyword (k-NK) search. In a network G modeled as an undirected graph, each node is attached with zero or more keywords, and each edge is assigned with a weight measuring its length. Given a query node q in G and a keyword λ, a k-NK query seeks k nodes which contain λ and are nearest to q. k-NK is not only useful as a stand-alone query but also as a building block for tackling complex graph pattern matching problems. The key to an accurate k-NK result is a precise shortest distance estimation in a graph. Based on the latest distance oracle technique, we build a shortest path tree for a distance oracle and use the tree distance as a more accurate estimation. With such representation, the original k-NK query on a graph can be reduced to answering the query on a set of trees and then assembling the results obtained from the trees. We propose two efficient algorithms to report the exact k-NK result on a tree. One is query time optimized for a scenario when a small number of result nodes are of interest to users. The other handles k-NK queries for an arbitrarily large k efficiently. In obtaining a k-NK result on a graph from that on trees, a global storage technique is proposed to further reduce the index size and the query time. Extensive experimental results conform with our theoretical findings, and demonstrate the effectiveness and efficiency of our k-NK algorithms on large real graphs.
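To make the query definition concrete, the following minimal sketch answers a k-NK query by running Dijkstra's algorithm from q and stopping once k nodes containing λ have been settled. This is a brute-force baseline for illustration only, not the distance-oracle and tree-based method summarized above. The function name `knk_baseline`, the adjacency-list layout `adj`, the `keywords` mapping, and the toy graph are all assumptions introduced here.

```python
import heapq

def knk_baseline(adj, keywords, q, lam, k):
    """Baseline k-NK query: up to k (distance, node) pairs, nearest first.

    adj      -- hypothetical layout: node -> list of (neighbor, edge_weight)
    keywords -- hypothetical layout: node -> set of keywords on that node
    """
    dist = {q: 0.0}          # best known distance from q
    heap = [(0.0, q)]        # min-heap ordered by tentative distance
    settled = set()
    result = []
    while heap and len(result) < k:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue         # stale heap entry
        settled.add(u)
        # Nodes are settled in non-decreasing distance order, so the
        # first k settled nodes containing lam form the k-NK answer.
        if lam in keywords.get(u, ()):
            result.append((d, u))
        for v, w in adj.get(u, ()):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return result

# Toy undirected graph (each edge listed in both directions).
adj = {
    "a": [("b", 1.0), ("c", 4.0)],
    "b": [("a", 1.0), ("c", 2.0), ("d", 5.0)],
    "c": [("a", 4.0), ("b", 2.0), ("d", 1.0)],
    "d": [("b", 5.0), ("c", 1.0)],
}
keywords = {"a": set(), "b": {"hospital"}, "c": {"cafe"}, "d": {"hospital", "cafe"}}

print(knk_baseline(adj, keywords, "a", "hospital", 2))
```

In this sketch the query node itself is reported at distance 0 if it contains λ. In the worst case the search explores the whole graph, which is the cost that an index-based approach aims to avoid.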
1. INTRODUCTION
Many real-world networks emerging nowadays have labels or
textual contents on the nodes. For example, in a road network, a
location may have labels such as “McDonald’s”, “hospital”, and
“kindergarten”. In a social network, a person may have information
including name, interests, and skills. In a bibliographic
network, a paper may have keywords and an abstract, and an author
may have a name, affiliation, and email address. In this study, we
consider the problem of top-k nearest keyword (k-NK) search on
large networks. In a network G modeled as an undirected graph,
each node is attached with zero or more keywords, and each edge
is assigned with a weight measuring its length. Given a query node
