# Top-K nearest keyword search on large graphs

...Index Terms—Single-pair shortest path, KNN search, keyword search, road network, index, spatial databases...

...3.3 G-tree Construction. In this section, we present how to construct the G-tree....

...Such queries, known as spatial keyword queries, which find the top-k objects of interest in terms of both spatial proximity and textual relevance to the query, have been extensively studied in recent years [6][13][15][20][21][25][26][27]....

...Columns: Technique; Boolean keyword; Continuous query; Unknown path; Static data objects; Road network; Safe region. Rows: ROAD [13], G-tree [27], SP-tree [20], FBS [11] – – OA-kNN [21] – – YPK-CNN [24], CPM [16], GMA [17] CkNN [22] UNICONS [5] V∗-Diagram [18], MkSK [23], INS [14] LARC...

...SP-tree [20] deals with the problem of keyword search on large graphs by introducing a shortest path tree, thus the network distances between results and query are approximated by tree distances....
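
To illustrate why a tree distance can only approximate a network distance, the following is a minimal sketch, not the SP-tree index of [20]. It builds a shortest path tree with Dijkstra's algorithm and measures the distance between two vertices along tree edges only. The toy graph and the function names `shortest_path_tree` and `tree_distance` are assumptions made for illustration.

```python
import heapq


def shortest_path_tree(graph, root):
    """Dijkstra from root on {u: [(v, weight), ...]}; returns (dist, parent)."""
    dist = {root: 0}
    parent = {root: None}
    heap = [(0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry, a shorter path was already found
        for v, w in graph[u]:
            nd = d + w
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, parent


def tree_distance(dist, parent, u, v):
    """Length of the u-v path that uses tree edges only."""
    ancestors = set()
    x = u
    while x is not None:  # collect u and all of its ancestors
        ancestors.add(x)
        x = parent[x]
    x = v
    while x not in ancestors:  # climb from v to the lowest common ancestor
        x = parent[x]
    return dist[u] + dist[v] - 2 * dist[x]


# Hypothetical undirected triangle with unit edge weights.
graph = {
    "r": [("a", 1), ("b", 1)],
    "a": [("r", 1), ("b", 1)],
    "b": [("r", 1), ("a", 1)],
}
dist, parent = shortest_path_tree(graph, "r")
approx_ab = tree_distance(dist, parent, "a", "b")
```

In this toy graph the edge (a, b) is not part of the tree rooted at r, so the tree distance between a and b is 2 while the network distance is 1. On an undirected graph the tree path is always a valid path, so the tree distance never underestimates the network distance.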

...(2) Both methods [4, 26] assume that the index can reside in main memory....

...Given a graph G = (V, E) with vertex set V and edge set E, the algorithm in [4] incurs a (2 log₂ |V| − 1) approximation factor, which can be quite large when |V| is large; as shown in [26], the resulting error is significant in empirical studies on real graphs, and good solutions can be missed....
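
To give a sense of scale for the bound quoted above, the following small sketch evaluates 2 log₂ |V| − 1 for a few graph sizes. The function name `approximation_factor` and the chosen sizes are illustrative assumptions, not values from the cited works.

```python
import math


def approximation_factor(num_vertices):
    """Evaluate the bound 2 * log2(|V|) - 1 quoted for the algorithm in [4]."""
    return 2 * math.log2(num_vertices) - 1


# Hypothetical graph sizes: about a thousand, a million, and a billion vertices.
factors = {n: approximation_factor(n) for n in (2**10, 2**20, 2**30)}
for n, factor in factors.items():
    print(f"|V| = {n}: factor = {factor}")
```

Under this bound, a graph with about a million vertices already admits answers up to 39 times worse than the optimum.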

...The authors of [26] point out that the error introduced by the star summary in [4] can be large....

...Both PMI and pivot-gs were implemented by the authors of [26]....

...As pointed out in [4] and [26], some keyword queries in a network are generated from a vertex inside the network with an interest of looking for vertices in a near-vicinity of the network....

...The social distance is usually modeled as the shortest distance on the social graph [9], [10], [6], [5], [7]....

...(1) Social Relevance: The social distance for two vertices v ↔ v′ is adopted as the shortest distance [9], [10], [6], [5]....

...[29] studied the top-k nearest keyword (k-NK) query over a graph....

...Therefore, the problems studied in [28], [29] are different from that in this paper, and their proposed techniques cannot be directly used for solving our problem in this paper....

...Some works study variants of graph keyword search [28], [29]....

...We use six metrics for evaluation: hit rate, Spearman’s rho [21], error, query time, index time, and index size....
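
As a reminder of what Spearman's rho measures, here is a minimal sketch of the textbook rank-correlation formula, assuming no tied values. How the excerpted paper applies the metric (for example, which rankings it compares) is not stated here, so the inputs below are hypothetical.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x, y):
    """Spearman's rho via 1 - 6 * sum(d^2) / (n * (n^2 - 1)); assumes no ties."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = ranks(x), ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical scores: an exact ranking versus an approximate one.
exact = [0.1, 0.5, 0.9, 1.4]
approx = [0.2, 0.9, 0.6, 1.5]
rho = spearman_rho(exact, approx)
```

A value of 1 means the two rankings agree completely, and −1 means one is the reverse of the other.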

...The answer substructure can be a tree [12, 3, 13, 8, 10, 9], a subgraph [16, 17] or an r-clique [14]....

...k Interval [1, 1] [2, 3] [4, 5] [6, 6] [7, 8]...

...[2, 6] is processed recursively by invoking partition(EEP(λ), [2, 6], (r, a), CT(λ)), and [7, 20] is processed by the other two child nodes c and b similarly....

...16 17 19 Interval [1,2] 3 [4,5] 6 [7,10]...

...We first process edge (r, a) with interval [2, 6], which divides the interval [1, 20] into three parts: [1, 1], [2, 6], and [7, 20]....
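
The interval split described above can be sketched as follows. This covers only the arithmetic of carving an inner interval out of an enclosing one, not the full partition procedure of the source; the function name `split_interval` is a hypothetical helper.

```python
def split_interval(outer, inner):
    """Split the closed integer interval `outer` around `inner`.

    Returns the non-empty parts in order: left remainder, inner, right remainder.
    """
    lo, hi = outer
    a, b = inner
    if not (lo <= a <= b <= hi):
        raise ValueError("inner interval must lie within the outer interval")
    parts = []
    if lo < a:
        parts.append((lo, a - 1))  # part strictly before the inner interval
    parts.append((a, b))
    if b < hi:
        parts.append((b + 1, hi))  # part strictly after the inner interval
    return parts


parts = split_interval((1, 20), (2, 6))
print(parts)  # [(1, 1), (2, 6), (7, 20)]
```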

...Using the techniques in [2], LCA(u, v) can be found in O(1) time using O(|V|) index space....
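
For intuition about constant-time LCA queries, the following sketch uses a simpler, well-known alternative: an Euler tour combined with a sparse table for range-minimum queries. It answers each query in O(1) time but needs O(|V| log |V|) index space, so it is not the O(|V|)-space technique of [2]. The tree and the function name `build_lca` are assumptions for illustration.

```python
def build_lca(children, root):
    """Preprocess a rooted tree {node: [child, ...]}; returns an lca(u, v) function."""
    euler, depth, first = [root], [0], {root: 0}
    stack = [(root, 0, iter(children.get(root, [])))]
    while stack:  # iterative DFS that records the Euler tour
        node, d, it = stack[-1]
        child = next(it, None)
        if child is None:
            stack.pop()
            if stack:  # returning to the parent adds it to the tour again
                euler.append(stack[-1][0])
                depth.append(stack[-1][1])
        else:
            first[child] = len(euler)
            euler.append(child)
            depth.append(d + 1)
            stack.append((child, d + 1, iter(children.get(child, []))))

    # table[j][i] = tour position of minimum depth within [i, i + 2^j - 1]
    m = len(euler)
    table = [list(range(m))]
    j = 1
    while (1 << j) <= m:
        prev, half = table[-1], 1 << (j - 1)
        row = []
        for i in range(m - (1 << j) + 1):
            a, b = prev[i], prev[i + half]
            row.append(a if depth[a] <= depth[b] else b)
        table.append(row)
        j += 1

    def lca(u, v):
        left, right = sorted((first[u], first[v]))
        k = (right - left + 1).bit_length() - 1
        a, b = table[k][left], table[k][right - (1 << k) + 1]
        return euler[a if depth[a] <= depth[b] else b]

    return lca


# Hypothetical tree: r has children a and b; a has c and d; b has e.
tree = {"r": ["a", "b"], "a": ["c", "d"], "b": ["e"]}
lca = build_lca(tree, "r")
```

Each query inspects two precomputed table entries, so its cost does not depend on the size of the tree.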

...For the node h, its interval is [10, 18] because the preorder of h on T is 10 and the maximum preorder among all nodes in the subtree rooted at h is 18....
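
The preorder interval described above can be computed in a single depth-first traversal. The sketch below is a minimal illustration on a hypothetical six-node tree, not the tree T of the source; the names `preorder_intervals` and `in_subtree` are assumptions.

```python
def preorder_intervals(children, root):
    """Return {node: (pre, max_pre)}: the node's preorder number (from 1) and
    the largest preorder number found in the subtree rooted at that node."""
    intervals, start = {}, {}
    counter = 0
    stack = [(root, False)]
    while stack:
        node, finished = stack.pop()
        if finished:
            # Every node numbered since `node` was visited lies in its subtree.
            intervals[node] = (start[node], counter)
            continue
        counter += 1
        start[node] = counter
        stack.append((node, True))
        for child in reversed(children.get(node, [])):
            stack.append((child, False))
    return intervals


def in_subtree(intervals, u, h):
    """True if u lies in the subtree rooted at h."""
    lo, hi = intervals[h]
    return lo <= intervals[u][0] <= hi


# Hypothetical tree: r has children a and b; a has c and d; b has e.
tree = {"r": ["a", "b"], "a": ["c", "d"], "b": ["e"]}
intervals = preorder_intervals(tree, "r")
```

With these intervals, testing whether one node lies in the subtree of another reduces to two integer comparisons.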

