Topic
SimRank
About: SimRank is a research topic. Over its lifetime, 250 publications have been published within this topic, receiving 21,163 citations.
Papers
••
02 Nov 2009
TL;DR: MatchSim is a novel neighbor-based similarity measure that uses only the neighborhood structure of web pages. It overcomes a severe counterintuitive loophole in SimRank thanks to its strict consistency with the intuitions of similarity.
Abstract: The problem of measuring similarity between web pages arises in many important Web applications, such as search engines and Web directories. In this paper, we propose a novel neighbor-based similarity measure called MatchSim, which uses only the neighborhood structure of web pages. Technically, MatchSim recursively defines similarity between web pages by the average similarity of the maximum matching between their neighbors. Our method extends the traditional methods which simply count the numbers of common and/or different neighbors. It also successfully overcomes a severe counterintuitive loophole in SimRank, due to its strict consistency with the intuitions of similarity. We give the computational complexity of MatchSim iteration. The accuracy of MatchSim is compared against others on two real datasets. The results show that our method performs best in most cases.
51 citations
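The recursive definition in the MatchSim abstract (similarity of two pages as the average similarity of the maximum matching between their neighbors) can be sketched as follows. This is a minimal illustration rather than the authors' implementation. It assumes that "neighbors" means in-link neighbors, that the matching weight is normalized by the size of the larger neighbor set, and that pages with no neighbors score 0 against every other page. The names `matchsim`, `max_matching_weight`, and `in_neighbors` are hypothetical, and the brute-force matching is only practical for very small neighbor sets.

```python
from itertools import permutations


def max_matching_weight(left, right, sim):
    """Brute-force maximum-weight bipartite matching between two small node lists."""
    if len(left) > len(right):
        left, right = right, left  # always enumerate assignments of the shorter side
    best = 0.0
    for perm in permutations(right, len(left)):
        weight = sum(sim[(u, v)] for u, v in zip(left, perm))
        best = max(best, weight)
    return best


def matchsim(in_neighbors, iterations=5):
    """Iterate a MatchSim-style recurrence on a graph given as node -> list of in-neighbors."""
    nodes = sorted(in_neighbors)
    # Start from the identity: a page is fully similar to itself only.
    sim = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(iterations):
        new = {}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new[(a, b)] = 1.0
                    continue
                na, nb = in_neighbors[a], in_neighbors[b]
                if not na or not nb:
                    new[(a, b)] = 0.0  # assumption: no neighbors means no evidence
                    continue
                # Assumption: normalize by the larger neighbor set.
                new[(a, b)] = max_matching_weight(na, nb, sim) / max(len(na), len(nb))
        sim = new
    return sim


# Toy graph: pages "a" and "b" share both in-neighbors; "c" shares only one with them.
pages = {"p": [], "q": [], "a": ["p", "q"], "b": ["p", "q"], "c": ["p"]}
scores = matchsim(pages)
print(scores[("a", "b")], scores[("a", "c")])
```

On this toy graph the two pages with identical neighbor sets score 1.0, and the page sharing one of two neighbors scores 0.5.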
••
19 May 2014
TL;DR: A novel fast incremental algorithm that computes the similarities of n² node pairs in O(Kn²) time for K iterations; it outperforms the best known link-update algorithm and runs much faster than its batch counterpart when link updates are small.
Abstract: SimRank is an arresting measure of node-pair similarity based on hyperlinks. It iteratively follows the concept that two nodes are similar if they are referenced by similar nodes. Real graphs are often large, and links constantly evolve with small changes over time. This paper considers fast incremental computations of SimRank on link-evolving graphs. The prior approach [12] to this issue factorizes the graph via a singular value decomposition (SVD) first, and then incrementally maintains this factorization for link updates at the expense of exactness. Consequently, all node-pair similarities are estimated in O(r⁴n²) time on a graph of n nodes, where r is the target rank of the low-rank approximation, which is not negligibly small in practice. In this paper, we propose a novel fast incremental paradigm. (1) We characterize the SimRank update matrix ΔS, in response to every link update, via a rank-one Sylvester matrix equation. By virtue of this, we devise a fast incremental algorithm computing similarities of n² node-pairs in O(Kn²) time for K iterations. (2) We also propose an effective pruning technique capturing the “affected areas” of ΔS to skip unnecessary computations, without loss of exactness. This can further accelerate the incremental SimRank computation to O(K(nd+|AFF|)) time, where d is the average in-degree of the old graph, and |AFF| (≤ n²) is the size of “affected areas” in ΔS, and in practice, |AFF| ≪ n². Our empirical evaluations verify that our algorithm (a) outperforms the best known link-update algorithm [12], and (b) runs much faster than its batch counterpart when link updates are small.
49 citations
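For context, the batch computation that incremental methods are compared against can be sketched with the standard SimRank recurrence: two distinct nodes are scored by the decayed average similarity of their in-neighbor pairs, and every node has similarity 1 with itself. This is a naive baseline, not the incremental algorithm from the abstract above. The decay factor of 0.8, the iteration count, and the names `simrank` and `in_neighbors` are assumptions made for illustration.

```python
def simrank(in_neighbors, decay=0.8, iterations=10):
    """Naive batch SimRank on a graph given as node -> list of in-neighbors."""
    nodes = sorted(in_neighbors)
    # Iteration 0: each node is similar only to itself.
    sim = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(iterations):
        new = {}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new[(a, b)] = 1.0
                    continue
                na, nb = in_neighbors[a], in_neighbors[b]
                if not na or not nb:
                    new[(a, b)] = 0.0  # a node nobody references has no evidence
                    continue
                # Average similarity over all in-neighbor pairs, scaled by the decay.
                total = sum(sim[(i, j)] for i in na for j in nb)
                new[(a, b)] = decay * total / (len(na) * len(nb))
        sim = new
    return sim


# Toy graph: "a" and "b" are both referenced by the same single node "u".
graph = {"u": [], "a": ["u"], "b": ["u"]}
scores = simrank(graph)
print(scores[("a", "b")])
```

Because this version recomputes every node pair from scratch in every iteration, a single link update forces a full rerun, which is the cost that incremental approaches aim to avoid.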
••
TL;DR: This paper proposes a novel Mashup service clustering approach based on structural similarity and a genetic-algorithm-based clustering algorithm. It clusters Mashup services efficiently without any constraint on the number of clusters, and it performs better than other Mashup service clustering approaches based on semantic metrics.
48 citations
••
23 May 2006
TL;DR: This paper introduces a novel link-based similarity measure, called PageSim, which can measure similarity between any two web pages, whereas SimRank cannot in some cases.
Abstract: To find web pages similar to a query page on the Web, this paper introduces a novel link-based similarity measure, called PageSim. In contrast to SimRank, a recursive refinement of cocitation, PageSim can measure similarity between any two web pages, whereas SimRank cannot in some cases. We give some intuitions for the PageSim model, and outline the model with mathematical definitions. Finally, we give an example to illustrate its effectiveness.
45 citations
••
01 May 2017
TL;DR: This paper proposes a random-walk-based indexing scheme to compute SimRank efficiently and accurately over large dynamic graphs, and shows that the algorithm outperforms the state-of-the-art static and dynamic SimRank algorithms.
Abstract: Similarity among entities in graphs plays a key role in data analysis and mining. SimRank is a widely used and popular measurement to evaluate the similarity among the vertices. In real-life applications, graphs do not only grow in size, requiring fast and precise SimRank computation for large graphs, but also change and evolve continuously over time, demanding an efficient maintenance process to handle dynamic updates. In this paper, we propose a random walk based indexing scheme to compute SimRank efficiently and accurately over large dynamic graphs. We show that our algorithm outperforms the state-of-the-art static and dynamic SimRank algorithms.
43 citations
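The link between random walks and SimRank can be illustrated with a plain Monte Carlo estimator: start one walker at each of the two nodes, move both backwards along in-links in lockstep, and average the decay factor raised to the step at which they first meet. This is a generic sketch of the random-surfer view of SimRank, not the indexing scheme proposed in the paper above. The decay factor, walk count, step cap, and the name `simrank_monte_carlo` are assumptions, and capping the walk length makes the estimate approximate.

```python
import random


def simrank_monte_carlo(in_neighbors, a, b, decay=0.8, walks=2000, max_steps=10, seed=0):
    """Estimate SimRank(a, b) from paired reverse random walks."""
    if a == b:
        return 1.0
    rng = random.Random(seed)  # fixed seed so repeated runs give the same estimate
    total = 0.0
    for _ in range(walks):
        x, y = a, b
        for step in range(1, max_steps + 1):
            if not in_neighbors[x] or not in_neighbors[y]:
                break  # a walker is stuck, so this pair of walks never meets
            x = rng.choice(in_neighbors[x])
            y = rng.choice(in_neighbors[y])
            if x == y:
                total += decay ** step  # first meeting at this step
                break
    return total / walks


# Toy graph: both walkers must meet at "u" after exactly one step.
graph = {"u": [], "a": ["u"], "b": ["u"]}
print(simrank_monte_carlo(graph, "a", "b"))
```

Sampling walks for a single pair avoids materializing all node-pair scores, which is one reason random-walk formulations are attractive on large graphs.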