Topic

SimRank

About: SimRank is a research topic. Over its lifetime, 250 publications have been published within this topic, receiving 21,163 citations.


Papers
Proceedings Article
01 Sep 2006
TL;DR: This paper takes advantage of the power law distribution of links and develops a hierarchical structure called SimTree to represent similarities in a multi-granularity manner. It computes similarities between objects without pairwise similarity computations by merging computations that go through the same branches in the SimTree.
Abstract: Data objects in a relational database are cross-linked with each other via multi-typed links. Links contain rich semantic information that may indicate important relationships among objects. Most current clustering methods rely only on the properties that belong to the objects per se. However, the similarities between objects are often indicated by the links, and desirable clusters cannot be generated using only the properties of objects. In this paper we explore linkage-based clustering, in which the similarity between two objects is measured based on the similarities between the objects linked with them. In comparison with a previous study (SimRank) that computes links recursively on all pairs of objects, we take advantage of the power law distribution of links, and develop a hierarchical structure called SimTree to represent similarities in a multi-granularity manner. This method avoids the high cost of computing and storing pairwise similarities but still thoroughly explores relationships among objects. An efficient algorithm is proposed to compute similarities between objects by avoiding pairwise similarity computations through merging computations that go through the same branches in the SimTree. Experiments show the proposed approach achieves high efficiency, scalability, and accuracy in clustering multi-typed linked objects.

124 citations
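
For orientation, the all-pairs recursion that the abstract above contrasts SimTree against can be sketched as follows. This is a minimal illustration of the standard SimRank fixed-point iteration (Jeh and Widom), not code from the listed paper; the function name `simrank`, the adjacency format `in_neighbors`, and the default decay factor and iteration count are assumptions made for the example.

```python
from itertools import product


def simrank(in_neighbors, c=0.8, iterations=10):
    """Naive all-pairs SimRank by fixed-point iteration.

    in_neighbors: dict mapping every node to a list of its in-neighbors
                  (hypothetical input format chosen for this sketch).
    c:            decay factor in (0, 1).
    Returns a dict mapping (a, b) pairs to similarity scores.
    """
    nodes = list(in_neighbors)
    # Initial scores: a node is fully similar to itself, 0 otherwise.
    sim = {(a, b): 1.0 if a == b else 0.0
           for a, b in product(nodes, repeat=2)}

    for _ in range(iterations):
        new_sim = {}
        for a, b in product(nodes, repeat=2):
            if a == b:
                new_sim[(a, b)] = 1.0
                continue
            ia, ib = in_neighbors[a], in_neighbors[b]
            if not ia or not ib:
                # A node with no in-neighbors has similarity 0 to others.
                new_sim[(a, b)] = 0.0
                continue
            # Average the previous scores over all in-neighbor pairs.
            total = sum(sim[(x, y)] for x in ia for y in ib)
            new_sim[(a, b)] = c * total / (len(ia) * len(ib))
        sim = new_sim
    return sim


# Tiny example graph: a -> b and a -> c, so b and c share one in-neighbor.
graph = {"a": [], "b": ["a"], "c": ["a"]}
scores = simrank(graph, c=0.8, iterations=5)
print(scores[("b", "c")])
```

Every iteration touches all node pairs and all pairs of their in-neighbors, which is the pairwise cost in time and storage that the abstract refers to.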

Proceedings Article
25 Jul 2010
TL;DR: This paper exploits the inherent parallelism and high memory bandwidth of graphics processing units (GPUs) to accelerate the computation of SimRank on large graphs, and proposes using iterative aggregation techniques for uncoupling Markov chains to compute SimRank scores in parallel for large graphs.
Abstract: Recently there has been a lot of interest in graph-based analysis. One of the most important aspects of graph-based analysis is to measure similarity between nodes in a graph. SimRank is a simple and influential measure of this kind, based on a solid graph theoretical model. However, existing methods for SimRank computation suffer from two limitations: 1) the computing cost can be very high in practice; and 2) they can only be applied to static graphs. In this paper, we exploit the inherent parallelism and high memory bandwidth of graphics processing units (GPUs) to accelerate the computation of SimRank on large graphs. Furthermore, based on the observation that SimRank is essentially a first-order Markov chain, we propose to utilize the iterative aggregation techniques for uncoupling Markov chains to compute SimRank scores in parallel for large graphs. The iterative aggregation method can be applied to dynamic graphs. Moreover, it can handle not only the link-updating problem but also the node-updating problem. Extensive experiments on synthetic and real data sets verify that the proposed methods are efficient and effective.

113 citations

Proceedings Article
21 Aug 2011
TL;DR: This paper presents RoleSim, a role similarity metric that satisfies a set of axiomatic properties and can be computed with a simple iterative algorithm. The authors rigorously prove that RoleSim satisfies all the axiomatic properties and demonstrate its superior interpretative power on both synthetic and real datasets.
Abstract: A key task in analyzing social networks and other complex networks is role analysis: describing and categorizing nodes by how they interact with other nodes. Two nodes have the same role if they interact with equivalent sets of neighbors. The most fundamental role equivalence is automorphic equivalence. Unfortunately, the fastest algorithm known for graph automorphism is nonpolynomial. Moreover, since exact equivalence is rare, a more meaningful task is measuring the role similarity between any two nodes. This task is closely related to the link-based similarity problem that SimRank addresses. However, SimRank and other existing similarity measures are not sufficient because they are not guaranteed to recognize automorphically or structurally equivalent nodes. This paper makes two contributions. First, we present and justify several axiomatic properties necessary for a role similarity measure or metric. Second, we present RoleSim, a role similarity metric which satisfies these axioms and which can be computed with a simple iterative algorithm. We rigorously prove that RoleSim satisfies all the axiomatic properties and demonstrate its superior interpretative power on both synthetic and real datasets.

109 citations

Proceedings Article
18 Jun 2014
TL;DR: This paper proposes a very fast and scalable algorithm for the SimRank-based similarity search problem, and establishes a Monte-Carlo based algorithm to compute a single-pair SimRank score s(u,v), which is based on the random-walk interpretation of a linear recursive formula for SimRank.
Abstract: SimRank, proposed by Jeh and Widom, provides a good similarity score and has been successfully used in many of the above-mentioned applications. Many algorithms have been proposed so far to compute SimRank, but unfortunately none of them scales to graphs of billion-scale size. Motivated by this fact, we consider the following SimRank-based similarity search problem: given a query vertex u, find the top-k vertices v with the k highest SimRank scores s(u,v) with respect to u. We propose a very fast and scalable algorithm for this similarity search problem. Our method consists of the following ingredients: (1) We first introduce a "linear" recursive formula for SimRank. This allows us to formulate a problem for which we can propose a very fast algorithm. (2) We establish a Monte-Carlo based algorithm to compute a single-pair SimRank score s(u,v), which is based on the random-walk interpretation of our linear recursive formula. (3) We empirically show that the SimRank score s(u,v) decreases rapidly as the distance d(u,v) increases. Therefore, in order to compute SimRank scores for a query vertex u for our similarity search problem, we only need to look at a very "local" area. (4) We can combine two upper bounds for the SimRank score s(u,v) (which can be obtained by Monte-Carlo simulation in our preprocess), together with an adaptive sampling technique, to prune the similarity search procedure. This results in a much faster algorithm. Once our preprocess is done (which only takes O(n) time), our algorithm finds, given a query vertex u, the top-20 similar vertices v with the 20 highest SimRank scores s(u,v) in less than a few seconds, even for graphs with billions of edges. To the best of our knowledge, this is the first method to scale to graphs with at least billions of edges (for the single-source case).

99 citations
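
As a rough illustration of the Monte-Carlo idea in ingredient (2) above, the sketch below estimates a single-pair score from random walks. It uses the classic random-surfer-pair interpretation of the original SimRank (the score is the expected value of c raised to the first meeting time of two backward random walks), not the linear recursive formulation introduced in the paper. The function name `mc_simrank`, the input format, and all default parameters are assumptions made for the example.

```python
import random


def mc_simrank(in_neighbors, u, v, c=0.6, walks=1000, max_steps=10, seed=0):
    """Monte-Carlo estimate of the single-pair SimRank score s(u, v).

    Two walkers start at u and v and each step to a uniformly random
    in-neighbor. If they first meet after t steps, the walk contributes
    c**t; walks that never meet contribute 0.

    in_neighbors: dict mapping every node to a list of its in-neighbors
                  (hypothetical input format chosen for this sketch).
    max_steps:    truncation length; later meetings are ignored, which
                  biases the estimate downward by at most c**(max_steps + 1).
    """
    if u == v:
        return 1.0
    rng = random.Random(seed)
    total = 0.0
    for _ in range(walks):
        a, b = u, v
        for step in range(1, max_steps + 1):
            ia, ib = in_neighbors[a], in_neighbors[b]
            if not ia or not ib:
                break  # a walker is stuck, so this pair can never meet
            a, b = rng.choice(ia), rng.choice(ib)
            if a == b:
                total += c ** step  # first meeting at this step
                break
    return total / walks


# Same tiny graph shape as before: a -> b and a -> c.
graph = {"a": [], "b": ["a"], "c": ["a"]}
print(mc_simrank(graph, "b", "c", c=0.8))
```

The estimate only follows walks that start at the two query vertices, which is in the spirit of the locality argument in ingredient (3): no all-pairs score table is ever built.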

Proceedings Article
23 May 2006
TL;DR: This paper achieves unrestricted personalization by combining rounding and randomized sketching techniques in the dynamic programming algorithm of Jeh and Widom and shows that the algorithms use an optimal amount of space by also improving earlier asymptotic worst-case lower bounds.
Abstract: Personalized PageRank expresses link-based page quality around user-selected pages. The only previous personalized PageRank algorithm that can serve on-line queries for an unrestricted choice of pages on large graphs is our Monte Carlo algorithm [WAW 2004]. In this paper we achieve unrestricted personalization by combining rounding and randomized sketching techniques in the dynamic programming algorithm of Jeh and Widom [WWW 2003]. We evaluate the precision of approximation experimentally on large-scale real-world data and find significant improvement over previous results. As a key theoretical contribution we show that our algorithms use an optimal amount of space by also improving earlier asymptotic worst-case lower bounds. Our lower bounds and algorithms apply to SimRank as well; of independent interest is the reduction of the SimRank computation to personalized PageRank.

92 citations


Network Information
Related Topics (5)
Web page: 50.3K papers, 975.1K citations, 77% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 77% related
Ontology (information science): 57K papers, 869.1K citations, 77% related
Scalability: 50.9K papers, 931.6K citations, 74% related
Tree (data structure): 44.9K papers, 749.6K citations, 74% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2021    15
2020    26
2019    16
2018    17
2017    19
2016    16