Topic

SimRank

About: SimRank is a research topic. Over its lifetime, 250 publications have appeared within this topic, receiving 21,163 citations.


Papers
Book Chapter
14 Aug 2009
TL;DR: This paper proposes SW-SimRank, an algorithm that speeds up similarity calculation by not recalculating the similarity scores of unreachable pairs after the first several iterations, and shows the efficiency of this approach on web datasets.
Abstract: SimRank is a well-known algorithm for similarity calculation based on link analysis. However, it suffers from high computational cost. It has been shown that the world web graph is a "small world graph". In this paper, we observe that for this kind of small world graph, node pairs whose similarity scores are zero after the first several iterations will remain zero in the final output. Based on this observation, we propose a novel algorithm called SW-SimRank to speed up similarity calculation by avoiding recalculating those unreachable pairs' similarity scores. Our experimental results on web datasets show the efficiency of our approach. The larger the proportion of unreachable pairs in the relationship graph, the more improvement the SW-SimRank algorithm achieves. In addition, SW-SimRank can be integrated with other SimRank acceleration methods.

10 citations
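
For orientation, the baseline SimRank iteration and the pruning idea described in the abstract above can be sketched in Python. This is a minimal illustration, not the authors' SW-SimRank implementation: the function name `simrank`, the decay factor `c`, and the `warmup` parameter (the iteration after which still-zero pairs are dropped) are assumptions made for this example. The pruning is exact only when the dropped pairs really would stay zero, which the abstract argues is the case for small-world graphs.

```python
def simrank(in_nbrs, c=0.8, iters=10, warmup=None):
    """Iterative SimRank over a graph given as {node: [in-neighbors]}.

    If `warmup` is set (hypothetical parameter), pairs whose score is
    still zero after that many iterations are no longer recomputed.
    """
    nodes = list(in_nbrs)
    # s(a, a) = 1, every other pair starts at 0.
    sim = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    # Pairs that are recomputed on each iteration.
    active = [(a, b) for a in nodes for b in nodes if a != b]

    for k in range(1, iters + 1):
        new = dict(sim)
        for a, b in active:
            ia, ib = in_nbrs[a], in_nbrs[b]
            if not ia or not ib:
                continue  # a node without in-neighbors keeps score 0
            total = sum(sim[(i, j)] for i in ia for j in ib)
            new[(a, b)] = c * total / (len(ia) * len(ib))
        sim = new
        # Pruning step inspired by the abstract: drop pairs that are
        # still zero once the warm-up iterations are done.
        if warmup is not None and k == warmup:
            active = [p for p in active if sim[p] > 0.0]
    return sim


if __name__ == "__main__":
    # Toy graph (assumed data): u links to a and b; x is isolated.
    graph = {"u": [], "a": ["u"], "b": ["u"], "x": []}
    scores = simrank(graph, c=0.8, iters=5, warmup=1)
    print(scores[("a", "b")], scores[("a", "x")])
```

On the toy graph, `a` and `b` share their only in-neighbor, so they receive a positive score, while every pair involving the isolated node `x` stays at zero and is skipped after the warm-up.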

Journal Article
01 Mar 2020
TL;DR: SimPush is a single-source SimRank algorithm that requires no pre-computation: it identifies a small number of nodes relevant to the query, then computes statistics and performs residue push from these nodes only.
Abstract: Given a graph G and a node u ∈ G, a single source SimRank query evaluates the similarity between u and every node v ∈ G. Existing approaches to single source SimRank computation incur either long query response time, or expensive pre-computation, which needs to be performed again whenever the graph G changes. Consequently, to our knowledge none of them is ideal for scenarios in which (i) query processing must be done in real time, and (ii) the underlying graph G is massive, with frequent updates. Motivated by this, we propose SimPush, a novel algorithm that answers single source SimRank queries without any pre-computation, and achieves significantly higher query speed than even the fastest known index-based solutions. Further, SimPush provides rigorous result quality guarantees, and its high performance does not rely on any strong assumption of the graph. Specifically, compared to existing methods, SimPush employs a radically different algorithmic design that focuses on (i) identifying a small number of nodes relevant to the query, and subsequently (ii) computing statistics and performing residue push from these nodes only. We prove the correctness of SimPush, analyze its time complexity, and compare its asymptotic performance with that of existing methods. Meanwhile, we evaluate the practical performance of SimPush through extensive experiments on 9 real datasets. The results demonstrate that SimPush consistently outperforms all existing solutions, often by over an order of magnitude. In particular, on a commodity machine, SimPush answers a single source SimRank query on a web graph containing over 133 million nodes and 5.4 billion edges in under 62 milliseconds, with 0.00035 empirical error, while the fastest index-based competitor needs 1.18 seconds.

10 citations

Proceedings Article
06 Dec 2009
TL;DR: This paper designs an approach that clusters popular photo-sharing groups into categories by analyzing group similarity via SimRank, integrating both visual content and annotations to understand the events or topics depicted in the images.
Abstract: Popular photo-sharing sites have attracted millions of people and helped construct massive social networks in cyberspace. Unlike in traditional social relationships, users actively interact within groups where common interests are shared on certain types of events or topics captured by photos and videos. Contributing images to a group would greatly promote the interactions between users and expand their social networks. In this work, we intend to produce accurate predictions of suitable photo-sharing groups from a user's images by mining images both on the Web and in the user's personal collection. To this end, we designed a new approach to cluster popular groups into categories by analyzing the similarity of groups via SimRank. Both visual content and its annotations are integrated to understand the events or topics depicted in the images. Experiments on real user images demonstrate the feasibility of the proposed approach.

10 citations

Proceedings Article
19 Apr 2017
TL;DR: UniWalk, a Monte Carlo based method, enables fast top-k SimRank computation over large undirected graphs without indexing, and outperforms the state-of-the-art methods by orders of magnitude.
Abstract: SimRank is an effective structural similarity measurement between two vertices in a graph, which can be used in many applications like recommender systems. Although progress has been made, existing methods still face challenges in handling large graphs. Besides huge index construction and maintenance costs, the existing methods require considerable search space and time overheads in the online SimRank query. In this paper, we design a Monte Carlo based method, UniWalk, to enable fast top-k SimRank computation over large undirected graphs without indexing. UniWalk directly locates the top-k similar vertices for any single source vertex u via O(R) sampling paths originating from u only, which avoids the selection of a candidate vertex set C and the subsequent O(|C|R) bidirectional sampling paths starting from u and each candidate, respectively, in existing methods. We also design a space-efficient method to reduce intermediate results, and a path-sharing strategy to optimize path sampling for multiple source vertices. Furthermore, we extend UniWalk to existing distributed graph processing frameworks to improve its scalability. We conduct extensive experiments to illustrate that UniWalk has high scalability and outperforms the state-of-the-art methods by orders of magnitude, and that this improvement is achieved without any indexing overheads.

10 citations
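
The pairwise sampling that the abstract above attributes to existing methods can be illustrated with the standard Monte Carlo view of SimRank, in which s(u, v) is the expected value of c^τ, where τ is the first step at which two reverse random walks started from u and v meet. The sketch below is that generic pairwise estimator, not UniWalk itself; the function name `mc_simrank` and the parameters `num_walks`, `max_len`, and `seed` are assumptions made for this example, and truncating walks at `max_len` biases the estimate slightly downward.

```python
import random


def mc_simrank(in_nbrs, u, v, c=0.6, num_walks=2000, max_len=10, seed=0):
    """Estimate SimRank s(u, v) by sampling pairs of reverse random walks.

    in_nbrs maps each node to a list of its in-neighbors.
    """
    if u == v:
        return 1.0
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    total = 0.0
    for _ in range(num_walks):
        a, b = u, v
        for step in range(1, max_len + 1):
            # A walk that reaches a node without in-neighbors cannot
            # continue, so this pair of walks never meets.
            if not in_nbrs[a] or not in_nbrs[b]:
                break
            a = rng.choice(in_nbrs[a])
            b = rng.choice(in_nbrs[b])
            if a == b:
                total += c ** step  # first meeting at this step
                break
    return total / num_walks


if __name__ == "__main__":
    # Toy graph (assumed data): u links to a and b; x is isolated.
    graph = {"u": [], "a": ["u"], "b": ["u"], "x": []}
    print(mc_simrank(graph, "a", "b"))
    print(mc_simrank(graph, "a", "x"))
```

Answering a top-k query this way needs one batch of walk pairs per candidate vertex, which is the O(|C|R) cost the abstract says UniWalk avoids by sampling from the source vertex only.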

Journal Article
TL;DR: This paper defines effective relationship strength (ERS), which distinguishes link importance using node activity, node attraction, and link frequency, and formalizes the ESimRank equation by combining ERS with the expected meeting probabilities over paths of any length.

9 citations


Network Information
Related Topics (5)
Web page: 50.3K papers, 975.1K citations, 77% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 77% related
Ontology (information science): 57K papers, 869.1K citations, 77% related
Scalability: 50.9K papers, 931.6K citations, 74% related
Tree (data structure): 44.9K papers, 749.6K citations, 74% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2021    15
2020    26
2019    16
2018    17
2017    19
2016    16