Showing papers on "SimRank published in 2009"


Proceedings ArticleDOI
02 Nov 2009
TL;DR: Proposes P-Rank (Penetrating Rank), a new similarity measure for computing the structural similarities of entities in real information networks, together with a fixed-point algorithm that reinforces the structural similarity of vertex pairs beyond the localized neighborhood scope toward the entire information network.
Abstract: With the ubiquity of information networks and their broad applications, the issue of similarity computation between entities of an information network arises and draws extensive research interest. However, to effectively and comprehensively measure "how similar two entities are within an information network" is nontrivial, and the problem becomes even more challenging when the information network to be examined is massive and diverse. In this paper, we propose a new similarity measure, P-Rank (Penetrating Rank), toward effectively computing the structural similarities of entities in real information networks. P-Rank enriches the well-known similarity measure, SimRank, by jointly encoding both in- and out-link relationships into structural similarity computation. P-Rank is proven to be a unified structural similarity framework, under which all state-of-the-art similarity measures, including CoCitation, Coupling, Amsler and SimRank, are just its special cases. Based on the recursive nature of P-Rank, we propose a fixed-point algorithm to reinforce structural similarity of vertex pairs beyond the localized neighborhood scope toward the entire information network. Our experimental studies demonstrate the power of P-Rank as an effective similarity measure in different information networks. Meanwhile, under the same time/space complexity, P-Rank outperforms SimRank as a comprehensive and more meaningful structural similarity measure, especially in large real information networks.

224 citations
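
The P-Rank abstract above describes a recursive measure that mixes in-link and out-link similarity and is solved by fixed-point iteration. The formula itself is not given on this page, so the following is only a minimal sketch under an assumed form: the score of a pair is a decay factor `c` times a weighted average (weight `lam`) of the mean similarity of its in-neighbors and the mean similarity of its out-neighbors. The names `p_rank`, `c`, `lam` and `iterations` are illustrative assumptions, not taken from the paper.

```python
def p_rank(edges, nodes, c=0.8, lam=0.5, iterations=5):
    """Naive fixed-point iteration for an assumed P-Rank-style recurrence.

    edges: list of directed (source, target) pairs.
    lam:   assumed weight on in-link similarity (1 - lam goes to out-links).
    """
    in_nb = {v: [] for v in nodes}
    out_nb = {v: [] for v in nodes}
    for u, v in edges:
        out_nb[u].append(v)
        in_nb[v].append(u)

    # Start from the identity: every node is fully similar to itself only.
    sim = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}

    def mean_neighbor_sim(nb, a, b, prev):
        # Average previous-round similarity over all neighbor pairs.
        if not nb[a] or not nb[b]:
            return 0.0
        total = sum(prev[(x, y)] for x in nb[a] for y in nb[b])
        return total / (len(nb[a]) * len(nb[b]))

    for _ in range(iterations):
        new = {}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new[(a, b)] = 1.0
                else:
                    in_part = mean_neighbor_sim(in_nb, a, b, sim)
                    out_part = mean_neighbor_sim(out_nb, a, b, sim)
                    new[(a, b)] = c * (lam * in_part + (1 - lam) * out_part)
        sim = new
    return sim


# Toy graph: p links to a and b; a links to q; b links to r.
toy_nodes = ["p", "a", "b", "q", "r"]
toy_edges = [("p", "a"), ("p", "b"), ("a", "q"), ("b", "r")]
in_only = p_rank(toy_edges, toy_nodes, lam=1.0)   # in-links only
out_only = p_rank(toy_edges, toy_nodes, lam=0.0)  # out-links only
mixed = p_rank(toy_edges, toy_nodes, lam=0.5)     # both directions
```

Under this assumed form, setting `lam=1.0` uses in-links only, which matches the usual SimRank recurrence and is consistent with the abstract's statement that SimRank is a special case. The loop is quadratic in the number of nodes per iteration and is meant for small graphs only.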


Proceedings ArticleDOI
02 Nov 2009
TL;DR: A novel neighbor-based similarity measure called MatchSim, which uses only the neighborhood structure of web pages and overcomes a severe counterintuitive loophole in SimRank through its strict consistency with the intuitions of similarity.
Abstract: The problem of measuring similarity between web pages arises in many important Web applications, such as search engines and Web directories. In this paper, we propose a novel neighbor-based similarity measure called MatchSim, which uses only the neighborhood structure of web pages. Technically, MatchSim recursively defines similarity between web pages by the average similarity of the maximum matching between their neighbors. Our method extends the traditional methods which simply count the numbers of common and/or different neighbors. It also successfully overcomes a severe counterintuitive loophole in SimRank, due to its strict consistency with the intuitions of similarity. We give the computational complexity of MatchSim iteration. The accuracy of MatchSim is compared against others on two real datasets. The results show that our method performs best in most cases.

51 citations


Proceedings ArticleDOI
06 Dec 2009
TL;DR: This paper proposes a new approximate algorithm, namely Power-SimRank, with a guaranteed error bound to efficiently compute link-based similarity measures, and proves the convergence of the proposed algorithm.
Abstract: Similarity calculation has many applications, such as information retrieval and collaborative filtering, among many others. It has been shown that link-based similarity measures, such as SimRank, are very effective in characterizing object similarities in networks, such as the Web, by exploiting the object-to-object relationship. Unfortunately, it is prohibitively expensive to compute link-based similarity in a relatively large graph. In this paper, based on the observation that link-based similarity scores of real-world graphs follow a power-law distribution, we propose a new approximate algorithm, namely Power-SimRank, with a guaranteed error bound to efficiently compute link-based similarity measures. We also prove the convergence of the proposed algorithm. Extensive experiments conducted on real-world and synthetic datasets show that the proposed algorithm outperforms SimRank by four to five times in terms of efficiency, while the error generated by the approximation is small.

22 citations


Proceedings ArticleDOI
31 Mar 2009
TL;DR: A graph-theoretic approach to the identification of yet-unknown word translations, based on the recursive SimRank algorithm and relying on the intuition that two words are similar if they establish similar grammatical relationships with similar other words.
Abstract: This paper presents a graph-theoretic approach to the identification of yet-unknown word translations. The proposed algorithm is based on the recursive SimRank algorithm and relies on the intuition that two words are similar if they establish similar grammatical relationships with similar other words. We also present a formulation of SimRank in matrix form and extensions for edge weights, edge labels and multiple graphs.

14 citations
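
The matrix form mentioned in the abstract above is not spelled out on this page. One standard way to write the SimRank iteration in matrix form is sketched below; the symbols $A$ (column-normalized adjacency matrix), $C$ (decay factor with $0 < C < 1$), $E$ (identity matrix) and $\mathrm{In}(j)$ (set of in-neighbors of node $j$) are assumptions introduced here, and the paper's own formulation may differ.

```latex
S_0 = E, \qquad
S_{k+1} = \max\!\bigl( C \, A^{\top} S_k A,\; E \bigr), \qquad
A_{ij} =
\begin{cases}
  1 / |\mathrm{In}(j)| & \text{if there is an edge } i \to j, \\
  0 & \text{otherwise.}
\end{cases}
```

Here the maximum is taken elementwise. Entry $(a, b)$ of $A^{\top} S_k A$ is the average of $S_k$ over all pairs of in-neighbors of $a$ and $b$, so off-diagonal entries follow the usual recursive definition, while the maximum with $E$ resets the diagonal to 1.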


Book ChapterDOI
14 Aug 2009
TL;DR: This paper proposes a novel algorithm called SW-SimRank to speed up similarity calculation by avoiding recalculating those unreachable pairs' similarity scores after the first several iterations and shows the efficiency of this approach on web datasets.
Abstract: SimRank is a well-known algorithm for similarity calculation based on link analysis. However, it suffers from high computational cost. It has been shown that the world web graph is a "small world graph". In this paper, we observe that for this kind of small world graph, node pairs whose similarity scores are zero after the first several iterations will remain zero in the final output. Based on this observation, we propose a novel algorithm called SW-SimRank to speed up similarity calculation by avoiding recalculating those unreachable pairs' similarity scores. Our experimental results on web datasets show the efficiency of our approach. The larger the proportion of unreachable pairs in the relationship graph, the more improvement the SW-SimRank algorithm will achieve. In addition, SW-SimRank can be integrated with other SimRank acceleration methods.

10 citations
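
The pruning idea in the SW-SimRank abstract above can be illustrated as follows. This is a rough sketch rather than the authors' algorithm: it runs plain SimRank over all pairs for a few warm-up iterations, then keeps updating only the pairs whose score is nonzero. The function name `sw_simrank_sketch` and the parameters `c`, `warmup` and `iterations` are hypothetical.

```python
def sw_simrank_sketch(edges, nodes, c=0.8, warmup=3, iterations=10):
    """SimRank that stops updating pairs still at zero after `warmup` rounds.

    Returns (sim, active_count). `sim` stores only off-diagonal pairs that
    are still being updated; missing pairs are treated as similarity 0.
    """
    in_nb = {v: [] for v in nodes}
    for u, v in edges:
        in_nb[v].append(u)

    def score(sim, x, y):
        # s(x, x) is fixed at 1; pruned or unseen pairs count as 0.
        return 1.0 if x == y else sim.get((x, y), 0.0)

    def update(sim, pairs):
        new = {}
        for a, b in pairs:
            if in_nb[a] and in_nb[b]:
                total = sum(score(sim, x, y)
                            for x in in_nb[a] for y in in_nb[b])
                new[(a, b)] = c * total / (len(in_nb[a]) * len(in_nb[b]))
            else:
                new[(a, b)] = 0.0
        return new

    active = [(a, b) for a in nodes for b in nodes if a != b]
    sim = {}
    for k in range(iterations):
        sim = update(sim, active)
        if k + 1 == warmup:
            # Assumption from the abstract: pairs still at zero stay at zero.
            active = [p for p in active if sim[p] > 0.0]
    return sim, len(active)


# Toy graph: p links to a and b; a links to q; b links to r.
toy_nodes = ["p", "a", "b", "q", "r"]
toy_edges = [("p", "a"), ("p", "b"), ("a", "q"), ("b", "r")]
scores, active_count = sw_simrank_sketch(toy_edges, toy_nodes)
```

The choice of `warmup` matters: if it is too small, pairs whose similarity only becomes nonzero in a later iteration are pruned too early. In the toy graph, `warmup=1` would wrongly drop the pair `("q", "r")`, whose score depends on the score of `("a", "b")` from the previous round.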


Proceedings ArticleDOI
06 Dec 2009
TL;DR: A new approach to cluster popular groups into categories by analyzing the similarity of groups via SimRank is designed, and both visual content and its annotations are integrated to understand the events or topics depicted in the images.
Abstract: Popular photo-sharing sites have attracted millions of people and helped construct massive social networks in cyberspace. Unlike in traditional social relationships, users actively interact within groups where common interests are shared on certain types of events or topics captured by photos and videos. Contributing images to a group would greatly promote the interactions between users and expand their social networks. In this work, we intend to produce accurate predictions of suitable photo-sharing groups from a user's images by mining images both on the Web and in the user's personal collection. To this end, we designed a new approach to cluster popular groups into categories by analyzing the similarity of groups via SimRank. Both visual content and its annotations are integrated to understand the events or topics depicted in the images. Experiments on real user images demonstrate the feasibility of the proposed approach.

10 citations


Book ChapterDOI
16 Mar 2009
TL;DR: This paper finds that convergence behavior differs across object pairs when SimRank is used to compute the similarity of objects, and proposes an adaptive method called Adaptive-SimRank to speed up similarity calculation.
Abstract: SimRank is a well-known algorithm for similarity calculation based on object-to-object relationship. However, it suffers from high computation cost. In this paper, we find that convergence behavior differs across object pairs when we use SimRank to compute the similarity of objects. Many similarity scores converge fast, while others need more time before convergence. Based on this observation, we propose an adaptive method called Adaptive-SimRank to speed up similarity calculation. Using this method, we do not need to recalculate the similarity of pairs that have already converged. The experiments conducted on web datasets and a synthetic dataset show that our new method can reduce the running time by nearly 35%.

6 citations


Journal Article
TL;DR: The experimental results on the ACM data set show that S-SimRank outperforms other algorithms, and a mathematical proof of the convergence of S-SimRank is given.
Abstract: Content analysis and link analysis among documents are two common methods in recommender systems. Compared with content analysis, link analysis can discover more implicit relationships between documents. At the same time, because of noise, these methods cannot produce precise results. To solve this problem, a new algorithm, S-SimRank (Star-SimRank), is proposed to effectively combine content analysis and link analysis to improve the accuracy of similarity calculation. The experimental results on the ACM data set show that S-SimRank outperforms other algorithms. Finally, a mathematical proof of the convergence of S-SimRank is given.

5 citations