Showing papers on "SimRank" published in 2009

Open Access

Proceedings Article•DOI•

P-Rank: a comprehensive structural similarity measure over information networks

Peixiang Zhao¹, Jiawei Han¹, Yizhou Sun¹•Institutions (1)

University of Illinois at Urbana–Champaign¹

02 Nov 2009

TL;DR: Proposes P-Rank (Penetrating Rank), a new measure of the structural similarity of entities in real information networks, together with a fixed-point algorithm that reinforces the structural similarity of vertex pairs beyond their local neighborhoods, across the entire information network.

Abstract: With the ubiquity of information networks and their broad applications, the issue of similarity computation between entities of an information network arises and draws extensive research interest. However, effectively and comprehensively measuring "how similar two entities are within an information network" is nontrivial, and the problem becomes even more challenging when the information network to be examined is massive and diverse. In this paper, we propose a new similarity measure, P-Rank (Penetrating Rank), toward effectively computing the structural similarities of entities in real information networks. P-Rank enriches the well-known similarity measure, SimRank, by jointly encoding both in- and out-link relationships into structural similarity computation. P-Rank is proven to be a unified structural similarity framework, under which all state-of-the-art similarity measures, including CoCitation, Coupling, Amsler and SimRank, are just its special cases. Based on the recursive nature of P-Rank, we propose a fixed-point algorithm to reinforce structural similarity of vertex pairs beyond the localized neighborhood scope toward the entire information network. Our experimental studies demonstrate the power of P-Rank as an effective similarity measure in different information networks. Meanwhile, under the same time/space complexity, P-Rank outperforms SimRank as a comprehensive and more meaningful structural similarity measure, especially in large real information networks.

224 citations
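
The P-Rank abstract above describes a recursion that combines in-link and out-link similarity and is solved by fixed-point iteration. A minimal Python sketch of that idea follows. It assumes a weight `lam` between the two link directions and a damping factor `c`; these names, their default values, and the exact way the two terms are combined are illustrative assumptions, not taken from the listing.

```python
from itertools import product


def p_rank(edges, nodes, lam=0.5, c=0.8, iterations=10):
    """Fixed-point iteration mixing in-link and out-link similarity.

    lam and c are assumed parameters: lam weights the in-link term against
    the out-link term, c damps similarity propagated from neighbors.
    """
    in_nb = {v: [] for v in nodes}
    out_nb = {v: [] for v in nodes}
    for u, v in edges:
        out_nb[u].append(v)
        in_nb[v].append(u)

    # Start from the identity: every node is fully similar to itself only.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, repeat=2)}

    def avg(neigh_a, neigh_b, prev):
        # Average previous-round similarity over all neighbor pairs.
        if not neigh_a or not neigh_b:
            return 0.0
        total = sum(prev[(x, y)] for x in neigh_a for y in neigh_b)
        return total / (len(neigh_a) * len(neigh_b))

    for _ in range(iterations):
        new = {}
        for a, b in sim:
            if a == b:
                new[(a, b)] = 1.0
            else:
                in_part = avg(in_nb[a], in_nb[b], sim)
                out_part = avg(out_nb[a], out_nb[b], sim)
                new[(a, b)] = c * (lam * in_part + (1 - lam) * out_part)
        sim = new
    return sim


# Tiny example: a and b both link to c.
example_nodes = ["a", "b", "c"]
example_edges = [("a", "c"), ("b", "c")]
scores = p_rank(example_edges, example_nodes)
```

In this sketch, `lam=1.0` keeps only the in-link term, which is the SimRank-style recursion, and `lam=0.0` keeps only the out-link term.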

Proceedings Article•DOI•

MatchSim: a novel neighbor-based similarity measure with maximum neighborhood matching

Zhenjiang Lin¹, Michael R. Lyu¹, Irwin King¹•Institutions (1)

The Chinese University of Hong Kong¹

02 Nov 2009

TL;DR: Proposes MatchSim, a novel neighbor-based similarity measure that uses only the neighborhood structure of web pages and, through its strict consistency with intuitions about similarity, overcomes a severe counterintuitive loophole in SimRank.

Abstract: The problem of measuring similarity between web pages arises in many important Web applications, such as search engines and Web directories. In this paper, we propose a novel neighbor-based similarity measure called MatchSim, which uses only the neighborhood structure of web pages. Technically, MatchSim recursively defines similarity between web pages by the average similarity of the maximum matching between their neighbors. Our method extends the traditional methods which simply count the numbers of common and/or different neighbors. It also successfully overcomes a severe counterintuitive loophole in SimRank, due to its strict consistency with the intuitions of similarity. We give the computational complexity of MatchSim iteration. The accuracy of MatchSim is compared against others on two real datasets. The results show that our method performs best in most cases.

51 citations
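
The MatchSim abstract above defines similarity recursively as the average similarity of the maximum matching between two pages' neighbors. A small Python sketch of that recursion follows. It makes several assumptions not stated in the listing: the matching weight is divided by the larger neighborhood size, pages without neighbors score zero against other pages, and the matching is found by brute force, which is only workable for tiny neighbor sets.

```python
from itertools import permutations


def best_matching_weight(xs, ys, sim):
    """Weight of the maximum-weight matching between xs and ys (brute force)."""
    small, large = (xs, ys) if len(xs) <= len(ys) else (ys, xs)
    best = 0.0
    # Try every way to pair each node of the smaller side with a distinct
    # node of the larger side, and keep the heaviest pairing.
    for perm in permutations(large, len(small)):
        weight = sum(sim[(s, t)] for s, t in zip(small, perm))
        best = max(best, weight)
    return best


def match_sim(neighbors, iterations=5):
    """Iterate a MatchSim-style recursion over a dict: node -> neighbor list."""
    nodes = list(neighbors)
    sim = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(iterations):
        new = {}
        for a in nodes:
            for b in nodes:
                na, nb = neighbors[a], neighbors[b]
                if a == b:
                    new[(a, b)] = 1.0
                elif not na or not nb:
                    # Assumption: no neighbors means no evidence of similarity.
                    new[(a, b)] = 0.0
                else:
                    weight = best_matching_weight(na, nb, sim)
                    new[(a, b)] = weight / max(len(na), len(nb))
        sim = new
    return sim


# Pages p and q share one neighbor (x) out of two each.
example_neighbors = {"p": ["x", "y"], "q": ["x", "z"], "x": [], "y": [], "z": []}
match_scores = match_sim(example_neighbors)
```

A production version would replace the brute-force search with a polynomial-time assignment solver.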

Proceedings Article•DOI•

Efficient Algorithm for Computing Link-Based Similarity in Real World Networks

Yuanzhe Cai, Gao Cong¹, Xu Jia, Hongyan Liu², Jun He, Jiaheng Lu, Xiaoyong Du•Institutions (2)

Aalborg University¹, Tsinghua University²

06 Dec 2009

TL;DR: This paper proposes a new approximate algorithm, Power-SimRank, which efficiently computes link-based similarity measures with a guaranteed error bound, and proves that the algorithm converges.

Abstract: Similarity calculation has many applications, such as information retrieval and collaborative filtering, among many others. It has been shown that link-based similarity measures, such as SimRank, are very effective in characterizing object similarities in networks, such as the Web, by exploiting object-to-object relationships. Unfortunately, it is prohibitively expensive to compute link-based similarity in a relatively large graph. In this paper, based on the observation that link-based similarity scores of real-world graphs follow a power-law distribution, we propose a new approximate algorithm, namely Power-SimRank, with a guaranteed error bound to efficiently compute link-based similarity measures. We also prove the convergence of the proposed algorithm. Extensive experiments conducted on real-world and synthetic datasets show that the proposed algorithm is four to five times faster than SimRank, while the error generated by the approximation is small.

22 citations

Proceedings Article•DOI•

A Graph-Theoretic Algorithm for Automatic Extension of Translation Lexicons

Beate Dorow¹, Florian Laws¹, Lukas Michelbacher¹, Christian Scheible¹, Jason Utt¹•Institutions (1)

University of Stuttgart¹

31 Mar 2009

TL;DR: Presents a graph-theoretic approach to identifying yet-unknown word translations; it is based on the recursive SimRank algorithm and relies on the intuition that two words are similar if they establish similar grammatical relationships with similar other words.

Abstract: This paper presents a graph-theoretic approach to the identification of yet-unknown word translations. The proposed algorithm is based on the recursive SimRank algorithm and relies on the intuition that two words are similar if they establish similar grammatical relationships with similar other words. We also present a formulation of SimRank in matrix form and extensions for edge weights, edge labels and multiple graphs.

14 citations
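
The abstract above mentions a formulation of SimRank in matrix form. For orientation, one common way to write the basic (unweighted) SimRank iteration as matrix products is sketched below. The symbols $A$, $W$, $S^{(k)}$ and $C$ are assumptions chosen here, and the paper's own notation and its edge-weight and edge-label extensions may differ.

```latex
% A: adjacency matrix, A_{ij} = 1 if there is an edge i -> j.
% W: column-normalized A, so column j averages over the in-neighbors I(j).
% C: damping factor in (0, 1); S^{(0)} = I.
\[
  W_{ij} = \frac{A_{ij}}{|I(j)|}, \qquad
  \bigl(W^{\top} S^{(k)} W\bigr)_{ab}
    = \frac{1}{|I(a)|\,|I(b)|}
      \sum_{i \in I(a)} \sum_{j \in I(b)} S^{(k)}_{ij},
\]
\[
  S^{(k+1)} = C\, W^{\top} S^{(k)} W
    \;+\; I \;-\; \operatorname{diag}\!\bigl(C\, W^{\top} S^{(k)} W\bigr).
\]
```

The last two terms reset the diagonal to 1 after each step, so every node stays fully similar to itself.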

Book Chapter•DOI•

Xu Jia¹, Yuanzhe Cai¹, Hongyan Liu², Jun He¹, Xiaoyong Du¹•Institutions (2)

Renmin University of China¹, Tsinghua University²

14 Aug 2009

TL;DR: This paper proposes a novel algorithm, SW-SimRank, which speeds up similarity calculation by no longer recalculating the scores of unreachable pairs after the first several iterations, and shows the efficiency of this approach on web datasets.

Abstract: SimRank is a well-known algorithm for similarity calculation based on link analysis. However, it suffers from high computational cost. It has been shown that the Web graph is a "small world graph". In this paper, we observe that for this kind of small world graph, node pairs whose similarity scores are zero after the first several iterations will remain zero in the final output. Based on this observation, we propose a novel algorithm called SW-SimRank to speed up similarity calculation by avoiding recalculating those unreachable pairs' similarity scores. Our experimental results on web datasets show the efficiency of our approach. The larger the proportion of unreachable pairs in the relationship graph, the more improvement the SW-SimRank algorithm will achieve. In addition, SW-SimRank can be integrated with other SimRank acceleration methods.

10 citations
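
The SW-SimRank abstract above rests on one observation: pairs still scoring zero after the first several iterations can be skipped from then on. A minimal Python sketch of that pruning idea, applied to plain SimRank, follows. The `warmup` parameter, the damping factor `c`, and the function name are hypothetical, and this is not the authors' implementation. On graphs where the observation does not hold, the pruning makes the result approximate.

```python
def sw_simrank_sketch(in_nb, c=0.8, warmup=3, iterations=10):
    """SimRank iteration that stops updating pairs still at zero after warmup.

    in_nb maps each node to the list of its in-neighbors.
    """
    nodes = list(in_nb)
    sim = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    # Diagonal pairs are fixed at 1, so only off-diagonal pairs are updated.
    active = [(a, b) for a in nodes for b in nodes if a != b]

    for step in range(iterations):
        new = dict(sim)
        for a, b in active:
            ia, ib = in_nb[a], in_nb[b]
            if ia and ib:
                total = sum(sim[(x, y)] for x in ia for y in ib)
                new[(a, b)] = c * total / (len(ia) * len(ib))
        sim = new
        if step + 1 == warmup:
            # Assumption from the abstract: pairs that are still zero here
            # are treated as unreachable and never recomputed.
            active = [pair for pair in active if sim[pair] > 0.0]
    return sim


# u links to a and b; c is isolated.
example_in_nb = {"u": [], "a": ["u"], "b": ["u"], "c": []}
sw_scores = sw_simrank_sketch(example_in_nb)
```

The saving comes from the shrinking `active` list: after the warm-up, each iteration touches only pairs that have already shown a nonzero score.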

Proceedings Article•DOI•

Mining Personal Image Collection for Social Group Suggestion

Jie Yu¹, Xin Jin², Jiawei Han², Jiebo Luo¹•Institutions (2)

University of Rochester¹, University of Illinois at Urbana–Champaign²

06 Dec 2009

TL;DR: Designs a new approach that clusters popular groups into categories by analyzing the similarity of groups via SimRank, integrating both visual content and its annotations to understand the events or topics depicted in the images.

Abstract: Popular photo-sharing sites have attracted millions of people and helped construct massive social networks in cyberspace. Unlike in traditional social relationships, users actively interact within groups where common interests are shared on certain types of events or topics captured by photos and videos. Contributing images to a group would greatly promote the interactions between users and expand their social networks. In this work, we intend to produce accurate predictions of suitable photo-sharing groups from a user's images by mining images both on the Web and in the user's personal collection. To this end, we designed a new approach to cluster popular groups into categories by analyzing the similarity of groups via SimRank. Both visual content and its annotations are integrated to understand the events or topics depicted in the images. Experiments on real user images demonstrate the feasibility of the proposed approach.

10 citations

Book Chapter•DOI•

An Adaptive Method for the Efficient Similarity Calculation

Yuanzhe Cai¹, Hongyan Liu², Jun He¹, Xiaoyong Du¹, Xu Jia¹•Institutions (2)

Renmin University of China¹, Tsinghua University²

16 Mar 2009

TL;DR: This paper finds that different object pairs converge at different rates when SimRank is used to compute object similarity, and proposes an adaptive method called Adaptive-SimRank to speed up similarity calculation.

Abstract: SimRank is a well-known algorithm for similarity calculation based on object-to-object relationships. However, it suffers from high computation cost. In this paper, we find that the convergence behavior of different object pairs is different when we use SimRank to compute the similarity of objects. Many similarity scores converge fast, while others need more time before convergence. Based on this observation, we propose an adaptive method called Adaptive-SimRank to speed up similarity calculation. Using this method, we don't need to recalculate those converged pairs' similarity. The experiments conducted on web datasets and a synthetic dataset show that our new method can reduce the running time by nearly 35%.

6 citations
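
The Adaptive-SimRank abstract above describes skipping pairs whose scores have already converged. A minimal Python sketch of that idea on plain SimRank follows. The convergence test (change below `eps` after at least `min_rounds` rounds), all parameter names, and the function name are assumptions made for illustration. The listing does not give the paper's actual criterion.

```python
def adaptive_simrank_sketch(in_nb, c=0.8, eps=1e-4, min_rounds=2, iterations=20):
    """SimRank iteration that freezes pairs once their score stops changing.

    in_nb maps each node to the list of its in-neighbors.
    """
    nodes = list(in_nb)
    sim = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    active = {(a, b) for a in nodes for b in nodes if a != b}

    for round_no in range(1, iterations + 1):
        if not active:
            break  # every pair has been frozen
        new = dict(sim)
        converged = set()
        for a, b in active:
            ia, ib = in_nb[a], in_nb[b]
            score = 0.0
            if ia and ib:
                total = sum(sim[(x, y)] for x in ia for y in ib)
                score = c * total / (len(ia) * len(ib))
            new[(a, b)] = score
            # Heuristic (assumed): a pair is done once its change is tiny,
            # but only after a few rounds so that scores which start at
            # zero get a chance to become nonzero.
            if round_no >= min_rounds and abs(score - sim[(a, b)]) < eps:
                converged.add((a, b))
        sim = new
        active -= converged
    return sim


# u links to both a and b, so a and b share their only in-neighbor.
adaptive_in_nb = {"u": [], "a": ["u"], "b": ["u"]}
adaptive_scores = adaptive_simrank_sketch(adaptive_in_nb)
```

Freezing is a trade-off: a frozen pair is never revisited, so a small `eps` and a larger `min_rounds` favor accuracy over speed.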

Journal Article•

S-SimRank: Combining Content and Link Information to Cluster Papers Effectively and Efficiently

He Jun

01 Jan 2009 - Journal of Frontiers of Computer Science and Technology

TL;DR: Experimental results on the ACM data set show that S-SimRank outperforms other algorithms, and a mathematical proof of the convergence of S-SimRank is given.

Abstract: Content analysis and link analysis among documents are two common methods in recommender systems. Compared with content analysis, link analysis can discover more implicit relationships between documents. At the same time, because of noise, these methods cannot produce precise results. To solve this problem, a new algorithm, S-SimRank (Star-SimRank), is proposed to effectively combine content analysis and link analysis and improve the accuracy of similarity calculation. The experimental results for the ACM data set show that S-SimRank outperforms other algorithms. Finally, a mathematical proof of the convergence of S-SimRank is given.

5 citations