Topic

SimRank

About: SimRank is a research topic. Over its lifetime, 250 publications have been published within this topic, receiving 21,163 citations.

Papers

Proceedings Article

Scaling link-based similarity search

Dániel Fogaras¹, Balázs Rácz²

Budapest University of Technology and Economics¹, Hungarian Academy of Sciences²

10 May 2005

TL;DR: The experimental results suggest that the hyperlink structure of vertices within four to five steps provides more adequate information for similarity search than single-step neighborhoods.

Abstract: To exploit the similarity information hidden in the hyperlink structure of the web, this paper introduces algorithms scalable to graphs with billions of vertices on a distributed architecture. The similarity of multi-step neighborhoods of vertices is numerically evaluated by similarity functions including SimRank [20], a recursive refinement of cocitation; PSimRank, a novel variant with better theoretical characteristics; and the Jaccard coefficient, extended to multi-step neighborhoods. Our methods are presented in a general framework of Monte Carlo similarity search algorithms that precompute an index database of random fingerprints, and at query time, similarities are estimated from the fingerprints. The performance and quality of the methods were tested on the Stanford Webbase [19] graph of 80M pages by comparing our scores to similarities extracted from the ODP directory [26]. Our experimental results suggest that the hyperlink structure of vertices within four to five steps provides more adequate information for similarity search than single-step neighborhoods.
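The Monte Carlo idea in this abstract can be illustrated with a minimal sketch. It relies on the random-surfer reading of SimRank: the score of (u, v) is the expected value of c^τ, where τ is the first step at which two reverse random walks started at u and v meet. The function name `mc_simrank`, the `in_nbrs` dictionary of in-neighbors, and the default parameters are assumptions made for illustration; the paper's algorithms use precomputed, coupled fingerprints on a distributed index, which this sketch does not reproduce.

```python
import random


def mc_simrank(in_nbrs, u, v, c=0.6, walk_len=10, samples=1000, seed=0):
    """Estimate SimRank(u, v) by sampling pairs of reverse random walks.

    in_nbrs maps each node to the list of nodes that link to it.
    Walks are truncated after walk_len steps, so the estimate is
    biased slightly downward.
    """
    if u == v:
        return 1.0
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        a, b = u, v
        for step in range(1, walk_len + 1):
            ia, ib = in_nbrs.get(a, []), in_nbrs.get(b, [])
            if not ia or not ib:
                break  # a walk got stuck: this sample contributes 0
            # Each walk steps independently to a random in-neighbor.
            a, b = rng.choice(ia), rng.choice(ib)
            if a == b:
                total += c ** step  # first meeting at this step
                break
    return total / samples


# Toy graph (hypothetical): page "a" links to both "b" and "c".
toy_in_nbrs = {"b": ["a"], "c": ["a"]}
score_bc = mc_simrank(toy_in_nbrs, "b", "c", c=0.8)
```

On the toy graph both walks reach "a" after one step, so every sample contributes c and the estimate equals the exact score. Larger `walk_len` values correspond to the multi-step neighborhoods the abstract refers to.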

201 citations

Journal Article

Simrank++: query rewriting through link analysis of the click graph

Ioannis Antonellis¹, Hector Garcia Molina¹, Chi-Chao Chang²

Stanford University¹, Yahoo!²

01 Aug 2008

TL;DR: It is argued that Simrank fails to properly identify query similarities in the authors' application, and two enhanced versions of Simrank are presented: one that exploits weights on click graph edges and another that exploits "evidence."

Abstract: We focus on the problem of query rewriting for sponsored search. We base rewrites on a historical click graph that records the ads that have been clicked on in response to past user queries. Given a query q, we first consider Simrank [7] as a way to identify queries similar to q, i.e., queries whose ads a user may be interested in. We argue that Simrank fails to properly identify query similarities in our application, and we present two enhanced versions of Simrank: one that exploits weights on click graph edges and another that exploits "evidence." We experimentally evaluate our new schemes against Simrank, using actual click graphs and queries from Yahoo!, and using a variety of metrics. Our results show that the enhanced methods can yield more and better query rewrites.
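The "evidence" idea can be sketched as a factor that grows with the number of ads two queries have in common. The geometric-sum form below is the one usually attributed to the Simrank++ paper, but the abstract does not state it, so the formula, the function name `evidence`, and the toy click graph are all assumptions made for illustration.

```python
def evidence(click_graph, q1, q2):
    """Evidence factor in [0, 1) based on how many ads both queries led to.

    click_graph maps each query to the set of ads clicked for it.
    """
    common = len(click_graph.get(q1, set()) & click_graph.get(q2, set()))
    # Assumed form: sum_{i=1..n} 2**-i, written in closed form as 1 - 2**-n.
    return 1.0 - 0.5 ** common


# Hypothetical click graph: query -> ads clicked in response to it.
clicks = {
    "camera": {"hp.com", "bestbuy.com"},
    "digital camera": {"hp.com", "bestbuy.com"},
    "pc": {"hp.com"},
    "flower": {"teleflora.com"},
}
```

Under this assumption, a pair of queries sharing two ads gets a larger factor than a pair sharing one, and the factor would then scale the plain Simrank score of the pair.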

188 citations

Proceedings Article

Fast computation of SimRank for static and dynamic information networks

Cuiping Li¹, Jiawei Han², Guoming He¹, Xin Jin², Yizhou Sun², Yintao Yu², Tianyi Wu²

Renmin University of China¹, University of Illinois at Urbana–Champaign²

22 Mar 2010

TL;DR: A family of novel approximate SimRank computation algorithms for static and dynamic information networks is developed, and their corresponding theoretical justification and analysis are given.

Abstract: Information networks are ubiquitous in many applications and analysis on such networks has attracted significant attention in the academic communities. One of the most important aspects of information network analysis is to measure similarity between nodes in a network. SimRank is a simple and influential measure of this kind, based on a solid theoretical "random surfer" model. Existing work computes SimRank similarity scores in an iterative mode. We argue that the iterative method can be infeasible and inefficient when, as in many real-world scenarios, the networks change dynamically and frequently. We envision a non-iterative method to bridge the gap. It allows users not only to update the similarity scores incrementally, but also to derive similarity scores for an arbitrary subset of nodes. To enable the non-iterative computation, we propose to rewrite the SimRank equation into a non-iterative form by using the Kronecker product and vectorization operators. Based on this, we develop a family of novel approximate SimRank computation algorithms for static and dynamic information networks, and give their corresponding theoretical justification and analysis. The non-iterative method supports efficient processing of various node analyses including similarity tracking and centrality tracking on evolving information networks. The effectiveness and efficiency of our proposed methods are evaluated on synthetic and real data sets.
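The Kronecker-product rewriting can be illustrated on a tiny graph. The sketch below assumes the matrix form S = c·P·S·Pᵀ + (1 − c)·I, where P[a][i] = 1/|I(a)| if node i links to node a; this form is a common approximation of SimRank and is not spelled out in the abstract. Vectorizing both sides gives the linear system (I − c·(P ⊗ P))·vec(S) = (1 − c)·vec(I), which is solved directly. The helper names `kron`, `solve`, and `simrank_closed_form` are hypothetical, and the dense solve is only workable for very small graphs; the paper's algorithms are approximate and far more economical.

```python
def kron(a, b):
    """Kronecker product of two square matrices given as lists of lists."""
    n, m = len(a), len(b)
    size = n * m
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(size)]
            for i in range(size)]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def simrank_closed_form(p, c=0.6):
    """Solve (I - c * kron(P, P)) vec(S) = (1 - c) vec(I) and reshape to S."""
    n = len(p)
    k = kron(p, p)
    size = n * n
    a = [[(1.0 if i == j else 0.0) - c * k[i][j] for j in range(size)]
         for i in range(size)]
    rhs = [(1.0 - c) if i // n == i % n else 0.0 for i in range(size)]
    x = solve(a, rhs)
    return [x[i * n:(i + 1) * n] for i in range(n)]


# Toy graph (hypothetical), node order a, b, c: "a" links to "b" and "c".
P = [[0.0, 0.0, 0.0],
     [1.0, 0.0, 0.0],
     [1.0, 0.0, 0.0]]
S = simrank_closed_form(P, c=0.6)
```

In this assumed form the diagonal is not pinned to 1, so the scores are scaled relative to the original recursive definition: here S[1][2], the score of "b" and "c", comes out as c·(1 − c) = 0.24.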

171 citations

Journal Article

Accuracy estimate and optimization techniques for SimRank computation

Dmitry Lizorkin¹, Pavel Velikhov¹, Maxim Grinev¹, Denis Turdakov¹

Russian Academy of Sciences¹

01 Aug 2008

TL;DR: This technique provides a way to find out the number of iterations required to achieve a desired accuracy when computing SimRank iteratively, and introduces a threshold sieving heuristic and its accuracy estimation that further improves the efficiency of the method.

Abstract: The measure of similarity between objects is a very useful tool in many areas of computer science, including information retrieval. SimRank is a simple and intuitive measure of this kind, based on a graph-theoretic model. SimRank is typically computed iteratively, in the spirit of PageRank. However, existing work on SimRank lacks accuracy estimation of iterative computation and has discouraging time complexity. In this paper we present a technique to estimate the accuracy of computing SimRank iteratively. This technique provides a way to find out the number of iterations required to achieve a desired accuracy when computing SimRank. We also present optimization techniques that improve the computational complexity of the iterative algorithm from O(n^4) to O(n^3) in the worst case. We also introduce a threshold sieving heuristic and its accuracy estimation that further improves the efficiency of the method. As a practical illustration of our techniques we computed SimRank scores on a subset of the English Wikipedia corpus, consisting of the complete set of articles and category links.
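The link between accuracy and iteration count can be sketched as follows. The sketch assumes the error bound s(a, b) − s_k(a, b) ≤ C^(k+1) that is commonly cited from this line of work; the abstract itself does not state the bound, so treat it as an assumption. The function names `iterations_for` and `simrank_naive` are hypothetical, and the iteration shown is the plain, unoptimized one, not the O(n^3) variant or the threshold-sieving heuristic described in the paper.

```python
import math


def iterations_for(eps, c):
    """Smallest k with c**(k+1) <= eps, under the assumed error bound."""
    return max(0, math.ceil(math.log(eps) / math.log(c)) - 1)


def simrank_naive(in_nbrs, nodes, c=0.6, iters=5):
    """Plain iterative SimRank over all node pairs (no optimizations)."""
    s = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(iters):
        nxt = {}
        for a in nodes:
            for b in nodes:
                if a == b:
                    nxt[(a, b)] = 1.0
                    continue
                ia, ib = in_nbrs.get(a, []), in_nbrs.get(b, [])
                if not ia or not ib:
                    nxt[(a, b)] = 0.0  # no in-links: similarity is 0
                    continue
                # Average similarity over all pairs of in-neighbors.
                total = sum(s[(x, y)] for x in ia for y in ib)
                nxt[(a, b)] = c * total / (len(ia) * len(ib))
        s = nxt
    return s


# Toy graph (hypothetical): "a" links to both "b" and "c".
k = iterations_for(1e-3, 0.6)
scores = simrank_naive({"b": ["a"], "c": ["a"]}, ["a", "b", "c"], c=0.6, iters=k)
```

For a target accuracy of 0.001 with C = 0.6, the bound gives k = 13, since 0.6^14 is about 0.00078 while 0.6^13 is about 0.0013.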

168 citations

Proceedings Article

SimFusion: measuring similarity using unified relationship matrix

Wensi Xi¹, Edward A. Fox¹, Weiguo Fan¹, Benyu Zhang², Zheng Chen², Jun Yan³, Dong Zhuang⁴

Virginia Tech¹, Microsoft², Peking University³, Beijing Institute of Technology⁴

15 Aug 2005

TL;DR: It is claimed that iterative computations over the URM can help overcome the data sparseness problem and detect latent relationships among heterogeneous data objects, and can thus improve the quality of information applications that require combination of information from heterogeneous sources.

Abstract: In this paper we use a Unified Relationship Matrix (URM) to represent a set of heterogeneous data objects (e.g., web pages, queries) and their interrelationships (e.g., hyperlinks, user click-through sequences). We claim that iterative computations over the URM can help overcome the data sparseness problem and detect latent relationships among heterogeneous data objects, and can thus improve the quality of information applications that require combination of information from heterogeneous sources. To support our claim, we present a unified similarity-calculating algorithm, SimFusion. By iteratively computing over the URM, SimFusion can effectively integrate relationships from heterogeneous sources when measuring the similarity of two data objects. Experiments based on a web search engine query log and a web page collection demonstrate that SimFusion can improve similarity measurement of web objects over both traditional content-based algorithms and the cutting-edge SimRank algorithm.

129 citations

Performance

Metrics

Papers: 250
Citations: 22,828

No. of papers in the topic in previous years
Year	Papers
2021	15
2020	26
2019	16
2018	17
2017	19
2016	16
