Top 2 papers published in the topic of SimRank in 2004

Book Chapter

A scalable randomized method to compute link-based similarity rank on the web graph

Dániel Fogaras, Balázs Rácz

14 Mar 2004

TL;DR: The authors propose scalable algorithms for computing SimRank scores, which express the contextual similarity of pages based on the hyperlink structure. The methods scale well to large repositories and meet strict requirements on computational complexity.

Abstract: Several iterative hyperlink-based similarity measures were published to express the similarity of web pages. However, it usually seems hopeless to evaluate complex similarity functions over large repositories containing hundreds of millions of pages. We introduce scalable algorithms computing SimRank scores, which express the contextual similarities of pages based on the hyperlink structure. The proposed methods scale well to large repositories, fulfilling strict requirements about computational complexity. The algorithms were tested on a set of ten million pages, but parallelization techniques make it possible to compute the SimRank scores even for the entire web with over 4 billion pages. The key idea is that randomized Monte Carlo methods combined with indexing techniques yield a scalable approximation of SimRank.
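
The Monte Carlo idea in this abstract can be illustrated with a minimal sketch. It relies on the standard random-walk reading of SimRank: the score of two pages is the expected value of c^τ, where τ is the first step at which two walks that follow in-links backwards from the two pages meet. The function name, the parameters, and the plain dictionary graph are assumptions made for this illustration, and the sketch omits the indexing and parallelization techniques the paper relies on for scale.

```python
import random

def simrank_monte_carlo(in_links, u, v, decay=0.6, walk_length=10,
                        num_samples=2000, seed=0):
    """Estimate SimRank(u, v) by sampling pairs of backward random walks.

    in_links maps each node to a list of its in-neighbors (hypothetical
    input format chosen for this sketch).
    """
    if u == v:
        return 1.0  # a node is maximally similar to itself by definition
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        a, b = u, v
        for step in range(1, walk_length + 1):
            preds_a = in_links.get(a, [])
            preds_b = in_links.get(b, [])
            if not preds_a or not preds_b:
                break  # one walk is stuck, so this pair can no longer meet
            # Both walks step backwards along a uniformly chosen in-link.
            a = rng.choice(preds_a)
            b = rng.choice(preds_b)
            if a == b:
                # First meeting at this step contributes decay ** step.
                total += decay ** step
                break
    return total / num_samples
```

Truncating walks at `walk_length` steps slightly underestimates the score, since meetings after that point are ignored; the omitted contribution is at most `decay ** walk_length`.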

15 citations

SimFusion: A Unified Similarity Measurement Algorithm for Multi-Type Interrelated Web Objects

Wensi Xi, Benyu Zhang, Edward A. Fox

01 Jan 2004

TL;DR: By iteratively computing over the URM, the SimFusion algorithm can effectively integrate relationships from heterogeneous sources when measuring the similarity of two web objects, and it can significantly improve similarity measurement of web objects over both traditional content-based similarity-calculating algorithms and the cutting-edge SimRank algorithm.

Abstract: In this paper, we use a Unified Relationship Matrix (URM) to represent a set of heterogeneous web objects (e.g., web pages, queries) and their interrelationships (e.g., hyperlink, user click-through relationships). We claim that iterative computations over the URM can help overcome the data sparseness problem (a common situation on the Web) and detect latent relationships among heterogeneous web objects, and can thus improve the quality of various information applications that require the combination of information from heterogeneous sources. To support our claim, we further propose a unified similarity-calculating algorithm, the SimFusion algorithm. By iteratively computing over the URM, the SimFusion algorithm can effectively integrate relationships from heterogeneous sources when measuring the similarity of two web objects. Experiments based on a real search engine query log and a large real web page collection demonstrate that the SimFusion algorithm can significantly improve similarity measurement of web objects over both traditional content-based similarity-calculating algorithms and the cutting-edge SimRank algorithm.
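
The abstract does not state the update rule, so the following is only a sketch of what "iteratively computing over the URM" could look like. It assumes a row-normalized URM `L` and a similarity matrix that starts as the identity and is repeatedly updated as S ← L·S·Lᵀ. The function names, the iteration count, and the plain nested-list matrices are all assumptions for illustration, not the authors' implementation.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(m):
    """Transpose a matrix given as nested lists."""
    return [list(col) for col in zip(*m)]

def simfusion_sketch(urm, iterations=5):
    """Iterate S <- L * S * L^T starting from the identity (assumed rule).

    urm is assumed to be a square, row-normalized relationship matrix
    whose entry [i][j] weights the relationship from object i to object j.
    """
    n = len(urm)
    # Start with each object similar only to itself.
    sim = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    urm_t = transpose(urm)
    for _ in range(iterations):
        # Objects become similar when they relate to similar objects.
        sim = matmul(matmul(urm, sim), urm_t)
    return sim
```

Under this assumed rule, two objects with no direct relationship can still obtain a nonzero score after a few iterations if they relate to overlapping sets of objects, which is one way to read the abstract's point about sparseness and latent relationships.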

3 citations
