Topic

SimRank

About: SimRank is a research topic. Over its lifetime, 250 publications have appeared within this topic, receiving 21,163 citations.

Papers

Journal Article

SimRank*: effective and scalable pairwise similarity search based on graph topology

Weiren Yu¹, Xuemin Lin², Wenjie Zhang², Jian Pei³, Julie A. McCann⁴

Aston University¹, University of New South Wales², Simon Fraser University³, Imperial College London⁴

01 Jun 2019

TL;DR: This paper proposes an effective and scalable similarity model, SimRank*, which resolves the “zero-similarity” problem that exists in Jeh and Widom’s SimRank model. The authors empirically verify the richer semantics of SimRank* and validate its computational efficiency and scalability on large graphs with billions of edges.

Abstract: Given a graph, how can we quantify similarity between two nodes in an effective and scalable way? SimRank is an attractive measure of pairwise similarity based on graph topologies. Its underpinning philosophy that “two nodes are similar if they are pointed to (have incoming edges) from similar nodes” can be regarded as an aggregation of similarities based on incoming paths. Despite its popularity in various applications (e.g., web search and social networks), SimRank has an undesirable trait, i.e., “zero-similarity”: it accommodates only the paths of equal length from a common “center” node, whereas a large portion of other paths are fully ignored. In this paper, we propose an effective and scalable similarity model, SimRank*, to remedy this problem. (1) We first provide a sufficient and necessary condition of the “zero-similarity” problem that exists in Jeh and Widom’s SimRank model, Li et al.’s SimRank model, Random Walk with Restart (RWR), and ASCOS++. (2) We next present our treatment, SimRank*, which can resolve this issue while inheriting the merit of the simple SimRank philosophy. (3) We reduce the series form of SimRank* to a closed form, which looks simpler than SimRank but which enriches semantics without suffering from increased computational overhead. This leads to an iterative form of SimRank*, which requires $O(Knm)$ time and $O(n^2)$ memory for computing all $n^2$ pairs of similarities on a graph of $n$ nodes and $m$ edges for $K$ iterations. (4) To improve the computational time of SimRank* further, we leverage a novel clustering strategy via edge concentration. Due to its NP-hardness, we devise an efficient heuristic to speed up all-pairs SimRank* computation to $O(Kn\tilde{m})$ time, where $\tilde{m}$ is generally much smaller than $m$. (5) To scale SimRank* on billion-edge graphs, we propose two memory-efficient single-source algorithms, i.e., ss-gSR* for geometric SimRank*, and ss-eSR* for exponential SimRank*, which can retrieve similarities between all $n$ nodes and a given query on an as-needed basis. This significantly reduces the $O(n^2)$ memory of all-pairs search to either $O(Kn + \tilde{m})$ for geometric SimRank*, or $O(n + \tilde{m})$ for exponential SimRank*, without any loss of accuracy, where $\tilde{m} \ll n^2$. (6) We also compare SimRank* with another remedy of SimRank that adds self-loops on each node and demonstrate that SimRank* is more effective. (7) Using real and synthetic datasets, we empirically verify the richer semantics of SimRank*, and validate its high computational efficiency and scalability on large graphs with billions of edges.
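
The “zero-similarity” trait described in the abstract can be reproduced with a minimal sketch of the original Jeh and Widom SimRank recurrence, in which s(a, a) = 1 and s(a, b) is C times the average similarity over all pairs of in-neighbors of a and b. This is a naive all-pairs iteration for illustration only; it is not SimRank* and not any of the paper's optimized algorithms. The function name `simrank`, the toy graph, and the values C = 0.6 and K = 10 are assumptions made for the example.

```python
from itertools import product


def simrank(in_neighbors, C=0.6, K=10):
    """Naive all-pairs SimRank by fixed-point iteration.

    in_neighbors maps every node to the list of nodes pointing to it.
    C is the decay factor and K the number of iterations (both assumed).
    """
    nodes = list(in_neighbors)
    # Start from the identity: a node is fully similar to itself only.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, repeat=2)}
    for _ in range(K):
        new = {}
        for a, b in product(nodes, repeat=2):
            if a == b:
                new[(a, b)] = 1.0
                continue
            ia, ib = in_neighbors[a], in_neighbors[b]
            if not ia or not ib:
                # A node without incoming edges has similarity 0 to others.
                new[(a, b)] = 0.0
                continue
            total = sum(sim[(i, j)] for i in ia for j in ib)
            new[(a, b)] = C * total / (len(ia) * len(ib))
        sim = new
    return sim


# Hypothetical toy graph with edges h -> a, h -> c, a -> b.
graph = {"h": [], "a": ["h"], "c": ["h"], "b": ["a"]}
scores = simrank(graph)
print(scores[("a", "c")])  # paths of equal length (1 and 1) from the center h
print(scores[("a", "b")])  # paths of unequal length (1 and 2) from h
```

In this toy graph, a and c are both one step from h and receive a score of C, while a and b are reached from h by paths of length 1 and 2 and receive a score of 0, even though they share the same source.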

31 citations

Journal Article

Simrank: Rapid and sensitive general-purpose k-mer search tool

Todd Z. DeSantis¹, Keith Keller¹, Ulas Karaoz¹, Alexander V. Alekseyenko², Navjeet Singh¹, Eoin L. Brodie¹, Zhiheng Pei², Gary L. Andersen¹, Niels Larsen³

Lawrence Berkeley National Laboratory¹, New York University², Aarhus University³

27 Apr 2011-BMC Ecology

TL;DR: Simrank is a stand-alone k-mer search utility that lets users rapidly identify the database strings most similar to query strings. Embedded k-mer searches of this kind have benefited applications for sequence database partitioning, guide tree estimation, molecular classification and alignment acceleration.

Abstract: Terabyte-scale collections of string-encoded data are expected from consortia efforts such as the Human Microbiome Project (http://nihroadmap.nih.gov/hmp). Intra- and inter-project data similarity searches are enabled by rapid k-mer matching strategies. Software applications for sequence database partitioning, guide tree estimation, molecular classification and alignment acceleration have benefited from embedded k-mer searches as sub-routines. However, a rapid, general-purpose, open-source, flexible, stand-alone k-mer tool has not been available. Here we present a stand-alone utility, Simrank, which allows users to rapidly identify the database strings most similar to query strings. Performance testing of Simrank and related tools against DNA, RNA, protein and human-language data found Simrank to be 10X to 928X faster, depending on the dataset. Simrank provides molecular ecologists with a high-throughput, open-source choice for comparing large sequence sets to find similarity.
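
The general idea of k-mer matching mentioned in the abstract can be sketched as follows: split the query and each database string into overlapping substrings of length k, then rank database strings by how many distinct query k-mers they contain. This is only an illustration of the technique, not the Simrank tool's implementation or scoring; the function names, the score (fraction of query k-mers shared), k = 3, and the sequences are all assumptions.

```python
def kmers(s, k):
    """Return the set of distinct overlapping substrings of length k."""
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def rank_by_shared_kmers(query, database, k=3, top=3):
    """Rank database strings by the fraction of query k-mers they contain.

    database maps a name to a string. The score is an assumed, simple
    measure chosen for illustration.
    """
    q = kmers(query, k)
    scored = []
    for name, seq in database.items():
        shared = len(q & kmers(seq, k))
        score = shared / len(q) if q else 0.0
        scored.append((score, name))
    # Highest score first; ties are broken alphabetically by name.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return scored[:top]


# Hypothetical database and query.
database = {"seq1": "ACGTACGT", "seq2": "TTTTGGGG", "seq3": "ACGTTTTT"}
result = rank_by_shared_kmers("ACGTAC", database)
print(result)
```

Here the query has four distinct 3-mers; `seq1` contains all of them, `seq3` contains two, and `seq2` contains none, so the ranking is `seq1`, `seq3`, `seq2`.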

31 citations

Proceedings Article

Simrank++: query rewriting through link analysis of the clickgraph (poster)

Ioannis Antonellis¹, Hector Garcia-Molina¹, Chi-Chao Chang²

Stanford University¹, Yahoo!²

21 Apr 2008

TL;DR: It is argued that Simrank fails to properly identify query similarities in this application, and two enhanced versions of Simrank are presented: one that exploits weights on click graph edges and another that exploits evidence.

Abstract: We focus on the problem of query rewriting for sponsored search. We base rewrites on a historical click graph that records the ads that have been clicked on in response to past user queries. Given a query q, we first consider Simrank [2] as a way to identify queries similar to q, i.e., queries whose ads a user may be interested in. We argue that Simrank fails to properly identify query similarities in our application, and we present two enhanced versions of Simrank: one that exploits weights on click graph edges and another that exploits “evidence.” We experimentally evaluate our new schemes against Simrank, using actual click graphs and queries from Yahoo!, and using a variety of metrics. Our results show that the enhanced methods can yield more and better query rewrites.

31 citations

Journal Article

Scalable and axiomatic ranking of network role similarity

Ruoming Jin¹, Victor E. Lee², Longjie Li³

Kent State University¹, John Carroll University², Lanzhou University³

01 Feb 2014-ACM Transactions on Knowledge Discovery From Data

TL;DR: RoleSim is presented, a new similarity metric that satisfies several axiomatic properties necessary for a role similarity measure and can be computed with a simple iterative algorithm. Its interpretative power is demonstrated on both synthetic and real datasets.

Abstract: A key task in analyzing social networks and other complex networks is role analysis: describing and categorizing nodes according to how they interact with other nodes. Two nodes have the same role if they interact with equivalent sets of neighbors. The most fundamental role equivalence is automorphic equivalence. Unfortunately, the fastest algorithms known for graph automorphism are nonpolynomial. Moreover, since exact equivalence is rare, a more meaningful task is measuring the role similarity between any two nodes. This task is closely related to the structural or link-based similarity problem that SimRank addresses. However, SimRank and other existing similarity measures are not sufficient because they are not guaranteed to recognize automorphically or structurally equivalent nodes. This article makes two contributions. First, we present and justify several axiomatic properties necessary for a role similarity measure or metric. Second, we present RoleSim, a new similarity metric that satisfies these axioms and can be computed with a simple iterative algorithm. We rigorously prove that RoleSim satisfies all of these axiomatic properties. We also introduce Iceberg RoleSim, a scalable algorithm that discovers all pairs with RoleSim scores above a user-defined threshold θ. We demonstrate the interpretative power of RoleSim on both synthetic and real datasets.

29 citations

Journal Article

Efficient top-k simrank-based similarity join

Wenbo Tao¹, Minghe Yu¹, Guoliang Li¹

Tsinghua University¹

01 Nov 2014

TL;DR: This paper encodes each node as a vector by summarizing its neighbors, transforms the calculation of the SimRank similarity between two nodes into computing the dot product between the corresponding vectors, and devises an efficient two-step framework to compute the top-k similar pairs using the vectors.

Abstract: SimRank is a popular and widely adopted similarity measure for evaluating the similarity between nodes in a graph. Computing the SimRank similarities for all pairs of nodes is time- and space-consuming, especially for large graphs. In real-world applications, users are only interested in the most similar pairs. To address this problem, in this paper we study the top-k SimRank-based similarity join problem, which finds the k most similar pairs of nodes with the largest SimRank similarities among all possible pairs. To the best of our knowledge, this is the first attempt to address this problem. We encode each node as a vector by summarizing its neighbors and transform the calculation of the SimRank similarity between two nodes into computing the dot product between the corresponding vectors. We devise an efficient two-step framework to compute the top-k similar pairs using the vectors. For large graphs, exact algorithms cannot meet the high-performance requirement, so we also devise an approximate algorithm that can efficiently identify the top-k similar pairs under a user-specified accuracy requirement. Experiments on both real and synthetic datasets show that our method achieves high performance and good scalability.
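
Once each node is represented by a vector, the top-k join reduces to finding the k pairs with the largest dot products. The sketch below shows only that final step in its simplest brute-force form, which enumerates every pair and is therefore exactly the cost the paper's two-step framework is meant to avoid. The vectors are placeholder values, not the paper's neighbor-summary encoding, and the function name `top_k_pairs` is an assumption.

```python
import heapq
from itertools import combinations


def dot(u, v):
    """Dot product of two equal-length sequences of numbers."""
    return sum(x * y for x, y in zip(u, v))


def top_k_pairs(vectors, k):
    """Return the k node pairs with the largest dot products.

    vectors maps a node name to its vector. Brute force over all pairs,
    for illustration only.
    """
    scored = (
        (dot(vectors[a], vectors[b]), a, b)
        for a, b in combinations(sorted(vectors), 2)
    )
    return heapq.nlargest(k, scored)


# Hypothetical node vectors (placeholders, not a real SimRank encoding).
vectors = {"u": [1.0, 0.0], "v": [0.8, 0.2], "w": [0.0, 1.0]}
pairs = top_k_pairs(vectors, 2)
print(pairs)
```

With these placeholder vectors the pair (u, v) scores highest, followed by (v, w); the pair (u, w) has a dot product of zero and is not returned for k = 2.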

29 citations

Metrics

Papers: 250
Citations: 22,828

No. of papers in the topic in previous years
Year	Papers
2021	15
2020	26
2019	16
2018	17
2017	19
2016	16
