Topic

SimRank

About: SimRank is a research topic. Over its lifetime, 250 publications have been published within this topic, receiving 21,163 citations.


Papers
Journal Article
TL;DR: A Monte Carlo based method, UniWalk, that enables fast top-$k$ SimRank computation over large undirected graphs, outperforms state-of-the-art methods by orders of magnitude, and is extended to existing distributed graph processing frameworks to improve its scalability.
Abstract: SimRank is an important measure of vertex-pair similarity according to the structure of graphs. Although progress has been achieved, existing methods still struggle to handle large graphs. Besides huge index construction and maintenance costs, existing methods may require considerable search space and time overheads in the online SimRank query. In this paper, we design a Monte Carlo based method, UniWalk, to enable fast top-$k$ SimRank computation over large undirected graphs. UniWalk directly locates the top-$k$ similar vertices for any single source vertex $u$ via $R$ sampling paths originating from $u$, which avoids selecting a candidate vertex set $\mathcal{C}$ and the subsequent $O(|\mathcal{C}|R)$ bidirectional sampling paths. We also devise a path enumeration strategy to improve the SimRank precision by using path probabilities instead of path frequencies when sampling, a space-efficient method to reduce intermediate results, and a path-sharing strategy to lower the redundant path sampling cost for multiple source vertices. Furthermore, we extend UniWalk to existing distributed graph processing frameworks to improve its scalability. We conduct extensive experiments to illustrate that UniWalk has high scalability and outperforms the state-of-the-art methods by orders of magnitude.
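The sampling idea behind Monte Carlo SimRank methods can be illustrated with a minimal sketch. The code below is not UniWalk; it assumes the classic paired-random-walk estimator, in which s(u, v) is approximated by the average of c^tau over sampled walk pairs, where tau is the first step at which walks started at u and v meet. The function name `mc_simrank`, the decay factor `c = 0.6`, and the walk limits are illustrative assumptions, not values from the paper.

```python
import random


def mc_simrank(adj, u, v, c=0.6, num_walks=1000, max_len=10, seed=0):
    """Estimate SimRank s(u, v) on an undirected graph by paired random walks.

    adj maps each vertex to a list of its neighbours. Each sample walks from
    u and from v in lock-step and contributes c**step at the first step where
    the two walks land on the same vertex. Truncating at max_len makes this a
    slight underestimate.
    """
    if u == v:
        return 1.0
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_walks):
        a, b = u, v
        for step in range(1, max_len + 1):
            if not adj[a] or not adj[b]:
                break  # a walk is stuck, so this pair never meets
            a = rng.choice(adj[a])
            b = rng.choice(adj[b])
            if a == b:
                total += c ** step
                break
    return total / num_walks


# Hypothetical toy graph: a star with centre 0 and leaves 1 and 2.
star = {0: [1, 2], 1: [0], 2: [0]}
estimate = mc_simrank(star, 1, 2)
```

On the star graph both walks are forced to the centre at the first step, so the estimate for the two leaves equals the decay factor `c` up to floating-point rounding.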

14 citations

Proceedings Article
31 Mar 2009
TL;DR: A graph-theoretic approach to identifying yet-unknown word translations that is based on the recursive SimRank algorithm and relies on the intuition that two words are similar if they establish similar grammatical relationships with similar other words.
Abstract: This paper presents a graph-theoretic approach to the identification of yet-unknown word translations. The proposed algorithm is based on the recursive SimRank algorithm and relies on the intuition that two words are similar if they establish similar grammatical relationships with similar other words. We also present a formulation of SimRank in matrix form and extensions for edge weights, edge labels and multiple graphs.
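For readers unfamiliar with a matrix formulation of SimRank, the following is a minimal sketch of one commonly used form, S = max(c * W^T S W, I), where W is the column-normalized adjacency matrix. It is an assumption that this matches the formulation in the paper, and the sketch does not cover the edge-weight, edge-label, or multi-graph extensions. The helper names and the decay factor `c = 0.8` are illustrative.

```python
def column_normalize(adj):
    """Return W with W[i][j] = 1/indeg(j) if edge i -> j exists, else 0."""
    n = len(adj)
    w = [[0.0] * n for _ in range(n)]
    for j in range(n):
        indeg = sum(adj[i][j] for i in range(n))
        if indeg:
            for i in range(n):
                w[i][j] = adj[i][j] / indeg
    return w


def matmul(a, b):
    """Plain dense matrix product for small list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def simrank_matrix(adj, c=0.8, iterations=10):
    """Iterate S <- c * W^T S W, then reset the diagonal to 1."""
    n = len(adj)
    w = column_normalize(adj)
    wt = [list(col) for col in zip(*w)]
    s = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(iterations):
        s = matmul(matmul(wt, s), w)
        s = [[1.0 if i == j else c * s[i][j] for j in range(n)]
             for i in range(n)]
    return s


# Hypothetical toy graph: node 0 points to nodes 1 and 2.
toy = [[0, 1, 1],
       [0, 0, 0],
       [0, 0, 0]]
scores = simrank_matrix(toy)
```

In the toy graph, nodes 1 and 2 share their only in-neighbor, so their score equals `c`, while node 0 has no in-neighbors and scores 0 against every other node.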

14 citations

Journal Article
TL;DR: Proposes CiteRank, a combination of a similarity ranking with a static ranking, and reports that it produces a better ranking than SimRank and StaticRank, which suggests that it can improve the effectiveness of research paper searching on social bookmarking websites.
Abstract: Search engines and social bookmarking systems are important tools for web resource discovery. The performance and capabilities of web search engines are vital. This paper proposes CiteRank, a combination of a similarity ranking with a static ranking. The similarity ranking measures the match between a query and a research paper index, while the static ranking, or query-independent ranking, measures the quality of a research paper. For this particular study, a group of factors comprising the number of groups containing the posted paper, the year of publication, the research paper's posted time, and the priority of a research paper was used to determine a static ranking score. NDCG was used as the evaluation metric. CiteRank was compared with SimRank and StaticRank. The results of the experiment showed that CiteRank produces a better ranking than the other methods. This implies that CiteRank can improve the effectiveness of research paper searching on social bookmarking websites.

13 citations

Journal Article
TL;DR: This paper proposes a novel method for attributed graph partitioning based on fuzzy clustering that devises a unified similarity measure using SimRank to construct the fuzzy similarity matrix of the attributed graph and deduces the corresponding fuzzy equivalent matrix using fuzzy set theory.
Abstract: Graph partitioning methods in data mining have been widely used to discover protein complexes in protein–protein interaction (PPI) networks. However, PPI networks with attributes need more effective attributed graph partitioning methods. Attributed graph partitioning aims to obtain high-quality partitions satisfying the requirement that nodes in the same partition not only connect to each other more densely but also share more similar attribute values. In this paper, we propose a novel method for attributed graph partitioning based on fuzzy clustering. This method first devises a unified similarity measure using SimRank to construct the fuzzy similarity matrix of the attributed graph, and can integrate structural and attribute similarities of nodes into a flexible weighted framework. Then it deduces the corresponding fuzzy equivalent matrix using fuzzy set theory. Finally, the partitioning result is obtained using a fuzzy clustering algorithm. We conduct experiments on several typical attributed graphs, which can also simulate PPI networks with attributes. The results show that our method is very effective at identifying high-quality partitions of attributed graphs and even performs better than some representative methods.
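The step from a fuzzy similarity matrix to a fuzzy equivalent matrix can be sketched with a standard construction from fuzzy set theory: the max-min transitive closure, followed by a lambda-cut to read off partitions. Whether the paper uses exactly this construction is an assumption, and the similarity values below are made up for illustration rather than computed with SimRank.

```python
def max_min_compose(a, b):
    """Max-min composition of two square fuzzy relations."""
    n = len(a)
    return [[max(min(a[i][k], b[k][j]) for k in range(n))
             for j in range(n)] for i in range(n)]


def transitive_closure(r):
    """Square a reflexive, symmetric fuzzy relation until it stops changing."""
    while True:
        squared = max_min_compose(r, r)
        if squared == r:
            return r
        r = squared


def lambda_cut_clusters(equiv, lam):
    """Group nodes whose equivalence degree is at least lam."""
    n = len(equiv)
    clusters, assigned = [], set()
    for i in range(n):
        if i in assigned:
            continue
        group = [j for j in range(n) if equiv[i][j] >= lam]
        clusters.append(group)
        assigned.update(group)
    return clusters


# Hypothetical fuzzy similarity matrix for three nodes.
similarity = [[1.0, 0.8, 0.3],
              [0.8, 1.0, 0.5],
              [0.3, 0.5, 1.0]]
equivalent = transitive_closure(similarity)
tight = lambda_cut_clusters(equivalent, 0.6)
loose = lambda_cut_clusters(equivalent, 0.5)
```

Lowering the threshold merges partitions: at 0.6 the first two nodes form one group and the third stands alone, while at 0.5 all three nodes fall into a single group.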

13 citations

Journal Article
01 Feb 2018
TL;DR: The efficient dynamical computation of all-pairs SimRanks on time-varying graphs is studied and it is shown that the SimRank update in response to every link update is expressible as a rank-one Sylvester matrix equation.
Abstract: SimRank is an appealing pair-wise similarity measure based on graph structure. It iteratively follows the intuition that two nodes are assessed as similar if they are pointed to by similar nodes. Many real graphs are large, and links are constantly subject to minor changes. In this article, we study the efficient dynamical computation of all-pairs SimRanks on time-varying graphs. Existing methods for the dynamical SimRank computation [e.g., LTSF (Shao et al. in PVLDB 8(8):838-849, 2015) and READS (Zhang et al. in PVLDB 10(5):601-612, 2017)] mainly focus on top-k search with respect to a given query. For all-pairs dynamical SimRank search, Li et al.'s approach (Li et al. in EDBT, 2010) was proposed. It first factorizes the graph via a singular value decomposition (SVD) and then incrementally maintains such a factorization in response to link updates at the expense of exactness. As a result, all pairs of SimRanks are updated approximately, yielding $O(r^4 n^2)$ time and $O(r^2 n^2)$ memory in a graph with $n$ nodes, where $r$ is the target rank of the low-rank SVD. Our solution to the dynamical computation of SimRank comprises five ingredients: (1) We first consider edge updates that do not accompany new node insertions. We show that the SimRank update $\Delta \mathbf{S}$ in response to every link update is expressible as a rank-one Sylvester matrix equation. This provides an incremental method requiring $O(Kn^2)$ time and $O(n^2)$ memory in the worst case to update $n^2$ pairs of similarities for $K$ iterations. (2) To speed up the computation further, we propose a lossless pruning strategy that captures the "affected areas" of $\Delta \mathbf{S}$ to eliminate unnecessary retrieval. This reduces the time of the incremental SimRank to $O(K(m+|\mathsf{AFF}|))$, where $m$ is the number of edges in the old graph, and $|\mathsf{AFF}| \ (\le n^2)$ is the size of "affected areas" in $\Delta \mathbf{S}$, and in practice, $|\mathsf{AFF}| \ll n^2$. (3) We also consider edge updates that accompany node insertions, and categorize them into three cases, according to which end of the inserted edge is a new node. For each case, we devise an efficient incremental algorithm that can support new node insertions and accurately update the affected SimRanks. (4) We next study batch updates for dynamical SimRank computation, and design an efficient batch incremental method that handles "similar sink edges" simultaneously and eliminates redundant edge updates. (5) To achieve linear memory, we devise a memory-efficient strategy that dynamically updates all pairs of SimRanks column by column in just $O(Kn+m)$ memory, without the need to store all $(n^2)$ pairs of old SimRank scores. Experimental studies on various datasets demonstrate that our solution substantially outperforms the existing incremental SimRank methods and is faster and more memory-efficient than its competitors on million-scale graphs.
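One way to see where a rank-one structure can come from is to look at how a single edge insertion changes the normalized adjacency matrix. The sketch below is not the paper's derivation; it assumes a column-normalized matrix W (an illustrative choice) and checks that inserting an edge i -> j changes only column j, so the change can be written as an outer product of one column vector with a unit vector. The function names and the toy graph are hypothetical.

```python
from fractions import Fraction


def column_normalize(adj):
    """W[i][j] = 1/indeg(j) if edge i -> j exists, else 0 (exact fractions)."""
    n = len(adj)
    w = [[Fraction(0)] * n for _ in range(n)]
    for j in range(n):
        indeg = sum(adj[i][j] for i in range(n))
        for i in range(n):
            if adj[i][j]:
                w[i][j] = Fraction(1, indeg)
    return w


def transition_delta(adj, i, j):
    """Return W_new - W_old after inserting edge i -> j (adj is not modified)."""
    new_adj = [row[:] for row in adj]
    new_adj[i][j] = 1
    old_w = column_normalize(adj)
    new_w = column_normalize(new_adj)
    n = len(adj)
    return [[new_w[r][c] - old_w[r][c] for c in range(n)] for r in range(n)]


# Hypothetical 4-node directed graph; adj[i][j] = 1 means an edge i -> j.
adj = [[0, 1, 1, 0],
       [0, 0, 1, 0],
       [0, 0, 0, 1],
       [1, 0, 0, 0]]
delta = transition_delta(adj, 3, 2)
changed_columns = {c for row in delta for c, x in enumerate(row) if x != 0}
```

Only the column of the edge's target node is affected, because the in-degree of every other node is unchanged by the insertion.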

12 citations


Network Information
Related Topics (5)
Web page: 50.3K papers, 975.1K citations, 77% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 77% related
Ontology (information science): 57K papers, 869.1K citations, 77% related
Scalability: 50.9K papers, 931.6K citations, 74% related
Tree (data structure): 44.9K papers, 749.6K citations, 74% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2021    15
2020    26
2019    16
2018    17
2017    19
2016    16