Showing papers on "SimRank" published in 2012

Open Access

Proceedings Article

Pei Lee¹, Laks V. S. Lakshmanan¹, Jeffrey Xu Yu²•Institutions (2)

University of British Columbia¹, The Chinese University of Hong Kong²

01 Apr 2012

TL;DR: An algorithmic framework called TopSim is proposed, based on transforming the top-k SimRank problem on a graph G into one of finding the top-k nodes with the highest authority on the product graph G × G. TopSim is further accelerated by merging similarity paths, yielding a more efficient algorithm called TopSim-SM.

Abstract: Searching for objects similar to a given query object in a network has numerous applications, including web search and collaborative filtering. We use the notion of structural similarity to capture the commonality of two objects in a network, e.g., if two nodes are referenced by the same node, they may be similar. Meeting-based methods including SimRank and P-Rank capture structural similarity very well. Deriving inspiration from PageRank, SimRank has gained popularity through its natural intuition and domain independence. Since it is computationally expensive, subsequent work has focused on optimizing and approximating the computation of SimRank. In this paper, we approach SimRank from a top-k querying perspective where, given a query node v, we are interested in finding the top-k nodes that have the highest SimRank score w.r.t. v. The only known approaches for answering such queries are either a naive algorithm that computes the similarity matrix for all node pairs, or computing the similarity vector by comparing the query node v with each other node independently, and then picking the top-k. Neither of these approaches can handle top-k structural similarity search efficiently on very large graphs consisting of millions of nodes. We propose an algorithmic framework called TopSim based on transforming the top-k SimRank problem on a graph G to one of finding the top-k nodes with highest authority on the product graph G × G. We further accelerate TopSim by merging similarity paths and develop a more efficient algorithm called TopSim-SM. Two heuristic algorithms, Trun-TopSim-SM and Prio-TopSim-SM, are also proposed to approximate TopSim-SM on scale-free graphs to trade accuracy for speed, based on truncated random walk and prioritizing propagation respectively. We analyze the accuracy and performance of the TopSim family of algorithms and report the results of a detailed experimental study.

80 citations
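
The naive baseline mentioned in the abstract above (compute the similarity matrix for all node pairs, then pick the top-k for the query node) can be sketched as follows. This is a minimal illustration of the standard iterative SimRank recurrence, not the TopSim algorithm; the function names, the decay factor `C=0.8`, the iteration count, and the toy graph are all assumptions made for the example.

```python
from itertools import product


def simrank(in_neighbors, C=0.8, iterations=10):
    """All-pairs iterative SimRank.

    in_neighbors maps every node to the list of nodes that point to it.
    C (decay factor) and iterations are illustrative defaults.
    """
    nodes = list(in_neighbors)
    # Start from the identity: a node is fully similar only to itself.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, repeat=2)}
    for _ in range(iterations):
        new = {}
        for a, b in product(nodes, repeat=2):
            ia, ib = in_neighbors[a], in_neighbors[b]
            if a == b:
                new[(a, b)] = 1.0
            elif not ia or not ib:
                new[(a, b)] = 0.0
            else:
                # Average similarity over all pairs of in-neighbors, damped by C.
                total = sum(sim[(c, d)] for c in ia for d in ib)
                new[(a, b)] = C * total / (len(ia) * len(ib))
        sim = new
    return sim


def top_k(in_neighbors, query, k, C=0.8, iterations=10):
    """Naive top-k query: build the full matrix, then rank one row."""
    sim = simrank(in_neighbors, C=C, iterations=iterations)
    scores = [(v, sim[(query, v)]) for v in in_neighbors if v != query]
    # Highest score first; ties broken by node name for determinism.
    scores.sort(key=lambda t: (-t[1], str(t[0])))
    return scores[:k]


# Hypothetical toy graph: univ -> profA, profB; profA -> studentA; profB -> studentB.
graph = {
    "univ": [],
    "profA": ["univ"],
    "profB": ["univ"],
    "studentA": ["profA"],
    "studentB": ["profB"],
}

if __name__ == "__main__":
    print(top_k(graph, "profA", k=1))
```

The sketch stores a score for every ordered node pair, so its memory grows quadratically with the number of nodes, which is the scaling problem the abstract points to.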

Journal Article

MatchSim: a novel similarity measure based on maximum neighborhood matching

Zhenjiang Lin¹, Michael R. Lyu¹, Irwin King¹•Institutions (1)

The Chinese University of Hong Kong¹

01 Jul 2012-Knowledge and Information Systems

TL;DR: It is shown that MatchSim conforms to the basic intuition of similarity; therefore, it can overcome the counterintuitive contradiction in SimRank and be viewed as an extension of the traditional neighbor-counting scheme by taking the similarities between neighbors into account, leading to higher flexibility.

Abstract: Measuring object similarity in a graph is a fundamental data-mining problem in various application domains, including Web linkage mining, social network analysis, information retrieval, and recommender systems. In this paper, we focus on the neighbor-based approach that is based on the intuition that “similar objects have similar neighbors” and propose a novel similarity measure called MatchSim. Our method recursively defines the similarity between two objects by the average similarity of the maximum-matched similar neighbor pairs between them. We show that MatchSim conforms to the basic intuition of similarity; therefore, it can overcome the counterintuitive contradiction in SimRank. Moreover, MatchSim can be viewed as an extension of the traditional neighbor-counting scheme by taking the similarities between neighbors into account, leading to higher flexibility. We present the MatchSim score computation process and prove its convergence. We also analyze its time and space complexity and suggest two accelerating techniques: (1) proposing a simple pruning strategy and (2) adopting an approximation algorithm for maximum matching computation. Experimental results on real-world datasets show that although our method is less efficient computationally, it outperforms classic methods in terms of accuracy.

58 citations
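
The recursive definition in the MatchSim abstract above (similarity as the average similarity of maximum-matched neighbor pairs) can be illustrated with a small sketch. Several choices here are assumptions rather than details confirmed by the listing: the score is normalized by the size of the larger neighbor set, the matching is found by brute force over permutations (only workable for tiny neighbor lists, where a real implementation would use an assignment solver), and all names and the toy graph are hypothetical.

```python
from itertools import permutations


def max_matching_weight(xs, ys, sim):
    """Brute-force maximum-weight matching between two small neighbor lists."""
    if len(xs) > len(ys):
        xs, ys = ys, xs  # match every element of the shorter list
    best = 0.0
    for perm in permutations(ys, len(xs)):
        weight = sum(sim[(x, y)] for x, y in zip(xs, perm))
        best = max(best, weight)
    return best


def matchsim(neighbors, iterations=10):
    """Iterate a MatchSim-style recurrence from the identity matrix.

    neighbors maps every node to its list of neighbors.
    Assumed normalization: matching weight / max(|N(a)|, |N(b)|).
    """
    nodes = list(neighbors)
    sim = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(iterations):
        new = {}
        for a in nodes:
            for b in nodes:
                na, nb = neighbors[a], neighbors[b]
                if a == b:
                    new[(a, b)] = 1.0
                elif not na or not nb:
                    new[(a, b)] = 0.0
                else:
                    weight = max_matching_weight(na, nb, sim)
                    new[(a, b)] = weight / max(len(na), len(nb))
        sim = new
    return sim


# Hypothetical toy graph: p and r share exactly the same neighbors,
# while q has one extra neighbor z.
toy = {
    "x": [], "y": [], "z": [],
    "p": ["x", "y"],
    "r": ["x", "y"],
    "q": ["x", "y", "z"],
}

if __name__ == "__main__":
    scores = matchsim(toy)
    print(scores[("p", "r")], scores[("p", "q")])
```

Under these assumptions, two nodes with identical neighbor sets reach a score of 1, and an unmatched extra neighbor lowers the score in proportion to the larger neighbor set.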

Journal Article

A space and time efficient algorithm for SimRank computation

Weiren Yu¹, Wenjie Zhang¹, Xuemin Lin¹, Qing Zhang², Jiajin Le³•Institutions (3)

University of New South Wales¹, Commonwealth Scientific and Industrial Research Organisation², Donghua University³

01 May 2012-World Wide Web

TL;DR: Novel optimization techniques are proposed such that each iteration takes $O(\min\{n \cdot m, n^r\})$ time and $O(n + m)$ space, and a reordering technique combined with an over-relaxation method is developed, not only speeding up the convergence rate of the existing techniques but also achieving I/O efficiency.

Abstract: SimRank has become an important similarity measure to rank web documents based on a graph model on hyperlinks. The existing approaches for conducting SimRank computation adopt an iteration paradigm. The most efficient deterministic technique yields $O(n^3)$ worst-case time per iteration with a space requirement of $O(n^2)$, where $n$ is the number of nodes (web documents). In this paper, we propose novel optimization techniques such that each iteration takes $O(\min\{n \cdot m, n^r\})$ time and $O(n + m)$ space, where $m$ is the number of edges in a web-graph model and $r \le \log_2 7$. In addition, we extend the similarity transition matrix to prevent random surfers getting stuck, and devise a pruning technique to eliminate impractical similarities for each iteration. Moreover, we also develop a reordering technique combined with an over-relaxation method, not only speeding up the convergence rate of the existing techniques, but achieving I/O efficiency as well. We conduct extensive experiments on both synthetic and real data sets to demonstrate the efficiency and effectiveness of our iteration techniques.

58 citations

Proceedings Article

Delta-SimRank computing on MapReduce

Liangliang Cao¹, Brian Cho², Hyun Duk Kim², Zhen Li², Min-Hsuan Tsai², Indranil Gupta²•Institutions (2)

IBM¹, University of Illinois at Urbana–Champaign²

12 Aug 2012

TL;DR: The proposed Delta-SimRank, which is demonstrated to fit the nature of distributed computing and can be efficiently implemented using Google's MapReduce paradigm, can effectively reduce the computational cost and can also benefit the applications with non-static network structures.

Abstract: Based on the intuition that "two objects are similar if they are related to similar objects", SimRank (proposed by Jeh and Widom in 2002) has become a famous measure to compare the similarity between two nodes using network structure. Although SimRank is applicable to a wide range of areas such as social networks, citation networks, link prediction, etc., it suffers from heavy computational complexity and space requirements. Most existing efforts to accelerate SimRank computation work only for static graphs and on single machines. This paper considers the problem of computing SimRank efficiently in a distributed system while handling dynamic networks which grow with time. We first consider an abstract model called Harmonic Field on Node-pair Graph. We use this model to derive SimRank and the proposed Delta-SimRank, which is demonstrated to fit the nature of distributed computing and can be efficiently implemented using Google's MapReduce paradigm. Delta-SimRank can effectively reduce the computational cost and can also benefit the applications with non-static network structures. Our experimental results on four real-world networks show that Delta-SimRank is much more efficient than the distributed SimRank algorithm, and leads to up to 30 times speed-up in the best case.

22 citations
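
One way to see why a delta formulation can cut the work in the Delta-SimRank entry above: away from the diagonal, the SimRank update is linear, so the change in a score between two iterations depends only on the changes in the scores of in-neighbor pairs, and pairs whose scores have stopped changing contribute nothing. The single-machine sketch below propagates only nonzero deltas, with the inner loop playing the role of a map step (emit contributions to out-neighbor pairs) and the accumulation playing the role of a reduce step. It is an illustration under assumptions, not the authors' MapReduce implementation; the function name, parameters, and toy graph are hypothetical.

```python
from collections import defaultdict


def delta_simrank(in_neighbors, C=0.8, max_iterations=20, eps=0.0):
    """Accumulate SimRank scores by propagating per-iteration changes.

    in_neighbors must contain every node as a key.
    eps > 0 prunes small deltas and makes the result approximate.
    """
    # Invert the graph so each delta can be pushed to the pairs it affects.
    out_neighbors = defaultdict(list)
    for node, preds in in_neighbors.items():
        for pred in preds:
            out_neighbors[pred].append(node)

    scores = defaultdict(float)
    delta = {}
    for node in in_neighbors:
        scores[(node, node)] = 1.0  # diagonal is fixed at 1
        delta[(node, node)] = 1.0   # the initial change, relative to all zeros

    for _ in range(max_iterations):
        new_delta = defaultdict(float)
        # "Map": each changed pair (c, d) contributes to every pair (a, b)
        # with c an in-neighbor of a and d an in-neighbor of b.
        for (c, d), value in delta.items():
            for a in out_neighbors[c]:
                for b in out_neighbors[d]:
                    if a != b:  # diagonal scores never change
                        norm = len(in_neighbors[a]) * len(in_neighbors[b])
                        new_delta[(a, b)] += C * value / norm
        # "Reduce": keep only changes that are still significant.
        delta = {pair: v for pair, v in new_delta.items() if abs(v) > eps}
        if not delta:
            break  # nothing changed, so the scores have converged
        for pair, v in delta.items():
            scores[pair] += v
    return scores


# Hypothetical toy graph: univ -> profA, profB; profA -> studentA; profB -> studentB.
graph = {
    "univ": [],
    "profA": ["univ"],
    "profB": ["univ"],
    "studentA": ["profA"],
    "studentB": ["profB"],
}

if __name__ == "__main__":
    result = delta_simrank(graph)
    print(result[("profA", "profB")], result[("studentA", "studentB")])
```

With `eps=0.0` the accumulated scores match the plain iterative recurrence run for the same number of rounds; on this toy graph the loop stops after three rounds because no deltas remain.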

Proceedings Article

E-rank: A Structural-Based Similarity Measure in Social Networks

Mingxi Zhang¹, Zhenying He¹, Hao Hu¹, Wei Wang¹•Institutions (1)

Fudan University¹

04 Dec 2012

TL;DR: This paper proposes a novel structural similarity measure, E-Rank (Entity Rank), towards effectively computing the structural similarity of entities in SNs, based on the intuition that two entities are similar if they can arrive at common entities.

Abstract: As social networks (SNs) become ubiquitous and massive, computing similarity among entities becomes more challenging and draws extensive interest from various research fields. SimRank is a well-known similarity measure; however, it considers only the meetings between two nodes that walk along paths of equal length, since the path length increases strictly with each iteration of the similarity computation. In addition, it does not differentiate the importance of individual links. In this paper, we propose a novel structural similarity measure, E-Rank (Entity Rank), for effectively computing the structural similarity of entities in SNs, based on the intuition that two entities are similar if they can arrive at common entities. E-Rank can be readily applied to social networks for measuring similarities of entities. Extensive experiments demonstrate the effectiveness of E-Rank in comparison with state-of-the-art measures.

16 citations

Journal Article

Using Graphics Processors for High Performance SimRank Computation

Guoming He¹, Cuiping Li¹, Hong Chen¹, Xiaoyong Du, Haijun Feng¹•Institutions (1)

Renmin University of China¹

01 Sep 2012-IEEE Transactions on Knowledge and Data Engineering

TL;DR: This paper exploits the inherent parallelism and high memory bandwidth of graphics processing units (GPU) to accelerate the computation of SimRank on large graphs and proposes the iterative aggregation techniques for uncoupling Markov chains to compute SimRank scores in parallel for large graphs.

Abstract: Recently there has been a lot of interest in graph-based analysis. One of the most important aspects of graph-based analysis is to measure similarity between nodes in a graph. SimRank is a simple and influential measure of this kind, based on a solid graph theoretical model. However, existing methods on SimRank computation suffer from two limitations: 1) the computing cost can be very high in practice; and 2) they can only be applied on static graphs. In this paper, we exploit the inherent parallelism and high memory bandwidth of graphics processing units (GPU) to accelerate the computation of SimRank on large graphs. Furthermore, based on the observation that SimRank is essentially a first-order Markov Chain, we propose to utilize the iterative aggregation techniques for uncoupling Markov chains to compute SimRank scores in parallel for large graphs. The iterative aggregation method can be applied on dynamic graphs. Moreover, it can handle not only the link-updating problem but also the node-updating problem. We give the corresponding theoretical justification and analysis, propose three optimization strategies to further improve the computation efficiency, and extend the proposed algorithm to dynamic graphs. Extensive experiments on synthetic and real data sets verify that the proposed methods are efficient and effective.

4 citations

Proceedings Article

Privacy-Preserving SimRank over Distributed Information Network

Yu-Wei Chu¹, Chih-Hua Tai², Ming-Syan Chen¹, Philip S. Yu³•Institutions (3)

National Taiwan University¹, National Taipei University², University of Illinois at Chicago³

10 Dec 2012

TL;DR: This paper addresses the problem of link-based similarity measure of nodes in an information network distributed over different parties and proposes a privacy-preserving SimRank protocol based on fully-homomorphic encryption to provide cryptographic protection for the links.

Abstract: Information network analysis has drawn a lot of attention in recent years. Among all the aspects of network analysis, similarity measure of nodes has been shown useful in many applications, such as clustering, link prediction and community identification, to name a few. As linkage data in a large network is inherently sparse, it is noted that collecting more data can improve the quality of similarity measure. This gives different parties a motivation to cooperate. In this paper, we address the problem of link-based similarity measure of nodes in an information network distributed over different parties. To address data privacy, we propose a privacy-preserving SimRank protocol based on fully-homomorphic encryption to provide cryptographic protection for the links.

4 citations

Journal Article

Synonym Recognition Based on User Behaviors in E-commerce

Guan Yi¹•Institutions (1)

Harbin Institute of Technology¹

01 Jan 2012-Journal of Chinese information processing

TL;DR: This paper presents a method to recognize synonyms based on user behaviors to deal with the considerable new words, typos, and near-synonyms in this domain using Gradient Boost Decision Tree.

Abstract: Focused on synonym recognition in e-commerce, this paper presents a method to recognize synonyms based on user behaviors, to deal with the considerable number of new words, typos, and near-synonyms in this domain. Firstly, candidate synonym sets are retrieved by analyzing the titles and their corresponding queries based on SimRank theory. Then, features including the literal feature, title feature, query feature, and click feature are extracted. Finally, a Gradient Boost Decision Tree model is adopted to determine whether candidate synonyms are true or not. The experimental results show that the Gradient Boost Decision Tree (GBDT) is more suitable for this task, achieving a precision of 56.52%.

3 citations

Journal Article

Clustering Product Features in Opinion Mining

Lin Hong-fei¹•Institutions (1)

Dalian University of Technology¹

01 Jan 2012-Journal of Chinese information processing

TL;DR: This paper first extracts product feature expressions and sentimental words in pairs to build a bipartite graph, then adopts the Weight Normalized SimRank to compute similarity between different feature expressions in the bipartite graph, and finally optimizes the Bayesian classifier in Semi-Supervised Learning via the similarity.

Abstract: This paper focuses on clustering different feature expressions in product reviews into proper groups. In product reviews, the same features may have different expressions, e.g., "appearance" and "design" of a mobile phone actually indicate the same feature. Considering the fact that different expressions are always used with the same sentimental words in a sentence, this paper first extracts product feature expressions and sentimental words in pairs to build a bipartite graph, then adopts the Weight Normalized SimRank to compute similarity between different feature expressions in the bipartite graph, and finally optimizes the Bayesian classifier in Semi-Supervised Learning via the similarity. Experimental results show that the proposed method is valid.

2 citations

Posted Content

Implementation of Privacy-preserving SimRank over Distributed Information Network

Yu-Wei Chu, Chih-Hua Tai, Ming-Syan Chen, Philip S. Yu

29 Sep 2012-arXiv: Cryptography and Security

TL;DR: This paper addresses the problem of link-based similarity measure of nodes in an information network distributed over different parties and proposes a privacy-preserving SimRank protocol based on fully-homomorphic encryption to provide cryptographic protection for the links.

Abstract: Information network analysis has drawn a lot of attention in recent years. Among all the aspects of network analysis, similarity measure of nodes has been shown useful in many applications, such as clustering, link prediction and community identification, to name a few. As linkage data in a large network is inherently sparse, it is noted that collecting more data can improve the quality of similarity measure. This gives different parties a motivation to cooperate. In this paper, we address the problem of link-based similarity measure of nodes in an information network distributed over different parties. To address data privacy, we propose a privacy-preserving SimRank protocol based on fully-homomorphic encryption to provide cryptographic protection for the links.

1 citation