SimRank

About: SimRank is a research topic. Over its lifetime, 250 publications have been published within this topic, receiving 21,163 citations.


Papers
Journal Article
Guoming He, Cuiping Li, Hong Chen, Xiaoyong Du, Haijun Feng
TL;DR: This paper exploits the inherent parallelism and high memory bandwidth of graphics processing units (GPUs) to accelerate the computation of SimRank on large graphs, and proposes using iterative aggregation techniques for uncoupling Markov chains to compute SimRank scores in parallel.
Abstract: Recently there has been a lot of interest in graph-based analysis. One of the most important aspects of graph-based analysis is to measure similarity between nodes in a graph. SimRank is a simple and influential measure of this kind, based on a solid graph theoretical model. However, existing methods on SimRank computation suffer from two limitations: 1) the computing cost can be very high in practice; and 2) they can only be applied on static graphs. In this paper, we exploit the inherent parallelism and high memory bandwidth of graphics processing units (GPU) to accelerate the computation of SimRank on large graphs. Furthermore, based on the observation that SimRank is essentially a first-order Markov Chain, we propose to utilize the iterative aggregation techniques for uncoupling Markov chains to compute SimRank scores in parallel for large graphs. The iterative aggregation method can be applied on dynamic graphs. Moreover, it can handle not only the link-updating problem but also the node-updating problem. We give the corresponding theoretical justification and analysis, propose three optimization strategies to further improve the computation efficiency, and extend the proposed algorithm to dynamic graphs. Extensive experiments on synthetic and real data sets verify that the proposed methods are efficient and effective.

4 citations
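
For readers unfamiliar with the measure the abstract above builds on, the following is a minimal sketch of the naive iterative SimRank recurrence, in which two distinct nodes are scored by the decayed average similarity of their in-neighbors. It is not the GPU or iterative aggregation method of the paper. The function name `simrank`, the dictionary-based graph representation, the decay value of 0.8, and the toy graph are all assumptions made for illustration.

```python
from itertools import product


def simrank(in_neighbors, decay=0.8, iterations=10):
    """Naive iterative SimRank (illustrative sketch, not the paper's method).

    in_neighbors maps each node to the list of nodes that link to it.
    Every node that appears as an in-neighbor must also be a key.
    """
    nodes = list(in_neighbors)
    # Start from s_0(a, b) = 1 if a == b, else 0.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, nodes)}
    for _ in range(iterations):
        new_sim = {}
        for a, b in product(nodes, nodes):
            if a == b:
                new_sim[(a, b)] = 1.0
                continue
            ia, ib = in_neighbors[a], in_neighbors[b]
            if not ia or not ib:
                # A node with no in-neighbors is similar only to itself.
                new_sim[(a, b)] = 0.0
                continue
            total = sum(sim[(x, y)] for x in ia for y in ib)
            new_sim[(a, b)] = decay * total / (len(ia) * len(ib))
        sim = new_sim
    return sim


# Hypothetical toy graph: "univ" links to both professors' pages.
graph = {"univ": [], "prof_a": ["univ"], "prof_b": ["univ"]}
scores = simrank(graph)
```

Each iteration visits every node pair and every pair of their in-neighbors, which illustrates why the abstract describes the computing cost as very high in practice.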

21 Jun 2011
TL;DR: A stand-alone utility, Simrank, which allows users to rapidly identify the database strings most similar to query strings, and provides molecular ecologists with a high-throughput, open-source choice for comparing large sequence sets to find similarity.
Abstract: Background: Terabyte-scale collections of string-encoded data are expected from consortia efforts such as the Human Microbiome Project (http://nihroadmap.nih.gov/hmp). Intra- and inter-project data similarity searches are enabled by rapid k-mer matching strategies. Software applications for sequence database partitioning, guide tree estimation, molecular classification and alignment acceleration have benefited from embedded k-mer searches as sub-routines. However, a rapid, general-purpose, open-source, flexible, stand-alone k-mer tool has not been available. Results: Here we present a stand-alone utility, Simrank, which allows users to rapidly identify the database strings most similar to query strings. Performance testing of Simrank and related tools against DNA, RNA, protein and human-language data found Simrank 10X to 928X faster depending on the dataset. Conclusions: Simrank provides molecular ecologists with a high-throughput, open-source choice for comparing large sequence sets to find similarity.

4 citations
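
The k-mer matching idea mentioned in the abstract above can be sketched as follows: each string is broken into overlapping substrings of length k, and database strings are ranked by the fraction of the query's distinct k-mers they contain. This is only an illustration of the general technique, not the Simrank tool's actual algorithm or interface. The names `kmers` and `rank_by_shared_kmers`, the scoring rule, and the sample sequences are assumptions.

```python
def kmers(seq, k):
    """Return the set of distinct overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def rank_by_shared_kmers(query, database, k=3):
    """Rank database strings by the fraction of query k-mers they share.

    database maps a (hypothetical) sequence name to its string.
    Returns (name, score) pairs, best match first.
    """
    query_kmers = kmers(query, k)
    ranked = []
    for name, seq in database.items():
        shared = len(query_kmers & kmers(seq, k))
        score = shared / len(query_kmers) if query_kmers else 0.0
        ranked.append((name, score))
    # Stable sort: ties keep the database's insertion order.
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Hypothetical sample data for illustration only.
db = {"s1": "ACGTACGT", "s2": "TTTTTTTT", "s3": "ACGTTTTT"}
hits = rank_by_shared_kmers("ACGTAC", db, k=3)
```

Set intersection avoids any alignment step, which is the main reason k-mer comparisons of this kind are fast relative to full sequence alignment.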

Book Chapter
27 Aug 2013
TL;DR: A novel approach to conversational recommendation, UtilSim, where utilities corresponding to products get continually updated as a user iteratively interacts with the system, helping her discover her hidden preferences in the process.
Abstract: Conversational Recommender Systems belong to a class of knowledge-based systems which simulate a customer’s interaction with a shopkeeper with the help of repeated user feedback until the user settles on a product. One of the modes for getting user feedback is Preference Based Feedback, which is especially suited to novice users (those with little domain knowledge), who find it easy to express preferences across products as a whole, rather than across specific product features. Such novice users might not be aware of the specific characteristics of the items that they may be interested in; hence, the shopkeeper/system should show them a set of products during each interaction that can constructively stimulate their preferences, leading them to a desirable product in subsequent interactions. We propose a novel approach to conversational recommendation, UtilSim, where utilities corresponding to products get continually updated as a user iteratively interacts with the system, helping her discover her hidden preferences in the process. We show that UtilSim, which combines domain-specific “dominance” knowledge with SimRank-based similarity, significantly outperforms the existing conversational approaches using Preference Based Feedback in terms of recommendation efficiency.

4 citations

Proceedings Article
24 Aug 2014
TL;DR: This tutorial uses the Netflix use case as a driving example of a prototypical industrial-scale recommender system and reviews the usage of modern algorithmic approaches that include algorithms such as Factorization Machines, Restricted Boltzmann Machines, SimRank, Deep Neural Networks, or Listwise Learning-to-rank.
Abstract: In 2006, Netflix announced a $1M prize competition to advance recommendation algorithms. The recommendation problem was simplified to the accuracy of predicting a user rating, measured by the Root Mean Squared Error. While that formulation helped get the attention of the research community in the area, it may have put an excessive focus on what is simply one of the possible approaches to recommendations. In this tutorial we will describe different components of modern recommender systems such as: personalized ranking, similarity, explanations, context-awareness, or search as recommendation. In the first part, we will use the Netflix use case as a driving example of a prototypical industrial-scale recommender system. We will also review the usage of modern algorithmic approaches that include algorithms such as Factorization Machines, Restricted Boltzmann Machines, SimRank, Deep Neural Networks, or Listwise Learning-to-rank. In the second part, we will focus on the area of context-aware recommendations where the two-dimensional user-item recommender problem is turned into an n-dimensional space.

4 citations
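
The Root Mean Squared Error named in the abstract above is the square root of the mean squared difference between predicted and actual ratings. A minimal sketch follows. The function name `rmse` and the sample ratings are made up for illustration and are not taken from the Netflix prize data.

```python
import math


def rmse(predicted, actual):
    """Root Mean Squared Error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical predicted vs. actual star ratings.
error = rmse([3.5, 4.0, 2.0], [4.0, 4.0, 1.0])
```

Because errors are squared before averaging, one large miss weighs more than several small ones, and the metric says nothing about the order in which items are shown. This is consistent with the abstract's remark that rating prediction is only one possible approach to recommendation.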

01 Jan 2011
TL;DR: This thesis proposes an item-based top-N recommendation algorithm called GCP, which refines the "1 item"-based traditional CP (Conditional Probability) algorithm by taking the "multi-item"-based conditional probabilities into account and presents the item-graph model, which is used for tracking the relationships between items.
Abstract: Techniques for measuring similarity between objects in a graph are required by many applications in different domains, such as Web mining, social networks, information retrieval, citation analysis, and recommender systems. In this thesis, we first focus on the neighbor-based approach, which is based on the intuition that "similar objects have similar neighbors." Early neighbor-based similarity measures, such as Co-citation, simply count the common and/or different neighbors between objects. They perform poorly due to lack of flexibility when dealing with sparse datasets such as the Web. SimRank takes similarities between neighbors into account. However, it has a serious counter-intuitive loophole. The primary objective of this thesis is to study how to improve the effectiveness of similarity measurement by making good use of objects' neighborhood structures. Consequently, we propose three neighbor-based techniques. First, we propose the MatchSim algorithm, which relaxes the "neighbor counting" strategy by recursively defining similarity between objects by the average similarity between their maximum-matched similar neighbor-pairs. Moreover, it conforms to the basic intuitions of similarity, and thus can avoid the counter-intuitive problem in SimRank. Second, we propose the PageSim algorithm, which takes the influences of indirect neighbors into consideration by applying a feature propagation strategy. In PageSim, each object has a unique feature and propagates this feature to its (direct and indirect) neighbors via links. Similarity between objects is then calculated by comparing the features they have. Approximation techniques are suggested for the proposed algorithms to improve their computational efficiency. Experimental results on real-world datasets show that they outperform classical algorithms in terms of effectiveness. Third, we propose a simple but important model called the Extended Neighborhood Structure (ENS), which defines a bi-directional (inlink and outlink) and multi-hop neighborhood structure. Several classical algorithms are extended based on this model. Experiments show the extended algorithms outperform their original versions significantly in accuracy. Last, we focus on the top-N recommendation problem, which is described as "given the preference information of users, recommending a user top-N items that he might like, based on his basket (the items he likes)." First, we present the item-graph model, which is constructed directly from the user-item matrix and is used for tracking the relationships between items. Second, we propose an item-based top-N recommendation algorithm called GCP (Generalized Conditional Probability), which refines the "1 item"-based traditional CP (Conditional Probability) algorithm by taking the "multi-item"-based conditional probabilities into account. The item-graph is used to approximately calculate these probabilities. The GCP algorithm is tested against the traditional CP and COS algorithms on the MovieLens dataset. Experimental results show that GCP performs the best in terms of accuracy.

4 citations
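
The "neighbor counting" strategy that the abstract above attributes to early measures such as Co-citation can be sketched as a count of shared in-neighbors. This is an illustration of the general idea only, not code from the thesis. The function name `cocitation` and the toy citation data are assumptions.

```python
def cocitation(in_neighbors, a, b):
    """Count the objects that link to (cite) both a and b."""
    return len(set(in_neighbors[a]) & set(in_neighbors[b]))


# Hypothetical citation data: each page maps to the pages citing it.
cited_by = {
    "p1": ["x", "y"],
    "p2": ["y", "z"],
    "p3": ["w"],
}
common_p1_p2 = cocitation(cited_by, "p1", "p2")
common_p1_p3 = cocitation(cited_by, "p1", "p3")
```

Here `p1` and `p3` score zero because they share no citing page, even if `w` happens to be very similar to `x` or `y`. On sparse data most pairs end up in this situation, which is the limitation that recursive measures such as SimRank are designed to address.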


Network Information
Related Topics (5)
Web page: 50.3K papers, 975.1K citations, 77% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 77% related
Ontology (information science): 57K papers, 869.1K citations, 77% related
Scalability: 50.9K papers, 931.6K citations, 74% related
Tree (data structure): 44.9K papers, 749.6K citations, 74% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2021    15
2020    26
2019    16
2018    17
2017    19
2016    16