Topic
SimRank
About: SimRank is a research topic. Over its lifetime, 250 publications have been published on this topic, receiving 21,163 citations.
Papers published on a yearly basis
Papers
TL;DR: This paper exploits the inherent parallelism and high memory bandwidth of graphics processing units (GPU) to accelerate the computation of SimRank on large graphs and proposes the iterative aggregation techniques for uncoupling Markov chains to compute SimRank scores in parallel for large graphs.
Abstract: Recently there has been a lot of interest in graph-based analysis. One of the most important aspects of graph-based analysis is to measure similarity between nodes in a graph. SimRank is a simple and influential measure of this kind, based on a solid graph theoretical model. However, existing methods on SimRank computation suffer from two limitations: 1) the computing cost can be very high in practice; and 2) they can only be applied on static graphs. In this paper, we exploit the inherent parallelism and high memory bandwidth of graphics processing units (GPU) to accelerate the computation of SimRank on large graphs. Furthermore, based on the observation that SimRank is essentially a first-order Markov Chain, we propose to utilize the iterative aggregation techniques for uncoupling Markov chains to compute SimRank scores in parallel for large graphs. The iterative aggregation method can be applied on dynamic graphs. Moreover, it can handle not only the link-updating problem but also the node-updating problem. We give the corresponding theoretical justification and analysis, propose three optimization strategies to further improve the computation efficiency, and extend the proposed algorithm to dynamic graphs. Extensive experiments on synthetic and real data sets verify that the proposed methods are efficient and effective.
4 citations
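The abstract above builds on the standard SimRank fixed-point iteration, in which the similarity of two nodes is the decayed average similarity of their in-neighbor pairs. As a point of reference, here is a minimal, naive sketch of that iteration (not the paper's GPU or iterative-aggregation method; the graph and function names are illustrative):

```python
from itertools import product

def simrank(in_neighbors, C=0.8, iterations=10):
    """Naive SimRank fixed-point iteration over a directed graph.

    in_neighbors: dict mapping node -> list of in-neighbor nodes.
    C: decay factor in (0, 1).
    """
    nodes = list(in_neighbors)
    # Initialize with the identity: a node is fully similar to itself.
    s = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, nodes)}
    for _ in range(iterations):
        nxt = {}
        for a, b in product(nodes, nodes):
            if a == b:
                nxt[(a, b)] = 1.0
                continue
            ia, ib = in_neighbors[a], in_neighbors[b]
            if not ia or not ib:
                nxt[(a, b)] = 0.0  # no in-links: similarity defined as 0
                continue
            total = sum(s[(x, y)] for x in ia for y in ib)
            nxt[(a, b)] = C * total / (len(ia) * len(ib))
        s = nxt
    return s

# Tiny illustrative graph: 'b' and 'c' are both pointed to only by 'a'.
g = {"a": [], "b": ["a"], "c": ["a"]}
scores = simrank(g)
# b and c share their single in-neighbor, so s(b, c) converges to C = 0.8
```

The quadratic pairwise state is exactly what makes naive SimRank expensive in practice, which motivates the GPU parallelism and Markov-chain aggregation techniques the paper proposes.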
21 Jun 2011
TL;DR: A stand-alone utility, Simrank, which allows users to rapidly identify the database strings most similar to query strings, and provides molecular ecologists with a high-throughput, open-source choice for comparing large sequence sets to find similarity.
Abstract: Background: Terabyte-scale collections of string-encoded data are expected from consortium efforts such as the Human Microbiome Project http://nihroadmap.nih.gov/hmp. Intra- and inter-project data similarity searches are enabled by rapid k-mer matching strategies. Software applications for sequence database partitioning, guide tree estimation, molecular classification and alignment acceleration have benefited from embedded k-mer searches as sub-routines. However, a rapid, general-purpose, open-source, flexible, stand-alone k-mer tool has not been available. Results: Here we present a stand-alone utility, Simrank, which allows users to rapidly identify the database strings most similar to query strings. Performance testing of Simrank and related tools against DNA, RNA, protein and human-language datasets found Simrank 10X to 928X faster depending on the dataset. Conclusions: Simrank provides molecular ecologists with a high-throughput, open-source choice for comparing large sequence sets to find similarity.
4 citations
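The core idea behind the k-mer matching the abstract describes can be illustrated in a few lines: decompose each sequence into its set of length-k substrings and rank database entries by overlap with the query's set. This is only a toy illustration of the principle (the actual Simrank tool uses far more efficient data structures; the function names and sequences below are made up):

```python
def kmers(seq, k=7):
    """Set of all length-k substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def rank_by_shared_kmers(query, database, k=7):
    """Rank database sequences by the fraction of the query's k-mers they share."""
    q = kmers(query, k)
    scored = []
    for name, seq in database.items():
        shared = len(q & kmers(seq, k))
        scored.append((name, shared / len(q)))
    return sorted(scored, key=lambda t: t[1], reverse=True)

db = {
    "ref1": "ACGTACGTACGTACGT",  # periodic sequence containing the query's k-mers
    "ref2": "TTTTTTTTTTTTTTTT",  # shares no k-mers with the query
}
ranking = rank_by_shared_kmers("ACGTACGTACGT", db, k=7)
# ref1 shares every query k-mer (score 1.0); ref2 shares none (score 0.0)
```

Because k-mer sets can be intersected without alignment, this kind of scoring scales to the terabyte-sized collections the Background paragraph mentions.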
27 Aug 2013
TL;DR: A novel approach to conversational recommendation, UtilSim, where utilities corresponding to products get continually updated as a user iteratively interacts with the system, helping her discover her hidden preferences in the process.
Abstract: Conversational Recommender Systems belong to a class of knowledge-based systems that simulate a customer's interaction with a shopkeeper through repeated user feedback until the user settles on a product. One mode of obtaining user feedback is Preference Based Feedback, which is especially suited to novice users (those with little domain knowledge), who find it easier to express preferences across products as a whole rather than over specific product features. Such novice users may not be aware of the specific characteristics of the items they are interested in; hence, the shopkeeper/system should show them a set of products during each interaction that can constructively stimulate their preferences, leading them to a desirable product in subsequent interactions. We propose a novel approach to conversational recommendation, UtilSim, where utilities corresponding to products are continually updated as a user iteratively interacts with the system, helping her discover her hidden preferences in the process. We show that UtilSim, which combines domain-specific "dominance" knowledge with SimRank-based similarity, significantly outperforms existing conversational approaches using Preference Based Feedback in terms of recommendation efficiency.
4 citations
24 Aug 2014
TL;DR: This tutorial uses the Netflix use case as a driving example of a prototypical industrial-scale recommender system and reviews the usage of modern algorithmic approaches that include algorithms such as Factorization Machines, Restricted Boltzmann Machines, SimRank, Deep Neural Networks, or Listwise Learning-to-rank.
Abstract: In 2006, Netflix announced a $1M prize competition to advance recommendation algorithms. The recommendation problem was simplified as the accuracy in predicting a user rating measured by the Root Mean Squared Error. While that formulation helped get the attention of the research community in the area, it may have put an excessive focus on what is simply one of possible approaches to recommendations. In this tutorial we will describe different components of modern recommender systems such as: personalized ranking, similarity, explanations, context-awareness, or search as recommendation. In the first part, we will use the Netflix use case as a driving example of a prototypical industrial-scale recommender system. We will also review the usage of modern algorithmic approaches that include algorithms such as Factorization Machines, Restricted Boltzmann Machines, SimRank, Deep Neural Networks, or Listwise Learning-to-rank. In the second part, we will focus on the area of context-aware recommendations where the two dimensional user-item recommender problem is turned into an n-dimensional space.
4 citations
01 Jan 2011
TL;DR: This thesis proposes an item-based top-N recommendation algorithm called GCP, which refines the "1 item"-based traditional CP (Conditional Probability) algorithm by taking the "multi-item"-based conditional probabilities into account and presents the item-graph model, which is used for tracking the relationships between items.
Abstract: Techniques for measuring similarity between objects in a graph are required by many applications in different domains, such as Web mining, social networks, information retrieval, citation analysis, and recommender systems. In this thesis, we first focus on the neighbor-based approach, which is based on the intuition that "similar objects have similar neighbors."
Early neighbor-based similarity measures simply count the common and/or different neighbors between objects, such as Co-citation. They perform poorly due to lack of flexibility when dealing with sparse datasets such as the Web. SimRank takes similarities between neighbors into account. However, it has a serious counter-intuitive loophole. The primary objective of this thesis is to study how to improve the effectiveness of similarity measurement by making good use of objects' neighborhood structures.
Consequently, we propose three neighbor-based techniques. First, we propose the MatchSim algorithm, which relaxes the "neighbor counting" strategy by recursively defining the similarity between objects as the average similarity of their maximum-matched similar neighbor pairs. Moreover, it conforms to the basic intuitions of similarity and thus avoids the counter-intuitive problem in SimRank. Second, we propose the PageSim algorithm, which takes the influence of indirect neighbors into consideration by applying a feature propagation strategy. In PageSim, each object has a unique feature and propagates this feature to its (direct and indirect) neighbors via links. Similarity between objects is then calculated by comparing the features they hold. Approximation techniques are suggested for the proposed algorithms to improve their computational efficiency. Experimental results on real-world datasets show that they outperform classical algorithms in terms of effectiveness. Third, we propose a simple but important model called the Extended Neighborhood Structure (ENS), which defines a bi-directional (in-link and out-link), multi-hop neighborhood structure. Several classical algorithms are extended based on this model. Experiments show that the extended algorithms significantly outperform their original versions in accuracy.
Last, we focus on the top-N recommendation problem: given the preference information of users, recommend to a user the top-N items that he might like, based on his basket (the items he likes). First, we present the item-graph model, which is constructed directly from the user-item matrix and is used for tracking the relationships between items. Second, we propose an item-based top-N recommendation algorithm called GCP (Generalized Conditional Probability), which refines the "1 item"-based traditional CP (Conditional Probability) algorithm by taking "multi-item"-based conditional probabilities into account. The item-graph is used to approximately calculate these probabilities. The GCP algorithm is tested against the traditional CP and COS algorithms on the MovieLens dataset. Experimental results show that GCP performs best in terms of accuracy.
4 citations
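The "1 item"-based CP baseline that the thesis's GCP algorithm refines estimates P(candidate | basket item) from co-occurrence counts and sums these scores over the basket. A minimal sketch of that baseline, under the assumption of simple frequency estimates (the data and function names are illustrative, and this is the traditional CP method, not GCP itself):

```python
from collections import defaultdict
from itertools import combinations

def item_cp_scores(baskets):
    """Estimate pairwise conditional probabilities P(j | i) from user baskets
    as co-occurrence count of (i, j) divided by the count of i."""
    item_count = defaultdict(int)
    pair_count = defaultdict(int)
    for basket in baskets:
        for i in basket:
            item_count[i] += 1
        for i, j in combinations(sorted(basket), 2):
            pair_count[(i, j)] += 1
            pair_count[(j, i)] += 1
    return {(i, j): pair_count[(i, j)] / item_count[i] for (i, j) in pair_count}

def recommend(basket, cp, all_items, n=2):
    """'1 item'-based CP: score each candidate by summed P(candidate | item)."""
    scores = {}
    for cand in all_items:
        if cand in basket:
            continue  # never recommend items the user already has
        scores[cand] = sum(cp.get((i, cand), 0.0) for i in basket)
    return sorted(scores, key=scores.get, reverse=True)[:n]

baskets = [{"A", "B"}, {"A", "B", "C"}, {"B", "C"}, {"A", "D"}]
cp = item_cp_scores(baskets)
top = recommend({"A"}, cp, {"A", "B", "C", "D"}, n=1)
# P(B|A) = 2/3 beats P(C|A) = P(D|A) = 1/3, so 'B' is recommended
```

GCP generalizes this by conditioning on multiple basket items at once, using the item-graph to approximate those higher-order probabilities.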