Showing papers on "SimRank published in 2011"


Journal ArticleDOI
TL;DR: A method to detect co-saliency from an image pair that may have some objects in common and employ a normalized single-pair SimRank algorithm to compute the similarity score is introduced.
Abstract: In this paper, we introduce a method to detect co-saliency from an image pair that may have some objects in common. The co-saliency is modeled as a linear combination of the single-image saliency map (SISM) and the multi-image saliency map (MISM). The first term is designed to describe the local attention, which is computed by using three saliency detection techniques available in the literature. To compute the MISM, a co-multilayer graph is constructed by dividing the image pair into a spatial pyramid representation. Each node in the graph is described by two types of visual descriptors, which are extracted from a representation of some aspects of local appearance, e.g., color and texture properties. To evaluate the similarity between two nodes, we employ a normalized single-pair SimRank algorithm to compute the similarity score. Experimental evaluation on a number of image pairs demonstrates the good performance of the proposed method on the co-saliency detection task.

322 citations


Proceedings ArticleDOI
21 Aug 2011
TL;DR: RoleSim is a role similarity metric that satisfies a set of axiomatic properties and can be computed with a simple iterative algorithm; the authors rigorously prove that it satisfies all the axioms and demonstrate its superior interpretative power on both synthetic and real datasets.
Abstract: A key task in analyzing social networks and other complex networks is role analysis: describing and categorizing nodes by how they interact with other nodes. Two nodes have the same role if they interact with equivalent sets of neighbors. The most fundamental role equivalence is automorphic equivalence. Unfortunately, the fastest algorithm known for graph automorphism is nonpolynomial. Moreover, since exact equivalence is rare, a more meaningful task is measuring the role similarity between any two nodes. This task is closely related to the link-based similarity problem that SimRank addresses. However, SimRank and other existing similarity measures are not sufficient because they are not guaranteed to recognize automorphically or structurally equivalent nodes. This paper makes two contributions. First, we present and justify several axiomatic properties necessary for a role similarity measure or metric. Second, we present RoleSim, a role similarity metric which satisfies these axioms and which can be computed with a simple iterative algorithm. We rigorously prove that RoleSim satisfies all the axiomatic properties and demonstrate its superior interpretative power on both synthetic and real datasets.

109 citations
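
The RoleSim abstract above relates role similarity to the link-based similarity problem that SimRank addresses. As a rough illustration, here is a minimal Python sketch of the commonly cited SimRank recurrence (two nodes are similar if their in-neighbors are similar); it is not RoleSim and is not taken from the paper. The function name `simrank`, the `decay` and `iterations` parameters with their default values, and the dictionary-of-in-neighbors input are all assumptions made for this example.

```python
from itertools import product


def simrank(in_neighbors, decay=0.8, iterations=10):
    """Naive all-pairs SimRank sketch.

    in_neighbors maps every node to the list of nodes that link to it.
    Every node that appears in a list must also be a key (assumption).
    """
    nodes = list(in_neighbors)
    # Start from the identity: a node is fully similar only to itself.
    sim = {(a, b): 1.0 if a == b else 0.0
           for a, b in product(nodes, repeat=2)}

    for _ in range(iterations):
        new_sim = {}
        for a, b in product(nodes, repeat=2):
            if a == b:
                new_sim[(a, b)] = 1.0
                continue
            in_a, in_b = in_neighbors[a], in_neighbors[b]
            if not in_a or not in_b:
                # No in-neighbors on one side: similarity is defined as 0.
                new_sim[(a, b)] = 0.0
                continue
            # Average similarity over all pairs of in-neighbors, damped.
            total = sum(sim[(x, y)] for x in in_a for y in in_b)
            new_sim[(a, b)] = decay * total / (len(in_a) * len(in_b))
        sim = new_sim
    return sim


if __name__ == "__main__":
    # Hypothetical toy graph: u links to both a and b.
    graph = {"u": [], "a": ["u"], "b": ["u"]}
    print(simrank(graph)[("a", "b")])
```

This naive version touches every node pair in every iteration, so it only suits small graphs; it is meant to show the recurrence, not an efficient single-pair computation.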


Journal ArticleDOI
01 Aug 2011
TL;DR: This paper proposes link-based similarity join (LS-join), which extends the similarity join operator to link-based measures, and improves the solutions for PPR and SR, which involve expensive random-walk operations.
Abstract: Graphs can be found in applications like social networks, bibliographic networks, and biological databases. Understanding the relationship, or links, among graph nodes enables applications such as link prediction, recommendation, and spam detection. In this paper, we propose link-based similarity join (LS-join), which extends the similarity join operator to link-based measures. Given two sets of nodes in a graph, the LS-join returns all pairs of nodes that are highly similar to each other, with respect to an e-function. The e-function generalizes common measures like Personalized PageRank (PPR) and SimRank (SR). We study an efficient LS-join algorithm on a large graph. We further improve our solutions for PPR and SR, which involve expensive random-walk operations. We validate our solutions by performing extensive experiments on three real graph datasets.

32 citations


Journal ArticleDOI
TL;DR: Simrank is a stand-alone k-mer utility that lets users rapidly identify the database strings most similar to query strings; embedded k-mer searches of this kind have benefited applications for sequence database partitioning, guide tree estimation, molecular classification and alignment acceleration.
Abstract: Terabyte-scale collections of string-encoded data are expected from consortia efforts such as the Human Microbiome Project (http://nihroadmap.nih.gov/hmp). Intra- and inter-project data similarity searches are enabled by rapid k-mer matching strategies. Software applications for sequence database partitioning, guide tree estimation, molecular classification and alignment acceleration have benefited from embedded k-mer searches as sub-routines. However, a rapid, general-purpose, open-source, flexible, stand-alone k-mer tool has not been available. Here we present a stand-alone utility, Simrank, which allows users to rapidly identify database strings the most similar to query strings. Performance testing of Simrank and related tools against DNA, RNA, protein and human-languages found Simrank 10X to 928X faster depending on the dataset. Simrank provides molecular ecologists with a high-throughput, open source choice for comparing large sequence sets to find similarity.

31 citations
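
The abstract above describes ranking database strings by k-mer matching but does not spell out the scoring rule. The following Python sketch illustrates one plausible approach: score each database string by the fraction of the query's distinct k-mers that it shares. That scoring rule, the function names (`kmers`, `kmer_similarity`, `rank_database`), and the default `k=7` are assumptions for illustration only and should not be read as the Simrank tool's actual implementation.

```python
def kmers(s, k):
    """Return the set of distinct length-k substrings of s."""
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def kmer_similarity(query, target, k=7):
    """Fraction of the query's distinct k-mers that also occur in target."""
    q = kmers(query, k)
    if not q:
        # Query shorter than k: no k-mers to compare.
        return 0.0
    return len(q & kmers(target, k)) / len(q)


def rank_database(query, database, k=7, top=3):
    """Return up to `top` (score, name) pairs, best match first.

    database maps a name to its string (hypothetical input format).
    Ties are broken alphabetically by name.
    """
    scored = [(kmer_similarity(query, seq, k), name)
              for name, seq in database.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:top]


if __name__ == "__main__":
    db = {"x": "ACGTACGT", "y": "TTTTTTTT"}
    print(rank_database("ACGTAC", db, k=3))
```

A production tool would index the database k-mers once instead of rebuilding a set per comparison; the sketch trades speed for brevity.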


Posted Content
TL;DR: RoleSim is presented, a role similarity metric that satisfies all the axiomatic properties and can be computed with a simple iterative algorithm; its superior interpretative power is demonstrated on both synthetic and real datasets.
Abstract: A key task in social network and other complex network analysis is role analysis: describing and categorizing nodes according to how they interact with other nodes. Two nodes have the same role if they interact with equivalent sets of neighbors. The most fundamental role equivalence is automorphic equivalence. Unfortunately, the fastest algorithms known for graph automorphism are nonpolynomial. Moreover, since exact equivalence may be rare, a more meaningful task is to measure the role similarity between any two nodes. This task is closely related to the structural or link-based similarity problem that SimRank attempts to solve. However, SimRank and most of its offshoots are not sufficient because they do not fully recognize automorphically or structurally equivalent nodes. In this paper we tackle two problems. First, what are the necessary properties for a role similarity measure or metric? Second, how can we derive a role similarity measure satisfying these properties? For the first problem, we justify several axiomatic properties necessary for a role similarity measure or metric: range, maximal similarity, automorphic equivalence, transitive similarity, and the triangle inequality. For the second problem, we present RoleSim, a new similarity metric with a simple iterative computational method. We rigorously prove that RoleSim satisfies all the axiomatic properties. We also introduce an iceberg RoleSim algorithm which is guaranteed to discover all pairs with RoleSim score no less than a user-defined threshold $\theta$ without computing RoleSim for every pair. We demonstrate the superior interpretative power of RoleSim on both synthetic and real datasets.

23 citations


Proceedings ArticleDOI
25 Jul 2011
TL;DR: A method that uses clustering, SimRank and adapted SimRank algorithms to recommend matching candidates for online dating networks is proposed; it can achieve nearly double the performance of the traditional collaborative filtering and common neighbor methods of recommendation.
Abstract: A new type of social network, online dating, is gaining popularity. With a large member base, users of a dating network are overloaded with choices about their ideal partners. Recommendation methods can be utilized to overcome this problem. However, traditional recommendation methods do not work effectively for online dating networks, where the dataset is sparse and large and two-way matching is required. This paper applies social networking concepts to solve the problem of developing a recommendation method for online dating networks. We propose a method that uses clustering, SimRank and adapted SimRank algorithms to recommend matching candidates. Empirical results show that the proposed method can achieve nearly double the performance of the traditional collaborative filtering and common neighbor methods of recommendation.

21 citations


Journal ArticleDOI
TL;DR: CiteRank, a combination of a similarity ranking with a static ranking, is proposed; the experimental results imply that CiteRank can improve the effectiveness of research paper searching on social bookmarking websites.
Abstract: Search engines and social bookmarking systems are important tools for web resource discovery. The performance and capabilities of web search engines are vital. This paper proposes CiteRank, a combination of a similarity ranking with a static ranking. Similarity ranking measures the match between a query and a research paper index, while a static ranking, or query-independent ranking, measures the quality of a research paper. For this particular study, a group of factors was used to determine the static ranking score: the number of groups containing the posted paper, the year of publication, the time the research paper was posted, and the priority of the research paper. NDCG was used as the evaluation metric. CiteRank was compared with SimRank and StaticRank. The results of the experiment showed that CiteRank produces a better ranking than the other methods. This implies that CiteRank can improve the effectiveness of research paper searching on social bookmarking websites.

13 citations
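
The CiteRank abstract above uses NDCG as its evaluation metric without stating which variant. As a hedged illustration, the following Python sketch computes NDCG with linear gain and a log2 rank discount; the choice of that variant, the function names `dcg` and `ndcg`, and the optional cutoff `k` are assumptions for this example, not details from the paper.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain with linear gain and log2 discount.

    relevances[0] is the relevance grade of the top-ranked result.
    """
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances))


def ndcg(relevances, k=None):
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    ranked = list(relevances)
    ideal = sorted(ranked, reverse=True)
    if k is not None:
        ranked, ideal = ranked[:k], ideal[:k]
    ideal_dcg = dcg(ideal)
    # With no relevant results at all, report 0 rather than divide by zero.
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical relevance grades for one query, in ranked order.
    print(ndcg([3, 2, 3, 0, 1], k=5))
```

A ranking that already lists results in decreasing relevance scores exactly 1.0, so comparing rankers reduces to comparing how close each one gets to that ideal order.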


Proceedings ArticleDOI
24 Oct 2011
TL;DR: A Two-Stage SimRank algorithm, based on SimRank and some clustering algorithms, is proposed to compute the similarity among queries and is used to discover relevant terms for query expansion; experimental results show that this approach can discover qualified terms effectively and improve retrieval performance.
Abstract: It is commonly believed that query logs from Web search are a gold mine for the search business, because they reflect users' preferences over the Web pages presented by search engines, so many studies based on query logs have been carried out in the last few years. In this study, we assume that two queries are relevant to each other when they have the same clicked page in their result lists, and we also consider the topics of the users' needs behind the queries. Thus, we propose a Two-Stage SimRank (called TSS in this paper) algorithm based on SimRank and some clustering algorithms to compute the similarity among queries, and then use it to discover relevant terms for query expansion, considering the information of topics and the global relationships of queries concurrently, with a query log collected by a practical search engine. Experimental results on two TREC test collections show that our approach can discover qualified terms effectively and improve retrieval performance.

6 citations


21 Jun 2011
TL;DR: A stand-alone utility, Simrank, which allows users to rapidly identify database strings the most similar to query strings, and provides molecular ecologists with a high-throughput, open source choice for comparing large sequence sets to find similarity.
Abstract: Background: Terabyte-scale collections of string-encoded data are expected from consortia efforts such as the Human Microbiome Project (http://nihroadmap.nih.gov/hmp). Intra- and inter-project data similarity searches are enabled by rapid k-mer matching strategies. Software applications for sequence database partitioning, guide tree estimation, molecular classification and alignment acceleration have benefited from embedded k-mer searches as sub-routines. However, a rapid, general-purpose, open-source, flexible, stand-alone k-mer tool has not been available. Results: Here we present a stand-alone utility, Simrank, which allows users to rapidly identify database strings the most similar to query strings. Performance testing of Simrank and related tools against DNA, RNA, protein and human-languages found Simrank 10X to 928X faster depending on the dataset. Conclusions: Simrank provides molecular ecologists with a high-throughput, open source choice for comparing large sequence sets to find similarity.

4 citations


01 Jan 2011
TL;DR: This thesis proposes an item-based top-N recommendation algorithm called GCP, which refines the "1 item"-based traditional CP (Conditional Probability) algorithm by taking the "multi-item"-based conditional probabilities into account and presents the item-graph model, which is used for tracking the relationships between items.
Abstract: Techniques for measuring similarity between objects in a graph are required by many applications in different domains, such as Web mining, social networks, information retrieval, citation analysis, and recommender systems. In this thesis, we first focus on the neighbor-based approach, which is based on the intuition that "similar objects have similar neighbors." Early neighbor-based similarity measures, such as Co-citation, simply count the common and/or different neighbors between objects. They perform poorly due to lack of flexibility when dealing with sparse datasets such as the Web. SimRank takes similarities between neighbors into account. However, it has a serious counter-intuitive loophole. The primary objective of this thesis is to study how to improve the effectiveness of similarity measurement by making good use of objects' neighborhood structures. Consequently, we propose three neighbor-based techniques. First, we propose the MatchSim algorithm, which relaxes the "neighbor counting" strategy by recursively defining similarity between objects by the average similarity between their maximum-matched similar neighbor-pairs. Moreover, it conforms to the basic intuitions of similarity and thus avoids the counter-intuitive problem in SimRank. Second, we propose the PageSim algorithm, which takes the influences of indirect neighbors into consideration by applying a feature propagation strategy. In PageSim, each object has a unique feature and propagates this feature to its (direct and indirect) neighbors via links. Similarity between objects is then calculated by comparing the features they have. Approximation techniques are suggested for the proposed algorithms to improve their computational efficiency. Experimental results on real-world datasets show that they outperform classical algorithms in terms of effectiveness.
Third, we propose a simple but important model called the Extended Neighborhood Structure (ENS), which defines a bi-directional (inlink and outlink) and multi-hop neighborhood structure. Several classical algorithms are extended based on this model. Experiments show the extended algorithms outperform their original versions significantly in accuracy. Last, we focus on the top-N recommendation problem, which is described as "given the preference information of users, recommending a user top-N items that he might like, based on his basket (the items he likes)." First, we present the item-graph model, which is constructed directly from the user-item matrix and is used for tracking the relationships between items. Second, we propose an item-based top-N recommendation algorithm called GCP (Generalized Conditional Probability), which refines the "1 item"-based traditional CP (Conditional Probability) algorithm by taking the "multi-item"-based conditional probabilities into account. The item-graph is used to approximately calculate these probabilities. The GCP algorithm is tested against the traditional CP and COS algorithms on the MovieLens dataset. Experimental results show that GCP performs the best in terms of accuracy.

4 citations
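
The thesis abstract above contrasts early neighbor-counting measures such as Co-citation with recursive measures like SimRank. As a small illustration of the neighbor-counting idea only, the Python sketch below counts shared in-neighbors and also normalizes that count by the union of in-neighbors (a Jaccard-style ratio). The function names `co_citation` and `neighbor_jaccard`, the dictionary input format, and the choice of Jaccard normalization are assumptions for this example; MatchSim, PageSim and GCP are not reproduced here.

```python
def co_citation(in_neighbors, a, b):
    """Number of nodes that link to both a and b."""
    cited_by_a = set(in_neighbors.get(a, ()))
    cited_by_b = set(in_neighbors.get(b, ()))
    return len(cited_by_a & cited_by_b)


def neighbor_jaccard(in_neighbors, a, b):
    """Shared in-neighbors divided by all in-neighbors of a or b."""
    cited_by_a = set(in_neighbors.get(a, ()))
    cited_by_b = set(in_neighbors.get(b, ()))
    union = cited_by_a | cited_by_b
    # Two nodes with no in-neighbors at all get similarity 0.
    return len(cited_by_a & cited_by_b) / len(union) if union else 0.0


if __name__ == "__main__":
    # Hypothetical citation data: paper -> papers that cite it.
    cited_by = {"p": ["x", "y", "z"], "q": ["y", "z", "w"], "r": []}
    print(co_citation(cited_by, "p", "q"))
    print(neighbor_jaccard(cited_by, "p", "q"))
```

Both scores are zero whenever two objects share no direct neighbor, which is the inflexibility on sparse data that the abstract attributes to this family of measures.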