SciSpace (formerly Typeset)
Topic

SimRank

About: SimRank is a research topic. Over its lifetime, 250 publications have been published on this topic, receiving 21,163 citations.


Papers
Journal Article
TL;DR: This paper defines random walks on uncertain graphs, shows that they satisfy the Markov property, and identifies a critical difference from random walks on deterministic graphs that makes all existing SimRank computation algorithms for deterministic graphs inapplicable to uncertain graphs.
Abstract: SimRank is a similarity measure between vertices in a graph. Recently, many algorithms have been proposed to evaluate SimRank similarities efficiently. However, the existing algorithms either overlook uncertainty in graph structures or depend on an unreasonable assumption. In this paper, we study SimRank on uncertain graphs. Following the random-walk-based formulation of SimRank on deterministic graphs and the possible-world model of uncertain graphs, we first define random walks on uncertain graphs and show that our definition satisfies the Markov property. We formulate our SimRank measure based on random walks on uncertain graphs. We discover a critical difference between random walks on uncertain graphs and random walks on deterministic graphs, which makes all existing SimRank computation algorithms on deterministic graphs inapplicable to uncertain graphs. For SimRank computation, we consider both single-pair SimRank and single-source top-$K$ SimRank. We propose three algorithms: a sampling algorithm with high efficiency, a two-phase algorithm with comparable efficiency and higher accuracy, and a speed-up algorithm with much higher efficiency. We also present an optimized algorithm for efficiently computing single-source top-$K$ SimRank. The experimental results verify the effectiveness of our SimRank measure and the efficiency of the proposed SimRank computation algorithms.
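For orientation, the random-walk formulation of SimRank on deterministic graphs that this abstract builds on can be estimated by sampling: s(a, b) is the expected value of c^tau, where tau is the first step at which two independent reverse random walks from a and b meet. The sketch below is the classic deterministic-graph Monte Carlo estimator only, not the paper's uncertain-graph random walks or its sampling algorithm; the function name `simrank_monte_carlo` and the defaults for `c`, `num_walks`, `max_len`, and `seed` are illustrative assumptions.

```python
import random
from typing import Dict, Hashable, List

# Maps each node to the list of its in-neighbors (nodes with an edge into it).
InNeighbors = Dict[Hashable, List[Hashable]]


def simrank_monte_carlo(
    in_nbrs: InNeighbors,
    a: Hashable,
    b: Hashable,
    c: float = 0.6,
    num_walks: int = 1000,
    max_len: int = 10,
    seed: int = 0,
) -> float:
    """Estimate single-pair SimRank s(a, b) on a deterministic graph.

    Random-surfer-pair view: s(a, b) = E[c ** tau], where tau is the first
    step at which two independent reverse random walks (one from a, one
    from b) stand on the same node. Walks are truncated after max_len steps.
    """
    if a == b:
        return 1.0
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_walks):
        u, v = a, b
        for step in range(1, max_len + 1):
            iu, iv = in_nbrs.get(u, []), in_nbrs.get(v, [])
            if not iu or not iv:
                break  # one walker has no in-neighbor: this pair never meets
            # Each walker independently moves to a uniformly random in-neighbor.
            u, v = rng.choice(iu), rng.choice(iv)
            if u == v:
                total += c ** step  # first meeting at this step contributes c^step
                break
    return total / num_walks


# Usage: a and b share their only in-neighbor x, so every walk pair meets at step 1.
g = {"a": ["x"], "b": ["x"], "x": []}
print(simrank_monte_carlo(g, "a", "b", c=0.8))  # approximately 0.8
```

Truncating at `max_len` makes the estimate a slight underestimate, since meetings after that step are ignored; the error shrinks geometrically with `c`.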

4 citations

Journal Article
TL;DR: This paper proposes a novel local-push-based algorithm for computing and tracking all-pairs SimRank, and develops an iterative parallel two-step framework for local push that takes advantage of modern multicore CPUs.
Abstract: Measuring similarity among data objects is important in data analysis and mining. SimRank is a popular link-based similarity measure among nodes in a graph. To compute the all-pairs SimRank matrix accurately, iterative methods are usually used. For static graphs, current iterative solutions are not efficient enough in either time or space, because iterative updating by nature incurs unnecessary computation and storage. For dynamic graphs, all current incremental solutions for updating the SimRank matrix are based on an approximate SimRank definition and thus have no accuracy guarantee. In this paper, we propose a novel local-push-based algorithm for computing and tracking all-pairs SimRank. Furthermore, we develop an iterative parallel two-step framework for local push to take advantage of modern hardware with multicore CPUs. We show that our algorithms outperform the state-of-the-art methods.
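To make the cost argument concrete, here is a minimal sketch of the naive iterative all-pairs SimRank computation that the abstract contrasts with local push: every round recomputes all n^2 pairs from the previous round and keeps a full n x n table. It is the baseline fixed-point iteration, not the paper's local-push algorithm; the function name `simrank_all_pairs` and the default `c` and `iterations` values are assumptions for illustration.

```python
from typing import Dict, Hashable, List, Tuple

# Maps each node to its list of in-neighbors; every node must appear as a key.
InNeighbors = Dict[Hashable, List[Hashable]]


def simrank_all_pairs(
    in_nbrs: InNeighbors, c: float = 0.6, iterations: int = 10
) -> Dict[Tuple[Hashable, Hashable], float]:
    """Naive iterative all-pairs SimRank.

    s_0 is the identity; each round recomputes every pair from the previous
    round's table, so a round costs time over all n^2 pairs and a full table.
    """
    nodes = list(in_nbrs)
    sim = {(u, v): 1.0 if u == v else 0.0 for u in nodes for v in nodes}
    for _ in range(iterations):
        new_sim = {}
        for u in nodes:
            for v in nodes:
                iu, iv = in_nbrs[u], in_nbrs[v]
                if u == v:
                    new_sim[(u, v)] = 1.0
                elif not iu or not iv:
                    new_sim[(u, v)] = 0.0  # no in-neighbors: similarity defined as 0
                else:
                    # Average previous-round similarity over all in-neighbor pairs.
                    total = sum(sim[(i, j)] for i in iu for j in iv)
                    new_sim[(u, v)] = c * total / (len(iu) * len(iv))
        sim = new_sim
    return sim


g = {"a": ["x"], "b": ["x"], "x": []}
s = simrank_all_pairs(g, c=0.8)
print(s[("a", "b")])  # 0.8
```

Every pair is recomputed each round even when its value has converged, which is the kind of redundant work a push-based method that only propagates nonzero residuals can avoid.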

4 citations

Journal Article
TL;DR: A new approach to expressing the similarity between user profiles is proposed: a structural similarity measure, based on SimRank and the properties of bipartite graphs, that exploits the information carried by the relational structure between user profiles and their interests.
Abstract: The user profile is a very important tool in fields such as recommendation systems and customization systems. It is used to narrow the data or results provided to a specific user, and to reduce processing cost and time in many systems. Whatever user profile model is used, updating and enriching it is an essential step in the information retrieval process for obtaining more relevant and satisfactory results, which has led information systems to develop several enrichment techniques, based especially on similarity methods between user profiles. Similarity methods serve several tasks, such as detecting duplicate profiles in online social networks, addressing the cold-start problem, and predicting which users may become friends as well as their future intentions. In this paper, we propose a new approach to expressing the similarity between user profiles: a structural similarity measure based on SimRank and the properties of bipartite graphs, which takes advantage of the information provided by the relational structure between user profiles and their interests. Our method is characterized by similarity propagation between graph nodes over iterations, from source nodes to their successors, so it finds profiles similar to the query profile whether the links between profiles are direct or indirect.
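As a rough illustration of SimRank on a user-interest bipartite graph, the sketch below runs a bipartite SimRank iteration: two users are similar if their interests are similar, and two interests are similar if they are held by similar users, with both tables updated from the previous round. This is a generic bipartite SimRank, not necessarily the measure this paper defines; the function name `bipartite_simrank`, the decay constants `c1`/`c2`, and the profile data are hypothetical.

```python
from typing import Dict, Hashable, List, Tuple

Pair = Tuple[Hashable, Hashable]


def bipartite_simrank(
    user_items: Dict[Hashable, List[Hashable]],
    c1: float = 0.8,
    c2: float = 0.8,
    iterations: int = 10,
) -> Tuple[Dict[Pair, float], Dict[Pair, float]]:
    """Bipartite SimRank on a user-interest graph (hypothetical helper).

    Returns (user-user similarities, interest-interest similarities).
    """
    users = list(user_items)
    items = list(dict.fromkeys(i for its in user_items.values() for i in its))
    item_users = {i: [u for u in users if i in user_items[u]] for i in items}

    def step(nodes, nbrs, other_sim, c):
        # One SimRank round on one side, averaging the other side's scores.
        out = {}
        for x in nodes:
            for y in nodes:
                nx, ny = nbrs[x], nbrs[y]
                if x == y:
                    out[(x, y)] = 1.0
                elif not nx or not ny:
                    out[(x, y)] = 0.0
                else:
                    total = sum(other_sim[(p, q)] for p in nx for q in ny)
                    out[(x, y)] = c * total / (len(nx) * len(ny))
        return out

    su = {(a, b): float(a == b) for a in users for b in users}
    si = {(a, b): float(a == b) for a in items for b in items}
    for _ in range(iterations):
        # Tuple assignment: both sides are computed from the previous round.
        su, si = step(users, user_items, si, c1), step(items, item_users, su, c2)
    return su, si


profiles = {"A": ["jazz"], "B": ["jazz"], "C": ["chess"]}
su, _ = bipartite_simrank(profiles)
print(su[("A", "B")], su[("A", "C")])  # 0.8 0.0
```

Because scores flow back and forth across the bipartite graph over iterations, two profiles can become similar through shared neighbors several hops away, not only through a common interest.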

4 citations

01 Nov 2017
TL;DR: In this paper, an aggregated user-user similarity measure is proposed for the user-based CF model: a weighted aggregation of the SimRank++ similarity on the user-item bipartite graph and the cosine similarity of Linked Open Data (LOD)-based user profiles derived from both the rating data and the items' descriptive attributes found in LOD resources.
Abstract: This paper addresses the sparsity problem in collaborative filtering (CF) by developing an aggregated user-user similarity measure suitable for the user-based CF model. The aggregated similarity measure is a weighted aggregation of the SimRank++ similarity on the user-item bipartite graph and the cosine similarity of the Linked Open Data (LOD)-based user profiles derived from both the rating data and the items' descriptive attributes found in LOD resources. To validate the effectiveness of the aggregated similarity and evaluate the accuracy of rating predictions with the user-based CF method, comparative experiments among four similarity measures (the Pearson correlation coefficient, the SimRank++ similarity, the cosine similarity, and the aggregated similarity) were conducted on the MovieLens 100k dataset and DBpedia. The experimental results indicate that the proposed aggregated similarity measure outperforms the other three similarity measures overall in terms of both Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE), especially with 30 to 100 nearest neighbors.
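A minimal sketch of the aggregation idea, assuming the SimRank++ score for a user pair has already been computed and that the weighting is a simple convex combination with a hypothetical weight `alpha` (the abstract does not give the paper's weighting scheme). Profiles are sparse dictionaries of feature weights; the helper names and feature keys are illustrative. MAE and RMSE use their standard definitions.

```python
import math
from typing import Dict, Sequence

Profile = Dict[str, float]  # sparse profile vector: feature name -> weight


def cosine(p: Profile, q: Profile) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either has zero norm)."""
    dot = sum(w * q.get(k, 0.0) for k, w in p.items())
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_p * norm_q)


def aggregated_similarity(
    simrank_pp: float, p: Profile, q: Profile, alpha: float = 0.5
) -> float:
    """alpha * SimRank++ + (1 - alpha) * cosine(profiles).

    Hypothetical weighting: alpha is an assumed tuning parameter and
    simrank_pp is assumed to be precomputed on the user-item graph.
    """
    return alpha * simrank_pp + (1.0 - alpha) * cosine(p, q)


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean Absolute Error over paired ratings."""
    return sum(abs(a - b) for a, b in zip(actual, predicted)) / len(actual)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root Mean Squared Error over paired ratings."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(actual, predicted)) / len(actual))


u = {"genre:Drama": 1.0, "director:X": 0.5}
v = {"genre:Drama": 1.0}
print(aggregated_similarity(0.4, u, v, alpha=0.3))
print(mae([3, 4], [4, 2]))  # 1.5
```

Blending the two signals lets the graph-based score cover users with few co-rated items while the content-based cosine term contributes even when rating overlap is sparse.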

4 citations

Patent
30 Apr 2014
TL;DR: This patent proposes a data-driven method for calculating the semantic similarity of Chinese terms; it can also measure the semantic similarity between the words of each term and the terms composed of those words.
Abstract: The invention discloses a data-driven method for calculating the semantic similarity of Chinese terms. The method comprises the following steps: a text story set is initialized; a relevancy relation graph model is established; the relevancy relation graph model is pruned using a tf-idf divisibility value; the pruned graph model is used as the input to the SimRank algorithm, and the semantic similarity between term pairs is calculated by iterating SimRank; a flexible semantic similarity measurement model is defined with this semantic similarity at its core; and collaborative segmentation is performed on Chinese news text based on the flexible semantic similarity measurement model. With this method, terms belonging to the same theme can be better distinguished from terms belonging to another theme, and the semantic similarity between the words of each term and the term composed of those words can also be measured. An experiment on a standard data set shows that, compared with an existing method, the F1-measure of collaborative segmentation of news stories increases by 11 percent in absolute terms.
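The patent does not say how the relevancy relation graph is built or how the tf-idf value is applied, so the sketch below makes two loud assumptions: terms are linked when they co-occur in the same story, and pruning drops every term whose best tf-idf score in any story falls below a threshold. The helpers `tfidf_keep_terms` and `cooccurrence_graph` and the toy stories are hypothetical. The resulting undirected adjacency lists are the kind of input a SimRank iteration consumes, with neighbors playing the role of in-neighbors.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set


def tfidf_keep_terms(docs: List[List[str]], threshold: float) -> Set[str]:
    """Keep terms whose highest tf-idf score in any document reaches threshold.

    Assumed scoring: tf = count / document length, idf = ln(N / doc frequency).
    """
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    best: Dict[str, float] = {}
    for doc in docs:
        for term, count in Counter(doc).items():
            score = (count / len(doc)) * math.log(n / df[term])
            best[term] = max(best.get(term, 0.0), score)
    return {t for t, s in best.items() if s >= threshold}


def cooccurrence_graph(docs: List[List[str]], keep: Set[str]) -> Dict[str, List[str]]:
    """Undirected graph: kept terms are linked if they co-occur in some document."""
    nbrs: Dict[str, Set[str]] = {t: set() for t in keep}
    for doc in docs:
        for u, v in combinations(sorted(set(doc) & keep), 2):
            nbrs[u].add(v)
            nbrs[v].add(u)
    return {t: sorted(ns) for t, ns in nbrs.items()}


stories = [["stock", "market", "rises"], ["stock", "market", "falls"], ["team", "wins"]]
kept = tfidf_keep_terms(stories, threshold=0.2)
print(sorted(kept))  # ['falls', 'rises', 'team', 'wins']
print(cooccurrence_graph(stories, kept)["team"])  # ['wins']
```

On this tiny corpus the threshold removes "stock" and "market" because they appear in two of three stories; on real news data the threshold would need tuning so that pruning removes noise without discarding theme-bearing terms.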

4 citations


Network Information
Related Topics (5)
Web page: 50.3K papers, 975.1K citations, 77% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 77% related
Ontology (information science): 57K papers, 869.1K citations, 77% related
Scalability: 50.9K papers, 931.6K citations, 74% related
Tree (data structure): 44.9K papers, 749.6K citations, 74% related
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2021  15
2020  26
2019  16
2018  17
2017  19
2016  16