
Showing papers on "SimRank" published in 2021


Journal ArticleDOI
TL;DR: Wang et al. present a multidimensional link prediction model for We the Media networks using public opinion data from Weibo.com; the model can evaluate the effects of different dimensions of public opinion factors on the prediction of user-node links.

11 citations


Journal ArticleDOI
TL;DR: This work introduces to the research community of expert and intelligent systems, for the first time, a stream-based version of the SimRank algorithm that runs over time-evolving graphs and outperforms other state-of-the-art methods in terms of accuracy and diversity.
Abstract: Recommender systems are among the most widespread applications of artificial intelligence techniques. For instance, news recommender systems help users manage the overload of information they come across when accessing news portals. In the news domain, time-awareness of recommendation approaches is crucial. However, most of these approaches fail to consider user sessions, which group the items that a user interacted with. In this paper, we study the problem of session-based recommendations by running SimRank on time-evolving heterogeneous graphs. In particular, we construct a dynamic heterogeneous multi-partite graph and adjust SimRank to run on it by using different (i) sliding time window sizes, (ii) sub-graphs used for model learning and (iii) sequential article weighting strategies. We evaluate our algorithms on two real-life datasets, and we show that our method outperforms other state-of-the-art methods in terms of accuracy and diversity. This work is significant because it introduces to the research community of expert and intelligent systems, for the first time, a stream-based version of the SimRank algorithm that is able to run over time-evolving graphs.

8 citations


Journal ArticleDOI
TL;DR: A novel similarity model, namely RoleSim*, is proposed, which accurately evaluates pairwise role similarities in a more comprehensive manner and achieves higher accuracy than its competitors while scaling well on sizable graphs with billions of edges.
Abstract: RoleSim and SimRank are among the popular graph-theoretic similarity measures with many applications in, e.g., web search, collaborative filtering, and sociometry. While RoleSim addresses the automorphic (role) equivalence of pairwise similarity which SimRank lacks, it ignores the neighboring similarity information outside the automorphically equivalent set. Consequently, two pairs of nodes, which are not automorphically equivalent by nature, cannot be well distinguished by RoleSim if the averages of their neighboring similarities over the automorphically equivalent set are the same. To alleviate this problem: 1) We propose a novel similarity model, namely RoleSim*, which accurately evaluates pairwise role similarities in a more comprehensive manner. RoleSim* not only guarantees the automorphic equivalence that SimRank lacks, but also takes into account the neighboring similarity information outside the automorphically equivalent sets that is overlooked by RoleSim. 2) We prove the existence and uniqueness of the RoleSim* solution, and show its three axiomatic properties (i.e., symmetry, boundedness, and non-increasing monotonicity). 3) We provide a concise bound for iteratively computing the RoleSim* formula, and estimate the number of iterations required to attain a desired accuracy. 4) We induce a distance metric based on RoleSim* similarity, and show that the RoleSim* metric fulfills the triangular inequality, which implies the sum-transitivity of its similarity scores. 5) We present a threshold-based RoleSim* model that reduces the computational time further with a provable accuracy guarantee. 6) We propose a single-source RoleSim* model, which scales well for sizable graphs. 7) We also devise methods to scale RoleSim*-based search by incorporating its triangular inequality property with partitioning techniques.
Our experimental results on real datasets demonstrate that RoleSim* achieves higher accuracy than its competitors while scaling well on sizable graphs with billions of edges.

6 citations


Journal ArticleDOI
TL;DR: This paper proposes a novel local-push-based algorithm for computing and tracking all-pairs SimRank and develops an iterative parallel two-step framework for local push to take advantage of modern hardware with multicore CPUs.
Abstract: Measuring similarity among data objects is important in data analysis and mining. SimRank is a popular link-based similarity measurement among nodes in a graph. To compute the all-pairs SimRank matrix accurately, iterative methods are usually used. For static graphs, current iterative solutions are not efficient enough, both in time and space, due to the unnecessary cost and storage inherent in iterative updating. For dynamic graphs, all current incremental solutions for updating the SimRank matrix are based on an approximated SimRank definition, and thus have no accuracy guarantee. In this paper, we propose a novel local-push-based algorithm for computing and tracking all-pairs SimRank. Furthermore, we develop an iterative parallel two-step framework for local push to take advantage of modern hardware with multicore CPUs. We show that our algorithms outperform the state-of-the-art methods.

4 citations


Proceedings ArticleDOI
26 Oct 2021
TL;DR: AdaSim is a recursive similarity measure based on the Ada philosophy; it is applicable to both directed and undirected graphs and provides identical accuracy to that of Ada on the first iteration.
Abstract: In the literature, various link-based similarity measures such as Adamic/Adar (in short Ada), SimRank, and random walk with restart (RWR) have been proposed. Contrary to SimRank and RWR, Ada is a non-recursive measure, which exploits the local graph structure in similarity computation. Motivated by Ada's promising results in various graph-related tasks, along with the fact that SimRank is a recursive generalization of the co-citation measure, in this paper, we propose AdaSim, a recursive similarity measure based on the Ada philosophy. Our AdaSim provides identical accuracy to that of Ada on the first iteration and it is applicable to both directed and undirected graphs. To accelerate our iterative form, we also propose a matrix form that is dramatically faster while providing the exact AdaSim scores. We conduct extensive experiments with five real-world datasets to evaluate both the effectiveness and efficiency of our AdaSim in comparison with those of existing similarity measures and graph embedding methods in the task of similarity computation of nodes. Our experimental results show that 1) AdaSim significantly improves the effectiveness of Ada and outperforms other competitors, 2) its efficiency is comparable to that of SimRank* while being better than the others, 3) AdaSim is not sensitive to parameter tuning, and 4) similarity measures are better than embedding methods at computing the similarity of nodes.
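For orientation, the non-recursive Adamic/Adar (Ada) measure that this abstract takes as its starting point can be sketched as below. This is the classic Ada score on an undirected graph, not the AdaSim recursion, whose formula is not given in the abstract. The function name `adamic_adar` and the dict-of-sets graph representation are assumptions made for illustration.

```python
import math

def adamic_adar(neighbors, x, y):
    """Classic Adamic/Adar score on an undirected graph (illustrative sketch).

    neighbors: dict mapping each node to the set of its neighbours (assumed layout).
    Each common neighbour z contributes 1 / log(degree(z)), so low-degree
    common neighbours count for more than hubs.
    """
    common = neighbors[x] & neighbors[y]
    # The degree > 1 guard avoids dividing by log(1) = 0 in degenerate inputs.
    return sum(
        1.0 / math.log(len(neighbors[z]))
        for z in common
        if len(neighbors[z]) > 1
    )

# Small undirected example: edges a-c, b-c, a-d, b-d, d-e.
graph = {
    "a": {"c", "d"},
    "b": {"c", "d"},
    "c": {"a", "b"},
    "d": {"a", "b", "e"},
    "e": {"d"},
}
score_ab = adamic_adar(graph, "a", "b")  # common neighbours c (degree 2) and d (degree 3)
```

Because the score only inspects shared neighbours, it is computed in a single pass, which is the "non-recursive, local structure" property the abstract contrasts with SimRank and RWR.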

3 citations


Journal ArticleDOI
05 Jun 2021
TL;DR: This paper presents ExactSim, the first algorithm that computes the exact single-source and top-k SimRank results on large graphs with precision up to 7 decimal places with high probability, and presents the first experimental study of the accuracy/cost trade-offs of existing approximate SimRank algorithms on large real-world graphs and synthetic graphs.
Abstract: SimRank is a popular measurement for evaluating the node-to-node similarities based on the graph topology. In recent years, single-source and top-k SimRank queries have received increasing attention due to their applications in web mining, social network analysis, and spam detection. However, a fundamental obstacle in studying SimRank has been the lack of ground truths. The only exact algorithm, Power Method, is computationally infeasible on graphs with more than $10^6$ nodes. Consequently, no existing work has evaluated the actual accuracy of various single-source and top-k SimRank algorithms on large real-world graphs. In this paper, we present ExactSim, the first algorithm that computes the exact single-source and top-k SimRank results on large graphs. This algorithm produces ground truths with precision up to 7 decimal places with high probability. With the ground truths computed by ExactSim, we present the first experimental study of the accuracy/cost trade-offs of existing approximate SimRank algorithms on large real-world graphs and synthetic graphs. Finally, we use the ground truths to exploit various properties of SimRank distributions on large graphs.
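To make the Power Method mentioned in this abstract concrete, here is a minimal sketch of the standard SimRank fixed-point iteration: s(a, a) = 1, and for a ≠ b, s(a, b) is the decay factor c times the average similarity over all pairs of in-neighbours of a and b. This is the textbook iteration, not ExactSim. The function name, the list-of-in-neighbour-lists input, and the values c = 0.6 and 10 iterations are assumptions chosen for illustration.

```python
def simrank_power_method(in_neighbors, c=0.6, iterations=10):
    """Naive all-pairs SimRank by fixed-point iteration (illustrative sketch).

    in_neighbors[v] lists the nodes with an edge into v (no duplicates assumed).
    Returns a dense n x n list of lists of similarity scores.
    """
    n = len(in_neighbors)
    # Start from the identity: every node is fully similar to itself only.
    S = [[1.0 if a == b else 0.0 for b in range(n)] for a in range(n)]
    for _ in range(iterations):
        nxt = [[0.0] * n for _ in range(n)]
        for a in range(n):
            nxt[a][a] = 1.0
            for b in range(a + 1, n):
                Ia, Ib = in_neighbors[a], in_neighbors[b]
                if Ia and Ib:  # pairs lacking in-neighbours keep score 0
                    total = sum(S[i][j] for i in Ia for j in Ib)
                    nxt[a][b] = nxt[b][a] = c * total / (len(Ia) * len(Ib))
        S = nxt
    return S

# Node 0 points to nodes 1 and 2, so nodes 1 and 2 share their only in-neighbour.
S_small = simrank_power_method([[], [0], [0]], c=0.6, iterations=5)
```

The sketch keeps a dense n × n score matrix, which for $10^6$ nodes would mean $10^{12}$ entries; that storage cost is one way to see why the abstract calls the Power Method infeasible at that scale.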

3 citations


Journal ArticleDOI
TL;DR: This work proposes a novel matrix random sampling approach to accelerate computation and reduce memory cost, and a fast sparse matrix-matrix multiplication technique that makes the time complexity of a single-source query independent of the graph size.

3 citations


Journal ArticleDOI
10 Jul 2021
TL;DR: In this paper, the authors introduced the idea of SimRank (that is, the similarity of a station is due to the similarity between its bicycle source station and destination station), and assigned weights to association relationships to define the similarity algorithm w-SimRank of stations.
Abstract: With the increasing popularity of shared bikes, the indiscriminate parking of bicycles in cities has increasingly become a difficulty in urban management. The tidal flow of large numbers of urban residents during rush hour is the root cause of the indiscriminate parking of bicycles at many subway stations and commercial areas. Optimizing the scheduling strategy of shared bikes is one of the effective ways to solve the problem of random parking and reduce the scheduling cost. The bicycle is a short-distance vehicle, and its circulation pattern is in line with the small-world characteristics of urban traffic. That is, most of the bikes flow within a small-world region, while only a small part of the bikes flow between small-world regions. With the massive accumulation of bike-sharing borrow and return data, the method of clustering the borrow and return stations and dividing regions according to the clustering results has attracted the attention of industry experts and researchers. It can be applied effectively in industries related to intelligent scheduling. Although there have been some studies on station clustering in the current literature, because these studies are basically based on the fixed features of the site (site location, pile number, etc.), the results cannot find an effective small-world region of bicycles. In order to find the effective small-world region of bicycles, we introduced the idea of SimRank (that is, the similarity of a station is due to the similarity of its bicycle source station and destination station), and assigned weights to association relationships (the number of times of borrowing and returning) to define the similarity algorithm w-SimRank of stations. Then, the station clustering was done in line with skyline thinking.
Finally, to verify the effectiveness of the algorithm, we implemented station clustering based on the SimRank algorithm, compared its clustering effect with that of the W-SimRank algorithm proposed in this paper, and analyzed the influence of the key parameters on the algorithm. And then

2 citations


Book ChapterDOI
01 Jan 2021
TL;DR: This chapter describes additions to some well-known link prediction methods that allow the weights of both vertices and edges in a weighted hashtag graph to be taken into account and investigates the performance of a new, graph neural network-based framework, SEAL, which has been shown in past trials to perform better than heuristic-based approaches such as the Katz index, SimRank and rooted PageRank.
Abstract: Twitter is a prominent multilingual social networking site where users can post messages known as “tweets”. Twitter, like other social networking sites such as Facebook, allows users to categorize tweets by the use of “hashtags”. Communication on Twitter can be mapped in terms of hashtag graphs, where vertices correspond to hashtags, and edges correspond to co-occurrences of hashtags within the same distinct tweet. Furthermore, a vertex in hashtag graphs can be weighted with the number of tweets a hashtag has occurred in, and edges can be weighted with the number of tweets both hashtags have co-occurred in, creating a “weighted hashtag graph”. In this chapter, we describe additions to some well-known link prediction methods that allow the weights of both vertices and edges in a weighted hashtag graph to be taken into account. We base our novel predictive additions on the assumption that more popular hashtags have a higher probability to appear with other hashtags in the future. We then apply these improved methods to three sets of Twitter data with the intent of predicting hashtag co-occurrences in the future. In addition to these methods, we investigate the performance of a new, graph neural network-based framework, SEAL, which has been shown in past trials to perform better than heuristic-based approaches such as the Katz index, SimRank and rooted PageRank. Experiments were conducted on real-life data sets consisting of over 3,000,000 combined unique tweets and over 250,000 combined unique hashtags. Results from the experiments show that simpler heuristic-based scoring methods have marginal performance that decreases with the addition of more data over time. On the other hand, SEAL is shown to have superior performance in hashtag graph link prediction over the approaches it has been previously compared against in other domains. 
The AUC score of 0.959 obtained in our experiments by using SEAL significantly exceeds those of our benchmark approaches for link prediction, which include the Katz index, SimRank, and rooted PageRank.
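The weighted hashtag graph described in this abstract can be sketched as follows: a vertex weight is the number of tweets a hashtag occurs in, and an edge weight is the number of tweets in which both hashtags co-occur. The function name `build_weighted_hashtag_graph` and the input layout (one list of already-normalised hashtag strings per tweet) are assumptions for illustration; the chapter's own data pipeline is not described here.

```python
from collections import Counter
from itertools import combinations

def build_weighted_hashtag_graph(tweets):
    """Build vertex and edge weights of a hashtag co-occurrence graph (sketch).

    tweets: iterable of hashtag lists, one list per tweet (assumed input format).
    Returns (vertex_weight, edge_weight), where edge keys are sorted hashtag pairs.
    """
    vertex_weight = Counter()
    edge_weight = Counter()
    for hashtags in tweets:
        # Deduplicate so a hashtag repeated inside one tweet is counted once.
        tags = sorted(set(hashtags))
        vertex_weight.update(tags)
        # Every unordered pair of distinct hashtags in the tweet is one co-occurrence.
        edge_weight.update(combinations(tags, 2))
    return vertex_weight, edge_weight

sample_tweets = [["a", "b"], ["b", "a", "c"], ["c"], ["a", "a"]]
vertex_weight, edge_weight = build_weighted_hashtag_graph(sample_tweets)
```

Sorting the hashtags before pairing gives each undirected edge a single canonical key, so ("a", "b") and ("b", "a") are never counted separately.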

2 citations


Journal ArticleDOI
TL;DR: Carmo is presented, an efficient algorithm for retrieving the top-k similarities from an arbitrary set of pairs, and two types of indexes are introduced to boost the efficiency of Carmo.
Abstract: Measuring similarities among different nodes is important in graph analysis tasks, such as link prediction, and recommendation. Among different similarity measures, SimRank is one of the most popular and promising ones, and has received a lot of research attention. While most current studies focus on single-pair, single-source/top-k, and all-pairs SimRank computation, few of them have studied finding similar pairs given a set of node pairs, which has attractive applications in personalized search and recommendation tasks. In this paper, we present Carmo, an efficient algorithm for retrieving the top-k similarities from an arbitrary set of pairs. In addition, we introduce two types of indexes to boost the efficiency of Carmo: one is hub-based, the other is tree-based. We show the effectiveness and efficiency of our proposed methods by extensive experiments.

2 citations


Journal ArticleDOI
TL;DR: Wang et al. propose JacSim* to address three drawbacks of JacSim, a SimRank variant that alleviates the pairwise normalization problem. JacSim* exploits the paths neglected by JacSim in similarity computation, its matrix form provides exact similarity scores while not being sensitive to the number of node-pairs with common neighbors, and its formulas in both iterative and matrix forms are simpler, easier to understand, and easier to implement than those of JacSim.
Abstract: Despite the fact that SimRank has been successfully applied to various applications as a link-based similarity measure, it suffers from a counter-intuitive property called a pairwise normalization problem; JacSim is a powerful variant of SimRank that alleviates this problem. In this paper, we first point out three existing drawbacks of JacSim and then propose JacSim* to effectively solve them; JacSim* exploits those paths neglected by JacSim in similarity computation, its matrix form provides the exact similarity scores while not being sensitive to the number of node-pairs with common neighbors, and it has simpler, easier to understand, and easier to implement formulas in both iterative and matrix forms than those of JacSim. We conduct extensive experiments with eight real-world datasets to evaluate both the accuracy and performance of JacSim* in comparison with those of JacSim. Our experimental results demonstrate that JacSim* shows better accuracy than JacSim and that the JacSim* matrix form is dramatically faster than its own iterative form and also than the two forms of JacSim on all datasets.

Proceedings ArticleDOI
26 Oct 2021
TL;DR: Wang et al. propose RG-SimRank (Random surfer Graph-based SimRank), which applies SimRank to compute similarities in a random surfer graph instead of the original network; it has the same form as SimRank and hence inherits its optimization techniques for similarity computation.
Abstract: Link-based similarity computation arises in many real applications, including web search, clustering and recommender systems. Many similarity measures have been proposed recently, but they share one undesirable drawback, called the "path missing" issue, i.e., the paths between objects are not fully considered for similarity computation. For example, SimRank considers only in-coming paths of equal length from a common "center" object, and a large portion of other paths are fully neglected. A comprehensive measure can be modeled by tallying all the possible paths between objects, but a large number of traversals would be required for these paths to fetch the similarities, which might increase the computational difficulty. In this paper, we propose a comprehensive similarity measure, namely RG-SimRank (Random surfer Graph-based SimRank), which resolves the "path missing" issue while inheriting the philosophy of SimRank. We build a random surfer graph by allowing the surfer to stay at the current object, or to go to other objects against in-links or along out-links. RG-SimRank applies SimRank to compute similarities in the random surfer graph instead of the original network; it has the same form as SimRank and hence inherits its optimization techniques for similarity computation. We prove that RG-SimRank considers all the possible paths of any direction and any length. It also provides a general solution to assess similarities, under which many existing similarity measures become its special cases. Other similarity measures besides SimRank can also be enhanced similarly using the random surfer graph. Extensive experiments on real datasets demonstrate the performance of the proposed approach.

Journal ArticleDOI
TL;DR: This work proposes a novel framework named SEGNN, which aims at finding and using sparse representation knowledge to improve the results of image detection, and applies a novel SimRank method to justify the rationality of the semantic reasoning.

Journal ArticleDOI
01 Jan 2021-PLOS ONE
TL;DR: Zhang et al. propose a link-based semantic similarity search method, namely PictureSim, for effectively searching similar pictures by building a picture-tag network, in which tags and pictures are treated as nodes, and relationships between pictures and tags are regarded as edges.
Abstract: Searching for pictures similar to a given picture is an important task in numerous applications, including image recommendation systems, image classification and image retrieval. Previous studies mainly focused on content similarity, which is measured on visual features such as color and shape, and few of them pay enough attention to semantics. In this paper, we propose a link-based semantic similarity search method, namely PictureSim, for effectively searching similar pictures by building a picture-tag network. The picture-tag network is built from "description" relationships between pictures and tags, in which tags and pictures are treated as nodes, and relationships between pictures and tags are regarded as edges. Then we design a TF-IDF-based model to remove the noisy links, so that the traversals of these links can be reduced. We observe that "similar pictures contain similar tags, and similar tags describe similar pictures", which is consistent with the intuition of SimRank. Consequently, we utilize the SimRank algorithm to compute the similarity scores between pictures. Compared with content-based methods, PictureSim can effectively search similar pictures semantically. Extensive experiments on real datasets demonstrate the effectiveness and efficiency of PictureSim.

Journal ArticleDOI
01 Jul 2021
TL;DR: Various web page ranking algorithms such as Page Rank, Time Rank, EigenRumor, Distance Rank, SimRank, etc. are analyzed and compared based on some parameters, including the mining technique to which the algorithm belongs, the methodology used for ranking web pages, time complexity, input parameters, and the result relevancy to the user query.
Abstract: Due to the daily expansion of the web, the amount of information has increased significantly. Thus, the need for retrieving relevant information has also increased. In order to explore the internet, users depend on various search engines. Search engines face a significant challenge in returning the most relevant results for a user's query. The search engine's performance is determined by the algorithm used to rank web pages, which prioritizes the pages with the most relevancy to appear at the top of the result page. In this paper, various web page ranking algorithms such as Page Rank, Time Rank, EigenRumor, Distance Rank, SimRank, etc. are analyzed and compared based on some parameters, including the mining technique to which the algorithm belongs (for instance, Web Content Mining, Web Structure Mining, and Web Usage Mining), the methodology used for ranking web pages, time complexity (amount of time to run an algorithm), input parameters (parameters utilized in the ranking process such as InLink, OutLink, Tag name, Keyword, etc.), and the result relevancy to the user query.