
Showing papers on "SimRank" published in 2018


Journal Article
TL;DR: This paper proposes a novel Mashup service clustering approach based on structural similarity and a genetic-algorithm-based clustering algorithm. The approach clusters Mashup services efficiently without any constraint on the number of clusters, and it performs better than Mashup service clustering approaches based on semantic metrics.

48 citations


Proceedings Article
16 Apr 2018
TL;DR: This paper proposes a novel local push based algorithm for computing all-pairs SimRank and shows that its algorithms outperform the state-of-the-art static and dynamic all-Pair SimRank algorithms.
Abstract: SimRank is a popular link-based similarity measure for nodes in a graph. To compute the all-pairs SimRank matrix accurately, iterative methods are usually used. For static graphs, current iterative solutions are not efficient enough in either time or space, because iterative updating inherently incurs unnecessary computation and storage. For dynamic graphs, all current incremental solutions for updating the SimRank matrix are based on an approximated SimRank definition and thus have no accuracy guarantee. In this paper, we propose a novel local push based algorithm for computing all-pairs SimRank. We show that our algorithms outperform the state-of-the-art static and dynamic all-pairs SimRank algorithms.

20 citations
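
For reference, the iterative baseline mentioned in the abstract can be sketched as a minimal, pure-Python version of Jeh and Widom's all-pairs SimRank recurrence. This is not the local push algorithm proposed in the paper; the decay factor c = 0.6, the fixed iteration count and the name `simrank_all_pairs` are illustrative assumptions. Every iteration visits every node pair and every pair of their in-neighbors, which is the time and space cost the paper aims to avoid.

```python
from itertools import product


def simrank_all_pairs(in_nbrs, c=0.6, iterations=10):
    """Naive iterative all-pairs SimRank (Jeh & Widom recurrence).

    in_nbrs maps each node to the list of its in-neighbors.
    Returns a dict-of-dicts s with s[a][b] for every node pair.
    """
    nodes = list(in_nbrs)
    # S_0 is the identity: initially a node is similar only to itself.
    s = {a: {b: 1.0 if a == b else 0.0 for b in nodes} for a in nodes}
    for _ in range(iterations):
        new = {a: {} for a in nodes}
        for a, b in product(nodes, repeat=2):
            if a == b:
                new[a][b] = 1.0
            elif not in_nbrs[a] or not in_nbrs[b]:
                new[a][b] = 0.0  # a node without in-neighbors scores 0
            else:
                # Average similarity over all pairs of in-neighbors, decayed by c.
                total = sum(s[x][y] for x in in_nbrs[a] for y in in_nbrs[b])
                new[a][b] = c * total / (len(in_nbrs[a]) * len(in_nbrs[b]))
        s = new
    return s
```

For the graph u→a, u→b, nodes a and b share their only in-neighbor, so s(a, b) = c · s(u, u) = 0.6.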


Journal Article
TL;DR: A Monte Carlo based method that enables fast top-k SimRank computation over large undirected graphs; it outperforms the state-of-the-art methods by orders of magnitude and is extended to existing distributed graph processing frameworks to improve its scalability.
Abstract: SimRank is an important measure of vertex-pair similarity according to the structure of graphs. Although progress has been achieved, existing methods still face challenges in handling large graphs. Besides huge index construction and maintenance costs, existing methods may require considerable search space and time overheads in the online SimRank query. In this paper, we design a Monte Carlo based method, UniWalk, to enable fast top-$k$ SimRank computation over large undirected graphs. UniWalk directly locates the top-$k$ similar vertices for any single source vertex $u$ via $R$ sampling paths originating from $u$, which avoids selecting a candidate vertex set $\mathcal{C}$ and the subsequent $O(|\mathcal{C}|R)$ bidirectional sampling paths. We also devise a path enumeration strategy to improve the SimRank precision by using path probabilities instead of path frequencies when sampling, a space-efficient method to reduce intermediate results, and a path-sharing strategy to lower the redundant path sampling cost for multiple source vertices. Furthermore, we extend UniWalk to existing distributed graph processing frameworks to improve its scalability. We conduct extensive experiments to illustrate that UniWalk has high scalability and outperforms the state-of-the-art methods by orders of magnitude.

14 citations
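
The random-walk view of SimRank behind Monte Carlo methods such as UniWalk can be illustrated with the classic pairwise estimator: on an undirected graph, s(u, v) equals the expected value of c^τ, where τ is the first step at which two simultaneous random walks from u and v land on the same vertex. The sketch below is that generic two-walk estimator, not UniWalk's single-source path sampling or path enumeration; the walk count, the truncation length `max_len`, c = 0.6 and the name `mc_simrank` are assumptions. Truncating walks at `max_len` slightly underestimates the score.

```python
import random


def mc_simrank(adj, u, v, c=0.6, num_walks=1000, max_len=10, seed=0):
    """Monte Carlo estimate of SimRank s(u, v) on an undirected graph.

    adj maps each vertex to its list of neighbors. Two random walks start
    at u and v and move in lockstep; if they first meet after t steps the
    sample contributes c**t, otherwise it contributes 0.
    """
    if u == v:
        return 1.0
    rng = random.Random(seed)  # seeded for reproducible estimates
    total = 0.0
    for _ in range(num_walks):
        a, b = u, v
        for t in range(1, max_len + 1):
            if not adj[a] or not adj[b]:
                break  # a walk is stuck at an isolated vertex: no meeting
            a = rng.choice(adj[a])
            b = rng.choice(adj[b])
            if a == b:
                total += c ** t  # first meeting after t steps
                break
    return total / num_walks
```

On a star with center 0, two leaves always meet at the center after one step, so the estimate is exactly c; the center and a leaf never meet (the star is bipartite), so their estimate is 0.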


Journal Article
01 Feb 2018
TL;DR: The efficient dynamical computation of all-pairs SimRanks on time-varying graphs is studied and it is shown that the SimRank update in response to every link update is expressible as a rank-one Sylvester matrix equation.
Abstract: SimRank is an appealing pair-wise similarity measure based on graph structure. It iteratively follows the intuition that two nodes are assessed as similar if they are pointed to by similar nodes. Many real graphs are large, and links are constantly subject to minor changes. In this article, we study the efficient dynamical computation of all-pairs SimRanks on time-varying graphs. Existing methods for the dynamical SimRank computation [e.g., LTSF (Shao et al. in PVLDB 8(8):838-849, 2015) and READS (Zhang et al. in PVLDB 10(5):601-612, 2017)] mainly focus on top-k search with respect to a given query. For all-pairs dynamical SimRank search, Li et al. (EDBT, 2010) proposed an approach that first factorizes the graph via a singular value decomposition (SVD) and then incrementally maintains this factorization in response to link updates at the expense of exactness. As a result, all pairs of SimRanks are updated approximately, requiring $O(r^4 n^2)$ time and $O(r^2 n^2)$ memory in a graph with $n$ nodes, where $r$ is the target rank of the low-rank SVD. Our solution to the dynamical computation of SimRank comprises five ingredients: (1) We first consider edge updates that are not accompanied by new node insertions. We show that the SimRank update $\Delta\mathbf{S}$ in response to every link update is expressible as a rank-one Sylvester matrix equation. This provides an incremental method requiring $O(Kn^2)$ time and $O(n^2)$ memory in the worst case to update $n^2$ pairs of similarities for $K$ iterations. (2) To speed up the computation further, we propose a lossless pruning strategy that captures the "affected areas" of $\Delta\mathbf{S}$ to eliminate unnecessary retrieval.
This reduces the time of the incremental SimRank to $O(K(m+|\mathsf{AFF}|))$, where $m$ is the number of edges in the old graph and $|\mathsf{AFF}|$ ($\le n^2$) is the size of the "affected areas" in $\Delta\mathbf{S}$; in practice, $|\mathsf{AFF}| \ll n^2$. (3) We also consider edge updates that accompany node insertions, and categorize them into three cases according to which end of the inserted edge is a new node. For each case, we devise an efficient incremental algorithm that supports new node insertions and accurately updates the affected SimRanks. (4) We next study batch updates for dynamical SimRank computation, and design an efficient batch incremental method that handles "similar sink edges" simultaneously and eliminates redundant edge updates. (5) To achieve linear memory, we devise a memory-efficient strategy that dynamically updates all pairs of SimRanks column by column in just $O(Kn+m)$ memory, without the need to store all $n^2$ pairs of old SimRank scores. Experimental studies on various datasets demonstrate that our solution substantially outperforms the existing incremental SimRank methods and is faster and more memory-efficient than its competitors on million-scale graphs.

12 citations
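
The Sylvester-equation claim can be made concrete with a short derivation. It assumes the matrix form $\mathbf{S} = c\,\mathbf{Q}\mathbf{S}\mathbf{Q}^{\top} + (1-c)\,\mathbf{I}_n$ that is common in incremental SimRank work (it differs slightly from Jeh and Widom's original definition on the diagonal), where $\mathbf{Q}$ is the row-normalized backward transition matrix, and it models a single edge update as a rank-one change $\tilde{\mathbf{Q}} = \mathbf{Q} + \mathbf{u}\mathbf{v}^{\top}$. This is a sketch of how such an equation can arise, not necessarily the article's exact formulation.

```latex
\begin{aligned}
\Delta\mathbf{S} &= \tilde{\mathbf{S}} - \mathbf{S}
  = c\,\tilde{\mathbf{Q}}\,\Delta\mathbf{S}\,\tilde{\mathbf{Q}}^{\top}
  + c\left(\tilde{\mathbf{Q}}\mathbf{S}\tilde{\mathbf{Q}}^{\top} - \mathbf{Q}\mathbf{S}\mathbf{Q}^{\top}\right)
  = c\,\tilde{\mathbf{Q}}\,\Delta\mathbf{S}\,\tilde{\mathbf{Q}}^{\top}
  + c\left(\mathbf{u}\mathbf{z}^{\top} + \mathbf{z}\mathbf{u}^{\top}\right),\\
\mathbf{z} &= \mathbf{Q}\mathbf{S}\mathbf{v} + \tfrac{1}{2}\left(\mathbf{v}^{\top}\mathbf{S}\mathbf{v}\right)\mathbf{u}
  \quad\text{(using } \mathbf{S} = \mathbf{S}^{\top}\text{)},\\
\Delta\mathbf{S} &= \mathbf{M} + \mathbf{M}^{\top},
  \qquad
  \mathbf{M} = c\,\tilde{\mathbf{Q}}\,\mathbf{M}\,\tilde{\mathbf{Q}}^{\top} + c\,\mathbf{u}\mathbf{z}^{\top}.
\end{aligned}
```

Because the forcing term $c\,\mathbf{u}\mathbf{z}^{\top}$ has rank one, $\mathbf{M} = \sum_{j \ge 0} c^{j+1} (\tilde{\mathbf{Q}}^{j}\mathbf{u})(\tilde{\mathbf{Q}}^{j}\mathbf{z})^{\top}$ can be truncated after $K$ terms, each an outer product costing $O(n^2)$. This is consistent with the $O(Kn^2)$ time bound quoted in the abstract.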


Journal Article
TL;DR: This paper proposes a method for community discovery using distributed robust NMF with SimRank similarity measure that has better performance and robustness and good scalability and hence can be used to discover communities in the large-scale complex networks.
Abstract: Nonnegative matrix factorization (NMF) has become a powerful model for community discovery in complex networks. Existing NMF-based methods for community discovery often factorize the adjacency matrix of a complex network to obtain its community indicator matrix. However, the adjacency matrix cannot represent the global structural features of complex networks very well, which degrades the performance of community discovery. Besides, most existing methods are not robust and scalable enough, so they are not effective on noisy, large-scale complex networks. To address these problems, in this paper we propose a method for community discovery using distributed robust NMF with the SimRank similarity measure. This method uses SimRank to construct the feature matrix, which represents the global structural features of complex networks more accurately. To improve robustness, we use the $\ell_{2,1}$ norm instead of the widely used Frobenius norm to construct its NMF-based community discovery model. In addition, to improve scalability, we implement its key components, including computing the SimRank feature matrix and iteratively solving the NMF-based community discovery model, using the MapReduce distributed computing framework. We conduct extensive experiments on several typical complex networks. The results show that our method has better performance and robustness than other representative NMF-based methods for community discovery. Moreover, our method scales well and hence can be used to discover communities in large-scale complex networks.

10 citations
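
The robustness argument for the $\ell_{2,1}$ norm can be seen numerically: it sums the Euclidean norms of the residual columns, so an outlying column contributes linearly, whereas the squared Frobenius norm lets it contribute quadratically. The sketch below treats columns as data points, which is one common convention (some papers sum over rows instead); the function names are illustrative and the code is not taken from the paper.

```python
import math


def l21_norm(matrix):
    """l2,1 norm: sum of the Euclidean norms of the columns.

    matrix is a list of equal-length rows; each column is treated as
    one data point (a convention assumed here).
    """
    n_cols = len(matrix[0])
    return sum(
        math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_cols)
    )


def frobenius_sq(matrix):
    """Squared Frobenius norm: sum of the squared entries."""
    return sum(x * x for row in matrix for x in row)
```

With residual columns of norm 1, 1 and 10 (for example `[[1, 1, 10], [0, 0, 0]]`), the outlier accounts for 10 of 12 in the $\ell_{2,1}$ objective but 100 of 102 in the squared Frobenius objective, so a few noisy nodes dominate the fit far less.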


Journal Article
TL;DR: This article presents a novel service clustering approach that adopts a bipartite network to describe the topological structure of service usage histories and uses a SimRank algorithm to measure the topological similarity of services.
Abstract: This article observes that the sheer number and variety of services make it difficult to discover the desired services accurately. Service clustering is an effective way to facilitate service discovery. However, existing approaches are usually designed for a single type of service document and do not fully use the topic and topological information in service profiles and usage histories. To avoid these limitations, this article presents a novel service clustering approach. It adopts a bipartite network to describe the topological structure of service usage histories and uses a SimRank algorithm to measure the topological similarity of services; applies Latent Dirichlet Allocation to extract topics from service profiles and quantifies the topic similarity of services; quantifies the overall similarity of services by integrating topological and topic similarities; and uses the Chameleon clustering algorithm to cluster the services. The empirical evaluation on a real-world data set highlights the benefits of combining topological and topic similarities.

10 citations
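
Measuring topological similarity on a bipartite usage network can be sketched with the bipartite form of SimRank from Jeh and Widom: two users (or Mashups) are similar if they invoke similar services, and two services are similar if they are invoked by similar users. This is a generic illustration, not the article's implementation; the decay factors `c1` and `c2`, the iteration count and the name `bipartite_simrank` are assumptions, and the LDA topic similarity and Chameleon clustering steps are not shown.

```python
def bipartite_simrank(usage, c1=0.8, c2=0.8, iterations=10):
    """Bipartite SimRank on a user-service usage graph.

    usage maps each user (or Mashup) to the set of services it invokes.
    Returns (user_sim, service_sim), dicts keyed by (node, node) pairs.
    """
    users = list(usage)
    services = sorted({s for used in usage.values() for s in used})
    # callers[s] lists the users that invoke service s (never empty here).
    callers = {s: [u for u in users if s in usage[u]] for s in services}

    def identity(nodes):
        return {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}

    user_sim, service_sim = identity(users), identity(services)
    for _ in range(iterations):
        # Users are similar if the services they invoke are similar.
        new_user = {}
        for a in users:
            for b in users:
                if a == b:
                    new_user[a, b] = 1.0
                elif usage[a] and usage[b]:
                    total = sum(service_sim[x, y] for x in usage[a] for y in usage[b])
                    new_user[a, b] = c1 * total / (len(usage[a]) * len(usage[b]))
                else:
                    new_user[a, b] = 0.0
        # Services are similar if the users invoking them are similar.
        new_service = {}
        for a in services:
            for b in services:
                if a == b:
                    new_service[a, b] = 1.0
                else:
                    total = sum(user_sim[x, y] for x in callers[a] for y in callers[b])
                    new_service[a, b] = c2 * total / (len(callers[a]) * len(callers[b]))
        user_sim, service_sim = new_user, new_service
    return user_sim, service_sim
```

Two services invoked only by the same single Mashup get similarity `c2`; services that share no caller still become similar when their callers share services.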


Proceedings Article
01 Mar 2018
TL;DR: This paper discusses the similarity search algorithms PathSim and SimRank, and suggests that the efficiency of a website improves if each algorithm is used in its respective scenario.
Abstract: Recommender systems and web search engines have gained a lot of importance on today's digital platforms. In today's digital world, everything (from buying to selling) has moved to internet platforms. Because of the huge amount of data, large-scale processing is required. Today, large amounts of data are obtained from e-commerce services, application data, web data, etc. This large-scale data processing involves many similarity search algorithms for giving recommendations. Many e-commerce services and applications use similarity search to give valuable suggestions and show related documents. In this paper, we discuss two similarity search algorithms, PathSim and SimRank. We compare and contrast both algorithms on different datasets. We suggest that the efficiency of a website improves if each algorithm is used in its respective scenario. The time complexities of both algorithms are also compared.

8 citations
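
To contrast with SimRank's random-walk formulation, PathSim (Sun et al., 2011) scores two objects of the same type along a symmetric meta-path, such as Author-Paper-Author, from a commuting matrix $M = AA^{\top}$: $s(x, y) = 2M_{xy} / (M_{xx} + M_{yy})$. The sketch below computes it for a tiny author-paper matrix; the function names and the example data are illustrative, not from the paper.

```python
def commuting_matrix(adj):
    """M = A A^T for a symmetric meta-path such as Author-Paper-Author.

    adj[i][k] is 1 if object i (e.g. an author) links to object k
    (e.g. a paper); M[i][j] then counts the meta-path instances i -> j.
    """
    n = len(adj)
    return [[sum(x * y for x, y in zip(adj[i], adj[j])) for j in range(n)]
            for i in range(n)]


def pathsim(m, x, y):
    """PathSim: 2 * paths(x, y) / (paths(x, x) + paths(y, y))."""
    denom = m[x][x] + m[y][y]
    return 2 * m[x][y] / denom if denom else 0.0
```

If author 0 wrote papers 0 and 1 and author 1 wrote only paper 0, then `M = [[2, 1], [1, 1]]` and PathSim gives 2 · 1 / (2 + 1) = 2/3. Unlike SimRank, which needs iteration over all node pairs, PathSim only needs the commuting matrix for the chosen meta-path.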


Patent
16 Oct 2018
TL;DR: In this paper, a collaborative filtering video recommendation method that considers dynamic changes in user preference is proposed. It comprises the steps of data pre-processing, model training and sorting: data pre-processing mainly turns the original data into the formatted learning sample set required for model training, and the training model mainly learns user characteristics and video characteristics from the generated samples and is mainly composed of a parameter matrix, a BPR model and a SimRank model.
Abstract: The invention discloses a collaborative filtering video recommendation method that considers dynamic changes in user preference. The method comprises the steps of data pre-processing, model training and sorting. Data pre-processing mainly processes the original data to generate the formatted learning sample set required for model training; the training model mainly learns user characteristics and video characteristics from the generated samples and is mainly composed of a parameter matrix, a BPR model and a SimRank model. When the system is ready to recommend videos to users, a recommendation engine first reads the users and videos recorded by the back end, together with the corresponding metadata, into a pre-processing module; a training module then initializes the characteristic parameters to be learned, and BPR learning and SimRank learning are carried out separately on the corresponding learning samples supplied by the data pre-processing module; finally, the videos are sorted and recommended according to the trained user characteristics and video characteristics. The method has the advantage that user preference is modeled dynamically without increasing the time complexity, thereby improving the accuracy of recommendation.

7 citations
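
The BPR component can be illustrated with the standard stochastic gradient step of Bayesian Personalized Ranking for matrix factorization (Rendle et al.): for a user u, an item i the user interacted with and a sampled item j, it increases ln σ(x_uij) with x_uij = p_u · (q_i − q_j), minus an L2 penalty. How the patent couples BPR with its SimRank model is not described in the abstract, so this sketch shows only the generic BPR update; the learning rate, regularization weight, example factors and the name `bpr_step` are assumptions.

```python
import math


def bpr_step(p_u, q_i, q_j, lr=0.05, reg=0.01):
    """One SGD step of BPR for matrix factorization.

    p_u: user factors; q_i: factors of an item the user interacted with;
    q_j: factors of a sampled item the user did not interact with.
    Updates the three lists in place and returns x_uij before the step.
    """
    x_uij = sum(pu * (qi - qj) for pu, qi, qj in zip(p_u, q_i, q_j))
    # d/dx ln(sigmoid(x)) = sigmoid(-x) = 1 / (1 + exp(x)).
    g = 1.0 / (1.0 + math.exp(x_uij))
    for k in range(len(p_u)):
        pu, qi, qj = p_u[k], q_i[k], q_j[k]  # use pre-update values
        p_u[k] += lr * (g * (qi - qj) - reg * pu)
        q_i[k] += lr * (g * pu - reg * qi)
        q_j[k] += lr * (-g * pu - reg * qj)
    return x_uij
```

Repeating the step on the same triple pushes the user's score for item i above the score for item j, which is the pairwise preference BPR is designed to learn.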


Journal Article
TL;DR: The heterogeneous information network is introduced to build a weighted travel network with spatial–temporal GPS trajectories and shows that a meta-path combination is more effective than the state-of-the-art approaches and can be efficiently computed.
Abstract: To provide travel recommendations and planning in intelligent transportation systems (ITS), we must be able to find similar travel patterns among users based on their real mobility traces. Various methods have been proposed to measure the similarity of users' travel behavior, but they usually rely on only a single attribute-related metric. In comparison, studies of the semantic relationships between travel attributes remain scarce, making it difficult to construct a complete mobility pattern that reveals the relevance between users or groups. In this paper, we introduced a heterogeneous information network to build a weighted travel network from spatial–temporal GPS trajectories. The heterogeneous network allows similar users to be clustered based on the connections between different attributes instead of attribute values. On this basis, we defined meta-paths for travel and used each meta-path to formulate a similarity measure over users by improving the existing PathSim (meta-path-based similarity measure) and SimRank. Next, we aggregated the different similarities, with each meta-path automatically weighted by a learning algorithm, to make predictions. The experimental results showed that the recall of the similarity measurement algorithm improved when multiple meta-paths were used, yielding better results than the algorithm using a single meta-path. Under different data scales, the improved PathSim model achieved 15% higher precision and 21% higher recall than the improved SimRank model. In terms of area under the curve (AUC), our experiments also show that a meta-path combination is more effective than the state-of-the-art approaches and can be computed efficiently.

6 citations


Journal Article
Jinshan Qi, Xun Liang, Xiaoping Zhou, Zhiyu Li, Yu Liu, Hengchao Cheng
07 May 2018 - PLOS ONE
TL;DR: A novel edge weighting method, which balances both local and global weighting based on the idea of shared neighbor ranging between users and the interpersonal significance of the social network community, and which outperforms several conventional weighting methods.
Abstract: Community discovery is one of the most popular issues in analyzing and understanding a network. Previous research suggests that discovery can be enhanced by assigning weights to the edges of the network. This paper proposes a novel edge weighting method, which balances both local and global weighting based on the idea of shared neighbor ranging between users and the interpersonal significance of the social network community. We assume that users belonging to the same community have similar relationship network structures. By controlling the measure of "neighborhood", this method can adapt well to real-world networks, and the well-known similarity measure SimRank can be regarded as a special case of it. Based on the practical significance of social networks, we propose a new evaluation method that uses the communication rate to measure the quality of the resulting partition, which expresses users' interaction relations better than the ordinary modularity Q. Furthermore, the fast Newman algorithm is extended to weighted networks. In addition, we use four real networks from Sina, the largest Chinese micro-blogging website. The results of experiments demonstrate that the proposed method easily meets the balancing requirements and is more robust to different kinds of networks. The experimental results also indicate that the proposed algorithm outperforms several conventional weighting methods.

5 citations
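
The "ordinary modularity Q" that the paper's evaluation method is compared against, and which the fast Newman algorithm greedily maximizes, has a direct weighted form: $Q = \sum_c \left[W_{in}(c)/W - \left(S(c)/2W\right)^2\right]$, where $W$ is the total edge weight, $W_{in}(c)$ the weight inside community $c$ and $S(c)$ the total weighted degree of its nodes. The sketch below computes it for a given partition; it is not the paper's communication-rate measure or its edge weighting scheme, and the function name and input format are assumptions.

```python
from collections import defaultdict


def weighted_modularity(edges, community):
    """Newman's modularity Q for an undirected weighted graph.

    edges: list of (u, v, w) with u != v, each undirected edge listed once
    (at least one edge is assumed). community: dict node -> community label.
    """
    total = sum(w for _, _, w in edges)
    inside = defaultdict(float)    # W_in(c): weight of edges inside c
    strength = defaultdict(float)  # S(c): summed weighted degree of c
    for u, v, w in edges:
        strength[community[u]] += w
        strength[community[v]] += w
        if community[u] == community[v]:
            inside[community[u]] += w
    return sum(inside[c] / total - (strength[c] / (2 * total)) ** 2
               for c in strength)
```

For two disjoint unit-weight edges placed in separate communities, Q = 2 · (1/2 − 1/4) = 0.5; putting all four nodes in one community gives Q = 0.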


Journal Article
TL;DR: A new computational method based on SimRank and a density-based clustering recommender model for miRNA-disease association prediction (SRMDAP) is presented; the results suggest the excellent performance of SRMDAP in predicting miRNA-disease associations.
Abstract: Aberrant expression of microRNAs (miRNAs) can be applied for the diagnosis, prognosis, and treatment of human diseases. Identifying the relationship between miRNA and human disease is important to further investigate the pathogenesis of human diseases. However, experimental identification of the associations between diseases and miRNAs is time-consuming and expensive. Computational methods are efficient approaches to determine the potential associations between diseases and miRNAs. This paper presents a new computational method based on the SimRank and density-based clustering recommender model for miRNA-disease associations prediction (SRMDAP). The AUC of 0.8838 based on leave-one-out cross-validation and case studies suggested the excellent performance of the SRMDAP in predicting miRNA-disease associations. SRMDAP could also predict diseases without any related miRNAs and miRNAs without any related diseases.

Book Chapter
15 Nov 2018
TL;DR: Measuring the SimRank similarity of all vertex pairs in a graph is an important research topic with a wide range of applications in many fields; however, computing SimRank is costly in both time and space, which leaves traditional computing methods unable to handle graph data of ever-growing size.
Abstract: Measuring the SimRank similarity of all vertex pairs in a graph is an important research topic with a wide range of applications in many fields. However, computing SimRank is costly in both time and space, which leaves traditional computing methods unable to handle graph data of ever-growing size.

Book Chapter
17 Sep 2018
TL;DR: This paper provides a comparison of several models for future course grade prediction based on three matrix factorization methods and attempts to improve the existing techniques by combining matrix factorization with prior knowledge about the similarity between students and courses calculated using the SimRank algorithm.
Abstract: The accurate estimation of students’ grades in prospective courses is important as it can support the procedure of making an informed choice concerning the selection of next semester courses. As a consequence, the process of creating personal academic pathways is facilitated. This paper provides a comparison of several models for future course grade prediction based on three matrix factorization methods. We attempt to improve the existing techniques by combining matrix factorization with prior knowledge about the similarity between students and courses calculated using the SimRank algorithm. The evaluation of the proposed models is conducted on an internal dataset of anonymized student record data.

Proceedings Article
17 Oct 2018
TL;DR: This work proposes a graph-theoretic similarity measure that is natively multiperspective, and introduces a novel model for learning and reflecting diverse similarity perceptions given the hypergraph, yielding the similarity score between any pair of objects from any perspective.
Abstract: Determining the similarity between two objects is pertinent to many applications. When the basis for similarity is a set of object-to-object relationships, it is natural to rely on graph-theoretic measures. One seminal technique for measuring the structural-context similarity between a pair of graph vertices is SimRank, whose underlying intuition is that two objects are similar if they are connected by similar objects. However, by design, SimRank as well as its variants capture only a single view or perspective of similarity. Meanwhile, in many real-world scenarios, there emerge multiple perspectives of similarity, i.e., two objects may be similar from one perspective, but dissimilar from another. For instance, human subjects may generate varied, yet valid, clusterings of objects. In this work, we propose a graph-theoretic similarity measure that is natively multiperspective. In our approach, the observed object-to-object relationships due to various perspectives are integrated into a unified graph-based representation, stylised as a hypergraph to retain the distinct perspectives. We then introduce a novel model for learning and reflecting diverse similarity perceptions given the hypergraph, yielding the similarity score between any pair of objects from any perspective. In addition to proposing an algorithm for computing the similarity scores, we also provide theoretical guarantees on the convergence of the algorithm. Experiments on public datasets show that the proposed model deals better with multiperspectivity than the baselines.

Patent
18 Dec 2018
TL;DR: In this paper, a disease-associated LncRNA prediction method and device based on a dichotomous network are proposed. The method comprises the following steps: constructing a dichotomous network of diseases and LncRNAs from a data set of known LncRNA-disease associations; calculating disease similarity I and LncRNA similarity I based on common neighbors; calculating disease similarity II and LncRNA similarity II based on SimRank similarity; obtaining extended disease similarity and extended LncRNA similarity; and feeding the extended disease and LncRNA similarities back into the dichotomous network to calculate the degree of association between diseases and LncRNAs.
Abstract: The invention discloses a disease-associated LncRNA prediction method and device based on a dichotomous network, wherein the method comprises the following steps: constructing a dichotomous network of diseases and LncRNAs according to a data set of known association relationships between LncRNAs and diseases; calculating disease similarity I and LncRNA similarity I based on common neighbors; calculating disease similarity II and LncRNA similarity II based on SimRank similarity; obtaining extended disease similarity and extended LncRNA similarity; and feeding the extended disease similarity and extended LncRNA similarity back into the dichotomous network to calculate the degree of association between diseases and LncRNAs. The invention can construct a dichotomous network from information on known disease-related LncRNAs to infer potential associations between the two, thereby greatly reducing the experimental workload.

Proceedings ArticleDOI
01 Jan 2018
TL;DR: The goal of the presented system is to identify how user-item ratings affect user friendship relations in order to make correct recommendations; experimental analysis is carried out to evaluate the accuracy of the system.
Abstract: This paper presents a recommender system based on game theory in which recommendations are made from user-item ratings. User-item ratings are the most essential factor for a social network to maintain the social relationships among its users. A social network cannot force all of its users to rate items, and no technique for doing so exists yet. In this paper, game theory and SimRank (Similarity Based on Random Walk) are used as the core algorithms to build the recommender system. The user-item ratings dataset is decomposed into similar groups based on the user ratings by game theory. The similarities among 'similar interest' users are calculated with the SimRank algorithm. Based on the user similarity information, user profiles and the rating dataset, the presented system provides proper item recommendations to its users. The goal of the presented system is to identify how user-item ratings affect user friendship relations in order to make correct recommendations, and experimental analysis is carried out to evaluate the accuracy of the system.

Journal Article
TL;DR: A new dimensionality reduction method, called SSPP, finds a subspace that preserves the semantic similarity among data, represented as SimRank similarity on a bipartite graph, and outperforms the baseline and previous methods.