Showing papers on "SimRank" published in 2018

Open Access

Journal Article•DOI•

Structure-aware Mashup service Clustering for cloud-based Internet of Things using genetic algorithm based clustering algorithm

Weifeng Pan¹, Chunlai Chai¹•Institutions (1)

Zhejiang Gongshang University¹

01 Oct 2018-Future Generation Computer Systems

TL;DR: This paper proposes a novel Mashup service clustering approach, based on structural similarity and a genetic-algorithm-based clustering algorithm, that clusters Mashup services efficiently without any constraint on the number of clusters and outperforms other Mashup service clustering approaches based on semantic metrics.

48 citations

Proceedings Article•DOI•

Efficient SimRank Tracking in Dynamic Graphs

Yue Wang¹, Xiang Lian², Lei Chen¹•Institutions (2)

Hong Kong University of Science and Technology¹, Kent State University²

16 Apr 2018

TL;DR: This paper proposes a novel local push based algorithm for computing all-pairs SimRank and shows that the proposed algorithms outperform the state-of-the-art static and dynamic all-pairs SimRank algorithms.

Abstract: SimRank is a popular link-based similarity measurement among nodes in a graph. To compute the all-pairs SimRank matrix accurately, iterative methods are usually used. For static graphs, current iterative solutions are not efficient enough, in either time or space, because iterative updating incurs unnecessary computation and storage. For dynamic graphs, all current incremental solutions for updating the SimRank matrix are based on an approximated SimRank definition, and thus have no accuracy guarantee. In this paper, we propose a novel local push based algorithm for computing all-pairs SimRank. We show that our algorithms outperform the state-of-the-art static and dynamic all-pairs SimRank algorithms.
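
For orientation, the iterative baseline that this abstract refers to can be sketched as follows. This is a minimal illustration of the classic fixed-point iteration for SimRank (s(a, a) = 1, and otherwise s(a, b) is the decay factor times the average similarity of the in-neighbor pairs of a and b). It is not the local push algorithm proposed in the paper. The function name `simrank`, the decay factor `c`, and the toy graph are assumptions made for this example.

```python
from itertools import product


def simrank(in_neighbors, c=0.8, iterations=10):
    """Naive all-pairs SimRank by fixed-point iteration (illustrative only).

    in_neighbors maps every node to the list of nodes that point to it.
    """
    nodes = list(in_neighbors)
    # Initial scores: a node is fully similar to itself, 0 otherwise.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, nodes)}
    for _ in range(iterations):
        new_sim = {}
        for a, b in product(nodes, nodes):
            if a == b:
                new_sim[(a, b)] = 1.0
                continue
            in_a, in_b = in_neighbors[a], in_neighbors[b]
            if not in_a or not in_b:
                # A node without in-neighbors is similar only to itself.
                new_sim[(a, b)] = 0.0
                continue
            total = sum(sim[(i, j)] for i in in_a for j in in_b)
            new_sim[(a, b)] = c * total / (len(in_a) * len(in_b))
        sim = new_sim
    return sim


# Hypothetical toy graph: node 0 points to nodes 1 and 2.
graph = {0: [], 1: [0], 2: [0]}
scores = simrank(graph, c=0.8, iterations=5)
print(scores[(1, 2)])  # 0.8, because 1 and 2 share their only in-neighbor
```

Every iteration touches all n² pairs and keeps two full score tables in memory, which is the time and space overhead the abstract criticizes.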

20 citations

Journal Article•DOI•

UniWalk: Unidirectional Random Walk Based Scalable SimRank Computation over Large Graph

Junshuai Song¹, Xiongcai Luo¹, Jun Gao¹, Chang Zhou², Hu Wei², Jeffery Xu Yu³•Institutions (3)

Peking University¹, Alibaba Group², The Chinese University of Hong Kong³

01 May 2018-IEEE Transactions on Knowledge and Data Engineering

TL;DR: UniWalk, a Monte Carlo based method that enables fast top-k SimRank computation over large undirected graphs, outperforms the state-of-the-art methods by orders of magnitude and is extended to existing distributed graph processing frameworks to improve its scalability.

Abstract: SimRank is an important measure of vertex-pair similarity according to the structure of graphs. Although progress has been achieved, existing methods still face challenges in handling large graphs. Besides huge index construction and maintenance cost, existing methods may require considerable search space and time overheads in the online SimRank query. In this paper, we design a Monte Carlo based method, UniWalk, to enable fast top-$k$ SimRank computation over large undirected graphs. UniWalk directly locates the top-$k$ similar vertices for any single source vertex $u$ via $R$ sampling paths originating from $u$, which avoids selecting a candidate vertex set $\mathcal{C}$ and the subsequent $O(|\mathcal{C}|R)$ bidirectional sampling paths. We also devise a path enumeration strategy to improve the SimRank precision by using path probabilities instead of path frequencies when sampling, a space-efficient method to reduce intermediate results, and a path-sharing strategy to lower the redundant path sampling cost for multiple source vertices. Furthermore, we extend UniWalk to existing distributed graph processing frameworks to improve its scalability. We conduct extensive experiments to illustrate that UniWalk has high scalability, and outperforms the state-of-the-art methods by orders of magnitude.
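
To make the sampling idea concrete, here is a minimal sketch of the bidirectional Monte Carlo estimator that the abstract contrasts UniWalk against. It is not UniWalk itself. It relies on the random-walk interpretation of SimRank: s(u, v) is the expected value of c^t, where t is the first step at which two simultaneous random walks started at u and v meet. The function name `mc_simrank`, its parameters, and the star graph are assumptions made for this example, and walks are truncated at `max_steps`.

```python
import random


def mc_simrank(neighbors, u, v, c=0.6, num_walks=1000, max_steps=10, seed=0):
    """Estimate single-pair SimRank on an undirected graph by paired random walks."""
    if u == v:
        return 1.0
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_walks):
        a, b = u, v
        for step in range(1, max_steps + 1):
            if not neighbors[a] or not neighbors[b]:
                break  # a walk cannot continue; this sample contributes 0
            a = rng.choice(neighbors[a])
            b = rng.choice(neighbors[b])
            if a == b:
                total += c ** step  # first meeting after `step` steps
                break
    return total / num_walks


# Hypothetical star graph: center 0 connected to leaves 1, 2 and 3.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(mc_simrank(star, 1, 2))  # close to c: both walks reach the center at step 1
print(mc_simrank(star, 0, 1))  # 0.0: the two walks can never occupy the same node
```

Answering a top-k query this way needs a pair of walks for every candidate vertex, which is the O(|C|R) cost that the abstract says UniWalk avoids by sampling only from the source.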

14 citations

Journal Article•DOI•

Dynamical SimRank search on time-varying networks

Weiren Yu¹, Xuemin Lin², Wenjie Zhang², Julie A. McCann³•Institutions (3)

Aston University¹, University of New South Wales², Imperial College London³

01 Feb 2018

TL;DR: The efficient dynamical computation of all-pairs SimRanks on time-varying graphs is studied and it is shown that the SimRank update in response to every link update is expressible as a rank-one Sylvester matrix equation.

Abstract: SimRank is an appealing pair-wise similarity measure based on graph structure. It iteratively follows the intuition that two nodes are assessed as similar if they are pointed to by similar nodes. Many real graphs are large, and links are constantly subject to minor changes. In this article, we study the efficient dynamical computation of all-pairs SimRanks on time-varying graphs. Existing methods for the dynamical SimRank computation [e.g., LTSF (Shao et al. in PVLDB 8(8):838-849, 2015) and READS (Zhang et al. in PVLDB 10(5):601-612, 2017)] mainly focus on top-k search with respect to a given query. For all-pairs dynamical SimRank search, Li et al.'s approach (Li et al. in EDBT, 2010) was proposed. It first factorizes the graph via a singular value decomposition (SVD) and then incrementally maintains such a factorization in response to link updates at the expense of exactness. As a result, all pairs of SimRanks are updated approximately, yielding $O(r^4 n^2)$ time and $O(r^2 n^2)$ memory in a graph with $n$ nodes, where $r$ is the target rank of the low-rank SVD. Our solution to the dynamical computation of SimRank comprises five ingredients: (1) We first consider edge updates that do not accompany new node insertions. We show that the SimRank update $\Delta \mathbf{S}$ in response to every link update is expressible as a rank-one Sylvester matrix equation. This provides an incremental method requiring $O(Kn^2)$ time and $O(n^2)$ memory in the worst case to update $n^2$ pairs of similarities for $K$ iterations. (2) To speed up the computation further, we propose a lossless pruning strategy that captures the "affected areas" of $\Delta \mathbf{S}$ to eliminate unnecessary retrieval. This reduces the time of the incremental SimRank to $O(K(m + |\mathsf{AFF}|))$, where $m$ is the number of edges in the old graph, and $|\mathsf{AFF}|$ $(\le n^2)$ is the size of the "affected areas" in $\Delta \mathbf{S}$, and in practice, $|\mathsf{AFF}| \ll n^2$. (3) We also consider edge updates that accompany node insertions, and categorize them into three cases, according to which end of the inserted edge is a new node. For each case, we devise an efficient incremental algorithm that can support new node insertions and accurately update the affected SimRanks. (4) We next study batch updates for dynamical SimRank computation, and design an efficient batch incremental method that handles "similar sink edges" simultaneously and eliminates redundant edge updates. (5) To achieve linear memory, we devise a memory-efficient strategy that dynamically updates all pairs of SimRanks column by column in just $O(Kn + m)$ memory, without the need to store all $n^2$ pairs of old SimRank scores. Experimental studies on various datasets demonstrate that our solution substantially outperforms the existing incremental SimRank methods and is faster and more memory-efficient than its competitors on million-scale graphs.

12 citations

Journal Article•DOI•

Improving NMF-based community discovery using distributed robust nonnegative matrix factorization with SimRank similarity measure

Chaobo He¹, Xiang Fei², Hanchao Li², Yong Tang³, Hai Liu³, Shuangyin Liu¹•Institutions (3)

Zhongkai University of Agriculture and Engineering¹, Coventry University², South China Normal University³

01 Oct 2018-The Journal of Supercomputing

TL;DR: This paper proposes a method for community discovery using distributed robust NMF with SimRank similarity measure that has better performance and robustness and good scalability and hence can be used to discover communities in the large-scale complex networks.

Abstract: Nonnegative matrix factorization (NMF) has become a powerful model for community discovery in complex networks. Existing NMF-based methods for community discovery often factorize the adjacency matrix of a complex network to obtain its community indicator matrix. However, the adjacency matrix cannot represent the global structural features of complex networks very well, which degrades the performance of community discovery. Besides, most existing methods are not robust or scalable enough, so they are not effective for large-scale, noisy complex networks. To address these problems, in this paper we propose a method for community discovery using distributed robust NMF with the SimRank similarity measure. This method uses the SimRank measure to construct the feature matrix, which can more accurately represent the global structural features of complex networks. To improve robustness, we use the $\ell_{2,1}$ norm instead of the widely used Frobenius norm to construct the NMF-based community discovery model. In addition, to improve scalability, we implement the key components with the MapReduce distributed computing framework, including computing the SimRank feature matrix and iteratively solving the NMF-based model for community discovery. We conduct extensive experiments on several typical complex networks. The results show that our method has better performance and robustness than other representative NMF-based methods for community discovery. Moreover, our method presents good scalability and hence can be used to discover communities in large-scale complex networks.

10 citations

Journal Article•DOI•

Topology and Topic-Aware Service Clustering

Weifeng Pan¹, Jilei Dong², Kun Liu³, Jing Wang⁴•Institutions (4)

Zhejiang Gongshang University¹, University of Connecticut², Hubei University³, Jiangxi University of Finance and Economics⁴

01 Jul 2018-International Journal of Web Services Research

TL;DR: This article presents a novel service clustering approach that adopts a bipartite network to describe the topological structure of service usage histories and uses a SimRank algorithm to measure the topological similarity of services.

Abstract: This article describes how the sheer number and variety of services make it difficult to accurately discover desired services. Service clustering is an effective way to facilitate service discovery. However, the existing approaches are usually designed for a single type of service document, neglecting to fully use the topic and topological information in service profiles and usage histories. To avoid these limitations, this article presents a novel service clustering approach. It adopts a bipartite network to describe the topological structure of service usage histories and uses a SimRank algorithm to measure the topological similarity of services; it applies Latent Dirichlet Allocation to extract topics from service profiles and further quantifies the topic similarity of services; it quantifies the similarity of services by integrating topological and topic similarities; and it uses the Chameleon clustering algorithm to cluster the services. The empirical evaluation on a real-world data set highlights the benefits provided by the combination of topological and topic similarities.

10 citations

Proceedings Article•DOI•

Data Mining Techniques used in the Recommendation of E-commerce services

I. SriUsha¹, K. Rasmitha Choudary¹, Kavitha C. R¹, T. Sasikala¹•Institutions (1)

Amrita Vishwa Vidyapeetham¹

01 Mar 2018

TL;DR: This paper discusses the similarity search algorithms, PathSim and SimRank, and suggests that the efficiency of the website improves if the algorithms are used in respective scenarios.

Abstract: Recommender systems and web search engines have gained a lot of importance in today's digital platform. In today's digital world everything (from buying to selling) has come to internet platform. Due to huge amount of data large scale processing is required. Today large amount of data is obtained from e-commerce services, application data, web data etc. This large-scale data processing involves many similarity search algorithms for giving recommendations. Many e-commerce services and applications use similarity search for giving valuable suggestions and showing the related documents. In this paper, we discuss the similarity search algorithms, PathSim and SimRank. We compare and contrast both the algorithms by taking different datasets. We suggest that the efficiency of the website improves if the algorithms are used in respective scenarios. The time complexities of both the algorithms are compared to check.
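
As a point of comparison with SimRank, PathSim can be sketched in a few lines. The sketch below assumes the standard PathSim definition for a symmetric meta-path (twice the number of path instances between x and y, divided by the sum of the path instances from x to itself and from y to itself). The abstract does not restate this definition, so treat it as background rather than as the paper's exact setup. The customer-item-customer meta-path, the function name `pathsim`, and the purchase counts are hypothetical.

```python
def pathsim(adjacency, x, y):
    """PathSim of x and y under a symmetric meta-path A-B-A (illustrative only).

    adjacency[a] maps each B-type object linked to a to its link count.
    """
    def path_count(p, q):
        # Number of A-B-A path instances from p to q.
        return sum(w * adjacency[q].get(b, 0) for b, w in adjacency[p].items())

    denominator = path_count(x, x) + path_count(y, y)
    return 2 * path_count(x, y) / denominator if denominator else 0.0


# Hypothetical purchase counts: customer -> {item: number of purchases}.
purchases = {
    "alice": {"book": 2, "pen": 1},
    "bob": {"book": 2, "pen": 1},
    "carol": {"pen": 3},
}
print(pathsim(purchases, "alice", "bob"))    # 1.0, identical purchase profiles
print(pathsim(purchases, "alice", "carol"))  # 6 / 14, one shared item only
```

Unlike SimRank, this score is computed in one pass over the shared neighbors and is normalized by each object's own path count, so highly active customers do not dominate.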

8 citations

Patent•

Collaborative filtering video recommendation method for considering user preference dynamic changes

Shan Cunyu, Ye Baoliu, Lu Sanglu

16 Oct 2018

TL;DR: In this paper, a collaborative filtering video recommendation method for considering user preference dynamic changes is proposed, which comprises the steps of data pre-processing, model training and sorting, wherein the data pre-processing is mainly that original data is processed to generate a formative learning sample set required for model training; and a training model mainly learns user characteristics and video characteristics according to generated samples, and is mainly composed of a parameter matrix, a BPR model and a SimRank model.

Abstract: The invention discloses a collaborative filtering video recommendation method for considering user preference dynamic changes. The method comprises the steps of data pre-processing, model training and sorting, wherein the data pre-processing is mainly that original data is processed to generate a formative learning sample set required for model training; and a training model mainly learns user characteristics and video characteristics according to generated samples, and is mainly composed of a parameter matrix, a BPR model and a SimRank model. When a system is ready to recommend videos to users, a recommendation engine firstly reads the users and videos recorded by a background and corresponding metadata into a pre-processing module; then a training module firstly initializes to-be-learned characteristic parameters, and BPR learning and SimRank learning are carried out respectively on the corresponding input learning samples according to the data pre-processing module; and lastly, the videos are sorted and recommended according to the trained user characteristics and video characteristics. The collaborative filtering video recommendation method for considering the user preference dynamic changes has the advantages that under the condition of not increasing the time complexity, the user preference is modeled dynamically, thereby improving the accuracy of recommendation.

7 citations

Journal Article•DOI•

[...]

Lei Tang¹, Yaling Zhao¹, Zongtao Duan¹, Chen Jun¹•Institutions (1)

Chang'an University¹

09 Nov 2018-IEEE Access

TL;DR: The heterogeneous information network is introduced to build a weighted travel network with spatial–temporal GPS trajectories and shows that a meta-path combination is more effective than the state-of-the-art approaches and can be efficiently computed.

Abstract: To provide travel recommendations and planning in the intelligent transportation system (ITS), we must have the ability to find similar travel patterns among users based on their real mobility traces. To measure the similarity of users' travel behavior, various methods have been proposed, but they usually rely on only a single attribute-related metric. In comparison, studies of the semantic relationships between travel attributes remain scarce, making it difficult to construct a complete mobility pattern that reveals the relevance between users or groups. In this paper, we introduced the heterogeneous information network to build a weighted travel network with spatial–temporal GPS trajectories. The heterogeneous network allows clustering similar users based on the connections between different attributes instead of attribute values. On this basis, we defined the meta-paths for travel and used each meta-path to formulate a similarity measure over users by improving the existing PathSim (meta-path-based similarity measure) and SimRank. Next, we aggregated different similarities, where each meta-path was automatically weighted by the learning algorithm to make predictions. The experimental results showed that the recall of the similarity measurement algorithm using multiple meta-paths improved, yielding better results than the algorithm using a single meta-path. The performance of the improved PathSim model under different scales of data was 15% higher than the performance of the improved SimRank model in terms of precision and 21% higher in terms of recall. Based on the area-under-curve values, our experiments also show that a meta-path combination is more effective than the state-of-the-art approaches and can be efficiently computed.

6 citations

Journal Article•DOI•

Micro-blog user community discovery using generalized SimRank edge weighting method.

Jinshan Qi¹, Xun Liang¹, Xiaoping Zhou¹, Zhiyu Li¹, Yu Liu¹, Hengchao Cheng¹•Institutions (1)

Renmin University of China¹

07 May 2018-PLOS ONE

TL;DR: A novel edge weighting method, which balances both local and global weighting based on the idea of shared neighbor ranging between users and the interpersonal significance of the social network community, and which outperforms several conventional weighting methods.

Abstract: Community discovery is one of the most popular issues in analyzing and understanding a network. Previous research suggests that the discovery can be enhanced by assigning weights to the edges of the network. This paper proposes a novel edge weighting method, which balances both local and global weighting based on the idea of shared neighbor ranging between users and the interpersonal significance of the social network community. We assume that users belonging to the same community have similar relationship network structures. By controlling the measure of "neighborhood", this method can adequately adapt to real-world networks. Therefore, the famous similarity calculation method, SimRank, can be regarded as a special case of our method. According to the practical significance of social networks, we propose a new evaluation method that uses the communication rate to measure its divided demerit to better express users' interaction relations than the ordinary modularity Q. Furthermore, the fast Newman algorithm is extended to weighted networks. In addition, we use four real networks from Sina, the largest Chinese micro-blog website. The results of experiments demonstrate that the proposed method easily meets the balancing requirements and is more robust to different kinds of networks. The experimental results also indicate that the proposed algorithm outperforms several conventional weighting methods.

5 citations

Journal Article•DOI•

SRMDAP: SimRank and Density-Based Clustering Recommender Model for miRNA-Disease Association Prediction

Xiaoying Li¹, Yaping Lin¹, Changlong Gu¹, Zejun Li², Zejun Li¹•Institutions (2)

Hunan University¹, Hunan Institute of Technology²

21 Mar 2018-BioMed Research International

TL;DR: A new computational method based on the SimRank and density-based clustering recommender model for miRNA-disease association prediction (SRMDAP) is presented, and the results suggest the excellent performance of SRMDAP in predicting miRNA-disease associations.

Abstract: Aberrant expression of microRNAs (miRNAs) can be applied for the diagnosis, prognosis, and treatment of human diseases. Identifying the relationship between miRNA and human disease is important to further investigate the pathogenesis of human diseases. However, experimental identification of the associations between diseases and miRNAs is time-consuming and expensive. Computational methods are efficient approaches to determine the potential associations between diseases and miRNAs. This paper presents a new computational method based on the SimRank and density-based clustering recommender model for miRNA-disease associations prediction (SRMDAP). The AUC of 0.8838 based on leave-one-out cross-validation and case studies suggested the excellent performance of the SRMDAP in predicting miRNA-disease associations. SRMDAP could also predict diseases without any related miRNAs and miRNAs without any related diseases.

Book Chapter•DOI•

A Parallel Method for All-Pair SimRank Similarity Computation.

Xuan Huang¹, Xingkun Gao¹, Jie Tang¹, Gangshan Wu¹•Institutions (1)

Nanjing University¹

15 Nov 2018

TL;DR: Measuring the SimRank similarity of all vertex pairs in a graph is an important research topic with a wide range of applications; however, SimRank computation is costly in both time and space, so traditional computing methods fail to handle graph data of ever-growing size.

Abstract: How to measure the SimRank similarity of all vertex pairs in a graph is a very important research topic which has a wide range of applications in many fields. However, computation of SimRank is costly in both time and space, causing traditional computing methods to fail on graph data of ever-growing size.

Book Chapter•DOI•

Initialization of Matrix Factorization Methods for University Course Recommendations Using SimRank Similarities.

Alisa Krstova¹, Bozhidar Stevanoski¹, Marija Mihova¹, Vangel V. Ajanovski¹•Institutions (1)

Saints Cyril and Methodius University of Skopje¹

17 Sep 2018

TL;DR: This paper provides a comparison of several models for future course grade prediction based on three matrix factorization methods and attempts to improve the existing techniques by combining matrix factorization with prior knowledge about the similarity between students and courses calculated using the SimRank algorithm.

Abstract: The accurate estimation of students’ grades in prospective courses is important as it can support the procedure of making an informed choice concerning the selection of next semester courses. As a consequence, the process of creating personal academic pathways is facilitated. This paper provides a comparison of several models for future course grade prediction based on three matrix factorization methods. We attempt to improve the existing techniques by combining matrix factorization with prior knowledge about the similarity between students and courses calculated using the SimRank algorithm. The evaluation of the proposed models is conducted on an internal dataset of anonymized student record data.

Proceedings Article•DOI•

Multiperspective Graph-Theoretic Similarity Measure

Dung D. Le¹, Hady W. Lauw¹•Institutions (1)

Singapore Management University¹

17 Oct 2018

TL;DR: This work proposes a graph-theoretic similarity measure that is natively multiperspective, and introduces a novel model for learning and reflecting diverse similarity perceptions given the hypergraph, yielding the similarity score between any pair of objects from any perspective.

Abstract: Determining the similarity between two objects is pertinent to many applications. When the basis for similarity is a set of object-to-object relationships, it is natural to rely on graph-theoretic measures. One seminal technique for measuring the structural-context similarity between a pair of graph vertices is SimRank, whose underlying intuition is that two objects are similar if they are connected by similar objects. However, by design, SimRank as well as its variants capture only a single view or perspective of similarity. Meanwhile, in many real-world scenarios, there emerge multiple perspectives of similarity, i.e., two objects may be similar from one perspective, but dissimilar from another. For instance, human subjects may generate varied, yet valid, clusterings of objects. In this work, we propose a graph-theoretic similarity measure that is natively multiperspective. In our approach, the observed object-to-object relationships due to various perspectives are integrated into a unified graph-based representation, stylised as a hypergraph to retain the distinct perspectives. We then introduce a novel model for learning and reflecting diverse similarity perceptions given the hypergraph, yielding the similarity score between any pair of objects from any perspective. In addition to proposing an algorithm for computing the similarity scores, we also provide theoretical guarantees on the convergence of the algorithm. Experiments on public datasets show that the proposed model deals better with multiperspectivity than the baselines.

Patent•

Disease-associated LncRNA prediction method and device based on dichotomous network

Wang Lei, Jingwen Yu, Kuang Lin'ai, Xuan Zhanwei, Li Xueyong, Zhiping Chen

18 Dec 2018

TL;DR: In this paper, a disease-associated LncRNA prediction method and device based on dichotomous network was proposed, wherein the method comprises the following steps: constructing a dichotomous network based on disease and LncRNA according to a data set of a known association relationship between LncRNA and disease; calculating disease similarity I and LncRNA similarity I based on common neighbor; calculating the disease similarity II and LncRNA similarity II based on SimRank similarity; obtaining extended disease similarity and extended LncRNA similarity; refluxing the extended disease and extended ...

Abstract: The invention discloses a disease-associated LncRNA prediction method and device based on dichotomous network, wherein the method comprises the following steps: constructing a dichotomous network based on disease and LncRNA according to a data set of a known association relationship between LncRNA and disease; calculating disease similarity I and LncRNA similarity I based on common neighbor; calculating the disease similarity II and LncRNA similarity II based on SimRank similarity; obtaining extended disease similarity and extended LncRNA similarity; refluxing the extended disease similarity and extended LncRNA similarity to binary networks to calculate the degree of association between disease and LncRNA. The invention can construct a dichotomous network through the information of known disease-related LncRNA to infer the potential connection between the two, thereby greatly reducing the workload of the experiment.

Proceedings Article•DOI•

A Framework for Recommender System Based on Game Theory in Social Networks

Lu Yang¹, Tao Hong², Anilkmar Kothalil Gopalakrishnan²•Institutions (2)

Jimei University¹, Assumption University²

01 Jan 2018

TL;DR: The goal of the presented system is to identify how user-item ratings affect user friendship relations in order to make correct recommendations; an experimental analysis is carried out to evaluate the accuracy of the system.

Abstract: This paper presents a recommender system based on game theory in which the recommendations are made from user-item ratings. The user-item ratings are the most essential factor for a social network to maintain its social relationships among users. It is not possible for a social network to force all of its users to rate items, and such techniques have not been developed yet. In this paper, game theory and SimRank (Similarity Based on Random Walk) are used as the core algorithms to build the recommender system. The user-item ratings dataset is decomposed into similar groups based on the user ratings by the game theory. The similarities among the "similar interest" users are calculated with the SimRank algorithm. Based on the user similarity information, user profile and rating dataset, the presented system provides proper recommendations of items to its users. The goal of the presented system is to identify how user-item ratings affect user friendship relations in order to make correct recommendations, and an experimental analysis is carried out to evaluate the accuracy of the system.

Journal Article•DOI•

[...]

Atsushi Tatsuma¹, Masaki Aono¹•Institutions (1)

Toyohashi University of Technology¹

05 Jan 2018-IEEJ Transactions on Electrical and Electronic Engineering

TL;DR: A new dimensionality reduction method, called SSPP, that finds a subspace preserving semantic similarity among data represented with SimRank similarity on a bipartite graph that outperforms the baseline and previous methods.
