Showing papers on "SimRank" published in 2015

Proceedings Article•DOI•

Uncovering Crowdsourced Manipulation of Online Reviews

Amir Fayazi¹, Kyumin Lee², James Caverlee¹, Anna Squicciarini³•Institutions (3)

Texas A&M University¹, Utah State University², Pennsylvania State University³

09 Aug 2015

TL;DR: A novel sampling method is proposed for identifying products that have been targeted for manipulation and a seed set of deceptive reviewers who have been enlisted through crowdsourcing platforms; the full approach outperforms both traditional detection methods and a SimRank-based alternative clustering approach.

Abstract: Online reviews are a cornerstone of consumer decision making. However, their authenticity and quality have proven hard to control, especially as polluters target these reviews toward promoting products or degrading competitors. In a troubling direction, the widespread growth of crowdsourcing platforms like Mechanical Turk has created a large-scale, potentially difficult-to-detect workforce of malicious review writers. Hence, this paper tackles the challenge of uncovering crowdsourced manipulation of online reviews through a three-part effort: (i) First, we propose a novel sampling method for identifying products that have been targeted for manipulation and a seed set of deceptive reviewers who have been enlisted through crowdsourcing platforms. (ii) Second, we augment this base set of deceptive reviewers through a reviewer-reviewer graph clustering approach based on a Markov Random Field where we define individual potentials (of single reviewers) and pair potentials (between two reviewers). (iii) Finally, we embed the results of this probabilistic model into a classification framework for detecting crowd-manipulated reviews. We find that the proposed approach achieves up to 0.96 AUC, outperforming both traditional detection methods and a SimRank-based alternative clustering approach.

82 citations

Journal Article•DOI•

Efficient partial-pairs simrank search on large networks

Weiren Yu¹, Julie A. McCann¹•Institutions (1)

Imperial College London¹

01 Jan 2015

TL;DR: A novel "seed germination" model that computes partial-pairs SimRank in O(k|E| min{|A|, |B|}) time and O(|E| + k|V|) memory for k iterations on a graph of |V| nodes and |E| edges, allowing scores to be assessed accurately on graphs with tens of millions of links.

Abstract: The assessment of node-to-node similarities based on graph topology arises in a myriad of applications, e.g., web search. SimRank is a notable measure of this type, with the intuition that "two nodes are similar if their in-neighbors are similar". While most existing work retrieving SimRank only considers all-pairs SimRank s(*, *) and single-source SimRank s(*, j) (scores between every node and query j), there are appealing applications for partial-pairs SimRank, e.g., similarity join. Given two node subsets A and B in a graph, partial-pairs SimRank assessment aims to retrieve only {s(a, b)} for all a ∈ A and b ∈ B. However, the best-known solution appears not self-contained since it hinges on the premise that the SimRank scores with node-pairs in an h-go cover set must be given beforehand. This paper focuses on efficient assessment of partial-pairs SimRank in a self-contained manner. (1) We devise a novel "seed germination" model that computes partial-pairs SimRank in O(k|E| min{|A|, |B|}) time and O(|E| + k|V|) memory for k iterations on a graph of |V| nodes and |E| edges. (2) We further eliminate unnecessary edge access to improve the time of partial-pairs SimRank to O(m min{|A|, |B|}), where m ≤ min{k|E|, Δ^{2k}}, and Δ is the maximum degree. (3) We show that our partial-pairs SimRank model also can handle the computations of all-pairs and single-source SimRanks. (4) We empirically verify that our algorithms are (a) 38x faster than the best-known competitors, and (b) memory-efficient, allowing scores to be assessed accurately on graphs with tens of millions of links.

63 citations
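
The intuition quoted in the abstract above ("two nodes are similar if their in-neighbors are similar") can be sketched with the basic fixed-point iteration for SimRank. This is a naive all-pairs baseline followed by a filter to the subsets A and B, not the paper's "seed germination" model; the function names, the toy graph, and the decay factor C = 0.6 are illustrative assumptions.

```python
from itertools import product


def simrank(in_neighbors, C=0.6, iterations=10):
    """All-pairs SimRank by fixed-point iteration.

    in_neighbors maps every node to the list of its in-neighbors.
    """
    nodes = list(in_neighbors)
    # Initial scores: s_0(a, b) = 1 if a == b, else 0.
    scores = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, nodes)}
    for _ in range(iterations):
        new_scores = {}
        for a, b in product(nodes, nodes):
            if a == b:
                new_scores[(a, b)] = 1.0
                continue
            ia, ib = in_neighbors[a], in_neighbors[b]
            if not ia or not ib:
                # A node without in-neighbors is similar only to itself.
                new_scores[(a, b)] = 0.0
                continue
            total = sum(scores[(i, j)] for i in ia for j in ib)
            new_scores[(a, b)] = C * total / (len(ia) * len(ib))
        scores = new_scores
    return scores


def partial_pairs(scores, A, B):
    """Keep only the scores s(a, b) with a in A and b in B."""
    return {(a, b): scores[(a, b)] for a in A for b in B}


# Hypothetical toy graph: u points to a and b; a and b both point to c.
graph = {"u": [], "a": ["u"], "b": ["u"], "c": ["a", "b"]}
scores = simrank(graph, C=0.6, iterations=5)
subset = partial_pairs(scores, A=["a"], B=["b", "c"])
```

Computing every pair and then filtering is exactly the cost that partial-pairs methods aim to avoid, so the sketch only shows what the retrieved set {s(a, b)} contains.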

Journal Article•DOI•

[...]

Yingxia Shao¹, Bin Cui¹, Lei Chen², Mingming Liu¹, Xing Xie³•Institutions (3)

Peking University¹, Hong Kong University of Science and Technology², Microsoft³

01 Apr 2015

TL;DR: This paper proposes a novel two-stage random-walk sampling framework (TSF) for SimRank-based similarity search (e.g., top-k search) and demonstrates that TSF can handle dynamic billion-edge graphs with high performance.

Abstract: SimRank is an important measure of vertex-pair similarity according to the structure of graphs. The similarity search based on SimRank is an important operation for identifying similar vertices in a graph and has been employed in many data analysis applications. Nowadays, graphs in the real world have become much larger and more dynamic. The existing solutions for similarity search are expensive in terms of time and space cost. None of them can efficiently support similarity search over large dynamic graphs. In this paper, we propose a novel two-stage random-walk sampling framework (TSF) for SimRank-based similarity search (e.g., top-k search). In the preprocessing stage, TSF samples a set of one-way graphs to index raw random walks in a novel manner within O(N R_g) time and space, where N is the number of vertices and R_g is the number of one-way graphs. The one-way graph can be efficiently updated in accordance with the graph modification, thus TSF is well suited to dynamic graphs. During the query stage, TSF can search similar vertices fast by naturally pruning unqualified vertices based on the connectivity of one-way graphs. Furthermore, with additional R_q samples, TSF can estimate the SimRank score with probability [EQUATION] if the error of approximation is bounded by 1 -- e. Finally, to guarantee the scalability of TSF, the one-way graphs can also be compactly stored on the disk when the memory is limited. Extensive experiments have demonstrated that TSF can handle dynamic billion-edge graphs with high performance.

57 citations
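
Random-walk sampling for SimRank rests on a standard interpretation: s(u, v) equals the expected value of C^τ, where τ is the first step at which two reverse random walks started at u and v meet. The sketch below is a plain Monte Carlo estimator built on that interpretation; it does not implement TSF's one-way graph index or its pruning, and the function name, parameters, and toy graph are assumptions made for illustration.

```python
import random


def mc_simrank(in_neighbors, u, v, C=0.6, walks=1000, max_steps=10, seed=0):
    """Estimate s(u, v) by sampling pairs of reverse random walks."""
    if u == v:
        return 1.0
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    total = 0.0
    for _ in range(walks):
        a, b = u, v
        for step in range(1, max_steps + 1):
            ia, ib = in_neighbors[a], in_neighbors[b]
            if not ia or not ib:
                break  # one walk is stuck, so this sample contributes 0
            # Each walk moves to a uniformly chosen in-neighbor.
            a, b = rng.choice(ia), rng.choice(ib)
            if a == b:
                total += C ** step  # first meeting happens at this step
                break
    return total / walks


# Hypothetical toy graph: a and b share the single in-neighbor u,
# so every sampled pair of walks meets after exactly one step.
graph = {"u": [], "a": ["u"], "b": ["u"]}
estimate = mc_simrank(graph, "a", "b", C=0.6)
```

Truncating the walks at max_steps biases the estimate slightly downward, since meetings after that step are counted as zero; more walks reduce the variance but not this bias.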

Proceedings Article•DOI•

An evaluation of SimRank and Personalized PageRank to build a recommender system for the Web of Data

Phuong T. Nguyen¹, Paolo Tomeo¹, Tommaso Di Noia¹, Eugenio Di Sciascio¹•Institutions (1)

Polytechnic University of Bari¹

18 May 2015

TL;DR: Two existing metrics, SimRank and PageRank, are reviewed, and their suitability and performance for computing similarity between resources in RDF graphs, as well as their use to feed a content-based recommender system, are investigated.

Abstract: The Web of Data is the natural evolution of the World Wide Web from a set of interlinked documents to a set of interlinked entities. It is a graph of information resources interconnected by semantic relations, thereby yielding the name Linked Data. The proliferation of Linked Data is certainly an opportunity to create a new family of data-intensive applications such as recommender systems. In particular, since content-based recommender systems are based on the notion of similarity between items, the selection of the right graph-based similarity metric is of paramount importance to build an effective recommendation engine. In this paper, we review two existing metrics, SimRank and PageRank, and investigate both their suitability and performance for computing similarity between resources in RDF graphs and their use to feed a content-based recommender system. Finally, we conduct experimental evaluations on a dataset for recommending musical artists and bands, comparing our results with two other content-based baselines and measuring performance with precision and recall, catalog coverage, item distribution and novelty metrics.

54 citations

Journal Article•DOI•

Walking in the cloud: parallel SimRank at scale

Zhenguo Li¹, Yixiang Fang², Qin Liu³, Jiefeng Cheng¹, Reynold Cheng², John C. S. Lui³•Institutions (3)

Huawei¹, University of Hong Kong², The Chinese University of Hong Kong³

01 Sep 2015

TL;DR: This work is the first to report results on clue-web, which is 10x larger than the largest graph ever reported for SimRank computation, and is orders of magnitude more efficient and scalable than existing solutions for large-scale problems.

Abstract: Despite its popularity, SimRank is computationally costly, in both time and space. In particular, its recursive nature poses a great challenge in using modern distributed computing power, and also prevents querying similarities individually. Existing solutions suffer greatly from these practical issues. In this paper, we break such dependency for the maximum possible efficiency. Our method consists of offline and online phases. In the offline phase, a length-n indexing vector is derived by solving a linear system in parallel. At online query time, the similarities are computed instantly from the index vector. Throughout, the Monte Carlo method is used to maximally reduce time and space. Our algorithm, called CloudWalker, is highly parallelizable, with only linear time and space. Remarkably, it responds to both single-pair and single-source queries in constant time. CloudWalker is orders of magnitude more efficient and scalable than existing solutions for large-scale problems. Implemented on Spark with 10 machines and tested on the web-scale clue-web graph with 1 billion nodes and 43 billion edges, it takes 110 hours for offline indexing, 64 seconds for a single-pair query, and 188 seconds for a single-source query. To the best of our knowledge, our work is the first to report results on clue-web, which is 10x larger than the largest graph ever reported for SimRank computation.

53 citations

Journal Article•DOI•

[...]

Mingxi Zhang¹, Hao Hu¹, Zhenying He¹, Wei Wang¹•Institutions (1)

Fudan University¹

01 Feb 2015-Expert Systems With Applications

TL;DR: Proposes NetSim, a structural-based similarity measure for efficiently computing similarity between centers in an x-star network; it requires less time and space than existing methods because the attribute network is significantly smaller than the whole x-star network.

Abstract: The efficiency improvement is evident for similarity computation. The effectiveness of returned result is good for similarity search. The pruning algorithm is presented for supporting fast online query processing. The accuracy loss of pruning algorithm can be controlled by setting thresholds. An x-star network is an information network which consists of centers with connections among themselves, and attributes of different types linking to these centers. As x-star networks become ubiquitous, extracting knowledge from x-star networks has become an important task. Similarity search in x-star network aims to find the centers similar to a given query center, which has numerous applications including collaborative filtering, community mining and web search. Although existing methods such as SimRank and P-Rank yield promising results, they are not applicable to massive x-star networks. In this paper, we propose a structural-based similarity measure, NetSim, towards efficiently computing similarity between centers in an x-star network. The similarity between attributes is computed in the pre-processing stage by the expected meeting probability over attribute network that is extracted from the whole structure of x-star network. The similarity between centers is computed online according to the attribute similarities based on the intuition that similar centers are linked with similar attributes. NetSim requires less time and space cost than existing methods since the scale of attribute network is significantly smaller than the whole x-star network. For supporting fast online query processing, we develop a pruning algorithm by building a pruning index, which prunes candidate centers that are not promising. Extensive experiments demonstrate the effectiveness and efficiency of our method through comparing with the state-of-the-art measures.

36 citations

Proceedings Article•DOI•

High Quality Graph-Based Similarity Search

Weiren Yu¹, Julie A. McCann¹•Institutions (1)

Imperial College London¹

09 Aug 2015

TL;DR: The scheme, SR#, is efficient and semantically meaningful; the paper gives mathematical insights into the semantic difference between SimRank and its variant, and corrects the argument that top-K rankings will not be affected much if D is replaced by a scaled identity matrix.

Abstract: SimRank is an influential link-based similarity measure that has been used in many fields of Web search and sociometry. The best-of-breed method by Kusumoto et al., however, does not always deliver high-quality results, since it fails to accurately obtain its diagonal correction matrix D. Besides, SimRank is also limited by an unwanted "connectivity trait": increasing the number of paths between nodes a and b often incurs a decrease in score s(a,b). The best-known solution, SimRank++, cannot resolve this problem, since a revised score will be zero if a and b have no common in-neighbors. In this paper, we consider high-quality similarity search. Our scheme, SR#, is efficient and semantically meaningful: (1) We first formulate the exact D, and devise a "varied-D" method to accurately compute SimRank in linear memory. Moreover, by grouping computation, we also reduce the time from quadratic to linear in the number of iterations. (2) We design a "kernel-based" model to improve the quality of SimRank, and circumvent the "connectivity trait" issue. (3) We give mathematical insights into the semantic difference between SimRank and its variant, and correct an argument: "if D is replaced by a scaled identity matrix, top-K rankings will not be affected much". The experiments confirm that SR# can accurately extract high-quality scores, and is much faster than the state-of-the-art competitors.

35 citations

Journal Article•DOI•

Probabilistic SimRank computation over uncertain graphs

Lingxia Du¹, Cuiping Li¹, Hong Chen¹, Liwen Tan¹, Yinglong Zhang¹•Institutions (1)

Renmin University of China¹

20 Feb 2015-Information Sciences

TL;DR: This paper investigates the problem of node similarity computation on large uncertain graphs, proposes a probabilistic framework to compute it, and gives an efficient dynamic programming algorithm that reduces the time complexity from exponential to polynomial.

24 citations

Journal Article•DOI•

ASCOS++: An Asymmetric Similarity Measure for Weighted Networks to Address the Problem of SimRank

Hung-Hsuan Chen¹, C. Lee Giles²•Institutions (2)

Industrial Technology Research Institute¹, Pennsylvania State University²

12 Oct 2015-ACM Transactions on Knowledge Discovery From Data

TL;DR: This article argues that SimRank and its families, such as P-Rank and SimRank++, fail to capture similar node pairs in certain conditions, and presents new similarity measures ASCOS and ASCOS++ to address the problem.

Abstract: In this article, we explore the relationships among digital objects in terms of their similarity based on vertex similarity measures. We argue that SimRank, a famous similarity measure, and its families, such as P-Rank and SimRank++, fail to capture similar node pairs in certain conditions, especially when two nodes can only reach each other through paths of odd lengths. We present new similarity measures ASCOS and ASCOS++ to address the problem. ASCOS outputs a more complete similarity score than SimRank and SimRank's families. ASCOS++ enriches ASCOS to include edge weight into the measure, giving all edges and network weights an opportunity to make their contribution. We show that both ASCOS++ and ASCOS can be reformulated and applied on a distributed environment for parallel contribution. Experimental results show that ASCOS++ reports a better score than SimRank and several famous similarity measures. Finally, we re-examine previous use cases of SimRank, and explain appropriate and inappropriate use cases. We suggest that future SimRank users follow the rules proposed here before naively applying it. We also discuss the relationship between ASCOS++ and PageRank.

22 citations
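
The odd-length-path limitation mentioned in the abstract above can be reproduced on a three-node undirected path a - b - c. The sketch uses the basic SimRank iteration with each node's neighbors standing in for its in-neighbors; it does not implement ASCOS or ASCOS++, and the function name, toy graph, and decay factor C = 0.6 are assumptions for illustration.

```python
def simrank(neighbors, C=0.6, iterations=20):
    """Basic SimRank iteration; neighbors[x] plays the role of x's in-neighbors."""
    nodes = list(neighbors)
    s = {(a, b): float(a == b) for a in nodes for b in nodes}
    for _ in range(iterations):
        nxt = {}
        for a in nodes:
            for b in nodes:
                if a == b:
                    nxt[(a, b)] = 1.0
                elif neighbors[a] and neighbors[b]:
                    total = sum(s[(i, j)] for i in neighbors[a] for j in neighbors[b])
                    nxt[(a, b)] = C * total / (len(neighbors[a]) * len(neighbors[b]))
                else:
                    nxt[(a, b)] = 0.0
        s = nxt
    return s


# Undirected path a - b - c: a and b are adjacent, but every walk
# between them has odd length; a and c are two hops apart.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
scores = simrank(path)
adjacent_score = scores[("a", "b")]  # stays at 0 despite the direct edge
two_hop_score = scores[("a", "c")]   # positive, via the shared neighbor b
```

The adjacent pair scores zero because SimRank only rewards pairs whose walks can meet at a common node after the same number of steps, which never happens for nodes separated only by odd-length paths.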

Proceedings Article•DOI•

Scalable SimRank join algorithm

Takanori Maehara, Mitsuru Kusumoto¹, Ken-ichi Kawarabayashi•Institutions (1)

National Institute of Informatics¹

13 Apr 2015

TL;DR: This paper proposes a scalable approximation algorithm with an arbitrary accuracy for the similarity join problem with the SimRank similarity measure that scales up to the network of 5M vertices and 70M edges.

Abstract: Similarity join finds all pairs of objects (i, j) with similarity score s(i, j) greater than some specified threshold θ. This is a fundamental query problem in the database research community, and is used in many practical applications, such as duplicate detection, merge/purge, record linkage, object matching, and reference conciliation.

21 citations
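
The similarity join defined in the abstract above can be written as a brute-force filter over all pairs. This is only a statement of the query, not the paper's scalable approximation algorithm; the scores are a made-up lookup table, the score function is assumed to be symmetric, and self-pairs are skipped.

```python
from itertools import combinations


def similarity_join(nodes, score, theta):
    """Return all unordered pairs (i, j), i != j, with score(i, j) > theta."""
    return [(i, j) for i, j in combinations(nodes, 2) if score(i, j) > theta]


# Hypothetical precomputed similarity scores for three objects.
toy_scores = {("a", "b"): 0.6, ("a", "c"): 0.1, ("b", "c"): 0.3}


def lookup(i, j):
    """Symmetric lookup into the toy score table; missing pairs score 0."""
    return toy_scores.get((i, j), toy_scores.get((j, i), 0.0))


pairs = similarity_join(["a", "b", "c"], lookup, theta=0.25)
```

The filter evaluates a number of pairs that is quadratic in the number of objects, which is why a scalable algorithm is needed on networks with millions of vertices.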

Journal Article•DOI•

Fast All-Pairs SimRank Assessment on Large Graphs and Bipartite Domains

Weiren Yu¹, Xuemin Lin, Wenjie Zhang, Julie A. McCann¹•Institutions (1)

Imperial College London¹

01 Jul 2015-IEEE Transactions on Knowledge and Data Engineering

TL;DR: A novel clustering strategy is proposed to eliminate duplicate computations occurring in partial sums, and an efficient algorithm is devised to accelerate SimRank computation to O(K d' n^2) time, which achieves a 5X speedup on large graphs while also fairly preserving the relative order of original SimRank scores.

Abstract: SimRank is a powerful model for assessing vertex-pair similarities in a graph. It follows the concept that two vertices are similar if they are referenced by similar vertices. The prior work [18] exploits partial sums memoization to compute SimRank in $O(Kmn)$ time on a graph of $n$ vertices and $m$ edges, for $K$ iterations. However, computations among different partial sums may have redundancy. Besides, to guarantee a given accuracy $\epsilon$ , the existing SimRank needs $K=\lceil \log _C \,\epsilon \rceil$ iterations, where $C$ is a damping factor, but the geometric rate of convergence is slow if a high accuracy is expected. In this paper, (1) a novel clustering strategy is proposed to eliminate duplicate computations occurring in partial sums, and an efficient algorithm is then devised to accelerate SimRank computation to $O(K d^{\prime } n^2)$ time, where $d^{\prime }$ is typically much smaller than $\frac{m}{n}$ . (2) A new differential SimRank equation is proposed, which can represent the SimRank matrix as an exponential sum of transition matrices, as opposed to the geometric sum of the conventional counterpart. This leads to a further speedup in the convergence rate of SimRank iterations. (3) In bipartite domains, a novel finer-grained partial max clustering method is developed to speed up the computation of the Minimax SimRank variation from $O(Kmn)$ to $O(Km^{\prime }n)$ time, where $m^{\prime } \ ({\le} m)$ is the number of edges in a reduced graph after edge clustering, which can be typically much smaller than $m$ . 
Using real and synthetic data, we empirically verify that (1) our approach of partial sums sharing outperforms the best known algorithm by up to one order of magnitude; (2) the revised notion of SimRank further achieves a 5X speedup on large graphs while also fairly preserving the relative order of original SimRank scores; (3) our finer-grained partial max memoization for the Minimax SimRank variation in bipartite domains is 5X-12X faster than the baselines.
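
The iteration bound K = ⌈log_C ε⌉ quoted in the abstract above is easy to evaluate numerically, which shows why the geometric convergence rate is considered slow. The helper name and the sample values of C and ε below are assumptions chosen for illustration.

```python
import math


def iterations_for_accuracy(C, eps):
    """Smallest K with C**K <= eps, computed as ceil(log_C(eps))."""
    return math.ceil(math.log(eps) / math.log(C))


k_06 = iterations_for_accuracy(0.6, 1e-3)  # damping factor 0.6
k_08 = iterations_for_accuracy(0.8, 1e-3)  # damping factor 0.8
```

With ε = 0.001, a damping factor of 0.6 needs 14 iterations and a damping factor of 0.8 needs 31, so the iteration count grows quickly as C approaches 1.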

Proceedings Article•DOI•

Gauging Correct Relative Rankings For Similarity Search

Weiren Yu¹, Julie A. McCann¹•Institutions (1)

Imperial College London¹

17 Oct 2015

TL;DR: This paper proposes efficient ranking criteria that can secure correct relative orders of node-pairs with respect to SimRank scores when they are computed in an iterative fashion and shows the superiority of these criteria in harvesting top-K SimRank scores and bucket orders from a full ranking list.

Abstract: One of the important tasks in link analysis is to quantify the similarity between two objects based on hyperlink structure. SimRank is an attractive similarity measure of this type. Existing work mainly focuses on absolute SimRank scores, and often harnesses an iterative paradigm to compute them. While these iterative scores converge to exact ones with the increasing number of iterations, it is still notoriously difficult to determine how well the relative orders of these iterative scores can be preserved for a given iteration. In this paper, we propose efficient ranking criteria that can secure correct relative orders of node-pairs with respect to SimRank scores when they are computed in an iterative fashion. Moreover, we show the superiority of our criteria in harvesting top-K SimRank scores and bucket orders from a full ranking list. Finally, viable empirical studies verify the usefulness of our techniques for SimRank top-K ranking and bucket ordering.

Journal Article•DOI•

A comprehensive structural-based similarity measure in directed graphs

Mingxi Zhang¹, Hao Hu¹, Zhenying He¹, Liping Gao¹, Liujie Sun²•Institutions (2)

Fudan University¹, University of Shanghai for Science and Technology²

01 Nov 2015-Neurocomputing

TL;DR: This paper defines effective relationship strength (ERS) to distinguish link importance by utilizing node activity, node attraction and link frequency, and formalizes the ESimRank equation by combining ERS and the expected meeting probabilities of any path length.

Journal Article•DOI•

Efficient link-based similarity search in web networks

Mingxi Zhang¹, Hao Hu¹, Zhenying He¹, Liping Gao¹, Liujie Sun²•Institutions (2)

Fudan University¹, University of Shanghai for Science and Technology²

01 Dec 2015-Expert Systems With Applications

TL;DR: A link-based similarity search method towards efficiently finding similar entities in web networks, WebSim, which defines the similarity between entities as the 2-hop similarity of SimRank and develops a pruning algorithm to support fast query processing.

Abstract: The pre-computation cost in the off-line stage is significantly reduced. The efficiency of query processing is optimized by proposing a pruning algorithm. The accuracy loss of pruning algorithm is controlled by tuning threshold. The returned results are effective and acceptable. Similarity search in web networks, aiming to find entities similar to the given entity, is one of the core tasks in network analysis. With the proliferation of web applications, including web search and recommendation system, SimRank has been a well-known measure for evaluating entity similarity in a network. However, the existing work computes SimRank iteratively over a huge similarity matrix, which is expensive in terms of time and space cost and cannot efficiently support similarity search over large networks. In this paper, we propose a link-based similarity search method, WebSim, towards efficiently finding similar entities in web networks. WebSim defines the similarity between entities as the 2-hop similarity of SimRank. To reduce computation cost, we divide the similarity search process into two stages: off-line stage and on-line stage. In the off-line stage, the 1-hop similarities are computed, and an optimized algorithm is designed to reduce the unnecessary accumulation operations on zero similarities. In the on-line stage, the 2-hop similarities are computed, and a pruning algorithm is developed to support fast query processing through searching similar entries from a partial sums index derived from the 1-hop similarities. The index items that are lower than a given threshold are skipped to reduce the searching space. Compared to the iterative SimRank computation, the time and space cost of similarity computation is significantly reduced, since WebSim maintains only the similarity matrix of 1-hop that is much smaller than that of multi-hop.
Experiments through comparison with SimRank and its optimized algorithms demonstrate that WebSim has on average a 99.83% reduction in the time cost and a 92.12% reduction in the space cost of similarity computation, and achieves on average 99.98% NDCG.

Proceedings Article•DOI•

Co-Simmate: Quick Retrieving All Pairwise Co-Simrank Scores

Weiren Yu, Julie A. McCann¹•Institutions (1)

Imperial College London¹

01 Jul 2015

TL;DR: This study devises a model, Co-Simmate, to speed up the retrieval of all pairs of Co-Simranks to O(log_2(log(1/ε)) n^3) time, and integrates it with a matrix decomposition based method on singular graphs to attain higher efficiency.

Abstract: Co-Simrank is a useful Simrank-like measure of similarity based on graph structure. The existing method iteratively computes each pair of Co-Simrank scores from a dot product of two PageRank vectors, entailing O(log(1/ε) n^3) time to compute all pairs of Co-Simranks in a graph with n nodes, to attain a desired accuracy ε. In this study, we devise a model, Co-Simmate, to speed up the retrieval of all pairs of Co-Simranks to O(log_2(log(1/ε)) n^3) time. Moreover, we show the optimality of Co-Simmate among other hop-(u^k) variations, and integrate it with a matrix decomposition based method on singular graphs to attain higher efficiency. The viable experiments verify the superiority of Co-Simmate to others.

Proceedings Article•DOI•

A Novel Edge Weighting Method to Enhance Network Community Detection

Haiyan Zhang¹, Chenxi Zhou¹, Xun Liang¹, Xi Zhao¹, Yaping Li¹•Institutions (1)

Renmin University of China¹

01 Oct 2015

TL;DR: The problem of balancing local and global weighting is first proposed, SimRank is then introduced as a novel edge weighting method, and the fast Newman algorithm is extended to be applicable to weighted networks.

Abstract: Community detection is one of the most popular issues in analyzing and understanding networks. Existing works show that community detection can be enhanced by proper assignments of weights onto the edges of a network. Large numbers of edge weighting schemes have been developed to cope with this problem. However, a satisfactory balance between the local and global weightings has hardly been found. In this paper, the problem of the local and global weighting balance is first proposed and discussed. SimRank is next introduced as a novel edge weighting method. Furthermore, the fast Newman algorithm is extended to be applicable to a weighted network. Exhaustive experiments show that, combined with the edge weighting techniques, the extended algorithm significantly enhances the performance of the original algorithm. Comparisons with several weighting methods demonstrate that the proposed algorithm is superior and more robust for different kinds of networks.

Posted Content•

Tensor SimRank for Heterogeneous Information Networks.

Ben Usman, Ivan V. Oseledets

24 Feb 2015-arXiv: Artificial Intelligence

TL;DR: A generalization of the SimRank similarity measure for heterogeneous information networks is proposed, in which the intraclass similarity score s(a, b) is high if the set of objects related with a and the set of objects related with b are pair-wise similar according to all imposed relations.

Abstract: We propose a generalization of SimRank similarity measure for heterogeneous information networks. Given the information network, the intraclass similarity score s(a, b) is high if the set of objects that are related with a and the set of objects that are related with b are pair-wise similar according to all imposed relations.

Journal Article•

Survey of Image Retrieval Techniques and Algorithms for Image-rich Information Networks

Vishal S. Kore, Bharat Tidke, Pankaj R. Chandre

18 Feb 2015-International Journal of Computer Applications

TL;DR: The concept of image-rich information networks, image retrieval systems and techniques like CBIR and TBIR, and a comparative study of image ranking and retrieval algorithms like SimRank, k-SimRank, and HMOK-SimRank are presented in this paper.

Abstract: Social networking sites allow users to share images, and e-commerce web sites also contain millions of images, thus forming image-rich information networks. Retrieving images from image-rich information networks is a very challenging task, due to the existence of information like text, user, image, feature, tags and group. The concept of image-rich information networks, image retrieval systems and techniques like CBIR and TBIR are explained in this paper. A comparative study of image ranking and retrieval algorithms like SimRank, k-SimRank, and HMOK-SimRank is also presented in this paper. General Terms: Image-rich information networks, CBIR, TBIR.

Patent•

[...]

Li Cuiping

23 Sep 2015

TL;DR: This patent proposes a node similarity calculation method based on SimRank, which comprises the following steps: 1) using an adjacency matrix to express a multi-relational network; 2) establishing an Eigen-SimRank model and analyzing the correlation matrix information needed to calculate the node similarity matrix S; 3) calculating the node similarity in the multi-relational network according to that correlation matrix information if the network structure has not changed.

Abstract: The invention relates to a node similarity calculation method based on SimRank. The method comprises the following steps: 1) using an adjacency matrix to express a multi-relational network, and using a non-iterative node similarity matrix S to express the node similarity of the multi-relational network; 2) establishing an Eigen-SimRank model and analyzing the correlation matrix information needed to calculate the node similarity matrix S; 3) calculating the node similarity in the multi-relational network according to the correlation matrix information needed to calculate the node similarity matrix S if the network structure has not changed; 4) using an Eigen-SimRank dynamic update algorithm to update the correlation matrix information if the network structure has changed, and calculating the new correlation matrix information needed by the similarity matrix after obtaining the change in network structure; 5) calculating node similarity according to the updated correlation matrix information; 6) analyzing the similarity values among nodes in the multi-relational network according to the similarity calculation result. The node similarity calculation method based on SimRank of the invention can be widely applied to the field of node similarity calculation in network structures.

Book Chapter•DOI•

SimRank Based Top-k Query Aggregation for Multi-Relational Networks

Jing Xu¹, Cuiping Li¹, Hong Chen¹, Hui Sun¹•Institutions (1)

Renmin University of China¹

08 Jun 2015

TL;DR: This paper adopts SimRank as the similarity computation measure and rewrites the original inefficient iterative equation into a non-iterative one, called Eigen-SimRank, with a focus on multi-relational networks.

Abstract: SimRank is a measure that computes the similarities between nodes in applications where returning top-k query lists is often required. In this paper, we adopt SimRank as the similarity computation measure and rewrite the original inefficient iterative equation into a non-iterative one, which we call Eigen-SimRank. We focus on multi-relational networks, where different kinds of relationships may exist among nodes and query results may change with different perspectives. To compute a top-k query list under any perspective, especially a compound perspective, we propose a dynamic updating algorithm and rank aggregation methods. We evaluate our algorithms in the experiment section.
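To make the iterative versus non-iterative distinction concrete, here is a minimal sketch. It does not reproduce the paper's Eigen-SimRank; it assumes the commonly used relaxed matrix form S = c·WᵀSW + (1 − c)·I, where W is the column-normalized adjacency matrix, and solves it once as a linear system instead of iterating. The function names and the toy graph are hypothetical.

```python
import numpy as np

def column_normalize(adj):
    # adj[i, j] = 1 if there is an edge i -> j, so column j holds j's in-neighbours.
    col_sums = adj.sum(axis=0)
    safe = np.where(col_sums == 0, 1.0, col_sums)  # avoid division by zero
    return adj / safe

def simrank_iterative(adj, c=0.6, iters=100):
    # Fixed-point iteration of the relaxed equation S = c W^T S W + (1 - c) I.
    n = adj.shape[0]
    w = column_normalize(adj.astype(float))
    s = np.eye(n)
    for _ in range(iters):
        s = c * w.T @ s @ w + (1 - c) * np.eye(n)
    return s

def simrank_closed_form(adj, c=0.6):
    # Same equation solved in one step:
    # vec(S) = (1 - c) (I - c (W^T kron W^T))^{-1} vec(I).
    # This builds an n^2 x n^2 system, so it only suits very small graphs.
    n = adj.shape[0]
    w = column_normalize(adj.astype(float))
    k = np.kron(w.T, w.T)
    rhs = (1 - c) * np.eye(n).reshape(-1, order="F")
    vec_s = np.linalg.solve(np.eye(n * n) - c * k, rhs)
    return vec_s.reshape((n, n), order="F")

# Toy graph (an assumption): nodes 0 and 1 both point to nodes 2 and 3.
toy_adj = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
])
toy_scores = simrank_closed_form(toy_adj)
```

Note that the relaxed form scales the diagonal by (1 − c), so its scores differ from the original SimRank definition, in which every node has similarity 1 with itself.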


Journal Article•

Efficient computation of simrank for static and dynamic datasets using mapreduce framework


Soujanya Duvvi¹, Venkata Ramana Kondapalli¹•Institutions (1)

Andhra University¹

23 Oct 2015-Journal of Global Research in Computer Sciences

TL;DR: This work uses SimRank to find similarity between neighbours in a contextual way and evaluates it numerically, and computes Jaccard similarity using LSH and various other methods.


Abstract: The dynamic growth of data over the internet and the need to store and access information efficiently bring new challenges such as finding related documents, similar nodes, and domain and inter-domain similarities. Though SimRank is applicable to a wide range of areas, we use this similarity ranking to find similarity between neighbours in a contextual way and evaluate it numerically. Here we compute Jaccard similarity using LSH and various other methods. We further optimize the Jaccard algorithm with the Token Optimization join method. The result is then evaluated in combination with four other parameters, and the nodes whose similarity values exceed the optimal threshold value φ are retrieved from the huge graph.
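The abstract pairs Jaccard similarity with LSH. A minimal sketch of that pairing, assuming MinHash signatures as the LSH family for Jaccard similarity, might look like the following. The paper's Token Optimization join, its four additional parameters and its MapReduce layout are not reproduced, and all names here are hypothetical.

```python
import random

PRIME = 2_147_483_647  # Mersenne prime 2^31 - 1, used as the hash modulus

def jaccard(a, b):
    # Exact Jaccard similarity: |A intersect B| / |A union B|.
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def make_hashes(num_hashes, seed=0):
    # Each hash is h(x) = (a * x + b) mod PRIME with random a != 0 and b.
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
            for _ in range(num_hashes)]

def minhash_signature(items, hashes):
    # items: non-empty collection of integers in [0, PRIME).
    # The signature keeps the minimum hash value per hash function.
    return [min((a * x + b) % PRIME for x in items) for a, b in hashes]

def estimate_jaccard(sig_a, sig_b):
    # The fraction of matching signature positions estimates the Jaccard similarity.
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)

# Neighbour sets of two hypothetical nodes, encoded as integer node ids.
hashes = make_hashes(128, seed=1)
sig_u = minhash_signature({1, 2, 3}, hashes)
sig_v = minhash_signature({2, 3, 4}, hashes)
approx = estimate_jaccard(sig_u, sig_v)   # estimate of jaccard({1,2,3}, {2,3,4})
```

A full LSH pipeline would additionally split each signature into bands and bucket nodes by band, so that only nodes sharing a bucket are compared; that step is omitted here.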


Proceedings Article•DOI•

A Framework for Discovering Similar Products from Online Bookstore


Mingxi Zhang¹, Chao Song¹•Institutions (1)

University of Shanghai for Science and Technology¹

14 Jun 2015

TL;DR: This paper first builds a co-purchasing network from the relationships between different types of products, then computes the similarity between products using SimRank, and gives experimental results from implementing this method on an Amazon dataset.


Abstract: Online bookstores have attracted millions of people and helped them find the books they are looking for. Similarity search over online bookstores mainly focuses on finding the top-K most similar products for a given query. In this paper, we discuss how to find similar products for a given query product, and propose a framework for finding similar products in an online bookstore. We first build a co-purchasing network from the relationships between different types of products, and then compute the similarity between products using SimRank. Finally, we give experimental results from implementing this method on an Amazon dataset, which demonstrate that the proposed method can find the underlying results over a real dataset.
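The two stages described above (build a co-purchasing network, then rank products by SimRank) can be sketched as follows. This is only an illustration using the original pairwise SimRank recurrence on a tiny, made-up undirected co-purchase graph; how the paper actually constructs its network from the Amazon data is not specified here, and the product names are hypothetical.

```python
def simrank(in_neighbors, c=0.8, iters=10):
    # in_neighbors maps each node to the list of its in-neighbours.
    # Recurrence: s(a, b) = c / (|I(a)| |I(b)|) * sum of s(i, j) over i in I(a), j in I(b),
    # with s(a, a) = 1 and s(a, b) = 0 when either node has no in-neighbours.
    nodes = list(in_neighbors)
    sim = {a: {b: 1.0 if a == b else 0.0 for b in nodes} for a in nodes}
    for _ in range(iters):
        new = {a: {} for a in nodes}
        for a in nodes:
            for b in nodes:
                ia, ib = in_neighbors[a], in_neighbors[b]
                if a == b:
                    new[a][b] = 1.0
                elif not ia or not ib:
                    new[a][b] = 0.0
                else:
                    total = sum(sim[i][j] for i in ia for j in ib)
                    new[a][b] = c * total / (len(ia) * len(ib))
        sim = new
    return sim

def top_k(sim, query, k):
    # Rank every other node by its similarity to the query product.
    ranked = sorted(((s, n) for n, s in sim[query].items() if n != query),
                    reverse=True)
    return [n for _, n in ranked[:k]]

# Hypothetical co-purchase graph: an edge means "bought together".
# Books A and B are both co-purchased with X and Y; C only with Z.
co_purchase = {
    "A": ["X", "Y"], "B": ["X", "Y"],
    "X": ["A", "B"], "Y": ["A", "B"],
    "C": ["Z"], "Z": ["C"],
}
scores = simrank(co_purchase)
most_similar_to_a = top_k(scores, "A", 1)
```

The pairwise recurrence costs on the order of n² node pairs per iteration, which is why much of the SimRank literature listed on this page focuses on faster computation.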


Journal Article•DOI•

An Emphasized Dual Similarity Measure Integration for Online Image Retrieval System Using SimRank


Raj Kumar R, Krishnamurthy M

15 Jun 2015-International Journal of Computing Algorithm

TL;DR: This paper proposes Mok-SimRank to compute link-based similarity and a dual similarity integration algorithm for both link-based and content-based similarity, and shows that this approach is significantly better than traditional methods in terms of relevance.


Abstract: In real-world scenarios the use of images is growing rapidly, and an image-rich network is one that comprises billions of images. Social media websites such as Picasa, Flickr and Facebook contain billions of user-posted images along with their annotations. Similarly, electronic commerce websites such as Flipkart, Myntra and Amazon are furnished with product-related images. In this paper, we describe how to perform efficient and optimal information retrieval in an online image-rich system. We propose Mok-SimRank to compute link-based similarity and a dual similarity integration algorithm for both link-based and content-based similarity. Experimental results on an online electronic commerce site show that our approach is significantly better than traditional methods in terms of relevance.


An Efficient Methodology for Image Rich Information Retrieval


Ashwini Jaid, Komal Savant, Sonali Varma, Pushpa Jat, Sushama Shinde +1 more

01 Jan 2015

TL;DR: An algorithm, Integrated Weighted Similarity Learning (IWSL), is proposed to account for both link-based and content-based similarities by considering the network structure and mutually reinforcing link similarity and feature weight learning.


Abstract: Social multimedia sharing and hosting websites, such as Flickr and Facebook, contain billions of user-submitted images. Popular Internet commerce websites such as Amazon.com are also furnished with tremendous amounts of product-related images. In addition, images in such social networks are also accompanied by annotations, comments, and other information, thus forming heterogeneous image-rich information networks. In this paper, the concept of a (heterogeneous) image-rich information network and the problem of how to perform information retrieval and recommendation in such networks are introduced. A fast algorithm, heterogeneous minimum order k-SimRank (HMok-SimRank), is proposed to compute link-based similarity in weighted heterogeneous information networks. Then, we propose an algorithm, Integrated Weighted Similarity Learning (IWSL), to account for both link-based and content-based similarities by considering the network structure and mutually reinforcing link similarity and feature weight learning. Both local and global feature learning methods are designed. Experimental results on Flickr and Amazon data sets show that our approach is significantly better than traditional methods in terms of both relevance and speed. A new product search and recommendation system for e-commerce has been implemented based on our algorithm.


Posted Content•

SimRank Computation on Uncertain Graphs


Rong Zhu¹, Zhaonian Zou¹, Jianzhong Li¹•Institutions (1)

Harbin Institute of Technology¹

09 Dec 2015-arXiv: Databases

TL;DR: Following the random-walk-based formulation of SimRank on deterministic graphs and the possible worlds model of uncertain graphs, random walks on uncertain graphs are defined for the first time and shown to satisfy Markov's property, and the SimRank measure is formulated based on these random walks.


Abstract: SimRank is a similarity measure between vertices in a graph, which has become a fundamental technique in graph analytics. Recently, many algorithms have been proposed for efficient evaluation of SimRank similarities. However, the existing SimRank computation algorithms either overlook uncertainty in graph structures or are based on an unreasonable assumption (Du et al.). In this paper, we study SimRank similarities on uncertain graphs based on the possible worlds model of uncertain graphs. Following the random-walk-based formulation of SimRank on deterministic graphs and the possible worlds model of uncertain graphs, we define random walks on uncertain graphs for the first time and show that our definition of random walks satisfies Markov's property. We formulate the SimRank measure based on random walks on uncertain graphs. We discover a critical difference between random walks on uncertain graphs and random walks on deterministic graphs, which makes all existing SimRank computation algorithms on deterministic graphs inapplicable to uncertain graphs. To efficiently compute SimRank similarities, we propose three algorithms, namely the baseline algorithm with high accuracy, the sampling algorithm with high efficiency, and the two-phase algorithm with efficiency comparable to the sampling algorithm and about an order of magnitude smaller relative error than the sampling algorithm. The extensive experiments and case studies verify the effectiveness of our SimRank measure and the efficiency of our SimRank computation algorithms.
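For orientation, the random-walk-based formulation on deterministic graphs that the abstract builds on can be sketched as a Monte Carlo estimate: the SimRank of u and v is the expected value of c^τ, where τ is the first step at which two independent reverse random walks started from u and v meet. The sketch below covers only this deterministic case. It is not any of the paper's three algorithms, and it makes no attempt at the uncertain-graph setting, where the abstract states that such deterministic methods do not apply. Function and parameter names are hypothetical.

```python
import random

def simrank_monte_carlo(in_neighbors, u, v, c=0.8, samples=20000,
                        max_steps=20, seed=0):
    # in_neighbors maps each node to the list of its in-neighbours.
    if u == v:
        return 1.0
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        a, b = u, v
        for step in range(1, max_steps + 1):
            ia, ib = in_neighbors[a], in_neighbors[b]
            if not ia or not ib:
                break  # a walk is stuck, so this pair of walks never meets
            # Each walk moves independently to a uniformly chosen in-neighbour.
            a, b = rng.choice(ia), rng.choice(ib)
            if a == b:
                total += c ** step  # first meeting at this step
                break
    return total / samples

# Toy deterministic graph (an assumption): nodes 2 and 3 share in-neighbours 0 and 1.
toy_graph = {0: [], 1: [], 2: [0, 1], 3: [0, 1]}
estimate = simrank_monte_carlo(toy_graph, 2, 3)
```

Truncating walks at `max_steps` slightly underestimates the score, with the omitted contribution bounded by c raised to the power `max_steps`.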


Book Chapter•DOI•

Friendship Link Recommendation Based on Content Structure Information


Xiaoming Zhang¹, Qiao Deng¹, Zhoujun Li¹•Institutions (1)

Beihang University¹

08 Jun 2015

TL;DR: A model is proposed to recommend a user's potential friends by incorporating user-generated content and structure features, using a weighted SimRank algorithm to recommend the most similar users as friends.


Abstract: Intuitively, a friendship link between two users can be recommended based on the similarity of their generated text content or structure information. Although this problem has been extensively studied, the challenge of how to effectively incorporate the information from social interaction and user-generated content remains largely open. We propose a model (LRCS) to recommend a user's potential friends by incorporating user-generated content and structure features. First, network users are clustered based on the similarity of their interests and structural features. Users in the same cluster as the query user are considered candidate friends. Then, a weighted SimRank algorithm is proposed to recommend the most similar users as friends. Experiments on two real-life datasets show the superiority of our approach.
