
SimRank

About: SimRank is a research topic. Over its lifetime, 250 publications have been published within this topic, receiving 21,163 citations.


Papers
Book Chapter
16 Jun 2014
TL;DR: This work designs accurate and tight upper bounds for Personalized PageRank (PPR) and SimRank (SR) based on human intuition and demonstrates their effectiveness in top-k similar-node queries, where the bounds speed up the query.
Abstract: Link-based similarity measures play a significant role in many graph-based applications. Consequently, measuring node similarity in a graph is a fundamental problem in graph data mining. Personalized PageRank (PPR) and SimRank (SR) have emerged as the most popular and influential link-based similarity measures. In practice, PPR and SR scores are obtained by iterative computation, and the overhead grows heavily as the number of iterations increases. Ideally, similarity would be computed using the minimum number of iterations sufficient to guarantee a desired accuracy. However, the existing upper bounds are too coarse to be useful in general. Therefore, this paper focuses on designing accurate and tight upper bounds for PPR and SR. Our upper bounds are based on the following human intuition: “the smaller the difference between the results of two consecutive iteration steps, the smaller the difference between the iterative similarity scores and the theoretical ones”. Furthermore, we demonstrate the effectiveness of our novel upper bounds in the scenario of top-k similar-node queries, where they accelerate the query. Finally, we run a comprehensive set of experiments on real data sets to verify the effectiveness and efficiency of our upper bounds.

1 citation
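
The intuition quoted in the abstract above can be illustrated with a generic stopping rule for Personalized PageRank. The sketch below is not the paper's bound: it assumes the standard power iteration with restart probability alpha, which is a contraction with factor (1 - alpha) in the L1 norm, so the distance to the exact scores is at most (1 - alpha) / alpha times the L1 difference between two consecutive iterates. The function name, graph encoding, and parameter defaults are hypothetical.

```python
def personalized_pagerank(out_neighbors, source, alpha=0.15, tol=1e-8, max_iter=1000):
    """Power iteration for PPR with an a-posteriori error bound.

    out_neighbors: dict mapping every node to a list of its out-neighbors
                   (assumed encoding; every neighbor must also be a key).
    Returns (scores, iterations_used, l1_error_bound).
    """
    nodes = list(out_neighbors)
    scores = {v: 0.0 for v in nodes}
    scores[source] = 1.0
    bound = float("inf")
    for step in range(1, max_iter + 1):
        # Restart mass goes to the source node.
        nxt = {v: 0.0 for v in nodes}
        nxt[source] = alpha
        # Each node spreads (1 - alpha) of its score evenly over its out-links.
        # Mass at nodes without out-links is dropped (an assumption of this
        # sketch); the contraction argument still holds in that case.
        for u in nodes:
            targets = out_neighbors[u]
            if not targets:
                continue
            share = (1.0 - alpha) * scores[u] / len(targets)
            for v in targets:
                nxt[v] += share
        # L1 difference between two consecutive iterates.
        diff = sum(abs(nxt[v] - scores[v]) for v in nodes)
        scores = nxt
        # Generic contraction bound on the L1 distance to the exact scores.
        bound = (1.0 - alpha) / alpha * diff
        if bound <= tol:
            return scores, step, bound
    return scores, max_iter, bound


# Tiny two-node cycle used only for illustration.
graph = {"a": ["b"], "b": ["a"]}
scores, steps, bound = personalized_pagerank(graph, "a", alpha=0.5, tol=1e-10)
```

The loop stops as soon as the bound, computed only from consecutive iterates, falls below the requested tolerance, rather than running a fixed worst-case number of iterations.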

Proceedings Article
01 Aug 2016
TL;DR: A novel tripartite extension of SimRank is formulated using the network of lenders, loans and borrowers to capture the inherent pattern in the system, and extensive experiments validate the effectiveness of the modeling and the proposed disambiguation scheme for borrowers.
Abstract: Microfinance institutions aim to offer financial services to people in the low-income category, who typically lack access to traditional banking systems. To date, more than 15 billion U.S. dollars have been infused into microfinancing, assisting more than 160 million people in developing countries. With the tremendous growth of the World Wide Web, a number of microfinance institutions have recently moved online. One such noble initiative is KIVA, a crowdsourced online microfinance platform which connects borrowers (small entrepreneurs and individuals) to lenders through field partners. Of particular interest to such microfinancing institutions is the analysis of the network of borrowers, which can help them improve the percentage of loan requests fulfilled. KIVA provides a rich dataset capturing the lending activities on the website. In this paper, we analyze the data to find and extract the structure in the KIVA framework. We formulate a novel tripartite extension of SimRank using the network of lenders, loans and borrowers to capture the inherent pattern in the system. We also propose a multipartite extension of SimRank useful for real-world settings. Extensive experiments validate the effectiveness of our modeling and the proposed disambiguation scheme for borrowers.

1 citation

31 Dec 2016
TL;DR: This thesis proposes a novel framework for predicting the location of a social media user by leveraging structural-context similarity over Wikipedia links and provides a list of ranked "probable" cities based on the distances between candidate locations and their weights.
Abstract: Leveraging Structural-Context Similarity of Wikipedia Links to Predict Twitter User Locations. Twitter is a widely used social media service. Several efforts have targeted understanding the patterns of information dissemination underlying this social network. A user’s location is one of the most important information items relative to analyzing content. However, location information tends to be unavailable because most users do not (want to) include geo-tags in their tweets. To predict a user’s location, existing approaches require voluminous training data sets of geo-tagged tweets. However, some of the characteristics of tweets, such as compact, non-traditional linguistic expressions, have posed significant challenges when applying model-fitting approaches. In this thesis, we propose a novel framework for predicting the location of a social media user by leveraging structural-context similarity over Wikipedia links. We measure SimRanks between pages over the Wikipedia dump dataset and build a knowledge base, mapping location information (e.g., cities and states) to related vocabularies along with the likelihood for these mappings. Our results evolve as the users’ tweet stream grows. We have implemented this framework using Apache Storm to observe real-time tweets. Finally, our framework provides a list of ranked "probable" cities based on the distances between candidate locations and their weights. This thesis includes empirical evaluations that demonstrate performance that is in line with current state-of-the-art location prediction approaches.

1 citation

Journal Article
TL;DR: A factorized similarity learning (FSL) method is proposed to integrate the link, node content, and user supervision into a uniform framework by using matrix factorization, and the final similarities are approximated by the span of low-rank matrices.
Abstract: The problem of similarity learning is relevant to many data mining applications, such as recommender systems, classification, and retrieval. This problem is particularly challenging in the context of networks, which contain different aspects such as the topological structure, content, and user supervision. These different aspects need to be combined effectively in order to create a holistic similarity function. In particular, while most similarity learning methods in networks, such as SimRank, utilize the topological structure, the user supervision and content are rarely considered. In this paper, a factorized similarity learning (FSL) method is proposed to integrate the link, node content, and user supervision into a uniform framework. This is learned by using matrix factorization, and the final similarities are approximated by the span of low-rank matrices. The proposed framework is further extended to a noise-tolerant version by alternatively adopting a hinge loss. To facilitate efficient computation on large-scale data, a parallel extension is developed. Experiments are conducted on the DBLP and CoRA data sets. The results show that FSL is robust and efficient and outperforms the state of the art. The code for the learning algorithm used in our experiments is available at http://www.ifp.illinois.edu/~chang87/.

1 citation

Journal Article
TL;DR: A distributed SimRank algorithm based on MapReduce is proposed to measure node similarity in graphs; results show that it can efficiently compute node similarity and effectively cluster large graphs.
Abstract: Graph clustering is an important technique in graph analysis, and measuring the similarity between the nodes of a graph is the premise of graph clustering. SimRank is a general structural similarity model proposed by Jeh and Widom. Because SimRank computes the similarity between nodes iteratively, its time and space complexity are very high. With the rapid growth of data, a single machine cannot meet the requirements of large-scale computation. In this paper, a distributed SimRank algorithm based on MapReduce is proposed and used to measure similarity in graphs. A distributed AP clustering algorithm is then designed to cluster the graph nodes. Experiments compare clustering running time and speedup, and the results show that the method can efficiently measure graph node similarity and effectively cluster large graphs.

1 citation
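
For reference, the iterative computation the abstract above refers to can be sketched as the naive fixed-point iteration of Jeh and Widom's definition: s(a, a) = 1, and for a != b, s(a, b) is the decay constant C times the average similarity over all pairs of in-neighbors of a and b (0 if either node has no in-neighbors). This is a minimal single-machine sketch, not the distributed MapReduce algorithm of the paper; the function name, decay value, and input encoding are assumptions.

```python
def simrank(in_neighbors, decay=0.8, iterations=10):
    """Naive SimRank iteration over all node pairs.

    in_neighbors: dict mapping every node to a list of its in-neighbors
                  (assumed encoding; every in-neighbor must also be a key).
    Returns a nested dict sim[a][b] of similarity scores.
    """
    nodes = list(in_neighbors)
    # Start from the identity: a node is fully similar only to itself.
    sim = {a: {b: 1.0 if a == b else 0.0 for b in nodes} for a in nodes}
    for _ in range(iterations):
        new = {a: {} for a in nodes}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new[a][b] = 1.0
                    continue
                ia, ib = in_neighbors[a], in_neighbors[b]
                if not ia or not ib:
                    # No in-neighbors on one side: similarity is defined as 0.
                    new[a][b] = 0.0
                    continue
                # Sum previous-round similarity over all in-neighbor pairs.
                total = sum(sim[i][j] for i in ia for j in ib)
                new[a][b] = decay * total / (len(ia) * len(ib))
        sim = new
    return sim


# Illustrative graph: A links to both B and C.
example = {"A": [], "B": ["A"], "C": ["A"]}
example_sim = simrank(example, decay=0.8, iterations=5)
```

Every round touches all node pairs and all of their in-neighbor pairs, which is the source of the high time and space cost that motivates distributed variants.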


Network Information
Related Topics (5)
Web page: 50.3K papers, 975.1K citations, 77% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 77% related
Ontology (information science): 57K papers, 869.1K citations, 77% related
Scalability: 50.9K papers, 931.6K citations, 74% related
Tree (data structure): 44.9K papers, 749.6K citations, 74% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2021    15
2020    26
2019    16
2018    17
2017    19
2016    16