Showing papers on "SimRank" published in 2007

Open Access

Posted Content•

Simrank++: Query rewriting through link analysis of the click graph

Ioannis Antonellis, Hector Garcia-Molina, Chi-Chao Chang

04 Dec 2007-arXiv: Digital Libraries

TL;DR: Addresses query rewriting for sponsored search by applying Simrank to a historical click graph to identify queries similar to a given query q, and presents two enhanced versions of Simrank that exploit click-graph edge weights and "evidence."

Abstract: We focus on the problem of query rewriting for sponsored search. We base rewrites on a historical click graph that records the ads that have been clicked on in response to past user queries. Given a query q, we first consider Simrank as a way to identify queries similar to q, i.e., queries whose ads a user may be interested in. We argue that Simrank fails to properly identify query similarities in our application, and we present two enhanced versions of Simrank: one that exploits weights on click graph edges and another that exploits "evidence." We experimentally evaluate our new schemes against Simrank, using actual click graphs and queries from Yahoo!, and using a variety of metrics. Our results show that the enhanced methods can yield more and better query rewrites.
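As a rough illustration of the baseline the abstract starts from, the sketch below runs plain bipartite SimRank on a tiny query-ad click graph. It is not the paper's enhanced (weighted or "evidence") variants. The function names, the decay constant `c`, the iteration count, and the example queries and ads are all assumptions made for illustration.

```python
def _update(nodes, neighbors, other_sim, c):
    """One SimRank step for one side of the bipartite graph."""
    new_sim = {}
    for x in nodes:
        for y in nodes:
            if x == y:
                new_sim[x, y] = 1.0  # every node is fully similar to itself
                continue
            nx, ny = neighbors[x], neighbors[y]
            if not nx or not ny:
                new_sim[x, y] = 0.0  # no neighbors, nothing to propagate
                continue
            # Average similarity over all pairs of neighbors, damped by c.
            total = sum(other_sim[i, j] for i in nx for j in ny)
            new_sim[x, y] = c * total / (len(nx) * len(ny))
    return new_sim


def bipartite_simrank(clicks, c=0.8, iterations=5):
    """clicks maps each query to the set of ads clicked for it (hypothetical input format)."""
    queries = sorted(clicks)
    ads = sorted({ad for clicked in clicks.values() for ad in clicked})
    ad_queries = {ad: {q for q in queries if ad in clicks[q]} for ad in ads}
    sim_q = {(x, y): float(x == y) for x in queries for y in queries}
    sim_a = {(x, y): float(x == y) for x in ads for y in ads}
    for _ in range(iterations):
        # Query similarity comes from ad similarity and vice versa.
        sim_q, sim_a = (_update(queries, clicks, sim_a, c),
                        _update(ads, ad_queries, sim_q, c))
    return sim_q, sim_a


# Hypothetical click graph: two camera queries share ads, "flower" shares none.
clicks = {
    "camera": {"hp.com", "bestbuy.com"},
    "digital camera": {"hp.com", "bestbuy.com"},
    "flower": {"teleflora.com"},
}
sim_q, sim_a = bipartite_simrank(clicks)
print(sim_q["camera", "digital camera"], sim_q["camera", "flower"])
```

Queries that share no clicked ads, directly or through other queries, keep a similarity of zero, so candidate rewrites for q can be read off as the other queries with the highest scores.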

42 citations

Journal Article•DOI•

Practical Algorithms and Lower Bounds for Similarity Search in Massive Graphs

D. Fogaras¹, B. Racz²•Institutions (2)

Google¹, Hungarian Academy of Sciences²

01 May 2007-IEEE Transactions on Knowledge and Data Engineering

TL;DR: Introduces Monte Carlo similarity search algorithms that scale to graphs with billions of vertices, shows a significant gap between exact and approximate approaches, and reports the first evaluation of SimRank on real Web data.

Abstract: To exploit the similarity information hidden in the hyperlink structure of the Web, this paper introduces algorithms scalable to graphs with billions of vertices on a distributed architecture. The similarity of multistep neighborhoods of vertices is numerically evaluated by similarity functions including SimRank, a recursive refinement of cocitation, and PSimRank, a novel variant with better theoretical characteristics. Our methods are presented in a general framework of Monte Carlo similarity search algorithms that precompute an index database of random fingerprints, and at query time, similarities are estimated from the fingerprints. We justify our approximation method by asymptotic worst-case lower bounds: we show that there is a significant gap between exact and approximate approaches, and suggest that the exact computation, in general, is infeasible for large-scale inputs. We were the first to evaluate SimRank on real Web data. On the Stanford WebBase graph of 80M pages the quality of the methods increased significantly in each refinement step until step four.
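The fingerprint idea can be sketched in a few lines, assuming the standard random-walk reading of SimRank: the score of (u, v) is the expected value of c^τ, where τ is the first step at which two walks that follow in-links backwards from u and v meet. The code below is a toy, single-machine sketch under that assumption, not the paper's distributed implementation. The function names, parameters, and in-link dictionary format are hypothetical.

```python
import random


def build_fingerprints(in_links, num_walks, walk_len, seed=0):
    """Precompute num_walks reverse random walks per vertex.

    in_links maps every vertex to a list of its in-neighbors (assumed format;
    every vertex, including sources, must appear as a key).
    """
    rng = random.Random(seed)
    vertices = sorted(in_links)
    fingerprints = {v: [] for v in vertices}
    for _ in range(num_walks):
        paths = {v: [v] for v in vertices}
        for _ in range(walk_len):
            # One shared random choice per vertex per step, so walks that
            # reach the same vertex at the same time stay together afterwards.
            step = {v: (rng.choice(in_links[v]) if in_links[v] else None)
                    for v in vertices}
            for v in vertices:
                current = paths[v][-1]
                paths[v].append(step[current] if current is not None else None)
        for v in vertices:
            fingerprints[v].append(paths[v])
    return fingerprints


def estimate_simrank(fingerprints, u, v, c=0.8):
    """Query time: average c**(first meeting step) over the stored walk pairs."""
    total = 0.0
    for path_u, path_v in zip(fingerprints[u], fingerprints[v]):
        for t, (a, b) in enumerate(zip(path_u, path_v)):
            if a is not None and a == b:
                total += c ** t
                break  # only the first meeting counts
    return total / len(fingerprints[u])


# Hypothetical three-page graph: page "a" links to both "b" and "c".
in_links = {"a": [], "b": ["a"], "c": ["a"]}
fingerprints = build_fingerprints(in_links, num_walks=20, walk_len=4)
print(estimate_simrank(fingerprints, "b", "c"))
```

The index costs one stored path per vertex per walk, and a query touches only the fingerprints of the two vertices involved, which is what makes the precompute-then-estimate split attractive at scale.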

16 citations

Assessing program code through static structural similarity

Kevin A. Naudé

01 Jan 2007

TL;DR: Introduces the Weighted Assignment Similarity measure, a novel graph similarity measure that is related to SimRank but derives propagation scores from only the locally optimal mapping between child vertices, together with a method for incorporating local attribute similarities into the larger similarity propagation method.

Abstract: Learning to write software requires much practice and frequent assessment. Consequently, the use of computers to assist in the assessment of computer programs has been important in supporting large classes at universities. The main approaches to the problem are dynamic analysis (testing student programs for expected output) and static analysis (direct analysis of the program code). The former is very sensitive to all kinds of errors in student programs, while the latter has traditionally only been used to assess quality, and not correctness. This research focusses on the application of static analysis, particularly structural similarity, to marking student programs. Existing traditional measures of similarity are limiting in that they are usually only effective on tree structures. In this regard they do not easily support dependencies in program code. Contemporary measures of structural similarity, such as similarity flooding, usually rely on an internal normalisation of scores. The effect is that the scores only have relative meaning, and cannot be interpreted in isolation, i.e., they are not meaningful for assessment. The SimRank measure is shown to have the same problem, but not because of normalisation. The problem with the SimRank measure arises from the fact that its scores depend on all possible mappings between the children of vertices being compared. The main contribution of this research is a novel graph similarity measure, the Weighted Assignment Similarity measure. It is related to SimRank, but derives propagation scores from only the locally optimal mapping between child vertices. The resulting similarity scores may be regarded as the percentage of mutual coverage between graphs. The measure is proven to converge for all directed acyclic graphs, and an efficient implementation is outlined for this case. Attributes on graph vertices and edges are often used to capture domain specific information which is not structural in nature. It has been suggested that these should influence the similarity propagation, but no clear method for doing this has been reported. The second important contribution of this research is a general method for incorporating these local attribute similarities into the larger similarity propagation method. Identifier names are one example of attributes in program graphs. The choice of identifiers in programs is arbitrary as they are purely symbolic. A problem facing any comparison between programs is that they are unlikely to use the same set of identifiers. This problem indicates that a mapping between the identifier sets is required. The third contribution of this research is a method for applying the structural similarity measure in a two-step process to find an optimal identifier mapping. This approach is both novel and valuable as it cleverly reuses the similarity measure as an existing resource. In general, programming assignments allow a large variety of solutions. Assessing student programs through structural similarity is only feasible if the diversity in the solution space can…
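The contrast the abstract draws between SimRank and the assignment-based measure can be illustrated with a toy comparison of two aggregation rules over a matrix of child-to-child similarities. This is only a sketch of the idea, not the author's definition. The function names are hypothetical, the brute-force search stands in for a proper assignment algorithm, and normalising by the larger child count is an assumption.

```python
from itertools import permutations


def all_pairs_score(child_sim):
    """SimRank-style aggregation: average over every pair of children."""
    rows, cols = len(child_sim), len(child_sim[0])
    return sum(sum(row) for row in child_sim) / (rows * cols)


def best_assignment_score(child_sim):
    """Assignment-style aggregation: keep only the best one-to-one mapping.

    Brute force over permutations, so suitable for small child sets only.
    Dividing by the larger child count is an assumption of this sketch.
    """
    rows, cols = len(child_sim), len(child_sim[0])
    if rows > cols:
        # Transpose so that each row can be mapped to a distinct column.
        child_sim = [list(column) for column in zip(*child_sim)]
        rows, cols = cols, rows
    best = max(
        sum(child_sim[i][j] for i, j in enumerate(mapping))
        for mapping in permutations(range(cols), rows)
    )
    return best / cols


# Two vertices whose children match perfectly one-to-one (hypothetical data).
child_sim = [
    [1.0, 0.0],
    [0.0, 1.0],
]
print(all_pairs_score(child_sim), best_assignment_score(child_sim))
```

In this example the children match perfectly under the best mapping, yet averaging over all pairs is pulled down by the mismatched pairs, while the one-to-one mapping gives a score that can be read in isolation as a degree of coverage.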

4 citations