Journal ArticleDOI
SALSA: the stochastic approach for link-structure analysis
Ronny Lempel,Shlomo Moran +1 more
Reads0
Chats0
TLDR
It is proved that SALSA is quivalent to a weighted in degree analysis of the link-sturcutre of WWW subgraphs, making it computationally more efficient than the Mutual reinforcement approach, and comparisions reveal a topological Phenomenon called the TKC effect which prevents the Mutual Reinforcement approach from identifying meaningful authorities.Abstract:
Today, when searching for information on the WWW, one usually performs a query through a term-based search engine. These engines return, as the query's result, a list of Web pages whose contents matches the query. For broad-topic queries, such searches often result in a huge set of retrieved documents, many of which are irrelevant to the user. However, much information is contained in the link-structure of the WWW. Information such as which pages are linked to others can be used to augment search algorithms. In this context, Jon Kleinberg introduced the notion of two distinct types of Web pages: hubs and authorities. Kleinberg argued that hubs and authorities exhibit a mutually reinforcing relationship: a good hub will point to many authorities, and a good authority will be pointed at by many hubs. In light of this, he dervised an algoirthm aimed at finding authoritative pages. We present SALSA, a new stochastic approach for link-structure analysis, which examines random walks on graphs derived from the link-structure. We show that both SALSA and Kleinberg's Mutual Reinforcement approach employ the same metaalgorithm. We then prove that SALSA is quivalent to a weighted in degree analysis of the link-sturcutre of WWW subgraphs, making it computationally more efficient than the Mutual reinforcement approach. We compare that results of applying SALSA to the results derived through Kleinberg's approach. These comparisions reveal a topological Phenomenon called the TKC effectwhich, in certain cases, prevents the Mutual reinforcement approach from identifying meaningful authorities.read more
Citations
More filters
ReportDOI
Large-scale Graph Computation on Just a PC
TL;DR: This work presents GraphChi, a disk-based system for computing efficiently on graphs with billions of edges, and builds on the basis of Parallel Sliding Windows to propose a new data structure Partitioned Adjacency Lists, which is used to design an online graph database graphChi-DB.
Book
Mining the Web: Discovering Knowledge from Hypertext Data
TL;DR: This chapter discusses the infrastructure of the Web, the future of Web mining, and applications of semi-supervised learning for text and similarity and clustering.
Book
Data-Intensive Text Processing with MapReduce
Jimmy Lin,Chris Dyer +1 more
TL;DR: This half-day tutorial introduces participants to data-intensive text processing with the MapReduce programming model using the open-source Hadoop implementation, with a focus on scalability and the tradeoffs associated with distributed processing of large datasets.
Proceedings ArticleDOI
WTF: the who to follow service at Twitter
TL;DR: An architectural overview of the architecture of WTF is provided and a few graph recommendation algorithms implemented in Cassovary are described and evaluated, including a novel approach based on a combination of random walks and SALSA.
Journal ArticleDOI
A Survey on PageRank Computing
TL;DR: The theoretical foundations of the PageRank formulation are examined, the acceleration of PageRank computing, in the effects of particular aspects of web graph structure on the optimal organization of computations, and in PageRank stability.
References
More filters
Journal ArticleDOI
The anatomy of a large-scale hypertextual Web search engine
Sergey Brin,Lawrence Page +1 more
TL;DR: This paper provides an in-depth description of Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext and looks at the problem of how to effectively deal with uncontrolled hypertext collections where anyone can publish anything they want.
Journal Article
The Anatomy of a Large-Scale Hypertextual Web Search Engine.
Sergey Brin,Lawrence Page +1 more
TL;DR: Google as discussed by the authors is a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext and is designed to crawl and index the Web efficiently and produce much more satisfying search results than existing systems.
Journal ArticleDOI
Authoritative sources in a hyperlinked environment
TL;DR: This work proposes and test an algorithmic formulation of the notion of authority, based on the relationship between a set of relevant authoritative pages and the set of “hub pages” that join them together in the link structure, and has connections to the eigenvectors of certain matrices associated with the link graph.
Journal ArticleDOI
Co-citation in the scientific literature: A new measure of the relationship between two documents
TL;DR: A new form of document coupling called co-citation is defined as the frequency with which two documents are cited together, and clusters of co- cited papers provide a new way to study the specialty structure of science.
Journal ArticleDOI
Citation analysis as a tool in journal evaluation.
TL;DR: In 1971, the Institute for Scientfic Information decided to undertake a systematic analysis of journal citation patterns across the whole of science and technology.