scispace - formally typeset
Search or ask a question
Author

Taher H. Haveliwala

Other affiliations: Stanford University
Bio: Taher H. Haveliwala is an academic researcher from Google. The author has contributed to research in topics: Web search query & Web page. The author has an hindex of 29, co-authored 49 publications receiving 7585 citations. Previous affiliations of Taher H. Haveliwala include Stanford University.

Papers
More filters
Proceedings ArticleDOI
07 May 2002
TL;DR: A set of PageRank vectors are proposed, biased using a set of representative topics, to capture more accurately the notion of importance with respect to a particular topic, and are shown to generate more accurate rankings than with a single, generic PageRank vector.
Abstract: In the original PageRank algorithm for improving the ranking of search-query results, a single PageRank vector is computed, using the link structure of the Web, to capture the relative "importance" of Web pages, independent of any particular search query. To yield more accurate search results, we propose computing a set of PageRank vectors, biased using a set of representative topics, to capture more accurately the notion of importance with respect to a particular topic. By using these (precomputed) biased PageRank vectors to generate query-specific importance scores for pages at query time, we show that we can generate more accurate rankings than with a single, generic PageRank vector. For ordinary keyword search queries, we compute the topic-sensitive PageRank scores for pages satisfying the query using the topic of the query keywords. For searches done in context (e.g., when the search query is performed by highlighting words in a Web page), we compute the topic-sensitive PageRank scores using the topic of the context in which the query appeared.

1,765 citations

Journal ArticleDOI
TL;DR: It is shown that using linear combinations of these (precomputed) biased PageRank vectors to generate context-specific importance scores for pages at query time, can generate more accurate rankings than with a single, generic PageRank vector.
Abstract: The original PageRank algorithm for improving the ranking of search-query results computes a single vector, using the link structure of the Web, to capture the relative "importance" of Web pages, independent of any particular search query. To yield more accurate search results, we propose computing a set of PageRank vectors, biased using a set of representative topics, to capture more accurately the notion of importance with respect to a particular topic. For ordinary keyword search queries, we compute the topic-sensitive PageRank scores for pages satisfying the query using the topic of the query keywords. For searches done in context (e.g., when the search query is performed by highlighting words in a Web page), we compute the topic-sensitive PageRank scores using the topic of the context in which the query appeared. By using linear combinations of these (precomputed) biased PageRank vectors to generate context-specific importance scores for pages at query time, we show that we can generate more accurate rankings than with a single, generic PageRank vector. We describe techniques for efficiently implementing a large-scale search system based on the topic-sensitive PageRank scheme.

1,161 citations

Patent
22 Jun 2004
TL;DR: In this paper, a search system monitors the input of a search query by a user, and sends a portion of the query as a partial query to the search engine for possible selection.
Abstract: A search system monitors the input of a search query by a user. Before the user finishes entering the search query, the search system identifies and sends a portion of the query as a partial query to the search engine. Based on the partial query, the search engine creates a set of predicted queries. This process may take into account prior queries submitted by a community of users, and may take into account a user profile. The predicted queries are be sent back to the user for possible selection. The search system may also cache search results corresponding to one or more of the predicted queries in anticipation of the user selecting one of the predicted queries. The search engine may also return at least a portion of the search results corresponding to one or more of the predicted queries.

545 citations

Proceedings ArticleDOI
20 May 2003
TL;DR: In Quadratic Extrapolation, the first eigenvalue of a Markov matrix is known to be 1 to compute the nonprincipal eigenvectors using successive iterates of the Power Method, a fast method for determining the dominant eigenvector of a matrix that is too large for standard fast methods to be practical.
Abstract: We present a novel algorithm for the fast computation of PageRank, a hyperlink-based estimate of the ''importance'' of Web pages. The original PageRank algorithm uses the Power Method to compute successive iterates that converge to the principal eigenvector of the Markov matrix representing the Web link graph. The algorithm presented here, called Quadratic Extrapolation, accelerates the convergence of the Power Method by periodically subtracting off estimates of the nonprincipal eigenvectors from the current iterate of the Power Method. In Quadratic Extrapolation, we take advantage of the fact that the first eigenvalue of a Markov matrix is known to be 1 to compute the nonprincipal eigenvectors using successive iterates of the Power Method. Empirically, we show that using Quadratic Extrapolation speeds up PageRank computation by 25-300% on a Web graph of 80 million nodes, with minimal overhead. Our contribution is useful to the PageRank community and the numerical linear algebra community in general, as it is a fast method for determining the dominant eigenvector of a matrix that is too large for standard fast methods to be practical.

497 citations

Patent
21 Jun 2005
TL;DR: In this article, personalized advertisements are provided to a user using a search engine to obtain documents relevant to a search query, where advertisements are personalized in response to a query profile that is derived from personalized search results.
Abstract: Personalized advertisements are provided to a user using a search engine to obtain documents relevant to a search query. The advertisements are personalized in response to a search profile that is derived from personalized search results. The search results are personalized based on a user profile of the user providing the query. The user profile describes interests of the user, and can be derived from a variety of sources, including prior search queries, prior search results, expressed interests, demographic, geographic, psychographic, and activity information.

387 citations


Cited by
More filters
Journal ArticleDOI
TL;DR: A new neural network model, called graph neural network (GNN) model, that extends existing neural network methods for processing the data represented in graph domains, and implements a function tau(G,n) isin IRm that maps a graph G and one of its nodes n into an m-dimensional Euclidean space.
Abstract: Many underlying relationships among data in several areas of science and engineering, e.g., computer vision, molecular chemistry, molecular biology, pattern recognition, and data mining, can be represented in terms of graphs. In this paper, we propose a new neural network model, called graph neural network (GNN) model, that extends existing neural network methods for processing the data represented in graph domains. This GNN model, which can directly process most of the practically useful types of graphs, e.g., acyclic, cyclic, directed, and undirected, implements a function tau(G,n) isin IRm that maps a graph G and one of its nodes n into an m-dimensional Euclidean space. A supervised learning algorithm is derived to estimate the parameters of the proposed GNN model. The computational cost of the proposed algorithm is also considered. Some experimental results are shown to validate the proposed learning algorithm, and to demonstrate its generalization capabilities.

5,701 citations

Proceedings ArticleDOI
23 May 1998
TL;DR: In this paper, the authors present two algorithms for the approximate nearest neighbor problem in high-dimensional spaces, for data sets of size n living in R d, which require space that is only polynomial in n and d.
Abstract: We present two algorithms for the approximate nearest neighbor problem in high-dimensional spaces. For data sets of size n living in R d , the algorithms require space that is only polynomial in n and d, while achieving query times that are sub-linear in n and polynomial in d. We also show applications to other high-dimensional geometric problems, such as the approximate minimum spanning tree. The article is based on the material from the authors' STOC'98 and FOCS'01 papers. It unifies, generalizes and simplifies the results from those papers.

4,478 citations

Journal IssueDOI
TL;DR: Experiments on large coauthorship networks suggest that information about future interactions can be extracted from network topology alone, and that fairly subtle measures for detecting node proximity can outperform more direct measures.
Abstract: Given a snapshot of a social network, can we infer which new interactions among its members are likely to occur in the near future? We formalize this question as the link-prediction problem, and we develop approaches to link prediction based on measures for analyzing the “proximity” of nodes in a network. Experiments on large coauthorship networks suggest that information about future interactions can be extracted from network topology alone, and that fairly subtle measures for detecting node proximity can outperform more direct measures. © 2007 Wiley Periodicals, Inc.

4,181 citations

Proceedings ArticleDOI
20 May 2003
TL;DR: An algorithm to decrease the number of downloads of inauthentic files in a peer-to-peer file-sharing network that assigns each peer a unique global trust value, based on the peer's history of uploads is described.
Abstract: Peer-to-peer file-sharing networks are currently receiving much attention as a means of sharing and distributing information. However, as recent experience shows, the anonymous, open nature of these networks offers an almost ideal environment for the spread of self-replicating inauthentic files.We describe an algorithm to decrease the number of downloads of inauthentic files in a peer-to-peer file-sharing network that assigns each peer a unique global trust value, based on the peer's history of uploads. We present a distributed and secure method to compute global trust values, based on Power iteration. By having peers use these global trust values to choose the peers from whom they download, the network effectively identifies malicious peers and isolates them from the network.In simulations, this reputation system, called EigenTrust, has been shown to significantly decrease the number of inauthentic files on the network, even under a variety of conditions where malicious peers cooperate in an attempt to deliberately subvert the system.

3,715 citations

Journal ArticleDOI
TL;DR: The University of Florida Sparse Matrix Collection, a large and actively growing set of sparse matrices that arise in real applications, is described and a new multilevel coarsening scheme is proposed to facilitate this task.
Abstract: We describe the University of Florida Sparse Matrix Collection, a large and actively growing set of sparse matrices that arise in real applications The Collection is widely used by the numerical linear algebra community for the development and performance evaluation of sparse matrix algorithms It allows for robust and repeatable experiments: robust because performance results with artificially generated matrices can be misleading, and repeatable because matrices are curated and made publicly available in many formats Its matrices cover a wide spectrum of domains, include those arising from problems with underlying 2D or 3D geometry (as structural engineering, computational fluid dynamics, model reduction, electromagnetics, semiconductor devices, thermodynamics, materials, acoustics, computer graphics/vision, robotics/kinematics, and other discretizations) and those that typically do not have such geometry (optimization, circuit simulation, economic and financial modeling, theoretical and quantum chemistry, chemical process simulation, mathematics and statistics, power networks, and other networks and graphs) We provide software for accessing and managing the Collection, from MATLAB™, Mathematica™, Fortran, and C, as well as an online search capability Graph visualization of the matrices is provided, and a new multilevel coarsening scheme is proposed to facilitate this task

3,456 citations