Proceedings Article

The PageRank Citation Ranking : Bringing Order to the Web

11 Nov 1999 - Vol. 98, pp. 161-172
TL;DR: This paper describes PageRank, a method for rating Web pages objectively and mechanically, effectively measuring the human interest and attention devoted to them, and shows how to efficiently compute PageRank for large numbers of pages.
Abstract: The importance of a Web page is an inherently subjective matter, which depends on the reader's interests, knowledge and attitudes. But there is still much that can be said objectively about the relative importance of Web pages. This paper describes PageRank, a method for rating Web pages objectively and mechanically, effectively measuring the human interest and attention devoted to them. We compare PageRank to an idealized random Web surfer. We show how to efficiently compute PageRank for large numbers of pages. And we show how to apply PageRank to search and to user navigation.
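
As a rough illustration of the random-surfer intuition described above, the sketch below computes PageRank by power iteration on a small adjacency matrix; the damping factor, tolerance, and toy graph are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def pagerank(adj, d=0.85, tol=1e-9, max_iter=100):
    """Power-iteration PageRank on a dense adjacency matrix.

    adj[i, j] = 1 if page i links to page j.  The damping factor d models
    the random surfer following a link with probability d and jumping to
    a uniformly random page otherwise.
    """
    n = adj.shape[0]
    out_degree = adj.sum(axis=1)
    # Dangling pages (no out-links) are treated as linking to every page.
    transition = np.where(
        out_degree[:, None] > 0,
        adj / np.maximum(out_degree, 1)[:, None],
        1.0 / n,
    )
    rank = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new_rank = (1 - d) / n + d * transition.T @ rank
        if np.abs(new_rank - rank).sum() < tol:
            return new_rank
        rank = new_rank
    return rank

# Toy 4-page web: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0, 3 -> 2
A = np.array([[0, 1, 1, 0],
              [0, 0, 1, 0],
              [1, 0, 0, 0],
              [0, 0, 1, 0]], dtype=float)
print(pagerank(A))
```

Page 2, which collects the most in-links, ends up with the highest score under this sketch.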
Citations
Journal ArticleDOI
TL;DR: Developments in this field are reviewed, including such concepts as the small-world effect, degree distributions, clustering, network correlations, random graph models, models of network growth and preferential attachment, and dynamical processes taking place on networks.
Abstract: Inspired by empirical studies of networked systems such as the Internet, social networks, and biological networks, researchers have in recent years developed a variety of techniques and models to help us understand or predict the behavior of these systems. Here we review developments in this field, including such concepts as the small-world effect, degree distributions, clustering, network correlations, random graph models, models of network growth and preferential attachment, and dynamical processes taking place on networks.
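
One of the concepts the review covers, network growth by preferential attachment, can be illustrated with a minimal sketch; the growth rule and parameters below are assumptions in the spirit of Barabási-Albert-style models, not code from the review.

```python
import random
from collections import Counter

def preferential_attachment(n_nodes, m=2, seed=0):
    """Grow a graph where each new node attaches to m existing nodes
    with probability proportional to their current degree."""
    rng = random.Random(seed)
    # Start from a small complete core of m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Sampling uniformly from this endpoint list is equivalent to
    # sampling nodes with probability proportional to degree.
    endpoints = [v for e in edges for v in e]
    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((new, t))
            endpoints.extend([new, t])
    return edges

edges = preferential_attachment(10_000)
degree = Counter(v for e in edges for v in e)
# Heavy-tailed degree distribution: a few large hubs, many low-degree nodes.
print(max(degree.values()), sorted(degree.values())[len(degree) // 2])
```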

17,647 citations

Journal ArticleDOI
01 Apr 1998
TL;DR: This paper provides an in-depth description of Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext and looks at the problem of how to effectively deal with uncontrolled hypertext collections where anyone can publish anything they want.
Abstract: In this paper, we present Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext. Google is designed to crawl and index the Web efficiently and produce much more satisfying search results than existing systems. The prototype with a full text and hyperlink database of at least 24 million pages is available at http://google.stanford.edu/. To engineer a search engine is a challenging task. Search engines index tens to hundreds of millions of web pages involving a comparable number of distinct terms. They answer tens of millions of queries every day. Despite the importance of large-scale search engines on the web, very little academic research has been done on them. Furthermore, due to rapid advance in technology and web proliferation, creating a web search engine today is very different from three years ago. This paper provides an in-depth description of our large-scale web search engine -- the first such detailed public description we know of to date. Apart from the problems of scaling traditional search techniques to data of this magnitude, there are new technical challenges involved with using the additional information present in hypertext to produce better search results. This paper addresses this question of how to build a practical large-scale system which can exploit the additional information present in hypertext. Also we look at the problem of how to effectively deal with uncontrolled hypertext collections where anyone can publish anything they want.

14,696 citations

Journal Article
TL;DR: Google, as discussed by the authors, is a prototype of a large-scale search engine that makes heavy use of the structure present in hypertext; it is designed to crawl and index the Web efficiently and to produce much more satisfying search results than existing systems.

13,327 citations

01 Jan 2006
TL;DR: Platform-independent and open source, igraph aims to satisfy all the requirements of a graph package while possibly remaining easy to use in interactive mode as well.
Abstract: There is no other package around that satisfies all the following requirements:
  • Ability to handle large graphs efficiently
  • Embeddable into higher level environments (like R [6] or Python [7])
  • Ability to be used for quick prototyping of new algorithms (impossible with “click & play” interfaces)
  • Platform-independent and open source
igraph aims to satisfy all these requirements while possibly remaining easy to use in interactive mode as well.

8,850 citations


Cites background from "The PageRank Citation Ranking : Bri..."

  • ...Centrality measures The following centrality measures [7] can be calculated: • degree • closeness • vertex and edge betweenness • eigenvector centrality • page rank [12]....

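The excerpt above lists the centrality measures igraph can compute, PageRank among them. A minimal usage sketch with the python-igraph interface follows; the random test graph and the damping value are illustrative assumptions.

```python
import igraph as ig

# A random graph stands in for a real network; any igraph.Graph works here.
g = ig.Graph.Erdos_Renyi(n=200, p=0.05)

centrality = {
    "degree": g.degree(),
    "closeness": g.closeness(),
    "vertex betweenness": g.betweenness(),
    "edge betweenness": g.edge_betweenness(),
    "eigenvector": g.eigenvector_centrality(),
    "pagerank": g.pagerank(damping=0.85),
}
for name, scores in centrality.items():
    print(f"{name:>20}: highest-scoring index {scores.index(max(scores))}")
```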

Posted Content
TL;DR: GraphSAGE is presented, a general, inductive framework that leverages node feature information (e.g., text attributes) to efficiently generate node embeddings for previously unseen data and outperforms strong baselines on three inductive node-classification benchmarks.
Abstract: Low-dimensional embeddings of nodes in large graphs have proved extremely useful in a variety of prediction tasks, from content recommendation to identifying protein functions. However, most existing approaches require that all nodes in the graph are present during training of the embeddings; these previous approaches are inherently transductive and do not naturally generalize to unseen nodes. Here we present GraphSAGE, a general, inductive framework that leverages node feature information (e.g., text attributes) to efficiently generate node embeddings for previously unseen data. Instead of training individual embeddings for each node, we learn a function that generates embeddings by sampling and aggregating features from a node's local neighborhood. Our algorithm outperforms strong baselines on three inductive node-classification benchmarks: we classify the category of unseen nodes in evolving information graphs based on citation and Reddit post data, and we show that our algorithm generalizes to completely unseen graphs using a multi-graph dataset of protein-protein interactions.
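
As a rough sketch of the sampling-and-aggregating idea described in the abstract, the code below implements a single mean-aggregator layer in plain NumPy; the layer sizes, sample size, and random weights are illustrative assumptions and omit the trainable, multi-layer setup of the actual GraphSAGE framework.

```python
import numpy as np

def mean_aggregate_layer(features, neighbors, weight, rng, sample_size=5):
    """One GraphSAGE-style layer with a mean aggregator (simplified sketch).

    features:  (n_nodes, d_in) node feature matrix
    neighbors: list of neighbor-id lists, one per node
    weight:    (2 * d_in, d_out) projection matrix (random here, trained in practice)
    """
    out = []
    for v, nbrs in enumerate(neighbors):
        if nbrs:
            sampled = rng.choice(nbrs, size=min(sample_size, len(nbrs)), replace=False)
            nbr_mean = features[sampled].mean(axis=0)
        else:
            nbr_mean = np.zeros(features.shape[1])
        h = np.concatenate([features[v], nbr_mean]) @ weight  # project self + neighborhood
        out.append(np.maximum(h, 0.0))                        # ReLU
    h_new = np.vstack(out)
    # Normalize embeddings to unit length before the next layer.
    return h_new / np.maximum(np.linalg.norm(h_new, axis=1, keepdims=True), 1e-12)

rng = np.random.default_rng(0)
feats = rng.normal(size=(6, 8))                 # 6 nodes, 8 features each
nbrs = [[1, 2], [0, 2, 3], [0, 1], [1, 4], [3, 5], [4]]
W = rng.normal(size=(16, 4))
print(mean_aggregate_layer(feats, nbrs, W, rng).shape)  # (6, 4)
```

Because the layer only needs a node's features and a sample of its neighbors, it can embed nodes that were never seen during training, which is the inductive property the abstract emphasizes.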

7,926 citations


Cites methods from "The PageRank Citation Ranking : Bri..."

  • ...These methods also bear close relationships to more classic approaches to spectral clustering [23], multi-dimensional scaling [19], as well as the PageRank algorithm [25]....


References
Journal ArticleDOI
01 Sep 1997
TL;DR: The varieties of link information (not just hyperlinks) on the Web, how the Web differs from conventional hypertext, and how the links can be exploited to build useful applications are discussed.
Abstract: Web information retrieval tools typically make use of only the text on pages, ignoring valuable information implicitly contained in links. At the other extreme, viewing the Web as a traditional hypertext system would also be a mistake, because heterogeneity, cross-domain links, and the dynamic nature of the Web mean that many assumptions of typical hypertext systems do not apply. The novelty of the Web leads to new problems in information access, and it is necessary to make use of the new kinds of information available, such as multiple independent categorization, naming, and indexing of pages. This paper discusses the varieties of link information (not just hyperlinks) on the Web, how the Web differs from conventional hypertext, and how the links can be exploited to build useful applications. Specific applications presented as part of the ParaSite system find individuals' homepages, new locations of moved pages, and unindexed information.

208 citations

Proceedings ArticleDOI
01 May 1995
TL;DR: This work proposes an algorithm based on content and structural analysis to form hierarchies from hypermedia networks using multiple hierarchical views, which can be visualized in various ways to help the user better comprehend the information.
Abstract: Our work concerns visualizing the information space of hypermedia systems using multiple hierarchical views. Although overview diagrams are useful for helping the user to navigate in a hypermedia system, for any real-world system they become too complicated and large to be really useful. This is because these diagrams represent complex network structures which are very difficult to visualize and comprehend. On the other hand, effective visualizations of hierarchies have been developed. Our strategy is to provide the user with different hierarchies, each giving a different perspective to the underlying information space, to help the user better comprehend the information. We propose an algorithm based on content and structural analysis to form hierarchies from hypermedia networks. The algorithm is automatic but can be guided by the user. The multiple hierarchies can be visualized in various ways. We give examples of the implementation of the algorithm on two hypermedia systems.

146 citations

Journal ArticleDOI
TL;DR: A general theory of epidemics can explain the growth of symbolic logic from 1847 to 1962 and an epidemic model predicts the rise and fall of particular research areas within symbolic logic.
Abstract: The spread of ideas within a scientific community and the spread of infectious disease are both special cases of a general communication process. Thus a general theory of epidemics can explain the growth of symbolic logic from 1847 to 1962. An epidemic model predicts the rise and fall of particular research areas within symbolic logic. A Markov chain model of individual movement between research areas indicates that once an individual leaves an area he is not expected to return.

44 citations

Proceedings ArticleDOI
07 May 1995
TL;DR: This paper describes a method to show the context of nodes in the World-Wide Web with respect to landmark nodes, implemented in the Navigational View Builder, a tool for forming effective visualizations of hypermedia systems.
Abstract: This paper talks about a method to show the context of nodes in the World-Wide Web. The World-Wide Web presents a lot of information to the user. Consequently, it suffers from the famous lost-in-hyperspace problem. One way to solve the problem is to show the user where they are in the context of the overall information space. Since the overall information space is large, we need to show the node's context with respect to only the important nodes. In this paper we discuss our method of showing the context and show some examples of our implementation. KEYWORDS: Hypermedia, Visualization, Structural analysis, World-Wide Web.
INTRODUCTION: One of the major problems with current hypermedia systems is being lost in hyperspace. For example, in Mosaic [1], the popular interface to the World-Wide Web, the most widely used hypermedia system today, the process of jumping from one location to another can easily confuse the user. One of the main reasons for this is that the user does not know the context of the node with respect to the overall information space. Similarly, when the user uses the Open URL command to jump to a particular node, some information about the node's context would be very useful. One common strategy to solve this problem is to use an overview diagram showing the overall graph structure. However, the problem with these is that for any large information space like the WWW, these diagrams are too confusing for the user. Therefore, instead of showing the whole space, we need to show how the node can be reached from important nodes (known as landmarks in the hypermedia literature). This is similar to the common geographical navigation strategy of finding where we are in the context of important landmarks. This paper discusses a useful but simple method of showing the context of nodes of the World-Wide Web with respect to landmark nodes. We have implemented our method in the Navigational View Builder [3], a tool for forming effective visualizations of hypermedia systems. Examples are shown of how our method found the context of some of the WWW pages about the research activities at the Graphics, Visualization & Usability Center (GVU) at Georgia Tech. Note that the node and link structure of the WWW were extracted by parsing the HTML documents using the strategy described in [4].
DISCOVERING LANDMARK NODES: Finding nodes that are good landmarks is not a trivial task. Valdez and Chignell [5] "anticipated that landmarks would tend to be connected to more objects than non-landmarks, in the same way that major hubs serve as landmarks in airline systems." While running some experiments they observed a high correlation between the recall of words in a hypertext and their second-order connectedness. The second-order connectedness is defined as the number of nodes that can be reached by a node when following at most two links. As observed in [2], since hypertexts are directed graphs, it is possible to extend the idea and postulate that nodes that have high back second-order connectedness are also good landmarks.
The back second-order connectedness of a node is the number of nodes that can reach the specified node in two steps. Similarly, the number of nodes that can be reached from the node by following only one link (the outdegree of the node) and the number of nodes that can reach the node following only one link (the indegree) should also be used in calculating the importance of the node. Thus, the importance of a node can be calculated as the weighted sum of the second-order connectedness (SOC), the back second-order connectedness (BSOC), the indegree (I) and the outdegree (O). After the importance of the nodes is calculated, the landmarks can be defined to be those nodes whose importance value is greater than a threshold. We used a threshold value of ten percent of the total number of nodes in the information space. Thus, the procedure for discovering landmarks can be summarized as follows.
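
A minimal sketch of the landmark-scoring procedure described above, assuming unit weights for SOC, BSOC, indegree and outdegree (the excerpt does not give the weights) and the stated ten-percent-of-nodes threshold:

```python
def landmark_nodes(out_links, weights=(1.0, 1.0, 1.0, 1.0)):
    """Score nodes by a weighted sum of second-order connectedness (SOC),
    back second-order connectedness (BSOC), indegree and outdegree, then
    keep those scoring above 10% of the number of nodes.
    `out_links` maps each node to the set of nodes it links to.
    The weights are illustrative; the excerpt does not specify them.
    """
    nodes = list(out_links)
    in_links = {v: set() for v in nodes}
    for v, targets in out_links.items():
        for t in targets:
            in_links.setdefault(t, set()).add(v)

    def within_two(start, links):
        # Nodes reachable in at most two hops, excluding the start node.
        one = set(links.get(start, ()))
        two = set()
        for u in one:
            two |= set(links.get(u, ()))
        return (one | two) - {start}

    w_soc, w_bsoc, w_in, w_out = weights
    scores = {}
    for v in nodes:
        soc = len(within_two(v, out_links))
        bsoc = len(within_two(v, in_links))
        scores[v] = (w_soc * soc + w_bsoc * bsoc
                     + w_in * len(in_links.get(v, ())) + w_out * len(out_links[v]))

    threshold = 0.1 * len(nodes)
    return [v for v, s in scores.items() if s > threshold]

# Hypothetical toy web used only to exercise the scoring.
web = {"a": {"b", "c"}, "b": {"c"}, "c": {"a", "d"}, "d": {"c"}, "e": {"a"}}
print(landmark_nodes(web))
```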

24 citations

Dissertation
01 Jan 1997
TL;DR: The primary goal of the research presented here is to put forth new techniques and models that can be used to help efficiently manage people's attentional processes when dealing with large, unstructured, heterogeneous information environments.
Abstract: One of the fastest growing sources of information today is the World Wide Web (WWW), having grown from only fifty sources of information in January of 1993 to over a half million four years later. The exponential growth of information within the Web has created an overabundance of information and a poverty of human attention, with users citing the inability to navigate and find relevant information on the Web as one of the biggest problems facing the Web today. The primary goal of the research presented here is to put forth new techniques and models that can be used to help efficiently manage people's attentional processes when dealing with large, unstructured, heterogeneous information environments. The primary model is based upon the desirability of items on the Web. This research searches for lawful patterns of structure, content, and use. Methods are developed to exploit these patterns to organize and optimize users' information foraging and sense-making activities. These enhancements rely on predicting, categorizing, and allocating attention. Several methods are explored for inducing categorial structures for the WWW. Some of these enhancements involve clustering in a high-dimensional space of content, use, and structural features. Others derive from cocitation analysis methods used in the study of scientific communities. A user would also be aided by retrieval mechanisms that predicted and returned the most likely needed WWW pages, given that the user is attending to some given page(s). The approach of this research uses a spreading activation mechanism to predict the needed, relevant information, computed using past usage patterns, degree of shared content, and WWW hyperlink structure.
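
As a rough sketch of the spreading-activation idea mentioned in the abstract, the code below pumps activation from the pages a user is attending to through a weighted graph; the weight matrix, decay factor, and number of steps are illustrative assumptions, and a real weighting would combine usage patterns, shared content, and hyperlink structure as the dissertation describes.

```python
import numpy as np

def spread_activation(weights, source_activation, alpha=0.8, steps=5):
    """Iteratively spread activation from source pages through a weighted graph.

    weights:            (n, n) matrix, weights[i, j] = strength of link i -> j
    source_activation:  (n,) initial activation (e.g. the pages the user is viewing)
    alpha:              decay applied at each propagation step
    """
    # Normalize outgoing weights so a page passes on at most its full activation.
    row_sums = weights.sum(axis=1, keepdims=True)
    transfer = np.divide(weights, row_sums,
                         out=np.zeros_like(weights), where=row_sums > 0)
    activation = source_activation.astype(float)
    for _ in range(steps):
        activation = source_activation + alpha * transfer.T @ activation
    return activation

W = np.array([[0, 1, 1, 0],
              [0, 0, 1, 0],
              [1, 0, 0, 1],
              [0, 0, 0, 0]], dtype=float)
start = np.array([1.0, 0.0, 0.0, 0.0])   # user is currently on page 0
print(spread_activation(W, start))        # higher scores = predicted next pages
```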

10 citations