scispace - formally typeset
Journal ArticleDOI

The anatomy of a large-scale hypertextual Web search engine

Sergey Brin, +1 more
- Vol. 30, pp 107-117
Reads0
Chats0
TLDR
This paper provides an in-depth description of Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext and looks at the problem of how to effectively deal with uncontrolled hypertext collections where anyone can publish anything they want.
Abstract
In this paper, we present Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext. Google is designed to crawl and index the Web efficiently and produce much more satisfying search results than existing systems. The prototype with a full text and hyperlink database of at least 24 million pages is available at http://google.stanford.edu/. To engineer a search engine is a challenging task. Search engines index tens to hundreds of millions of web pages involving a comparable number of distinct terms. They answer tens of millions of queries every day. Despite the importance of large-scale search engines on the web, very little academic research has been done on them. Furthermore, due to rapid advance in technology and web proliferation, creating a web search engine today is very different from three years ago. This paper provides an in-depth description of our large-scale web search engine -- the first such detailed public description we know of to date. Apart from the problems of scaling traditional search techniques to data of this magnitude, there are new technical challenges involved with using the additional information present in hypertext to produce better search results. This paper addresses this question of how to build a practical large-scale system which can exploit the additional information present in hypertext. Also we look at the problem of how to effectively deal with uncontrolled hypertext collections where anyone can publish anything they want.

read more

Content maybe subject to copyright    Report

Citations
More filters
Book

Data Mining: Concepts and Techniques

TL;DR: This book presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects, and provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.
Book

Data Mining: Practical Machine Learning Tools and Techniques

TL;DR: This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining.
Journal ArticleDOI

The Structure and Function of Complex Networks

Mark Newman
- 01 Jan 2003 - 
TL;DR: Developments in this field are reviewed, including such concepts as the small-world effect, degree distributions, clustering, network correlations, random graph models, models of network growth and preferential attachment, and dynamical processes taking place on networks.
Journal ArticleDOI

Community detection in graphs

TL;DR: A thorough exposition of community structure, or clustering, is attempted, from the definition of the main elements of the problem, to the presentation of most methods developed, with a special focus on techniques designed by statistical physicists.
Journal ArticleDOI

Community detection in graphs

TL;DR: A thorough exposition of the main elements of the clustering problem can be found in this paper, with a special focus on techniques designed by statistical physicists, from the discussion of crucial issues like the significance of clustering and how methods should be tested and compared against each other, to the description of applications to real networks.
References
More filters

Search Engines for the World Wide Web: A Comparative Study and Evaluation Methodology

TL;DR: The authors of this study found that Alta Vista outperformed Excite and Lycos in both search facilities and retrieval performance although Lycos had the largest coverage of Web resources among the three Web search engines examined.
Journal ArticleDOI

ParaSite: mining structural information on the Web

TL;DR: The varieties of link information (not just hyperlinks) on the Web, how the Web differs from conventional hypertext, and how the links can be exploited to build useful applications are discussed.
Journal ArticleDOI

Queries and Computation on the Web

TL;DR: Surprisingly, stratified and well-founded semantics for negation turn out to have basic shortcomings in this context, while inflationary semantics emerges as an appealing alternative.
Journal ArticleDOI

Lycos: design choices in an Internet search service

TL;DR: The history and precursors of the Lycos system for collecting, storing, and retrieving information about pages on the Web are outlined and some of the design choices made in building this Web indexer are discussed.