scispace - formally typeset
Topic

Web page

About: Web page is a research topic. Over its lifetime, 50,353 publications have been published within this topic, receiving 975,168 citations. The topic is also known as: webpage and web.
Papers

Journal Article · DOI
01 Apr 1998
TL;DR: This paper provides an in-depth description of Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext and looks at the problem of how to effectively deal with uncontrolled hypertext collections where anyone can publish anything they want.
Abstract: In this paper, we present Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext. Google is designed to crawl and index the Web efficiently and produce much more satisfying search results than existing systems. The prototype with a full text and hyperlink database of at least 24 million pages is available at http://google.stanford.edu/. To engineer a search engine is a challenging task. Search engines index tens to hundreds of millions of web pages involving a comparable number of distinct terms. They answer tens of millions of queries every day. Despite the importance of large-scale search engines on the web, very little academic research has been done on them. Furthermore, due to rapid advance in technology and web proliferation, creating a web search engine today is very different from three years ago. This paper provides an in-depth description of our large-scale web search engine -- the first such detailed public description we know of to date. Apart from the problems of scaling traditional search techniques to data of this magnitude, there are new technical challenges involved with using the additional information present in hypertext to produce better search results. This paper addresses this question of how to build a practical large-scale system which can exploit the additional information present in hypertext. Also we look at the problem of how to effectively deal with uncontrolled hypertext collections where anyone can publish anything they want.

14,045 citations
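The abstract notes that search engines index tens to hundreds of millions of pages over a comparable number of distinct terms. A full-text index of that kind is typically built as an inverted index, which maps each term to the documents that contain it. The sketch below is a generic, in-memory illustration under simplifying assumptions (lowercase alphanumeric tokens, no ranking, no hyperlink data). It is not the indexing scheme described in the paper, and the function names are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters/digits (a simplifying assumption).
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    # Map each distinct term to the set of document IDs containing it.
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return dict(index)


def search_all_terms(index: dict[str, set[int]], query: str) -> set[int]:
    # Conjunctive (AND) query: intersect the posting sets of all query terms.
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    1: "Google crawls and indexes the Web",
    2: "PageRank rates Web pages",
    3: "Hypertext links connect pages",
}
index = build_inverted_index(docs)
print(sorted(search_all_terms(index, "web pages")))  # → [2]
```

Answering a query touches only the posting sets of its terms rather than scanning every document, which is what makes this structure practical at the scale the abstract describes.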


Proceedings Article
11 Nov 1999
TL;DR: This paper describes PageRank, a method for rating Web pages objectively and mechanically, effectively measuring the human interest and attention devoted to them, and shows how to efficiently compute PageRank for large numbers of pages.
Abstract: The importance of a Web page is an inherently subjective matter, which depends on the reader's interests, knowledge and attitudes. But there is still much that can be said objectively about the relative importance of Web pages. This paper describes PageRank, a method for rating Web pages objectively and mechanically, effectively measuring the human interest and attention devoted to them. We compare PageRank to an idealized random Web surfer. We show how to efficiently compute PageRank for large numbers of pages. And, we show how to apply PageRank to search and to user navigation.

13,512 citations
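The abstract relates PageRank to an idealized random Web surfer. One common way to compute such scores is power iteration: start from a uniform distribution, then repeatedly let every page pass its score along its out-links, while with some probability the surfer jumps to a random page instead. The sketch below assumes a damping factor of 0.85, spreads the score of pages without out-links uniformly over all pages, and stops on an L1 convergence threshold. These are illustrative choices, not details taken from the paper, and `pagerank` is a hypothetical helper.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 200) -> dict[str, float]:
    # Every page that appears as a link source or a link target is a vertex.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # uniform starting distribution

    for _ in range(max_iter):
        # Random jump: with probability (1 - damping) go to any page.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        # Dangling pages (no out-links) spread their score uniformly (assumption).
        dangling = sum(rank[p] for p in pages if not links.get(p))
        for p in pages:
            new_rank[p] += damping * dangling / n
        # Each page splits its score equally among the pages it links to.
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:  # stop once the distribution has settled
            break
    return rank


links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print(page, round(score, 4))  # C ranks highest, then A, then B
```

In this toy graph C is linked from both A and B, so it collects the most score; B receives only half of A's share and ends up lowest. Because each iteration redistributes exactly the total score, the ranks always sum to 1.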


Journal Article
01 Jan 1998 · Computer Networks
Abstract: In this paper, we present Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext. Google is designed to crawl and index the Web efficiently and produce much more satisfying search results than existing systems. The prototype with a full text and hyperlink database of at least 24 million pages is available at http://google.stanford.edu/. To engineer a search engine is a challenging task. Search engines index tens to hundreds of millions of web pages involving a comparable number of distinct terms. They answer tens of millions of queries every day. Despite the importance of large-scale search engines on the web, very little academic research has been done on them. Furthermore, due to rapid advance in technology and web proliferation, creating a web search engine today is very different from three years ago. This paper provides an in-depth description of our large-scale web search engine -- the first such detailed public description we know of to date. Apart from the problems of scaling traditional search techniques to data of this magnitude, there are new technical challenges involved with using the additional information present in hypertext to produce better search results. This paper addresses this question of how to build a practical large-scale system which can exploit the additional information present in hypertext. Also we look at the problem of how to effectively deal with uncontrolled hypertext collections where anyone can publish anything they want.

13,327 citations


Proceedings Article · DOI
24 Jul 1998
TL;DR: A PAC-style analysis is provided for a problem setting motivated by the task of learning to classify web pages, in which the description of each example can be partitioned into two distinct views, to allow inexpensive unlabeled data to augment a much smaller set of labeled examples.
Abstract: We consider the problem of using a large unlabeled sample to boost performance of a learning algorithm when only a small set of labeled examples is available. In particular, we consider a problem setting motivated by the task of learning to classify web pages, in which the description of each example can be partitioned into two distinct views. For example, the description of a web page can be partitioned into the words occurring on that page, and the words occurring in hyperlinks that point to that page. We assume that either view of the example would be sufficient for learning if we had enough labeled data, but our goal is to use both views together to allow inexpensive unlabeled data to augment a much smaller set of labeled examples. Specifically, the presence of two distinct views of each example suggests strategies in which two learning algorithms are trained separately on each view, and then each algorithm’s predictions on new unlabeled examples are used to enlarge the training set of the other. Our goal in this paper is to provide a PAC-style analysis for this setting, and, more broadly, a PAC-style framework for the general problem of learning from both labeled and unlabeled data. We also provide empirical results on real web-page data indicating that this use of unlabeled examples can lead to significant improvement of hypotheses in practice.
*This research was supported in part by the DARPA HPKB program under contract F30602-97-1-0215 and by NSF National Young Investigator grant CCR-9357793.
COLT 98, Madison, WI, USA. Copyright ACM 1998.
Tom Mitchell, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213-3891 (mitchell+@cs.cmu.edu)

5,359 citations
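The two-view strategy in this abstract (train one learner per view, then let each learner's confident predictions on unlabeled examples enlarge the training data) can be sketched as a short loop. This is a simplified illustration, not the procedure analyzed in the paper. It assumes a tiny multinomial naive Bayes classifier per view, uses the gap between the top two class log-scores as the confidence measure, adds one pseudo-labeled example per view per round to a shared labeled pool, and combines both views at prediction time by summing log-scores. The helper names and the toy "page words" / "link words" data are hypothetical.

```python
import math
from collections import Counter


class NaiveBayes:
    """Tiny multinomial naive Bayes over word lists, with add-one smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {w for d in docs for w in d}
        self.log_prior, self.counts, self.totals = {}, {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            self.counts[c] = Counter(w for d in class_docs for w in d)
            self.totals[c] = sum(self.counts[c].values())
        return self

    def log_scores(self, doc):
        # Unnormalized log-posterior per class; words unseen in training are skipped.
        v = len(self.vocab)
        return {c: self.log_prior[c] + sum(
                    math.log((self.counts[c][w] + 1) / (self.totals[c] + v))
                    for w in doc if w in self.vocab)
                for c in self.classes}

    def predict_with_margin(self, doc):
        # Confidence = gap between the best and second-best class log-scores.
        scores = self.log_scores(doc)
        ranked = sorted(scores, key=scores.get, reverse=True)
        if len(ranked) == 1:
            return ranked[0], math.inf
        return ranked[0], scores[ranked[0]] - scores[ranked[1]]


def fit_view(labeled, view):
    # Train a classifier on one view (0 = page words, 1 = link words).
    return NaiveBayes().fit([x[view] for x, _ in labeled], [y for _, y in labeled])


def co_train(labeled, unlabeled, rounds=5):
    """labeled: [((view0, view1), label)]; unlabeled: [(view0, view1)]."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        for view in (0, 1):
            if not pool:
                break
            clf = fit_view(labeled, view)
            # This view's classifier pseudo-labels its most confident pool example...
            best = max(range(len(pool)),
                       key=lambda i: clf.predict_with_margin(pool[i][view])[1])
            label, _ = clf.predict_with_margin(pool[best][view])
            # ...which joins the shared pool, so the other view also trains on it.
            labeled.append((pool.pop(best), label))
    return fit_view(labeled, 0), fit_view(labeled, 1), labeled


def predict(h0, h1, example):
    # Combine the views by summing their log-scores (a simple rule assumed here).
    s0, s1 = h0.log_scores(example[0]), h1.log_scores(example[1])
    return max(s0, key=lambda c: s0[c] + s1[c])


labeled = [
    ((["course", "syllabus", "homework"], ["course", "cs101"]), 1),            # course page
    ((["faculty", "research", "publications"], ["professor", "advisor"]), 0),  # faculty page
]
unlabeled = [
    (["homework", "syllabus", "lecture"], ["course", "notes"]),
    (["research", "publications", "lab"], ["professor", "homepage"]),
    (["course", "homework", "exam"], ["cs101"]),
    (["faculty", "research", "grants"], ["advisor"]),
]
h0, h1, grown = co_train(labeled, unlabeled, rounds=3)
print([y for _, y in grown])  # → [1, 0, 1, 1, 0, 0]
print(predict(h0, h1, (["syllabus", "homework"], ["course"])))  # → 1
```

On this toy data all four unlabeled pages receive the intended labels. Sharing a single labeled pool is one way to let each view's predictions reach the other learner; keeping separate per-view pools is an equally reasonable variant.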


Journal Article · DOI
09 Sep 1999 · Nature
TL;DR: The World-Wide Web forms a large directed graph whose vertices are documents and whose edges are links pointing from one document to another; the topology of this graph determines the web's connectivity and consequently how effectively information can be located on it.
Abstract: Despite its increasing role in communication, the World-Wide Web remains uncontrolled: any individual or institution can create a website with any number of documents and links. This unregulated growth leads to a huge and complex web, which becomes a large directed graph whose vertices are documents and whose edges are links (URLs) that point from one document to another. The topology of this graph determines the web's connectivity and consequently how effectively we can locate information on it. But its enormous size (estimated to be at least 8×10⁸ documents [1]) and the continual changing of documents and links make it impossible to catalogue all the vertices and edges.

3,988 citations
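The directed-graph model in this abstract can be made concrete with an adjacency list: each document is a vertex and each link is a directed edge. The sketch below, on a made-up five-page site (all file names hypothetical), counts in- and out-degrees and uses breadth-first search along link direction to find the minimum number of clicks needed to reach each page from a starting page. It only illustrates the graph model; it does not reproduce the paper's measurements.

```python
from collections import deque


def degrees(graph: dict[str, list[str]]) -> tuple[dict[str, int], dict[str, int]]:
    # Vertices are all documents that appear as a link source or target.
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    out_deg = {n: len(graph.get(n, [])) for n in nodes}
    in_deg = {n: 0 for n in nodes}
    for targets in graph.values():
        for v in targets:
            in_deg[v] += 1
    return in_deg, out_deg


def clicks_from(graph: dict[str, list[str]], start: str) -> dict[str, int]:
    # BFS following link direction: fewest clicks from `start` to each reachable page.
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


web = {
    "index.html": ["about.html", "papers.html"],
    "papers.html": ["pagerank.html"],
    "pagerank.html": ["index.html"],
    "about.html": [],
    "orphan.html": ["index.html"],
}
in_deg, out_deg = degrees(web)
print(clicks_from(web, "index.html"))
# → {'index.html': 0, 'about.html': 1, 'papers.html': 1, 'pagerank.html': 2}
```

`orphan.html` links to the home page, but nothing links to it, so it never appears in the BFS result: following links from `index.html` cannot locate it. This is the sense in which the graph's topology limits how effectively information can be found.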


Network Information
Related Topics (5)
Web intelligence: 9.4K papers, 227.4K citations (93% related)
Data Web: 14.9K papers, 339.2K citations (93% related)
Web navigation: 14.9K papers, 389.6K citations (93% related)
Web development: 16.2K papers, 353.9K citations (93% related)
Web standards: 15.7K papers, 393.4K citations (92% related)
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2022    9
2021    485
2020    895
2019    1,221
2018    1,440
2017    1,643

Top Attributes


Topic's top 5 most impactful authors

Katsumi Tanaka: 105 papers, 1.2K citations
Michael L. Nelson: 73 papers, 661 citations
Wei-Ying Ma: 56 papers, 4.2K citations
Zheng Chen: 41 papers, 1.6K citations
Simon Harper: 40 papers, 741 citations