The Harvest information discovery and access system

doi:10.1016/0169-7552(95)00098-5

Journal Article

C. Mic Bowman, +4 more

- 01 Dec 1995 -

Computer Networks and ISDN Systems

- Vol. 28, Iss: 1, pp 119-125

TLDR

Harvest is a system that provides a scalable, customizable architecture for gathering, indexing, caching, replicating, and accessing Internet information.

Abstract:

It is increasingly difficult to make effective use of Internet information, given the rapid growth in data volume, user base, and data diversity. In this paper we introduce Harvest, a system that provides a scalable, customizable architecture for gathering, indexing, caching, replicating, and accessing Internet information.

Citations

Journal Article

Summary cache: a scalable wide-area web cache sharing protocol

Li Fan, +3 more

- 01 Jun 2000 -

IEEE/ACM Transactions on Networking

TL;DR: This paper demonstrates the benefits of cache sharing, measures the overhead of the existing protocols, and proposes a new protocol called "summary cache", which reduces the number of intercache protocol messages, reduces the bandwidth consumption, and eliminates 30% to 95% of the protocol CPU overhead, all while maintaining almost the same cache hit ratios as ICP.

Proceedings Article

Web mining: information and pattern discovery on the World Wide Web

Robert Cooley, +2 more

TL;DR: This paper defines Web mining, presents an overview of the various research issues, techniques, and development efforts, briefly describes WEBMINER, a system for Web usage mining, and concludes by listing research issues.

Report

A hierarchical internet object cache

Anawat Chankhunthod, +4 more

TL;DR: The design and performance of a hierarchical proxy-cache designed to make Internet information systems scale better are discussed, and performance measurements indicate that hierarchy does not measurably increase access latency.

Patent

Centrifugal communication and collaboration method

Theodore B. Achacoso, +1 more

TL;DR: A system and method for communicating information among members of a distributed discussion group having peripheral communication devices involves communication between the peripheral devices and a central agent, where messages are retained in memory, thereby causing discussions to be maintained.

Journal Article

Database techniques for the World-Wide Web: a survey

Daniela Florescu, +2 more

TL;DR: The primary goal of this survey is to classify the different tasks to which database concepts have been applied, and to emphasize the technical innovations that were required to do so.

References

Proceedings Article

GLIMPSE: a tool to search through entire file systems

Udi Manber, +1 more

TL;DR: Glimpse is designed particularly for personal information, such as one's own file system. Such systems should support many types of queries, flexible interaction, low overhead, and customization, all of which are important features of Glimpse.

Report

Harvest: A Scalable, Customizable Discovery and Access System

C. M. Bowman, +4 more

TL;DR: This paper introduces Harvest, a system that provides a set of customizable tools for gathering information from diverse repositories, building topic-specific content indexes, flexibly searching the indexes, widely replicating them, and caching objects as they are retrieved across the Internet.

Journal Article

Scalable Internet resource discovery: research problems and approaches

C. Mic Bowman, +3 more

- 01 Aug 1994 -

Communications of the ACM

TL;DR: The authors indicate trends in three dimensions, survey the problems these trends will create for current approaches, and suggest several promising directions of future resource discovery research, along with some initial results from projects carried out by members of the Internet Research Task Force Research Group on Resource Discovery and Directory Service.

Proceedings Article

A case for caching file objects inside internetworks

Peter B. Danzig, +2 more

TL;DR: Evidence is presented that several judiciously placed file caches could reduce the volume of FTP traffic by 42%, and hence the volume of all NSFNET backbone traffic by 21%; if FTP client and server software automatically compressed data, this savings could increase to 27%.

Journal Article

Customized information extraction as a basis for resource discovery

Darren Hardy, +1 more

- 01 May 1996 -

ACM Transactions on Computer Systems

TL;DR: This work presents a model for type-specific, user-customizable information extraction, and a system implementation called Essence, which can extract information from most of the types of files found in common file systems, including files with nested structure.

Related Papers (5)

The anatomy of a large-scale hypertextual Web search engine

A hierarchical internet object cache

Mediators in the architecture of future information systems

Introduction to Modern Information Retrieval

Accessibility of information on the web