Journal Article
The Harvest information discovery and access system
TL;DR: Harvest is a system that provides a scalable, customizable architecture for gathering, indexing, caching, replicating, and accessing Internet information.
Abstract: It is increasingly difficult to make effective use of Internet information, given the rapid growth in data volume, user base, and data diversity. In this paper we introduce Harvest, a system that provides a scalable, customizable architecture for gathering, indexing, caching, replicating, and accessing Internet information.
Citations
Journal Article
Summary cache: a scalable wide-area web cache sharing protocol
TL;DR: This paper demonstrates the benefits of cache sharing, measures the overhead of the existing protocols, and proposes a new protocol called "summary cache", which reduces the number of intercache protocol messages, reduces the bandwidth consumption, and eliminates 30% to 95% of the protocol CPU overhead, all while maintaining almost the same cache hit ratios as ICP.
Proceedings Article
Web mining: information and pattern discovery on the World Wide Web
TL;DR: This paper defines Web mining, presents an overview of the research issues, techniques, and development efforts, briefly describes WEBMINER, a system for Web usage mining, and concludes by listing research issues.
Report
A hierarchical internet object cache
TL;DR: The design and performance of a hierarchical proxy-cache designed to make Internet information systems scale better are discussed, and performance measurements indicate that hierarchy does not measurably increase access latency.
Patent
Centrifugal communication and collaboration method
TL;DR: A system and method for communicating information among members of a distributed discussion group who use peripheral communication devices. The peripheral devices communicate with a central agent, where messages are retained in memory so that discussions are maintained.
Journal Article
Database techniques for the World-Wide Web: a survey
TL;DR: The primary goal of this survey is to classify the different tasks to which database concepts have been applied, and to emphasize the technical innovations that were required to do so.
References
Proceedings Article
GLIMPSE: a tool to search through entire file systems
Udi Manber, Sun Wu +1 more
TL;DR: Glimpse is designed particularly for personal information, such as one's own file system, which should support many types of queries, flexible interaction, low overhead, and customization; all of these are important features of Glimpse.
Report
Harvest: A Scalable, Customizable Discovery and Access System
TL;DR: This paper introduces Harvest, a system that provides a set of customizable tools for gathering information from diverse repositories, building topic-specific content indexes, flexibly searching the indexes, widely replicating them, and caching objects as they are retrieved across the Internet.
Journal Article
Scalable Internet resource discovery: research problems and approaches
TL;DR: The authors indicate trends in three dimensions (data volume, user base, and data diversity), survey the problems these trends will create for current approaches, and suggest several promising directions for future resource discovery research, along with some initial results from projects carried out by members of the Internet Research Task Force Research Group on Resource Discovery and Directory Service.
Proceedings Article
A case for caching file objects inside internetworks
TL;DR: Evidence is presented that several judiciously placed file caches could reduce the volume of FTP traffic by 42%, and hence the volume of all NSFNET backbone traffic by 21%; if FTP client and server software automatically compressed data, this savings could increase to 27%.
Journal Article
Customized information extraction as a basis for resource discovery
Darren Hardy, Michael F. Schwartz +1 more
TL;DR: This work presents a model for type-specific, user-customizable information extraction, and a system implementation called Essence, which can extract information from most of the types of files found in common file systems, including files with nested structure.