HyPursuit: a hierarchical network search engine that exploits content-link hypertext clustering

doi:10.1145/234828.234846

Proceedings Article

Ron Weiss, +2 more

- pp 180-193

TLDR

Experience with HyPursuit suggests that abstraction functions based on hypertext clustering can be used to construct meaningful and scalable cluster hierarchies. Preliminary results on clustering based on both document contents and hyperlink structures are also encouraging.

Abstract:

HyPursuit is a new hierarchical network search engine that clusters hypertext documents to structure a given information space for browsing and search activities. Our content-link clustering algorithm is based on the semantic information embedded in hyperlink structures and document contents. HyPursuit admits multiple, coexisting cluster hierarchies based on different principles for grouping documents, such as the Library of Congress catalog scheme and automatically created hypertext clusters. HyPursuit’s abstraction functions summarize cluster contents to support scalable query processing. The abstraction functions satisfy system resource limitations with controlled information loss. The result of query processing operations on a cluster summary approximates the result of performing the operations on the entire information space. We constructed a prototype system comprising 100 leaf World Wide Web sites and a hierarchy of 42 servers that route queries to the leaf sites. Experience with our system suggests that abstraction functions based on hypertext clustering can be used to construct meaningful and scalable cluster hierarchies. We are also encouraged by preliminary results on clustering based on both document contents and hyperlink structures.
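
The abstract does not give the similarity function behind content-link clustering, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes each document has a sparse term-weight vector and a set of outgoing links, and blends a cosine term similarity with a simple link similarity. The function names, the Jaccard-style link measure, and the `alpha` blending weight are all hypothetical choices made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity of two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def link_similarity(a, b, out_links):
    """Hypothetical link measure: 1.0 if either document links to the
    other, otherwise the Jaccard overlap of their out-link sets."""
    links_a = out_links.get(a, set())
    links_b = out_links.get(b, set())
    if b in links_a or a in links_b:
        return 1.0
    union = links_a | links_b
    return len(links_a & links_b) / len(union) if union else 0.0


def content_link_similarity(a, b, terms, out_links, alpha=0.5):
    """Blend content and link similarity. The weighted average and the
    default alpha are assumptions, not taken from the paper."""
    content = cosine(terms[a], terms[b])
    links = link_similarity(a, b, out_links)
    return alpha * content + (1.0 - alpha) * links
```

A pairwise score of this kind could then feed any standard clustering procedure. Which procedure HyPursuit actually uses is not stated in the abstract.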
