A review of the use of inverted files for best match searching in information retrieval systems

Open Access, Book

- pp 124-131

TL;DR

In this article, the use of inverted files for the calculation of similarity coefficients and other types of matching function is discussed in the context of mechanised document retrieval systems and a critical evaluation is presented of a range of algorithms which have been described for the matching of documents with queries.

Abstract:

The use of inverted files for the calculation of similarity coefficients and other types of matching function is discussed in the context of mechanised document retrieval systems. A critical evaluation is presented of a range of algorithms which have been described for the matching of documents with queries. Particular attention is paid to the computational efficiency of the various procedures, and improved search heuristics are given in some cases. It is suggested that the algorithms could be implemented sufficiently efficiently to permit the provision of nearest neighbour searching as a standard retrieval option.
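As a rough illustration of the idea in the abstract, the sketch below scores documents against a query by walking only the inverted-file postings of the query terms. It is a minimal example, not one of the algorithms evaluated in the paper: the Dice coefficient, the toy collection, and the names `build_inverted_file` and `best_match` are all assumptions chosen for illustration.

```python
from collections import defaultdict


def build_inverted_file(docs):
    """Map each term to the ids of the documents that contain it."""
    inverted = defaultdict(list)
    for doc_id, terms in docs.items():
        for term in set(terms):
            inverted[term].append(doc_id)
    return inverted


def best_match(query, docs, inverted, k=3):
    """Rank documents by Dice coefficient (an assumed choice of measure)."""
    query_terms = set(query)
    # Count terms shared with the query; only documents that appear in
    # at least one relevant postings list are ever touched.
    common = defaultdict(int)
    for term in query_terms:
        for doc_id in inverted.get(term, []):
            common[doc_id] += 1
    scores = {
        doc_id: 2 * c / (len(set(docs[doc_id])) + len(query_terms))
        for doc_id, c in common.items()
    }
    # Highest score first; ties broken by document id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:k]


# Hypothetical toy collection: document id -> index terms.
docs = {
    1: ["inverted", "file", "search"],
    2: ["document", "clustering", "search"],
    3: ["nearest", "neighbour", "search", "inverted", "file"],
}
inverted = build_inverted_file(docs)
ranked = best_match(["inverted", "file", "retrieval"], docs, inverted)
```

Document 2 shares no term with the query, so it is never visited and receives no score. Avoiding work on such documents is the kind of saving an inverted file offers over comparing the query with every document in turn.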

Citations

Journal Article

Inverted files for text search engines

Justin Zobel, +1 more

- 25 Jul 2006 -

ACM Computing Surveys

TL;DR: This tutorial introduces the key techniques in the area of text indexing, describing both a core implementation and how the core can be enhanced through a range of extensions.

Journal Article

Recent trends in hierarchic document clustering: a critical review

Peter Willett

- 01 Aug 1988 -

Information Processing and Management

TL;DR: Algorithms that can be used to implement hierarchic agglomerative clustering methods for document retrieval are reviewed, and experimental evidence suggests that nearest neighbor clusters provide a reasonably efficient and effective means of including interdocument similarity information in document retrieval systems.

Journal Article

Filtered document retrieval with frequency-sorted indexes

Michael Persin, +2 more

- 01 Sep 1996 -

Journal of the Association for Informati...

TL;DR: An evaluation technique is proposed that reduces costs by recognising early which documents are likely to be highly ranked, and it is shown that frequency sorting can lead to a net reduction in index size, regardless of whether the index is compressed.

Journal Article

Retrieving Records from a Gigabyte of Text on a Minicomputer Using Statistical Ranking.

Donna Harman, +1 more

- 01 Dec 1990 -

Journal of the Association for Informati...

TL;DR: To show the feasibility of statistically based ranked retrieval of records using keywords, research was done to produce very fast search techniques using these ranking algorithms, and to test the results against large databases with many end users.

Journal Article

New techniques for best-match retrieval

Dennis Shasha, +1 more

- 01 Apr 1990 -

ACM Transactions on Information Systems

TL;DR: A scheme is described for answering best-match queries from a file containing a collection of objects, allowing the optimum use of any given set of precomputed intrafile distances.

Related Papers (5)

A review of the use of inverted files for best match searching in information retrieval systems

Introduction to Modern Information Retrieval

Optimization of inverted vector searches

A document retrieval system based on nearest neighbour searching

The nearest neighbour problem in information retrieval: an algorithm using upperbounds