Showing papers by "Gordon V. Cormack" published in 1997


Journal Article
TL;DR: The generally accepted rule of “leftmost longest match” is an unfortunate choice and is at the root of the difficulties in searching structured text. The authors propose a semantically cleaner rule that is generally applicable to a variety of text search applications, including source code analysis.
Abstract: The use of regular expressions for text search is widely known and well understood. It is then surprising that the standard techniques and tools prove to be of limited use for searching structured text formatted with SGML or similar markup languages. Our experience with structured text search has caused us to reexamine the current practice. The generally accepted rule of “leftmost longest match” is an unfortunate choice and is at the root of the difficulties. We instead propose a rule which is semantically cleaner. This rule is generally applicable to a variety of text search applications, including source code analysis, and has interesting properties in its own right. We have written a publicly available search tool implementing the theory in the article, which has proved valuable in a variety of circumstances.

57 citations
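
The difficulty this abstract describes can be seen on a tiny piece of markup. The sketch below is illustrative only and is not the authors' search tool. The abstract does not state the proposed rule, so the code assumes one plausible reading: report only matching substrings that contain no other matching substring. The names `greedy_matches` and `shortest_matches`, the sample text, and the pattern are all hypothetical. Python's `re` module is a backtracking engine rather than a strict leftmost-longest matcher, but for this simple pattern its greedy behaviour gives the same result.

```python
import re

# Hypothetical sample: two separate elements on one line.
TEXT = "<b>one</b> and <b>two</b>"
PATTERN = r"<b>.*</b>"


def greedy_matches(pattern, text):
    """Matches as reported by a left-to-right greedy scan."""
    return [m.group(0) for m in re.finditer(pattern, text)]


def shortest_matches(pattern, text):
    """Matching substrings that contain no other matching substring.

    Brute force over all substrings: fine for a demonstration,
    far too slow for real text. Assumed rule, not taken from the source.
    """
    compiled = re.compile(pattern)
    spans = [
        (i, j)
        for i in range(len(text))
        for j in range(i + 1, len(text) + 1)
        if compiled.fullmatch(text, i, j)
    ]
    minimal = [
        (i, j)
        for (i, j) in spans
        if not any((a, b) != (i, j) and i <= a and b <= j for (a, b) in spans)
    ]
    return [text[i:j] for (i, j) in minimal]


if __name__ == "__main__":
    print(greedy_matches(PATTERN, TEXT))    # one match spanning both elements
    print(shortest_matches(PATTERN, TEXT))  # each element reported separately
```

The greedy scan swallows both elements and the text between them as a single match, which is rarely what a structured-text query intends. Under the assumed shortest-match rule each element is reported on its own.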


25 Jun 1997
TL;DR: This work investigates the application of a novel relevance ranking technique, cover density ranking, to the requirements of Web-based information retrieval, where a typical query consists of a few search terms and a typical result consists of a page indicating several potentially relevant documents.
Abstract: We investigate the application of a novel relevance ranking technique, cover density ranking, to the requirements of Web-based information retrieval, where a typical query consists of a few search terms and a typical result consists of a page indicating several potentially relevant documents. Traditional ranking methods for information retrieval, based on term and inverse document frequencies, have often been found to work poorly in this context. Under the cover density measure, ranking is based on term proximity and co-occurrence. Experimental comparisons show retrieval performance that compares favourably with previous work.

38 citations
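
A minimal sketch of proximity-based scoring in the spirit of this abstract follows. The abstract says only that ranking is based on term proximity and co-occurrence, so everything specific here is an assumption made for illustration: a "cover" is taken to be a shortest span of token positions containing every query term, each cover contributes `min(1, k / length)` to the score, and the constant `k` and the function names `find_covers` and `cover_density_score` are hypothetical. It should not be read as the paper's exact formula.

```python
def find_covers(tokens, query_terms):
    """Return (start, end) token spans that contain every query term
    and contain no shorter span that also does (assumed definition)."""
    wanted = set(query_terms)
    last_seen = {}          # most recent position of each query term
    covers = []
    previous_start = None
    for end, token in enumerate(tokens):
        if token not in wanted:
            continue
        last_seen[token] = end
        if len(last_seen) < len(wanted):
            continue        # some query term has not appeared yet
        start = min(last_seen.values())
        # A repeated start means this span contains the earlier, shorter one.
        if start != previous_start:
            covers.append((start, end))
            previous_start = start
    return covers


def cover_density_score(tokens, query_terms, k=16):
    """Sum a per-cover score that favours short covers.

    Covers of length <= k score 1; longer covers score k / length.
    Both the formula and the default k are illustrative assumptions.
    """
    return sum(
        min(1.0, k / (end - start + 1))
        for start, end in find_covers(tokens, query_terms)
    )


if __name__ == "__main__":
    near = "web search engines rank pages".split()
    far = "web pages are many and varied so ranking helps search".split()
    query = ["web", "search"]
    print(cover_density_score(near, query, k=4))
    print(cover_density_score(far, query, k=4))
```

With these assumptions, a document in which the query terms occur close together scores higher than one in which they are far apart, and a document missing any query term scores zero.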


Proceedings Article
01 Jan 1997
TL;DR: The MultiText project participated in the routing and adhoc tasks, and in the Chinese, high precision and very large collection tracks. The MultiText system retrieves passages, rather than entire documents.
Abstract: The MultiText project participated in the routing and adhoc tasks, and in the Chinese, high precision and very large collection tracks. The MultiText system retrieves passages, rather than entire documents.

29 citations