Showing papers by "Gordon V. Cormack" published in 1997

Open Access

Journal Article

On the use of regular expressions for searching text

Charles L. A. Clarke¹, Gordon V. Cormack¹

University of Waterloo¹

01 May 1997 - ACM Transactions on Programming Languages and Systems

TL;DR: The generally accepted rule of “leftmost longest match” is an unfortunate choice that lies at the root of the difficulties with searching structured text. The authors propose a semantically cleaner rule that applies to a variety of text search applications, including source code analysis.

Abstract: The use of regular expressions for text search is widely known and well understood. It is then surprising that the standard techniques and tools prove to be of limited use for searching structured text formatted with SGML or similar markup languages. Our experience with structured text search has caused us to reexamine the current practice. The generally accepted rule of “leftmost longest match” is an unfortunate choice and is at the root of the difficulties. We instead propose a rule which is semantically cleaner. This rule is generally applicable to a variety of text search applications, including source code analysis, and has interesting properties in its own right. We have written a publicly available search tool implementing the theory in the article, which has proved valuable in a variety of circumstances.

57 citations

Relevance ranking for one to three term queries

Charles L. A. Clarke¹, Gordon V. Cormack², Elizabeth A. Tudhope²

University of Toronto¹, University of Waterloo²

25 Jun 1997

TL;DR: This work investigates the application of a novel relevance ranking technique, cover density ranking, to the requirements of Web-based information retrieval, where a typical query consists of a few search terms and a typical result consists of a page indicating several potentially relevant documents.

Abstract: We investigate the application of a novel relevance ranking technique, cover density ranking, to the requirements of Web-based information retrieval, where a typical query consists of a few search terms and a typical result consists of a page indicating several potentially relevant documents. Traditional ranking methods for information retrieval, based on term and inverse document frequencies, have often been found to work poorly in this context. Under the cover density measure, ranking is based on term proximity and co-occurrence. Experimental comparisons show retrieval performance that compares favourably with previous work.

38 citations

Proceedings Article

Passage-Based Refinement (MultiText Experiments for TREC-6).

Gordon V. Cormack, Charles L. A. Clarke, Christopher R. Palmer, Samuel S. L. To

01 Jan 1997

TL;DR: The MultiText project participated in the routing and adhoc tasks, and in the Chinese, high precision and very large collection tracks. The MultiText system retrieves passages, rather than entire documents.

Abstract: The MultiText project participated in the routing and adhoc tasks, and in the Chinese, high precision and very large collection tracks. The MultiText system retrieves passages, rather than entire documents.

29 citations