Open Access Proceedings Article

Effective retrieval of structured documents

Ross Wilkinson
- pp 311-317
TLDR
This work considers what information is needed to retrieve effectively and shows that knowledge of the structure of documents can lead to improved retrieval performance.
Abstract
Information systems usually retrieve whole documents as answers to queries. However, it may in some circumstances be more appropriate to retrieve parts of documents. We consider formulas for retrieving whole documents and parts of documents from a large structured document collection. We consider what information is needed to retrieve effectively and show that knowledge of the structure of documents can lead to improved retrieval performance.

Citations
Proceedings Article

Simple BM25 extension to multiple weighted fields

TL;DR: This paper describes a simple way of adapting the BM25 ranking formula to deal with structured documents and proposes a much more intuitive alternative which weights term frequencies before the non-linear term frequency saturation function is applied.
Proceedings Article

Passage-level evidence in document retrieval

TL;DR: The increasing length of documents in full-text collections encourages renewed interest in the ranking and retrieval of document passages, but raises questions about how passages are defined, how they can be ranked efficiently, and what their proper role is in long, structured documents.
Patent

Method and apparatus for generating query responses in a computer-based document retrieval system

TL;DR: In this article, a method and apparatus for generating responses to queries to a document retrieval system is presented, which responds to a specific request for information by locating and ranking portions of text that may contain the information sought.
Posted Content

Pretrained Transformers for Text Ranking: BERT and Beyond

TL;DR: This tutorial provides an overview of text ranking with neural network architectures known as transformers, of which BERT (Bidirectional Encoder Representations from Transformers) is the best-known example, and covers a wide range of techniques.
Proceedings Article

Passage retrieval revisited

TL;DR: This paper compares the authors' scheme of arbitrary passage retrieval with several other document retrieval and passage retrieval methods and shows experimentally that, compared to these methods, ranking via fixed-length passages is robust and effective.
References
Book

Automatic text processing

Gerard Salton
Proceedings Article

Approaches to passage retrieval in full text information systems

TL;DR: New approaches are described in this study for implementing selective passage retrieval systems, and identifying text passages responsive to particular user needs.
Proceedings Article

Subtopic structuring for full-length document access

TL;DR: It is argued that the advent of large volumes of full-length text, as opposed to short texts like abstracts and newswire, should be accompanied by corresponding new approaches to information access, including a partition of the text into coherent multi-paragraph units that represent the pattern of subtopics that comprise the text.
Journal Article

Overview of the second text retrieval conference (TREC-2)

TL;DR: The second Text Retrieval Conference (TREC-2) was held in August 1993 and was attended by about 150 people from 31 participating groups. A wide variety of retrieval techniques was reported, including methods using automatic thesauri, sophisticated term weighting, natural language techniques, relevance feedback, and advanced pattern matching.
Proceedings Article

The use of cluster hierarchies in hypertext information retrieval

TL;DR: A hierarchical structure is described that effectively supports the graphical traversal of a document collection in a hypertext system, and an overview of an interactive browser based on cluster hierarchies is provided.