Showing papers on "Document retrieval" published in 1979

Open Access

Journal Article•DOI•

Using probabilistic models of document retrieval without relevance information

W. B. Croft¹, David J. Harper²•Institutions (2)

University of Massachusetts Amherst¹, University of Cambridge²

01 Apr 1979-Journal of Documentation

TL;DR: In this paper, the authors consider the situation where no relevance information is available, that is, at the start of the search, and propose strategies based on a probabilistic model for the initial search and an intermediate search.

Abstract: Most probabilistic retrieval models incorporate information about the occurrence of index terms in relevant and non‐relevant documents. In this paper we consider the situation where no relevance information is available, that is, at the start of the search. Based on a probabilistic model, strategies are proposed for the initial search and an intermediate search. Retrieval experiments with the Cranfield collection of 1,400 documents show that this initial search strategy is better than conventional search strategies both in terms of retrieval effectiveness and in terms of the number of queries that retrieve relevant documents. The intermediate search is shown to be a useful substitute for a relevance feedback search. Experiments with queries that do not retrieve relevant documents at high rank positions indicate that a cluster search would be an effective alternative strategy.
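As a rough illustration of ranking without relevance information, the sketch below scores each document by summing an inverse-document-frequency style weight, log((N - n) / n), over the query terms it contains, where N is the collection size and n is the number of documents containing the term. This weight form is commonly associated with this line of work but is not stated in the abstract above, so it, the function names, and the toy collection are all assumptions rather than the authors' exact procedure.

```python
import math

def idf_like_weight(num_docs, doc_freq):
    """Weight for a query term that occurs in doc_freq of num_docs documents."""
    return math.log((num_docs - doc_freq) / doc_freq)

def rank(query_terms, docs):
    """Rank document ids by the summed weights of the query terms they contain."""
    num_docs = len(docs)
    query = set(query_terms)
    # Document frequency of each query term across the collection.
    doc_freq = {t: sum(1 for terms in docs.values() if t in terms) for t in query}
    scores = {}
    for doc_id, terms in docs.items():
        scores[doc_id] = sum(
            idf_like_weight(num_docs, doc_freq[t])
            for t in query
            if t in terms and doc_freq[t] < num_docs  # skip terms found in every document
        )
    # Highest score first; ties are broken by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical toy collection: document id -> set of index terms.
TOY_DOCS = {
    "d1": {"wing", "flow"},
    "d2": {"flow"},
    "d3": {"wing"},
    "d4": {"heat"},
    "d5": {"boundary"},
}

if __name__ == "__main__":
    print(rank(["wing", "flow"], TOY_DOCS))
```

Because the weight depends only on collection statistics, it can be computed before any relevance judgements exist, which is the situation the abstract describes.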

399 citations

Journal Article•DOI•

A mathematical model of a weighted boolean retrieval system

W.G. Waller¹, Donald H. Kraft¹•Institutions (1)

Louisiana State University¹

01 Jan 1979-Information Processing and Management

TL;DR: The use of weights in query representation and document indexing is analysed as a generalization of a Boolean retrieval system, and criteria, including self-consistency, are given for the functions used to evaluate the relevance of records to a specific query.

Abstract: The use of weights to denote a query representation and/or the indexing of a document is analysed as a generalization of a Boolean retrieval system. Criteria are given for the functions used to evaluate the relevance of the records to a specific query, including self-consistency. Various mechanisms suggested in the literature for evaluating the relevance of records with regard to a given query are tested and found to be less than satisfactory. A new approach is suggested to avoid some of the perils of a weighted Boolean retrieval system.

161 citations

Journal Article•DOI•

Fuzzy set theoretical approach to document retrieval

Tadeusz Radecki

01 Jan 1979-Information Processing and Management

TL;DR: A new method of document retrieval based on the fundamental operations of fuzzy set theory is presented: basic notions are introduced, the syntax and semantics of the proposed retrieval language are given, and an algorithm allocating documents to particular queries is described and its properties discussed.

Abstract: The aim of a document retrieval system is to issue documents which contain the information needed by a given user of an information system. The process of retrieving documents in response to a given query is carried out by means of the search patterns of these documents and the query. It is thus clear that the quality of this process, i.e. the pertinence of the information system response to the information need of a given user, depends on the degree of accuracy in which document and query contents are represented by their search patterns. It seems obvious that the weighting of descriptors entering document search patterns improves the quality of the document retrieval process. A mathematical apparatus which takes into consideration, in a natural manner, the fact that the grades of importance of the descriptors in document search patterns are of the continuum type, that is, an apparatus adequate to the description of a retrieval system of documents indexed by weighted descriptors, is, among known mathematical methods, the theory of fuzzy sets, formulated by L. A. Zadeh. It is the aim of this paper to present a new method of document retrieval based on the fundamental operations of the fuzzy set theory. We start by introducing basic notions, then the syntax and semantics of the proposed language for document retrieval will be given and an algorithm allocating documents to particular queries will be described and its properties discussed. The basic advantage of the use of the fuzzy set theory for document retrieval system description is that it takes into consideration, in a simple way, the differentiation of the importance of descriptors in document search patterns and the differentiation of the formal relevance grades of particular documents of an information system to a given query. Documents of the highest grades (in the given information system) of formal relevance to the given query may be retrieved by means of the application of simple operations of the fuzzy set theory.
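To make the idea of grading documents with fuzzy set operations concrete, here is a minimal sketch that uses the standard Zadeh operators (minimum for AND, maximum for OR, one minus the grade for NOT) over weighted descriptors. The nested-tuple query encoding, the function name `grade`, and the sample weights are assumptions made for illustration; the abstract does not specify the paper's query language or allocation algorithm, and this is not a reconstruction of either.

```python
def grade(query, doc_weights):
    """Return the degree (0.0 to 1.0) to which a document matches a query.

    query is a nested tuple: ("term", name), ("not", q),
    ("and", q1, q2, ...) or ("or", q1, q2, ...).
    doc_weights maps each descriptor to its membership grade in the document.
    """
    op = query[0]
    if op == "term":
        # A descriptor absent from the document has grade 0.
        return doc_weights.get(query[1], 0.0)
    if op == "not":
        return 1.0 - grade(query[1], doc_weights)
    if op in ("and", "or"):
        grades = [grade(sub, doc_weights) for sub in query[1:]]
        return min(grades) if op == "and" else max(grades)
    raise ValueError(f"unknown operator: {op!r}")

# Hypothetical document indexed by weighted descriptors.
DOC = {"fuzzy": 0.75, "retrieval": 0.5}
# fuzzy AND (retrieval OR boolean)
QUERY = ("and", ("term", "fuzzy"), ("or", ("term", "retrieval"), ("term", "boolean")))

if __name__ == "__main__":
    print(grade(QUERY, DOC))
```

Sorting documents by this grade yields a ranked output, in contrast to the unordered result set of a plain Boolean search.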

154 citations

An Evaluation of Factors Affecting Document Ranking by Information Retrieval Systems.

Michael McGill

01 Oct 1979

88 citations

Journal Article•DOI•

Text Retrieval Computers

Hollaar¹•Institutions (1)

University of Illinois at Urbana–Champaign¹

01 Mar 1979-IEEE Computer

TL;DR: The hardware required for efficient text retrieval differs from that required for retrieval of formatted data; such hardware, particularly term comparators, is examined.

Abstract: The hardware required for efficient text retrieval differs from that required for retrieval of formatted data. Here is an examination of such hardware, particularly term comparators.

87 citations

Journal Article•DOI•

Document retrieval experiments using indexing vocabularies of varying size. ii. hashing, truncation, digram and trigram encoding of index terms

Peter Willett¹•Institutions (1)

University of Sheffield¹

01 Apr 1979-Journal of Documentation

TL;DR: Experiments with the Cranfield test collection show that trigram encoding of words performs noticeably better than the use of digrams; however, use of the least frequent digram in each term produces more acceptable results.

Abstract: This paper describes the use of fixed‐length character strings for controlling the size of indexing vocabularies in reference retrieval systems. Experiments with the Cranfield test collection show that trigram encoding of words performs noticeably better than the use of digrams; however, use of the least frequent digram in each term produces more acceptable results. Hashing of terms gives a better performance than that obtained from a vocabulary of comparable size produced by right‐hand truncation. The application of small indexing vocabularies to the sequential searching of large document files is discussed.
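The encodings named in this abstract can be illustrated with a short sketch: splitting a term into fixed-length character strings (digrams or trigrams), right-hand truncation, and choosing the least frequent digram of a term. Overlapping n-grams, the helper names, and the three-word vocabulary are assumptions made for illustration; the abstract does not give the exact encoding rules used in the experiments, and hashing is omitted.

```python
from collections import Counter

def ngrams(term, n):
    """Return the overlapping character n-grams of a term, left to right."""
    return [term[i:i + n] for i in range(len(term) - n + 1)]

def truncate(term, length):
    """Right-hand truncation: keep only the first `length` characters."""
    return term[:length]

def least_frequent_digram(term, digram_counts):
    """Pick the digram of `term` that is rarest in the vocabulary.

    Ties go to the leftmost digram; terms shorter than two characters
    have no digrams and raise ValueError.
    """
    return min(ngrams(term, 2), key=lambda g: digram_counts[g])

# Hypothetical vocabulary used only to obtain digram frequencies.
VOCAB = ["wing", "wind", "ring"]
DIGRAM_COUNTS = Counter(g for t in VOCAB for g in ngrams(t, 2))

if __name__ == "__main__":
    print(ngrams("wing", 3))
    print(truncate("boundary", 4))
    print(least_frequent_digram("wind", DIGRAM_COUNTS))
```

Each of these maps an open-ended set of words onto a smaller, bounded set of index symbols, which is what allows the vocabulary size to be controlled.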

54 citations

Journal Article•DOI•

Document retrieval experiments using indexing vocabularies of varying size. i. variety generation symbols assigned to the fronts of index terms

John Burnett¹, David Cooper¹, Michael F. Lynch¹, Peter Willett¹, Maureen Wycherley¹•Institutions (1)

University of Sheffield¹

01 Mar 1979-Journal of Documentation

TL;DR: A study has been made of the effect of controlled variations in indexing vocabulary size on retrieval performance using the Cranfield 200 and 1400 test collections.

Abstract: A study has been made of the effect of controlled variations in indexing vocabulary size on retrieval performance using the Cranfield 200 and 1400 test collections. The vocabularies considered are sets of variable‐length character strings chosen from the fronts of document and query terms so as to occur with approximate equifrequency. Sets containing between 120 and 720 members were tested both using an application of the Cluster Hypothesis and in a series of linear associative retrieval experiments. The effectiveness of the smaller sets is low but the larger ones exhibit retrieval characteristics comparable to those of words.

25 citations

Journal Article•DOI•

Depth of indexing

M. E. Maron¹•Institutions (1)

University of California, Berkeley¹

01 Jul 1979-Journal of the Association for Information Science and Technology

TL;DR: It is shown that the issue of depth of indexing is, in fact, not a central issue in the design of effective document retrieval systems; the answer to the question of optimal depth is a logical consequence of answers to more fundamental questions about indexing and retrieval.

Abstract: For many years it has been believed that in order to design optimal document retrieval systems one must assign index terms to documents at their optimal depth; therefore, it was of primary importance to answer the following question: “What is the optimal depth of indexing?” This article offers an analysis and answer to this question. We show that the issue of depth of indexing is, in fact, not a central issue in the design of effective document retrieval systems. It turns out that the answer to the question about optimal depth is a logical consequence of answers (which this article provides) to more fundamental questions about indexing and retrieval.

16 citations

Journal Article•DOI•

Properties of a model of information retrieval system based on thesaurus with weights

Zygmunt Mazur¹•Institutions (1)

Wrocław University of Technology¹

01 Jan 1979-Information Processing and Management

TL;DR: A model of an information retrieval system based on a thesaurus with weights is described, and inclusiveness and two other fundamental properties of the system are given.

Abstract: This paper describes a model of information retrieval system based on thesaurus with weights. Definitions of the following terms: thesaurus, document description, information query, similarity of queries and descriptions of documents, similarity measure and accuracy of response are given. Inclusiveness and two other fundamental properties of the considered system are given.

8 citations

Journal Article•DOI•

Inverted File Organization in the Information Retrieval System Based on Thesaurus with Weights.

Zygmunt Mazur¹•Institutions (1)

Wrocław University of Technology¹

01 Jan 1979-Information Processing and Management

TL;DR: Properties and operations on inverted files, which are used in a system based on a thesaurus with weights, are studied in this paper.

Abstract: The inverted file structure is often used to organize data in an information retrieval system. When the hierarchy relation on the set of descriptors and the weights of descriptors in document descriptions are taken into account, the conventional concept of the inverted file may be extended. Properties and operations on inverted files, which are used in a system based on a thesaurus with weights, are studied in this paper.

6 citations

Journal Article•DOI•

The use of normal multiplication tables for information storage and retrieval

Dalia Motzkin¹•Institutions (1)

University of Haifa¹

01 Mar 1979-Communications of The ACM

TL;DR: For a certain class of information systems, the normal multiplication table method yields far more rapid retrieval with a more economical space requirement than conventional systems, and incorporates an improved modification of the inverted file technique.

Abstract: This paper describes a method for the organization and retrieval of attribute based information systems, using the normal multiplication table as a directory for the information system. Algorithms for the organization and retrieval of information are described. This method is particularly suitable for queries requesting a group of information items, all of which possess a particular set of attributes (and possibly some other attributes as well). Several examples are given; the results with respect to the number of disk accesses and disk space are compared to other common approaches. Algorithms evaluating the appropriateness of the above approach to a given information system are described. For a certain class of information systems, the normal multiplication table method yields far more rapid retrieval with a more economical space requirement than conventional systems. Moreover this method incorporates an improved modification of the inverted file technique.

Journal Article•DOI•

The interface between computerized retrieval systems and micrographic retrieval systems

George McMurdo

01 Dec 1979-Journal of Information Science

TL;DR: The conclusion is that with a combination of advances in communications technology, and sophisticated indexing input from librarians and information scientists, the new generation of automated micrographic devices may constitute the on-line document retrieval systems of the future.

Abstract: This paper notes the benefits accruing from interaction between computerized retrieval systems and micrographic retrieval systems. It reviews the current state of automated micrographic retrieval technology. The conclusion is that with a combination of advances in communications technology, and sophisticated indexing input from librarians and information scientists, the new generation of automated micrographic devices may constitute the on-line document retrieval systems of the future.

Proceedings Article•DOI•

Interactive computing in a project-oriented file organization course

Alan L. Tharp¹•Institutions (1)

North Carolina State University¹

01 Jan 1979

TL;DR: The paper highlights the use and impact of interactive computing, the choice of a project implementation language, and the relationship of the course to an individual's transition from student to professional, and concludes with a comparison of project grade and the associated computer development time.

Abstract: In the decade since Curriculum '68 [1], the suggested structure of courses related to data management has evolved, as evidenced by the report of the ACM committee on curriculum in 1977 [2] and also noted by Dale [3]. A course in Curriculum '68 entitled “Information Organization and Retrieval” [IOR] does not appear in the 1977 report, while a new course in “File Processing” [FP] is included. Influenced by Curriculum '68, N.C. State in 1970 instituted a senior-level course entitled “Information Retrieval” to correspond essentially to the IOR course. Over the years that course in information retrieval has changed gradually, as material related to document retrieval has been supplanted by material related to file organization. Although the title has remained constant, the content is now more similar to FP than IOR. This paper describes the current project-oriented course in information retrieval which stresses the importance of query languages in an information retrieval system. In addition, the paper highlights the use and impact of interactive computing, the choice of a project implementation language, and the relationship of the course to an individual's transition from student to professional. The paper concludes with a comparison of project grade and the associated computer development time.

Journal Article•DOI•

Document representation models for retrieval systems

L. L. Miller¹•Institutions (1)

Southern Methodist University¹

01 Sep 1979

TL;DR: Document retrieval system models are presented and measures to rank the closeness of documents to a query are given.

Abstract: Document retrieval system models are presented. Measures to rank the closeness of documents to a query are given. Algorithms to calculate the measures for graph and partition models are provided.

Journal Article•DOI•

On the implementation of some models of document retrieval

W. Bruce Croft¹•Institutions (1)

University of Massachusetts Amherst¹

01 Sep 1979

TL;DR: The main conclusion is that models which concentrate on improving the effectiveness of the search process are not rendered redundant by the availability of new hardware; however, the efficiency of their implementation would be improved.

Abstract: Recently several models of the search process in a document retrieval system have been proposed and retrieval experiments have shown that they will improve system performance. These include models which use relevance judgements to rank documents in order of probability of relevance and models of retrieval from clusters of documents. In this paper various models are compared in terms of the ease with which they could be implemented. An important consideration is how this implementation would be affected by the introduction of new hardware such as content-addressable memories. The main conclusion is that models which concentrate on improving the effectiveness of the search process are not rendered redundant by the availability of new hardware. However, the efficiency of their implementation would be improved.

Journal Article•DOI•

Modern on-line systems: challenges to research

Karen Sparck Jones¹•Institutions (1)

University of Cambridge¹

08 Jan 1979

TL;DR: It is argued that, although modern on-line systems have more than achieved the technological aims of the original workers in the field, there is still a need for research in automatic information retrieval.

Abstract: Automatic information retrieval, that is document retrieval, was an early concern in computing. It might, however, be thought that since modern on-line systems have more than achieved the technological aims of the original workers in the field, there is no further need for research. I shall argue that this is not the case.

Suggestions for a Uniform Representation of Query and Record Content in Data Base and Document Retrieval

Gerard Salton

01 Jan 1979

TL;DR: A standard approach is introduced for the representation of information content in data base and document retrieval environments and the use of composite concept vectors representing individual information items leads to a uniform system in different retrieval situations.

Abstract: A standard approach is introduced for the representation of information content in data base and document retrieval environments. The use of composite concept vectors representing individual information items leads to a uniform system in different retrieval situations for the identification of answers in response to incoming information requests.

Journal Article•DOI•

Document retrieval using associative processors

F.N. Teskey

17 Aug 1979-Information Processing Letters

Journal Article•DOI•

Use of dynamic discrimination values in a document retrieval system

Robert T. Dattola¹•Institutions (1)

Xerox¹

01 Sep 1979

TL;DR: It is shown that regular discrimination values are too costly to compute after every update to the data base and dynamic discrimination values that are easy to update are defined for use as approximations to regular values.

Abstract: The use of discrimination values as a term weighting function in document retrieval systems is examined. It is shown that regular discrimination values are too costly to compute after every update to the data base. Dynamic discrimination values that are easy to update are defined for use as approximations to regular values. Experiments are performed comparing regular vs. dynamic discrimination values. Actual user queries from an operational data base are used to evaluate dynamic discrimination values in a production environment. Generalized forms of normalized recall and precision are used as evaluation measures. Retrieval results indicate statistically significant improvements using dynamic discrimination weighting.
