Showing papers on "Document retrieval" published in 1978


Journal ArticleDOI
TL;DR: This paper reports experiments with a term weighting model incorporating relevance information in which it is assumed that index terms are distributed dependently and argues that if high recall searches are required, relevance feedback based on the modified dependence model may be superior to the widely used Boolean search.
Abstract: This paper reports experiments with a term weighting model incorporating relevance information in which it is assumed that index terms are distributed dependently. Initially this model was tested with complete relevance information against a similar model which assumes index terms are distributed independently. The experiments demonstrated conclusively that index terms are not independent for a number of diverse document collections. It was concluded that the use of relevance information together with dependence information could potentially improve retrieval effectiveness. As a result of further experiments the initial strict dependence model was modified and in particular a new relevance‐based term weight was developed. This modified dependence model was then used as the basis for relevance feedback, i.e. with partial relevance information only, and significant increases in retrieval effectiveness were achieved. The evaluation method used in the feedback experiments emphasized the effect of the feedback on documents which the potential user would not previously have seen. Finally the incorporation of relevance feedback in an operational system is considered and in particular it is argued that if high recall searches are required, relevance feedback based on the modified dependence model may be superior to the widely used Boolean search.

170 citations
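
The abstract above compares its dependence model with a model that assumes independent index terms, but it gives no formulas. As a rough illustration of the independence side only, the sketch below computes a relevance-based term weight as a smoothed log odds ratio, a form commonly used for binary independence models. It is not the paper's modified dependence weight; the function names and the 0.5 smoothing constant are assumptions made for this example.

```python
import math

def independence_term_weight(r, n, R, N):
    """Log-odds relevance weight for one term, assuming term independence.

    r: relevant documents that contain the term
    n: all documents that contain the term
    R: known relevant documents
    N: documents in the collection
    The 0.5 terms are a smoothing choice assumed for this sketch.
    """
    odds_in_relevant = (r + 0.5) / (R - r + 0.5)
    odds_in_nonrelevant = (n - r + 0.5) / (N - n - R + r + 0.5)
    return math.log(odds_in_relevant / odds_in_nonrelevant)

def score(doc_terms, weights):
    """Sum the weights of the distinct terms present in a document."""
    return sum(weights.get(term, 0.0) for term in set(doc_terms))
```

A term concentrated in the relevant documents receives a positive weight, while a term that is common in the collection but absent from the relevant documents receives a negative one.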


Journal ArticleDOI
Gertrud Herlach1
TL;DR: The paper tests and accepts the hypothesis that a mechanistically identifiable citation link characteristic, mention of a given reference more than once within the same research paper, indicates a close and useful relationship between the citing paper and the cited paper.
Abstract: The hypothesis is tested and accepted that the mechanistically identifiable citation link characteristic, mention of a given reference more than once within the same research paper, indicates a close and useful relationship of a citing to a given cited paper. Closeness and usefulness of the relationship between papers linked by citation were determined by means of users' judgments. It is shown that as a selection criterion for document retrieval, multiple mention of a reference would yield good precision but low recall, since a considerable number of papers with corresponding single mention were judged closely related to the given cited paper. Frequency counts showed that approximately one-third of all bibliographic references in the research papers checked are mentioned in the text more than once.

62 citations
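
The precision/recall trade-off described in the abstract above can be made concrete with a small sketch. It selects citing papers that mention a reference more than once and scores that selection against user judgments. The mention counts and judgments are invented purely for illustration and are not data from the paper; all names are hypothetical.

```python
def precision_recall(selected, relevant):
    """Return (precision, recall) of a selected set against a relevant set."""
    selected, relevant = set(selected), set(relevant)
    hits = len(selected & relevant)
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Invented example: in-text mentions of one cited paper in five citing papers.
mention_counts = {"A": 3, "B": 1, "C": 2, "D": 1, "E": 1}
# Invented example: citing papers that users judged closely related.
judged_close = {"A", "B", "C", "D"}

# Selection criterion: the reference is mentioned more than once.
multi_mention = [paper for paper, count in mention_counts.items() if count > 1]
precision, recall = precision_recall(multi_mention, judged_close)
```

In this toy data every selected paper is judged close, but half of the closely related papers mention the reference only once and are missed, which mirrors the pattern the abstract reports.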


Book ChapterDOI
Hans-Jörg Schek1
10 Oct 1978
TL;DR: Common restrictions such as the usage only of a certain set of descriptors or (complete) keywords in document retrieval systems or the specification of only certain attributed values for queries in formatted files should be removed without losing performance necessary for interactive usage.
Abstract: The motivation for the reference string indexing method may be derived from the intention to retrieve any piece of information by specifying arbitrary parts of it. Common restrictions such as the usage only of a certain set of descriptors or (complete) keywords in document retrieval systems or the specification of only certain (inverted) attributed values for queries in formatted files should be removed without losing performance necessary for interactive usage.

26 citations


Journal ArticleDOI
01 May 1978
TL;DR: In the first section of the paper the basic characteristics of WEIRD are described and the results of a preliminary evaluation are reported, and alternatives for further development of WEIRD are considered.
Abstract: WEIRD is an automatic document retrieval system designed and implemented at Syracuse University, which attempts to advance the art of computerized retrieval from word-matching to judging conceptual similarity. WEIRD uses a vector space model to represent the relations among terms and documents. Items in the space are located according to their “meaning”, which is their proximity to all other items in the data base as measured by co-occurrence frequencies. This is done without manipulating large matrices. The dimensions of the space are not used to define relations; items are defined solely by their position relative to the other items. Retrieval is determined by Euclidean distance from the plotted query. In the first section of the paper the basic characteristics of WEIRD are described. Second, the results of a preliminary evaluation are reported. Alternatives for further development of WEIRD are then considered.

26 citations
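
The final retrieval step described in the abstract above, ranking items by Euclidean distance from the plotted query, can be sketched as follows. The abstract does not say how WEIRD derives positions from co-occurrence frequencies, so the coordinates here are simply assumed as given, and the function name is hypothetical.

```python
import math

def rank_by_distance(query_point, item_points):
    """Return item names ordered from nearest to farthest from the query."""
    return sorted(
        item_points,
        key=lambda name: math.dist(query_point, item_points[name]),
    )

# Assumed positions; in WEIRD these would come from co-occurrence data.
items = {"d1": (0.0, 0.0), "d2": (3.0, 4.0), "d3": (1.0, 1.0)}
ranking = rank_by_distance((0.9, 1.2), items)  # ['d3', 'd1', 'd2']
```

Only relative distances matter in this sketch, which is consistent with the abstract's remark that items are defined by their position relative to other items rather than by the dimensions themselves.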


Journal ArticleDOI
TL;DR: The proposed solution is to provide the user with a simple, clearly designed subset of the language that nevertheless includes all important query functions, while the additions to modify, shorten, improve, and extend it are left to the experienced user.
Abstract: Query languages for document retrieval systems should be simple and easy to learn for the casual user; they should provide all conceivable facilities for the experienced user. These goals comprise the most serious contradictions that evolve between all the design criteria collected, compared, and evaluated in this paper. The proposed solution or, at least, relief to this conflict is to provide the user with a simple, clearly designed subset of the language that nevertheless includes all important query functions, while the additions to modify, shorten, improve, and extend it are left to the experienced user. It is stressed that the simple data formats available with most systems are insufficient; the need for more elaborate structures is substantiated. A point is made for a formal rather than a natural language for document retrieval.

26 citations


Proceedings Article
13 Sep 1978
TL;DR: Development efforts being carried out to produce backend systems for the efficient searching and retrieval of full text databases, and the characteristics of text retrieval, are presented.
Abstract: While there have been a number of projects involved with the design and construction of specialized processors to aid in the efficient operation of large structured database systems, such as RAP or CASSM, very little work has been done on comparable hardware for text information retrieval. This paper summarizes development efforts being carried out to produce backend systems for the efficient searching and retrieval of full text databases. The characteristics of text retrieval, and its special problems when compared to other database systems, are presented. Two representative applications are discussed, one the retrieval of relevant items from a database being updated online from messages originating from a large number of sources, and the second a legal reference system consisting of all court decisions. Processors to scan large amounts of data at speeds comparable to the transfer rate of the disks on which it is stored are presented, along with a network of simple processors to allow rapid merging of directory information for inverted file systems.

12 citations



Journal ArticleDOI
01 May 1978
TL;DR: Relevance feedback techniques as implemented in Salton's SMART DRS appear to show that it is worthwhile for users to read abstracts prior to evaluation of full texts, and three reasonable, easily understood retrieval procedures are presented.
Abstract: Many authors (1, 2, 3, 5, 6, 7) have suggested that overall performance of a document retrieval system is improved by relevance feedback. Relevance feedback denotes the last three steps in the following process: 1) the searcher enters a query, 2) the system prepares a ranked list of suggested documents, 3) the searcher judges some of the documents for relevancy, 4) the searcher informs the system of these documents judged and of the judgement, 5) the system constructs a new query based on the descriptors used in the original query and the descriptors used in the documents judged, 6) the system prepares a second ranked list of suggested documents. The presumption is that the second list is better than the first. By all performance measures (e.g. “fluid ranking” and “frozen ranking”), the second list is better than the first. However, if one reranks documents in the original list so as to reflect the searcher's efforts (step 3), the corresponding performance measures are comparable to those for the second list. The marginal difference between the performance measures for the “reranked original” list (searcher's efforts alone) and the second list (which includes computer efforts) makes it unclear if the cost of steps 4 through 6 above can be justified. It is hoped that advocates of relevance feedback will present “reranked original” performance measures as a basis for any performance improvement claims. This paper also presents three reasonable, easily understood retrieval procedures for which the frozen ranking, the fluid ranking, and the reranked original evaluations are “obviously” the pertinent way to evaluate. Relevance feedback techniques as implemented in Salton's SMART DRS appear to show that it is worthwhile for users to read abstracts prior to evaluation of full texts.
The last indication presented in this paper is that the relevance feedback performance improvements noted using SMART are due mostly to the user making assessments; subsequent computer efforts appear to be most likely to result in no further change. For a query for which there is a subsequent change, the change is as likely to be harmful as helpful.

8 citations
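
The "reranked original" baseline in the abstract above can be illustrated with a short sketch. It takes one plausible reading of the idea: documents the searcher judged relevant move to the top, documents judged not relevant move to the bottom, and unjudged documents keep their original order. The abstract does not spell out the exact procedure, so this reading, the function names, and the sample data are all assumptions.

```python
def rerank_original(ranked, judgments):
    """Reorder the first ranked list using only the searcher's judgments.

    ranked: document ids in the system's original order
    judgments: doc id -> True (judged relevant) or False (judged not relevant)
    """
    relevant = [d for d in ranked if judgments.get(d) is True]
    unjudged = [d for d in ranked if d not in judgments]
    rejected = [d for d in ranked if judgments.get(d) is False]
    return relevant + unjudged + rejected

def precision_at(k, ranked, relevant_set):
    """Fraction of the top k documents that are relevant."""
    return sum(d in relevant_set for d in ranked[:k]) / k

# Invented example: the searcher judged the first three documents.
original = ["d1", "d2", "d3", "d4", "d5"]
judgments = {"d1": False, "d2": True, "d3": False}
reranked = rerank_original(original, judgments)  # ['d2', 'd4', 'd5', 'd1', 'd3']
```

Comparing a feedback run against `reranked` rather than against `original` separates the gain from the searcher's own judgments from any gain contributed by the reformulated query, which is the comparison the abstract asks for.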


Journal ArticleDOI
01 May 1978
TL;DR: This paper illustrates the use of Augmented Transition Networks as a design tool for constructing document retrieval systems for those personalized applications which are too small or specialized to attract a commercial vendor.
Abstract: This paper illustrates the use of Augmented Transition Networks (ATNs) as a design tool for constructing document retrieval systems for those personalized applications which are too small or specialized to attract a commercial vendor. ATNs, which are explained in the context of this application, are used not only to improve the human/computer interface with the retrieval system but also to conceptually organize its structure.

6 citations


Book
01 Jan 1978

3 citations





Proceedings ArticleDOI
01 Jan 1978
TL;DR: It is proposed that a particular infinite-valued propositional calculus be used to define a retrieval function for a document retrieval system in which queries are well-formed formulas of the propositional calculus.
Abstract: It is proposed that a particular infinite-valued propositional calculus be used to define a retrieval function for a document retrieval system in which queries are well-formed formulas of the propositional calculus. With this approach, “weighted” queries may be processed in a simple, straightforward manner; queries may be represented as highly sparse vectors, and this representation may be transformed easily into the more conventional truth-table representation, and conversely; feedback techniques may be used; and a means is provided for circumventing the “independence assumption” with regard to subject identifiers.
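
The abstract above does not name the specific calculus, so the following sketch only illustrates the general idea of evaluating a formula-shaped query against graded truth values. It assumes truth values in [0, 1] with negation as 1 - x, conjunction as minimum, and disjunction as maximum, which is one common choice of many-valued connectives and may differ from the paper's. The tuple query format and the function name are hypothetical.

```python
def evaluate(query, doc):
    """Evaluate a query formula against a document's graded term values.

    query: a term (str) or a tuple such as ("and", q1, q2), ("or", q1, q2),
           or ("not", q1)
    doc:   term -> truth value in [0, 1]; missing terms count as 0.0
    """
    if isinstance(query, str):
        return doc.get(query, 0.0)
    op, *args = query
    values = [evaluate(arg, doc) for arg in args]
    if op == "not":
        return 1.0 - values[0]
    if op == "and":
        return min(values)
    if op == "or":
        return max(values)
    raise ValueError(f"unknown operator: {op}")

# Assumed graded index values for one document.
doc = {"retrieval": 0.5, "boolean": 0.25}
score_and = evaluate(("and", "retrieval", ("not", "boolean")), doc)  # 0.5
```

Ranking documents by the returned value gives a retrieval function whose output is graded rather than a yes/no match.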

Journal ArticleDOI
01 May 1978
TL;DR: This paper studies the problem of record address allocation in disk-like devices so as to facilitate the fast retrieval of a set of records which are jointly accessed by a query.
Abstract: Query retrieval based on secondary keys is an important operation in retrieval systems. Such a query generally retrieves more than one data record which satisfies the query criterion. This paper studies the problem of record address allocation in disk-like devices so as to facilitate the fast retrieval of a set of records which are jointly accessed by a query. A heuristic scheme, using the proposed minimal access retrieval property, is designed to assign records to blocks. Some experimental results are also presented.