Showing papers on "Document retrieval" published in 1977

Open Access

Journal Article•DOI•

Local Feedback in Full-Text Retrieval Systems

R. Attar¹, Aviezri S. Fraenkel¹•Institutions (1)

01 Jul 1977-Journal of the ACM

TL;DR: Local clustering is practical also for large databases and appears to improve overall performance, especially if metrical constraints and weighting by proximity are embedded in the local feedback.

Abstract: In a full-text natural-language retrieval system, local feedback is the process of formulating a new improved search based on clustering terms from the documents returned in a previous search of any given query. Experiments were run on a database of US patents. It is concluded that in contrast to global clustering, where the size of matrices limits applications to small databases and improvements are doubtful, local clustering is practical also for large databases and appears to improve overall performance, especially if metrical constraints and weighting by proximity are embedded in the local feedback. The local methods adapt themselves to each individual search and produce useful searchonyms: terms which are "synonymous" in the context of one query. Searchonyms lead to new improved search formulations both via manual and via automatic feedback.

266 citations
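
As a rough illustration of the local feedback idea in the abstract above, the Python sketch below scores candidate terms by how close they occur to query terms within the documents returned by a first search. The function name `local_cooccurrence`, the window size, and the 1/distance proximity weight are assumptions made for this example; the abstract does not specify the paper's actual clustering or weighting formulas.

```python
from collections import defaultdict

def local_cooccurrence(retrieved_docs, query_terms, window=5):
    """Score terms that occur near query terms in the retrieved documents.

    Hypothetical sketch: the 1/distance weight and the window size are
    illustrative choices, not taken from the paper.
    """
    query = {t.lower() for t in query_terms}
    scores = defaultdict(float)
    for doc in retrieved_docs:
        tokens = doc.lower().split()
        for i, tok in enumerate(tokens):
            if tok not in query:
                continue
            # Look only at neighbours inside the window around the query term.
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                other = tokens[j]
                if j != i and other not in query:
                    scores[other] += 1.0 / abs(i - j)  # closer terms weigh more
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

docs = [
    "rotary pump impeller housing",
    "impeller pump for fluid transfer",
]
candidates = local_cooccurrence(docs, ["pump"], window=2)
```

With these two toy documents, `impeller` ranks first because it sits next to `pump` in both; such a term would be a candidate for expanding the next search.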

Journal Article•DOI•

On indexing, retrieval and the meaning of about

M. E. Maron¹•Institutions (1)

University of California, Berkeley¹

01 Jan 1977-Journal of the Association for Information Science and Technology

TL;DR: This paper shows how aboutness is related to probability of satisfaction and shows that about is, in fact, not the central concept in a theory of document retrieval.

Abstract: The primary objective of this paper is to examine the concept of about as it is used in its information retrieval sense when, for example, an indexer judges that a document is (or is not) about some given subject. The problem with about is that it is a very complex notion and we are unable to say precisely what it is we do when we make judgment of aboutness. Since about is at the heart of indexing, how are we to formulate any proper theory of indexing if we cannot explicate precisely the key concept of about? In this paper we look at this concept of about and offer a solution to the problem mentioned; it consists of an operational definition of about which interprets about in terms of search behavior. A second objective of this paper is to show that about is, in fact, not the central concept in a theory of document retrieval. A document retrieval system ought to provide a ranked output (in response to a search query) not according to the degree that they are about the topic sought by the inquiring patron, but rather according to the probability that they will satisfy that person's information need. This paper shows how aboutness is related to probability of satisfaction.

170 citations

Journal Article•DOI•

Information Retrieval as a Trial-And-Error Process

Don R. Swanson

01 Apr 1977-The Library Quarterly

TL;DR: This paper examines three important and well-known information retrieval experiments, with a focus on certain internal inconsistencies and on the high variability of search results.

Abstract: Recognition of the essential role of trial and error in access to scientific literature may point the way toward improved information services and may illuminate inconsistencies that have beset many retrieval experiments. This paper examines three important and well-known information retrieval experiments, with a focus on certain internal inconsistencies and on the high variability of search results. In these experiments, retrieval systems are evaluated in terms of their ability to select relevant documents and reject those that are irrelevant. It is suggested that this criterion is inadequate because of ambiguities inherent in the concept of relevance and that closer attention to trial-and-error processes may be helpful in developing better criteria. Specific examples of how one might improve document retrieval, library use, and citation indexing are offered.

135 citations

Journal Article•DOI•

Automatic ranked output from boolean searches in SIRE

Terry Noreault¹, Matthew B. Koll¹, Michael J. McGill¹•Institutions (1)

Syracuse University¹

01 Nov 1977-Journal of the Association for Information Science and Technology

TL;DR: The study found that relevant documents were ranked significantly higher than nonrelevant documents in the set of documents retrieved in response to a Boolean query.

Abstract: This study examined the effectiveness and efficiency of employing a fully automatic algorithm for ranking the results of Boolean searches of an inverted file design document retrieval system. The study indicated that with minor modification of file designs, such as those implemented in the Syracuse Information Retrieval Experiment (SIRE), document retrieval systems could efficiently provide users with output lists on which the rank order of a document is a good indicator of its probable relevance to the user's information need. The study found that relevant documents were ranked significantly higher than nonrelevant documents in the set of documents retrieved in response to a Boolean query. By utilizing an augmented inverted file design the variable incremental cost for ranked output was only ten cents per query. There was no increased user effort.

92 citations
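
The abstract above does not describe the SIRE ranking algorithm itself, so the following is only a minimal sketch of the general idea: apply a Boolean AND filter, then order the surviving documents by a simple score. The function name `boolean_and_ranked` and the choice of summed term counts as the score are assumptions for illustration.

```python
def boolean_and_ranked(docs, terms):
    """Return ids of docs containing every term, ordered by total term count.

    Hypothetical scoring: summed raw term counts, not the SIRE formula.
    """
    terms = [t.lower() for t in terms]
    hits = []
    for doc_id, text in docs.items():
        tokens = text.lower().split()
        counts = [tokens.count(t) for t in terms]
        if all(c > 0 for c in counts):  # Boolean AND filter
            hits.append((sum(counts), doc_id))
    # Highest score first; ties broken by document id.
    hits.sort(key=lambda h: (-h[0], h[1]))
    return [doc_id for _, doc_id in hits]

docs = {
    "d1": "boolean search of an inverted file",
    "d2": "ranked search output from a boolean search",
    "d3": "fuzzy sets in retrieval",
}
ranked = boolean_and_ranked(docs, ["boolean", "search"])
```

Here `d3` is excluded by the Boolean filter, and `d2` precedes `d1` because it contains more occurrences of the query terms.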

Journal Article•DOI•

Term Relevance Weights in On-Line Information Retrieval

Gerard Salton¹, R. K. Waldstein¹•Institutions (1)

Cornell University¹

01 Jul 1977-Information Processing and Management

TL;DR: Considerable evidence exists to show that the use of term relevance weights is beneficial in interactive information retrieval, and various relevance ranking systems are evaluated, including fully automatic systems based on inverse document frequency parameters, and human rankings performed by the user population.

Abstract: Considerable evidence exists to show that the use of term relevance weights is beneficial in interactive information retrieval. Various term weighting systems are reviewed. An experiment is then described in which information retrieval users are asked to rank query terms in decreasing order of presumed importance prior to actual search and retrieval. The experimental design is examined, and various relevance ranking systems are evaluated, including fully automatic systems based on inverse document frequency parameters, human rankings performed by the user population, and combinations of the two.

47 citations
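
The abstract above mentions automatic weighting based on inverse document frequency. A minimal sketch of one common form, log(N / n_t), follows; the exact variant evaluated in the paper is not stated in the abstract, and the function name `idf_weights` is an assumption for this example.

```python
import math

def idf_weights(docs):
    """Map each term to log(N / n_t), where n_t is its document frequency."""
    n_docs = len(docs)
    df = {}
    for text in docs:
        # Count each term once per document.
        for term in set(text.lower().split()):
            df[term] = df.get(term, 0) + 1
    return {term: math.log(n_docs / n) for term, n in df.items()}

docs = ["term weights", "term ranking", "query term weights"]
weights = idf_weights(docs)

# Order query terms by decreasing weight, mirroring the ranking task
# that the experiment asked of users.
query = ["term", "weights", "ranking"]
ordered = sorted(query, key=lambda t: -weights[t])
```

A term that appears in every document, such as `term` here, gets weight zero, while rarer terms rank higher.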

Journal Article•DOI•

Operations Research Applied to Document Indexing and Retrieval Decisions

Abraham Bookstein¹, Donald H. Kraft²•Institutions (2)

University of Chicago¹, Louisiana State University²

01 Jul 1977-Journal of the ACM

TL;DR: The earlier model is extended to include interactions among terms, which allows one to decide whether to retrieve a document by taking into consideration occurrences of all the words in the text.

Abstract: This paper begins with a review of earlier work in which a model of word occurrence formed the basis of a decision-making procedure for indexing or, more generally, retrieving documents in response to a request. In the earlier work words were considered individually. This paper extends the earlier model to include interactions among terms. The elaborated model allows one to decide whether to retrieve a document by taking into consideration occurrences of all the words in the text. Retrieval in response to Boolean expressions is also considered, as are procedures for ranking documents in accordance with their assessed relevance to a request. The discussion is within the framework of Bayesian decision theory.

45 citations

Proceedings Article•DOI•

Associative/parallel processors for searching very large textual data bases

R. M. Bird, J. C. Tu, R. M. Worthy

01 Jan 1977

TL;DR: The system, called the Associative File Processor (AFP), utilizes a conventional minicomputer for control, off-the-shelf high density disks for storage, a special purpose parallel search module as a text term detector, and query and retrieval software.

Abstract: This paper describes an approach to solving a major problem in the information processing sciences: that of searching very large (5-50 billion characters) data bases of unstructured free-text for random queries within a reasonable time and at an affordable price. The need by information specialists and knowledge workers for large, fast, low-cost text and document retrieval systems is growing rapidly. Conventional approaches to the problem have usually depended upon expensive, general purpose computers, upon special preprocessing of the textual data (e.g. file inverting, indexing, abstracting, etc.), and upon elaborate, costly software. The resulting retrieval systems often cost hundreds of dollars per query and the full scanning of an uninverted, unstructured billion byte textual data base could take hours of computer services. However, in spite of these restrictions, such full text search systems have proved useful and even indispensable for many applications. Computer technology of the late 1960's and the 1970's, in both hardware and software (e.g., minicomputers, low-cost, high density disk storage, "chip" electronics, natural language query systems, etc.), has made it practical to build special purpose, low-cost text retrieval systems. Such a system has been built, tested, and is now in a production stage. The system, called the Associative File Processor (AFP), utilizes a conventional minicomputer (DEC's PDP-11/45) for control, off-the-shelf high density disks for storage, a special purpose parallel search module as a text term detector, and query and retrieval software. The AFP is currently being field tested at two sites. Full text, parallel searches on un-preprocessed textual data bases are being performed at the effective matching rates of 4 billion bytes per second (8K byte key memory times 500 Kbyte/second data stream). Estimated costs are 10 to 25 cents per query for a one billion byte data base. The costs per query and the time for searching increase in a linear fashion as data base increases. A basic architecture for the AFP is described and an implemented version is discussed. A more powerful term detector module is also under development. This system is designed around a finite state automaton algorithm.

43 citations
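
The figures quoted in the AFP abstract above can be checked with a little arithmetic. The sketch below reproduces the stated effective matching rate and scales the quoted per-query cost with data base size. It assumes that "8K" means 8,000 bytes and that "linear" means strictly proportional to size (no fixed overhead); neither is stated in the abstract, and `cost_range_cents` is a hypothetical helper.

```python
KEY_MEMORY_BYTES = 8_000        # "8K byte key memory", read as 8,000 bytes (assumption)
STREAM_BYTES_PER_SEC = 500_000  # "500 Kbyte/second data stream"

# Every streamed byte is compared against the whole key memory in parallel.
effective_rate = KEY_MEMORY_BYTES * STREAM_BYTES_PER_SEC  # byte comparisons per second

def cost_range_cents(database_bytes, low=10, high=25,
                     reference_bytes=1_000_000_000):
    """Scale the quoted 10 to 25 cents per query proportionally with size.

    Assumes cost is proportional to data base size, which is one reading
    of the abstract's "increase in a linear fashion".
    """
    scale = database_bytes / reference_bytes
    return low * scale, high * scale

five_billion = cost_range_cents(5_000_000_000)
```

Under these assumptions the product is 4 billion byte comparisons per second, matching the abstract, and a 5 billion byte data base would cost roughly 50 to 125 cents per query.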

Journal Article•DOI•

Automatic query adjustment in document retrieval

Carlo Vernimb

01 Jan 1977-Information Processing and Management

TL;DR: The automatic procedure is superior to traditional searching procedures in terms of both recall and precision and probably for more than 80% of the inquiries the need for a documentalist as an intermediary between the user and the system can be avoided.

Abstract: A system is described for the automatic adjustment of queries addressed to information retrieval systems employing a structurised thesaurus for the coordinate indexing of an average of at least five or six descriptors per document. Starting with at least two documents considered by the user as relevant to his inquiry, the system formulates different queries using descriptors occurring in the relevant documents. Results from these queries are presented to the user for relevance assessment as a result of which the most efficient queries are automatically selected and loosened (broadened). The new documents retrieved are again checked for relevance by the user; and with new relevant documents the loop starts again. The result of the automatic procedure is independent of the point of departure. The automatic procedure is superior to traditional searching procedures in terms of both recall and precision. The automatic procedure requires more computing, but probably for more than 80% of the inquiries the need for a documentalist as an intermediary between the user and the system can be avoided.

30 citations

Journal Article•DOI•

Mathematical model of time-effective information retrieval system based on the theory of fuzzy sets

Tadeusz Radecki¹•Institutions (1)

Wrocław University of Technology¹

01 Jan 1977-Information Processing and Management

TL;DR: The organization of a set of document search patterns proposed in the paper limits the search of the document search pattern set, when retrieving a response to a given information request, to one (or several) of the previously determined subsets, which makes the information system response time acceptable.

Abstract: Search patterns of documents and information requests are their better or worse representatives only, so it is important to carry on examinations on possibilities of designing self-learning information retrieval systems. Another important question is to elaborate such an organization of document search pattern set as to obtain an acceptable response time of the information system to a given information request. A self-learning process of the proposed information system consists in the determination—on a set of document and information request search patterns—of the similarity relation according to L. A. Zadeh. The organization of a set of document search patterns proposed in the paper ensures the limitation of document search pattern set searching process—when retrieving a response to a given information request—to one (or several) subset from previously determined subsets. This makes the information system response time acceptable. The proposed information retrieval strategy is discussed in terms of fuzzy sets.

29 citations

Journal Article•DOI•

An Inverted File Processor for Information Retrieval

Stellhorn¹•Institutions (1)

United States Department of the Army¹

01 Dec 1977-IEEE Transactions on Computers

TL;DR: Using this equipment, a complicated sample search involving 70 terms and over 67 000 document references can be performed from 13 to 60 times faster than with a conventional machine.

Abstract: Response time in large, inverted file document retrieval systems is determined primarily by the time required to access files of document identifiers on disk and perform the processing associated with a Boolean search request. This paper describes a specialized computer system capable of performing these functions in hardware. Using this equipment, a complicated sample search involving 70 terms and over 67 000 document references can be performed from 13 to 60 times faster than with a conventional machine. Alternatively, many small searches can be processed concurrently with little effect upon system performance. Similar configurations can be applied to standard merging and sorting problems.

25 citations
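
The processing that the abstract above moves into hardware is, in software terms, the merging of sorted lists of document identifiers for a Boolean request. The sketch below shows that merge for AND and OR; it illustrates the general inverted-file technique, not the paper's hardware design, and the names `intersect`, `union`, and the toy `index` are assumptions.

```python
def intersect(a, b):
    """AND: ids present in both sorted lists, found in one linear pass."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

def union(a, b):
    """OR: ids present in either sorted list, without duplicates."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])  # at most one of these tails is non-empty
    out.extend(b[j:])
    return out

# Toy inverted file: term -> sorted list of document ids (hypothetical data).
index = {
    "inverted": [1, 3, 5, 8],
    "file": [3, 4, 5, 9],
    "processor": [5, 9],
}
# (inverted AND file) OR processor
result = union(intersect(index["inverted"], index["file"]), index["processor"])
```

Each merge reads its inputs once from front to back, which is the same access pattern as the standard merging and sorting problems the abstract mentions.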

Journal Article•DOI•

Information storage and retrieval: a survey and functional description

Jack Minker¹•Institutions (1)

University of Maryland, College Park¹

01 Sep 1977

TL;DR: Information Storage and Retrieval encompasses a broad scope of topics ranging from basic techniques for accessing data to sophisticated approaches for the analysis of natural language text and the deduction of information.

Abstract: Information Storage and Retrieval (IS&R) encompasses a broad scope of topics ranging from basic techniques for accessing data to sophisticated approaches for the analysis of natural language text and the deduction of information. Within the field, three general areas of investigation can be distinguished not only by their subject matter but also by the types of individuals presently interested in them: (1) Document retrieval, (2) Generalized data management, and (3) Question-answering. A functional description which applies to each of the three areas is presented together with a survey of work being conducted. The similarities and differences of the three areas of IS&R are described. Typical systems which incorporate many of the functions and techniques are described in the appendix.

Computation and Deductive Information Retrieval.

Maarten H. van Emden

01 Jan 1977

Book•

Information retrieval and the computer

Chris D. Paice

01 Jan 1977

Journal Article•DOI•

Retrieval performance and information theory

Mauro Guazzo

01 Jan 1977-Information Processing and Management

TL;DR: This paper challenges the meaningfulness of precision and recall values as a measure of performance of a retrieval system by advocating the use of a normalised form of Shannon's functions (entropy and mutual information).

Abstract: This paper challenges the meaningfulness of precision and recall values as a measure of performance of a retrieval system. Instead, it advocates the use of a normalised form of Shannon's functions (entropy and mutual information). Shannon's four axioms are replaced by an equivalent set of five axioms which are more readily shown to be pertinent to document retrieval. The applicability of these axioms and the conceptual and operational advantages of Shannon's functions are the central points of the work. The applicability of the results to any automatic classification is also outlined.
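
To make the idea of an information-theoretic performance measure concrete, the sketch below computes the mutual information between the "retrieved" and "relevant" judgments from a 2x2 contingency table and divides it by the entropy of the relevance variable. This particular normalisation and the function names are assumptions for illustration; the abstract does not give the paper's normalised form or its axioms.

```python
import math

def entropy(probs):
    """Shannon entropy in bits; zero-probability cells contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def normalized_mutual_information(a, b, c, d):
    """Mutual information of retrieval vs. relevance, scaled to [0, 1].

    a: relevant and retrieved      b: nonrelevant and retrieved
    c: relevant and not retrieved  d: nonrelevant and not retrieved
    Dividing by the entropy of relevance is an assumed normalisation.
    """
    n = a + b + c + d
    h_joint = entropy([a / n, b / n, c / n, d / n])
    h_retrieved = entropy([(a + b) / n, (c + d) / n])
    h_relevant = entropy([(a + c) / n, (b + d) / n])
    mi = h_retrieved + h_relevant - h_joint
    return mi / h_relevant if h_relevant > 0 else 0.0

perfect = normalized_mutual_information(10, 0, 0, 90)  # retrieves exactly the relevant set
random_like = normalized_mutual_information(5, 45, 5, 45)  # retrieval independent of relevance
```

A search that retrieves exactly the relevant documents scores 1, while one whose output is statistically independent of relevance scores approximately 0.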

Proceedings Article•DOI•

A specialized architecture for textual information retrieval

L. A. Hollaar¹, W. H. Stellhorn²•Institutions (2)

University of Illinois at Urbana–Champaign¹, United States Department of the Army²

13 Jun 1977

TL;DR: Characteristics which distinguish text retrieval from retrieval of formatted files are discussed, and a computer configuration employing for special purpose processors is described.

Abstract: Retrieval of information from the complete text of large document collections cannot be performed efficiently or rapidly by current general purpose digital computers or by most special purpose rotating memory associative processors frequently proposed for efficient processing of relational databases. Characteristics which distinguish text retrieval from retrieval of formatted files are discussed, and a computer configuration employing for special purpose processors is described.

Journal Article•DOI•

Data Retrieval by Text Searching

John O'Connor

01 Aug 1977-Journal of Chemical Information and Computer Sciences

TL;DR: Sixty percent of the data papers in an experiment were retrieved by human-computer text searching, in which the human contribution consisted of selection of search words for input to the computer search.

Abstract: Sixty percent of the data papers in an experiment were retrieved by human-computer text searching, in which the human contribution consisted of selection of search words for input to the computer search. Most of the successful retrieval consisted of identifying within papers those figures containing data asked for by the retrieval questions, and automatically labeling those data within the figures. The retrieval procedures are economically feasible now because they primarily require only that words from figures be in computer-readable form.

Journal Article•DOI•

Document retrieval using a substring index

P. W. Williams¹, M. T. Khallaghi¹•Institutions (1)

University of Manchester¹

01 Jan 1977-The Computer Journal

Accessing Individual Records from Personal Data Files Using Non-Unique Identifiers. Final Report. Computer Science & Technology Series.

Gwendolyn B. Moore

01 Feb 1977

Journal Article•DOI•

Intelligent terminals

Mario C. Grignetti¹•Institutions (1)

BBN Technologies¹

01 Feb 1977-Intelligence/sigart Bulletin

TL;DR: A system to help computer-naive people deal with their Intelligent Terminal (IT), an envisioned personal mini-computer of great sophistication and power, which will be capable of mediating intelligently between these users and the tools that will be available to them in the Intelligent Terminal.

Abstract: Our objective is to develop a system to help computer-naive people deal with their Intelligent Terminal (IT), an envisioned personal mini-computer of great sophistication and power. Our system will be capable of mediating intelligently between these users and the tools that will be available to them in the Intelligent Terminal: it will provide streamlined instructions on how to use these tools, and it will answer questions about them posed in relatively unconstrained English. The system's teachings will rely heavily on letting people 'learn by doing', by setting up practice sessions under tutorial supervision. Our initial plan includes developing one such Intelligent oN-Line Assistant and Tutor (INLAT) system that will "know" about an editing system, a mail system, and a document retrieval system.

Proceedings Article•

Evaluation of combinatorial file organization schemes

Hideto Ikeda

06 Oct 1977

TL;DR: This paper will describe implementations of three combinatorial file organization schemes, viz., an inverted filing scheme of order 1 (IFS1), a generalized Hiroshima University balanced filing scheme of order 2 (GHUBFS2), and a filing scheme having consecutive retrieval property with redundancy (CRWR), as a document retrieval system.

Abstract: This paper will describe implementations of three combinatorial file organization schemes, viz., an inverted filing scheme of order 1 (IFS1), a generalized Hiroshima University balanced filing scheme of order 2 (GHUBFS2), and a filing scheme having consecutive retrieval property with redundancy (CRWR), as a document retrieval system. The results of an experiment evaluating the efficiency of those storage and retrieval schemes will be presented. The characteristic behavior of those schemes as the number of data grows will also be discussed.
