Showing papers on "Inverted index" published in 1977


Journal ArticleDOI
TL;DR: The study found that relevant documents were ranked significantly higher than nonrelevant documents in the set of documents retrieved in response to a Boolean query.
Abstract: This study examined the effectiveness and efficiency of employing a fully automatic algorithm for ranking the results of Boolean searches of an inverted file design document retrieval system. The study indicated that with minor modification of file designs, such as those implemented in the Syracuse Information Retrieval Experiment (SIRE), document retrieval systems could efficiently provide users with output lists on which the rank order of a document is a good indicator of its probable relevance to the user's information need. The study found that relevant documents were ranked significantly higher than nonrelevant documents in the set of documents retrieved in response to a Boolean query. By utilizing an augmented inverted file design the variable incremental cost for ranked output was only ten cents per query. There was no increased user effort.

92 citations
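
The idea in the abstract above, ranking the output of a Boolean search over an inverted file, can be illustrated with a minimal sketch. It is not the SIRE algorithm: the scoring rule (summed term frequency) and all names (`build_index`, `boolean_and`, `ranked_and`) are assumptions chosen for illustration. The "augmented" inverted file is modelled here as postings that store a per-document term count alongside each document identifier.

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to {doc_id: term frequency} (postings augmented with counts)."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index

def boolean_and(index, terms):
    """Return the set of doc ids that contain every query term."""
    postings = [set(index.get(t, {})) for t in terms]
    return set.intersection(*postings) if postings else set()

def ranked_and(index, terms):
    """Boolean AND retrieval, then order the hits by summed term frequency.

    The scoring rule is an assumption for illustration only.
    """
    hits = boolean_and(index, terms)

    def score(doc_id):
        return sum(index[t][doc_id] for t in terms)

    # Highest score first; ties broken by doc id for a stable order.
    return sorted(hits, key=lambda d: (-score(d), d))

# Hypothetical toy collection.
docs = {
    1: "inverted file retrieval",
    2: "inverted file inverted file retrieval retrieval",
    3: "boolean query processing",
}
index = build_index(docs)
ranking = ranked_and(index, ["inverted", "retrieval"])
print(ranking)
```

The Boolean step decides which documents are retrieved; the counts stored in the postings only affect the order in which they are listed, which is why the extra cost of ranking can stay small.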


Journal ArticleDOI
TL;DR: A cost function is developed for the evaluation of candidate indexing choices and applied to the optimization of index selection, demonstrating the increased effectiveness of secondary indexes for large files, the effect of the relative rates of retrieval and maintenance, the greater cost of allowing for arbitrarily formulated queries, and the impact on cost of the use of different index structures.
Abstract: Secondary indexes are often used in database management systems for secondary key retrieval. Although their use can improve retrieval time significantly, the cost of index maintenance and storage increases the overhead of the file processing application. The optimal set of indexed secondary keys for a particular application depends on a number of application dependent factors. In this paper a cost function is developed for the evaluation of candidate indexing choices and applied to the optimization of index selection. Factors accounted for include file size, the relative rates of retrieval and maintenance and the distribution of retrieval and maintenance over the candidate keys, index structure, and system charging rates. Among the results demonstrated are the increased effectiveness of secondary indexes for large files, the effect of the relative rates of retrieval and maintenance, the greater cost of allowing for arbitrarily formulated queries, and the impact on cost of the use of different index structures.

65 citations
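
A toy version of such a cost comparison might look like the sketch below. It is not the paper's cost function: the linear cost model, the assumption that candidate keys can be judged independently, and every name and number (`KeyStats`, `scan_cost`, `lookup_cost`, `maintain_cost`, `storage_cost`) are hypothetical. It only shows the trade-off the abstract names, retrieval savings against maintenance and storage overhead.

```python
from dataclasses import dataclass

@dataclass
class KeyStats:
    name: str
    retrievals: float  # retrievals per period that use this key
    updates: float     # maintenance operations per period touching this key

def cost_without_index(key, scan_cost):
    """Every retrieval on an unindexed key pays for a full file scan."""
    return key.retrievals * scan_cost

def cost_with_index(key, lookup_cost, maintain_cost, storage_cost):
    """Indexed retrievals are cheap, but updates and storage add overhead."""
    return (key.retrievals * lookup_cost
            + key.updates * maintain_cost
            + storage_cost)

def select_indexes(keys, scan_cost, lookup_cost, maintain_cost, storage_cost):
    """Index a key only when doing so lowers its total cost (assumed model)."""
    return [
        k.name for k in keys
        if cost_with_index(k, lookup_cost, maintain_cost, storage_cost)
        < cost_without_index(k, scan_cost)
    ]

# Hypothetical workload: one retrieval-heavy key, one update-heavy key.
keys = [KeyStats("author", 100, 10), KeyStats("colour", 2, 50)]

small_file = select_indexes(keys, scan_cost=50.0, lookup_cost=1.0,
                            maintain_cost=2.0, storage_cost=20.0)
# A larger file makes each scan more expensive (scan_cost grows).
large_file = select_indexes(keys, scan_cost=500.0, lookup_cost=1.0,
                            maintain_cost=2.0, storage_cost=20.0)
print(small_file, large_file)
```

In this model the update-heavy key is worth indexing only once scans become expensive, which mirrors the abstract's point that secondary indexes pay off more for large files.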


Journal ArticleDOI
TL;DR: Using this equipment, a complicated sample search involving 70 terms and over 67 000 document references can be performed from 13 to 60 times faster than with a conventional machine.
Abstract: Response time in large, inverted file document retrieval systems is determined primarily by the time required to access files of document identifiers on disk and perform the processing associated with a Boolean search request. This paper describes a specialized computer system capable of performing these functions in hardware. Using this equipment, a complicated sample search involving 70 terms and over 67 000 document references can be performed from 13 to 60 times faster than with a conventional machine. Alternatively, many small searches can be processed concurrently with little effect upon system performance. Similar configurations can be applied to standard merging and sorting problems.

25 citations
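
The processing that the abstract above moves into hardware is, in software terms, the merging of sorted lists of document identifiers. A minimal sketch of that merge step follows, assuming each postings list is strictly increasing; the function names are illustrative and nothing here reflects the design of the specialized machine.

```python
def intersect(a, b):
    """Boolean AND: ids present in both sorted lists (two-pointer merge)."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

def union(a, b):
    """Boolean OR: ids present in either sorted list, without duplicates."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    # One list is exhausted; the remainder of the other is already sorted.
    out.extend(a[i:])
    out.extend(b[j:])
    return out

# Hypothetical postings lists for two search terms.
term_a = [2, 5, 8, 13]
term_b = [1, 5, 13, 21]
both = intersect(term_a, term_b)
either = union(term_a, term_b)
print(both, either)
```

Each merge reads both lists once from front to back, so the work grows with the total number of document references, which is the quantity the abstract cites when describing the sample search.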


Proceedings ArticleDOI
13 Jun 1977
TL;DR: The design principles of an automatic system that has the ability to choose the physical design for a data base and to adapt this design to changing requirements are presented.
Abstract: Physical data base design, the selection of organizational structures and access mechanisms for a data base, is one of the most important responsibilities of a Data Base Administrator (DBA). A DBA often has difficulty in performing this task; he lacks the information needed to choose a design that is well matched to the data base's mode of use. This paper presents the design principles of an automatic system that has the ability to choose the physical design for a data base and to adapt this design to changing requirements. The components of such a system include: an information gathering module that collects global statistics on the overall usage pattern of the data base; a predictor that projects observed usage statistics into the future; a design evaluator that computes a figure of merit for any proposed design; and a heuristic proposer that synthesizes a small set of candidate designs for detailed consideration. These principles have been applied to the design of a system that selects secondary indices for an inverted file system.

13 citations
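
The four components listed in the abstract above (information gatherer, predictor, design evaluator, heuristic proposer) can be wired together as in the sketch below. Everything concrete in it is an assumption made for illustration: exponential smoothing as the predictor, the linear figure of merit, the "index the most-used attributes" heuristic, and all names and constants. The paper's own methods are not reproduced here.

```python
from collections import Counter

class UsageMonitor:
    """Information-gathering module: counts queries per attribute."""
    def __init__(self):
        self.counts = Counter()

    def record(self, attribute):
        self.counts[attribute] += 1

def predict(previous_forecast, observed, alpha=0.5):
    """Predictor: exponential smoothing of usage counts (an assumed choice)."""
    keys = set(previous_forecast) | set(observed)
    return {
        k: alpha * observed.get(k, 0) + (1 - alpha) * previous_forecast.get(k, 0)
        for k in keys
    }

def evaluate(design, forecast, scan_cost=10.0, lookup_cost=1.0,
             index_overhead=20.0):
    """Design evaluator: a hypothetical figure of merit (lower is better)."""
    query_cost = sum(
        n * (lookup_cost if attr in design else scan_cost)
        for attr, n in forecast.items()
    )
    return query_cost + index_overhead * len(design)

def propose(forecast, limit=2):
    """Heuristic proposer: index the k most-used attributes, k = 0..limit."""
    ranked = sorted(forecast, key=lambda a: (-forecast[a], a))
    return [frozenset(ranked[:k]) for k in range(limit + 1)]

# Hypothetical usage log.
monitor = UsageMonitor()
for attr in ["author"] * 8 + ["title"] * 2 + ["year"]:
    monitor.record(attr)

# alpha=1.0 means the forecast simply equals the observed counts.
forecast = predict({}, monitor.counts, alpha=1.0)
best = min(propose(forecast), key=lambda d: evaluate(d, forecast))
print(sorted(best))
```

Rerunning `predict` and `propose` as new statistics arrive is one simple way to model the adaptation to changing requirements that the abstract describes.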


Book ChapterDOI
01 Jan 1977
TL;DR: A common misconception is that associative processing is a special mode of computation which can only be achieved at high expense with complex hardware components, but this is not the case.
Abstract: A common misconception is that associative processing is a special mode of computation which can only be achieved at high expense with complex hardware components. Consequently, it is often maintained that associative processing can only be justified in certain dedicated computer applications for which conventional computer hardware is cost-ineffective. In truth, associative processing is a natural form of information processing and its features are independent of the machine on which it is implemented. Moreover, computer systems supporting the storage, retrieval and processing of non-numerical information are inevitably associative processing systems, whether or not this was intended by their designers. To understand this, perhaps controversial, contention it is helpful to reflect on the nature of information itself.

12 citations


Journal ArticleDOI

12 citations


Journal ArticleDOI
TL;DR: For analytical, inventory, and a variety of other basic types of geological data the main functions of an information management system can be accommodated by simple systems in which comprehensiveness is compromised in favor of practicality and ease of implementation.
Abstract: For analytical, inventory, and a variety of other basic types of geological data the main functions of an information management system can adequately be accommodated by simple systems in which comprehensiveness is compromised in favor of practicality and ease of implementation. Albeit possessing some shortcomings, such a strategy is likely to prove profitable particularly to geologists in developing nations who are confronted with the task of self-developing much needed geological data systems in the face of limited electronic data processing resources. Based on the experience of the Geological Survey of Israel, several considerations and practical guidelines for the design and implementation of such systems can be outlined. Data bases should be limited in their scope to specific subjects or projects, be designed to serve existing and only the more realistic foreseeable needs, and include provisions for merger and intelligent communication with related files. Such data bases typically contain logically simple-structured information and are small in size. Revision, deletion, and update transactions are infrequent; the search criteria for retrieval are for the most part predictable and a fast response time is not essential. These attributes prescribe a preference for simple fixed- or semi-free-format sequential files which, in turn, simplify appreciably the programming of the supporting software. Input forms should be meticulously planned with due consideration given to aspects of software compatibility, user convenience and acceptance, and efficiency in data gathering. The use of standard forms should be integrated into the institution's routine to facilitate direct data entry by each contributor, thereby improving and economizing the data collection process, and to secure data capture at its acquisition level (field, laboratory). 
The user's more immediate retrieval needs are adequately satisfied by a master list, documenting the entire data base and a number of external inverted index directories cross-referencing the master list according to the attributes by which the file is most likely to be searched. Further development of output capabilities should be directed to provide for flexible retrieval by multikey query functions and base map posting. For data files storing raw chemical analyses of rocks and water samples, the incorporation of processing capabilities to compute interpretative geochemical parameters as an integral part of the system's output is particularly useful.

3 citations
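
The retrieval layout described in the abstract above, a sequential master list cross-referenced by external inverted index directories and extended to multikey queries, can be sketched in a few lines. The records, attribute names, and function names below are hypothetical and do not come from the Geological Survey of Israel system.

```python
from collections import defaultdict

# Master list: a simple sequential file, one record per sample (hypothetical data).
master = [
    {"id": "S1", "rock": "basalt", "region": "north"},
    {"id": "S2", "rock": "limestone", "region": "north"},
    {"id": "S3", "rock": "basalt", "region": "south"},
    {"id": "S4", "rock": "basalt", "region": "north"},
]

def build_directory(master, attribute):
    """Inverted index directory: attribute value -> positions in the master list."""
    directory = defaultdict(list)
    for pos, record in enumerate(master):
        directory[record[attribute]].append(pos)
    return dict(directory)

def multikey_query(master, directories, **criteria):
    """Intersect the directory entries for every attribute=value criterion."""
    positions = None
    for attribute, value in criteria.items():
        found = set(directories[attribute].get(value, []))
        positions = found if positions is None else positions & found
    return [master[p]["id"] for p in sorted(positions or [])]

# One directory per attribute the file is likely to be searched by.
directories = {attr: build_directory(master, attr) for attr in ("rock", "region")}

northern_basalts = multikey_query(master, directories, rock="basalt", region="north")
print(northern_basalts)
```

Because the directories are kept outside the master list, the master file can remain a plain sequential file, in keeping with the preference for simplicity that the abstract argues for.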



Journal ArticleDOI
TL;DR: In the course of validating searches in a free-text data base taken from the Merck Index, an inverted index was produced which can be entered by medical use rather than by compound name.

1 citation