Showing papers on "Inverted index" published in 1986


Proceedings ArticleDOI
01 Sep 1986
TL;DR: The processing time and disk space requirements of an inverted index and top-down cluster search are compared and the cluster search is shown to use both more time and more disk space.
Abstract: The processing time and disk space requirements of an inverted index and top-down cluster search are compared. The cluster search is shown to use both more time and more disk space, mostly due to the large number of cluster centroids needed by the search. When shorter centroids are used, the efficiency of the cluster search improves, but the inverted index search remains more efficient.

38 citations
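
To make the comparison in this abstract concrete, here is a minimal sketch of the two search strategies. It is not the paper's implementation: the toy documents, the term-overlap scoring, and the names `build_inverted_index`, `inverted_search`, and `cluster_search` are all assumptions chosen for illustration. The cluster tree stores one centroid (a set of terms) per node, which hints at why many long centroids cost extra space and comparisons.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def inverted_search(index, query):
    """Rank documents by how many distinct query terms they contain."""
    scores = defaultdict(int)
    for term in set(query.lower().split()):
        for doc_id in index.get(term, []):
            scores[doc_id] += 1
    return sorted(scores, key=lambda d: (-scores[d], d))

def cluster_search(node, docs, query):
    """Top-down search: descend to the best-matching child centroid, then rank the leaf's members."""
    q = set(query.lower().split())
    while node["children"]:
        node = max(node["children"], key=lambda c: len(q & c["centroid"]))
    overlap = {d: len(q & set(docs[d].lower().split())) for d in node["members"]}
    return sorted((d for d in overlap if overlap[d] > 0), key=lambda d: (-overlap[d], d))

# Hypothetical toy collection and a two-leaf cluster tree (illustrative data only).
docs = {
    1: "inverted index search",
    2: "cluster centroid search",
    3: "disk space cost",
}
index = build_inverted_index(docs)
root = {
    "children": [
        {"centroid": {"inverted", "index", "cluster", "centroid", "search"},
         "children": [], "members": [1, 2]},
        {"centroid": {"disk", "space", "cost"},
         "children": [], "members": [3]},
    ],
    "members": [],
}

print(inverted_search(index, "inverted search"))
print(cluster_search(root, docs, "disk cost"))
```

The inverted search touches only the postings of the query terms, whereas the cluster search must compare the query against a centroid at every level it descends through.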


Journal ArticleDOI
TL;DR: The presented retrieval rules may be viewed as the logical approach in implementing a physical distributed retrieval system that consists of n local retrieval systems.
Abstract: This paper describes how the operations on the local inverted files are to be modified in order to use them in the distributed information retrieval system based on thesauri. The global system consists of n local retrieval systems. The presented retrieval rules may be viewed as the logical approach in implementing a physical distributed retrieval system.

9 citations
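
The abstract does not spell out the retrieval rules, so the following is only a hedged sketch of the general idea: a global query is expanded through a thesaurus and evaluated against each local inverted file, and the local answers are combined. The site names, the thesaurus contents, and the functions `expand`, `local_lookup`, and `global_and_query` are hypothetical.

```python
def expand(term, thesaurus):
    """Return the term plus any synonyms listed in the (hypothetical) thesaurus."""
    return {term} | set(thesaurus.get(term, []))

def local_lookup(inverted_file, terms):
    """Union of the postings for any of the given terms in one local inverted file."""
    hits = set()
    for term in terms:
        hits |= set(inverted_file.get(term, []))
    return hits

def global_and_query(sites, thesaurus, query_terms):
    """AND the thesaurus-expanded query terms at every site; return (site, doc_id) pairs."""
    results = []
    for site_name, inverted_file in sites.items():
        per_term = [local_lookup(inverted_file, expand(t, thesaurus)) for t in query_terms]
        matched = set.intersection(*per_term) if per_term else set()
        results.extend((site_name, doc_id) for doc_id in sorted(matched))
    return sorted(results)

# Two assumed local systems, each with its own vocabulary and inverted file.
sites = {
    "A": {"car": [1, 2], "engine": [2]},
    "B": {"automobile": [7], "engine": [7, 8]},
}
thesaurus = {"car": ["automobile"]}

print(global_and_query(sites, thesaurus, ["car", "engine"]))
```

Expanding each term before the local lookup lets site B answer a query for "car" even though its own inverted file only indexes "automobile".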


Journal ArticleDOI
TL;DR: A technique of word coding that generates short fixed-length codes obtained from the index terms themselves by analysis of monogram and bigram statistical distributions is described, which preserves a word-to-word discrimination with a rate of three synonyms per 1300 terms.
Abstract: A new method of index term dictionary compression in an inverted-file-orientated database is discussed. A technique of word coding that generates short fixed-length codes obtained from the index terms themselves by analysis of monogram and bigram statistical distributions is described. Transformation of the index term dictionary into a code dictionary preserves a word-to-word discrimination with a rate of three synonyms per 1300 terms, at a compression ratio of up to 90% and at low cost in terms of CPU time. When applied in a computer network environment, it offers substantial savings in communication channel utilization at negligible response time degradation. Experimental data for the 26,113-term index dictionary of the New York Times Info Bank, available via a computer network, are presented.

8 citations


01 Jan 1986
TL;DR: The nature of the data base management problems encountered in developing a life course perspective for family studies is described and a general procedure for indicating various units of analysis in the CASA system is provided.
Abstract: This paper describes the general approach and capabilities of CASA, a data storage, update, and retrieval system for life course data in household and kinship context. It describes the nature of the data base management problems encountered in developing a life course perspective for family studies. The solutions to 4 particular problems were central in developing the system. 1st, an efficient means of storing and retrieving event-histories was needed. The approach taken here was to store the data as separate event-history files and provide a means of linking the various files together. 2nd, a way to specify and activate multiple units of analysis was needed. Rather than building the units that were important in the Casalecchio project directly into the system, the authors provided a general procedure for indicating various units of analysis in the system. This approach has the advantage of flexibility but the disadvantage of not necessarily being efficient to run. There are units which in combination result in essentially incompatible ways of looking at the data. For example, from the standpoint of looking at the data for individuals, the data are best thought of as stored in rows. However, for coresidence and kin ties it would probably be more compatible to think of the data as an inverted file or as columns. A compromise was reached on the assumption that more of the accesses would pertain to individuals than strictly to kinship or coresidence, so that the data are stored in the most efficient manner for this view. 3rd, there is a need to indicate how linkages among the files are to be made. Rather than "hardwiring" specific views into the system, CASA lets the user create the linkages by indicating how the identification codes are to be linked. Finally, a means of easily interacting with the data base to ask for items of interest had to be developed. Here a menu-driven approach was taken. Primary directions of future system development include expanding the capacity of the system for preliminary statistical analyses and adapting the system to the personal computer environment.

4 citations
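
The row-versus-inverted-file trade-off described in this abstract can be illustrated with a small sketch. This is not CASA's storage format: the records, the field names, and the helper `invert` are assumptions made up for the example. Rows suit per-individual access, while an inverted file keyed on household answers coresidence questions directly.

```python
from collections import defaultdict

# Row-oriented view: one record per individual (hypothetical fields).
people = [
    {"id": 1, "household": "H1", "birth_year": 1900},
    {"id": 2, "household": "H1", "birth_year": 1925},
    {"id": 3, "household": "H2", "birth_year": 1930},
]

def invert(rows, field):
    """Build an inverted file: each value of `field` maps to the ids of rows holding it."""
    inverted = defaultdict(list)
    for row in rows:
        inverted[row[field]].append(row["id"])
    return dict(inverted)

# Inverted view: who lives together, without scanning every row per question.
by_household = invert(people, "household")
print(by_household)
```

Looking up one person's attributes is a single row read in the first layout, whereas listing everyone in household "H1" is a single lookup in the second.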


Proceedings ArticleDOI
05 Feb 1986
TL;DR: The authors have developed an information retrieval system named AIR (Augmented Information Retrieval system), which might be one of the most efficient systems for very large document databases.
Abstract: The authors have developed an information retrieval system named AIR (Augmented Information Retrieval system), which might be one of the most efficient systems for very large document databases. AIR can store the document data compactly and retrieve them quickly. The techniques bringing AIR to the high efficiency, the data compression, the quick keyword index, and the automatic keyword selection, are discussed. These techniques, which are based on the statistical properties of word occurrence, are fairly simple, so that the information retrieval systems employing them can be implemented with ease. The data compression technique reduces English text by a factor of 4. The quick keyword index decreases the average number of disk accesses to retrieve a keyword to about 0.3. The automatic keyword selection technique roughly halves both the number of different keywords and the size of the inverted file with only 2% loss of retrieval power.

3 citations


Journal ArticleDOI
J. A. Pino
TL;DR: The development of a text retrieval software system for microcomputers is described, and two applications of this software are presented, including a bibliographic retrieval system for a small library and an office automation project.

1 citation