Showing papers on "Search engine indexing" published in 1984


Proceedings ArticleDOI
01 Jun 1984
TL;DR: A dynamic index structure called an R-tree is described for retrieving data items by spatial location; algorithms for searching and updating it are given, and test results indicate that it is useful for current database systems in spatial applications.
Abstract: In order to handle spatial data efficiently, as required in computer aided design and geo-data applications, a database system needs an index mechanism that will help it retrieve data items quickly according to their spatial locations. However, traditional indexing methods are not well suited to data objects of non-zero size located in multi-dimensional spaces. In this paper we describe a dynamic index structure called an R-tree which meets this need, and give algorithms for searching and updating it. We present the results of a series of tests which indicate that the structure performs well, and conclude that it is useful for current database systems in spatial applications.

7,336 citations
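
The R-tree abstract above describes an index over objects of non-zero size that is searched by spatial location. As a rough illustration only, the sketch below searches a small hand-built tree of bounding rectangles. The names `Node`, `overlaps`, `enclose`, and `search` are assumptions made for this example, and the paper's insertion, deletion, and node-splitting algorithms are not reproduced here.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Axis-aligned rectangle as (xmin, ymin, xmax, ymax).
Rect = Tuple[float, float, float, float]


def overlaps(a: Rect, b: Rect) -> bool:
    """True if the two rectangles share at least one point (touching counts)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def enclose(rects: List[Rect]) -> Rect:
    """Smallest rectangle that covers every rectangle in rects."""
    xmins, ymins, xmaxs, ymaxs = zip(*rects)
    return (min(xmins), min(ymins), max(xmaxs), max(ymaxs))


@dataclass
class Node:
    leaf: bool
    # Leaf entries are (rect, object_id); internal entries are (rect, child Node).
    entries: list

    def mbr(self) -> Rect:
        """Bounding rectangle of everything stored under this node."""
        return enclose([rect for rect, _ in self.entries])


def search(node: Node, query: Rect) -> list:
    """Collect the ids of all stored rectangles that overlap the query."""
    hits = []
    for rect, item in node.entries:
        if not overlaps(rect, query):
            continue  # the whole subtree (or object) lies outside the query
        if node.leaf:
            hits.append(item)
        else:
            hits.extend(search(item, query))
    return hits


# Hand-built two-level tree (hypothetical data, built without any balancing).
left = Node(leaf=True, entries=[((0, 0, 1, 1), "a"), ((2, 2, 3, 3), "b")])
right = Node(leaf=True, entries=[((10, 10, 12, 12), "c")])
root = Node(leaf=False, entries=[(left.mbr(), left), (right.mbr(), right)])

print(search(root, (0.5, 0.5, 2.5, 2.5)))
```

The point of the structure is the pruning step: a subtree is visited only when its bounding rectangle overlaps the query, so distant objects are never examined.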


Proceedings ArticleDOI
02 Jul 1984
TL;DR: The underlying conception of a rule based approach is given which is suited to the task of a controlled-vocabulary indexing of even large subject fields and first results of retrieval runs with weighted automatic indexing are described.
Abstract: The automatic indexing system AIR/PHYS and its evaluation by means of a retrieval test with 309 requests and 15,000 documents are described. First, the underlying conception of a rule-based approach is given which is suited to the task of a controlled-vocabulary indexing of even large subject fields. Preconditions, performance and results of the retrieval test are described, including first results of retrieval runs with weighted automatic indexing.

44 citations


Proceedings ArticleDOI
02 Jul 1984
TL;DR: Two approaches to indexing archived text documents, word inversion and signature files, are compared using analytical models, and their characteristics are evaluated against office environment requirements.
Abstract: This paper compares two different approaches for indexing archived text documents. The first approach is based on inversion of words in the text, the second on the generation of a signature file representing the text content. A system reflecting the word inversion approach is compared against two systems reflecting the signature scanning approach and using, alternatively, superimposed coding and the concatenation of word signatures. Performances are estimated using analytical models of these systems. Characteristics are evaluated against office environment requirements. The evaluations derive from a model for estimating the statistical parameters of text archives. This work has been partially developed as part of the EEC ESPRIT project on "Mixed-Mode Message Filing System" in the Office Systems area.

35 citations
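
The signature-file approach with superimposed coding mentioned in the abstract above can be illustrated with a minimal sketch. Everything specific here is an assumption chosen for the example rather than taken from the paper: the 64-bit signature width, three bits per word, SHA-256 as the hash, and the function names.

```python
import hashlib

SIG_BITS = 64        # assumed signature width
BITS_PER_WORD = 3    # assumed number of bit positions set per word


def word_signature(word: str) -> int:
    """Hash a word to a bit pattern with up to BITS_PER_WORD bits set."""
    sig = 0
    for i in range(BITS_PER_WORD):
        digest = hashlib.sha256(f"{i}:{word.lower()}".encode()).digest()
        sig |= 1 << (int.from_bytes(digest[:4], "big") % SIG_BITS)
    return sig


def doc_signature(text: str) -> int:
    """Superimpose (bitwise OR) the signatures of all words in the text."""
    sig = 0
    for word in text.split():
        sig |= word_signature(word)
    return sig


def may_contain(doc_sig: int, word: str) -> bool:
    """False means the word is certainly absent; True means it may be present."""
    ws = word_signature(word)
    return (doc_sig & ws) == ws


sig = doc_signature("signature files for archived office documents")
print(may_contain(sig, "office"))
```

Because several words share one bit pattern, a positive answer can be a false drop and the text itself still has to be checked; a negative answer is always correct. The word-inversion approach trades this scan for an explicit word-to-document index.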



Journal Article
TL;DR: Searches for rare diseases with unique names or for representative examples of common diseases were most readily performed with the use of computer-printed key word in context (KWIC) books, and needs were best served by a computerized search for logical combinations of key words.
Abstract: Computerized indexing and retrieval of medical records is increasingly important; but the use of natural language versus coded languages (SNOP, SNOMED) for this purpose remains controversial. In an effort to develop search strategies for natural language text, the authors examined the anatomic diagnosis reports by computer for 7000 consecutive autopsy subjects spanning a 13-year period at The Johns Hopkins Hospital. There were 923,657 words, 11,642 of them distinct. The authors observed an average of 1052 keystrokes, 28 lines, and 131 words per autopsy report, with an average 4.6 words per line and 7.0 letters per word. The entire text file represented 921 hours of secretarial effort. Words ranged in frequency from 33,959 occurrences of "and" to one occurrence for each of 3398 different words. Searches for rare diseases with unique names or for representative examples of common diseases were most readily performed with the use of computer-printed key word in context (KWIC) books. For uncommon diseases designated by commonly used terms (such as "cystic fibrosis"), needs were best served by a computerized search for logical combinations of key words. In an unbalanced word distribution, each conjunction (logical and) search should be performed in ascending order of word frequency; but each alternation (logical inclusive or) search should be performed in descending order of word frequency. Natural language text searches will assume a larger role in medical records analysis as the labor-intensive procedure of translation into a coded language becomes more costly, compared with the computer-intensive procedure of text searching.

14 citations
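
The ordering rule stated in the abstract above (conjunctions in ascending order of word frequency, alternations in descending order) can be sketched over a small inverted index. The sample reports are invented for illustration and are not from the study, the function names are assumptions, and frequency is measured here as the number of reports containing a word rather than total occurrences.

```python
from collections import defaultdict


def build_index(reports: dict) -> dict:
    """Map each lower-cased word to the set of report ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in reports.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search_and(index: dict, words: list) -> set:
    """Intersect postings starting with the rarest word, stopping when empty."""
    ordered = sorted(words, key=lambda w: len(index.get(w, ())))
    result = None
    for word in ordered:
        postings = index.get(word, set())
        result = set(postings) if result is None else result & postings
        if not result:
            break  # no report can satisfy the remaining terms
    return result or set()


def search_or(index: dict, words: list) -> set:
    """Union postings starting with the most frequent word."""
    ordered = sorted(words, key=lambda w: len(index.get(w, ())), reverse=True)
    result = set()
    for word in ordered:
        result |= index.get(word, set())
    return result


# Invented sample text, for illustration only.
reports = {
    1: "cystic fibrosis of pancreas and lung",
    2: "fibrosis of lung",
    3: "cystic lesion of kidney and liver",
}
index = build_index(reports)
print(search_and(index, ["cystic", "fibrosis"]))
```

Starting a conjunction with the rarest word keeps the candidate set small from the first step, so later intersections touch few reports and the search can stop early once the set is empty.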


Patent
15 Aug 1984
TL;DR: An index table and a bearing device are arranged on a base facing each other, a rotary table is fitted between them with capability of indexing, and an indexing shaft is inserted concentrically in the rotary table.
Abstract: PURPOSE: To enable simultaneous or continuous multi-surface machining of a workpiece by coupling an indexing table with an indexing shaft through an indexing mechanism having identical indexing angle in respective corresponding positions. CONSTITUTION: An index table 2 and a bearing device 3 are arranged on a base 1 as facing each other, and a rotary table 4 is fitted between them with capability of indexing, wherein an indexing shaft 5 is inserted in this table 4 concentrically. An indexing motor is mounted on said table 2, and indexing is made by rotating the table 4 through an indexing disc 8 by 90 deg. each time. On this table 4, a plurality of indexing tables 10 for indexing of a workpiece are arranged rotatably at a certain specific spacing. The base end of each table 10 is coupled to the indexing shaft 5 in their respective corresponding positions through the action of an indexing mechanism 11 consisting of bevel gears.

12 citations


Journal ArticleDOI
TL;DR: The hypothesis of the study was that by using a particular non‐Boolean method as a file structuring and searching technique, full‐text indexing is not essential to optimum information retrieval effectiveness.
Abstract: The relative effectiveness of indexing using full-text or less than full-text was tested using a non-Boolean, chaining type of file structure and searching method. Indexing was done using titles, abstracts, full-text, references, and various combinations of these surrogates and then Goffman's indirect method of information retrieval was used to structure and search the file. The database consisted of 733 documents and 38 queries were searched. The hypothesis of the study was that by using a particular non-Boolean method as a file structuring and searching technique, full-text indexing is not essential to optimum information retrieval effectiveness. The outcome of the study was positive.

11 citations


Journal ArticleDOI
TL;DR: A simple indexing system has been developed which is free from any classification system and which is not limited by cumbersome postulates.
Abstract: In 1982 it was decided to computerize the indexing activities of SMIC, the Sorghum and Millets Information Center, located in the Library of ICRISAT (International Crops Research Institute for the Semi-Arid Tropics). The previously used manual indexing procedure was found to be unsuitable for computer manipulation, and it was decided to replace it by some other system which would be computer-manipulative and could represent subject contents of the documents accurately and precisely. A survey of the existing systems was made and none of them was found entirely satisfactory for SMIC. A simple indexing system has been developed which is free from any classification system and which is not limited by cumbersome postulates. In this system, keywords chosen by the indexer (who is expected to have some subject knowledge) are arranged in a meaningful sequence (a logical string). The keywords are connected by punctuation marks depicting various types of associations. The keywords are rotated to provide access through each significant keyword. The system, which is computer-manipulative, can also be used manually.

8 citations
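
The keyword rotation described in the SMIC abstract above can be pictured with a very small sketch. It treats the logical string as a plain list of keywords and produces one index entry per keyword. How the connecting punctuation marks behave under rotation is not stated in the abstract, so they are left out; the function name and the sample keywords are assumptions.

```python
from typing import List


def rotated_entries(keywords: List[str]) -> List[List[str]]:
    """Return one cyclic rotation of the keyword string per keyword,
    so that every keyword appears once in the leading (access) position."""
    return [keywords[i:] + keywords[:i] for i in range(len(keywords))]


# Hypothetical keyword string, for illustration only.
for entry in rotated_entries(["sorghum", "drought", "yield"]):
    print(entry)
```

Each rotation can then be filed under its first keyword, which gives a searcher an entry point through any significant term in the string.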


Journal ArticleDOI
TL;DR: Because of the high cost of controlled indexing of bibliographic information, the American Petroleum Institute's Central Abstracting and Indexing Service (CAIS) has designed an expert-like system to take advantage of the 20 years of experienced indexing using keywords from its Thesaurus.
Abstract: Because of the high cost of controlled indexing of bibliographic information, the American Petroleum Institute's Central Abstracting and Indexing Service (CAIS) has designed an expert-like system to take advantage of the 20 years of experienced indexing using keywords from its Thesaurus. It is expected that successful development of the automated indexing system could lead to a friendly system for the ultimate end-user utilizing controlled keywords. In the development stage, the natural language keywords of the abstracts of papers are compared with the keywords chosen by the indexers and rules for cross references are added to improve the machine indexing. Noise is eliminated by additional rules. The system is designed so that the computer will convert words in an abstract to controlled keywords of the Thesaurus to such an accurate extent that the output need only be edited in order to achieve the quality of manual indexing. It is hoped that a user's query words may then also be converted to controlled ke...

8 citations


Proceedings ArticleDOI
02 Jul 1984
TL;DR: The goals and design decisions for the Utah Retrieval System Architecture (URSA) are described, along with the prototype system's features and limitations and the changes that will be made to produce the production version.
Abstract: The Utah Text Retrieval Project addresses a number of areas in information retrieval, including basic system structure, user interfaces integrating information retrieval with word processing, indexing techniques, and the use of specialized backend processors. Although the work on the development of a high-speed text search engine is generally the best known, probably the most exciting aspect of the project is the message-based architecture, which provides an adaptable testbed for information retrieval techniques. It can support a variety of index and search strategies, while instrumenting their performance so that they can be accurately compared in an identical environment. This paper describes the goals and design decisions for the Utah Retrieval System Architecture (URSA). It discusses the prototype system's features and limitations, and the changes that will be made to produce the production version.

6 citations


Journal ArticleDOI
TL;DR: It is concluded that storing the index in the main memory when operating on the file is feasible for small to medium-sized, and sometimes even large files.
Abstract: A hash structure, Overflow Indexing (OVI), using an index for the overflows is presented. The index contains one entry (key, bucket number) for each overflow. Formulas for computing the expected number of entries in the index and the standard deviation are derived and the numerical results obtained using these formulae are presented in a graph. It is concluded that storing the index in the main memory when operating on the file is feasible for small to medium-sized, and sometimes even large files. The number of probes for both a successful and unsuccessful search is one. Deletion requires two probes and insertion two or three probes. Details of OVI are presented and illustrated by simulation experiments. The structure of the index is discussed and one possible structure, hashing with dynamic buckets, is presented.
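
The one-probe lookup claimed in the Overflow Indexing abstract above can be illustrated with a toy model. This is a sketch under stated assumptions, not the paper's structure: the class name, the modulo hash, the bucket capacity, and in particular the way an overflow record is placed (first other bucket with free space, with that scan not counted as probes) are all invented for the example. Only the idea of an in-memory index holding one (key, bucket number) entry per overflow comes from the abstract.

```python
class OverflowIndexedFile:
    """Toy hash file: fixed-capacity buckets plus an in-memory overflow index."""

    def __init__(self, n_buckets: int, capacity: int):
        self.buckets = [[] for _ in range(n_buckets)]
        self.capacity = capacity
        self.overflow_index = {}   # key -> bucket number, one entry per overflow
        self.probes = 0            # counts bucket reads made through _read

    def _home(self, key: int) -> int:
        return key % len(self.buckets)   # assumed hash function

    def _read(self, bucket_no: int) -> list:
        self.probes += 1
        return self.buckets[bucket_no]

    def insert(self, key: int, value) -> None:
        """Insert a new key (duplicate keys are not handled in this sketch)."""
        home = self._home(key)
        bucket = self._read(home)
        if len(bucket) < self.capacity:
            bucket.append((key, value))
            return
        # Placeholder overflow placement: first other bucket with room.
        for b, other in enumerate(self.buckets):
            if b != home and len(other) < self.capacity:
                other.append((key, value))
                self.overflow_index[key] = b
                return
        raise RuntimeError("file is full")

    def lookup(self, key: int):
        """Consult the in-memory index, then read exactly one bucket."""
        bucket_no = self.overflow_index.get(key, self._home(key))
        for k, v in self._read(bucket_no):
            if k == key:
                return v
        return None


f = OverflowIndexedFile(n_buckets=4, capacity=1)
for key, value in [(0, "a"), (4, "b"), (8, "c")]:   # all hash to bucket 0
    f.insert(key, value)
f.probes = 0
print(f.lookup(8), f.probes)
```

Since the index is held in main memory, checking it costs no bucket access, which is why both successful and unsuccessful lookups in this model read a single bucket.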

Journal ArticleDOI
TL;DR: Describes a semi-automatic method for indexing texts such as patents or abstracts of technical articles in English or Japanese.
Abstract: Describes a semi-automatic method for indexing texts such as patents or abstracts of technical articles in English or Japanese. The sentences of the texts are parsed grammatically, reduced, and normalized until only the specific terms remain.

Journal ArticleDOI
TL;DR: The aim of the paper is to demonstrate the efficiency of this semantic analyser method in the field of automatic indexing by comparing the results obtained by means of this method with some traditional methods and with the results of indexing done by human indexers.
Abstract: The article deals with the preparation of query description using a semantic analyser method based on the analysis of semantic structure of documents. The aim of the paper is to demonstrate the efficiency of this method in the field of automatic indexing. The results obtained by means of this method are compared with the results of automatic indexing performed by some traditional methods and with the results of indexing done by human indexers.

Proceedings ArticleDOI
02 Jul 1984
TL;DR: A linguistically motivated approach to indexing, that is the provision of descriptive terms for texts of any kind, is presented and illustrated to achieve good indexing by identifying index term sources in the meaning representations built by a powerful general purpose analyser.
Abstract: A linguistically motivated approach to indexing, that is the provision of descriptive terms for texts of any kind, is presented and illustrated. The approach is designed to achieve good, i.e. accurate and flexible, indexing by identifying index term sources in the meaning representations built by a powerful general purpose analyser, and providing a range of text expressions constituting semantic and syntactic variants for each term concept. Indexing is seen as a legitimate form of shallow text processing, but one requiring serious semantically based language processing, particularly to obtain well-founded complex terms, which is the main objective of the project described. The type of indexing strategy described is further seen as having utility in a range of applications environments.

Journal ArticleDOI
I. Wormell
TL;DR: Some characteristics and features of natural language representation of documents, followed by remarks on the ‘best-match’ principle and ‘indexing consistency’ are presented.




Journal ArticleDOI
01 Mar 1984-Calcolo
TL;DR: This paper presents a methodology for selecting the terms to be included in a technical vocabulary, based on a new method for identifying a meaningful interval on Zipf's frequency distribution of terms.
Abstract: Selection of a technical vocabulary, thesaurus construction and document indexing are critical phases in the organization of Information Retrieval Systems.


Proceedings ArticleDOI
01 Jan 1984
TL;DR: A linguistically motivated approach to indexing, that is the provision of descriptive terms for texts of any kind, is presented and illustrated in this paper, where the approach is designed to achieve good, i.e. accurate and flexible, indexing by identifying index term sources in the meaning representations built by a powerful general purpose analyser, and providing a range of text expressions constituting semantic and syntactic variants for each term concept.
Abstract: A linguistically motivated approach to indexing, that is the provision of descriptive terms for texts of any kind, is presented and illustrated. The approach is designed to achieve good, i.e. accurate and flexible, indexing by identifying index term sources in the meaning representations built by a powerful general purpose analyser, and providing a range of text expressions constituting semantic and syntactic variants for each term concept. Indexing is seen as a legitimate form of shallow text processing, but one requiring serious semantically based language processing, particularly to obtain well-founded complex terms, which is the main objective of the project described. The type of indexing strategy described is further seen as having utility in a range of applications environments.

Journal ArticleDOI
TL;DR: It is shown that the index resident in main memory can be maintained in a TRIE structure [Knuth], the main feature of which is compactness, and an analysis of this structure's average size is proposed.

Journal ArticleDOI
TL;DR: To compare these two indexing treatments, the investigator solicited readers from among mathematics faculty and graduate students to suggest queries to which the documents would be satisfactory responses, and then compared search terms with author and professional indexing terms.
Abstract: The American Mathematical Society (AMS) has required since 1970 that authors of articles published in AMS journals submit indexing terms with their manuscripts. This study examined 159 documents published by the AMS in 1975. Each contained author indexing terms, that is, class numbers selected by authors from the AMS (MOS) Subject Classification Scheme (1970). Most documents also received indexing terms provided by Mathematical Reviews indexers. To compare these two indexing treatments, the investigator solicited readers from among mathematics faculty and graduate students to suggest queries to which the documents would be satisfactory responses. The investigator transformed the queries into search statements composed of AMS class numbers, and then compared search terms with author and professional indexing terms. A document was retrieved (and recall considered successful) if at least one search term matched at least one indexing term, under a given indexing treatment. The major hypothesis of the study was that auth...
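
The retrieval criterion in the abstract above (a document counts as retrieved when at least one search term matches at least one indexing term) amounts to a set-overlap test. The sketch below applies it to two indexing treatments. The function names are assumptions, and the class numbers and document ids are invented placeholders, not data from the study.

```python
def retrieved(search_terms, indexing_terms) -> bool:
    """True if at least one search term equals at least one indexing term."""
    return not set(search_terms).isdisjoint(indexing_terms)


def recall(queries, indexing) -> float:
    """Fraction of (document, search terms) pairs whose document is retrieved.

    queries: list of (doc_id, search_terms) pairs; each document is assumed to
             be a satisfactory response to the query paired with it.
    indexing: dict mapping doc_id to the indexing terms under one treatment.
    """
    if not queries:
        return 0.0
    hits = sum(retrieved(terms, indexing.get(doc_id, ())) for doc_id, terms in queries)
    return hits / len(queries)


# Invented placeholder data, for illustration only.
author_terms = {"d1": {"05C10"}, "d2": {"11A41"}}
professional_terms = {"d1": {"05C10", "05C15"}, "d2": {"11N05"}}
queries = [("d1", {"05C15"}), ("d2", {"11N05", "11A41"})]

print(recall(queries, author_terms), recall(queries, professional_terms))
```

Running the same queries against each treatment's terms gives directly comparable recall figures, which is the kind of comparison the study describes.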





DOI
01 Jan 1984
TL;DR: A computer aided indexing system has been developed at Saarbrücken University to improve the natural language oriented access to textual data ("free text") applying linguistic strategies to information retrieval processes.
Abstract: On the other hand, the development of large textual databases within different fields (e.g. law, patent specifications, medicine) is increasing rapidly. Therefore, a computer aided indexing system ("Computergestützte Texterschließung: CTX") has been developed at Saarbrücken University to improve the natural language oriented access to textual data ("free text") applying linguistic strategies to information retrieval processes.