Journal Article

The Balanced Tree and Its Utilization in Information Retrieval

01 Dec 1963-IEEE Transactions on Electronic Computers (IEEE)-Vol. 12, Iss: 6, pp 863-871
TL;DR: The balanced tree is a memory organization scheme that translates descriptors into memory locations, allowing fast information retrieval with a limited amount of serialized scanning.
Abstract: To translate descriptors into memory locations, a memory organization scheme called the balanced tree is introduced. The descriptors that describe the information to be stored or retrieved constitute quasi-inputs to the tree, while the outputs are lists on which the information identified by the descriptors is stored. The balanced tree thus provides a stratagem to effect a fast information retrieval with a limited amount of serialized scanning. The algorithms for storing in and retrieving from the balanced tree are outlined. While in a randomly growing tree the shape of the tree depends on the order of the input, the balanced tree is independent of this order. The expected number of rearrangement steps to keep the tree balanced was derived from combinatorial considerations. Numerical results were obtained by machine computations and are presented in this paper.
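
The abstract's remark that the shape of a randomly growing tree depends on the order of the input can be illustrated with a minimal sketch. It uses an ordinary unbalanced binary search tree as a stand-in, which is an assumption made for illustration and not the balanced tree scheme of the paper; the names `Node`, `insert`, `height`, and `build` are hypothetical.

```python
class Node:
    """One node of a plain, unbalanced binary search tree (illustrative only)."""
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key without any rebalancing and return the root."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                return root
            node = node.left
        elif key > node.key:
            if node.right is None:
                node.right = Node(key)
                return root
            node = node.right
        else:
            return root  # duplicate key: tree unchanged


def height(root):
    """Count the levels of the tree with a level-order sweep."""
    levels = 0
    frontier = [root] if root is not None else []
    while frontier:
        levels += 1
        frontier = [child for n in frontier
                    for child in (n.left, n.right) if child is not None]
    return levels


def build(keys):
    """Grow a tree by inserting the keys in the given order."""
    root = None
    for key in keys:
        root = insert(root, key)
    return root


# Same seven keys, two input orders, two very different shapes.
print(height(build([1, 2, 3, 4, 5, 6, 7])))  # 7: degenerates into a chain
print(height(build([4, 2, 6, 1, 3, 5, 7])))  # 3: as shallow as possible
```

A scheme that rearranges the tree on insertion, as the abstract describes, would aim to give the shallow shape regardless of the input order, at the cost of the rearrangement steps the paper counts.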
Citations
Journal Article
TL;DR: The pages of an index for a dynamic random access file are organized in a special data structure, the so-called B-tree; an index of 15,000 (100,000) keys can be maintained with an average of 9 (at least 4) transactions per second on an IBM 360/44 with a 2311 disc.
Abstract: Organization and maintenance of an index for a dynamic random access file is considered. It is assumed that the index must be kept on some pseudo random access backup store like a disc or a drum. The index organization described allows retrieval, insertion, and deletion of keys in time proportional to log_k I, where I is the size of the index and k is a device dependent natural number such that the performance of the scheme becomes near optimal. Storage utilization is at least 50% but generally much higher. The pages of the index are organized in a special data structure, so-called B-trees. The scheme is analyzed, performance bounds are obtained, and a near optimal k is computed. Experiments have been performed with indexes up to 100,000 keys. An index of size 15,000 (100,000) can be maintained with an average of 9 (at least 4) transactions per second on an IBM 360/44 with a 2311 disc.

1,051 citations
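
The log_k I access cost stated in the abstract above can be made concrete with a rough sketch. It assumes a simplified model in which every page fans out to exactly k children, which is not the exact performance bound derived in the paper; the function name `levels_needed` and the fan-out value 100 are hypothetical choices for illustration.

```python
def levels_needed(index_size, fanout):
    """Smallest h such that fanout**h >= index_size, using integers only.

    Simplified model: each page access narrows the search by a factor
    of `fanout`, so h approximates the page accesses per retrieval.
    """
    if index_size < 1 or fanout < 2:
        raise ValueError("index_size must be >= 1 and fanout >= 2")
    levels, capacity = 0, 1
    while capacity < index_size:
        capacity *= fanout
        levels += 1
    return levels


# Index sizes taken from the abstract; the fan-outs are assumed values.
print(levels_needed(100_000, 2))    # 17 levels with binary branching
print(levels_needed(100_000, 100))  # 3 levels with 100-way branching
print(levels_needed(15_000, 100))   # 3 levels with 100-way branching
```

Under this model a larger k sharply reduces the number of page accesses, which is why the abstract treats k as a device dependent parameter to be tuned.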

Proceedings Article
15 Nov 1970
TL;DR: The index organization described allows retrieval, insertion, and deletion of keys in time proportional to log_k I, where I is the size of the index and k is a device dependent natural number such that the performance of the scheme becomes near optimal.
Abstract: Organization and maintenance of an index for a dynamic random access file is considered. It is assumed that the index must be kept on some pseudo random access backup store like a disc or a drum. The index organization described allows retrieval, insertion, and deletion of keys in time proportional to log_k I, where I is the size of the index and k is a device dependent natural number such that the performance of the scheme becomes near optimal. Storage utilization is at least 50% but generally much higher. The pages of the index are organized in a special data structure, so-called B-trees. The scheme is analyzed, performance bounds are obtained, and a near optimal k is computed. Experiments have been performed with indices up to 100,000 keys. An index of size 15,000 (100,000) can be maintained with an average of 9 (at least 4) transactions per second on an IBM 360/44 with a 2311 disc.

531 citations

Book
01 Jan 2002
TL;DR: The index organization described allows retrieval, insertion, and deletion of keys in time proportional to log_k I, where I is the size of the index and k is a device dependent natural number such that the performance of the scheme becomes near optimal.
Abstract: Organization and maintenance of an index for a dynamic random access file is considered. It is assumed that the index must be kept on some pseudo random access backup store like a disc or a drum. The index organization described allows retrieval, insertion, and deletion of keys in time proportional to log_k I, where I is the size of the index and k is a device dependent natural number such that the performance of the scheme becomes near optimal. Storage utilization is at least 50% but generally much higher. The pages of the index are organized in a special data structure, so-called B-trees. The scheme is analyzed, performance bounds are obtained, and a near optimal k is computed. Experiments have been performed with indexes up to 100,000 keys. An index of size 15,000 (100,000) can be maintained with an average of 9 (at least 4) transactions per second on an IBM 360/44 with a 2311 disc.

360 citations

Proceedings Article
Rudolf Bayer
11 Nov 1971
TL;DR: A class of binary trees is described for maintaining ordered sets of data that avoid the storage overhead encountered with B-trees and are suitable for processing in a one-level store.
Abstract: A class of binary trees is described for maintaining ordered sets of data. Random insertions, deletions, and retrievals of keys can be done in time proportional to log N where N is the cardinality of the data-set. Binary B-trees are a modification of B-trees described previously by Bayer and McCreight. They avoid the storage overhead encountered with B-trees and are suitable for processing in a one-level store.

108 citations

Journal Article
TL;DR: A number of important techniques used to retrieve individually identified data records from a computer memory space are discussed, with two notable application areas: compiler construction and data base management.
Abstract: An important functional component of every file organization is the set of search mechanisms which are used to locate individually identified records during the process of either update or retrieval. There have been a large number of computer-related search techniques developed in the past 20 years. The objective of this article is to synthesize many of these techniques into one parametrically describable search mechanism. This is accomplished by separating search techniques into two categories: addressing techniques, and tree searching techniques. Significant literature is surveyed for each category. The new concept of a TRIE-TREE search mechanism is then introduced, and used to combine these categories via a generalized model. The model has proven to be a pedagogic convenience in explaining the relation between alternative search techniques. It has also been used in a practical application as a schema for alternative search strategies which are automatically evaluated during the process of a file organization design.

INTRODUCTION

Two Primary Areas of Application

This paper discusses a number of important techniques which are used to retrieve individually identified data records from a computer memory space. Such techniques find two notable application areas; specifically, the areas of compiler construction and of data base management. When designing a compiler or an assembler, one is necessarily concerned with maintaining a symbol table to describe variables encountered as a source program is processed. The records stored in a symbol table contain such data items as a variable's name, type, storage address, and domain of definition. Once inserted into the table, a record is consulted whenever the same symbolic name is reencountered in the program. Typically, symbol tables are relatively small, containing something on the order of 100 to 1000 entries, each approximately 10 to 50 characters in length. These tables are generally stored and manipulated internally in main memory where access time is measured in terms of nanoseconds. Because of this speed of data access, a designer's primary objective in selecting a symbol table search strategy is to minimize the average number of machine instructions executed in the process of inserting or retrieving a table entry. There are a number of similarities between this problem and the retrieval of records from a commercial data base. As in a symbol table, the records of a data base correspond to descriptions of logical entities (employees, machine parts, airline flights, etc.). Properties of these entities are described by data fields in …

61 citations


Cites background from "The Balanced Tree and Its Utilizati..."

  • ...Landauer [21] (1963) points out that the emphasis in information retrieval systems should be on the efficiency of retrieval operations....


References
Journal Article
TL;DR: This paper introduces an abstract entity, the binary search tree, and exhibits some of its properties, which are relevant to processes occurring in stored program computers, in particular to search processes.
Abstract: This paper introduces an abstract entity, the binary search tree, and exhibits some of its properties. The properties exhibited are relevant to processes occurring in stored program computers, in particular to search processes. The discussion of this relevance is deferred until Section 2. Section 1 constitutes the body of the paper. Section 1.1 consists of some mathematical formulations which arise in a natural way from the somewhat less formal considerations of Section 2.1. The main results are Theorem 1 (Section 1.2) and Theorem 2 (Section 1.3). The initial motivation of the paper was an actual computer programming problem. This problem was the need for a list which could be searched efficiently and also changed efficiently. Section 2.1 contains a description of this problem and explains the relevance to its solution of the results of Section 1. Section 2.2 contains an application to sorting. The reader who is interested in the programming applications of the results but not in their mathematical content can profit by reading Section 2 and making only those few references to Section 1 which he finds necessary.

125 citations

Journal Article

51 citations

Journal Article
TL;DR: This paper analyses the best methods of sorting on a digital computer, considering two main types, “sorting by merging” and “distribution sorting”, and constructs a minimal tree giving the best strategy.
Abstract: This paper analyses the best methods of sorting on a digital computer. Two main types, “sorting by merging” and “distribution sorting” are considered. The strategy to be used is diagrammed by a tree. Optimum strategy is shown to depend on the order already existing in the data. Given this, a minimal tree is constructed giving the best strategy. A relation is shown between distribution sorting and decoding a set of messages, or searching for a particular message on a list. Two criteria for pre-existing order among items are established, and measures of order and disorder are defined. Analogy is shown between a measure of disorder and entropy in statistical mechanics, and between a measure of order and a measure of information.

40 citations

Journal Article
TL;DR: The method of dictionary construction described in this paper is based on a tree structure and is designed specifically for computers; it avoids both the shifting of words required when they are kept in alphabetical order and the exhaustive search required when they are kept in order of occurrence.
Abstract: Programs for constructing dictionaries of texts, with computers, have sometimes been adaptations of methods suitable for manual construction with card indexes. With all card index methods it is customary to keep the different words collected in alphabetical order, for the structure of a card index lends itself to such a process: all that is necessary to insert a new word into the index between two existing ones is to make out a new card and to put it in the correct place. However, the insertion of a new word in the store of a computer where the words are kept in alphabetical order is a time-consuming process, for all the words below the one which is inserted have to be "moved down" by one place. If, however, a computer method is used where the words are not stored in alphabetical order but in the order in which they occur, the position is even worse, for although the shifting of words is eliminated, the dictionary search which is necessary for each word in the text, to establish whether it is a new word or not, must involve all the words collected so far, and not just a small number of them, as would be the case if the words were in alphabetical order, and a logarithmic search (Booth, 1955) could be used. The method of construction described in this paper overcomes both these problems. It is not an adaptation of a manual method, but is designed specifically for computers. The method is based on a tree structure, which is discussed below.

23 citations
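
The trade-off described in the abstract above (no shifting of stored words, yet no exhaustive search for each new word) can be sketched as follows. The abstract says only that the method is based on a tree structure, so this sketch assumes an ordinary binary search tree held in nested dictionaries; the names `build_dictionary` and `in_order` and the sample text are hypothetical.

```python
def build_dictionary(words):
    """Build a binary search tree of distinct words with occurrence counts.

    Each node is a dict; a new word is attached as a leaf, so no stored
    word is ever moved, and each lookup follows a single path from the root.
    """
    root = None
    for word in words:
        if root is None:
            root = {"word": word, "count": 1, "left": None, "right": None}
            continue
        node = root
        while True:
            if word == node["word"]:
                node["count"] += 1  # word already collected
                break
            side = "left" if word < node["word"] else "right"
            if node[side] is None:
                node[side] = {"word": word, "count": 1,
                              "left": None, "right": None}
                break
            node = node[side]
    return root


def in_order(root):
    """Yield (word, count) pairs in alphabetical order, without recursion."""
    stack, node = [], root
    while stack or node is not None:
        while node is not None:
            stack.append(node)
            node = node["left"]
        node = stack.pop()
        yield node["word"], node["count"]
        node = node["right"]


text = "the cat saw the dog and the cat ran".split()
print(list(in_order(build_dictionary(text))))
# [('and', 1), ('cat', 2), ('dog', 1), ('ran', 1), ('saw', 1), ('the', 3)]
```

The words are stored in order of occurrence, yet an in-order traversal still produces the alphabetical listing that a card index would keep.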

Proceedings Article
01 Jan 1962

14 citations