Journal Article

Evaluation and selection of file organization—a model and system

01 Sep 1973, Communications of the ACM (ACM), Vol. 16, Iss. 9, pp. 540-548
TL;DR: A methodology, a model and a programmed system are presented to estimate primarily total storage costs and average access time of several file organizations, given a specific data base, query characterization and device-related specifications.
Abstract: This work first discusses the factors that affect file (data base) organization performance, an elusive subject, and then presents a methodology, a model and a programmed system to estimate primarily total storage costs and average access time of several file organizations, given a specific data base, query characterization and device-related specifications. Based on these estimates, an appropriate file structure may be selected for the specific situation. The system is a convenient tool to study file structures and to facilitate as much as possible the process of data base structure design and evaluation.
Citations
Journal Article
Alfonso F. Cardenas
TL;DR: The need to envision and architecture data base systems in a hierarchical level by level framework is stressed, and formulations presented are necessary to be used in conjunction with any index selection criteria to determine the optimum set of index keys.
Abstract: The need to envision and architecture data base systems in a hierarchical level by level framework is stressed. The inverted data base (file) organization is then analyzed, considering implementation oriented aspects. The inverted directory is viewed realistically as another large data base which itself is subjected to inversion. Formulations are derived to estimate average access time (read only) and storage requirements, formalizing the interaction of data base content characteristics, logical complexity of queries, and machine timing and blocking specifications identified as having a first-order effect on performance. The formulations presented are necessary to be used in conjunction with any index selection criteria to determine the optimum set of index keys.

332 citations


Cites background or methods or result from "Evaluation and selection of file or..."

  • ...where [10] that under certain data base contents and query load, inversion will perform better, whereas under other situations double-chaining stands out....

    [...]

  • ...The model presented forms part of the overall model and system for evaluating file organizations reported elsewhere [10]....

    [...]

  • ...Accessing volumes of more than 10 to 20 percent of the total number of records are often cited as justifying sequential search of the whole data base - no indexing [10]....

    [...]

  • ...The potential effect of actual implementation characteristics has been stressed and illustrated elsewhere [10]....

    [...]

  • ...devices and estimated average processing times, to be consistent with other related work reported [10, 11]....

    [...]

Journal Article
S. B. Yao
TL;DR: The model provides a general design framework in which the distinguishing properties of database organizations are made explicit and their performances can be compared.
Abstract: A generalized model for physical database organizations is presented. Existing database organizations are shown to fit easily into the model as special cases. Generalized access algorithms and cost equations associated with the model are developed and analyzed. The model provides a general design framework in which the distinguishing properties of database organizations are made explicit and their performances can be compared.

108 citations


Cites methods from "Evaluation and selection of file or..."

  • ...Under Cardenas’s assumption, for an inverted database organization, we have E(Mc, N, , p) = E(Mb , N, , p) = 1 (all attribute-names are stored in one block); E( M, , N, , q) = E(Mb , N, , 9) = 1 (keywords of each attribute-name are stored in one block); C = 1 on level r = 3 (each accession list requires no more than one cylinder); C = B = 1 on level n = 4 (each record requires no more than one block)....

    [...]

  • ...A simulation system for the selection of inverted, multilist, and doubly chained tree database organizations was reported by Cardenas [2]....

    [...]

  • ...Previous models for multiattribute database organizations include the FOREM simulation model [10], the analytic models developed by Lowe [8] and Martin [9], and the simulation models reported by Cardenas [2] and Siler [13]. In each case, a given structure is analyzed and an analytic or simulation evaluation is performed....

    [...]

  • ...( 15) to the access time equations of Cardenas [3, eqs....

    [...]

  • ...(15)) Cardenas used G(Xa , &‘)TT , which is inaccurate since not every block accessed requires a random cylinder seek....

    [...]

Journal Article
TL;DR: Over a dozen relational database systems have been implemented since E. F. Codd introduced the relational model of data in a series of pioneering papers between 1970 and 1971.
Abstract: Over a dozen relational database systems have been implemented since E. F. Codd introduced the relational model of data in a series of pioneering papers between 1970 and 1971 [CODD70, CODD71a, CODD71b, CODD71c]. A number of prototype systems (such as MIT's MADAM, GMR's RDMS, IBM's SEQUEL) were implemented primarily to demonstrate the feasibility of supporting high-level, nonprocedural data languages based on the relational algebra or the relational calculus. At about the same time, a number of other prototype systems (such as IBM's RM/XRM, GAMMA-0, and University of Toronto's ZETA/MINIZ) were developed for use as low-level database access and storage subsystems for implementing high-level, nonprocedural, relational data languages. More recently, efforts have been directed toward implementing more comprehensive systems

81 citations

Journal Article
TL;DR: A new method for multiple attribute indexing, the Multidimensional B-tree (MDBT), is developed. It is well suited for dynamic databases, since it handles several types of associative queries efficiently and requires low-cost maintenance.

78 citations

Proceedings Article
27 May 2018
TL;DR: The Data Calculator can assist data structure designers and researchers by accurately answering rich what-if design questions on the order of a few seconds or minutes, and synthesize entirely new designs, auto-complete partial designs, and detect suboptimal design choices.
Abstract: Data structures are critical in any data-driven scenario, but they are notoriously hard to design due to a massive design space and the dependence of performance on workload and hardware which evolve continuously. We present a design engine, the Data Calculator, which enables interactive and semi-automated design of data structures. It brings two innovations. First, it offers a set of fine-grained design primitives that capture the first principles of data layout design: how data structure nodes lay data out, and how they are positioned relative to each other. This allows for a structured description of the universe of possible data structure designs that can be synthesized as combinations of those primitives. The second innovation is computation of performance using learned cost models. These models are trained on diverse hardware and data profiles and capture the cost properties of fundamental data access primitives (e.g., random access). With these models, we synthesize the performance cost of complex operations on arbitrary data structure designs without having to: 1) implement the data structure, 2) run the workload, or even 3) access the target hardware. We demonstrate that the Data Calculator can assist data structure designers and researchers by accurately answering rich what-if design questions on the order of a few seconds or minutes, i.e., computing how the performance (response time) of a given data structure design is impacted by variations in the: 1) design, 2) hardware, 3) data, and 4) query workloads. This makes it effortless to test numerous designs and ideas before embarking on lengthy implementation, deployment, and hardware acquisition steps. We also demonstrate that the Data Calculator can synthesize entirely new designs, auto-complete partial designs, and detect suboptimal design choices.

70 citations

References
Journal Article
TL;DR: A file organized into a tree-like structure is discussed, and it is shown that such a file may both be searched and altered with times proportional to $s \log_s N$, where N is the number of file items and s is a parameter of the tree.
Abstract: In data processing problems, files are frequently used which must both be searched and altered. Binary search techniques are efficient for searching large files, but the associated file organization is not readily adapted to the file alterations. Conversely, a chained file allocation permits efficient alteration but cannot be searched efficiently. A file organized into a tree-like structure is discussed, and it is shown that such a file may both be searched and altered with times proportional to $s \log_s N$, where N is the number of file items and s is a parameter of the tree. It is also shown that optimizing the value of s leads to a search time which is only 25 per cent slower than the binary search. The tree organization employs two data chains and may be considered to be a compromise between the organizations for the binary search and the chained file. The relation of the tree organization to multidimensional indexing and to the trie structure is also discussed.

118 citations
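
The abstract above gives only the proportionality $s \log_s N$ and the 25 per cent figure. As a rough check, the sketch below assumes a cost model that the abstract does not state: each of the $\log_s N$ tree levels is a chain of s entries scanned sequentially, costing (s + 1) / 2 comparisons on average. The function names `tree_cost`, `binary_cost`, and `optimal_s` are hypothetical and chosen for illustration only.

```python
import math


def tree_cost(s: float, n: int) -> float:
    """Expected comparisons under the ASSUMED model:
    (s + 1) / 2 comparisons per level, log_s(n) levels."""
    return (s + 1) / 2 * math.log(n) / math.log(s)


def binary_cost(n: int) -> float:
    """Comparisons for a binary search over n items (log base 2)."""
    return math.log2(n)


def optimal_s(lo: float = 2.0, hi: float = 10.0, tol: float = 1e-9) -> float:
    """Minimise (s + 1) / ln(s) by bisection.

    Setting the derivative to zero gives ln(s) = 1 + 1/s, so we look for
    the root of g(s) = ln(s) - 1 - 1/s, which is increasing on [lo, hi].
    """
    def g(s: float) -> float:
        return math.log(s) - 1 - 1 / s

    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


n = 1_000_000  # illustrative file size; the ratio below does not depend on it
s_star = optimal_s()
ratio = tree_cost(s_star, n) / binary_cost(n)
print(f"optimal s = {s_star:.3f}, tree/binary cost ratio = {ratio:.3f}")
```

Under this assumed model the optimum falls near s = 3.59 and the tree search costs roughly 1.24 times a binary search, which is consistent with the "25 per cent slower" statement. A different per-level cost assumption would move both numbers.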

Journal Article
George G. Dodd
TL;DR: A description of the basic types of data management techniques, as well as the relation of each to the hardware on which it is used, and how these basic elements can be used as building blocks to describe and build more complex data management systems.
Abstract: Many different data management techniques have been designed, described in the literature, and marketed. With each technique there are claims of added flexibility and speed. However, because new terms are invented to describe each new technique, the observer is left in a state of confusion when he tries to understand how they work. A description is given of the basic types of data management techniques, as well as the relation of each to the hardware on which it is used. Then it is shown how these basic elements can be used as building blocks to describe and build more complex data management systems. Finally, there is a discussion of the languages used for programming data management systems.

104 citations

Book
01 Jan 1969

90 citations

Journal Article
Vincent Y. Lum
TL;DR: A file organization scheme designed to replace the use of the popular secondary index filing scheme (or inverted files on secondary key fields) is described, using redundancy and storing keys that satisfy different combinations of secondary index values in “buckets,” which has the following advantages.
Abstract: In this paper a file organization scheme designed to replace the use of the popular secondary index filing scheme (or inverted files on secondary key fields) is described. Through the use of redundancy and storing keys (or access numbers of the records) that satisfy different combinations of secondary index values in “buckets,” it is possible to retrieve all keys satisfying any input query derived from a subset of fields by a single access to an index file, although each bucket may be used for many combinations of values and a combination of buckets may be required for a given query. The method, which in its degenerate case becomes the conventional secondary index filing scheme, works similarly but has the following advantages: (1) the elimination of multiple accesses in many cases; (2) the elimination of false drops; (3) the elimination of computer time to perform intersection of key sets each qualified for one secondary index field only; and (4) the avoidance of long strings of keys when an index field appearing in a query has very few possible values. Redundancy, in some cases, is the same as in the secondary indexing method. In the general case, a trade-off exists between the number of accesses per query and redundancy.

86 citations