Evaluation and selection of file organization—a model and system

doi:10.1145/362342.362352

Journal Article•DOI•

Alfonso F. Cardenas¹•Institutions (1)

University of California, Los Angeles¹

01 Sep 1973-Communications of The ACM (ACM)-Vol. 16, Iss: 9, pp 540-548

TL;DR: A methodology, a model and a programmed system are presented to estimate primarily total storage costs and average access time of several file organizations, given a specific data base, query characterization and device-related specifications.

Abstract: This work first discusses the factors that affect file (data base) organization performance, an elusive subject, and then presents a methodology, a model and a programmed system to estimate primarily total storage costs and average access time of several file organizations, given a specific data base, query characterization and device-related specifications. Based on these estimates, an appropriate file structure may be selected for the specific situation. The system is a convenient tool to study file structures and to facilitate as much as possible the process of data base structure design and evaluation.

Citations

Journal Article•DOI•

Analysis and performance of inverted data base structures

Alfonso F. Cardenas¹•Institutions (1)

IBM¹

01 May 1975-Communications of The ACM

TL;DR: The need to envision and architecture data base systems in a hierarchical level by level framework is stressed, and formulations presented are necessary to be used in conjunction with any index selection criteria to determine the optimum set of index keys.

Abstract: The need to envision and architecture data base systems in a hierarchical level by level framework is stressed. The inverted data base (file) organization is then analyzed, considering implementation oriented aspects. The inverted directory is viewed realistically as another large data base which itself is subjected to inversion. Formulations are derived to estimate average access time (read only) and storage requirements, formalizing the interaction of data base content characteristics, logical complexity of queries, and machine timing and blocking specifications identified as having a first-order effect on performance. The formulations presented are necessary to be used in conjunction with any index selection criteria to determine the optimum set of index keys.

332 citations

Cites background or methods or result from "Evaluation and selection of file or..."

...where [10] that under certain data base contents and query load, inversion will perform better, whereas under other situations double-chaining stands out....
[...]
...The model presented forms part of the overall model and system for evaluating file organizations reported elsewhere [10]....
[...]
...Accessing volumes of more than 10 to 20 percent of the total number of records are often cited as justifying sequential search of the whole data base - no indexing [10]....
[...]
...The potential effect of actual implementation characteristics has been stressed and illustrated elsewhere [10]....
[...]
...devices and estimated average processing times, to be consistent with other related work reported [10, 11]....
[...]
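The 10 to 20 percent rule of thumb quoted above can be illustrated with a back-of-the-envelope cost comparison. This is a minimal sketch under simplifying assumptions that are not taken from the cited papers: a full sequential scan costs one sequential block read per block, an indexed retrieval costs one random access per qualifying record, and index lookup cost is ignored. The function name and all timing values are hypothetical.

```python
def break_even_fraction(t_seq_block_ms: float, t_random_ms: float,
                        records_per_block: int) -> float:
    """Fraction of records above which a full sequential scan is cheaper.

    Assumed model (illustrative only):
      scan cost    = (N / records_per_block) * t_seq_block_ms
      indexed cost = f * N * t_random_ms
    Setting the two equal and solving for f gives the break-even fraction.
    """
    return t_seq_block_ms / (records_per_block * t_random_ms)


if __name__ == "__main__":
    # Made-up device numbers: 25 ms per sequential block, 50 ms per random
    # access, 5 records per block.
    f = break_even_fraction(25.0, 50.0, 5)
    print(f"sequential scan wins above {f:.0%} of the records")
```

With these made-up numbers the break-even point is 10 percent of the records; different blocking factors or device timings move it, which is consistent with the excerpts' emphasis on implementation characteristics.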

Journal Article•DOI•

An attribute based model for database access cost analysis

S. B. Yao¹•Institutions (1)

Purdue University¹

01 Mar 1977-ACM Transactions on Database Systems

TL;DR: The model provides a general design framework in which the distinguishing properties of database organizations are made explicit and their performances can be compared.

Abstract: A generalized model for physical database organizations is presented. Existing database organizations are shown to fit easily into the model as special cases. Generalized access algorithms and cost equations associated with the model are developed and analyzed. The model provides a general design framework in which the distinguishing properties of database organizations are made explicit and their performances can be compared.

108 citations

Cites methods from "Evaluation and selection of file or..."

...Under Cardenas’s assumption, for an inverted database organization, we have E(Mc, N, , p) = E(Mb , N, , p) = 1 (all attribute-names are stored in one block); E( M, , N, , q) = E(Mb , N, , 9) = 1 (keywords of each attribute-name are stored in one block); C = 1 on level r = 3 (each accession list requires no more than one cylinder); C = B = 1 on level n = 4 (each record requires no more than one block)....
[...]
...A simulation system for the selection of inverted, multilist, and doubly chained tree database organizations was reported by Cardenas [2]....
[...]
...Previous models for multiattribute database organizations include the FOREM simulation model [10], the analytic models developed by Lowe [8] and Martin [9], and the simulation models reported by Cardenas [2] and Siler [13]. In each case, a given structure is analyzed and an analytic or simulation evaluation is performed....
[...]
...( 15) to the access time equations of Cardenas [3, eqs....
[...]
...(15)) Cardenas used G(Xa , &‘)TT , which is inaccurate since not every block accessed requires a random cylinder seek....
[...]

Journal Article•DOI•

Relational Database Systems

Won Kim¹•Institutions (1)

University of Illinois at Urbana–Champaign¹

01 Sep 1979-ACM Computing Surveys

TL;DR: Over a dozen relational database systems have been implemented since E. F. Codd introduced the relational model of data in a series of pioneering papers between 1970 and 1971.

Abstract: Over a dozen relational database systems have been implemented since E. F. Codd introduced the relational model of data in a series of pioneering papers between 1970 and 1971 [CODD70, CODD71a, CODD71b, CODD71c]. A number of prototype systems (such as MIT's MADAM, GMR's RDMS, IBM's SEQUEL) were implemented primarily to demonstrate the feasibility of supporting high-level, nonprocedural data languages based on the relational algebra or the relational calculus. At about the same time, a number of other prototype systems (such as IBM's RM/XRM, GAMMA-0, and University of Toronto's ZETA/MINIZ) were developed for use as low-level, database access and storage subsystems for implementing high-level, nonprocedural, relational data languages. More recently, efforts have been directed toward implementing more comprehensive systems

81 citations

Journal Article•DOI•

Multidimensional B-trees for associative searching in database systems

Peter Scheuermann¹, Mohamed Ouksel¹•Institutions (1)

Northwestern University¹

01 Jan 1982-Information Systems

TL;DR: A new method for multiple attribute indexing, the Multidimensional B-Tree (MBDT), is developed, well suited for dynamic databases, since it handles several types of associative queries efficiently and requires low-cost maintenance.

78 citations

Proceedings Article•DOI•

The Data Calculator: Data Structure Design and Cost Synthesis from First Principles and Learned Cost Models

Stratos Idreos¹, Kostas Zoumpatianos¹, Brian Hentschel¹, Michael S. Kester¹, Demi Guo¹•Institutions (1)

Harvard University¹

27 May 2018

TL;DR: The Data Calculator can assist data structure designers and researchers by accurately answering rich what-if design questions on the order of a few seconds or minutes, and synthesize entirely new designs, auto-complete partial designs, and detect suboptimal design choices.

Abstract: Data structures are critical in any data-driven scenario, but they are notoriously hard to design due to a massive design space and the dependence of performance on workload and hardware which evolve continuously. We present a design engine, the Data Calculator, which enables interactive and semi-automated design of data structures. It brings two innovations. First, it offers a set of fine-grained design primitives that capture the first principles of data layout design: how data structure nodes lay data out, and how they are positioned relative to each other. This allows for a structured description of the universe of possible data structure designs that can be synthesized as combinations of those primitives. The second innovation is computation of performance using learned cost models. These models are trained on diverse hardware and data profiles and capture the cost properties of fundamental data access primitives (e.g., random access). With these models, we synthesize the performance cost of complex operations on arbitrary data structure designs without having to: 1) implement the data structure, 2) run the workload, or even 3) access the target hardware. We demonstrate that the Data Calculator can assist data structure designers and researchers by accurately answering rich what-if design questions on the order of a few seconds or minutes, i.e., computing how the performance (response time) of a given data structure design is impacted by variations in the: 1) design, 2) hardware, 3) data, and 4) query workloads. This makes it effortless to test numerous designs and ideas before embarking on lengthy implementation, deployment, and hardware acquisition steps. We also demonstrate that the Data Calculator can synthesize entirely new designs, auto-complete partial designs, and detect suboptimal design choices.

70 citations

References

Journal Article•DOI•

Use of tree structures for processing files

Edward H. Sussenguth¹•Institutions (1)

Harvard University¹

01 May 1963-Communications of The ACM

TL;DR: A file organized into a tree-like structure is discussed, and it is shown that such a file may both be searched and altered with times proportional to s log_s N, where N is the number of file items and s is a parameter of the tree.

Abstract: In data processing problems, files are frequently used which must both be searched and altered. Binary search techniques are efficient for searching large files, but the associated file organization is not readily adapted to the file alterations. Conversely, a chained file allocation permits efficient alteration but cannot be searched efficiently. A file organized into a tree-like structure is discussed, and it is shown that such a file may both be searched and altered with times proportional to s log_s N, where N is the number of file items and s is a parameter of the tree. It is also shown that optimizing the value of s leads to a search time which is only 25 per cent slower than the binary search. The tree organization employs two data chains and may be considered to be a compromise between the organizations for the binary search and the chained file. The relation of the tree organization to multidimensional indexing and to the trie structure is also discussed.
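The dependence of the s log_s N cost on the tree parameter s can be explored numerically. The sketch below is illustrative only: it takes the stated proportionality at face value, ignores all constant factors, and uses hypothetical function names. Because the constants from the paper are not reproduced here, it does not attempt to recover the 25 per cent figure; it only shows that the expression has a shallow minimum at a small value of s (analytically, s / ln s is minimized at s = e).

```python
import math


def tree_search_cost(s: int, n: int) -> float:
    """Cost proportional to s * log_s(n); constant factors are ignored."""
    return s * math.log(n) / math.log(s)


def best_branching(n: int, candidates=range(2, 17)) -> int:
    """Integer s among the candidates that minimizes s * log_s(n)."""
    return min(candidates, key=lambda s: tree_search_cost(s, n))


if __name__ == "__main__":
    n = 1_000_000  # arbitrary file size chosen for illustration
    for s in (2, 3, 4, 8, 16):
        print(f"s={s:2d}  cost ~ {tree_search_cost(s, n):.1f}")
    print("best integer s:", best_branching(n))
```

Under this simplified expression the best integer value is s = 3, and s = 2 and s = 4 give the same cost, since 4 / ln 4 = 2 / ln 2.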

118 citations

Journal Article•DOI•

Elements of Data Management Systems

George G. Dodd¹•Institutions (1)

General Motors¹

01 Jun 1969-ACM Computing Surveys

TL;DR: A description of the basic types of data management techniques, as well as the relation of each to the hardware on which it is used, and how these basic elements can be used as building blocks to describe and build more complex data management systems.

Abstract: Many different data management techniques have been designed, described in the literature, and marketed. With each technique there are claims of added flexibility and speed. However, because new terms are invented to describe each new technique, the observer is left in a state of confusion when he tries to understand how they work. A description is given of the basic types of data management techniques, as well as the relation of each to the hardware on which it is used. Then it is shown how these basic elements can be used as building blocks to describe and build more complex data management systems. Finally, there is a discussion of the languages used for programming data management systems.

104 citations

Book•

Feature analysis of generalized data base management systems: CODASYL Systems Committee, May 1971

B. K. Bhargava¹, Olin Bray, M. E. Oeppe², D. A. DeSmith³, G. C. Everest⁴, J. P. Fry³, R. R. Hayward⁵, H. F. Herre, S. R. Kimbteton⁶, M. W. Miller, C. L. Moss, E. I. Nusinow, B. K. Plagman, C. Stallings⁷, W. H. Stieger, J. A. Tillinghast⁸, P. M. Whiting-O'Keefe, Gordon C. Everest⁴, James P. Fry³, Mary E. Fuller, Mary K. Hawes, Anthony J. Kay⁹, Henry C. Lefkovits⁹, William C. McGee¹⁰, A. Metaxides¹¹, Ronald M. Olson, Martin J. Rich¹², Richard F. Schubert, Edgar H. Sibley³, William H. Stieger, Alfred H. Vorliaus¹³, Aria E. Weinert, John W. Young•Institutions (13)

University of Pittsburgh¹, Hewlett-Packard², University of Michigan³, University of Minnesota⁴, United States Department of the Navy⁵, National Institute of Standards and Technology⁶, Nortel⁷, Intel⁸, Honeywell⁹, IBM¹⁰, Bell Labs¹¹, Esso¹², Mitre Corporation¹³

01 May 1971

100 citations

Book•

File structures for on-line systems

David Lefkovitz

01 Jan 1969

90 citations

Journal Article•DOI•

Multi-attribute retrieval with combined indexes

Vincent Y. Lum¹•Institutions (1)

IBM¹

01 Nov 1970-Communications of The ACM

TL;DR: A file organization scheme designed to replace the use of the popular secondary index filing scheme (or inverted files on secondary key fields) is described, using redundancy and storing keys that satisfy different combinations of secondary index values in “buckets,” which has the following advantages.

Abstract: In this paper a file organization scheme designed to replace the use of the popular secondary index filing scheme (or inverted files on secondary key fields) is described. Through the use of redundancy and storing keys (or access numbers of the records) that satisfy different combinations of secondary index values in “buckets,” it is possible to retrieve all keys satisfying any input query derived from a subset of fields by a single access to an index file, although each bucket may be used for many combinations of values and a combination of buckets may be required for a given query. The method which, in its degenerate case, becomes the conventional secondary index filing scheme works similarly but has the following advantages: (1) the elimination of multiple accesses in many cases; (2) the elimination of false drops; (3) the elimination of computer time to perform intersection of key sets each qualified for one secondary index field only; and (4) the avoidance of long strings of keys when an index field appearing in a query has very few possible values. Redundancy, in some cases, is the same as the secondary indexing method. In the general case, trade-off between the number of accesses for query and redundancy exists.
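The contrast the abstract draws between per-field secondary indexes and a combined index can be sketched in a few lines. This is a simplified illustration, not the scheme from the paper: it builds one bucket per full combination of field values and only answers queries that specify every indexed field, whereas the paper's method uses redundancy to serve queries on subsets of fields. The sample records, field names, and function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sample data: record key -> secondary field values.
RECORDS = {
    1: {"dept": "sales", "city": "LA"},
    2: {"dept": "sales", "city": "NY"},
    3: {"dept": "eng", "city": "LA"},
    4: {"dept": "sales", "city": "LA"},
}
FIELDS = ("dept", "city")


def build_inverted(records, fields):
    """Conventional secondary indexes: one key list per (field, value)."""
    index = {f: defaultdict(set) for f in fields}
    for key, rec in records.items():
        for f in fields:
            index[f][rec[f]].add(key)
    return index


def build_combined(records, fields):
    """Simplified combined index: one bucket per combination of values."""
    index = defaultdict(set)
    for key, rec in records.items():
        index[tuple(rec[f] for f in fields)].add(key)
    return index


def query_inverted(index, conditions):
    """One index access per field, then an intersection of the key sets."""
    key_sets = [index[f].get(v, set()) for f, v in conditions.items()]
    return set.intersection(*key_sets)


def query_combined(index, fields, conditions):
    """A single bucket lookup; no intersection step is needed."""
    return set(index.get(tuple(conditions[f] for f in fields), set()))


if __name__ == "__main__":
    cond = {"dept": "sales", "city": "LA"}
    print(query_inverted(build_inverted(RECORDS, FIELDS), cond))
    print(query_combined(build_combined(RECORDS, FIELDS), FIELDS, cond))
```

Both queries return the same keys; the combined lookup avoids the per-field accesses and the set intersection, at the price of storing an entry for every value combination that occurs.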

86 citations