Showing papers in "ACM Transactions on Database Systems in 1979"


Journal ArticleDOI
TL;DR: In this paper the relational model is extended to support atomic and molecular semantics; the extensions are a synthesis of many ideas from the published work in semantic modeling, plus new rules for insertion, update, and deletion and new algebraic operators.
Abstract: During the last three or four years several investigators have been exploring “semantic models” for formatted databases. The intent is to capture (in a more or less formal way) more of the meaning of the data so that database design can become more systematic and the database system itself can behave more intelligently. Two major thrusts are clear: (1) the search for meaningful units that are as small as possible—atomic semantics; (2) the search for meaningful units that are larger than the usual n-ary relation—molecular semantics. In this paper we propose extensions to the relational model to support certain atomic and molecular semantics. These extensions represent a synthesis of many ideas from the published work in semantic modeling plus the introduction of new rules for insertion, update, and deletion, as well as new algebraic operators.

1,489 citations


Journal ArticleDOI
TL;DR: A “majority consensus” algorithm which represents a new solution to the update synchronization problem for multiple copy databases is presented and can function effectively in the presence of communication and database site outages.
Abstract: A “majority consensus” algorithm which represents a new solution to the update synchronization problem for multiple copy databases is presented. The algorithm embodies distributed control and can function effectively in the presence of communication and database site outages. The correctness of the algorithm is demonstrated and the cost of using it is analyzed. Several examples that illustrate aspects of the algorithm operation are included in the Appendix.

1,136 citations
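
A minimal Python sketch of the majority-voting idea at the heart of such an algorithm. The data layout, vote rule, and names below are illustrative simplifications, not Thomas's actual protocol:

```python
# Illustrative sketch of majority-consensus voting for a replicated update.
# A site accepts an update if the update was based on data at least as
# fresh as the site's own copy; the update commits on a majority of accepts.

def run_vote(sites, update):
    """Apply `update` only if a majority of sites vote to accept it."""
    accepts = sum(1 for s in sites if update["base_version"] >= s["version"])
    if accepts > len(sites) // 2:          # majority consensus reached
        for s in sites:
            s["value"] = update["value"]
            s["version"] = update["base_version"] + 1
        return "committed"
    return "rejected"

sites = [{"value": 0, "version": 3} for _ in range(5)]
print(run_vote(sites, {"value": 42, "base_version": 3}))  # committed
```

Because any two majorities intersect, two conflicting updates cannot both gather enough accepts, which is what lets the scheme tolerate site and communication outages.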


Journal ArticleDOI
TL;DR: This work studies, by analysis and simulation, the performance of extendible hashing and indicates that it provides an attractive alternative to other access methods, such as balanced trees.
Abstract: Extendible hashing is a new access technique, in which the user is guaranteed no more than two page faults to locate the data associated with a given unique identifier, or key. Unlike conventional hashing, extendible hashing has a dynamic structure that grows and shrinks gracefully as the database grows and shrinks. This approach simultaneously solves the problem of making hash tables that are extendible and of making radix search trees that are balanced. We study, by analysis and simulation, the performance of extendible hashing. The results indicate that extendible hashing provides an attractive alternative to other access methods, such as balanced trees.

709 citations
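
A toy in-memory sketch of the directory-doubling mechanism the abstract describes. Bucket size and structure here are illustrative; a real implementation keeps the directory and buckets on disk pages, which is where the "no more than two page faults" guarantee comes from:

```python
# Toy extendible hash table: a directory of 2**global_depth pointers to
# buckets. A lookup touches the directory plus exactly one bucket.

BUCKET_SIZE = 2

class Bucket:
    def __init__(self, depth):
        self.depth = depth          # local depth
        self.items = {}

class ExtendibleHash:
    def __init__(self):
        self.global_depth = 1
        self.dir = [Bucket(1), Bucket(1)]

    def _index(self, key):
        return hash(key) & ((1 << self.global_depth) - 1)

    def get(self, key):
        return self.dir[self._index(key)].items.get(key)

    def put(self, key, val):
        b = self.dir[self._index(key)]
        if len(b.items) < BUCKET_SIZE or key in b.items:
            b.items[key] = val
            return
        if b.depth == self.global_depth:    # directory must double
            self.dir += self.dir
            self.global_depth += 1
        # split the overflowing bucket and redistribute its items
        b.depth += 1
        new = Bucket(b.depth)
        mask = 1 << (b.depth - 1)
        for i, slot in enumerate(self.dir):
            if slot is b and i & mask:
                self.dir[i] = new
        old_items, b.items = b.items, {}
        for k, v in old_items.items():
            self.put(k, v)
        self.put(key, val)

h = ExtendibleHash()
for k in range(8):
    h.put(k, str(k))
print(h.get(5), h.global_depth)
```

The graceful growth is visible in `put`: only one bucket splits at a time, and the directory doubles only when the splitting bucket's local depth has caught up with the global depth.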


Journal ArticleDOI
TL;DR: The chase operates on tableaux similar to those of Aho, Sagiv, and Ullman, making it possible to test implication of join dependencies and functional dependencies by a set of dependencies.
Abstract: Presented is a computation method—the chase—for testing implication of data dependencies by a set of data dependencies. The chase operates on tableaux similar to those of Aho, Sagiv, and Ullman. The chase includes previous tableau computation methods as special cases. By interpreting tableaux alternately as mappings or as templates for relations, it is possible to test implication of join dependencies (including multivalued dependencies) and functional dependencies by a set of dependencies.

575 citations
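
A minimal sketch of the chase restricted to functional dependencies, with the tableau encoded as rows of symbols (the textbook formulation; the paper also handles join dependencies):

```python
# Minimal chase with functional dependencies on a tableau of rows.
# Applying an FD X -> A equates the A-symbols of any two rows that
# agree on X, repeating until no rule changes the tableau.

def chase_fds(tableau, fds):
    """tableau: list of dicts attr -> symbol; fds: list of (lhs_attrs, rhs_attr)."""
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for r1 in tableau:
                for r2 in tableau:
                    if all(r1[a] == r2[a] for a in lhs) and r1[rhs] != r2[rhs]:
                        # equate the two symbols everywhere (keep the smaller)
                        old, new = max(r1[rhs], r2[rhs]), min(r1[rhs], r2[rhs])
                        for row in tableau:
                            for a, s in row.items():
                                if s == old:
                                    row[a] = new
                        changed = True
    return tableau

t = [{"A": "a", "B": "b1", "C": "c1"},
     {"A": "a", "B": "b2", "C": "c2"}]
print(chase_fds(t, [(["A"], "B")]))   # the two B-symbols are equated
```

Each step strictly reduces the number of distinct symbols, so the computation terminates; implication is then read off the final tableau.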


Journal ArticleDOI
TL;DR: This work emphasizes the distinction between two different interpretations of the query language—the external one, which refers the queries directly to the real world modeled in an incomplete way by the system, and the internal one, under which the queries refer to the system's information about this world, rather than to the world itself.
Abstract: Various approaches to interpreting queries in a database with incomplete information are discussed. A simple model of a database is described, based on attributes which can take values in specified attribute domains. Information incompleteness means that instead of having a single value of an attribute, we have a subset of the attribute domain, which represents our knowledge that the actual value, though unknown, is one of the values in this subset. This extends the idea of Codd's null value, corresponding to the case when this subset is the whole attribute domain. A simple query language to communicate with such a system is described and its various semantics are precisely defined. We emphasize the distinction between two different interpretations of the query language—the external one, which refers the queries directly to the real world modeled in an incomplete way by the system, and the internal one, under which the queries refer to the system's information about this world, rather than to the world itself. Both external and internal interpretations are provided with the corresponding sets of axioms which serve as a basis for equivalent transformations of queries. The technique of equivalent transformations of queries is then extensively exploited for evaluating the interpretation of (i.e. the response to) a query.

529 citations
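
A small sketch of the internal interpretation over set-valued attributes described above (names and data invented): a record surely satisfies a selection if its value set is contained in the target set, and possibly satisfies it if the two sets merely intersect.

```python
# Internal interpretation over incomplete information: each attribute
# value is a *set* of possible domain values (Codd's null = the whole
# domain), and a selection has a "sure" and a "possible" answer.

def surely(records, attr, wanted):
    """Records whose attribute value is certainly in `wanted`."""
    return [r for r in records if r[attr] <= wanted]      # subset

def possibly(records, attr, wanted):
    """Records whose attribute value might be in `wanted`."""
    return [r for r in records if r[attr] & wanted]       # nonempty meet

people = [{"name": {"ann"}, "dept": {"sales"}},
          {"name": {"bob"}, "dept": {"sales", "hr"}}]     # bob's dept unknown
print(surely(people, "dept", {"sales"}))    # only ann qualifies surely
print(possibly(people, "dept", {"sales"}))  # both possibly qualify
```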


Journal ArticleDOI
TL;DR: It is shown that most interesting algorithmic questions about Boyce-Codd normal form and keys are NP-complete and are therefore probably not amenable to fast algorithmic solutions.
Abstract: Problems related to functional dependencies and the algorithmic design of relational schemas are examined. Specifically, the following results are presented: (1) a tree model of derivations of functional dependencies from other functional dependencies; (2) a linear-time algorithm to test if a functional dependency is in the closure of a set of functional dependencies; (3) a quadratic-time implementation of Bernstein's third normal form schema synthesis algorithm. Furthermore, it is shown that most interesting algorithmic questions about Boyce-Codd normal form and keys are NP-complete and are therefore probably not amenable to fast algorithmic solutions.

461 citations
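
The membership test in result (2) is usually phrased as an attribute-closure computation. Here is a naive quadratic Python sketch of that idea; the paper's contribution is a linear-time refinement:

```python
# X -> Y follows from a set F of FDs iff Y is contained in the closure
# of X under F. Naive fixpoint version, quadratic in the size of F.

def closure(attrs, fds):
    """fds: list of (frozenset lhs, frozenset rhs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

F = [(frozenset("A"), frozenset("B")), (frozenset("B"), frozenset("C"))]
print(closure({"A"}, F))                      # {'A', 'B', 'C'}
print(frozenset("C") <= closure({"A"}, F))    # A -> C holds: True
```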


Journal ArticleDOI
TL;DR: An efficient algorithm to determine whether the join of several relations is semantically meaningful (lossless) and an efficient algorithm to determine whether a set of relations has a subset with a lossy join are given.
Abstract: Answering queries in a relational database often requires that the natural join of two or more relations be computed. However, the result of a join may not be what one expects. In this paper we give efficient algorithms to determine whether the join of several relations has the intuitively expected value (is lossless) and to determine whether a set of relations has a subset with a lossy join. These algorithms assume that all data dependencies are functional. We then discuss the extension of our techniques to the case where data dependencies are multivalued.

423 citations
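
A sketch of the classic tableau test for losslessness under functional dependencies, following the standard textbook presentation of this result (names illustrative):

```python
# Lossless-join test: one tableau row per relation schema, with the
# distinguished symbol "a" where the schema contains the attribute.
# Chase with the FDs and check whether some row becomes all-"a".

def lossless(schemas, all_attrs, fds):
    rows = [{A: "a" if A in s else f"b{i}{A}" for A in all_attrs}
            for i, s in enumerate(schemas)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for r1 in rows:
                for r2 in rows:
                    if all(r1[A] == r2[A] for A in lhs) and r1[rhs] != r2[rhs]:
                        new, old = sorted((r1[rhs], r2[rhs]))  # "a" sorts first
                        for row in rows:                       # identify symbols
                            if row[rhs] == old:
                                row[rhs] = new
                        changed = True
    return any(all(r[A] == "a" for A in all_attrs) for r in rows)

# R1(A,B), R2(B,C) with B -> C: the join is lossless.
print(lossless([{"A", "B"}, {"B", "C"}], {"A", "B", "C"}, [({"B"}, "C")]))
```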


Journal ArticleDOI
TL;DR: New hardware is described which allows the rapid execution of queries demanding the joining of physically stored relations, with the main feature of the hardware being a special store which can rapidly remember or recall data.
Abstract: New hardware is described which allows the rapid execution of queries demanding the joining of physically stored relations. The main feature of the hardware is a special store which can rapidly remember or recall data. This data might be pointers from one file to another, in which case the memory helps with queries on joins of files. Alternatively, the memory can help remove redundant data during projection, giving a considerable speed advantage over conventional hardware.

258 citations
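
In software terms, the effect of such a remember/recall store resembles building a hash filter of join keys from one file and probing it while scanning the other. The sketch below is purely illustrative; the paper describes special-purpose hardware, not this code:

```python
# Software analogue of the "remembering" store: record the join keys of
# one file, then recall them to match rows of the other file.

def hash_join(left, right, key):
    remembered = {}                          # the "store remembers" phase
    for row in left:
        remembered.setdefault(row[key], []).append(row)
    out = []
    for row in right:                        # the "store recalls" phase
        for match in remembered.get(row[key], []):
            out.append({**match, **row})
    return out

emp = [{"dept": 1, "name": "ann"}, {"dept": 2, "name": "bob"}]
dep = [{"dept": 1, "title": "sales"}]
print(hash_join(emp, dep, "dept"))
```

The same remembered-set trick drops duplicate rows during projection, which is the other speedup the abstract mentions.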


Journal ArticleDOI
TL;DR: Users may be able to compromise databases by asking a series of questions and then inferring new information from the answers, and the complexity of protecting a database against this technique is discussed here.
Abstract: Users may be able to compromise databases by asking a series of questions and then inferring new information from the answers. The complexity of protecting a database against this technique is discussed here.

234 citations


Journal ArticleDOI
William Kent
TL;DR: Record structures are generally efficient, familiar, and easy to use for most current data processing applications, but they are not complete in their ability to represent information, nor are they fully self-describing.
Abstract: Record structures are generally efficient, familiar, and easy to use for most current data processing applications. But they are not complete in their ability to represent information, nor are they fully self-describing.

212 citations


Journal ArticleDOI
TL;DR: It is shown that the compromise of small query sets can in fact almost always be accomplished with the help of characteristic formulas called trackers, and security is not guaranteed by the lack of a general tracker.
Abstract: The query programs of certain databases report raw statistics for query sets, which are groups of records specified implicitly by a characteristic formula. The raw statistics include query set size and sums of powers of values in the query set. Many users and designers believe that the individual records will remain confidential as long as query programs refuse to report the statistics of query sets which are too small. It is shown that the compromise of small query sets can in fact almost always be accomplished with the help of characteristic formulas called trackers. Schlorer's individual tracker is reviewed; it is derived from known characteristics of a given individual and permits deducing additional characteristics he may have. The general tracker is introduced: It permits calculating statistics for arbitrary query sets, without requiring preknowledge of anything in the database. General trackers always exist if there are enough distinguishable classes of individuals in the database, in which case the trackers have a simple form. Almost all databases have a general tracker, and general trackers are almost always easy to find. Security is not guaranteed by the lack of a general tracker.
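
A toy demonstration of the padding trick behind trackers. The database, minimum query-set size, and predicates below are invented for illustration (and only the too-small restriction is modeled; real query programs typically also refuse overly large sets):

```python
# General-tracker attack: the query program refuses counts for sets
# smaller than MIN, but the forbidden count is recovered from
#   count(C) = count(C or T) + count(C or not T) - N
# where T is any answerable predicate (the tracker) and N = |database|.

MIN = 2
db = [{"sex": "F", "dept": "sales", "disease": "flu"},
      {"sex": "M", "dept": "sales", "disease": "none"},
      {"sex": "F", "dept": "hr",    "disease": "none"},
      {"sex": "M", "dept": "hr",    "disease": "none"}]

def count(pred):
    n = sum(1 for r in db if pred(r))
    if n < MIN:
        raise ValueError("query set too small")
    return n

C = lambda r: r["sex"] == "F" and r["dept"] == "sales"   # forbidden: size 1
T = lambda r: r["dept"] == "sales"                       # tracker: size 2

N = count(T) + count(lambda r: not T(r))
snooped = count(lambda r: C(r) or T(r)) + \
          count(lambda r: C(r) or not T(r)) - N
print(snooped)   # 1, despite the refusal to answer C directly
```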

Journal ArticleDOI
TL;DR: A scheme is presented in which alerters may be placed on a complex query involving a relational database, and a method is demonstrated for reducing the amount of computation involved in checking whether an alerter should be triggered.
Abstract: An alerter is a program which monitors a database and reports to some user or program when a specified condition occurs. It may be that the condition is a complicated expression involving several entities in the database; in this case the evaluation of the expression may be computationally expensive. A scheme is presented in which alerters may be placed on a complex query involving a relational database, and a method is demonstrated for reducing the amount of computation involved in checking whether an alerter should be triggered.
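
One way to read the "reducing the amount of computation" idea: only re-check an alerter when an update touches a relation its condition actually mentions. A hedged sketch with an invented API:

```python
# Sketch: each alerter declares the relations its condition reads, and
# the condition is re-evaluated only on updates to those relations.

class Alerter:
    def __init__(self, relations, condition, action):
        self.relations = set(relations)   # relations the condition reads
        self.condition = condition
        self.action = action

def apply_update(db, relation, row, alerters):
    db[relation].append(row)
    for a in alerters:
        if relation in a.relations and a.condition(db):
            a.action()

db = {"stock": [], "orders": []}
low = Alerter({"stock"}, lambda d: any(r["qty"] < 5 for r in d["stock"]),
              lambda: print("low stock!"))
apply_update(db, "orders", {"id": 1}, [low])   # condition not even checked
apply_update(db, "stock", {"qty": 3}, [low])   # prints "low stock!"
```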

Journal ArticleDOI
TL;DR: This paper discusses the difficulty of optimizing queries based on the relational algebra operations select, project, and join; although the general problem is NP-complete, a polynomial time algorithm exists to optimize tableaux that correspond to an important subclass of queries.
Abstract: The design of several database query languages has been influenced by Codd's relational algebra. This paper discusses the difficulty of optimizing queries based on the relational algebra operations select, project, and join. A matrix, called a tableau, is proposed as a useful device for representing the value of a query, and optimization of queries is couched in terms of finding a minimal tableau equivalent to a given one. Functional dependencies can be used to imply additional equivalences among tableaux. Although the optimization problem is NP-complete, a polynomial time algorithm exists to optimize tableaux that correspond to an important subclass of queries.
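
A brute-force sketch of tableau minimization by symbol mappings. The exponential search below is only to make the definition concrete; the paper's point is that a polynomial algorithm exists for an important subclass:

```python
# A tableau row is redundant if some symbol mapping (fixing the
# distinguished symbols) takes the whole tableau into the remaining rows.

from itertools import product

def homomorphic(src, dst, distinguished):
    syms = sorted({s for row in src for s in row})
    targets = sorted({s for row in dst for s in row})
    for choice in product(targets, repeat=len(syms)):
        m = dict(zip(syms, choice))
        if all(m[d] == d for d in distinguished if d in m) and \
           all(tuple(m[s] for s in row) in dst for row in src):
            return True
    return False

def minimize(tableau, distinguished):
    rows = set(tableau)
    for row in list(rows):
        if homomorphic(rows, rows - {row}, distinguished):
            rows.discard(row)
    return rows

t = {("a1", "b1"), ("a1", "b2")}          # rows over attributes (A, B)
print(minimize(t, {"a1", "b2"}))          # ("a1", "b1") is redundant
```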

Journal ArticleDOI
TL;DR: A model of database storage and access is presented that represents many evaluation algorithms as special cases and helps break a complex algorithm into simple access operations; optimizing the associated cost equations yields an optimal access algorithm which can be synthesized by a query subsystem whose design is based on the modular access operations.
Abstract: A model of database storage and access is presented. The model represents many evaluation algorithms as special cases, and helps to break a complex algorithm into simple access operations. Generalized access cost equations associated with the model are developed and analyzed. Optimization of these cost equations yields an optimal access algorithm which can be synthesized by a query subsystem whose design is based on the modular access operations.

Journal ArticleDOI
TL;DR: This paper considers the design of a system to answer partial-match queries from a file containing a collection of records, each record consisting of a sequence of fields.
Abstract: This paper considers the design of a system to answer partial-match queries from a file containing a collection of records, each record consisting of a sequence of fields. A partial-match query is a specification of values for zero or more fields of a record, and the answer to a query is a listing of all records in the file whose fields match the specified values. A design is considered in which the file is stored in a set of bins. A formula is derived for the optimal number of bits in a bin address to assign to each field, assuming the probability that a given field is specified in a query is independent of what other fields are specified. Implications of the optimality criterion on the size of bins are also discussed.
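
A sketch of the bin-addressing scheme the abstract describes: each field hashes to a fixed number of bits of the bin address, and a query probes every bin consistent with the bits of its specified fields. The per-field bit allocation below is just an example; the paper derives the optimal allocation.

```python
# Partial-match retrieval with hashed bin addresses. Unspecified fields
# force probing all bins that agree on the specified fields' bits.

from itertools import product

BITS = {"name": 2, "city": 1}            # illustrative bit allocation
FIELDS = list(BITS)

def field_bits(field, value):
    return hash((field, value)) & ((1 << BITS[field]) - 1)

def bins_for_query(query):
    """query: dict of specified fields; returns all candidate bin numbers."""
    choices = []
    for f in FIELDS:
        if f in query:
            choices.append([field_bits(f, query[f])])
        else:
            choices.append(range(1 << BITS[f]))      # all possibilities
    result = []
    for combo in product(*choices):
        addr = 0
        for f, bits in zip(FIELDS, combo):
            addr = (addr << BITS[f]) | bits
        result.append(addr)
    return result

print(bins_for_query({"city": "paris"}))   # 4 candidate bins instead of all 8
```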

Journal ArticleDOI
TL;DR: In this paper, an analytic cost expression for processing conjunctive, disjunctive, and batched queries is developed and an effective heuristic for minimizing query processing costs is presented.
Abstract: A transposed file is a collection of nonsequential files called subfiles. Each subfile contains selected attribute data for all records. It is shown that transposed file performance can be enhanced by using a proper strategy to process queries. Analytic cost expressions for processing conjunctive, disjunctive, and batched queries are developed and an effective heuristic for minimizing query processing costs is presented. Formulations of the problem of optimally processing queries for a particular family of transposed files are shown to be NP-complete. Query processing performance comparisons of multilist, inverted, and nonsequential files with transposed files are also considered.
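
A minimal sketch of the transposed (column-wise) layout and why conjunctive queries benefit from it: only the subfiles for attributes actually mentioned in the query are scanned. The data is invented for illustration:

```python
# A transposed file stores one subfile per attribute.

subfiles = {                       # column-wise storage of 4 records
    "age":  [25, 37, 52, 41],
    "dept": ["sales", "hr", "sales", "hr"],
    "pay":  [40, 55, 70, 60],
}

def conjunctive(preds):
    """preds: dict attr -> predicate; returns matching record numbers."""
    n = len(next(iter(subfiles.values())))
    hits = set(range(n))
    for attr, p in preds.items():                 # scan only needed subfiles
        hits &= {i for i, v in enumerate(subfiles[attr]) if p(v)}
    return sorted(hits)

print(conjunctive({"dept": lambda v: v == "sales",
                   "age":  lambda v: v > 30}))    # [2]
```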

Journal ArticleDOI
TL;DR: In this paper, the authors modified the assumptions concerning the placement of the locks on the database with respect to the accessing transactions, and extended the simulation to model a lock hierarchy where large transactions use large locks and small transactions use small locks.
Abstract: Locking granularity refers to the size and hence the number of locks used to ensure the consistency of a database during multiple concurrent updates. In an earlier simulation study we concluded that coarse granularity, such as area or file locking, is to be preferred to fine granularity such as individual page or record locking. However, assumptions other than those used in the original paper can change that conclusion. First, we modified the assumptions concerning the placement of the locks on the database with respect to the accessing transactions. In the original model the locks were assumed to be well placed. Under worst-case and random placement assumptions, when only very small transactions access the database, fine granularity is preferable. Second, we extended the simulation to model a lock hierarchy where large transactions use large locks and small transactions use small locks. In this scenario, again under the random and worst-case lock placement assumptions, fine granularity is preferable if all transactions accessing more than 1 percent of the database use large locks. Finally, the simulation was extended to model a “claim as needed” locking strategy together with the resultant possibility of deadlock. In the original study all locks were claimed in one atomic operation at the beginning of a transaction. The claim as needed strategy does not change the conclusions concerning the desired granularity.

Journal ArticleDOI
TL;DR: An axiomatic method is introduced for specifying data abstractions that gives precise meaning to familiar notions such as data model, data type, and database schema and permits the formulation of interrelationships between arbitrary operations.
Abstract: Data abstractions were originally conceived as a specification tool in programming. They also appear to be useful for exploring and explaining the capabilities and shortcomings of the data definition and manipulation facilities of present-day database systems. Moreover they may lead to new approaches to the design of these facilities. In the first section the paper introduces an axiomatic method for specifying data abstractions and, on that basis, gives precise meaning to familiar notions such as data model, data type, and database schema. In a second step the various possibilities for specifying data types within a given data model are examined and illustrated. It is shown that data types prescribe the individual operations that are allowed within a database. Finally, some additions to the method are discussed which permit the formulation of interrelationships between arbitrary operations.

Journal ArticleDOI
TL;DR: This work shows how compromise can be achieved when the allowable queries are linear, that is, weighted sums of data elements, and it characterizes the maximal initial information permitted of a user in a secure system.
Abstract: A database is compromised if a user can determine the data elements associated with keys which he did not know previously. If it is possible, compromise can be achieved by posing a finite set of queries over sets of data elements and employing initial information to solve the resulting system of equations. Assuming the allowable queries are linear, that is, weighted sums of data elements, we show how compromise can be achieved and we characterize the maximal initial information permitted of a user in a secure system. When compromise is possible, the initial information and the number of queries required to achieve it is surprisingly small.
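
The attack the abstract describes is ordinary linear algebra: exact answers to enough independent weighted-sum queries determine the hidden elements. A two-element toy with invented numbers:

```python
# Two hidden salaries x1, x2 and two allowed "sum" queries:
#   q1 = 1*x1 + 1*x2 = 150
#   q2 = 1*x1 + 2*x2 = 230
# Solving the 2x2 system (Cramer's rule) compromises both elements.

a, b, r1 = 1, 1, 150
c, d, r2 = 1, 2, 230

det = a * d - b * c
x1 = (r1 * d - b * r2) / det
x2 = (a * r2 - r1 * c) / det
print(x1, x2)          # 70.0 80.0 -- both elements compromised
```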

Journal ArticleDOI
TL;DR: Theseus, a very high-level programming language extending EUCLID, includes relations and a-sets, a generalization of records; it is designed to facilitate the writing of well-structured programs for database applications and to serve as a vehicle for research in automatic program optimization.
Abstract: Theseus, a very high-level programming language extending EUCLID, is described. Data objects in Theseus include relations and a-sets, a generalization of records. The primary design goals of Theseus are to facilitate the writing of well-structured programs for database applications and to serve as a vehicle for research in automatic program optimization.

Journal ArticleDOI
TL;DR: This paper investigates several heuristics for reordering attributes, derives bounds on the sizes of the worst tries produced by them in terms of the underlying file, and shows that for most applications, O-tries are smaller than other implementations of tries, even when heuristics for improving storage requirements are employed.
Abstract: A trie is a digital search tree in which leaves correspond to records in a file. Searching proceeds from the root to a leaf, where the edge taken at each node depends on the value of an attribute in the query. Trie implementations have the advantage of being fast, but the disadvantage of achieving that speed at great expense in storage space. Of primary concern in making a trie practical, therefore, is the problem of minimizing storage requirements. One method for reducing the space required is to reorder attribute testing. Unfortunately, the problem of finding an ordering which guarantees a minimum-size trie is NP-complete. In this paper we investigate several heuristics for reordering attributes, and derive bounds on the sizes of the worst tries produced by them in terms of the underlying file. Although the analysis is presented for a binary file, extensions to files of higher degree are shown. Another alternative for reducing the space required by a trie is an implementation, called an O-trie, in which the order of attribute testing is contained in the trie itself. We show that for most applications, O-tries are smaller than other implementations of tries, even when heuristics for improving storage requirements are employed.
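
A small brute-force sketch showing the fact that motivates the reordering heuristics: the size of a trie depends on the order in which attributes are tested. The file contents are invented:

```python
# Count internal nodes of the binary trie that tests attributes in a
# given order, for every ordering of a tiny file of bit-records.

from itertools import permutations

def trie_size(records, order):
    """Number of internal nodes; leaves hold single records."""
    if len(records) <= 1 or not order:
        return 0
    attr, rest = order[0], order[1:]
    zeros = [r for r in records if r[attr] == 0]
    ones = [r for r in records if r[attr] == 1]
    return 1 + trie_size(zeros, rest) + trie_size(ones, rest)

file = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1)]
for order in permutations(range(3)):
    print(order, trie_size(file, order))   # sizes differ across orders
```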

Journal ArticleDOI
TL;DR: The retrieval effectiveness of an automatic method that uses relevance judgments to determine positive as well as negative relationships between terms is evaluated, and the importance attached to the term relationship components relative to that of the term match component is found to have a substantial effect on retrieval performance.
Abstract: The retrieval effectiveness of an automatic method that uses relevance judgments for the determination of positive as well as negative relationships between terms is evaluated. The term relationships are incorporated into the retrieval process by using a generalized similarity function that has a term match component, a positive term relationship component, and a negative term relationship component. Two strategies, query partitioning and query clustering, for the evaluation of the effectiveness of the term relationships are investigated. The latter appears to be more attractive from linguistic as well as economic points of view. The positive and the negative relationships are verified to be effective both when used individually, and in combination. The importance attached to the term relationship components relative to that of the term match component is found to have a substantial effect on the retrieval performance. The usefulness of discriminant analysis as a technique for determining the relative importance of these components is investigated.
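
A hedged sketch of a generalized similarity function of the kind described: a term-match component plus weighted positive and negative term-relationship components. The weights and relation sets are invented, not the paper's:

```python
# sim(q, d) = alpha * match + beta * positive - gamma * negative,
# where positive/negative count related term pairs across query and doc.

def similarity(query, doc, pos_rel, neg_rel, alpha=1.0, beta=0.5, gamma=0.5):
    match = len(query & doc)
    positive = sum(1 for q in query for d in doc if (q, d) in pos_rel)
    negative = sum(1 for q in query for d in doc if (q, d) in neg_rel)
    return alpha * match + beta * positive - gamma * negative

q = {"database", "security"}
d = {"database", "privacy", "sports"}
print(similarity(q, d, pos_rel={("security", "privacy")},
                 neg_rel={("security", "sports")}))   # 1.0 + 0.5 - 0.5
```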

Journal ArticleDOI
TL;DR: File designs suitable for retrieval from a file of k-field records when queries may be partially specified are examined and storage redundancy is introduced to obtain improved worst-case and average-case performances.
Abstract: File designs suitable for retrieval from a file of k-field records when queries may be partially specified are examined. Storage redundancy is introduced to obtain improved worst-case and average-case performances. The resulting storage schemes are appropriate for replicated distributed database environments; it is possible to improve the overall average and worst-case behavior for query response as well as provide an environment with very high reliability. Within practical systems it will be possible to improve the query response time performance as well as reliability over comparable systems without replication.

Journal ArticleDOI
TL;DR: The proposed HUBMFS2 (Hiroshima University Balanced Multiple-valued File-organization Scheme of order two) has the least redundancy among all possible BMFS2's having the same parameters and can be constructed for a less restrictive set of parameters.
Abstract: A new balanced file-organization scheme of order two for multiple-valued records is presented. This scheme is called HUBMFS2 (Hiroshima University Balanced Multiple-valued File-organization Scheme of order two). It is assumed that records are characterized by m attributes having n possible values each, and the query set consists of queries which specify values of two attributes. It is shown that the redundancy of the bucket (the probability of storing a record in the bucket) is minimized if and only if the structure of the bucket is a partite-claw. A necessary and sufficient condition for the existence of an HUBMFS2, which is composed exclusively of partite-claw buckets, is given. A construction algorithm is also given. The proposed HUBMFS2 is superior to existing BMFS2 (Balanced Multiple-valued File-organization Schemes of order two) in that it has the least redundancy among all possible BMFS2's having the same parameters and that it can be constructed for a less restrictive set of parameters.

Journal ArticleDOI
TL;DR: This paper addresses a number of issues pertinent to a pipelining architecture, including algorithms for resolving deadlock situations which can arise, and partitioning the process graph to achieve an optimal schedule for executing the restructuring steps.
Abstract: In the past several years much attention has been given to the problem of data translation. The focus has been mainly on methodologies and specification languages for accomplishing this task. Recently, several prototype systems have emerged, and now the issues of implementation and performance must be addressed. In general, a data restructuring specification may contain multiple source and target files. This specification can be viewed as a “process graph” which is a network of restructuring operations subject to precedence constraints. One technique used to achieve good performance is that of pipelining data in the process graph. In this paper we address a number of issues pertinent to a pipelining architecture. Specifically, we give algorithms for resolving deadlock situations which can arise, and partitioning the process graph to achieve an optimal schedule for executing the restructuring steps. In addition, we discuss how pipelining has influenced the design of the restructuring operations and the file structures used in an actual system.

Journal ArticleDOI
TL;DR: Theorem 3.1 states that a schedule is serializable only if whenever two processes have conflicting actions, all pairs of conflicting actions appear in the same order.
Abstract: r_i[u] means “process i reads variable u”; w_i[v] means “process i writes into variable v.” Sequences of r's and w's denote schedules. P_in initializes the database state and P_out reads the final database state. P1, P2, and P3 are user processes. Theorem 3.1 states that a schedule is serializable only if whenever two processes have conflicting actions, all pairs of conflicting actions appear in the same order. (Two actions conflict if they operate on the same variable and one of them is a write action.) Consider schedule S:
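
The schedule S itself is truncated above. As a generic illustration of the conflict-order condition in Theorem 3.1, here is a sketch that builds a precedence graph from pairs of conflicting actions and tests it for a cycle; the example schedule is invented:

```python
# Conflict-serializability check: draw an edge p1 -> p2 whenever an
# action of p1 conflicts with a later action of p2, then test acyclicity.

def serializable(schedule):
    """schedule: list of (process, action, variable), action 'r' or 'w'."""
    edges = set()
    for i, (p1, a1, v1) in enumerate(schedule):
        for p2, a2, v2 in schedule[i + 1:]:
            if p1 != p2 and v1 == v2 and "w" in (a1, a2):
                edges.add((p1, p2))          # p1's action precedes p2's

    procs = {p for p, _, _ in schedule}
    def reachable(src, dst, seen):           # depth-first cycle search
        return any(a == src and (b == dst or
                   (b not in seen and reachable(b, dst, seen | {b})))
                   for a, b in edges)
    return not any(reachable(p, p, {p}) for p in procs)

# w1[u] r2[u] w2[v] r1[v]: P1 -> P2 and P2 -> P1, hence not serializable.
S = [(1, "w", "u"), (2, "r", "u"), (2, "w", "v"), (1, "r", "v")]
print(serializable(S))   # False
```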