
Showing papers in "ACM Transactions on Database Systems in 1992"


Journal ArticleDOI
Chandrasekaran Mohan, Don Haderle, Bruce G. Lindsay, Hamid Pirahesh, Peter Schwarz
TL;DR: ARIES is a transaction recovery method based on write-ahead logging; it is applicable not only to database management systems but also to persistent object-oriented languages, recoverable file systems, and transaction-based operating systems.
Abstract: … DB2, IMS, and Tandem systems. ARIES is applicable not only to database management systems but also to persistent object-oriented languages, recoverable file systems, and transaction-based operating systems. ARIES has been implemented, to varying degrees, in IBM's OS/2 Extended Edition Database Manager, DB2, Workstation Data Save Facility/VM, Starburst and QuickSilver, and in the University of Wisconsin's EXODUS and Gamma database machine.
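
To make the logging idea concrete, here is a minimal Python sketch of an ARIES-style redo pass that "repeats history" by comparing log sequence numbers (LSNs) with page LSNs. The data structures and names are simplifying assumptions for illustration, not the paper's or any product's implementation; the analysis pass, undo, and checkpointing are omitted.

```python
# Minimal sketch of an ARIES-style redo pass ("repeating history").
# All names and structures are simplified assumptions, not the paper's code.
from dataclasses import dataclass

@dataclass
class LogRecord:
    lsn: int          # log sequence number
    page_id: str      # page the update applies to
    redo: callable    # idempotent redo action for this update

class Page:
    def __init__(self, page_id):
        self.page_id = page_id
        self.page_lsn = 0          # LSN of the last update applied to this page
        self.data = {}

def redo_pass(log, buffer_pool):
    """Reapply every logged update whose effect is not yet on the page."""
    for rec in sorted(log, key=lambda r: r.lsn):
        page = buffer_pool[rec.page_id]
        if page.page_lsn < rec.lsn:   # update missing from the page: redo it
            rec.redo(page)
            page.page_lsn = rec.lsn   # remember the last update applied here

# Tiny usage example
pool = {"P1": Page("P1")}
log = [LogRecord(1, "P1", lambda p: p.data.update(x=1)),
       LogRecord(2, "P1", lambda p: p.data.update(x=2))]
redo_pass(log, pool)
print(pool["P1"].data, pool["P1"].page_lsn)   # {'x': 2} 2
```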

1,083 citations


Journal ArticleDOI
TL;DR: This thesis develops a new family of algorithms for scheduling real-time transactions and proposes new techniques for handling requests with and without deadlines simultaneously, finding that real-time disk scheduling algorithms can perform better than conventional algorithms.
Abstract: This thesis has six chapters. Chapter 1 motivates the thesis by describing the characteristics of real-time database systems and the problems of scheduling transactions with deadlines. We also present a short survey of related work and discuss how this thesis has contributed to the state of the art. In Chapter 2 we develop a new family of algorithms for scheduling real-time transactions. Our algorithms have four components: a policy to manage overloads, a policy for scheduling the CPU, a policy for scheduling access to data (i.e., concurrency control), and a policy for scheduling I/O requests on a disk device. In Chapter 3, our scheduling algorithms are evaluated via simulation. Our chief result is that real-time scheduling algorithms can perform significantly better than a conventional non-real-time algorithm. In particular, the Least Slack (static evaluation) policy for scheduling the CPU, combined with the Wait Promote policy for concurrency control, produces the best overall performance. In Chapter 4 we develop a new set of algorithms for scheduling disk I/O requests with deadlines. Our model assumes the existence of a real-time database system which assigns deadlines to individual read and write requests. We also propose new techniques for handling requests without deadlines and requests with deadlines simultaneously. This approach greatly improves the performance of the algorithms and their ability to minimize missed deadlines. In Chapter 5 we evaluate the I/O scheduling algorithms using detailed simulation. Our chief result is that real-time disk scheduling algorithms can perform better than conventional algorithms. In particular, our algorithm FD-SCAN was found to be very effective across a wide range of experiments. Finally, in Chapter 6 we summarize our conclusions and discuss how this work has contributed to the state of the art. Also, we briefly explore some interesting new directions for continuing this research.
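
As a rough illustration of the Least Slack CPU scheduling policy mentioned above, the following sketch picks the ready transaction whose slack (time to deadline minus remaining execution time) is smallest. The transaction model and field names are simplified assumptions, not the thesis's simulator.

```python
# Minimal sketch of Least Slack (static) CPU scheduling for transactions.
# Field names and the model are simplifying assumptions.
from dataclasses import dataclass

@dataclass
class Txn:
    name: str
    deadline: float        # absolute deadline
    remaining_exec: float  # estimated execution time still needed

def slack(txn, now):
    """Slack = time to deadline minus remaining work; smaller = more urgent."""
    return (txn.deadline - now) - txn.remaining_exec

def pick_next(ready, now):
    """Run the ready transaction with the least slack."""
    return min(ready, key=lambda t: slack(t, now)) if ready else None

ready = [Txn("T1", deadline=10.0, remaining_exec=3.0),
         Txn("T2", deadline=6.0, remaining_exec=4.0)]
print(pick_next(ready, now=0.0).name)   # T2 (slack 2.0 < 7.0)
```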

575 citations


Journal ArticleDOI
TL;DR: A property called recoverability is used to decrease the delay involved in processing noncommuting operations while still avoiding cascading aborts; to ensure the serializability of transactions, the recoverability relationship between transactions is forced to be acyclic.
Abstract: The concurrency of transactions executing on atomic data types can be enhanced through the use of semantic information about operations defined on these types. Hitherto, commutativity of operations has been exploited to provide enhanced concurrency while avoiding cascading aborts. We have identified a property known as recoverability which can be used to decrease the delay involved in processing noncommuting operations while still avoiding cascading aborts. When an invoked operation is recoverable with respect to an uncommitted operation, the invoked operation can be executed by forcing a commit dependency between the invoked operation and the uncommitted operation; the transaction invoking the operation will not have to wait for the uncommitted operation to abort or commit. Further, this commit dependency only affects the order in which the operations should commit, if both commit; if either operation aborts, the other can still commit, thus avoiding cascading aborts. To ensure the serializability of transactions, we force the recoverability relationship between transactions to be acyclic. Simulation studies, based on the model presented by Agrawal et al. [1], indicate that using recoverability, the turnaround time of transactions can be reduced. Further, our studies show enhancement in concurrency even when resource constraints are taken into consideration. The magnitude of enhancement is dependent on the resource contention; the lower the resource contention, the higher the improvement.
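
A minimal sketch of the commit-dependency bookkeeping described above, with invented data structures: a transaction that executes a recoverable operation on top of an uncommitted one records a dependency that constrains only commit order, so an abort of the earlier transaction does not cascade.

```python
# Sketch of commit dependencies for recoverable operations (assumed structures).
class CommitDependencyManager:
    def __init__(self):
        self.deps = {}        # txn -> set of txns it must not commit before
        self.finished = {}    # txn -> "committed" | "aborted"

    def add_dependency(self, later, earlier):
        """later may execute now, but must commit only after earlier finishes."""
        self.deps.setdefault(later, set()).add(earlier)

    def can_commit(self, txn):
        # txn may commit once every transaction it depends on has finished
        # (committed OR aborted -- an abort does not cascade to txn).
        return all(d in self.finished for d in self.deps.get(txn, set()))

    def finish(self, txn, outcome):
        self.finished[txn] = outcome

m = CommitDependencyManager()
m.add_dependency("T2", "T1")     # T2 ran a recoverable op on top of T1's update
print(m.can_commit("T2"))        # False: T1 still active
m.finish("T1", "aborted")        # T1 aborts...
print(m.can_commit("T2"))        # True: T2 can still commit (no cascading abort)
```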

212 citations


Journal ArticleDOI
TL;DR: This paper provides a careful specification of the OO1 benchmark, shows how it can be implemented on database systems, and presents evidence that more than an order of magnitude difference in performance can result from a DBMS implementation quite different from current products.
Abstract: Performance is a major issue in the acceptance of object-oriented and relational database systems aimed at engineering applications such as computer-aided software engineering (CASE) and computer-aided design (CAD). Because traditional database system benchmarks are inappropriate for measuring performance on operations over engineering objects, we designed a new benchmark, Object Operations version 1 (OO1), to focus on important characteristics of these applications. OO1 is descended from an earlier benchmark for simple database operations and is based on several years' experience with that benchmark. In this paper we describe the OO1 benchmark and the results we obtained running it on a variety of database systems. We provide a careful specification of the benchmark, show how it can be implemented on database systems, and present evidence that more than an order of magnitude difference in performance can result from a DBMS implementation quite different from current products: minimizing overhead per database call, offloading database server functionality to workstations, taking advantage of large main memories, and using link-based methods.

181 citations


Journal ArticleDOI
TL;DR: The effectiveness of taxonomic reasoning techniques as an active support to knowledge acquisition and conceptual schema design is shown and an extended formalism and taxonomic inference algorithms for models giving prominence to attributes are given.
Abstract: Taxonomic reasoning is a typical task performed by many AI knowledge representation systems. In this paper, the effectiveness of taxonomic reasoning techniques as an active support to knowledge acquisition and conceptual schema design is shown. The idea developed is that by extending conceptual models with defined concepts and giving them rigorous logic semantics, it is possible to infer isa relationships between concepts on the basis of their descriptions. From a theoretical point of view, this approach makes it possible to give a formal definition for consistency and minimality of a conceptual schema. From a pragmatic point of view, it is possible to develop an active environment that allows automatic classification of a new concept in the right position of a given taxonomy, ensuring the consistency and minimality of a conceptual schema. A formalism that includes the data semantics of models giving prominence to type constructors (E/R, TAXIS, GALILEO) and algorithms for taxonomic inferences are presented; their soundness, completeness, and tractability properties are proved. Finally, an extended formalism and taxonomic inference algorithms for models giving prominence to attributes (FDM, IFO) are given.
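
As a hypothetical illustration of classification by description (far simpler than the paper's formalism), the sketch below models a concept as a set of attribute constraints and infers isa links by set containment.

```python
# Hypothetical sketch of inferring isa links by comparing concept descriptions.
# Concepts are modeled as sets of (attribute, type) constraints -- a drastic
# simplification of the paper's formalism.
def subsumes(general, specific):
    """general subsumes specific if every constraint of general also holds in specific."""
    return general.issubset(specific)

def classify(new_concept, taxonomy):
    """Return the concepts the new one should be placed under (inferred isa)."""
    return [name for name, desc in taxonomy.items() if subsumes(desc, new_concept)]

taxonomy = {
    "Person":   {("name", "string")},
    "Employee": {("name", "string"), ("salary", "int")},
}
manager = {("name", "string"), ("salary", "int"), ("dept", "string")}
print(classify(manager, taxonomy))   # ['Person', 'Employee']
```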

130 citations


Journal ArticleDOI
TL;DR: This work splits the translation of EER schemas into relational schemas into four stages, each corresponding to one aspect of the translation, and defines criteria for both evaluating the correctness of and characterizing the relationship between alternative relational representations of EER schemas.
Abstract: A common approach to database design is to describe the structures and constraints of the database application in terms of a semantic data model, and then represent the resulting schema using the data model of a commercial database management system. Often, in practice, Extended Entity-Relationship (EER) schemas are translated into equivalent relational schemas. This translation involves different aspects: representing the EER schema using relational constructs, assigning names to relational attributes, normalization, and merging relations. Considering these aspects together, as is usually done in the design methodologies proposed in the literature, is confusing and leads to inaccurate results. We propose to treat these aspects separately and split the translation into four stages (modules), one for each of the aspects mentioned above. We define criteria for both evaluating the correctness of and characterizing the relationship between alternative relational representations of EER schemas.
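
For illustration only, here is a minimal sketch of the first stage (representing EER constructs with relational ones) for entity types and binary relationships; attribute naming, normalization, and merging, which the paper treats as separate stages, are not shown, and the encoding is an assumption.

```python
# Hypothetical sketch of stage 1: map EER entity types and binary relationships
# to relation schemas (attribute name lists). Simplified; keys are assumed given.
def entity_to_relation(name, key_attrs, other_attrs):
    return {name: list(key_attrs) + list(other_attrs)}

def relationship_to_relation(name, entity_keys, attrs=()):
    """entity_keys: {entity_name: [its key attributes]} for the participants."""
    cols = [f"{e}_{k}" for e, keys in entity_keys.items() for k in keys]
    return {name: cols + list(attrs)}

schema = {}
schema.update(entity_to_relation("Employee", ["emp_id"], ["name"]))
schema.update(entity_to_relation("Dept", ["dept_id"], ["budget"]))
schema.update(relationship_to_relation("WorksIn",
              {"Employee": ["emp_id"], "Dept": ["dept_id"]}, ["since"]))
print(schema)
```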

119 citations


Journal ArticleDOI
TL;DR: A logical tree structure is imposed on the set of copies of an object and a protocol that uses the information available in the logical structure to reduce the communication requirements for read and write operations is developed.
Abstract: In this paper, we present a low-cost fault-tolerant protocol for managing replicated data. We impose a logical tree structure on the set of copies of an object and develop a protocol that uses the information available in the logical structure to reduce the communication requirements for read and write operations. The tree quorum protocol is a generalization of the static voting protocol with two degrees of freedom for choosing quorums. In general, this results in significantly lower communication costs for comparable data availability. The protocol exhibits the property of graceful degradation, i.e., communication costs for executing operations are minimal in a failure-free environment but may increase as failures occur. This approach to designing distributed systems is desirable since it provides fault tolerance without imposing unnecessary costs on the failure-free mode of operations.
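
A minimal sketch of forming a read quorum under the tree quorum idea, assuming a simple tree of copies: the root copy suffices when it is available, and an unavailable node is replaced, recursively, by a majority of its children. Write quorums and the protocol's tunable degrees of freedom are not modeled.

```python
# Minimal sketch of forming a read quorum with the tree quorum idea:
# use the root copy if it is available; otherwise replace it, recursively,
# by a majority of its children. Structures and parameters are assumptions.
class Node:
    def __init__(self, copy_id, children=(), up=True):
        self.copy_id, self.children, self.up = copy_id, list(children), up

def read_quorum(node):
    """Return a list of copies forming a quorum, or None if none can be formed."""
    if node.up:
        return [node.copy_id]
    if not node.children:
        return None
    majority = len(node.children) // 2 + 1
    quorum, got = [], 0
    for child in node.children:
        q = read_quorum(child)
        if q is not None:
            quorum += q
            got += 1
            if got == majority:
                return quorum
    return None

leaves = [Node(f"c{i}") for i in range(3)]
root = Node("root", children=leaves, up=False)   # root copy has failed
print(read_quorum(root))   # ['c0', 'c1']: a majority of children replaces the root
```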

114 citations


Journal ArticleDOI
TL;DR: Proves the richer expressiveness of a more general form of functional dependency for semantic data models, one that derives from their common feature of combining the relational model's separate notions of domain and relation into a single notion of class.
Abstract: We propose a more general form of functional dependency for semantic data models that derives from their common feature in which the separate notions of domain and relation in the relational model are combined into a single notion of class. This usually results in a richer terminological component for their query languages, whereby terms may navigate through any number of properties, including none. We prove the richer expressiveness of this more general functional dependency, and exhibit a sound and complete set of inference axioms. Although the general problem of decidability of their logical implication remains open at this time, we present decision procedures for cases in which the dependencies included in a schema correspond to keys, or in which the schema itself is acyclic. The theory is then extended to include a form of conjunctive query. Of particular significance is that the query becomes an additional source of functional dependency. Finally, we outline several applications of the theory to various problems in physical design and in query optimization. The applications derive from an ability to predict when a query can have at most one solution.

104 citations


Journal ArticleDOI
TL;DR: It is shown that the flat relational algebra is rich enough to extract the same “flat information” from a flat database as the nested algebra does, which implies that recursive queries such as the transitive closure of a binary relation cannot be expressed in the nested algebra.
Abstract: Nested relations generalize ordinary flat relations by allowing tuple values to be either atomic or set valued. The nested algebra is a generalization of the flat relational algebra to manipulate nested relations. In this paper we study the expressive power of the nested algebra relative to its operation on flat relational databases. We show that the flat relational algebra is rich enough to extract the same “flat information” from a flat database as the nested algebra does. Theoretically, this result implies that recursive queries such as the transitive closure of a binary relation cannot be expressed in the nested algebra. Practically, this result is relevant to (flat) relational query optimization.

96 citations


Journal ArticleDOI
TL;DR: A number of concurrency control concepts and transaction scheduling techniques that are applicable to high contention environments, and that do not rely on database semantics to reduce contention are considered.
Abstract: Future transaction processing systems may have substantially higher levels of concurrency due to reasons which include: (1) increasing disparity between processor speeds and data access latencies, (2) large numbers of processors, and (3) distributed databases. Another influence is the trend towards longer or more complex transactions. A possible consequence is substantially more data contention, which could limit total achievable throughput. In particular, it is known that the usual locking method of concurrency control is not well suited to environments where data contention is a significant factor. Here we consider a number of concurrency control concepts and transaction scheduling techniques that are applicable to high contention environments, and that do not rely on database semantics to reduce contention. These include access invariance and its application to prefetching of data, approximations to essential blocking such as wait depth limited scheduling, and phase dependent control. The performance of various concurrency control methods based on these concepts is studied using detailed simulation models. The results indicate that the new techniques can offer substantial benefits for systems with high levels of data contention.
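
The wait depth limited idea can be sketched as an admission check on lock waits (a simplified illustration with an assumed victim policy of restarting the requester; the paper's wait depth limited schedulers choose victims more carefully):

```python
# Sketch of a wait-depth-limited admission check (simplified assumption: when the
# limit would be exceeded we restart the requester; the paper's policies
# choose the victim more carefully).
def wait_depth(txn, waits_for):
    """Length of the chain txn -> holder -> ... that the requester would extend."""
    depth = 0
    while txn in waits_for:
        txn = waits_for[txn]
        depth += 1
    return depth

def request_wait(requester, holder, waits_for, limit=1):
    """Return True if requester may block on holder without exceeding the limit."""
    if wait_depth(holder, waits_for) + 1 > limit:
        return False            # would create a wait chain deeper than `limit`
    waits_for[requester] = holder
    return True

waits_for = {}
print(request_wait("T2", "T1", waits_for))   # True: T1 is running, depth 1
print(request_wait("T3", "T2", waits_for))   # False: T2 is already waiting
```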

94 citations


Journal ArticleDOI
TL;DR: This paper describes the architecture of a system having two interrelated components: a combined conventional/semantic query optimizer, and an automatic rule deriver, and shows how semantic query optimization is an extension of conventional optimization in this context.
Abstract: The use of inference rules to support intelligent data processing is an increasingly important tool in many areas of computer science. In database systems, rules are used in semantic query optimization as a method for reducing query processing costs. The savings depend on the ability of experts to supply a set of useful rules and the ability of the optimizer to quickly find the appropriate transformations generated by these rules. Unfortunately, the most useful rules are not always those that would or could be specified by an expert. This paper describes the architecture of a system having two interrelated components: a combined conventional/semantic query optimizer, and an automatic rule deriver. Our automatic rule derivation method uses intermediate results from the optimization process to direct the search for learning new rules. Unlike a system employing only user-specified rules, a system with an automatic capability can derive rules that may be true only in the current state of the database and can modify the rule set to reflect changes in the database and its usage pattern. This system has been implemented as an extension of the EXODUS conventional query optimizer generator. We describe the implementation, and show how semantic query optimization is an extension of conventional optimization in this context.
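
As a hypothetical example of the kind of semantic transformation such rules enable (the rule and predicates below are invented), a rule whose antecedent is implied by the query lets the optimizer attach the rule's consequent as an extra, possibly index-supported, predicate:

```python
# Hypothetical sketch of a semantic query rewrite: if the query's predicates imply
# a rule's antecedent, the rule's consequent can be added as an extra predicate
# (e.g., one that an index can exploit). Rule content is invented for illustration.
def apply_semantic_rules(query_preds, rules):
    preds = set(query_preds)
    changed = True
    while changed:
        changed = False
        for antecedent, consequent in rules:
            if antecedent in preds and consequent not in preds:
                preds.add(consequent)        # implied by an integrity rule
                changed = True
    return preds

rules = [("type = 'barge'", "weight > 100")]          # invented example rule
query = ["type = 'barge'", "origin = 'Oslo'"]
print(apply_semantic_rules(query, rules))
# {"type = 'barge'", "origin = 'Oslo'", "weight > 100"}
```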

Journal ArticleDOI
TL;DR: This paper shows in particular how the special processing techniques of a geometric database system, such as spatial join methods and geometric index structures, can be integrated into query processing and optimization of a relational database system.
Abstract: Gral is an extensible database system, based on the formal concept of a many-sorted relational algebra. Many-sorted algebra is used to define any application's query language, its query execution language, and its optimization rules. In this paper we describe Gral's optimization component. It provides (1) a sophisticated rule language—rules are transformations of abstract algebra expressions, (2) a general optimization framework under which more specific optimization algorithms can be implemented, and (3) several control mechanisms for the application of rules. An optimization algorithm can be specified as a series of steps. Each step is defined by its own collection of rules together with a selected control strategy. The general facilities are illustrated by the complete design of an example optimizer—in the form of a rule file—for a small nonstandard query language and an associated execution language. The query language includes selection, join, ordering, embedding derived values, aggregate functions, and several geometric operations. The example shows in particular how the special processing techniques of a geometric database system, such as spatial join methods and geometric index structures, can be integrated into query processing and optimization of a relational database system. A similar, though larger, optimizer is fully functional within the geometric database system implemented as a Gral prototype.
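
The rule-plus-control-strategy idea can be illustrated with a small sketch that is not Gral's rule language: one optimization step applies its rule collection to an algebra expression until no rule fires (one possible control strategy). The expression encoding and the single rule are assumptions.

```python
# Hypothetical sketch of an optimizer step: apply a collection of rewrite rules to
# an algebra expression tree until a fixpoint (one possible control strategy).
# The expression encoding and the single rule shown are illustrative assumptions.
def push_select_below_join(expr):
    # ('select', p, ('join', L, R)) -> ('join', ('select', p, L), R)
    # assuming (for the sketch) that predicate p refers only to the left input.
    if expr[0] == "select" and expr[2][0] == "join":
        p, (_, left, right) = expr[1], expr[2]
        return ("join", ("select", p, left), right)
    return None   # rule does not apply

def run_step(expr, rules):
    changed = True
    while changed:
        changed = False
        for rule in rules:
            new = rule(expr)
            if new is not None:
                expr, changed = new, True
    return expr

expr = ("select", "cities.pop > 1e6", ("join", "cities", "rivers"))
print(run_step(expr, [push_select_below_join]))
# ('join', ('select', 'cities.pop > 1e6', 'cities'), 'rivers')
```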

Journal ArticleDOI
TL;DR: TSOS is presented, a system for reasoning about time that can be integrated as a time expert in environments designed for broader problem-solving domains and has the capability to reason about temporal data specified at different time granularities.
Abstract: In many computer-based applications, temporal information has to be stored, retrieved, and related to other temporal information. Several time models have been proposed to manage temporal knowledge in the fields of conceptual modeling, database systems, and artificial intelligence. In this paper we present TSOS, a system for reasoning about time that can be integrated as a time expert in environments designed for broader problem-solving domains. The main intended goal of TSOS is to allow a user to infer further information on the temporal data stored in the database through a set of deduction rules handling various aspects of time. For this purpose, TSOS provides the capability of answering queries about the temporal specifications it has in its temporal database. Distinctive time-modeling features of TSOS are: the introduction of temporal modalities, i.e., the possibility of specifying whether a piece of information is always true within a time interval or only sometimes true, together with the capability of answering about the possibility and the necessity of the validity of some information at a given time; the association of temporal knowledge both with instances of data and with types of data; and the development of a time calculus for reasoning on temporal data. Another relevant feature of TSOS is the capability to reason about temporal data specified at different time granularities.
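
A minimal sketch of the "always" versus "sometimes" modalities (an assumed encoding, not TSOS's calculus): a fact asserted with the always modality over an interval is necessarily valid at any instant inside it, while a sometimes assertion only makes it possibly valid.

```python
# Minimal sketch of "always"/"sometimes" temporal modalities (assumed encoding,
# not TSOS's calculus). A fact is asserted over an interval with a modality.
facts = [
    # (fact, start, end, modality)
    ("printer_busy", 9, 12, "sometimes"),
    ("office_open",  8, 17, "always"),
]

def necessarily(fact, t):
    """True if some assertion guarantees the fact at time t (always modality)."""
    return any(f == fact and s <= t <= e and m == "always"
               for f, s, e, m in facts)

def possibly(fact, t):
    """True if some assertion allows the fact at time t (any modality)."""
    return any(f == fact and s <= t <= e for f, s, e, m in facts)

print(necessarily("office_open", 10))   # True
print(necessarily("printer_busy", 10))  # False: only sometimes true in [9, 12]
print(possibly("printer_busy", 10))     # True
```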

Journal ArticleDOI
C. J. Date, Ronald Fagin
TL;DR: It is shown that if a relation schema is in third normal form and every key is simple, then it is in projection-join normal form (sometimes called fifth normal form), the ultimate normal form with respect to projections and joins.
Abstract: A key is simple if it consists of a single attribute. It is shown that if a relation schema is in third normal form and every key is simple, then it is in projection-join normal form (sometimes called fifth normal form), the ultimate normal form with respect to projections and joins. Furthermore, it is shown that if a relation schema is in Boyce-Codd normal form and some key is simple, then it is in fourth normal form (but not necessarily projection-join normal form). These results give the database designer simple sufficient conditions, defined in terms of functional dependencies alone, that guarantee that the schema being designed is automatically in higher normal forms.
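
Restating the two sufficient conditions compactly in the abstract's own terms:

```latex
% 3NF with only simple keys implies projection-join normal form (PJ/NF, "fifth normal form"):
R \in \mathrm{3NF} \;\wedge\; \text{every key of } R \text{ is simple}
  \;\Longrightarrow\; R \in \mathrm{PJ/NF}

% BCNF with at least one simple key implies fourth normal form (but not necessarily PJ/NF):
R \in \mathrm{BCNF} \;\wedge\; \text{some key of } R \text{ is simple}
  \;\Longrightarrow\; R \in \mathrm{4NF}
```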

Journal ArticleDOI
TL;DR: The analytical tools developed enable us to see that the cautious waiting algorithm manages to achieve a delicate balance between restart and blocking, and therefore is superior (i.e., has higher throughput) to both the no-waiting and general waiting algorithms under a wide range of system parameters.
Abstract: We study a deadlock-free locking-based concurrency control algorithm, called cautious waiting, which allows for a limited form of waiting. The algorithm is very simple to implement. We present an analytical solution to its performance evaluation based on the mean-value approach proposed by Tay et al. [18]. From the modeling point of view, we are able to do away with a major assumption used in Tay's previous work, and therefore capture more accurately both the restart and the blocking rates in the system. We show that to solve for this model we only need to solve for the root of a polynomial. The analytical tools developed enable us to see that the cautious waiting algorithm manages to achieve a delicate balance between restart and blocking, and therefore is superior (i.e., has higher throughput) to both the no-waiting (i.e., immediate restart) and the general waiting algorithms under a wide range of system parameters. The study substantiates the argument that balancing restart and blocking is important in locking systems.
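
The cautious waiting rule itself fits in a few lines (a simplified illustration, not the paper's analytical model): a requester may wait only for a lock holder that is not itself blocked; otherwise the requester restarts, so waits-for chains never exceed length one and no deadlock cycle can form.

```python
# Sketch of the cautious waiting rule (simplified illustration).
# A lock requester may block only on a holder that is not itself blocked;
# otherwise the requester is restarted. Chains of waiters therefore have
# length at most one, so no deadlock cycle can form.
def on_conflict(requester, holder, blocked):
    """blocked: set of transactions currently waiting for a lock."""
    if holder in blocked:
        return "restart requester"       # cautious: never wait on a waiter
    blocked.add(requester)
    return "requester waits"

blocked = set()
print(on_conflict("T2", "T1", blocked))  # 'requester waits'   (T1 is active)
print(on_conflict("T3", "T2", blocked))  # 'restart requester' (T2 is blocked)
```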

Journal ArticleDOI
TL;DR: A new way of generating signatures, the fixed-weight block (FWB) method, is introduced that has a lower false-drop probability than that of the FSB method, but its storage overhead is slightly higher.
Abstract: Previous work on superimposed coding has been characterized by two aspects. First, it is generally assumed that signatures are generated from logical text blocks of the same size; that is, each block contains the same number of unique terms after stopword and duplicate removal. We call this approach the fixed-size block (FSB) method, since each text block has the same size, as measured by the number of unique terms contained in it. Second, with only a few exceptions [6,7,8,9,17], most previous work has assumed that each term in the text contributes the same number of ones to the signature (i.e., the weight of the term signatures is fixed). The main objective of this paper is to derive an optimal weight assignment that assigns weights to document terms according to their occurrence and query frequencies in order to minimize the false-drop probability. The optimal scheme can account for both uniform and nonuniform occurrence and query frequencies, and the signature generation method is still based on hashing rather than on table lookup. Furthermore, a new way of generating signatures, the fixed-weight block (FWB) method, is introduced. FWB controls the weight of every signature to a constant, whereas in FSB, only the expected signature weight is constant. We have shown that FWB has a lower false-drop probability than that of the FSB method, but its storage overhead is slightly higher. Other advantages of FWB are that the optimal weight assignment can be obtained analytically without making unrealistic assumptions and that the formula for computing the term signature weights is simple and efficient.
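
For readers unfamiliar with superimposed coding, here is a hypothetical sketch of hash-based signature generation and the filtering test in the basic fixed-weight-per-term setting; the parameters are arbitrary, and neither FWB's constant block weight nor the paper's optimal weight assignment is reproduced.

```python
# Hypothetical sketch of superimposed coding: each term hashes to `weight` bit
# positions in an F-bit signature; a block may contain a query term only if all
# of the term's bits are set in the block signature (matches can still be
# false drops). Parameters are arbitrary; the paper derives optimal weights.
import hashlib

F = 64  # signature length in bits

def term_signature(term, weight=3):
    sig = 0
    for i in range(weight):
        h = hashlib.md5(f"{term}:{i}".encode()).hexdigest()
        sig |= 1 << (int(h, 16) % F)   # set (up to) `weight` bits for this term
    return sig

def block_signature(terms, weight=3):
    sig = 0
    for t in terms:
        sig |= term_signature(t, weight)   # superimpose the term signatures
    return sig

block = block_signature(["database", "signature", "coding"])
query = term_signature("database")
print(query & block == query)   # True: the block may contain the term
print(term_signature("zebra") & block == term_signature("zebra"))  # likely False
```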

Journal ArticleDOI
TL;DR: An efficient heuristic to select a near optimal set of page-answers and page-traces to populate the main memory has been developed, implemented, and tested and quantitative measurements of performance benefits are reported.
Abstract: In this paper a new method to improve the utilization of main memory systems is presented. The new method is based on prestoring in main memory a number of query answers, each evaluated out of a single memory page. To this end, the ideas of page-answers and page-traces are formally described and their properties analyzed. The query model used here allows for selection, projection, join, recursive queries as well as arbitrary combinations. We also show how to apply the approach under update traffic. This concept is especially useful in managing the main memories of an important class of applications. This class includes the evaluation of triggers and alerters, performance improvement of rule-based systems, integrity constraint checking, and materialized views. These applications are characterized by the existence at compile time of a predetermined set of queries, by a slow but persistent update traffic, and by their need to repetitively reevaluate the query set. The new approach represents a new type of intelligent database caching, which contrasts with traditional caching primarily in that the cache elements are derived data and as a consequence, they overlap arbitrarily and do not have a fixed length. The contents of the main memory cache are selected based on the data distribution within the database, the set of fixed queries to preprocess, and the paging characteristics. Page-answers and page-traces are used as the smallest indivisible units in the cache. An efficient heuristic to select a near optimal set of page-answers and page-traces to populate the main memory has been developed, implemented, and tested. Finally, quantitative measurements of performance benefits are reported.
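
A minimal sketch of the page-answer idea with invented structures (not the paper's algorithms): each cache entry is the answer of one fixed query evaluated against one page, so an update to a page invalidates and refreshes only the entries derived from that page.

```python
# Sketch of caching page-answers: the answer to each fixed query, evaluated
# against a single page, is cached per (query, page) pair; updating a page
# invalidates only the entries derived from it. Structures are invented.
class PageAnswerCache:
    def __init__(self, queries, pages):
        self.queries = queries            # name -> predicate over a tuple
        self.pages = pages                # page_id -> list of tuples
        self.cache = {}                   # (query, page_id) -> answer
        for q in queries:
            for p in pages:
                self._refresh(q, p)

    def _refresh(self, q, p):
        self.cache[(q, p)] = [t for t in self.pages[p] if self.queries[q](t)]

    def update_page(self, p, tuples):
        self.pages[p] = tuples
        for q in self.queries:            # only entries for page p are recomputed
            self._refresh(q, p)

    def answer(self, q):                  # union of per-page answers
        return [t for (qq, p), ans in self.cache.items() if qq == q for t in ans]

c = PageAnswerCache({"hot": lambda t: t[1] > 100},
                    {"P1": [("a", 50), ("b", 150)], "P2": [("c", 200)]})
print(c.answer("hot"))                   # [('b', 150), ('c', 200)]
c.update_page("P1", [("a", 500)])
print(c.answer("hot"))                   # [('a', 500), ('c', 200)]
```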

Journal ArticleDOI
TL;DR: The issues encountered in the extended algebra and calculus languages for nested relations defined by Roth, Korth, and Silberschatz are discussed, including keying problems and the use of extended set operations.
Abstract: We discuss the issues encountered in the extended algebra and calculus languages for nested relations defined by Roth, Korth, and Silberschatz [4]. Their equivalence proof between the algebra and the calculus fails because of keying problems and the use of extended set operations. Extended set operations also have unintended side effects. Furthermore, their calculus seems to allow the generation of power sets, thus making it more powerful than their algebra.

Journal ArticleDOI
TL;DR: The problem of updating databases through interfaces based on the weak instance model is studied, thus extending previous proposals that considered them only from the query point of view.
Abstract: The problem of updating databases through interfaces based on the weak instance model is studied, thus extending previous proposals that considered them only from the query point of view. Insertions and deletions of tuples are considered. As a preliminary tool, a lattice on states is defined, based on the information content of the various states. Potential results of an insertion are states that contain at least the information in the original state and that in the new tuple. Sometimes there is no potential result, and in the other cases there may be many of them. We argue that the insertion is deterministic if the state that contains the information common to all the potential results (the greatest lower bound, in the lattice framework) is a potential result itself. Effective characterizations for the various cases exist. A symmetric approach is followed for deletions, with fewer cases, since there are always potential results; determinism is characterized as a consequence.

Journal ArticleDOI
TL;DR: A practically useful algorithm is presented that solves the maintenance problem of all ctm database schemes within a "not too large" bound, and it is shown that non-ctm database schemes are not maintainable in time less than linear in the state size.
Abstract: The maintenance problem of a database scheme is the following decision problem: Given a consistent database state ρ and a new tuple u over some relation scheme of ρ, is the modified state ρ ∪ {u} still consistent? A database scheme is said to be constant-time-maintainable (ctm) if there exists an algorithm that solves its maintenance problem by making a fixed number of tuple retrievals. We present a practically useful algorithm, called the canonical maintenance algorithm, that solves the maintenance problem of all ctm database schemes within a "not too large" bound. A number of interesting properties are shown for ctm database schemes, among them that non-ctm database schemes are not maintainable in time less than linear in the state size. A test method is given when only cover embedded functional dependencies (fds) appear. When the given dependencies consist of fds and the join dependency (jd) ⋈ R of the database scheme, testing whether a database scheme is ctm is reduced to the case of cover embedded fds. When dependency-preserving database schemes with only equality-generating dependencies (egds) are considered, it is shown that every ctm database scheme has a set of dependencies that is equivalent to a set of embedded fds, and thus, our test method for the case of embedded fds can be applied. In particular, this includes the important case of lossless database schemes with only egds.