
Showing papers in "IEEE Transactions on Knowledge and Data Engineering in 1992"


Journal ArticleDOI
TL;DR: The authors survey the major memory residence optimizations and briefly discuss some of the MMDBs that have been designed or implemented.
Abstract: Main memory database systems (MMDBs) store their data in main physical memory and provide very high-speed access. Conventional database systems are optimized for the particular characteristics of disk storage mechanisms. Memory resident systems, on the other hand, use different optimizations to structure and organize data, as well as to make it reliable. The authors survey the major memory residence optimizations and briefly discuss some of the MMDBs that have been designed or implemented.

568 citations


Journal ArticleDOI
TL;DR: A data model that includes probabilities associated with the values of attributes is developed, and the notion of missing probabilities is introduced for partially specified probability distributions; the model offers a richer descriptive language allowing the database to more accurately reflect the uncertain real world.
Abstract: It is often desirable to represent in a database entities whose properties cannot be deterministically classified. The authors develop a data model that includes probabilities associated with the values of the attributes. The notion of missing probabilities is introduced for partially specified probability distributions. This model offers a richer descriptive language allowing the database to more accurately reflect the uncertain real world. Probabilistic analogs to the basic relational operators are defined and their correctness is studied. A set of operators that have no counterpart in conventional relational systems is presented.

501 citations
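
The abstract above does not give a concrete representation, but the idea of missing probabilities is easy to illustrate. Below is a minimal sketch in Python (the class and its consistency rule are our own construction, not the paper's formalism): probability mass not assigned to any particular attribute value is tracked explicitly instead of being forced onto known values.

```python
# A sketch of a probabilistic attribute value with partially specified
# probabilities (our construction, not the paper's formalism). Mass not
# assigned to any candidate value is kept as an explicit "missing"
# probability.

class ProbabilisticValue:
    def __init__(self, distribution):
        # distribution: dict mapping candidate values to probabilities
        total = sum(distribution.values())
        if total > 1.0 + 1e-9:
            raise ValueError("probabilities exceed 1")
        self.distribution = dict(distribution)
        self.missing = 1.0 - total   # unassigned probability mass

    def prob(self, value):
        # Lower bound: the missing mass may or may not belong to `value`.
        return self.distribution.get(value, 0.0)

# An employee whose department is uncertain; 0.1 of the mass is unassigned.
dept = ProbabilisticValue({"sales": 0.6, "marketing": 0.3})
print(dept.prob("sales"), dept.missing)   # 0.6 0.1
```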


Journal ArticleDOI
TL;DR: An algorithm for the induction of rules from examples is introduced, which is novel in the sense that it not only learns rules for a given concept, but it simultaneously learns rules relating multiple concepts.
Abstract: An algorithm for the induction of rules from examples is introduced. The algorithm is novel in the sense that it not only learns rules for a given concept (classification), but it simultaneously learns rules relating multiple concepts. This type of learning, known as generalized rule induction, is considerably more general than existing algorithms, which tend to be classification oriented. The paper focuses initially on the problem of determining a quantitative, well-defined rule preference measure. In particular, a quantity called the J-measure is proposed as an information-theoretic alternative to existing approaches. The J-measure quantifies the information content of a rule or a hypothesis. The information theoretic origins of this measure are outlined, and its plausibility as a hypothesis preference measure is examined. The ITRULE algorithm, which uses the measure to learn a set of optimal rules from a set of data samples, is defined. Experimental results on real-world data are analyzed.

389 citations
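
For the binary case (the rule's consequent X=x either holds or does not), the J-measure of a rule "if Y=y then X=x" has a compact closed form: the rule's firing probability p(y) times the cross-entropy between the posterior and prior distributions of X. A small sketch in Python (the example probabilities are ours):

```python
import math

def j_measure(p_y, p_x, p_x_given_y):
    """J-measure of the rule 'if Y=y then X=x' in bits: p(y) times the
    cross-entropy between the posterior p(x|y) and the prior p(x)."""
    def term(post, prior):
        return 0.0 if post == 0.0 else post * math.log2(post / prior)
    return p_y * (term(p_x_given_y, p_x) + term(1 - p_x_given_y, 1 - p_x))

# A rule that fires on 30% of samples and raises P(x) from 0.2 to 0.9:
print(j_measure(p_y=0.3, p_x=0.2, p_x_given_y=0.9))   # ~0.50 bits
```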


Journal ArticleDOI
TL;DR: In the new protocol, transaction processing is shared effectively among nodes storing copies of the data, and both the response time experienced by transactions and the system throughput are improved significantly.
Abstract: A new protocol for maintaining replicated data that can provide both high data availability and low response time is presented. In the protocol, the nodes are organized in a logical grid. Existing protocols are designed primarily to achieve high availability by updating a large fraction of the copies, which provides some (although not significant) load sharing. In the new protocol, transaction processing is shared effectively among nodes storing copies of the data, and both the response time experienced by transactions and the system throughput are improved significantly. The authors analyze the availability of the new protocol and use simulation to study the effect of load sharing on the response time of transactions. They also compare the new protocol with a voting-based scheme.

271 citations
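
The abstract does not spell out the quorum construction, but grid quorums admit a simple formulation; the sketch below is our rendering of the general approach, not necessarily this paper's exact scheme. Reads take one copy per column; writes take one full column plus one copy from every other column, so any two quorums intersect while different transactions can still spread load across rows.

```python
# A sketch of grid-style quorums over an R x C logical grid of copies
# (a common formulation; details may differ from the paper's protocol).

def read_quorum(grid, row=0):
    # One node from each column; varying `row` spreads the read load.
    return {column[row] for column in zip(*grid)}

def write_quorum(grid, full_col=0, row=0):
    cols = list(zip(*grid))
    quorum = set(cols[full_col])                 # all copies in one column
    quorum.update(col[row] for i, col in enumerate(cols) if i != full_col)
    return quorum

grid = [["n1", "n2", "n3"],
        ["n4", "n5", "n6"],
        ["n7", "n8", "n9"]]
print(read_quorum(grid, row=1))    # {'n4', 'n5', 'n6'}
print(write_quorum(grid))          # {'n1', 'n4', 'n7', 'n2', 'n3'}
# Every write quorum contains a full column, so it intersects every read
# quorum (one node per column) and every other write quorum.
```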


Journal ArticleDOI
TL;DR: In this article, a conceptual framework for image information systems is presented, and the design issues for the next generation of active image information system are discussed, with a focus on smart images.
Abstract: A conceptual framework for image information systems is presented. Current research topics are surveyed, and application examples are presented. The design issues for the next generation of active image systems are discussed. It is suggested that the next generation of active image information systems should be designed on the basis of notions of generalized icons and active indexes, resulting in smart images.

253 citations


Journal ArticleDOI
TL;DR: The authors describe how the characteristics of an object-oriented data model, such as object identity, complex object structure, methods, and class hierarchies, have an impact on the design of a query language.
Abstract: The authors describe how the characteristics of an object-oriented data model, such as object identity, complex object structure, methods, and class hierarchies, have an impact on the design of a query language. They also point out major differences with respect to relational query languages. The discussion is supported through the definition of OOPC, a formal object-oriented query language based on predicate calculus, which incorporates in a consistent formal notation most features of existing object-oriented query languages.

159 citations


Journal ArticleDOI
TL;DR: The authors introduce two reorganization primitives, composition and decomposition, which change the population of agents and the distribution of knowledge in an organization, and develop computational organizational self-design techniques for agents with architectures based on production systems.
Abstract: The authors introduce two reorganization primitives, composition and decomposition, which change the population of agents and the distribution of knowledge in an organization. To create these primitives, they formalize organizational knowledge, which represents knowledge of potential and necessary interactions among agents in an organization. The authors develop computational organizational self-design (OSD) techniques for agents with architectures based on production systems to take advantage of the well-understood body of theory and practice. They first extend parallel production systems, where global control exists, into distributed production systems, where problems are solved by a society of agents using distributed control. Then they introduce OSD into distributed production systems to provide adaptive work allocation. Simulation results demonstrate the effectiveness of the approach in adapting to changing environmental demands. The approach affects production system design and improves the ability to build production systems that can adapt to changing real-time constraints.

143 citations


Journal ArticleDOI
TL;DR: The integration of rule systems into database management systems is explored and the focus is on prototype systems that have been completely specified and the implementation issues encountered.
Abstract: The integration of rule systems into database management systems is explored. Research activities in this area over the past decade are surveyed. The focus is on prototype systems that have been completely specified and the implementation issues encountered. A research agenda which should be addressed by the research community over the next few years is presented.

134 citations


Journal ArticleDOI
TL;DR: PRISMA/DB is a parallel, main memory relational database management system (DBMS) whose high performance comes from the use of parallelism for query processing and main memory storage of the entire database.
Abstract: PRISMA/DB, a full-fledged parallel, main memory relational database management system (DBMS), is described. PRISMA/DB's high performance is obtained by the use of parallelism for query processing and main memory storage of the entire database. A flexible architecture for experimenting with functionality and performance is obtained using a modular implementation of the system in an object-oriented programming language. The design and implementation of PRISMA/DB are described in detail. A performance evaluation shows that the system is comparable to other state-of-the-art database machines. The prototype implementation of the system runs on a 100-node parallel multiprocessor.

101 citations


Journal ArticleDOI
TL;DR: The authors address the processing of the declarative OO language WS-OSQL, provided by the fully operational prototype OODB called WS-IRIS, which performs about as fast as current OODBs with procedural interfaces only and is much faster than known relationally complete systems.
Abstract: Object-oriented database systems (OODBs) have created a demand for relationally complete, extensible, and declarative object-oriented query languages. Until now, the runtime performance of such languages was far behind that of procedural OO interfaces. One reason is the internal use of a relational engine with magnetic disk resident databases. The authors address the processing of the declarative OO language WS-OSQL, provided by the fully operational prototype OODB called WS-IRIS. A WS-IRIS database is main memory (MM) resident. The system architecture, data structures, and optimization techniques are designed accordingly. WS-OSQL queries are compiled into an OO extension of Datalog called ObjectLog, providing for objects, typing, overloading, and foreign predicates for extensibility. Cost-based optimizations in WS-IRIS using ObjectLog are presented. Performance tests show that WS-IRIS is about as fast as current OODBs with procedural interfaces only and is much faster than known relationally complete systems. These results would not be possible for a traditional disk-based implementation. However, MM residency of a database appears to be only a necessary condition for better performance. An efficient optimization is of crucial importance as well.

97 citations


Journal ArticleDOI
E. Levy1, A. Silberschatz
TL;DR: An incremental scheme for performing recovery in main memory database systems (MMDBs), in parallel with transaction execution, is presented and a page-based incremental restart algorithm that enables the resumption of transaction processing as soon as the system is up is proposed.
Abstract: Recovery activities, like checkpointing and restart, in traditional database management systems are performed in a quiescent state where no transactions are active. This approach impairs the performance of online transaction processing systems, especially when a large volatile memory is used. An incremental scheme for performing recovery in main memory database systems (MMDBs), in parallel with transaction execution, is presented. A page-based incremental restart algorithm that enables the resumption of transaction processing as soon as the system is up is proposed. Pages are recovered individually and according to the demands of the post-crash transactions. A method for propagating updates from main memory to the backup database on disk is also provided. The emphasis is on decoupling the I/O activities related to the propagation to disk from the forward transaction execution in memory. The authors also construct a high-level recovery manager based on operation logging on top of the page-based algorithms. The proposed algorithms are motivated by the characteristics of large MMDBs, and exploit the technology of nonvolatile RAM.
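
The page-on-demand idea is concrete enough to sketch. Below is a minimal Python rendering (structure and names are ours, and the real algorithm must also handle log ordering and concurrent recovery): after a crash, a page is reconstructed only when a post-crash transaction first touches it, by loading its backup image and replaying its redo records.

```python
# A sketch of on-demand page-based restart (our simplification of the
# idea; real recovery must also order log records and handle concurrency).

class IncrementalRestart:
    def __init__(self, backup, log):
        self.backup = backup    # page_id -> page image in the disk backup
        self.log = log          # page_id -> committed redo records
        self.memory = {}        # pages recovered so far

    def get_page(self, page_id):
        if page_id not in self.memory:            # first post-crash touch
            page = dict(self.backup[page_id])     # load the backup image
            for redo in self.log.get(page_id, []):
                page.update(redo)                 # replay committed updates
            self.memory[page_id] = page
        return self.memory[page_id]

db = IncrementalRestart(backup={"p1": {"x": 0}, "p2": {"y": 5}},
                        log={"p1": [{"x": 42}]})
print(db.get_page("p1"))   # {'x': 42}, recovered on first access
```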

Journal ArticleDOI
TL;DR: The central idea is to compute so-called assumption predicates that express suggested similarities between structures in two schemas to be integrated, and then have a human integrator confirm or reject them.
Abstract: Some of the shortcomings of current view integration methodologies, namely, a low emphasis on full-scale automated systems, a lack of algorithmic specifications of the integration activities, inattention to the design of databases with new properties such as databases for computer-aided design, and insufficient experience with data models with a rich set of type and abstraction mechanisms, are attacked simultaneously. The focus is on design databases for software engineering applications. The approach relies on a semantic model based on structural object-orientation with various features tailored to these applications. The expressiveness of the model is used to take the first steps toward algorithmic solutions, and it is demonstrated how corresponding tools could be embedded methodically within the view integration process and technically within a database design environment. The central idea is to compute so-called assumption predicates that express suggested similarities between structures in two schemas to be integrated, and then have a human integrator confirm or reject them. The basic method is exemplified for the CERM data model that includes molecular aggregation, generalization, and versioning.
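
The confirm-or-reject loop around assumption predicates can be sketched directly; the matching heuristic below is a name-similarity stand-in of our own, whereas the paper computes its predicates from the structure of the semantic (CERM-style) schemas.

```python
# A sketch of suggesting assumption predicates and letting a human
# integrator confirm or reject them. The name-similarity heuristic is a
# stand-in of ours, not the paper's structural analysis.

from difflib import SequenceMatcher

def suggest_assumptions(schema_a, schema_b, threshold=0.5):
    for a in schema_a:
        for b in schema_b:
            score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if score >= threshold:
                yield a, b, score

schema_a = ["Module", "Procedure", "Version"]
schema_b = ["module_unit", "proc", "revision"]
for a, b, score in suggest_assumptions(schema_a, schema_b):
    answer = input(f"Treat '{a}' and '{b}' as the same concept ({score:.2f})? [y/n] ")
    if answer.strip().lower() == "y":
        print(f"recording assumption: {a} ~ {b}")
```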

Journal ArticleDOI
TL;DR: The results show that a memory resident storage component can perform significantly better than a disk-oriented storage component, even when the disk-oriented storage component has all of its data cached in memory.
Abstract: As part of the Starburst extensible database project, the authors have designed and implemented a memory resident storage component that can coexist alongside traditional disk-oriented storage components. The memory resident storage component shares the code of Starburst's common services, such as query optimization, plan generation, query evaluation, record manipulation, and transaction management. The design of Starburst's memory resident storage component is discussed and contrasted with Starburst's default disk-oriented storage component, and the performance of the two storage components is compared using the Wisconsin Benchmarks. The results show that a memory resident storage component can perform significantly better than a disk-oriented storage component, even when the disk-oriented storage component has all of its data cached in memory. The benchmark results show that, by using memory resident techniques, overall query execution can be improved by up to a factor of four.

Journal ArticleDOI
TL;DR: The design and a prototypical implementation of COMPLEX, which is a logic-based system extended with concepts from the object-oriented paradigm and is intended as a tool for the development of knowledge-based applications, are described.
Abstract: The design and a prototypical implementation of COMPLEX, which is a logic-based system extended with concepts from the object-oriented paradigm and is intended as a tool for the development of knowledge-based applications, are described. The system supports a logic language, called Complex-Datalog (C-Datalog), enhanced by semantic constructs to provide facility for data abstraction. Its implementation is based on a bottom-up computational model that guarantees a fully declarative style of programming. However, the user is also given the possibility of running a query using a top-down model of computation. Efficiency of execution is the result of the integration of different novel technologies for the compilation and the execution of queries.
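
The bottom-up computational model mentioned above is the classic fixpoint evaluation of Datalog, which can be sketched in a few lines (this shows the evaluation style only; C-Datalog's object-oriented constructs are not modeled here).

```python
# A minimal sketch of bottom-up (fixpoint) evaluation on an ancestor
# program; it illustrates the evaluation style, not C-Datalog itself.

def bottom_up(facts, rules):
    """facts: set of (pred, arg1, arg2); rules: functions mapping the
    current fact set to derivable facts. Iterate until nothing new."""
    known = set(facts)
    while True:
        new = set().union(*(rule(known) for rule in rules)) - known
        if not new:
            return known
        known |= new

def base(facts):    # ancestor(X, Y) :- parent(X, Y).
    return {("ancestor", x, y) for (p, x, y) in facts if p == "parent"}

def trans(facts):   # ancestor(X, Z) :- ancestor(X, Y), parent(Y, Z).
    anc = {(x, y) for (p, x, y) in facts if p == "ancestor"}
    par = {(x, y) for (p, x, y) in facts if p == "parent"}
    return {("ancestor", x, z) for (x, y) in anc for (y2, z) in par if y == y2}

facts = {("parent", "ann", "bob"), ("parent", "bob", "cal")}
print(sorted(t for t in bottom_up(facts, [base, trans]) if t[0] == "ancestor"))
# [('ancestor', 'ann', 'bob'), ('ancestor', 'ann', 'cal'), ('ancestor', 'bob', 'cal')]
```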

Journal ArticleDOI
TL;DR: A superimposed coding method, frame-sliced signature file, is proposed, and the performance of this method is studied and compared with that of other signature file methods.
Abstract: A superimposed coding method, the frame-sliced signature file, is proposed, and the performance of this method is studied and compared with that of other signature file methods. The response time of the method is improved due to its ability to effectively partition the signature file so that fewer random disk accesses are required on both retrieval and insertion, while the good characteristics of conventional signature files, i.e., low space overhead, low maintenance cost, and the write-once property, are retained. The generalized version of the method is shown to be a unified framework for several popular signature file methods including the sequential signature file (SSF) method, the bit-sliced signature file (BSSF) method, and its enhanced version B'SSF. A prototype system was implemented on UNIX workstations in the C language. Experimental results on a 2.8-Mb database consisting of 2800 technical reports and a 28-Mb database with 28000 technical reports are presented.
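
The core coding scheme is straightforward to sketch. In the version below (parameter choices and hashing are illustrative, not the paper's tuning), each word is hashed to a single frame of the signature and sets a few bits inside it, so a single-word query needs to inspect only that frame.

```python
# A sketch of frame-sliced superimposed coding; parameters are
# illustrative, not the paper's tuned values.

import hashlib

FRAMES, FRAME_BITS, BITS_PER_WORD = 4, 16, 3

def word_signature(word):
    digest = hashlib.md5(word.encode()).digest()
    frame = digest[0] % FRAMES                      # the word's single frame
    sig = 0
    for i in range(BITS_PER_WORD):                  # set bits inside it
        sig |= 1 << (frame * FRAME_BITS + digest[i + 1] % FRAME_BITS)
    return sig

def doc_signature(words):
    sig = 0
    for w in words:
        sig |= word_signature(w)                    # superimpose (OR) codes
    return sig

doc = doc_signature(["main", "memory", "database"])
query = word_signature("memory")
print(doc & query == query)   # True: the document may contain "memory"
```

Because all bits for a word fall in one frame, retrieval touches only the slices of that frame, which is the source of the reduced random disk accesses claimed above.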

Journal ArticleDOI
TL;DR: The semantics of set operations are not adequate for the richer data models of object-based database systems that include object-oriented and semantic data modeling concepts, so a framework for executing set theoretic operations on the class construct is proposed.
Abstract: The semantics of set operations are not adequate for the richer data models of object-based database systems that include object-oriented and semantic data modeling concepts. The reason is that precise semantics of set operations on complex objects require a clear distinction between the dual notions of a set and a type, both of which are present in the class construct found in object-based data models. This gap is filled here by a framework for executing set theoretic operations on the class construct. The proposed set operations, including set difference, union, intersection, and symmetric difference, determine both the type description of the derived class as well as its set membership. For the former, inheritance rules are developed for property characteristics such as single-valued versus multivalued and required versus optional. For the latter, the object identity concept borrowed from data modeling research is employed. The framework allows for property inheritance among classes that are not necessarily IS-A related.
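
To make the two outputs of a class set operation concrete, here is a sketch for the union case. Note that the inheritance rules encoded below (required only if required in both operands, multivalued if multivalued in either) are illustrative assumptions of ours, not the rules developed in the paper.

```python
# A sketch of deriving the type description of a union class. The rules
# here are our illustrative assumptions, not the paper's inheritance rules.

def union_type(props_a, props_b):
    """props: dict name -> (required, multivalued). Keep properties common
    to both operands; assume required only if required in both and
    multivalued if multivalued in either."""
    out = {}
    for name in sorted(props_a.keys() & props_b.keys()):
        req_a, multi_a = props_a[name]
        req_b, multi_b = props_b[name]
        out[name] = (req_a and req_b, multi_a or multi_b)
    return out

student = {"name": (True, False), "courses": (True, True)}
employee = {"name": (True, False), "courses": (False, True), "salary": (True, False)}
print(union_type(student, employee))
# {'courses': (False, True), 'name': (True, False)}
```

Set membership of the derived class would then be determined by object identity, per the abstract, rather than by value equality.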

Journal ArticleDOI
TL;DR: The proposed protocols use the minimum number of message exchanges and can tolerate the maximum number of allowable faulty components to make each fault-free processor reach a common agreement for the cases of processor failure, link failure, or processor and link failure.
Abstract: Traditionally, the problems of Byzantine agreement, consensus, and interactive consistency are studied in a fully connected network with processors in malicious failure only. Such problems are reexamined with the assumption of malicious faults on both processors and links. The proposed protocols use the minimum number of message exchanges and can tolerate the maximum number of allowable faulty components to make each fault-free processor reach a common agreement for the cases of processor failure, link failure, or processor and link failure.

Journal ArticleDOI
TL;DR: A basic set of principles from concurrent engineering is synthesized; the combined model is applied to the system requirement phase, and a framework for software process reengineering is suggested.
Abstract: A basic set of principles from concurrent engineering is synthesized. These principles, when coupled with COSMOS, a management model, can be a very powerful tool in helping to reengineer the software development process. The combined model is applied to the system requirement phase, and a framework for software process reengineering is suggested.

Journal ArticleDOI
TL;DR: Evaluation using a model of a distributed database indicates that the heuristic strategies prepared in a background mode are near optimal, and suggests that it is usually correct to abort creation of an intermediate relation which is much larger than predicted.
Abstract: Most algorithms for determining query processing strategies in distributed databases are static in nature; that is, the strategy is completely determined on the basis of a priori estimates of the size of intermediate results, and it remains unchanged throughout its execution. The static approach may be far from optimal because it denies the opportunity to reschedule operations if size estimates are found to be inaccurate. Adaptive query execution may be used to alleviate this problem. A low overhead delay method is proposed to decide when to correct a strategy. Sampling is used to estimate the size of relations, and alternative heuristic strategies, prepared in a background mode, are used to perform the correction. Evaluation using a model of a distributed database indicates that the heuristic strategies are near optimal. Moreover, it also suggests that it is usually correct to abort creation of an intermediate relation which is much larger than predicted.
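
The decision point is easy to sketch: after materializing a sample, extrapolate the size of the intermediate relation and abandon the strategy if it is far above the optimizer's estimate. The threshold and names below are ours, chosen only for illustration.

```python
# A sketch of the abort-and-correct test (threshold is illustrative).

def should_correct(predicted_size, sample_count, sample_fraction,
                   blowup_threshold=5.0):
    estimated_size = sample_count / sample_fraction
    return estimated_size > blowup_threshold * predicted_size

# The optimizer predicted 1,000 tuples, but a 1% sample already yielded
# 150 tuples (~15,000 extrapolated), so switch to a background strategy.
if should_correct(predicted_size=1000, sample_count=150, sample_fraction=0.01):
    print("abort current plan; adopt alternative prepared in background")
```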

Journal ArticleDOI
Scott Danforth1, Patrick Valduriez
TL;DR: The paper provides an overall description of FAD and discusses the design rationale behind a number of its distinguishing features; comparisons with other database programming languages are provided.
Abstract: FAD is a strongly typed database programming language designed for uniformly manipulating transient and persistent data on Bubba, a parallel database system developed at MCC. The paper provides an overall description of FAD, and discusses the design rationale behind a number of its distinguishing features. Comparisons with other database programming languages are provided.

Journal ArticleDOI
TL;DR: A stochastic learning algorithm based on simulated annealing in weight space is presented and the authors verify the convergence properties and feasibility of the algorithm.
Abstract: The authors discuss the requirements of learning for generalization, where the traditional methods based on gradient descent have limited success. A stochastic learning algorithm based on simulated annealing in weight space is presented. The authors verify the convergence properties and feasibility of the algorithm. An implementation of the algorithm and validation experiments are described.
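
Simulated annealing in weight space has a standard shape, sketched below (the proposal distribution and cooling schedule are illustrative choices of ours): perturb the weights at random and accept the move with the Metropolis rule, so uphill moves in training error remain possible while the temperature is high.

```python
# A sketch of stochastic learning by simulated annealing in weight space;
# the schedule and perturbation here are illustrative choices.

import math, random

def anneal(weights, loss, steps=5000, t0=1.0, cooling=0.999, scale=0.1):
    current = loss(weights)
    temperature = t0
    for _ in range(steps):
        candidate = [w + random.gauss(0, scale) for w in weights]
        delta = loss(candidate) - current
        # Metropolis rule: always accept improvements, sometimes accept
        # worse candidates, with probability decaying as we cool.
        if delta < 0 or random.random() < math.exp(-delta / temperature):
            weights, current = candidate, current + delta
        temperature *= cooling
    return weights, current

loss = lambda w: (w[0] - 1) ** 2 + (w[1] + 2) ** 2   # toy error surface
print(anneal([0.0, 0.0], loss))   # weights near (1, -2), small residual error
```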

Journal ArticleDOI
TL;DR: Based on the paradigm of collective learning systems, ALIAS (adaptive learning image analysis system) is an adaptive image-processing engine specifically designed to detect anomalies in otherwise normal images and signals.
Abstract: Based on the paradigm of collective learning systems, ALIAS (adaptive learning image analysis system) is an adaptive image-processing engine specifically designed to detect anomalies in otherwise normal images and signals. To accomplish this, ALIAS requires only one pass through a training set, which typically consists of less than 100 samples. The original version of ALIAS

Journal ArticleDOI
W. Sull1, R.L. Kashyap1
TL;DR: An overall scheme for schema translation and schema integration with an object-oriented data model as the common data model is proposed, and it is shown that integrated schemata can be maintained effortlessly by propagating updates in local schemata to integrated schemata unambiguously.
Abstract: The self-organizing knowledge representation aspects in heterogeneous information environments involving object-oriented databases, relational databases, and rulebases are investigated. The authors consider a facet of self-organizability which sustains the structural semantic integrity of an integrated schema regardless of the dynamic nature of local schemata. To achieve this objective, they propose an overall scheme for schema translation and schema integration with an object-oriented data model as the common data model, and it is shown that integrated schemata can be maintained effortlessly by propagating updates in local schemata to integrated schemata unambiguously.

Journal ArticleDOI
T.-H. Chang, E. Sciore1
TL;DR: A new data model that incorporates standard concepts from semantic data models such as entities, aggregations, and ISA hierarchies is introduced and it is shown how nonnavigational queries and updates can be interpreted in this model.
Abstract: Two important features of modern database models are support for complex data structures and support for high-level data retrieval and update. The first issue has been studied by the development of various semantic data models; the second issue has been studied through universal relation data models. How the advantages of these two approaches can be combined is presently examined. A new data model that incorporates standard concepts from semantic data models such as entities, aggregations, and ISA hierarchies is introduced. It is then shown how nonnavigational queries and updates can be interpreted in this model. The main contribution is to demonstrate how universal relation techniques can be extended to a more powerful data model. Moreover, the semantic constructs of the model allow one to eliminate many of the limitations of previous universal relation models.

Journal ArticleDOI
TL;DR: A learning model for designing heuristics automatically under resource constraints is studied; it is based on testing a population of competing HMs for an application problem and switches from one to another dynamically, depending on the outcome of previous tests.
Abstract: A learning model for designing heuristics automatically under resource constraints is studied. The focus is on improving performance-related heuristic methods (HMs) in knowledge-lean application domains. It is assumed that learning is episodic, that the performance measures of an episode are dependent only on the final state reached in evaluating the corresponding test case, and that the aggregate performance measures of the HMs involved are independent of the order of evaluation of test cases. The learning model is based on testing a population of competing HMs for an application problem, and switches from one to another dynamically, depending on the outcome of previous tests. Its goal is to find a good HM within the resource constraints, with proper tradeoff between cost and quality. It extends existing work on classifier systems by addressing issues related to delays in feedback, scheduling of tests of HMs under limited resources, anomalies in performance evaluation, and scalability of HMs. Experience in applying the learning method is described.
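
The test-and-switch loop can be sketched as a simple resource-bounded selection among competing HMs; the exploration policy below is a simplification of ours, not the paper's scheduling strategy.

```python
# A sketch of selecting a heuristic method (HM) under a test budget; the
# exploration policy is a simplification, not the paper's scheduler.

import random

def select_hm(heuristics, test_cases, budget, explore=0.2):
    totals = [0.0] * len(heuristics)
    counts = [0] * len(heuristics)
    for step in range(budget):
        if step < len(heuristics) or random.random() < explore:
            i = step % len(heuristics)            # keep every HM tested
        else:                                     # otherwise exploit the best
            i = max(range(len(heuristics)), key=lambda k: totals[k] / counts[k])
        totals[i] += heuristics[i](random.choice(test_cases))
        counts[i] += 1
    return max(range(len(heuristics)), key=lambda k: totals[k] / counts[k])

hms = [lambda c: 0.5 * c, lambda c: 0.8 * c]      # toy HMs scored per case
print(select_hm(hms, test_cases=[1.0, 2.0], budget=50))   # usually 1
```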

Journal ArticleDOI
TL;DR: The Genstring, a linear iconic index which can be used to represent two-, three-, or higher-dimensional scenes, is introduced and provides a compact, unambiguous representation of a three-dimensional scene.
Abstract: Several iconic indexes for representing three-dimensional scenes are presented. The approach extends previous work in iconic indexing of two-dimensional scenes in a unified manner. Good characteristics for iconic indexes are also pointed out. OPP2 and OPP3, two-dimensional iconic indexes for three-dimensional scenes, are introduced. The problem of ambiguity in the OPP2 and OPP3 representations of three-dimensional scenes is studied in detail and a class of images for which they are unambiguous is identified. The Genstring, a linear iconic index which can be used to represent two-, three-, or higher-dimensional scenes, is introduced. It provides a compact, unambiguous representation of a three-dimensional scene. The Genstring takes advantage of previous work, thus providing fast pattern matching for higher-dimensional scenes. In fact, the pattern matching algorithm given for k-dimensional scenes is as fast as that previously given for two-dimensional scenes.
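
The starting point being extended here is Chang's 2-D string, which can be sketched compactly (our simplified rendering; OPP2, OPP3, and the Genstring add projections and disambiguation for three and more dimensions on top of this idea): symbols are ordered along each axis, with '<' separating distinct positions and '=' joining symbols that share one.

```python
# A sketch of the classic 2-D string construction that the indexes above
# generalize; OPP2/OPP3 and the Genstring build on this idea.

def axis_string(objects, axis):
    """objects: list of (name, x, y). Returns the 1-D symbolic projection
    along the given axis (0 for x, 1 for y)."""
    groups = {}
    for name, *coords in objects:
        groups.setdefault(coords[axis], []).append(name)
    return "<".join("=".join(sorted(groups[c])) for c in sorted(groups))

scene = [("house", 0, 1), ("tree", 2, 1), ("sun", 2, 3)]
print(axis_string(scene, axis=0))   # house<sun=tree  (left-to-right order)
print(axis_string(scene, axis=1))   # house=tree<sun  (bottom-to-top order)
```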

Journal ArticleDOI
TL;DR: A data model that allows for the storage of detailed change history in so-called backlog relations and its extended relational algebra, in conjunction with the extended data structures, provides a powerful tool for the retrieval of patterns and exceptions in change history.
Abstract: A data model that allows for the storage of detailed change history in so-called backlog relations is described. Its extended relational algebra, in conjunction with the extended data structures, provides a powerful tool for the retrieval of patterns and exceptions in change history. An operator, Σ, based on the notion of compact active domain, is introduced. It groups data not in predefined groups but in groups that fit the data. This operator further expands the retrieval capabilities of the algebra. The expressive power of the algebra is demonstrated by examples, some of which show how patterns and exceptions in change history can be detected. Sample applications of this work are statistical and scientific databases, monitoring (of databases, manufacturing plants, power plants, etc.), CAD, and CASE.
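
A backlog relation can be sketched as an append-only log of timestamped change requests from which the current state is derivable by replay (the representation below is ours; the paper's algebra operates over such relations, and the Σ operator additionally groups values by how they cluster in the active domain).

```python
# A sketch of a backlog relation as an append-only change log; the
# representation is ours, simplified from the paper's data model.

import itertools

class BackloggedRelation:
    def __init__(self):
        self.clock = itertools.count(1)
        self.backlog = []                     # (time, op, tuple) records

    def insert(self, tup):
        self.backlog.append((next(self.clock), "insert", tup))

    def delete(self, tup):
        self.backlog.append((next(self.clock), "delete", tup))

    def current(self):
        state = set()
        for _, op, tup in self.backlog:       # replay the change history
            (state.add if op == "insert" else state.discard)(tup)
        return state

r = BackloggedRelation()
r.insert(("sensor1", 20)); r.insert(("sensor1", 25)); r.delete(("sensor1", 20))
print(r.current())    # {('sensor1', 25)}
print(r.backlog)      # full history, available for pattern/exception queries
```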

Journal ArticleDOI
X. Wu1, T. Ichikawa1
TL;DR: A knowledge-based database assistant (KDA) which integrates a natural language query system with a skeleton-based query guiding facility is described, and a semantic network model, S-Net, is introduced to represent the knowledge for natural language query processing and skeleton generation.
Abstract: A knowledge-based database assistant (KDA) which integrates a natural language query system with a skeleton-based query guiding facility is provided. When a user works with the KDA natural language query system, the query guiding facility can supply several kinds of skeletons to guide users in performing database retrieval tasks. A semantic network model, S-Net, is introduced to represent the knowledge for natural language query processing and skeleton generation. Methods for implementing the system are discussed.

Journal ArticleDOI
A.J. Pasik1
TL;DR: By creating constrained copies of culprit rules and distributing them to their own processors, more parallelism is achieved, as evidenced by increased speedup, an effect shown to be specific to rule-based systems with certain characteristics.
Abstract: Rule-based systems have been hypothesized to contain only minimal parallelism. However, techniques to extract more parallelism from existing systems are being investigated. Among these methods, it is desirable to find those which balance the work being performed in parallel evenly among the rules, while decreasing the amount of work being performed sequentially in each cycle. The automatic transformation of creating constrained copies of culprit rules accomplishes both of the above goals. Rule-based systems are plagued by occasional rules which slow down the entire execution. These culprit rules require much more processing than others, causing other processors to idle while they continue to match. By creating constrained copies of culprit rules and distributing them to their own processors, more parallelism is achieved, as evidenced by increased speedup. This effect is shown to be specific to rule-based systems with certain characteristics. These characteristics are identified as being common within an important class of rule-based systems: expert database systems.
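
Copy-and-constrain is mechanical enough to sketch: a culprit rule is replaced by k copies whose conditions are identical except for an added constraint restricting a join variable to one slice of its domain, so the copies match disjoint portions of working memory on separate processors. The hash-based partitioning below is our illustrative choice.

```python
# A sketch of copy-and-constrain; the hash partitioning is illustrative.

def constrain_copies(rule_match, key, k):
    """rule_match: predicate over a working-memory element (a dict).
    Returns k copies; copy i accepts only elements whose `key` attribute
    hashes into partition i, so workloads are disjoint."""
    def make_copy(i):
        return lambda wme: hash(wme[key]) % k == i and rule_match(wme)
    return [make_copy(i) for i in range(k)]

culprit = lambda wme: wme["status"] == "active"    # matches many elements
copies = constrain_copies(culprit, key="id", k=4)  # one copy per processor
wmes = [{"id": n, "status": "active"} for n in range(8)]
for i, copy in enumerate(copies):
    print(i, [w["id"] for w in wmes if copy(w)])   # disjoint match sets
```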

Journal ArticleDOI
TL;DR: The authors suggest that the software engineering tools of the future will have to rely on: deep representation to capture a sufficiently large part of knowledge about programming in general and particular programs; inspection methods to deal with complexity; and intelligent assistance.
Abstract: Most software engineering tools use a shallow representation of software objects and manipulate this representation using procedural methods. This approach allows one to get off to a fast start and quickly provides a tool that delivers benefits. However, a point will be reached where more knowledge-intensive approaches will be needed to achieve significantly higher levels of capability. The authors suggest that the software engineering tools of the future will have to rely on: deep representation to capture a sufficiently large part of knowledge about programming in general and particular programs; inspection methods to deal with complexity; and intelligent assistance.