Showing papers in "IEEE Transactions on Knowledge and Data Engineering in 1993"

PDF

Open Access

Journal Article•DOI•

Database mining: a performance perspective

[...]

Rakesh Agrawal¹, Tomasz Imielinski¹, Arun N. Swami¹•Institutions (1)

01 Dec 1993-IEEE Transactions on Knowledge and Data Engineering

TL;DR: The authors' perspective of database mining as the confluence of machine learning techniques and the performance emphasis of database technology is presented and an algorithm for classification obtained by combining the basic rule discovery operations is given.

...read moreread less

Abstract: The authors' perspective of database mining as the confluence of machine learning techniques and the performance emphasis of database technology is presented. Three classes of database mining problems involving classification, associations, and sequences are described. It is argued that these problems can be uniformly viewed as requiring discovery of rules embedded in massive amounts of data. A model and some basic operations for the process of rule discovery are described. It is shown how the database mining problems considered map to this model, and how they can be solved by using the basic operations proposed. An example is given of an algorithm for classification obtained by combining the basic rule discovery operations. This algorithm is efficient in discovering classification rules and has accuracy comparable to ID3, one of the best current classifiers. >

...read moreread less

1,539 citations

Journal Article•DOI•

Data-driven discovery of quantitative rules in relational databases

[...]

Jiawei Han¹, Y. Cai¹, Nick Cercone¹•Institutions (1)

Simon Fraser University¹

01 Feb 1993-IEEE Transactions on Knowledge and Data Engineering

TL;DR: It is shown that attribute-oriented induction provides an efficient and effective mechanism for learning various kinds of knowledge rules from relational databases.

...read moreread less

Abstract: A quantitative rule is a rule associated with quantitative information which assesses the representativeness of the rule in the database. An efficient induction method is developed for learning quantitative rules in relational databases. With the assistance of knowledge about concept hierarchies, data relevance, and expected rule forms, attribute-oriented induction can be performed on the database, which integrates database operations with the learning process and provides a simple, efficient way of learning quantitative rules from large databases. The method involves the learning of both characteristic rules and classification rules. Quantitative information facilitates quantitative reasoning, incremental learning, and learning in the presence of noise. Moreover, learning qualitative rules can be treated as a special case of learning quantitative rules. It is shown that attribute-oriented induction provides an efficient and effective mechanism for learning various kinds of knowledge rules from relational databases. >

...read moreread less

425 citations

Journal Article•DOI•

OVID: design and implementation of a video-object database system

[...]

E. Oomoto¹, Katsumi Tanaka•Institutions (1)

Kyoto Sangyo University¹

01 Aug 1993-IEEE Transactions on Knowledge and Data Engineering

TL;DR: The OVID system offers a browsing/inspection tool called VideoChart, and adhoc query facility called VideoSQL, and a video-object definition tool.

...read moreread less

Abstract: A video-object data model and the design and implementation of a prototype video-object database system named OVID based on the model are described. Notable features of the video-object data model are a mechanism to share common descriptional data among video-objects, called interval-inclusion based inheritance, and operations to composite video-objects. The OVID system offers a browsing/inspection tool called VideoChart, and adhoc query facility called VideoSQL, and a video-object definition tool. >

...read moreread less

400 citations

Journal Article•DOI•

Interval-based conceptual models for time-dependent multimedia data

[...]

Thomas D. C. Little¹, Arif Ghafoor²•Institutions (2)

Boston University¹, University College West²

01 Aug 1993-IEEE Transactions on Knowledge and Data Engineering

TL;DR: N-ary and reverse temporal relations are introduced and defined along with their temporal constraints to ensure a property of monotonically increasing playout deadlines to facilitate both real-time deadline-driven playout scheduling or optimistic interval-based process playout.

...read moreread less

Abstract: Multimedia data often have time dependencies that must be satisfied at presentation time. To support a general-purpose multimedia information system, these timing relationships must be managed to provide utility to both the data presentation system and the multimedia author. New conceptual models for capturing these timing relationships, and managing them as part of a database are proposed. Specifically, n-ary and reverse temporal relations are introduced and defined along with their temporal constraints. These new relations are a generalization of earlier temporal models and establish the basis for conceptual database structures and temporal access control algorithms to facilitate forward, reverse, and partial-interval evaluation during multimedia object playout. The proposed relations are defined to ensure a property of monotonically increasing playout deadlines to facilitate both real-time deadline-driven playout scheduling or optimistic interval-based process playout. A translation of the conceptual models to a structure suitable for a relational database is presented. >

...read moreread less

323 citations

Journal Article•DOI•

Systems for knowledge discovery in databases

[...]

Christopher J. Matheus, Philip K. Chan¹, Gregory Piatetsky-Shapiro•Institutions (1)

Columbia University¹

01 Dec 1993-IEEE Transactions on Knowledge and Data Engineering

TL;DR: A model of an idealized knowledge-discovery system is presented as a reference for studying and designing new systems and is used in the comparison of three systems: CoverStory, EXPLORA, and the Knowledge Discovery Workbench.

...read moreread less

Abstract: Knowledge-discovery systems face challenging problems from real-world databases, which tend to be dynamic, incomplete, redundant, noisy, sparse, and very large. These problems are addressed and some techniques for handling them are described. A model of an idealized knowledge-discovery system is presented as a reference for studying and designing new systems. This model is used in the comparison of three systems: CoverStory, EXPLORA, and the Knowledge Discovery Workbench. The deficiencies of existing systems relative to the model reveal several open problems for future research. >

...read moreread less

278 citations

Journal Article•DOI•

Efficient storage techniques for digital continuous multimedia

[...]

P.V. Rangan¹, Harrick M. Vin¹•Institutions (1)

University of California, San Diego¹

01 Aug 1993-IEEE Transactions on Knowledge and Data Engineering

TL;DR: The problem of collocational storage of media strands, which are sequences of continuously recorded audio samples or video frames, on disk to support the integration of storage and transmission of multimedia data with computing is examined and a model that relates disk and device characteristics to the playback rates ofMedia strands and derives storage patterns so as to guarantee continuous retrieval of media strand is presented.

...read moreread less

Abstract: The problem of collocational storage of media strands, which are sequences of continuously recorded audio samples or video frames, on disk to support the integration of storage and transmission of multimedia data with computing is examined. A model that relates disk and device characteristics to the playback rates of media strands and derives storage patterns so as to guarantee continuous retrieval of media strands is presented. To efficiently utilize the disk space, mechanisms for merging storage patterns of multiple media strands by filling the gaps between media blocks of one strand with media blocks of other strands are developed. Both an online algorithm suitable for merging a new media strand into a set of already stored strands and an offline merging algorithm that can be applied a priori to the storage of a set of media strands before any of them have been stored on disk are proposed. As a consequence of merging, storage patterns of media strands may become perturbed slightly. To compensate for this read-ahead and buffering are required so that continuity of retrieval remains satisfied are also presented. >

...read moreread less

234 citations

Journal Article•DOI•

Continuous retrieval of multimedia data using parallelism

[...]

Shahram Ghandeharizadeh¹, Luis Ramos¹•Institutions (1)

University of Southern California¹

01 Aug 1993-IEEE Transactions on Knowledge and Data Engineering

TL;DR: A parallel multimedia information system and the key technical ideas that enable it to support a real-time display of multimedia objects are described, and two alternative approaches are described to support simultaneous display of several multimedia objects for different users.

...read moreread less

Abstract: Most implementations of workstation-based multimedia information systems cannot support a continuous display of high resolution audio and video data and suffer from frequent disruptions and delays termed hiccups. This is due to the low I/O bandwidth of the current disk technology, the high bandwidth requirement of multimedia objects, and the large size of these objects, which requires them to be almost always disk resident. A parallel multimedia information system and the key technical ideas that enable it to support a real-time display of multimedia objects are described. In this system, a multimedia object across several disk drives is declustered, enabling the system to utilize the aggregate bandwidth of multiple disks to retrieve an object in real-time. Then, the workload of an application is distributed evenly across the disk drives to maximize the processing capability of the system. To support simultaneous display of several multimedia objects for different users, two alternative approaches are described. The first approach multitasks a disk drive among several requests while the second replicates the data and dedicates resources to each individual request. The trade-offs associated with each approach are investigated using a simulation model. >

...read moreread less

146 citations

Journal Article•DOI•

Eliciting knowledge and transferring it effectively to a knowledge-based system

[...]

Brian R. Gaines¹, Mildred L. G. Shaw¹•Institutions (1)

University of Calgary¹

01 Feb 1993-IEEE Transactions on Knowledge and Data Engineering

TL;DR: The knowledge acquisition bottleneck impeding theDevelopment of expert systems is being alleviated by the development of computer-based knowledge acquisition tools, which work directly with experts to elicit knowledge, and structure it appropriately to operate as a decision support tool within an expert system.

...read moreread less

Abstract: The knowledge acquisition bottleneck impeding the development of expert systems is being alleviated by the development of computer-based knowledge acquisition tools. These work directly with experts to elicit knowledge, and structure it appropriately to operate as a decision support tool within an expert system. However, the elicitation of expert knowledge and its effective transfer to a useful knowledge-based system is complex and involves diverse activities. The complete development of a decision support system using knowledge acquisition tools is illustrated. The example is simple enough to be completely analyzed but exhibits enough real-world characteristics to give significant insights into the processes and problems of knowledge engineering. >

...read moreread less

135 citations

Journal Article•DOI•

A visual information management system for the interactive retrieval of faces

[...]

Jeffrey Bach¹, S. Paul¹, Ramesh Jain¹•Institutions (1)

University of Michigan¹

01 Aug 1993-IEEE Transactions on Knowledge and Data Engineering

TL;DR: A general architecture for visual information-management systems (VIMS), which combine the strengths of both approaches, is presented, and a VIMS developed for face-image retrieval is presented to demonstrate these ideas.

...read moreread less

Abstract: The complex nature of two-dimensional image data has presented problems for traditional information systems designed strictly for alphanumeric data. Systems aimed at effectively managing image data have generally approached the problem from two different views: They either possess a strong database component with little image understanding, or they serve as an image repository for computer vision applications, with little emphasis on the image retrieval process. A general architecture for visual information-management systems (VIMS), which combine the strengths of both approaches, is presented. The system utilizes computer vision routines for both insertion and retrieval and allows easy query-by-example specifications. The vision routines are used to segment and evaluate objects based on domain-knowledge describing the objects and their attributes. The vision system can then assign feature values to be used for similarity-measures and image retrieval. A VIMS developed for face-image retrieval is presented to demonstrate these ideas. >

...read moreread less

132 citations

Journal Article•DOI•

APPROXIMATE-a query processor that produces monotonically improving approximate answers

[...]

Susan V. Vrbsky¹, Jialu Liu²•Institutions (2)

University of Alabama¹, University of Illinois at Urbana–Champaign²

01 Dec 1993-IEEE Transactions on Knowledge and Data Engineering

TL;DR: APPROXIMATE, a query processor that makes approximate answers available if part of the database is unavailable, or if there is not enough time to produce an exact answer, is described.

...read moreread less

Abstract: APPROXIMATE, a query processor that makes approximate answers available if part of the database is unavailable, or if there is not enough time to produce an exact answer, is described. The processor implements approximate query processing, and the accuracy of the approximate result produced improves monotonically with the amount of data retrieved to produce the result. The monotone query processing algorithm of APPROXIMATE works within a standard relational algebra framework. APPROXIMATE maintains semantic information for approximate query processing at an underlying level, and can be implemented on a relational database system with little change to the relational architecture. It is shown how APPROXIMATE is implemented to make effective use of the semantic support. The additional overhead required by APPROXIMATE is described. >

...read moreread less

121 citations

Journal Article•DOI•

The knowledge-based object-oriented PICQUERY/sup +/ language

[...]

Alfonso F. Cardenas, I.T. Ieong, Ricky K. Taira, R. Barker, C.M. Breant - Show less +1 more

01 Aug 1993-IEEE Transactions on Knowledge and Data Engineering

TL;DR: The PICQUERY/sup +/ language and its underlying stacked image data model are enhanced with major advances that include convenient specification of the data domain space among a multimedia database federation, visualization of underlying data models, knowledge-based hierarchies, and domain rules.

...read moreread less

Abstract: PICQUERY/sup +/, a high-level domain-independent query language for pictorial and alphanumeric database management, is introduced. The PICQUERY/sup +/ language and its underlying stacked image data model are enhanced with major advances that include: convenient specification of the data domain space among a multimedia database federation, visualization of underlying data models, knowledge-based hierarchies, and domain rules, understanding of high-level abstract data types, ability to perform data object matches based on imprecise or fuzzy descriptors, imprecise relational correlators, and temporal and object evolutionary events, specification of alphanumeric and image processing algorithms on data, and specification of alphanumeric and image visualization methods for user presentation. The power of PICQUERY/sup +/ is illustrated using examples drawn from the medical imaging domain. A graphical menu-driven user interface is demonstrated for this domain as an example of the menu interface capabilities of PICQUERY/sup +/. >

...read moreread less

Journal Article•DOI•

Abstract-driven pattern discovery in databases

[...]

Vasant Dhar¹, A. Tuzhulin¹•Institutions (1)

New York University¹

01 Dec 1993-IEEE Transactions on Knowledge and Data Engineering

TL;DR: Methods are presented for discovering interesting patterns based on abstracts which are summaries of the data expressed in the language of the user, including relational views and classification hierarchies.

...read moreread less

Abstract: The problem of discovering interesting patterns in large volumes of data is studied. Patterns can be expressed not only in terms of the database schema but also in user-defined terms, such as relational views and classification hierarchies. The user-defined terminology is stored in a data dictionary that maps it into the language of the database schema. A pattern is defined as a deductive rule expressed in user-defined terms that has a degree of uncertainty associated with it. Methods are presented for discovering interesting patterns based on abstracts which are summaries of the data expressed in the language of the user. >

...read moreread less

Journal Article•DOI•

Investigating the applicability of Petri nets for rule-based system verification

[...]

Derek L. Nazareth¹•Institutions (1)

University of Wisconsin-Madison¹

01 Jun 1993-IEEE Transactions on Knowledge and Data Engineering

TL;DR: An approach to RBS verification in which the system is modeled as a Petri net on which error detection is performed is presented, and a set of propositions are formulated to locate errors of redundancy, conflict, circularity, and gaps in domain knowledge.

...read moreread less

Abstract: It is suggested that as rule-based system (RBS) technology gains wider acceptance, the need to create and maintain large knowledge bases will assume greater importance. Demonstrating a rule base to be free from error remains one of the obstacles to the adoption of this technology. An approach to RBS verification in which the system is modeled as a Petri net on which error detection is performed is presented. A set of propositions is formulated to locate errors of redundancy, conflict, circularity, and gaps in domain knowledge. Rigorous proofs of these propositions are provided. Difficulties in implementing a Petri net-based verifier and the potential restrictions of the applicability of this approach are discussed. >

...read moreread less

Journal Article•DOI•

Delay compensation protocols for synchronization of multimedia data streams

[...]

Kaliappa Ravindran¹, V. Bansal¹•Institutions (1)

Kansas State University¹

01 Aug 1993-IEEE Transactions on Knowledge and Data Engineering

TL;DR: End-to-end transport protocols that compensate for the data skew that may arise due to data loss/delay is described and major uses of the protocols are in broadband ISDNs and metropolitan area networks (MANs).

...read moreread less

Abstract: Temporal synchronization of various data streams in multimedia information supporting voice, video, graphics and text, that are exchanged between users over a high speed network are discussed. During delivery of such data, maintaining the required association between data units across various streams in real-time is necessary to sustain quality of service in the presence of data loss and/or delay in the network. Solving this synchronization problem requires framing of data streams whereby various points in the data streams deliverable simultaneously to a user are identified. A solution in which the temporal axis of an application is segmented into intervals in which each interval is a unit of synchronization and holds a data frame is presented. Simultaneous data delivery involves delivering all data segments belonging to an interval within a certain real-time delay, as specifiable by the application. Based on the approach, end-to-end transport protocols that compensate for the data skew that may arise due to data loss/delay is described. Simulation experiments confirm the viability of the protocols. Major uses of the protocols are in broadband ISDNs and metropolitan area networks (MANs). >

...read moreread less

Journal Article•DOI•

Aggregates in the temporal query language TQuel

[...]

Richard T. Snodgrass¹, S. Gomez, L.E. McKenzie•Institutions (1)

University of Arizona¹

01 Oct 1993-IEEE Transactions on Knowledge and Data Engineering

TL;DR: This paper defines new constructs to support aggregation in the temporal query language TQuel and presents their formal semantics in the tuple relational calculus, demonstrating that implementation is straightforward and efficient.

...read moreread less

Abstract: This paper defines new constructs to support aggregation in the temporal query language TQuel and presents their formal semantics in the tuple relational calculus. A formal semantics for Quel aggregates is defined in the process. Multiple aggregates; aggregates appearing in the where, when, and valid clauses; nested aggregation; and instantaneous, cumulative, moving window, and unique variants are supported. These aggregates provide a rich set of statistical functions that range over time, while requiring minimal additions to TQuel and its semantics. We show how the aggregates may be supported in an historical algebra, both in a batch and in an incremental fashion, demonstrating that implementation is straightforward and efficient. >

...read moreread less

Journal Article•DOI•

Efficient indexing methods for temporal relations

[...]

H. Gunadhi¹, A. Segev¹•Institutions (1)

University of California, Berkeley¹

01 Jun 1993-IEEE Transactions on Knowledge and Data Engineering

TL;DR: It is shown that a nested index could be a very efficient structure in this context and is preferable to a composite B-tree or an index that involves linear lists of historical tuples.

...read moreread less

Abstract: The primary issues that affect the design of indexing methods are examined, and several structures and algorithms for specific cases are proposed. The append-only tree (AP-tree) structure indexes data for append-only databases to help event-join optimization and queries that can exploit the inherent time ordering of such databases. Two variable indexing for the surrogate and time is discussed. It is shown that a nested index could be a very efficient structure in this context and is preferable to a composite B-tree or an index that involves linear lists of historical tuples. The problems of indexing time intervals, as related to nonsurrogate joint-indexing, are discussed. Several algorithms to partition the time line are introduced. A two-variable AT index based on nested indexing is outlined. >

...read moreread less

Journal Article•DOI•

Learning transformation rules for semantic query optimization: a data-driven approach

[...]

S. Shekar¹, B. Hamidzadeh², A. Kohli¹, M. Coyle¹•Institutions (2)

University of Minnesota¹, Hong Kong University of Science and Technology²

01 Dec 1993-IEEE Transactions on Knowledge and Data Engineering

TL;DR: An approach to learning query-transformation rules based on analyzing the existing data in the database and a framework and a closure algorithm for learning rules from a given data distribution are described.

...read moreread less

Abstract: An approach to learning query-transformation rules based on analyzing the existing data in the database is proposed. A framework and a closure algorithm for learning rules from a given data distribution are described. The correctness, completeness, and complexity of the proposed algorithm are characterized and a detailed example is provided to illustrate the framework. >

...read moreread less

Journal Article•DOI•

Fuzzy database query languages and their relational completeness theorem

[...]

Y. Takahashi

01 Feb 1993-IEEE Transactions on Knowledge and Data Engineering

TL;DR: Two fuzzy database query languages are proposed that are used to query fuzzy databases that are enhanced from relational databases in such a way that fuzzy sets are allowed in both attribute values and truth values.

...read moreread less

Abstract: Two fuzzy database query languages are proposed. They are used to query fuzzy databases that are enhanced from relational databases in such a way that fuzzy sets are allowed in both attribute values and truth values. A fuzzy calculus query language is constructed based on the relational calculus, and a fuzzy algebra query language is also constructed based on the relational algebra. In addition, a fuzzy relational completeness theorem such that the languages have equivalent expressive power is proved. >

...read moreread less

Journal Article•DOI•

Multiple prefetch adaptive disk caching

[...]

K.S. Grimsrud¹, James Archibald¹, Brent Nelson•Institutions (1)

Brigham Young University¹

01 Feb 1993-IEEE Transactions on Knowledge and Data Engineering

TL;DR: A new disk caching algorithm is presented that uses an adaptive prefetching scheme to reduce the average service time for disk references, and the range of parameters of this scheme is explored, and its performance is evaluated through trace-driven simulation.

...read moreread less

Abstract: A new disk caching algorithm is presented that uses an adaptive prefetching scheme to reduce the average service time for disk references. Unlike schemes which simply prefetch the next sector or group of sectors, this method maintains information about the order of past disk accesses which is used to accurately predict future access sequences. The range of parameters of this scheme is explored, and its performance is evaluated through trace-driven simulation, using traces obtained from three different UNIX minicomputers. Unlike disk trace data previously described in the literature, the traces used include time stamps for each reference. With this timing information-essential for evaluating any prefetching scheme-it is shown that a cache with the adaptive prefetching mechanism can reduce the average time to service a disk request by a factor of up to three, relative to an identical disk cache without prefetching. >

...read moreread less

Journal Article•DOI•

A visual query language for graphical interaction with schema-intensive databases

[...]

L. Mohan¹, R.L. Kashyap²•Institutions (2)

Sun Microsystems¹, Purdue University²

01 Oct 1993-IEEE Transactions on Knowledge and Data Engineering

TL;DR: Perhaps the most powerful feature of VQL is its ability to provide high semantic expressibility (in being able to specify highly complex queries) while maintaining simplicity in the user's query formulation process.

...read moreread less

Abstract: This paper presents a visual query language called VQL for interacting with an object-oriented schema-intensive data model. VQL allows convenient access to the various types of knowledge captured by the semantic model. It consists of a set of "graphical primitives" along with a combination grammar for creating graphical queries. The visual language is internally supported by a prolog-like predicate based query language. The formal grammar underlying the predicate based language is also presented. Apart from being able to create simple queries that can be specified in SQL or QBE, VQL can be used for making queries on any object-oriented data model including the generalization of the E-R model. VQL also handles complicated, indirect queries, specially those that require a reasoning system for query interpretation and response generation. Further, recursive queries on graph structures such as finding transitive closures of graphs may be easily specified. Perhaps the most powerful feature of VQL is its ability to provide high semantic expressibility (in being able to specify highly complex queries) while maintaining simplicity in the user's query formulation process. VQL is embedded in an object-oriented graphical database interaction environment that supports schema creation and manipulation in addition to database querying and updation. The prototype has been implemented in Smalltalk-80 running on a Sun 3/60 workstation. All the illustrations of visual interaction presented are taken from actual interaction sessions. >

...read moreread less

Journal Article•DOI•

Evaluating performance and quality of knowledge-based systems: foundation and methodology

[...]

G. Guida¹, Giancarlo Mauri²•Institutions (2)

Brescia University¹, University of Milan²

01 Apr 1993-IEEE Transactions on Knowledge and Data Engineering

TL;DR: An approach to KBS evaluation that comprises a precise definition of the concepts of performance and quality, a general evaluation methodology, and a set of criteria to support its practical application is presented.

...read moreread less

Abstract: A survey of knowledge-based system (KBS) evaluation methods is presented. The authors argue that these methods are partial, poorly systematic, and not easily applicable. An approach to KBS evaluation that comprises a precise definition of the concepts of performance and quality, a general evaluation methodology, and a set of criteria to support its practical application is presented. The proposed approach has been tried only partially and with rather simple test cases. >

...read moreread less

Journal Article•DOI•

An efficient algorithm for matching multiple patterns

[...]

Jang-Jong Fan¹, Keh-Yih Su¹•Institutions (1)

National Tsing Hua University¹

01 Apr 1993-IEEE Transactions on Knowledge and Data Engineering

TL;DR: The match algorithm combines the concept of deterministic finite state automata (DFSA) and the Boyer-Moore algorithm to achieve better performance and Experimental results indicate that in the average case, the algorithm is able to perform pattern match operations sublinearly.

...read moreread less

Abstract: An efficient algorithm for performing multiple pattern match in a string is described. The match algorithm combines the concept of deterministic finite state automata (DFSA) and the Boyer-Moore algorithm to achieve better performance. Experimental results indicate that in the average case, the algorithm is able to perform pattern match operations sublinearly, i.e. it does not need to inspect every character of the string to perform pattern match operations. The analysis shows that the number of characters to be inspected decreases as the length of patterns increases, and increases slightly as the total number of patterns increases. To match an eight-character pattern in an English string using the algorithm, only about 17% of all characters of the strong and 33% of all characters of the string, when the number of patterns is seven, are inspected. In an actual testing, the algorithm running on SUN 3/160 takes only 3.7 s to search seven eight-character patterns in a 1.4-Mbyte English text file. >

...read moreread less

Journal Article•DOI•

A balanced hierarchical data structure for multidimensional data with highly efficient dynamic characteristics

[...]

Yasuaki Nakamura¹, Shigeru Abe¹, Y. Ohsawa², M. Sakauchi³•Institutions (3)

Mitsubishi Electric¹, Saitama University², University of Tokyo³

01 Aug 1993-IEEE Transactions on Knowledge and Data Engineering

TL;DR: By the series of simulation tests, the results indicate that storage utilization is more than 80% in practice, and that retrieval performance and dynamic characteristics are superior to conventional methods.

...read moreread less

Abstract: A new multidimensional data structure, multidimensional tree (MD-tree), is proposed. The MD-tree is developed by extending the concept of the B-tree to the multidimensional data, so that the MD-tree is a height balanced tree similar to the B-tree. The theoretical worst-case storage utilization is guaranteed to hold more than 66.7% (2/3) of full capacity. The structure of the MD-tree and the algorithms to perform the insertion, deletion, and spatial searching are described. By the series of simulation tests, the performances of the MD-tree and conventional methods are compared. The results indicate that storage utilization is more than 80% in practice, and that retrieval performance and dynamic characteristics are superior to conventional methods. >

...read moreread less

Journal Article•DOI•

Association algebra: a mathematical foundation for object-oriented databases

[...]

Stanley Y. W. Su¹, M. Guo, Herman Lam•Institutions (1)

University of Florida¹

01 Oct 1993-IEEE Transactions on Knowledge and Data Engineering

TL;DR: An association algebra (A-algebra) is presented to serve as a mathematical foundation for processing O-O databases, which is analogous to the relational algebra used for processing relational databases.

...read moreread less

Abstract: The application of the object-oriented (O-O) paradigm in the database management field has gained much attention in recent years. Several experimental and commercial O-O database management systems have become available. However, the existing O-O DBMSs still lack a solid mathematical foundation for the manipulation of O-O databases, the optimization of queries, and the design and selection of storage structures for supporting O-O database manipulations. This paper presents an association algebra (A-algebra) to serve as a mathematical foundation for processing O-O databases, which is analogous to the relational algebra used for processing relational databases. In this algebra, objects and their associations in an O-O database are uniformly represented by association patterns which are manipulated by a number of operators to produce other association patterns. Different from the relational algebra, in which set operations operate on relations with union-compatible structures, the A-algebra operators can operate on association patterns of homogeneous and heterogeneous structures. Different from the traditional record-based relational processing, the A-algebra allows very complex patterns of object associations to be directly manipulated. The pattern-based query formulation and the A-algebra operators are described. Some mathematical properties of the algebraic operators are presented together with their application in query decomposition and optimization. The completeness of the A-algebra is also defined and proven. The A-algebra has been used as the basis for the design and implementation of an object-oriented query language, OQL, which is the query language used in a prototype Knowledge Base Management System OSAM*.KBMS. >

...read moreread less

Journal Article•DOI•

Logical inference of Horn clauses in Petri net models

[...]

Chuang Lin, Abhijit Chaudhury¹, Andrew B. Whinston, Dan C. Marinescu¹•Institutions (1)

University of Texas at Austin¹

01 Jun 1993-IEEE Transactions on Knowledge and Data Engineering

TL;DR: Algorithms for computing T-invariants of Petri net models of logical inference systems are investigated, based on the idea of resolution and exploit the presence of one-literal, pure- literal, and splitting clauses to lead to faster computation.

...read moreread less

Abstract: Petri net models for the Horn clause form of propositional logic and of first-order predicate logic are studied. A net model for logical inconsistency check is proposed. Algorithms for computing T-invariants of Petri net models of logical inference systems are investigated. The algorithms are based on the idea of resolution and exploit the presence of one-literal, pure-literal, and splitting clauses to lead to faster computation. Algorithms for computing T-invariants of high-level Petri net (HLPN) models of predicate logic are presented. >

...read moreread less

Journal Article•DOI•

A possible world semantics for disjunctive databases

[...]

E.P.F. Chan¹•Institutions (1)

University of Waterloo¹

01 Apr 1993-IEEE Transactions on Knowledge and Data Engineering

TL;DR: It is shown that for propositional databases with no negative clauses, the problem of determining if a negative ground literal is inferred under the GCWA is co-NP-hard, while the same problem can be solved efficiently under the DDR and PWS.

...read moreread less

Abstract: The fundamental problem that arises when a ground atom in a disjunctive database is assumed false is discussed. There are basically two different approaches for inferring negative information for disjunctive databases: J. Minker's (1982) generalized closed world assumption (GCWA) and K.A. Ross and R.W. Topor's (1988) disjunctive database rule (DDR). It is argued that neither approach is satisfactory. A database semantics called PWS is proposed. It is shown that for propositional databases with no negative clauses, the problem of determining if a negative ground literal is inferred under the GCWA is co-NP-hard, while the same problem can be solved efficiently under the DDR and PWS. However, in the general case, the problem becomes co-NP-complete for the DDR and PWS. Relationships among GCWA, DDR, and PWS are highlighted. In general, disjunctive clauses are interpreted inclusively under the DDR and unpredictably under the GCWA. >

...read moreread less

Journal Article•DOI•

Combining joint and semi-join operations for distributed query processing

[...]

Ming-Syan Chen¹, P.S. Yu•Institutions (1)

IBM¹

01 Jun 1993-IEEE Transactions on Knowledge and Data Engineering

TL;DR: Two important concepts that occur with the use of join operations as reducers in query processing, namely, gainful semi-joins and pure joint attributes, are used.

...read moreread less

Abstract: The application of a combination of join and semi-join operations to minimize the amount of data transmission required for distributed query processing is discussed. Specifically, two important concepts that occur with the use of join operations as reducers in query processing, namely, gainful semi-joins and pure joint attributes, are used. Some semi-joint, though not profitable themselves, may benefit the execution of subsequent join operations as reducers. Such a semi-join is termed a gainful semi-join. In addition, join attributes that are not part of the output attributes are referred to as pure join attributes. They exploit the usefulness of gainful semi-joins and use the removability of pure join attributes to reduce the amount of data transmission required for query processing. Heuristic searches are developed to determine a sequence of join and semi-join reducers for query processing. Results indicate the importance of the approach to combining joins and semi-joins for distributed query processing. >

...read moreread less

Journal Article•DOI•

A framework for knowledge discovery and evolution in databases

[...]

Jong P. Yoon¹, Larry Kerschberg¹•Institutions (1)

George Mason University¹

01 Dec 1993-IEEE Transactions on Knowledge and Data Engineering

TL;DR: A tool for characterizing the exceptions in databases and evolving knowledge as a database evolves is developed, which includes using a database query to discover new rules.

...read moreread less

Abstract: A concept for knowledge discovery and evolution in databases is described. The key issues include: using a database query to discover new rules; using not only positive examples (answer to a query), but also negative examples to discover new rules; and harmonizing existing rules with the new rules. A tool for characterizing the exceptions in databases and evolving knowledge as a database evolves is developed. >

...read moreread less

Journal Article•DOI•

Coupling production systems and database systems: a homogeneous approach

[...]

Timos Sellis¹, Chih-Chen Lin¹, Louiqa Raschid¹•Institutions (1)

University of Maryland, College Park¹

01 Apr 1993-IEEE Transactions on Knowledge and Data Engineering

TL;DR: Methods for storing and manipulating large rule bases using a relational database management systems (DBMS) and an approach to decomposing and storing the condition elements in the antecedents of rules such as those used in production rule-based systems are presented.

...read moreread less

Abstract: Methods for storing and manipulating large rule bases using a relational database management systems (DBMS) are discussed. An approach to decomposing and storing the condition elements in the antecedents of rules such as those used in production rule-based systems is presented. A set-oriented approach, DBCond, which uses a special data structure that is implemented using relations is proposed. A matching algorithm for DBCond uses the relational structures to efficiently identify rules whose antecedents are satisfied. The performance of DBCond is compared with that of DBRete, a DBMS implementation of the Rete match algorithm developed for use with the production rule language OPS5. DBCond is also compared with DBQuery, a method that is based on evaluating queries corresponding to the conditions in the antecedents of the rules. Improvements to the data structure and the algorithms of the DBCond method are described. An advantage of DBCond is that it is fully parallelizable, thus making it attractive for parallel computing environments. >

...read moreread less

Journal Article•DOI•

Database concurrency control in multilevel secure database management systems

[...]

T.F. Keefe¹, Wei-Tek Tsai, J. Srivastava•Institutions (1)

Pennsylvania State University¹

01 Dec 1993-IEEE Transactions on Knowledge and Data Engineering

TL;DR: Data conflict security, (DC-security), a property that implies a system is free of covert channels due to contention for access to data, is introduced and a definition of DC-security based on noninterference is presented.

...read moreread less

Abstract: Concurrent execution of transactions in database management systems (DBMSs) may lead to contention for access to data, which in a multilevel secure DBMS (MLS/DBMS) may lead to insecurity. Security issues involved in database concurrency control for MLS/DBMSs are examined, and it is shown how a scheduler can affect security. Data conflict security, (DC-security), a property that implies a system is free of covert channels due to contention for access to data, is introduced. A definition of DC-security based on noninterference is presented. Two properties that constitute a necessary condition for DC-security are introduced along with two simpler necessary conditions. A class of schedulers called output-state-equivalent is identified for which another criterion implies DC-security. The criterion considers separately the behavior of the scheduler in response to those inputs that cause rollback and those that do not. The security properties of several existing scheduling protocols are characterized. Many are found to be insecure. >

...read moreread less